Re: Annotating using DBPedia ontology

srecko joksimovic Thu, 12 Jan 2012 09:42:20 -0800

Hi Rupert,

I have one more question, and then I should be done.


I tried to annotate a PDF document and didn't get the result I expected. Then
I put the string you sent me,
"John Smith works for the Apple Inc. in Cupertino, California."
into an MS Word document, and this is the result I got:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"
    xmlns:j.1="http://purl.org/dc/terms/"
    xmlns:j.2="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#"
    xmlns:j.3="http://fise.iks-project.eu/ontology/">
  <rdf:Description rdf:about="urn:enhancement-55016818-eb97-7b98-521a-422e3742173b">
    <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/TextAnnotation"/>
    <j.1:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine</j.1:creator>
    <j.1:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-01-12T17:34:20.288Z</j.1:created>
    <j.3:extracted-from rdf:resource="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f"/>
    <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/Enhancement"/>
    <j.1:language>fr</j.1:language>
  </rdf:Description>
  <rdf:Description rdf:about="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f">
    <rdf:type rdf:resource="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#PaginatedTextDocument"/>
    <j.0:plainTextContent>Microsoft Word-Dokument&#xD;
srecko</j.0:plainTextContent>
  </rdf:Description>
  <rdf:Description rdf:about="urn:enhancement-0644a1ed-f1d8-334d-d4e9-690a0446cba8">
    <j.3:confidence rdf:datatype="http://www.w3.org/2001/XMLSchema#double">1.0</j.3:confidence>
    <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/TextAnnotation"/>
    <j.1:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">org.apache.stanbol.enhancer.engines.metaxa.MetaxaEngine</j.1:creator>
    <j.1:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-01-12T17:34:20.273Z</j.1:created>
    <j.3:extracted-from rdf:resource="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f"/>
    <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/Enhancement"/>
  </rdf:Description>
</rdf:RDF>


and this is the code:

public List<String> Annotate(byte[] _stream_to_annotate,
        ServiceUtils.MIMETypes _content_type, String _encoding)
{
    List<String> _return_list = new ArrayList<String>();
    try
    {
        URL url = new URL(ServiceUtils.SERVICE_URL);
        HttpURLConnection con = (HttpURLConnection)url.openConnection();

        con.setDoOutput(true);
        con.setRequestMethod("POST");
        con.setRequestProperty("Accept", "application/rdf+xml");
        con.setRequestProperty("Content-type", _content_type.getValue());

        java.io.OutputStream out = con.getOutputStream();

        IOUtils.write(_stream_to_annotate, out);
        IOUtils.closeQuietly(out);

        con.connect(); //send the request

        if(con.getResponseCode() > 299)
        {
            java.io.InputStream errorStream = con.getErrorStream();

            if(errorStream != null)
            {
                String errorMessage = IOUtils.toString(errorStream);

                IOUtils.closeQuietly(errorStream);
            }
            else
            {
                //no error data
                //write default error message with the status code

            }
        }
        else
        {
            Model model = ModelFactory.createDefaultModel();

            java.io.InputStream enhancementResults = con.getInputStream();

            model.read(enhancementResults, null);
            String queryStringForGraph = "PREFIX t: <http://fise.iks-project.eu/ontology/> " +
                    "SELECT ?label WHERE {?alias t:entity-reference ?label}";

            Query query = QueryFactory.create(queryStringForGraph);

            QueryExecution qe = QueryExecutionFactory.create(query, model);

            ResultSet results = qe.execSelect();
            while(results.hasNext())
            {
                _return_list.add(results.next().toString());
            }
        }
    }
    catch(Exception ex)
    {
        System.out.println(ex.getMessage());
    }
    return _return_list;
}

On Thu, Jan 12, 2012 at 8:32 AM, srecko joksimovic <
[email protected]> wrote:

>
> Hi Rupert,
>
> Thank you for the answer. I've probably missed that.
>
> Best,
> Srecko
>
>
> On Thu, Jan 12, 2012 at 6:12 AM, Rupert Westenthaler <
> [email protected]> wrote:
>
>> Hi Srecko
>>
>> I think the last time I directly used this API is about 3-4 years ago,
>> but after a look at the http client tutorial [1] I think the reason for
>> your problem is that you do not execute the GetMethod.
>>
>> Based on this tutorial the code should look like
>>
>>    // Create an instance of HttpClient.
>>    HttpClient client = new HttpClient();
>>    GetMethod get = new GetMethod(url);
>>    try {
>>        // Execute the method.
>>        int statusCode = client.executeMethod(get);
>>        if (statusCode != HttpStatus.SC_OK) {
>>            //handle the error
>>        }
>>        InputStream t_is = get.getResponseBodyAsStream();
>>        //read the data of the stream
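>>    } finally {
>>        // Release the connection once the stream has been read.
>>        get.releaseConnection();
>>    }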
>>
>> In addition you should not use a Reader if you want to read byte oriented
>> data from the input stream.
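
The point about not using a Reader can be illustrated with a minimal sketch
that copies the stream through a byte buffer, so no character decoding ever
touches the data. The class and method names (UrlBytes, readAllBytes, fetch)
are made up for illustration, and fetch uses the JDK's own URL.openStream()
instead of the Commons HttpClient GetMethod discussed in this thread, so
treat it as one possible approach under those assumptions, not as the way
Stanbol or HttpClient expects it to be done.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Hypothetical helper class; the names are assumptions, not part of any library.
public class UrlBytes {

    /** Copies every byte from the stream into an array, with no character decoding. */
    public static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int count;
        // read() returns -1 once the end of the stream is reached
        while ((count = in.read(buffer)) != -1) {
            out.write(buffer, 0, count);
        }
        return out.toByteArray();
    }

    /** Opens the URL with the JDK's built-in handler and returns the raw body. */
    public static byte[] fetch(URL url) throws IOException {
        // try-with-resources closes the stream even if reading fails
        try (InputStream in = url.openStream()) {
            return readAllBytes(in);
        }
    }
}
```

The resulting byte[] could then be passed to IOUtils.write(byte[], out)
unchanged, which matters for binary content such as PDF or Word files.
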
>>
>> hope this helps
>> best
>> Rupert
>>
>> [1] http://hc.apache.org/httpclient-3.x/tutorial.html
>>
>> On 11.01.2012, at 22:34, Srecko Joksimovic wrote:
>>
>> > That's it. Thank you!
>> > I have already configured KeywordLinkingEngine when I used my own ontology.
>> > I think I'm familiar with that and I will try that option too.
>> >
>> > In the meantime I found another interesting problem. I tried to annotate
>> > a document and a web page. For the web page, I tried
>> > IOUtils.write(byte[], out), so I had to convert the URL to byte[]:
>> >
>> > public static byte[] GetBytesFromURL(String _url) throws IOException
>> > {
>> >       GetMethod get = new GetMethod(_url);
>> >       InputStream t_is = get.getResponseBodyAsStream();
>> >       byte[] buffer = new byte[1024];
>> >       int count = -1;
>> >       Reader t_url_reader = new BufferedReader(new InputStreamReader(t_is));
>> >       byte[] t_bytes = IOUtils.toByteArray(t_url_reader, "UTF-8");
>> >
>> >       return t_bytes;
>> > }
>> >
>> > But the problem is that I'm getting null for the InputStream.
>> >
>> > Any ideas?
>> >
>> > Best,
>> > Srecko
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: Rupert Westenthaler [mailto:[email protected]]
>> > Sent: Wednesday, January 11, 2012 22:08
>> > To: Srecko Joksimovic
>> > Cc: [email protected]
>> > Subject: Re: Annotating using DBPedia ontology
>> >
>> >
>> > On 11.01.2012, at 21:41, Srecko Joksimovic wrote:
>> >> Hi Rupert,
>> >>
>> >> When I load localhost:8080/engines it says this:
>> >>
>> >> There are currently 5 active engines.
>> >> org.apache.stanbol.enhancer.engines.metaxa.MetaxaEngine
>> >> org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine
>> >> org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine
>> >> org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine
>> >> org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine
>> >>
>> >> Maybe this could tell you something?
>> >>
>> >
>> > These are exactly the 5 engines that are expected to run with the default
>> > configuration. Based on this, the Stanbol Enhancer should work fine.
>> >
>> > After looking at the text you enhanced, however, I noticed that it does
>> > not mention any named entities such as Persons, Organizations and Places.
>> > So I checked it with my local Stanbol version and also did not get any
>> > detected entities.
>> >
>> > So to check if Stanbol works as expected you should try another text
>> > that mentions some Named Entities, such as
>> >
>> >    "John Smith works for the Apple Inc. in Cupertino, California."
>> >
>> >
>> > If you also want to search for entities like "Bank", "Blog", "Consumer",
>> > "Telephone", you also need to configure a KeywordLinkingEngine for dbpedia.
>> > Part B or [3] provides more information on how to do that.
>> >
>> > But let me mention that the KeywordLinkingEngine is more useful when used
>> > in combination with your own domain-specific thesaurus rather than a global
>> > data set like dbpedia. When used with dbpedia you will also get a lot of
>> > false positives.
>> >
>> > best
>> > Rupert
>> >
>> > [3] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
>> >
>>
>>
>
