RE: Annotating using DBPedia ontology

Srecko Joksimovic Thu, 12 Jan 2012 11:22:02 -0800

Thank you Rupert!

It is probably something that I missed.


Best,
Srecko

-----Original Message-----
From: Rupert Westenthaler [mailto:[email protected]] 
Sent: Thursday, January 12, 2012 20:16
To: Srecko Joksimovic; [email protected]
Cc: [email protected]
Subject: Re: Annotating using DBPedia ontology

Hi Srecko

It seems that both cases are related to the Metaxa engine. My knowledge about
the libs used by this engine to extract the textual content is very limited,
so I might not be the right person to look into that.

In the first example I think Metaxa was not able to extract the text from
the Word document, because the only plainTextContent triple noted is

<j.0:plainTextContent>Microsoft Word-Dokument&#xD;
srecko</j.0:plainTextContent>

The second example looks like an issue within the RDF metadata generation
in Aperture.

I also sent this reply directly to Walter Kasper. He is the one who
contributed this engine and should be able to provide more information.

best
Rupert
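
One way to repeat the check described above is to look at the plainTextContent value in the RDF/XML returned by the enhancer. The following is only a minimal sketch using the JDK's DOM parser; the class and method names (PlainTextContentCheck, extractPlainText) are hypothetical and not part of Stanbol or of the code in this thread, and the namespace is the one declared as xmlns:j.0 in the quoted result.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, for illustration only.
class PlainTextContentCheck {

    // Namespace copied from the xmlns:j.0 declaration in the quoted RDF/XML.
    static final String NIE_NS =
            "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#";

    // Returns the text of every plainTextContent element found in the RDF/XML.
    static List<String> extractPlainText(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the namespace lookup below
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(NIE_NS, "plainTextContent");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse RDF/XML", e);
        }
    }

    public static void main(String[] args) {
        String sample =
                "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:j.0=\"" + NIE_NS + "\">"
                + "<rdf:Description rdf:about=\"urn:example\">"
                + "<j.0:plainTextContent>Microsoft Word-Dokument</j.0:plainTextContent>"
                + "</rdf:Description></rdf:RDF>";
        for (String text : extractPlainText(sample)) {
            System.out.println(text);
        }
    }
}
```

If the extracted text does not contain the sentence that was written into the document, the text extraction step is the likely place to look, as in the first example.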

On 12.01.2012, at 18:40, srecko joksimovic wrote:

> Hi Rupert,
> 
> I have another question, and I will finish soon.
> 
> I tried to annotate a PDF document, and I didn't get the result I expected.
> Then I put the string you sent me,
> "John Smith works for the Apple Inc. in Cupertino, California."
> in an MS Word document, and this is the result I got:
> 
> <rdf:RDF
>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>     xmlns:j.0="http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"
>     xmlns:j.1="http://purl.org/dc/terms/"
>     xmlns:j.2="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#"
>     xmlns:j.3="http://fise.iks-project.eu/ontology/">
>   <rdf:Description rdf:about="urn:enhancement-55016818-eb97-7b98-521a-422e3742173b">
>     <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/TextAnnotation"/>
>     <j.1:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine</j.1:creator>
>     <j.1:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-01-12T17:34:20.288Z</j.1:created>
>     <j.3:extracted-from rdf:resource="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f"/>
>     <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/Enhancement"/>
>     <j.1:language>fr</j.1:language>
>   </rdf:Description>
>   <rdf:Description rdf:about="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f">
>     <rdf:type rdf:resource="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#PaginatedTextDocument"/>
>     <j.0:plainTextContent>Microsoft Word-Dokument&#xD;
> srecko</j.0:plainTextContent>
>   </rdf:Description>
>   <rdf:Description rdf:about="urn:enhancement-0644a1ed-f1d8-334d-d4e9-690a0446cba8">
>     <j.3:confidence rdf:datatype="http://www.w3.org/2001/XMLSchema#double">1.0</j.3:confidence>
>     <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/TextAnnotation"/>
>     <j.1:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">org.apache.stanbol.enhancer.engines.metaxa.MetaxaEngine</j.1:creator>
>     <j.1:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-01-12T17:34:20.273Z</j.1:created>
>     <j.3:extracted-from rdf:resource="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f"/>
>     <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/Enhancement"/>
>   </rdf:Description>
> </rdf:RDF>
> 
> 
> and this is the code:
> 
>   public List<String> Annotate(byte[] _stream_to_annotate,
>           ServiceUtils.MIMETypes _content_type, String _encoding)
>   {
>       List<String> _return_list = new ArrayList<String>();
>       try
>       {
>           URL url = new URL(ServiceUtils.SERVICE_URL);
>           HttpURLConnection con = (HttpURLConnection)url.openConnection();
>           con.setDoOutput(true);
>           con.setRequestMethod("POST");
>           con.setRequestProperty("Accept", "application/rdf+xml");
>           con.setRequestProperty("Content-type", _content_type.getValue());
>
>           java.io.OutputStream out = con.getOutputStream();
>           IOUtils.write(_stream_to_annotate, out);
>           IOUtils.closeQuietly(out);
>
>           con.connect(); //send the request
>
>           if(con.getResponseCode() > 299)
>           {
>               java.io.InputStream errorStream = con.getErrorStream();
>               if(errorStream != null)
>               {
>                   String errorMessage = IOUtils.toString(errorStream);
>                   IOUtils.closeQuietly(errorStream);
>               }
>               else
>               {
>                   //no error data
>                   //write default error message with the status code
>               }
>           }
>           else
>           {
>               Model model = ModelFactory.createDefaultModel();
>               java.io.InputStream enhancementResults = con.getInputStream();
>               model.read(enhancementResults, null);
>
>               String queryStringForGraph =
>                   "PREFIX t: <http://fise.iks-project.eu/ontology/> " +
>                   "SELECT ?label WHERE {?alias t:entity-reference ?label}";
>               Query query = QueryFactory.create(queryStringForGraph);
>               QueryExecution qe = QueryExecutionFactory.create(query, model);
>
>               ResultSet results = qe.execSelect();
>               while(results.hasNext())
>               {
>                   _return_list.add(results.next().toString());
>               }
>           }
>       }
>       catch(Exception ex)
>       {
>           System.out.println(ex.getMessage());
>       }
>       return _return_list;
>   }
> 
> On Thu, Jan 12, 2012 at 8:32 AM, srecko joksimovic
<[email protected]> wrote:
> 
> Hi Rupert,
> 
> Thank you for the answer. I've probably missed that. 
> 
> Best,
> Srecko
> 
> 
> On Thu, Jan 12, 2012 at 6:12 AM, Rupert Westenthaler
<[email protected]> wrote:
> Hi Srecko
> 
> I think the last time I directly used this API was about 3-4 years ago, but
> after a look at the HttpClient tutorial [1] I think the reason for your
> problem is that you do not execute the GetMethod.
> 
> Based on this tutorial the code should look like
> 
>    // Create an instance of HttpClient.
>    HttpClient client = new HttpClient();
>    GetMethod get = new GetMethod(url);
>    try {
>        // Execute the method.
>        int statusCode = client.executeMethod(get);
>        if (statusCode != HttpStatus.SC_OK) {
>            //handle the error
>        }
>        InputStream t_is = get.getResponseBodyAsStream();
>        //read the data of the stream
>    } finally {
>        // Release the connection once the stream has been read.
>        get.releaseConnection();
>    }
> 
> In addition, you should not use a Reader if you want to read byte-oriented
> data from the input stream.
> 
> hope this helps
> best
> Rupert
> 
> [1] http://hc.apache.org/httpclient-3.x/tutorial.html
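
The advice about not using a Reader can be sketched with plain JDK streams: copy the InputStream into a byte array directly, so no character decoding touches the data. This is only an illustration; the class and method names (StreamBytes, readAllBytes) are hypothetical, and the stream could equally be the one returned by getResponseBodyAsStream() after the method has been executed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, for illustration only.
class StreamBytes {

    // Copies the stream into a byte array without any character decoding.
    static byte[] readAllBytes(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int count;
            // read() returns -1 once the end of the stream is reached
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Bytes that are not valid text in most encodings survive unchanged.
        byte[] data = {0, (byte) 0xFF, (byte) 0x80, 65};
        byte[] copy = readAllBytes(new ByteArrayInputStream(data));
        System.out.println(copy.length);
    }
}
```

The resulting byte[] can then be passed to IOUtils.write(byte[], out) as in the annotation code.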
> 
> On 11.01.2012, at 22:34, Srecko Joksimovic wrote:
> 
> > That's it. Thank you!
> > I have already configured the KeywordLinkingEngine when I used my own
> > ontology.
> > I think I'm familiar with that and I will try that option too.
> >
> > In the meantime I found another interesting problem. I tried to annotate
> > a document and a web page. With the web page, I tried
> > IOUtils.write(byte[], out) and I had to convert the URL to byte[]:
> >
> > public static byte[] GetBytesFromURL(String _url) throws IOException
> > {
> >       GetMethod get = new GetMethod(_url);
> >       InputStream t_is = get.getResponseBodyAsStream();
> >       byte[] buffer = new byte[1024];
> >       int count = -1;
> >       Reader t_url_reader = new BufferedReader(new
> > InputStreamReader(t_is));
> >       byte[] t_bytes = IOUtils.toByteArray(t_url_reader, "UTF-8");
> >
> >       return t_bytes;
> > }
> >
> > But, the problem is that I'm getting null for InputStream.
> >
> > Any ideas?
> >
> > Best,
> > Srecko
> >
> >
> >
> > -----Original Message-----
> > From: Rupert Westenthaler [mailto:[email protected]]
> > Sent: Wednesday, January 11, 2012 22:08
> > To: Srecko Joksimovic
> > Cc: [email protected]
> > Subject: Re: Annotating using DBPedia ontology
> >
> >
> > On 11.01.2012, at 21:41, Srecko Joksimovic wrote:
> >> Hi Rupert,
> >>
> >> When I load localhost:8080/engines it says this:
> >>
> >> There are currently 5 active engines.
> >> org.apache.stanbol.enhancer.engines.metaxa.MetaxaEngine
> >> org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine
> >> org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine
> >> org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine
> >> org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine
> >>
> >> Maybe this could tell you something?
> >>
> >
> > These are exactly the 5 engines that are expected to run with the default
> > configuration.
> > Based on this the Stanbol Enhancer should work fine.
> >
> > After looking at the text you enhanced I noticed, however, that it does
> > not mention any named entities such as Persons, Organizations and Places.
> > So I checked it with my local Stanbol version and also did not get any
> > detected entities.
> >
> > So to check if Stanbol works as expected you should try another text
> > that mentions some Named Entities such as
> >
> >    "John Smith works for the Apple Inc. in Cupertino, California."
> >
> >
> > If you also want to search for entities like "Bank", "Blog", "Consumer",
> > "Telephone", you need to also configure a KeywordLinkingEngine for dbpedia.
> > Part B of [3] provides more information on how to do that.
> >
> > But let me mention that the KeywordLinkingEngine is more useful if used in
> > combination with your own domain-specific thesaurus rather than a global
> > data set like dbpedia. When used with dbpedia you will also get a lot of
> > false positives.
> >
> > best
> > Rupert
> >
> > [3] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
> >
> 
> 
>
