Thank you Rupert!
It is probably something that I missed.
Best,
Srecko
-----Original Message-----
From: Rupert Westenthaler [mailto:[email protected]]
Sent: Thursday, January 12, 2012 20:16
To: Srecko Joksimovic; [email protected]
Cc: [email protected]
Subject: Re: Annotating using DBPedia ontology
Hi Srecko
It seems that both cases are related to the Metaxa Engine. My knowledge about
the libs used by this engine to extract the textual content is very limited,
so I might not be the right person to look into that.
In the first example I think Metaxa was not able to extract the text from
the Word document, because the only plainTextContent triple noted is
<j.0:plainTextContent>Microsoft Word-Dokument
srecko</j.0:plainTextContent>
The second example looks like an issue within the RDF metadata generation
in Aperture.
I sent this reply also directly to Walter Kasper. He is the one who
contributed this engine and should be able to provide more information.
best
Rupert
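[Editor's note: as a quick way to see what Metaxa actually extracted, the nie:plainTextContent literal can be pulled out of the enhancement RDF/XML with the JDK's namespace-aware DOM parser alone. This sketch is not part of the original exchange; a Jena query over the model would work just as well.]

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class PlainTextCheck {

    static final String NIE = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#";

    // Returns the first nie:plainTextContent literal in the RDF/XML, or null.
    static String plainTextContent(String rdfXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(rdfXml.getBytes("UTF-8")));
        NodeList nodes = doc.getElementsByTagNameNS(NIE, "plainTextContent");
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        // Inline sample mirroring the triple from the enhancement output below.
        String rdf = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:j.0=\"" + NIE + "\">"
                + "<rdf:Description>"
                + "<j.0:plainTextContent>Microsoft Word-Dokument\nsrecko</j.0:plainTextContent>"
                + "</rdf:Description></rdf:RDF>";
        System.out.println(plainTextContent(rdf));
    }
}
```

If this prints only the document's title metadata rather than the body text, the extraction itself failed, which is exactly the symptom discussed above.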
On 12.01.2012, at 18:40, srecko joksimovic wrote:
Hi Rupert,
I have another question, and I will finish soon.
I tried to annotate a PDF document, and I didn't get the result I expected.
Then I put the string you sent to me,
"John Smith works for the Apple Inc. in Cupertino, California."
into an MS Word document, and this is the result I got:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"
    xmlns:j.1="http://purl.org/dc/terms/"
    xmlns:j.2="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#"
    xmlns:j.3="http://fise.iks-project.eu/ontology/">
  <rdf:Description rdf:about="urn:enhancement-55016818-eb97-7b98-521a-422e3742173b">
    <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/TextAnnotation"/>
    <j.1:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine</j.1:creator>
    <j.1:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-01-12T17:34:20.288Z</j.1:created>
    <j.3:extracted-from rdf:resource="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f"/>
    <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/Enhancement"/>
    <j.1:language>fr</j.1:language>
  </rdf:Description>
  <rdf:Description rdf:about="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f">
    <rdf:type rdf:resource="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#PaginatedTextDocument"/>
    <j.0:plainTextContent>Microsoft Word-Dokument
srecko</j.0:plainTextContent>
  </rdf:Description>
  <rdf:Description rdf:about="urn:enhancement-0644a1ed-f1d8-334d-d4e9-690a0446cba8">
    <j.3:confidence rdf:datatype="http://www.w3.org/2001/XMLSchema#double">1.0</j.3:confidence>
    <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/TextAnnotation"/>
    <j.1:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">org.apache.stanbol.enhancer.engines.metaxa.MetaxaEngine</j.1:creator>
    <j.1:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-01-12T17:34:20.273Z</j.1:created>
    <j.3:extracted-from rdf:resource="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f"/>
    <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/Enhancement"/>
  </rdf:Description>
</rdf:RDF>
and this is the code:
public List<String> Annotate(byte[] _stream_to_annotate,
        ServiceUtils.MIMETypes _content_type, String _encoding)
{
    List<String> _return_list = new ArrayList<String>();
    try
    {
        URL url = new URL(ServiceUtils.SERVICE_URL);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setDoOutput(true);
        con.setRequestMethod("POST");
        con.setRequestProperty("Accept", "application/rdf+xml");
        con.setRequestProperty("Content-type", _content_type.getValue());
        java.io.OutputStream out = con.getOutputStream();
        IOUtils.write(_stream_to_annotate, out);
        IOUtils.closeQuietly(out);
        con.connect(); // send the request
        if (con.getResponseCode() > 299)
        {
            java.io.InputStream errorStream = con.getErrorStream();
            if (errorStream != null)
            {
                String errorMessage = IOUtils.toString(errorStream);
                IOUtils.closeQuietly(errorStream);
            }
            else
            {
                // no error data:
                // write a default error message with the status code
            }
        }
        else
        {
            Model model = ModelFactory.createDefaultModel();
            java.io.InputStream enhancementResults = con.getInputStream();
            model.read(enhancementResults, null);
            String queryStringForGraph =
                    "PREFIX t: <http://fise.iks-project.eu/ontology/> " +
                    "SELECT ?label WHERE {?alias t:entity-reference ?label}";
            Query query = QueryFactory.create(queryStringForGraph);
            QueryExecution qe = QueryExecutionFactory.create(query, model);
            ResultSet results = qe.execSelect();
            while (results.hasNext())
            {
                // read the bound ?label value instead of the whole solution's toString()
                _return_list.add(results.next().get("label").toString());
            }
        }
    }
    catch (Exception ex)
    {
        System.out.println(ex.getMessage());
    }
    return _return_list;
}
On Thu, Jan 12, 2012 at 8:32 AM, srecko joksimovic
<[email protected]> wrote:
Hi Rupert,
Thank you for the answer. I've probably missed that.
Best,
Srecko
On Thu, Jan 12, 2012 at 6:12 AM, Rupert Westenthaler
<[email protected]> wrote:
Hi Srecko
I think the last time I directly used this API was about 3-4 years ago, but
after a look at the HTTP client tutorial [1] I think the reason for your
problem is that you do not execute the GetMethod.
Based on this tutorial the code should look like:
// Create an instance of HttpClient.
HttpClient client = new HttpClient();
GetMethod get = new GetMethod(url);
try {
    // Execute the method.
    int statusCode = client.executeMethod(get);
    if (statusCode != HttpStatus.SC_OK) {
        // handle the error
    }
    InputStream t_is = get.getResponseBodyAsStream();
    // read the data of the stream
} finally {
    // as in the tutorial: always release the connection
    get.releaseConnection();
}
In addition, you should not use a Reader if you want to read byte-oriented
data from the input stream.
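[Editor's note: Rupert's point about Readers is worth illustrating. The following self-contained sketch (plain JDK, no Stanbol or HttpClient involved) shows why: decoding bytes through a UTF-8 Reader silently replaces byte sequences that are not valid UTF-8, corrupting binary content such as a PDF or Word file, while a plain InputStream copy preserves every byte.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.util.Arrays;

public class BytesVsReader {

    // Byte-oriented copy: every byte is passed through unchanged.
    static byte[] copyViaStream(byte[] data) throws Exception {
        InputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Character-oriented copy: bytes are decoded as UTF-8 and re-encoded,
    // so sequences that are not valid UTF-8 are replaced with U+FFFD.
    static byte[] copyViaReader(byte[] data) throws Exception {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(data), "UTF-8");
        StringWriter sw = new StringWriter();
        int c;
        while ((c = reader.read()) != -1) {
            sw.write(c);
        }
        return sw.toString().getBytes("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // The first bytes of a JPEG file: not valid UTF-8.
        byte[] binary = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println(Arrays.equals(binary, copyViaStream(binary))); // true
        System.out.println(Arrays.equals(binary, copyViaReader(binary))); // false
    }
}
```

The same applies to IOUtils: IOUtils.toByteArray(InputStream) keeps the bytes intact, while the Reader-based overloads go through a charset decode.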
hope this helps
best
Rupert
[1] http://hc.apache.org/httpclient-3.x/tutorial.html
On 11.01.2012, at 22:34, Srecko Joksimovic wrote:
That's it. Thank you!
I have already configured KeywordLinkingEngine when I used my own
ontology.
I think I'm familiar with that and I will try that option too.
In the meanwhile I found another interesting problem. I tried to annotate a
document and a web page. For the web page, I tried
IOUtils.write(byte[], out), so I had to convert the URL content to byte[]:
public static byte[] GetBytesFromURL(String _url) throws IOException
{
    GetMethod get = new GetMethod(_url);
    InputStream t_is = get.getResponseBodyAsStream();
    byte[] buffer = new byte[1024];
    int count = -1;
    Reader t_url_reader = new BufferedReader(new InputStreamReader(t_is));
    byte[] t_bytes = IOUtils.toByteArray(t_url_reader, "UTF-8");
    return t_bytes;
}
But, the problem is that I'm getting null for InputStream.
Any ideas?
Best,
Srecko
-----Original Message-----
From: Rupert Westenthaler [mailto:[email protected]]
Sent: Wednesday, January 11, 2012 22:08
To: Srecko Joksimovic
Cc: [email protected]
Subject: Re: Annotating using DBPedia ontology
On 11.01.2012, at 21:41, Srecko Joksimovic wrote:
Hi Rupert,
When I load localhost:8080/engines it says this:
There are currently 5 active engines.
org.apache.stanbol.enhancer.engines.metaxa.MetaxaEngine
org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine
org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhanc
ementEngine
org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEng
ine
org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEng
ine
Maybe this could tell you something?
These are exactly the 5 engines that are expected to run with the default
configuration. Based on this, the Stanbol Enhancer should just work fine.

After looking at the text you enhanced, I noticed however that it does not
mention any named entities such as Persons, Organizations, and Places. So I
checked it with my local Stanbol version and it also did not detect any
entities. So to check whether Stanbol works as expected, you should try to
use another text that mentions some Named Entities, such as

"John Smith works for the Apple Inc. in Cupertino, California."

If you want to search also for entities like "Bank", "Blog", "Consumer",
"Telephone" ... you need to also configure a KeywordLinkingEngine for
dbpedia. Part B of [3] provides more information on how to do that.

But let me mention that the KeywordLinkingEngine is more useful if used in
combination with your own domain-specific thesaurus rather than a global
data set like dbpedia. When used with dbpedia you will also get a lot of
false positives.
best
Rupert
[3] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html