Re: Annotating using DBPedia ontology

srecko joksimovic Fri, 13 Jan 2012 05:44:05 -0800

Thank you very much!

Best,
Srecko


On Fri, Jan 13, 2012 at 2:41 PM, Walter Kasper <[email protected]> wrote:

> Hi,
>
> Here are the recognized standard MIME types:
>
> pdf: application/pdf
> txt: text/plain
> ppt: application/vnd.ms-powerpoint
> xls: application/vnd.ms-excel
> odt: application/vnd.oasis.opendocument.text
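Below is a minimal Java sketch of how a client might pick the Content-type header from a file extension. It uses the MIME types Walter lists here, plus the .doc type he gives for Word documents further down the thread. The class and method names (`StanbolMimeTypes`, `forFileName`) are hypothetical. Check your engine configuration for the exact types your Stanbol version accepts.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps file extensions to the MIME types quoted in this thread.
public final class StanbolMimeTypes {

    private static final Map<String, String> BY_EXTENSION = new HashMap<>();

    static {
        BY_EXTENSION.put("pdf", "application/pdf");
        BY_EXTENSION.put("txt", "text/plain");
        BY_EXTENSION.put("ppt", "application/vnd.ms-powerpoint");
        BY_EXTENSION.put("xls", "application/vnd.ms-excel");
        BY_EXTENSION.put("odt", "application/vnd.oasis.opendocument.text");
        // Word up to 2003, as given by Walter for the Stanbol Metaxa engine
        BY_EXTENSION.put("doc", "application/vnd.ms-word");
    }

    private StanbolMimeTypes() {
    }

    /**
     * Returns the MIME type for the file name's extension (case-insensitive),
     * or null if there is no extension or it is not in the table.
     */
    public static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(ext);
    }

    public static void main(String[] args) {
        System.out.println(forFileName("test.doc"));
    }
}
```

Returning null for an unknown extension, instead of falling back to a generic default, makes an unsupported file type visible before the request is sent.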
>
> Regards,
>
> Walter
>
> srecko joksimovic wrote:
>
>> Hi,
>>
>> Thank you! I will check out the latest version.
>> I was using application/msword because I thought it was the right one.
>> Could you please send me the correct MIME types for pdf, txt, ppt, xls
>> and odt files?
>>
>> Best,
>> Srecko
>>
>> On Fri, Jan 13, 2012 at 1:34 PM, Walter Kasper <[email protected]> wrote:
>>
>>    Hi,
>>
>>    We fixed the problem with unresolved relative URLs in HTML
>>    documents. In the case of your Wikipedia page, it came from an
>>    embedded rel-license microformat. If you are only interested in
>>    text extraction, you can also simply disable the RDFa and
>>    Microformat extractors in the configuration for HTML extraction.
>>
>>    We also tested Word documents with your test sentence, and
>>    everything worked fine for us. Did you use the correct MIME type?
>>    The correct ones for Word documents are:
>>
>>    doc format (<= Word 2003): application/vnd.ms-word
>>    docx format (Word 2007):
>>    application/vnd.openxmlformats-officedocument.wordprocessingml
>>
>>    Best regards,
>>
>>    Walter
>>
>>    srecko joksimovic wrote:
>>
>>        Hi Walter,
>>
>>        Word document is nothing special, just one line of text:
>>
>>        "John Smith works for the Apple Inc. in Cupertino, California."
>>
>>        Rupert suggested this sentence for testing text annotation.
>>        Since I know the result of annotating this string, I decided
>>        to create a Word document with the same content for test
>>        purposes.
>>
>>        > The error with your HTML page apparently arises from a bug in
>>        > resolving relative URLs in one of the HTML extractors. We will
>>        > fix that.
>>
>>        Does this mean that I can't annotate HTML pages at the moment,
>>        or does it depend on the page?
>>
>>        Best,
>>        Srecko
>>
>>        On Fri, Jan 13, 2012 at 9:51 AM, Walter Kasper <[email protected]> wrote:
>>
>>
>>            Hi Srecko,
>>
>>            I don't know what the problem with your Word document
>>            could have been.
>>            Could you send it to me for testing?
>>
>>            The error with your HTML page apparently arises from a bug
>>            in resolving
>>            relative URLs in one of the HTML extractors. We will fix that.
>>
>>            Best regards,
>>
>>            Walter
>>
>>
>>            Srecko Joksimovic wrote:
>>
>>                Thank you Rupert!
>>
>>                It is probably something that I missed.
>>
>>                Best,
>>                Srecko
>>
>>                -----Original Message-----
>>                From: Rupert Westenthaler [mailto:[email protected]]
>>                Sent: Thursday, January 12, 2012 20:16
>>                To: Srecko Joksimovic; [email protected]
>>                Cc: [email protected]
>>
>>                Subject: Re: Annotating using DBPedia ontology
>>
>>                Hi Srecko
>>
>>                It seems that both cases are related to the Metaxa
>>                Engine. My knowledge about the libraries this engine
>>                uses to extract the textual content is very limited,
>>                so I might not be the right person to look into that.
>>
>>                In the first example, I think Metaxa was not able to
>>                extract the text from the Word document, because the
>>                only plainTextContent triple noted is
>>
>>                <j.0:plainTextContent>Microsoft Word-Dokument&#xD;
>>
>>                srecko</j.0:plainTextContent>
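To check what Metaxa actually extracted, a client can look up the `nie:plainTextContent` value in the RDF/XML response, which is what Rupert does by eye here. The sketch below uses only the JDK's DOM parser and treats the response as plain XML. It is not a general RDF parser: it would miss values written as attributes or sent in another serialization. The class name `PlainTextCheck` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: reads nie:plainTextContent values from an RDF/XML response.
// Plain XML processing only; this is not a general RDF parser.
public final class PlainTextCheck {

    // Namespace bound to the j.0 prefix in the response shown in this thread.
    static final String NIE = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#";

    private PlainTextCheck() {
    }

    /** Returns the text of every nie:plainTextContent element in the document. */
    public static List<String> plainTextContents(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Match elements by namespace URI, so the j.0 prefix itself does not matter.
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations (guards against external entity tricks).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(NIE, "plainTextContent");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse RDF/XML", e);
        }
    }

    public static void main(String[] args) {
        String rdf = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:j.0=\"" + NIE + "\">"
                + "<rdf:Description rdf:about=\"urn:example\">"
                + "<j.0:plainTextContent>Microsoft Word-Dokument</j.0:plainTextContent>"
                + "</rdf:Description></rdf:RDF>";
        // If only a document title comes back instead of the body text,
        // the extractor most likely failed to read the document.
        System.out.println(plainTextContents(rdf));
    }
}
```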
>>
>>                The second example looks like an issue within the RDF
>>                metadata generation in Aperture.
>>
>>                I also sent this reply directly to Walter Kasper. He
>>                contributed this engine and should be able to provide
>>                more information.
>>
>>                best
>>                Rupert
>>
>>                On 12.01.2012, at 18:40, srecko joksimovic wrote:
>>
>>                 Hi Rupert,
>>
>>                    I have one more question, and then I will be done.
>>
>>                    I tried to annotate a PDF document and didn't get the
>>                    result I expected. Then I put the string you sent me,
>>
>>                    "John Smith works for the Apple Inc. in Cupertino,
>>                    California."
>>
>>                    in an MS Word document, and this is the result I got:
>>
>>                    <rdf:RDF
>>                        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>                        xmlns:j.0="http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"
>>                        xmlns:j.1="http://purl.org/dc/terms/"
>>                        xmlns:j.2="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#"
>>                        xmlns:j.3="http://fise.iks-project.eu/ontology/">
>>                      <rdf:Description rdf:about="urn:enhancement-55016818-eb97-7b98-521a-422e3742173b">
>>                        <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/TextAnnotation"/>
>>                        <j.1:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine</j.1:creator>
>>                        <j.1:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-01-12T17:34:20.288Z</j.1:created>
>>                        <j.3:extracted-from rdf:resource="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f"/>
>>                        <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/Enhancement"/>
>>                        <j.1:language>fr</j.1:language>
>>                      </rdf:Description>
>>                      <rdf:Description rdf:about="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f">
>>                        <rdf:type rdf:resource="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#PaginatedTextDocument"/>
>>                        <j.0:plainTextContent>Microsoft Word-Dokument&#xD;
>>
>>                    srecko</j.0:plainTextContent>
>>                      </rdf:Description>
>>                      <rdf:Description rdf:about="urn:enhancement-0644a1ed-f1d8-334d-d4e9-690a0446cba8">
>>                        <j.3:confidence rdf:datatype="http://www.w3.org/2001/XMLSchema#double">1.0</j.3:confidence>
>>                        <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/TextAnnotation"/>
>>                        <j.1:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">org.apache.stanbol.enhancer.engines.metaxa.MetaxaEngine</j.1:creator>
>>                        <j.1:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-01-12T17:34:20.273Z</j.1:created>
>>                        <j.3:extracted-from rdf:resource="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f"/>
>>                        <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/Enhancement"/>
>>                      </rdf:Description>
>>                    </rdf:RDF>
>>
>>
>>                    and this is the code:
>>
>>                    import java.io.InputStream;
>>                    import java.io.OutputStream;
>>                    import java.net.HttpURLConnection;
>>                    import java.net.URL;
>>                    import java.util.ArrayList;
>>                    import java.util.List;
>>
>>                    import org.apache.commons.io.IOUtils;
>>
>>                    import com.hp.hpl.jena.query.Query;
>>                    import com.hp.hpl.jena.query.QueryExecution;
>>                    import com.hp.hpl.jena.query.QueryExecutionFactory;
>>                    import com.hp.hpl.jena.query.QueryFactory;
>>                    import com.hp.hpl.jena.query.ResultSet;
>>                    import com.hp.hpl.jena.rdf.model.Model;
>>                    import com.hp.hpl.jena.rdf.model.ModelFactory;
>>
>>                    public List<String> Annotate(byte[] _stream_to_annotate,
>>                            ServiceUtils.MIMETypes _content_type, String _encoding)
>>                    {
>>                        List<String> _return_list = new ArrayList<String>();
>>                        try
>>                        {
>>                            URL url = new URL(ServiceUtils.SERVICE_URL);
>>                            HttpURLConnection con = (HttpURLConnection) url.openConnection();
>>                            con.setDoOutput(true);
>>                            con.setRequestMethod("POST");
>>                            con.setRequestProperty("Accept", "application/rdf+xml");
>>                            // The Content-type must be a MIME type the enhancer recognizes
>>                            con.setRequestProperty("Content-type", _content_type.getValue());
>>
>>                            OutputStream out = con.getOutputStream();
>>                            IOUtils.write(_stream_to_annotate, out);
>>                            IOUtils.closeQuietly(out);
>>
>>                            // getResponseCode() sends the request and reads the status line
>>                            int status = con.getResponseCode();
>>                            if (status > 299)
>>                            {
>>                                String errorMessage;
>>                                InputStream errorStream = con.getErrorStream();
>>                                if (errorStream != null)
>>                                {
>>                                    errorMessage = IOUtils.toString(errorStream);
>>                                    IOUtils.closeQuietly(errorStream);
>>                                }
>>                                else
>>                                {
>>                                    // no error data: write a default error message with the status code
>>                                    errorMessage = "Enhancement request failed with HTTP status " + status;
>>                                }
>>                                System.out.println(errorMessage);
>>                            }
>>                            else
>>                            {
>>                                Model model = ModelFactory.createDefaultModel();
>>                                InputStream enhancementResults = con.getInputStream();
>>                                model.read(enhancementResults, null);
>>                                IOUtils.closeQuietly(enhancementResults);
>>
>>                                // Collect the entities referenced by the enhancements
>>                                String queryStringForGraph =
>>                                    "PREFIX t: <http://fise.iks-project.eu/ontology/> " +
>>                                    "SELECT ?label WHERE {?alias t:entity-reference ?label}";
>>                                Query query = QueryFactory.create(queryStringForGraph);
>>                                QueryExecution qe = QueryExecutionFactory.create(query, model);
>>                                try
>>                                {
>>                                    ResultSet results = qe.execSelect();
>>                                    while (results.hasNext())
>>                                    {
>>                                        _return_list.add(results.next().toString());
>>                                    }
>>                                }
>>                                finally
>>                                {
>>                                    qe.close();
>>                                }
>>                            }
>>                        }
>>                        catch (Exception ex)
>>                        {
>>                            System.out.println(ex.getMessage());
>>                        }
>>                        return _return_list;
>>                    }
>>
>>                    On Thu, Jan 12, 2012 at 8:32 AM, srecko joksimovic
>>                    <[email protected]> wrote:
>>
>>                    Hi Rupert,
>>
>>                    Thank you for the answer. I've probably missed that.
>>
>>                    Best,
>>                    Srecko
>>
>>
>>                    On Thu, Jan 12, 2012 at 6:12 AM, Rupert Westenthaler
>>                    <[email protected]> wrote:
>>
>>                    Hi Srecko
>>
>>                    I last used this API directly about 3-4 years ago,
>>                    but after a look at the HttpClient tutorial [1], I
>>                    think the reason for your problem is that you do not
>>                    execute the GetMethod.
>>
>>                    Based on this tutorial, the code should look like this:
>>
>>                       // Create an instance of HttpClient.
>>                       HttpClient client = new HttpClient();
>>                       GetMethod get = new GetMethod(url);
>>                       try {
>>                           // Execute the method.
>>                           int statusCode = client.executeMethod(get);
>>                           if (statusCode != HttpStatus.SC_OK) {
>>                               //handle the error
>>                           }
>>                           InputStream t_is = get.getResponseBodyAsStream();
>>                           //read the data of the stream
>>                       } catch (IOException e) {
>>                           // HttpException (protocol errors) is a subclass of IOException
>>                       } finally {
>>                           // Release the connection.
>>                           get.releaseConnection();
>>                       }
>>
>>                    In addition, you should not use a Reader if you want
>>                    to read byte-oriented data from the input stream.
>>
>>                    hope this helps
>>                    best
>>                    Rupert
>>
>>                    [1] http://hc.apache.org/httpclient-3.x/tutorial.html
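Rupert's two points (actually execute the request, and keep byte-oriented data as bytes) can also be applied without HttpClient. The sketch below is a JDK-only alternative, not the HttpClient 3.x code from the tutorial. With `HttpURLConnection`, the request is sent when the status code or body is first read, and the body is copied straight from the `InputStream` into a byte array with no `Reader` in between. The names `UrlBytes`, `getBytesFromUrl` and `toBytes` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: downloads a URL as raw bytes, with no character decoding.
public final class UrlBytes {

    private UrlBytes() {
    }

    /** Copies a stream into a byte array without decoding it as characters. */
    public static byte[] toBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int count;
        while ((count = in.read(buffer)) != -1) {
            out.write(buffer, 0, count);
        }
        return out.toByteArray();
    }

    /** Fetches a URL with GET and returns the raw response body. */
    public static byte[] getBytesFromUrl(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            int status = con.getResponseCode(); // this call actually sends the request
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("GET " + url + " returned HTTP " + status);
            }
            try (InputStream in = con.getInputStream()) {
                return toBytes(in);
            }
        } finally {
            con.disconnect();
        }
    }
}
```

The result can be passed directly to `IOUtils.write(byte[], out)`. Going through an `InputStreamReader` instead decodes the bytes with a charset, and any byte sequences that are invalid in that charset are replaced (with U+FFFD for UTF-8) and cannot be recovered.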
>>
>>
>>
>>                    On 11.01.2012, at 22:34, Srecko Joksimovic wrote:
>>
>>                     That's it. Thank you!
>>
>>                        I already configured the KeywordLinkingEngine
>>                        when I used my own ontology, so I'm familiar
>>                        with it and will try that option too.
>>
>>                        In the meantime I found another interesting
>>                        problem. I tried to annotate a document and a
>>                        web page. For the web page I used
>>                        IOUtils.write(byte[], out), so I had to convert
>>                        the content of the URL to a byte[]:
>>
>>                        public static byte[] GetBytesFromURL(String _url) throws IOException
>>                        {
>>                            GetMethod get = new GetMethod(_url);
>>                            InputStream t_is = get.getResponseBodyAsStream();
>>                            byte[] buffer = new byte[1024];
>>                            int count = -1;
>>                            Reader t_url_reader = new BufferedReader(new InputStreamReader(t_is));
>>                            byte[] t_bytes = IOUtils.toByteArray(t_url_reader, "UTF-8");
>>
>>                            return t_bytes;
>>                        }
>>
>>                        But the problem is that I'm getting null for the
>>                        InputStream.
>>
>>                        Any ideas?
>>
>>                        Best,
>>                        Srecko
>>
>>
>>
>>                        -----Original Message-----
>>                        From: Rupert Westenthaler [mailto:[email protected]]
>>                        Sent: Wednesday, January 11, 2012 22:08
>>                        To: Srecko Joksimovic
>>                        Cc: [email protected]
>>
>>                        Subject: Re: Annotating using DBPedia ontology
>>
>>
>>                        On 11.01.2012, at 21:41, Srecko Joksimovic wrote:
>>
>>                            Hi Rupert,
>>
>>                            When I load localhost:8080/engines it says
>>                            this:
>>
>>                            There are currently 5 active engines.
>>                            org.apache.stanbol.enhancer.engines.metaxa.MetaxaEngine
>>                            org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine
>>                            org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine
>>                            org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine
>>                            org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine
>>
>>                            Maybe this could tell you something?
>>
>>                        These are exactly the 5 engines that are
>>                        expected to run with the default configuration.
>>                        Based on this, the Stanbol Enhancer should work
>>                        fine.
>>
>>                        After looking at the text you enhanced, however,
>>                        I noticed that it does not mention any named
>>                        entities such as persons, organizations, or
>>                        places. So I checked it with my local Stanbol
>>                        version, which also did not detect any entities.
>>
>>                        So, to check whether Stanbol works as expected,
>>                        you should try another text that mentions some
>>                        named entities, such as
>>
>>                           "John Smith works for the Apple Inc. in
>>                           Cupertino, California."
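A minimal sketch of sending Rupert's test sentence to the enhancer as plain text, along the lines of Srecko's `Annotate` method but using only JDK classes. The endpoint `http://localhost:8080/engines` is an assumption based on the engines page mentioned above (substitute your own `SERVICE_URL`), and the class and method names are hypothetical. `openEnhancementRequest` only configures the connection; the request goes out in `enhance` when the body is written and the response code is read.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical client for posting text to a Stanbol enhancer endpoint.
public final class EnhanceSentence {

    // Assumption: default local Stanbol endpoint; adjust to your installation.
    static final String SERVICE_URL = "http://localhost:8080/engines";

    static final String TEST_SENTENCE =
            "John Smith works for the Apple Inc. in Cupertino, California.";

    private EnhanceSentence() {
    }

    /** Configures, but does not send, a POST asking for RDF/XML enhancements. */
    public static HttpURLConnection openEnhancementRequest(String serviceUrl, String mimeType)
            throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(serviceUrl).toURL().openConnection();
        con.setDoOutput(true);
        con.setRequestMethod("POST");
        con.setRequestProperty("Accept", "application/rdf+xml");
        con.setRequestProperty("Content-Type", mimeType);
        return con;
    }

    /** Sends the text as text/plain and returns the response body. */
    public static String enhance(String serviceUrl, String text) throws IOException {
        HttpURLConnection con = openEnhancementRequest(serviceUrl, "text/plain");
        try {
            try (OutputStream out = con.getOutputStream()) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            int status = con.getResponseCode();
            if (status > 299) {
                throw new IOException("Enhancer returned HTTP " + status);
            }
            try (InputStream in = con.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                // Assumption: the RDF/XML response is UTF-8 encoded.
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(enhance(SERVICE_URL, TEST_SENTENCE));
    }
}
```

If the enhancer works as expected, the returned RDF should contain annotations for the person, organization and places in the sentence. The empty result Srecko saw came from a text without such entities.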
>>
>>
>>                        If you also want to search for entities like
>>                        "Bank", "Blog", "Consumer", "Telephone", etc.,
>>                        you need to configure a KeywordLinkingEngine for
>>                        DBpedia as well. Part B of [3] provides more
>>                        information on how to do that.
>>
>>                        But let me mention that the KeywordLinkingEngine
>>                        is more useful in combination with your own
>>                        domain-specific thesaurus than with a global data
>>                        set like DBpedia. When used with DBpedia you will
>>                        also get a lot of false positives.
>>
>>                        best
>>                        Rupert
>>
>>                        [3] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
