Thank you very much! Best, Srecko
On Fri, Jan 13, 2012 at 2:41 PM, Walter Kasper <[email protected]> wrote:
> Hi,
>
> Here are the recognized standard MIME types:
>
> pdf: application/pdf
> txt: text/plain
> ppt: application/vnd.ms-powerpoint
> xls: application/vnd.ms-excel
> odt: application/vnd.oasis.opendocument.text
>
> Regards,
>
> Walter
>
> srecko joksimovic wrote:
>
>> Hi,
>>
>> Thank you! I will check out the latest version.
>> I'm using application/msword, because I thought that is the right one.
>> Could you please send me the correct MIME types for the pdf, txt, ppt,
>> xls and odt formats?
>>
>> Best,
>> Srecko
>>
>> On Fri, Jan 13, 2012 at 1:34 PM, Walter Kasper <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Hi,
>>
>> We fixed the problem with the unresolved relative URL from HTML
>> documents. In the case of your Wikipedia page it came from an
>> embedded rel-license microformat. If you are interested only in
>> text extraction, you can also just disable the RDFa and Microformat
>> extractors in the configuration for the HTML extraction.
>>
>> We also tested Word documents with your test sentence. Everything
>> worked fine for us. Did you use the correct MIME type? The correct
>> ones for Word documents are:
>>
>> doc-Format (<= Word-2003): application/vnd.ms-word
>> docx-Format (Word-2007):
>> application/vnd.openxmlformats-officedocument.wordprocessingml
>>
>> Best regards,
>>
>> Walter
>>
>> srecko joksimovic wrote:
>>
>> Hi Walter,
>>
>> The Word document is nothing special, just one line of text:
>>
>> "John Smith works for the Apple Inc. in Cupertino, California."
>>
>> Rupert suggested this sentence in order to test text annotation.
>> As I know the result of annotating this string, I thought to create
>> a Word document with the same content for test purposes.
>>
>> The error with your HTML page apparently arises from a bug in
>> resolving relative URLs in one of the HTML extractors. We will fix that.
>>
>> Does it mean that I can't annotate HTML pages at the moment, or does
>> that depend on the page?
>>
>> Best,
>> Srecko
>>
>> On Fri, Jan 13, 2012 at 9:51 AM, Walter Kasper <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Hi Srecko,
>>
>> I don't know what the problem with your Word document could have been.
>> Could you send it to me for testing?
>>
>> The error with your HTML page apparently arises from a bug in resolving
>> relative URLs in one of the HTML extractors. We will fix that.
>>
>> Best regards,
>>
>> Walter
>>
>> Srecko Joksimovic wrote:
>>
>> Thank you Rupert!
>>
>> It is probably something that I missed.
>>
>> Best,
>> Srecko
>>
>> -----Original Message-----
>> From: Rupert Westenthaler [mailto:[email protected]]
>> Sent: Thursday, January 12, 2012 20:16
>> To: Srecko Joksimovic; [email protected] <mailto:[email protected]>
>> Cc: stanbol-dev@incubator.apache.org
>> Subject: Re: Annotating using DBPedia ontology
>>
>> Hi Srecko
>>
>> It seems that both cases are related to the Metaxa Engine. My knowledge
>> about the libs used by this engine to extract the textual content is
>> very limited, so I might not be the right person to look into that.
>>
>> In the first example I think Metaxa was not able to extract the text
>> from the Word document, because the only plainTextContent triple noted is
>>
>> <j.0:plainTextContent>Microsoft Word-Dokument
>> srecko</j.0:plainTextContent>
>>
>> The second example looks like an issue within the RDF metadata
>> generation in Aperture.
>>
>> I also sent this reply directly to Walter Kasper. He is the one who
>> contributed this engine and should be able to provide more information.
>>
>> best
>> Rupert
>>
>> On 12.01.2012, at 18:40, srecko joksimovic wrote:
>>
>> Hi Rupert,
>>
>> I have another question, and I will finish soon.
>>
>> I tried to annotate a pdf document, and I didn't get the result I
>> expected. Then I put the string you sent to me
>>
>> "John Smith works for the Apple Inc. in Cupertino, California."
>>
>> in an MS Word document, and this is the result I got:
>>
>> <rdf:RDF
>>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>>     xmlns:j.0="http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"
>>     xmlns:j.1="http://purl.org/dc/terms/"
>>     xmlns:j.2="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#"
>>     xmlns:j.3="http://fise.iks-project.eu/ontology/">
>>   <rdf:Description
>>       rdf:about="urn:enhancement-55016818-eb97-7b98-521a-422e3742173b">
>>     <rdf:type
>>         rdf:resource="http://fise.iks-project.eu/ontology/TextAnnotation"/>
>>     <j.1:creator
>>         rdf:datatype="http://www.w3.org/2001/XMLSchema#string">org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine</j.1:creator>
>>     <j.1:created
>>         rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-01-12T17:34:20.288Z</j.1:created>
>>     <j.3:extracted-from
>>         rdf:resource="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f"/>
>>     <rdf:type
>>         rdf:resource="http://fise.iks-project.eu/ontology/Enhancement"/>
>>     <j.1:language>fr</j.1:language>
>>   </rdf:Description>
>>   <rdf:Description
>>       rdf:about="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f">
>>     <rdf:type
>>         rdf:resource="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#PaginatedTextDocument"/>
>>     <j.0:plainTextContent>Microsoft Word-Dokument
>> srecko</j.0:plainTextContent>
>>   </rdf:Description>
>>   <rdf:Description
>>       rdf:about="urn:enhancement-0644a1ed-f1d8-334d-d4e9-690a0446cba8">
>>     <j.3:confidence
>>         rdf:datatype="http://www.w3.org/2001/XMLSchema#double">1.0</j.3:confidence>
>>     <rdf:type
>>         rdf:resource="http://fise.iks-project.eu/ontology/TextAnnotation"/>
>>     <j.1:creator
>>         rdf:datatype="http://www.w3.org/2001/XMLSchema#string">org.apache.stanbol.enhancer.engines.metaxa.MetaxaEngine</j.1:creator>
>>     <j.1:created
>>         rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-01-12T17:34:20.273Z</j.1:created>
>>     <j.3:extracted-from
>>         rdf:resource="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f"/>
>>     <rdf:type
>>         rdf:resource="http://fise.iks-project.eu/ontology/Enhancement"/>
>>   </rdf:Description>
>> </rdf:RDF>
>>
>> and this is the code:
>>
>> public List<String> Annotate(byte[] _stream_to_annotate,
>>         ServiceUtils.MIMETypes _content_type, String _encoding)
>> {
>>     List<String> _return_list = new ArrayList<String>();
>>     try
>>     {
>>         URL url = new URL(ServiceUtils.SERVICE_URL);
>>         HttpURLConnection con = (HttpURLConnection)url.openConnection();
>>         con.setDoOutput(true);
>>         con.setRequestMethod("POST");
>>         con.setRequestProperty("Accept", "application/rdf+xml");
>>         con.setRequestProperty("Content-type", _content_type.getValue());
>>         java.io.OutputStream out = con.getOutputStream();
>>         IOUtils.write(_stream_to_annotate, out);
>>         IOUtils.closeQuietly(out);
>>         con.connect(); //send the request
>>         if(con.getResponseCode() > 299)
>>         {
>>             java.io.InputStream errorStream = con.getErrorStream();
>>             if(errorStream != null)
>>             {
>>                 String errorMessage = IOUtils.toString(errorStream);
>>                 IOUtils.closeQuietly(errorStream);
>>             }
>>             else
>>             {
>>                 //no error data
>>                 //write default error message with the status code
>>             }
>>         }
>>         else
>>         {
>>             Model model = ModelFactory.createDefaultModel();
>>             java.io.InputStream enhancementResults = con.getInputStream();
>>             model.read(enhancementResults, null);
>>             String queryStringForGraph =
>>                 "PREFIX t: <http://fise.iks-project.eu/ontology/> " +
>>                 "SELECT ?label WHERE {?alias t:entity-reference ?label}";
>>             Query query = QueryFactory.create(queryStringForGraph);
>>             QueryExecution qe = QueryExecutionFactory.create(query, model);
>>             ResultSet results = qe.execSelect();
>>             while(results.hasNext())
>>             {
>>                 _return_list.add(results.next().toString());
>>             }
>>         }
>>     }
>>     catch(Exception ex)
>>     {
>>         System.out.println(ex.getMessage());
>>     }
>>     return _return_list;
>> }
>>
>> On Thu, Jan 12, 2012 at 8:32 AM, srecko joksimovic
>> <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Hi Rupert,
>>
>> Thank you for the answer. I've probably missed that.
>>
>> Best,
>> Srecko
>>
>> On Thu, Jan 12, 2012 at 6:12 AM, Rupert Westenthaler
>> <[email protected] <mailto:[email protected]>> wrote:
>>
>> Hi Srecko
>>
>> I think the last time I directly used this API was about 3-4 years ago,
>> but after a look at the HttpClient tutorial [1] I think the reason for
>> your problem is that you do not execute the GetMethod.
>>
>> Based on this tutorial the code should look like
>>
>> // Create an instance of HttpClient.
>> HttpClient client = new HttpClient();
>> GetMethod get = new GetMethod(url);
>> try {
>>     // Execute the method.
>>     int statusCode = client.executeMethod(get);
>>     if (statusCode != HttpStatus.SC_OK) {
>>         //handle the error
>>     }
>>     InputStream t_is = get.getResponseBodyAsStream();
>>     //read the data of the stream
>> } finally {
>>     // Release the connection.
>>     get.releaseConnection();
>> }
>>
>> In addition you should not use a Reader if you want to read
>> byte-oriented data from the input stream.
>>
>> hope this helps
>> best
>> Rupert
>>
>> [1] http://hc.apache.org/httpclient-3.x/tutorial.html
>>
>> On 11.01.2012, at 22:34, Srecko Joksimovic wrote:
>>
>> That's it. Thank you!
>>
>> I have already configured a KeywordLinkingEngine when I used my own
>> ontology. I think I'm familiar with that and I will try that option too.
>>
>> In the meantime I found another interesting problem. I tried to
>> annotate a document and a web page.
>> With the web page, I tried IOUtils.write(byte[], out) and I had to
>> convert the URL to byte[]:
>>
>> public static byte[] GetBytesFromURL(String _url) throws IOException
>> {
>>     GetMethod get = new GetMethod(_url);
>>     InputStream t_is = get.getResponseBodyAsStream();
>>     byte[] buffer = new byte[1024];
>>     int count = -1;
>>     Reader t_url_reader = new BufferedReader(new InputStreamReader(t_is));
>>     byte[] t_bytes = IOUtils.toByteArray(t_url_reader, "UTF-8");
>>     return t_bytes;
>> }
>>
>> But the problem is that I'm getting null for the InputStream.
>>
>> Any ideas?
>>
>> Best,
>> Srecko
>>
>> -----Original Message-----
>> From: Rupert Westenthaler [mailto:[email protected]]
>> Sent: Wednesday, January 11, 2012 22:08
>> To: Srecko Joksimovic
>> Cc: stanbol-dev@incubator.apache.org
>> Subject: Re: Annotating using DBPedia ontology
>>
>> On 11.01.2012, at 21:41, Srecko Joksimovic wrote:
>>
>> Hi Rupert,
>>
>> When I load localhost:8080/engines it says this:
>>
>> There are currently 5 active engines.
>> org.apache.stanbol.enhancer.engines.metaxa.MetaxaEngine
>> org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine
>> org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine
>> org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine
>> org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine
>>
>> Maybe this could tell you something?
>>
>> These are exactly the 5 engines that are expected to run with the
>> default configuration. Based on this the Stanbol Enhancer should just
>> work fine.
>>
>> After looking at the text you enhanced I noticed however that it does
>> not mention any named entities such as Persons, Organizations and
>> Places. So I checked it with my local Stanbol version and there were
>> also no detected entities.
>>
>> So to check if Stanbol works as expected you should try to use another
>> text that mentions some Named Entities, such as
>>
>> "John Smith works for the Apple Inc. in Cupertino, California."
>>
>> If you want to search also for entities like "Bank", "Blog",
>> "Consumer", "Telephone", you need to also configure a
>> KeywordLinkingEngine for dbpedia. Part B of [3] provides more
>> information on how to do that.
>>
>> But let me mention that the KeywordLinkingEngine is more useful if used
>> in combination with an own domain-specific thesaurus rather than a
>> global data set like dbpedia. When used with dbpedia you will also get
>> a lot of false positives.
>>
>> best
>> Rupert
>>
>> [3] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
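
Since the Word problem in this thread came down to the Content-type header, here is a minimal sketch of a lookup from file extension to the MIME types Walter listed. The class name MetaxaMimeTypes and the method forFileName are hypothetical and not part of Stanbol or Metaxa. The table holds only the values quoted in the thread, and the sketch assumes Java 9 or later for Map.of.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper (not part of Stanbol): picks a Content-type value
 * for a file name, using only the MIME types quoted in this thread.
 */
public class MetaxaMimeTypes {

    // Extension -> MIME type, copied from Walter's two replies.
    private static final Map<String, String> BY_EXTENSION = Map.of(
        "pdf", "application/pdf",
        "txt", "text/plain",
        "ppt", "application/vnd.ms-powerpoint",
        "xls", "application/vnd.ms-excel",
        "odt", "application/vnd.oasis.opendocument.text",
        "doc", "application/vnd.ms-word",
        "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml");

    /** Returns the MIME type for the file name's extension, or null if unknown. */
    public static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no extension to look up
        }
        // Extensions are matched case-insensitively.
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(extension);
    }

    public static void main(String[] args) {
        System.out.println(forFileName("test.doc"));
        System.out.println(forFileName("slides.PPT"));
    }
}
```

The returned string could be passed to con.setRequestProperty("Content-type", ...) in the Annotate method quoted above. A null result means the extension is not one of those discussed here, and the caller has to decide what to send instead.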
