Thank you Rupert! It is probably something that I missed.
Best,
Srecko

-----Original Message-----
From: Rupert Westenthaler [mailto:[email protected]]
Sent: Thursday, January 12, 2012 20:16
To: Srecko Joksimovic; [email protected]
Cc: [email protected]
Subject: Re: Annotating using DBPedia ontology

Hi Srecko

It seems that both cases are related to the Metaxa Engine. My knowledge about the libs used by this engine to extract the textual content is very limited, so I might not be the right person to look into that.

In the first example I think Metaxa was not able to extract the text from the Word document, because the only plainTextContent triple noted is

<j.0:plainTextContent>Microsoft Word-Dokument
 srecko</j.0:plainTextContent>

The second example looks like an issue within the RDF metadata generation in Aperture.

I also sent this reply directly to Walter Kasper. He is the one who contributed this engine and should be able to provide more information.

best
Rupert

On 12.01.2012, at 18:40, srecko joksimovic wrote:

> Hi Rupert,
>
> I have another question, and I will finish soon.
>
> I tried to annotate a PDF document, and I didn't get the result I expected. Then I put the string you sent to me
> "John Smith works for the Apple Inc. in Cupertino, California."
> in an MS Word document, and this is the result I got:
>
> <rdf:RDF
>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>     xmlns:j.0="http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"
>     xmlns:j.1="http://purl.org/dc/terms/"
>     xmlns:j.2="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#"
>     xmlns:j.3="http://fise.iks-project.eu/ontology/" >
>   <rdf:Description rdf:about="urn:enhancement-55016818-eb97-7b98-521a-422e3742173b">
>     <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/TextAnnotation"/>
>     <j.1:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine</j.1:creator>
>     <j.1:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-01-12T17:34:20.288Z</j.1:created>
>     <j.3:extracted-from rdf:resource="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f"/>
>     <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/Enhancement"/>
>     <j.1:language>fr</j.1:language>
>   </rdf:Description>
>   <rdf:Description rdf:about="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f">
>     <rdf:type rdf:resource="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#PaginatedTextDocument"/>
>     <j.0:plainTextContent>Microsoft Word-Dokument
> srecko</j.0:plainTextContent>
>   </rdf:Description>
>   <rdf:Description rdf:about="urn:enhancement-0644a1ed-f1d8-334d-d4e9-690a0446cba8">
>     <j.3:confidence rdf:datatype="http://www.w3.org/2001/XMLSchema#double">1.0</j.3:confidence>
>     <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/TextAnnotation"/>
>     <j.1:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">org.apache.stanbol.enhancer.engines.metaxa.MetaxaEngine</j.1:creator>
>     <j.1:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2012-01-12T17:34:20.273Z</j.1:created>
>     <j.3:extracted-from rdf:resource="urn:content-item-sha1-835c8a5397d9b376a268b7bb5d3c8b4ab7e8b81f"/>
>     <rdf:type rdf:resource="http://fise.iks-project.eu/ontology/Enhancement"/>
>   </rdf:Description>
> </rdf:RDF>
>
>
> and this is the code:
>
> public List<String> Annotate(byte[] _stream_to_annotate, ServiceUtils.MIMETypes _content_type, String _encoding)
> {
>     List<String> _return_list = new ArrayList<String>();
>     try
>     {
>         URL url = new URL(ServiceUtils.SERVICE_URL);
>         HttpURLConnection con = (HttpURLConnection)url.openConnection();
>         con.setDoOutput(true);
>         con.setRequestMethod("POST");
>         con.setRequestProperty("Accept", "application/rdf+xml");
>         con.setRequestProperty("Content-type", _content_type.getValue());
>
>         java.io.OutputStream out = con.getOutputStream();
>
>         IOUtils.write(_stream_to_annotate, out);
>         IOUtils.closeQuietly(out);
>
>         con.connect(); //send the request
>
>         if(con.getResponseCode() > 299)
>         {
>             java.io.InputStream errorStream = con.getErrorStream();
>             if(errorStream != null)
>             {
>                 String errorMessage = IOUtils.toString(errorStream);
>                 IOUtils.closeQuietly(errorStream);
>             }
>             else
>             {
>                 //no error data
>                 //write default error message with the status code
>             }
>         }
>         else
>         {
>             Model model = ModelFactory.createDefaultModel();
>             java.io.InputStream enhancementResults = con.getInputStream();
>             model.read(enhancementResults, null);
>             String queryStringForGraph = "PREFIX t: <http://fise.iks-project.eu/ontology/> " +
>                 "SELECT ?label WHERE {?alias t:entity-reference ?label}";
>             Query query = QueryFactory.create(queryStringForGraph);
>             QueryExecution qe = QueryExecutionFactory.create(query, model);
>
>             ResultSet results = qe.execSelect();
>             while(results.hasNext())
>             {
>                 _return_list.add(results.next().toString());
>             }
>         }
>     }
>     catch(Exception ex)
>     {
>         System.out.println(ex.getMessage());
>     }
>     return _return_list;
> }
>
> On Thu, Jan 12, 2012 at 8:32 AM, srecko joksimovic <[email protected]> wrote:
>
> Hi Rupert,
>
> Thank you for the answer. I've probably missed that.
>
> Best,
> Srecko
>
>
> On Thu, Jan 12, 2012 at 6:12 AM, Rupert Westenthaler <[email protected]> wrote:
> Hi Srecko
>
> I think the last time I directly used this API was about 3-4 years ago, but after a look at the http client tutorial [1] I think the reason for your problem is that you do not execute the GetMethod.
>
> Based on this tutorial the code should look like
>
> // Create an instance of HttpClient.
> HttpClient client = new HttpClient();
> GetMethod get = new GetMethod(url);
> try {
>     // Execute the method.
>     int statusCode = client.executeMethod(get);
>     if (statusCode != HttpStatus.SC_OK) {
>         //handle the error
>     }
>     InputStream t_is = get.getResponseBodyAsStream();
>     //read the data of the stream
> } finally {
>     // Release the connection.
>     get.releaseConnection();
> }
>
> In addition you should not use a Reader if you want to read byte oriented data from the input stream.
>
> hope this helps
> best
> Rupert
>
> [1] http://hc.apache.org/httpclient-3.x/tutorial.html
>
> On 11.01.2012, at 22:34, Srecko Joksimovic wrote:
>
> > That's it. Thank you!
> > I have already configured KeywordLinkingEngine when I used my own ontology.
> > I think I'm familiar with that and I will try that option too.
> >
> > In the meantime I found another interesting problem. I tried to annotate a document and a web page.
> > With a web page, I tried IOUtils.write(byte[], out) and I had to convert the URL to byte[]:
> >
> > public static byte[] GetBytesFromURL(String _url) throws IOException
> > {
> >     GetMethod get = new GetMethod(_url);
> >     InputStream t_is = get.getResponseBodyAsStream();
> >     byte[] buffer = new byte[1024];
> >     int count = -1;
> >     Reader t_url_reader = new BufferedReader(new InputStreamReader(t_is));
> >     byte[] t_bytes = IOUtils.toByteArray(t_url_reader, "UTF-8");
> >
> >     return t_bytes;
> > }
> >
> > But, the problem is that I'm getting null for InputStream.
> >
> > Any ideas?
> >
> > Best,
> > Srecko
> >
> >
> >
> > -----Original Message-----
> > From: Rupert Westenthaler [mailto:[email protected]]
> > Sent: Wednesday, January 11, 2012 22:08
> > To: Srecko Joksimovic
> > Cc: [email protected]
> > Subject: Re: Annotating using DBPedia ontology
> >
> >
> > On 11.01.2012, at 21:41, Srecko Joksimovic wrote:
> >> Hi Rupert,
> >>
> >> When I load localhost:8080/engines it says this:
> >>
> >> There are currently 5 active engines.
> >> org.apache.stanbol.enhancer.engines.metaxa.MetaxaEngine
> >> org.apache.stanbol.enhancer.engines.langid.LangIdEnhancementEngine
> >> org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine
> >> org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine
> >> org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine
> >>
> >> Maybe this could tell you something?
> >>
> >
> > These are exactly the 5 engines that are expected to run with the default configuration.
> > Based on this the Stanbol Enhancer should just work fine.
> >
> > After looking at the text you enhanced I noticed however that it does not mention
> > any named entities such as Persons, Organizations and Places. So I checked it with
> > my local Stanbol version and there were also no detected entities.
> >
> > So to check if Stanbol works as expected you should try to use another text that
> > mentions some Named Entities such as
> >
> > "John Smith works for the Apple Inc. in Cupertino, California."
> >
> >
> > If you also want to search for entities like "Bank", "Blog", "Consumer", "Telephone",
> > you need to also configure a KeywordLinkingEngine for dbpedia. Part B of [3] provides
> > more information on how to do that.
> >
> > But let me mention that the KeywordLinkingEngine is more useful if used in combination
> > with your own domain-specific thesaurus rather than a global data set like dbpedia. When
> > used with dbpedia you will also get a lot of false positives.
> >
> > best
> > Rupert
> >
> > [3] http://incubator.apache.org/stanbol/docs/trunk/customvocabulary.html
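
A note on the advice earlier in this thread not to use a Reader for byte-oriented data: the following is only a minimal sketch, using nothing but the JDK, of why that matters. It copies an InputStream into a byte[] directly and, for comparison, through a Reader. The class and method names (ByteStreamSketch, readAllBytes, viaReader) are made up for illustration and are not part of Stanbol, Commons IO or HttpClient. The sample bytes are also just an assumption standing in for binary content such as a PDF or Word file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class ByteStreamSketch {

    // Copies the stream byte for byte; no character decoding is involved.
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int count;
        while ((count = in.read(buffer)) != -1) {
            out.write(buffer, 0, count);
        }
        return out.toByteArray();
    }

    // Decodes the bytes as UTF-8 text and encodes them again, which is what
    // going through a Reader amounts to. Bytes that are not valid UTF-8 are
    // replaced during decoding, so binary content does not survive.
    static byte[] viaReader(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            text.append(buffer, 0, count);
        }
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data: 0xFF and 0xFE never occur in valid UTF-8.
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x41};
        System.out.println(Arrays.equals(binary, readAllBytes(new ByteArrayInputStream(binary))));
        System.out.println(Arrays.equals(binary, viaReader(new ByteArrayInputStream(binary))));
    }
}
```

With this sample input the direct copy should compare equal to the original and the Reader round trip should not, which is the reason to hand the raw stream (or a byte[] read from it) to the enhancer instead of decoded text.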
