Dear Rupert, thanks ... things are getting clearer and clearer. Best regards, A
2012/6/21 Rupert Westenthaler <[email protected]>

> Hi Andrea
>
> The CELI Lemmatizer engine (see STANBOL-583) does exactly that. It
> creates TextAnnotations for each word and adds the POS and Lemma (if
> you enable the "Full Morphological Analysis" in its configuration).
>
> Here is an example for "Tagen" from the German sentence "An Tagen wie
> diesen würde man lieber baden gehen!"
>
> <urn:enhancement-3bf15662-f87a-dcba-e4cb-92024b167d30>
>     a <http://fise.iks-project.eu/ontology/TextAnnotation> ,
>       <http://fise.iks-project.eu/ontology/Enhancement> ;
>     <http://fise.iks-project.eu/ontology/selected-text>
>         "Tagen"@de ;
>     <http://fise.iks-project.eu/ontology/selection-context>
>         "An Tagen wie diesen würde man lieber baden gehen!"@de ;
>     <http://fise.iks-project.eu/ontology/start>
>         "3"^^<http://www.w3.org/2001/XMLSchema#int> ;
>     <http://fise.iks-project.eu/ontology/end>
>         "8"^^<http://www.w3.org/2001/XMLSchema#int> ;
>     <http://fise.iks-project.eu/ontology/hasLemmaForm>
>         "tagen"@de , "Tag"@de ;
>     <http://fise.iks-project.eu/ontology/hasMorphologicalFeature>
>         "MOOD=SUB" , "MOOD=INF" , "POS=N" , "PERSON=P3" ,
>         "CASE=DAT" , "POS=V"^^ , "TENSE=PRS" , "GENDER=MAS" , "NUMBER=PLU" ;
>
>
> This engine uses the properties
>
> * fise:hasLemmaForm
> * fise:hasMorphologicalFeature: values are {key}={value}
>
> to encode the results of the morphological analysis. However, note that
> these two properties are NOT specified in the Stanbol Enhancement
> Structure.
>
> Doing the same with the POSTagger of OpenNLP would be quite easy,
> especially if you use the
> "org.apache.stanbol.commons.opennlp.TextAnalyzer" as the
> KeywordLinkingEngine does.
>
> @Reference
> OpenNLP openNLP; //injected -> loads models from config
>
> //get the plain text from the ContentItem
> Entry<UriRef,Blob> contentPart = ContentItemHelper.getBlob(ci,
>     Collections.singleton("text/plain"));
> String text = ContentItemHelper.getText(contentPart.getValue());
> //get the language of the text
> String lang = EnhancementEngineHelper.getLanguage(ci);
>
> //Analyze the text
> //config for the TextAnalyzer ... you may expose some of them
> //in the Engine config
> TextAnalyzerConfig config = new TextAnalyzerConfig(); //uses defaults
>
> //create the TextAnalyzer for the detected language
> TextAnalyzer analyzer = new TextAnalyzer(openNLP, lang, config);
> //process the text sentence by sentence
> Iterator<AnalysedText> analysedSentences = analyzer.analyse(text);
> while(analysedSentences.hasNext()){
>     AnalysedText analysed = analysedSentences.next();
>     //NOTE: depending on the config and the available models
>     //      Tokens and/or Chunks might not be present
>     for(Token token : analysed.getTokens()){
>         String posTag = token.getPosTag();
>         double posProb = token.getPosProbability();
>     }
>     for(Chunk chunk : analysed.getChunks()){
>         //similar things for chunks
>     }
> }
>
> While iterating over the sentences, tokens and chunks you could create
> TextAnnotations similar to those created by the CELI engine.
>
> However, note that - as Olivier mentioned - this creates a lot of RDF
> triples, so it will not scale to very long texts. Assume 20
> triples/word. Texts with a few thousand words should still be fine,
> but if you analyze longer texts you will run into performance and
> memory issues.
>
> best
> Rupert
>
> On Thu, Jun 21, 2012 at 2:52 PM, Olivier Grisel
> <[email protected]> wrote:
> > 2012/6/21 Andrea Taurchini <[email protected]>:
> >> Dear Olivier,
> >> thanks for your reply.
> >> Ok, so it is possible, but I have to implement it as a new Engine on my own.
> >> As for the "Tagging Server", it is a new RESTful interface to OpenNLP that
> >> exposes its algorithms over HTTP.
> >
> > Alright, then you can indeed write a set of new low-level, pure NLP
> > engines and delegate the semantic interpretation of such
> > annotations to the caller.
> >
> > The Stanbol RDF-based output format might be a little bit verbose for
> > this kind of low-level annotation, though.
> >
> > --
> > Olivier
> > http://twitter.com/ogrisel - http://github.com/ogrisel
> >
>
> --
> | Rupert Westenthaler [email protected]
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
>
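
For anyone consuming these annotations on the client side, here is a minimal Java sketch (plain JDK, not part of Stanbol) of how the {key}={value} literals of fise:hasMorphologicalFeature might be grouped by key, together with the rough "20 triples/word" estimate mentioned in the thread. The class and method names (MorphFeatures, parse, estimateTriples) are hypothetical, the feature strings are copied from the "Tagen" example, and the 3000-word text length is only an illustrative figure.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not a Stanbol class.
final class MorphFeatures {

    /** Groups "{key}={value}" literals by key; malformed entries are skipped. */
    static Map<String, Set<String>> parse(List<String> literals) {
        Map<String, Set<String>> byKey = new LinkedHashMap<>();
        for (String literal : literals) {
            int eq = literal.indexOf('=');
            if (eq < 0) {
                continue; // no '=' at all, so not a key=value pair
            }
            String key = literal.substring(0, eq).trim();
            String value = literal.substring(eq + 1).trim();
            if (key.isEmpty() || value.isEmpty()) {
                continue; // missing key or value
            }
            // a key may carry several values, e.g. POS=N and POS=V
            byKey.computeIfAbsent(key, k -> new TreeSet<>()).add(value);
        }
        return byKey;
    }

    /** Back-of-the-envelope triple count: words * triples per word. */
    static long estimateTriples(long words, int triplesPerWord) {
        return Math.multiplyExact(words, (long) triplesPerWord);
    }

    public static void main(String[] args) {
        // feature values taken from the "Tagen" annotation in the thread
        List<String> tagen = Arrays.asList(
                "MOOD=SUB", "MOOD=INF", "POS=N", "PERSON=P3", "CASE=DAT",
                "POS=V", "TENSE=PRS", "GENDER=MAS", "NUMBER=PLU");
        System.out.println(parse(tagen));
        // assumed text length of 3000 words, 20 triples per word
        System.out.println(estimateTriples(3000, 20));
    }
}
```

Because one surface form can have several readings ("Tagen" is analysed both as a noun and as a verb), the sketch keeps a set of values per key. Note that the flat feature list does not say which MOOD or CASE value belongs to which POS reading, so grouping by key cannot recover that either.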
