Loic,

First, the Span[] that find() returns is an array of Span objects, each holding the start and end markers of one name found in the text. Span is a small class that wraps these two markers to make them easy to use: getStart() returns the index of the first token of the name, and getEnd() returns the index just past its last token. To get the name for one of the Spans, take the token array you passed as input to find() and use something like this:

    String sentence[] = {"The", "match", "was", "won", "by", "Vanessa", "Williams", "."};
    Span names[] = mNameFinder.find(sentence);

    for (int i = 0; i < names.length; i++) {
        Span entry = names[i];
        for (int j = entry.getStart(); j < entry.getEnd(); j++) {
            // use sentence[j] here for each index, to build the full
            // name found or to collect a list of its tokens.
        }
        // here, you should have built something in the loop above for
        // the name just found. each sentence can have multiple names
        // found... so, keep looping.
    }
    // here you are done with this sentence.

To find the tag, you will have to keep a copy of the Dictionary you passed into the DictionaryNameFinder constructor. Then, look up the tokens at the end of the second loop. This will allow you to get the longest match in the dictionary for the Span specified. Unfortunately, I see we don't have anything to get the matching entry from the dictionary.... or I'm overlooking something.

Jorn: Any comments on this?

Hope this helps some,
James

On 12/21/2011 7:54 AM, Loic Descotte wrote:
>
>>>>> On 12/20/2011 8:38 AM, Loic Descotte wrote:
>>>>>> Hello,
>>>>>> I'm trying to use OpenNLP Dictionary and DictionaryNameFinder to do a
>>>>>> dictionary lookup.
>>>>>>
>>>>>> I'm building my dictionary with the DictionarySerializer class.
>>>>>> My dictionary contains entries with attributes.
>>>>>>
>>>>>> Example :
>>>>>>
>>>>>> <dictionary case_sensitive="false">
>>>>>>   <entry ref="cheese">
>>>>>>     <token>cheddar</token>
>>>>>>   </entry>
>>>>>>   <entry ref="vegetable">
>>>>>>     <token>tomato</token>
>>>>>>   </entry>
>>>>>> </dictionary>
>>>>>>
>>>>>>
>>>>>> The keyword lookup is working but there are things I don't know how to
>>>>>> do.
>>>>>>
>>>>>> 1.
>>>>>> When I find a token in a text, I get a list of Span objects :
>>>>>>
>>>>>> Span[] spans = finder.find(tokenizedText);
>>>>>>
>>>>>> I don't know how to retrieve the found token attributes:
>>>>>> For example, if I find "tomato", I would like to be able to retrieve
>>>>>> the "ref" attribute (vegetable).
>>>>>>
>>>>>>
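
For the "ref" lookup asked about here, a minimal self-contained sketch of one possible workaround follows. It assumes you keep your own side table from entry tokens to the "ref" value next to the Dictionary, since the Dictionary does not seem to hand back the matching entry. Everything in it is hypothetical and not part of OpenNLP: the class SpanRefLookup, its methods, and the int pairs that stand in for the Span start/end values so the code runs without the library. Lower-casing the key is only meant to mirror case_sensitive="false".

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SpanRefLookup {

    // Hypothetical side table: lower-cased, space-joined entry tokens -> "ref" value.
    // In practice you would fill this while building the dictionary.
    public static Map<String, String> buildRefTable() {
        Map<String, String> refs = new HashMap<>();
        refs.put("cheddar", "cheese");
        refs.put("tomato", "vegetable");
        return refs;
    }

    // Joins tokens[start .. end-1] with single spaces.
    // "end" is exclusive, matching the j < getEnd() loop in the reply.
    public static String joinTokens(String[] tokens, int start, int end) {
        StringBuilder sb = new StringBuilder();
        for (int j = start; j < end; j++) {
            if (j > start) {
                sb.append(' ');
            }
            sb.append(tokens[j]);
        }
        return sb.toString();
    }

    // Returns the "ref" recorded for the tokens in the span, or null if none.
    public static String refFor(Map<String, String> refs, String[] tokens,
                                int start, int end) {
        String key = joinTokens(tokens, start, end).toLowerCase(Locale.ROOT);
        return refs.get(key);
    }

    public static void main(String[] args) {
        String[] sentence = {"I", "like", "Tomato", "soup", "."};
        // Stand-in for the Span[] from find(): {start, end} pairs (assumption).
        int[][] spans = {{2, 3}};
        Map<String, String> refs = buildRefTable();
        for (int[] s : spans) {
            String name = joinTokens(sentence, s[0], s[1]);
            System.out.println(name + " -> " + refFor(refs, sentence, s[0], s[1]));
        }
    }
}
```

With real Span objects you would pass entry.getStart() and entry.getEnd() in place of the int pairs. The side table has to be kept in sync with the dictionary file by hand, which is the main drawback of this approach.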