Loic,

For #2: give the entry more than one <token></token> element, one per word of the name. For example,

    <entry ref="cheese">
      <token>cheddar</token>
      <token>cheese</token>
    </entry>

would label the token sequence "cheddar cheese" as a "cheese".
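The following plain-Java sketch shows why several <token> elements in one <entry> let a multi-word name match. It does a greedy longest-match lookup over a tokenized sentence. It is not OpenNLP's DictionaryNameFinder code and does not use the OpenNLP API. The class MultiTokenLookup and its methods are hypothetical, and the spans are plain [start, end) index pairs standing in for Span objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not OpenNLP's implementation: a dictionary whose
// entries are token sequences, matched greedily against a tokenized sentence.
class MultiTokenLookup {
    // Each entry is stored as its tokens joined by a single space.
    private final Set<String> entries = new HashSet<>();
    private int maxLength = 0;

    // Adds one entry; several tokens here play the role of several
    // <token> elements inside one <entry>.
    void add(String... tokens) {
        entries.add(String.join(" ", tokens));
        maxLength = Math.max(maxLength, tokens.length);
    }

    // Returns [start, end) token index pairs (end exclusive, like a Span),
    // keeping the longest entry that matches at each start position.
    List<int[]> find(String[] tokens) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int bestEnd = -1;
            int limit = Math.min(tokens.length, start + maxLength);
            for (int end = start + 1; end <= limit; end++) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, start, end));
                if (entries.contains(candidate)) {
                    bestEnd = end; // a longer match found later replaces this one
                }
            }
            if (bestEnd > 0) {
                spans.add(new int[] {start, bestEnd});
                start = bestEnd; // continue after the matched tokens
            } else {
                start++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        MultiTokenLookup dict = new MultiTokenLookup();
        dict.add("cheese");
        dict.add("cheddar", "cheese");
        String[] sentence = {"I", "like", "cheddar", "cheese"};
        for (int[] span : dict.find(sentence)) {
            System.out.println(span[0] + "-" + span[1]); // prints 2-4
        }
    }
}
```

Because the sketch prefers the longest match, "cheddar cheese" comes back as one span and is not also reported as a separate match for "cheese". The thread does not say whether OpenNLP resolves overlapping entries the same way.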
I'm still looking at the bug for #3.

James

On 12/20/2011 8:38 AM, Loic Descotte wrote:
> Hello,
> I'm trying to use the OpenNLP Dictionary and DictionaryNameFinder to do a
> dictionary lookup.
>
> I'm building my dictionary with the DictionarySerializer class.
> My dictionary contains entries with attributes.
>
> Example:
>
> <dictionary case_sensitive="false">
>   <entry ref="cheese">
>     <token>cheddar</token>
>   </entry>
>   <entry ref="vegetable">
>     <token>tomato</token>
>   </entry>
> </dictionary>
>
> The keyword lookup is working, but there are things I don't know how to
> do.
>
> 1.
> When I find a token in a text, I get an array of Span objects:
>
> Span[] spans = finder.find(tokenizedText);
>
> I don't know how to retrieve the attributes of the found token.
> For example, if I find "tomato", I would like to be able to retrieve
> its "ref" attribute (vegetable).
>
> 2.
> If I want my dictionary to find a compound name (e.g. "green
> cabbage"), I am able to find "green" and "cabbage", but not "green
> cabbage". Is there a special way to insert compound names in the
> dictionary?
>
> 3. I've set my dictionary to case_sensitive="false", but if there
> is "Tomato" in my text, then "tomato" will not be found.
>
> Thanks a lot for your help
>
> --
> Loic
>
> ________________________________
> Kelkoo SAS
> Société par Actions Simplifiée
> Share capital: €4,168,964.30
> Registered office: 8, rue du Sentier 75002 Paris
> 425 093 069 RCS Paris
>
> This message and its attachments are confidential and intended solely
> for their addressees. If you are not the intended recipient of this
> message, please destroy it and notify the sender.
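For #1 and #3, here is one possible workaround. It assumes the finder does not return the entry's ref attribute. The idea is to keep a side table from each entry's tokens to its ref, then look up the tokens that each returned span covers. An opennlp.tools.util.Span exposes that token range through getStart() and getEnd(). The RefIndex class below is hypothetical. Lower-casing its keys only illustrates case-insensitive matching; it is not a fix for the OpenNLP bug behind #3.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical workaround sketch, not an OpenNLP API: a side table from each
// entry's tokens to its ref attribute, consulted for every span the finder returns.
class RefIndex {
    private final Map<String, String> refByPhrase = new HashMap<>();
    private final boolean caseSensitive;

    RefIndex(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    // Builds the lookup key; lower-casing here is what makes lookups
    // case-insensitive when caseSensitive is false.
    private String key(String[] tokens) {
        String joined = String.join(" ", tokens);
        return caseSensitive ? joined : joined.toLowerCase(Locale.ROOT);
    }

    // Registers the entry's tokens under its ref value.
    void put(String ref, String... tokens) {
        refByPhrase.put(key(tokens), ref);
    }

    // Returns the ref for the tokens in sentence[start, end), or null if
    // no entry covers exactly that range. With OpenNLP, start and end would
    // come from span.getStart() and span.getEnd().
    String refFor(String[] sentence, int start, int end) {
        return refByPhrase.get(key(Arrays.copyOfRange(sentence, start, end)));
    }

    public static void main(String[] args) {
        RefIndex index = new RefIndex(false);
        index.put("cheese", "cheddar");
        index.put("vegetable", "tomato");
        index.put("vegetable", "green", "cabbage");
        String[] sentence = {"Tomato", "and", "green", "cabbage"};
        System.out.println(index.refFor(sentence, 0, 1)); // prints vegetable
        System.out.println(index.refFor(sentence, 2, 4)); // prints vegetable
        System.out.println(index.refFor(sentence, 1, 2)); // prints null
    }
}
```

Keying the table on the whole token sequence means a compound entry such as "green cabbage" maps to one ref, so it lines up with a single multi-token span.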