Jörn,

We have a problem, but we need some guidance on how we want to address it. The problem is that the hash lookup does not match a multi-token entry unless the dictionary also contains that entry's first token as a single-token entry. In simple English: the test dictionary has "venessa" and "venessa", "williams" added as two entries. When the DictionaryNameFinder first tests "Venessa", we get a match on the first dictionary entry; when it later tests "Venessa", "Williams", we get a hit on the second entry. The problem is that if "venessa" is not in the dictionary, we never get the first match, so the DictionaryNameFinder overlooks the name entirely.
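To make the failure concrete, here is a small standalone model of the lookup described above. It is a hypothetical sketch, not OpenNLP's DictionaryNameFinder source: it assumes the finder grows a candidate one token at a time from each start position and gives up as soon as the current prefix is not itself an entry. The names PrefixScanDemo and findWithPrefixBreak are made up for illustration, and the dictionary is modelled as a set of lower-cased token lists to mimic case_sensitive="false".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the prefix-extending lookup; not OpenNLP code.
class PrefixScanDemo {

    // Lower-cased key for tokens[from, to), mimicking a case-insensitive dictionary.
    static List<String> key(String[] tokens, int from, int to) {
        List<String> k = new ArrayList<>();
        for (int i = from; i < to; i++) {
            k.add(tokens[i].toLowerCase(Locale.ROOT));
        }
        return k;
    }

    // Grows a candidate from each start token and abandons it at the first
    // prefix that is not itself a dictionary entry.
    static List<String> findWithPrefixBreak(Set<List<String>> dict, String[] tokens) {
        List<String> found = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            int bestEnd = -1;
            for (int end = start + 1; end <= tokens.length; end++) {
                if (dict.contains(key(tokens, start, end))) {
                    bestEnd = end;
                } else {
                    break; // "Folic" alone is not an entry, so "Folic acid" is never tried
                }
            }
            if (bestEnd != -1) {
                found.add(String.join(" ", Arrays.copyOfRange(tokens, start, bestEnd)));
                start = bestEnd - 1; // the loop's start++ then skips past the match
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<List<String>> dict = new HashSet<>();
        dict.add(List.of("folic", "acid"));
        dict.add(List.of("baclofen"));
        String[] first = {"Folic", "acid", "is", "one", "variable"};
        String[] third = {"the", "GABA", "b", "agonist", "baclofen"};
        System.out.println(findWithPrefixBreak(dict, first)); // prints []: "Folic acid" is missed
        System.out.println(findWithPrefixBreak(dict, third)); // prints [baclofen]
    }
}
```

Under this model, adding a single-token "venessa" entry is exactly what makes "venessa", "williams" reachable, which matches the behaviour described above.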
So the problem is all in the hash code lookup. The question is how to fix or address the issue without turning this into a 2^N-type problem.

James

On 3/14/2012 10:28 AM, Jörn Kottmann wrote:
> That simply means that nobody has yet tested/debugged with
> your posted dictionary and sample.
>
> Jörn
>
> On 03/14/2012 03:24 PM, Dimitrios wrote:
>> No response in the last 2-3 hours...
>>
>> Should I assume that someone did confirm my issue using the tiny
>> dictionary and the test paragraph I posted before?
>>
>> Or should I assume that James's confirmation that everything works OK
>> still stands?
>>
>> Jim
>>
>> On 14/03/12 11:39, Dimitrios wrote:
>>> Forgot to mention that after find() returns the Spans I'm calling
>>> Span.spansToStrings(spans, tokens) to get a human-readable
>>> array of strings...
>>>
>>> Jim
>>>
>>> On 14/03/12 11:34, Dimitrios wrote:
>>>> On 14/03/12 11:23, Jörn Kottmann wrote:
>>>>> Can you reproduce your issue with a dictionary which only
>>>>> contains a single entry?
>>>>
>>>> Yes, I can indeed reproduce the issue with the following dictionary:
>>>> --------------------------------------------------------------------------------------------
>>>>
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <dictionary case_sensitive="false">
>>>>   <entry>
>>>>     <token>Folic</token>
>>>>     <token>acid</token>
>>>>   </entry>
>>>>   <entry>
>>>>     <token>Baclofen</token>
>>>>   </entry>
>>>> </dictionary>
>>>> --------------------------------------------------------------------------------------------
>>>>
>>>> The small paragraph I'm using for testing is this:
>>>>
>>>> "Folic acid is one variable, but other factors remain.
>>>> Studies suggest that substances active at the GABA receptor may
>>>> produce NTDs.
>>>> To test this hypothesis pregnant rats were exposed to either the
>>>> GABA a agonist muscimol (1, 2 or 4 mg/kg), the GABA a antagonist
>>>> bicuculline (.5, 1, or 2 mg/kg), the GABA b agonist baclofen (15,
>>>> 30, 60 mg/kg), or the GABA b antagonist hydroxysaclofen (1, 3, or 5
>>>> mg/kg) during neural tube formation.
>>>> Normal saline was used as a control and valproic acid (600 mg/kg)
>>>> as a positive control."
>>>>
>>>> The dictionary finds "baclofen" but it does not find "Folic acid"!
>>>> The workflow is as follows:
>>>>
>>>> 1. get sentences
>>>> 2. tokenize sentences
>>>> 3. call the dictionary name finder's find() method with an array of
>>>>    strings (the tokens of a single sentence)
>>>>
>>>> Jim
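On the 2^N question: a dictionary entry is a run of contiguous tokens, not an arbitrary subset, so there is no need to enumerate subsets. One option is to try every window of up to maxLen tokens from each start position, without stopping on a missing prefix, and keep the longest hit. This is a hypothetical sketch that assumes the dictionary tracks the token count of its longest entry (maxLen). It costs at most N × maxLen lookups for an N-token sentence. BoundedWindowFinder, add, find and toStrings are made-up names, not OpenNLP API; toStrings only stands in for Span.spansToStrings so the example runs without OpenNLP on the classpath, and the token arrays assume a simple whitespace/punctuation tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not OpenNLP's implementation: a case-insensitive
// dictionary of token sequences that remembers its longest entry (maxLen).
class BoundedWindowFinder {
    private final Set<List<String>> entries = new HashSet<>();
    private int maxLen = 0;

    void add(String... tokens) {
        List<String> entry = new ArrayList<>();
        for (String t : tokens) {
            entry.add(t.toLowerCase(Locale.ROOT));
        }
        entries.add(entry);
        maxLen = Math.max(maxLen, entry.size());
    }

    // Returns [start, end) token spans, end exclusive.
    List<int[]> find(String[] tokens) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int bestEnd = -1;
            List<String> window = new ArrayList<>();
            // At most maxLen lookups per start; a missing prefix does not stop the scan.
            for (int end = start; end < tokens.length && end - start < maxLen; end++) {
                window.add(tokens[end].toLowerCase(Locale.ROOT));
                if (entries.contains(window)) {
                    bestEnd = end + 1; // remember the longest match so far
                }
            }
            if (bestEnd != -1) {
                spans.add(new int[] {start, bestEnd});
                start = bestEnd; // skip over the matched tokens
            } else {
                start++;
            }
        }
        return spans;
    }

    // Local stand-in for Span.spansToStrings(spans, tokens).
    static List<String> toStrings(List<int[]> spans, String[] tokens) {
        List<String> out = new ArrayList<>();
        for (int[] s : spans) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, s[0], s[1])));
        }
        return out;
    }

    public static void main(String[] args) {
        BoundedWindowFinder finder = new BoundedWindowFinder();
        finder.add("Folic", "acid");
        finder.add("Baclofen");
        String[][] sentences = {
            {"Folic", "acid", "is", "one", "variable", ",", "but", "other", "factors", "remain", "."},
            {"the", "GABA", "b", "agonist", "baclofen", "(", "15", ",", "30", ",", "60", "mg/kg", ")"}
        };
        for (String[] tokens : sentences) {
            System.out.println(toStrings(finder.find(tokens), tokens));
        }
    }
}
```

Run on these two fragments of Dimitrios's test paragraph, it prints [Folic acid] and then [baclofen]. Keeping the longest hit also covers the "venessa" / "venessa", "williams" case: the two-token entry wins when both tokens are present, and it is still found when there is no single-token "venessa" entry.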
