Re: Dictionary name finder question

James Kosin Wed, 14 Mar 2012 20:32:40 -0700

Jorn,

We have a problem, but need some guidance on how we want to address
it.
The problem is that the hash code lookup for the single token fails
unless the dictionary already contains that token as an entry of its own.
In plain English: the test dictionary has "venessa" and "venessa",
"williams" added as two entries.  When the DictionaryNameFinder first
tries "Venessa", it matches the first dictionary entry; when it later
tests "Venessa", "Williams", it gets a hit on the second entry.
If "venessa" is not in the dictionary, we don't get that first match,
so the DictionaryNameFinder never tries the longer candidate and
overlooks the name entirely.


So the problem is all in the hash code.

The question is how to fix or address the issue without turning this
into a 2^N type of problem.
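
To make the failure and one possible way out concrete, here is a
minimal sketch. It is not the actual DictionaryNameFinder code: the
class DictionaryLookupSketch and its methods are hypothetical, and the
dictionary is modelled as a plain set of lower-cased token lists. The
idea is to keep extending the candidate up to the length of the longest
dictionary entry instead of stopping at the first miss, which costs at
most N * maxTokenCount lookups rather than anything exponential.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for a dictionary-based finder; not the OpenNLP API. */
public class DictionaryLookupSketch {

    // Each entry is stored as a lower-cased token list (case-insensitive match).
    private final Set<List<String>> entries = new HashSet<>();
    // Length, in tokens, of the longest entry added so far.
    private int maxTokenCount = 0;

    public void add(String... tokens) {
        entries.add(key(tokens, 0, tokens.length));
        maxTokenCount = Math.max(maxTokenCount, tokens.length);
    }

    private static List<String> key(String[] tokens, int start, int end) {
        List<String> k = new ArrayList<>(end - start);
        for (int i = start; i < end; i++) {
            k.add(tokens[i].toLowerCase(Locale.ROOT));
        }
        return k;
    }

    /** Stops extending a candidate at the first miss (the behaviour described above). */
    public List<int[]> findPrefixGated(String[] tokens) {
        return find(tokens, true);
    }

    /** Tries every candidate length up to maxTokenCount and keeps the longest hit. */
    public List<int[]> findBounded(String[] tokens) {
        return find(tokens, false);
    }

    // Returns spans as {start, end} pairs with an exclusive end index.
    private List<int[]> find(String[] tokens, boolean stopAtFirstMiss) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int bestEnd = -1;
            int limit = Math.min(tokens.length, start + maxTokenCount);
            for (int end = start + 1; end <= limit; end++) {
                if (entries.contains(key(tokens, start, end))) {
                    bestEnd = end;
                } else if (stopAtFirstMiss) {
                    break;
                }
            }
            if (bestEnd > 0) {
                spans.add(new int[] {start, bestEnd});
                start = bestEnd; // continue after the matched name
            } else {
                start++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        DictionaryLookupSketch dict = new DictionaryLookupSketch();
        dict.add("Folic", "acid");
        dict.add("Baclofen");
        String[] sentence = {"Folic", "acid", "is", "one", "variable"};
        System.out.println("prefix-gated spans: " + dict.findPrefixGated(sentence).size());
        System.out.println("bounded spans:      " + dict.findBounded(sentence).size());
    }
}
```

With the two-entry dictionary from Jim's report, the prefix-gated
variant returns no span for "Folic acid" because "folic" alone is not
an entry, while the bounded variant returns the span [0, 2). Whether a
bound like maxTokenCount is the right fit for the real Dictionary class
is an open question; this only shows that the search need not be
exponential.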

James

On 3/14/2012 10:28 AM, Jörn Kottmann wrote:
> That simply means that nobody yet tested/debugged with
> your posted dictionary and sample.
>
> Jörn
>
> On 03/14/2012 03:24 PM, Dimitrios wrote:
>> No response in the last 2-3 hours...
>>
>> Should I assume that someone did confirm my issue using the tiny
>> dictionary and the test paragraph I posted before?
>>
>> Or should I assume that James's confirmation that everything works ok
>> still stands?
>>
>> Jim
>>
>>
>> On 14/03/12 11:39, Dimitrios wrote:
>>> Forgot to mention that after find() returns the Spans I'm doing
>>> Span.spansToStrings(span-array, token-array) to get the human
>>> readable array of strings....
>>>
>>> Jim
>>>
>>>
>>> On 14/03/12 11:34, Dimitrios wrote:
>>>> On 14/03/12 11:23, Jörn Kottmann wrote:
>>>>> Can you re-produce your issue with a dictionary which only
>>>>> contains a single entry? 
>>>>
>>>> Yes, I can indeed reproduce the issue with the following dictionary:
>>>> --------------------------------------------------------------------------------------------
>>>>
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <dictionary case_sensitive="false">
>>>> <entry>
>>>> <token>Folic</token>
>>>> <token>acid</token>
>>>> </entry>
>>>> <entry>
>>>> <token>Baclofen</token>
>>>> </entry>
>>>> </dictionary>
>>>> --------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> The small paragraph I'm using for testing is this:
>>>>
>>>> "Folic acid is one variable, but other factors remain.
>>>> Studies suggest that substances active at the GABA receptor may
>>>> produce NTDs.
>>>> To test this hypothesis pregnant rats were exposed to either the
>>>> GABA a agonist muscimol (1, 2 or 4 mg/kg), the GABA a antagonist
>>>> bicuculline (.5, 1, or 2 mg/kg), the GABA b agonist baclofen (15,
>>>> 30, 60 mg/kg), or the GABA b antagonist hydroxysaclofen (1, 3, or 5
>>>> mg/kg) during neural tube formation.
>>>> Normal saline was used as a control and valproic acid (600 mg/kg)
>>>> as a positive control."
>>>>
>>>>
>>>> The dictionary finds "baclofen" but it does not find "Folic acid"!
>>>> The workflow is as follows:
>>>>
>>>>  1. get-sentences
>>>>  2. tokenize -sentences
>>>>  3. call dictionary name finder ".find()" method with an array of
>>>>     strings (tokens of a single sentence)
>>>>
>>>> Jim
>>>>
>>>
>>
>>
>
