Re: name finder training tool

James Kosin Sat, 11 Dec 2010 09:09:57 -0800

Jorn,

Wouldn't the name dictionary be closer to what A. Allen is looking for?


James K

On 12/10/2010 12:14 PM, A. Allen wrote:
> Thank you for the response. I made changes to my training data to include
> data that aren't names. I used old search term data. I received the same
> error. A sample of the new training data is listed below.
>
> <START:person>cantor<END>
> crs
> debt commission
> hr 4213
> hr3081
> hr5297
> <START:person>johnny isakson<END>
> lame duck session
> paycheck fairness act
> pigford
> unemployment insurance
> <START:person>wyden<END>
> 112th
> 112th Congress
> Dream Act
> GAO
> HR 5712
> Lame Duck
> <START:person>boehner<END>
>
> -AA
>
> On Wed, Dec 8, 2010 at 2:37 PM, Jörn Kottmann <[email protected]> wrote:
>
>> Hello,
>>
>> your training data contains only tokens that are
>> the beginning or a continuation of a name, and no "other"
>> tokens.
>>
>> If the name finder were trained like this, it would always
>> estimate that these are the only two valid outcomes. That should
>> actually be possible (though maybe not useful).
>>
>> I didn't look at the source code, but I guess the error is caused by
>> a bug in the outcome validating code. We should add your case
>> to the unit tests and fix the problem.
>>
>> To work around the problem just add a few sentences to your training
>> data which contain normal plain text without names.
>>
>> Please feel free to open a jira issue.
>>
>> Thanks,
>> Jörn
>>
>>
>> On 12/8/10 8:24 PM, A. Allen wrote:
>>
>>> Hello,
>>>
>>> Has anyone been able to train the name finder? I followed the instructions
>>> in the wiki and used pieces of the sample code, but keep getting the
>>> following:
>>>
>>> Indexing events using cutoff of 5
>>>
>>> Computing event counts...  done. 29376 events
>>> Indexing...  done.
>>> Sorting and merging events... done. Reduced 29376 events to 8313.
>>> Done indexing.
>>> Incorporating indexed data for training...
>>> done.
>>> Number of Event Tokens: 8313
>>>     Number of Outcomes: 1
>>>   Number of Predicates: 11869
>>> ...done.
>>> Computing model parameters...
>>> Performing 100 iterations.
>>>   1:  .. loglikelihood=0.0 1.0
>>>   2:  .. loglikelihood=0.0 1.0
>>> Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
>>> at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:50)
>>> at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:350)
>>> at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:356)
>>> at NameTrainer.main(NameTrainer.java:21)
>>>
>>> My training data looks like this:
>>> <START:person>Neil Abercrombie<END>
>>> <START:person>Anibal Acevedo-Vila<END>
>>> <START:person>Gary Ackerman<END>
>>> <START:person>Robert Aderholt<END>
>>> <START:person>Daniel Akaka<END>
>>> <START:person>Todd Akin<END>
>>> <START:person>Lamar Alexander<END>
>>> <START:person>Rodney Alexander<END>
>>>
>>> I appreciate any help that can be provided. Thank you.
>>>
>>> -AA
>>>
>>>
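
Jörn's point about missing "other" tokens can be illustrated with a small, self-contained sketch. This is not OpenNLP code: `OutcomeCounter` and `countOutcomes` are hypothetical names, the start/cont/other labels are a simplified stand-in for the name finder's outcomes, and the mixed sentence is an invented example. The sketch only counts how many tokens of each kind a set of training lines would contribute.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of OpenNLP: labels each token of a
// <START:type> ... <END> annotated line as "start", "cont" or "other".
class OutcomeCounter {

    static Map<String, Integer> countOutcomes(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Pad the tags with spaces so they split off as separate tokens,
            // whether or not the input already has whitespace around them.
            String padded = line
                    .replaceAll("<START(:[^>]*)?>", " $0 ")
                    .replace("<END>", " <END> ")
                    .trim();
            if (padded.isEmpty()) {
                continue;
            }
            boolean inName = false;   // currently between <START> and <END>
            boolean firstInName = false; // next token is the first of a name
            for (String tok : padded.split("\\s+")) {
                if (tok.startsWith("<START")) {
                    inName = true;
                    firstInName = true;
                } else if (tok.equals("<END>")) {
                    inName = false;
                } else {
                    String label = !inName ? "other" : (firstInName ? "start" : "cont");
                    firstInName = false;
                    counts.merge(label, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Names-only lines, as in the first training file in this thread.
        List<String> namesOnly = Arrays.asList(
                "<START:person>Neil Abercrombie<END>",
                "<START:person>Todd Akin<END>");
        // Invented sentence with a name embedded in ordinary text.
        List<String> mixed = Arrays.asList(
                "<START:person> Johnny Isakson <END> spoke about the paycheck fairness act .");

        System.out.println(countOutcomes(namesOnly)); // {cont=2, start=2}
        System.out.println(countOutcomes(mixed));     // {cont=1, other=7, start=1}
    }
}
```

With names-only input the "other" label never appears, which is the situation Jörn describes. Once a name sits inside a normal sentence, all three kinds of token are present.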
