Jorn,

Wouldn't the name dictionary be more of what A. Allen is looking for?
James K

On 12/10/2010 12:14 PM, A. Allen wrote:
> Thank you for the response. I made changes to my training data to include
> data that aren't names. I used old search term data. I received the same
> error. A sample of the new training data is listed below.
>
> <START:person>cantor<END>
> crs
> debt commission
> hr 4213
> hr3081
> hr5297
> <START:person>johnny isakson<END>
> lame duck session
> paycheck fairness act
> pigford
> unemployment insurance
> <START:person>wyden<END>
> 112th
> 112th Congress
> Dream Act
> GAO
> HR 5712
> Lame Duck
> <START:person>boehner<END>
>
> -AA
>
> On Wed, Dec 8, 2010 at 2:37 PM, Jörn Kottmann <[email protected]> wrote:
>
>> Hello,
>>
>> your training data only contains tokens which are
>> the begin or a continuation of a name, but zero "other"
>> tokens.
>>
>> If the name finder would be trained like this, it will always
>> estimate that these are the two only valid outcomes. That should
>> be possible actually (but maybe not useful).
>>
>> I didn't look at the source code, but I guess the error is caused by
>> a bug in the outcome validating code. We should add your case
>> to the unit test and fix the problem
>> .
>> To work around the problem just add a few sentences to your training
>> data which contain normal plain text without names.
>>
>> Please feel free to open a jira issue.
>>
>> Thanks,
>> Jörn
>>
>>
>> On 12/8/10 8:24 PM, A. Allen wrote:
>>
>>> Hello,
>>>
>>> Has anyone been able to train the name finder? I followed the instructions
>>> in the wiki and used pieces of the sample code, but keep getting the
>>> following:
>>>
>>> Indexing events using cutoff of 5
>>>
>>> Computing event counts... done. 29376 events
>>> Indexing... done.
>>> Sorting and merging events... done. Reduced 29376 events to 8313.
>>> Done indexing.
>>> Incorporating indexed data for training...
>>> done.
>>> Number of Event Tokens: 8313
>>> Number of Outcomes: 1
>>> Number of Predicates: 11869
>>> ...done.
>>> Computing model parameters...
>>> Performing 100 iterations.
>>> 1: .. loglikelihood=0.0 1.0
>>> 2: .. loglikelihood=0.0 1.0
>>> Exception in thread "main" java.lang.IllegalArgumentException: Model not
>>> compatible with name finder!
>>> at
>>>
>>> opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:50)
>>> at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:350)
>>> at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:356)
>>> at NameTrainer.main(NameTrainer.java:21)
>>>
>>> My training data looks like this:
>>> <START:person>Neil Abercrombie<END>
>>> <START:person>Anibal Acevedo-Vila<END>
>>> <START:person>Gary Ackerman<END>
>>> <START:person>Robert Aderholt<END>
>>> <START:person>Daniel Akaka<END>
>>> <START:person>Todd Akin<END>
>>> <START:person>Lamar Alexander<END>
>>> <START:person>Rodney Alexander<END>
>>>
>>> I appreciate any help that can be provided . Thank you.
>>>
>>> -AA
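
For anyone trying the workaround Jörn describes (mixing in plain text without names), here is a minimal sketch of a sanity check to run on a training file before training. It does not call OpenNLP at all; it only counts how many non-blank lines carry no `<START:...>` annotation. The class and method names (`TrainingDataCheck`, `hasNameAnnotation`, `countPlainLines`) are made up for illustration, and the sample lines are taken from the training data quoted above.

```java
import java.util.List;

public class TrainingDataCheck {

    // Hypothetical helper: true if the line contains at least one
    // <START:...> name annotation.
    public static boolean hasNameAnnotation(String line) {
        return line.contains("<START:");
    }

    // Counts the non-blank lines that carry no name annotation at all,
    // i.e. the "normal plain text without names" from the workaround.
    public static long countPlainLines(List<String> lines) {
        return lines.stream()
                .filter(l -> !l.isBlank())
                .filter(l -> !hasNameAnnotation(l))
                .count();
    }

    public static void main(String[] args) {
        // Sample lines copied from the quoted training data.
        List<String> sample = List.of(
                "<START:person>cantor<END>",
                "debt commission",
                "lame duck session",
                "<START:person>wyden<END>");

        long plain = countPlainLines(sample);
        System.out.println("plain lines: " + plain + " of " + sample.size());
    }
}
```

A count of zero would match the situation in the first message, where every line is a name. A non-zero count only confirms that plain lines are present; whether the trainer actually treats them as "other" tokens is not something this check can tell you.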
