On 6/21/2011 9:03 PM, Amal Elmah wrote:
> Thanks
>
> I noticed that and corrected mine, and now it works. The problem with this
> one is that I could not find any error in the format, but the trainer does
> not accept this data:
>
> Throughout <START> Ray <END> ’ s career , he was committed to developing
> public engagement with sociology and ensuring the value of sociological
> research is understood by decision makers .
>
> thanks
>

The error comes from the file's encoding not matching the encoding specified on the command line. I saved the file with the Windows default for Notepad and, when I specified utf-8 as the encoding, got an error like this:

> C:\Users\James Kosin\Documents\NetBeansProjects\thesis\DocCompare>opennlp.bat TokenNameFinderTrainer -lang en -encoding utf-8 -cutoff 0 -data temp2.txt -model temp.model
> Indexing events using cutoff of 0
>
> Computing event counts...
> java.nio.charset.MalformedInputException: Input length = 1
> Incorporating indexed data for training...
> Exception in thread "main" java.lang.NullPointerException
>         at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>         at opennlp.maxent.GIS.trainModel(GIS.java:256)
>         at opennlp.model.TrainUtil.train(TrainUtil.java:170)
>         at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:381)
>         at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:453)
>         at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:476)
>         at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:188)
>         at opennlp.tools.cmdline.CLI.main(CLI.java:187)

Notepad on Windows saves files as ANSI by default, which probably causes the problem with the ' (apostrophe) character in the string. You can force UTF-8 by using Save As... instead of the normal Save and choosing UTF-8 as the encoding.

Java does not recognize ANSI as an encoding name; at least, the trainer did not accept it when I tried. I'm sure other encoding problems will show up if the encoding is not specified correctly on the command line.
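
If you want to see the mismatch outside of the trainer, here is a minimal sketch in plain Java. It does not call OpenNLP at all. The class name EncodingCheck and the method decodeError are made up for this illustration, and I am assuming that the "ANSI" Notepad used is the Western European code page, which Java calls windows-1252. It encodes part of the sample sentence both ways and then tries to decode each result strictly as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of OpenNLP.
public class EncodingCheck {

    // Returns null if the bytes decode cleanly in the given charset,
    // otherwise the text of the decoding exception.
    public static String decodeError(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return null;
        } catch (CharacterCodingException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // \u2019 is the curly apostrophe from the sample sentence.
        String sample = "Throughout <START> Ray <END> \u2019 s career";

        // Assumption: Notepad's "ANSI" here means windows-1252.
        byte[] ansiBytes = sample.getBytes(Charset.forName("windows-1252"));
        byte[] utf8Bytes = sample.getBytes(StandardCharsets.UTF_8);

        System.out.println("ANSI bytes read as UTF-8:  "
                + decodeError(ansiBytes, StandardCharsets.UTF_8));
        System.out.println("UTF-8 bytes read as UTF-8: "
                + decodeError(utf8Bytes, StandardCharsets.UTF_8));
    }
}
```

The first line should report the same MalformedInputException as in the trace above, because windows-1252 stores the curly apostrophe as a single byte that is not valid UTF-8 on its own. The second line should print null. To check a real training file, you could pass the result of java.nio.file.Files.readAllBytes to decodeError in the same way.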
James