Hi Jorn,

thanks for replying. I changed the encoding of the file to ANSI, but I got another error:

-----------------------------------------------------------------------------------------------------
C:\OpenNLP\apache-opennlp-1.5.1-incubating-bin\apache-opennlp-1.5.1-incubating>java -jar lib\opennlp-tools-*.jar TokenNameFinderTrainer -encoding UTF-8 -lang en -data data1.txt -model maha.bin
Indexing events using cutoff of 5
Computing event counts... java.io.IOException: Found unexpected annotation <END>.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:272)
        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:252)
        at opennlp.maxent.GIS.trainModel(GIS.java:228)
        at opennlp.maxent.GIS.trainModel(GIS.java:179)
        at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:345)
        at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:356)
        at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:87)
        at opennlp.tools.cmdline.CLI.main(CLI.java:183)
--------------------------------------------------------------------------------------------------------

I am sure there is no <END> annotation directly followed by a period in my file; there is always a space between <END> and the period.

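One way to test that claim is to scan the training file for misplaced tags. The script below is only a rough sketch in Python, not part of OpenNLP. It assumes that the trainer recognises <START>, <START:type> and <END> only when they stand alone as whitespace-separated tokens, as in the documented format "<START:person> Pierre Vinken <END> , 61 years old". It also assumes that the period in "Found unexpected annotation <END>." may simply end the error sentence rather than point at a period in the data. The names check_line and check_file are made up for this illustration.

```python
import re

# Assumed tag shapes: "<START>" or "<START:type>" to open a name, "<END>" to close it.
START_TAG = re.compile(r"<START(:[^:>\s]*)?>")
END_TAG = "<END>"


def check_line(line):
    """Return a list of problem descriptions for one training sentence."""
    problems = []
    in_name = False  # True while a <START> is waiting for its <END>
    for position, token in enumerate(line.split(), start=1):
        if START_TAG.fullmatch(token):
            if in_name:
                problems.append(f"token {position}: {token} inside an open name")
            in_name = True
        elif token == END_TAG:
            if not in_name:
                problems.append(
                    f"token {position}: {END_TAG} without a preceding <START>"
                )
            in_name = False
        elif START_TAG.search(token) or END_TAG in token:
            # The tag is attached to a word, so it is not a token of its own.
            problems.append(
                f"token {position}: tag glued to other text in {token!r}"
            )
    if in_name:
        problems.append("line ends with an unclosed <START>")
    return problems


def check_file(path, encoding="utf-8"):
    """Print every problem found in the file, prefixed with its line number."""
    with open(path, encoding=encoding) as handle:
        for number, line in enumerate(handle, start=1):
            for problem in check_line(line):
                print(f"{path}:{number}: {problem}")
```

Calling check_file("data1.txt") would list the suspicious lines, if any. Under these assumptions, a line such as "Dr<START> David<END>" from my earlier sample would be reported because both tags are glued to the neighbouring words.
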
> Date: Tue, 21 Jun 2011 19:02:14 +0200
> From: kottm...@gmail.com
> To: opennlp-users@incubator.apache.org
> Subject: Re: What is the problem with the training filr
>
> Hi,
>
> there is an issue with the encoding of your trainingFile.txt, for some
> reason it cannot be decoded using UTF-8. Try to open it in a text editor
> with UTF-8 and you will get an error too.
>
> Hope that helps,
> Jörn
>
> On 6/21/11 6:59 PM, Amal Elmah wrote:
> > When I used command line training tool on my data (training.txt) it gives
> > error as follows:
> > ------------------------------------------------------------------------------------------------------------------------
> > C:\OpenNLP\apache-opennlp-1.5.1-incubating-bin\apache-opennlp-1.5.1-incubating>java -jar lib\opennlp-tools-*.jar TokenNameFinderTrainer -encoding UTF-8 -lang en -data trainingFile.txt -model mymodel.bin
> > Indexing events using cutoff of 5
> > Computing event counts... java.nio.charset.MalformedInputException: Input length = 1
> > Incorporating indexed data for training...
> > Exception in thread "main" java.lang.NullPointerException
> >         at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:272)
> >         at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:252)
> >         at opennlp.maxent.GIS.trainModel(GIS.java:228)
> >         at opennlp.maxent.GIS.trainModel(GIS.java:179)
> >         at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:345)
> >         at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:356)
> >         at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:87)
> >         at opennlp.tools.cmdline.CLI.main(CLI.java:183)
> > ---------------------------------------------------------------------------
> > I do not know what is the problem and this is part of my data in the text
> > file
> >
> > Professor<START> Michael<END>
> > Professor<START> Naci<END>
> > Dr<START> Richard<END> ( p / t )
> > Dr<START> David<END>
> > Professor<START> Vic<END>
> > Dr<START> Adrian<END>
> > Dr<START> Martin<END>
> > Dr<START> Timothy<END>
> > Dr<START> Ian<END>
> > Dr<START> Ali<END>
> > -----------------------------------------------------------------------------------------------------------------------
> >
>