Hi,
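there is an issue with the encoding of your trainingFile.txt: for some
reason it cannot be decoded as UTF-8. That is what the
MalformedInputException in your output indicates; the NullPointerException
afterwards is most likely just a consequence of it. Try opening the file in
a text editor with the encoding set to UTF-8 and you will get an error
there too.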
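If you prefer to check this from Java instead of a text editor, here is a
minimal sketch that decodes the file strictly as UTF-8 and reports the byte
offset of the first invalid sequence. It assumes Java 7 or later; the class
name Utf8Check and the method firstMalformedOffset are made up for this
example and are not part of OpenNLP.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of OpenNLP.
class Utf8Check {

    /**
     * Returns -1 if the bytes are valid UTF-8, otherwise the byte offset
     * at which the first malformed sequence starts.
     */
    static int firstMalformedOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never decodes to more chars than it has bytes.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        // endOfInput = true, so a truncated sequence at the end is an error too.
        CoderResult result = decoder.decode(in, out, true);
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) throws IOException {
        // Pass the file to check as the first argument, e.g. trainingFile.txt
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstMalformedOffset(bytes);
        if (offset < 0) {
            System.out.println("File is valid UTF-8");
        } else {
            System.out.println("Invalid UTF-8 sequence at byte offset " + offset);
        }
    }
}
```

If it reports an offset, the file was probably saved in another encoding
(for example a Windows code page). In that case either re-save it as UTF-8
or pass the encoding it actually uses to the -encoding option.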
Hope that helps,
Jörn
On 6/21/11 6:59 PM, Amal Elmah wrote:
When I use the command line training tool on my data (training.txt), it
gives the following error:
------------------------------------------------------------------------------------------------------------------------
C:\OpenNLP\apache-opennlp-1.5.1-incubating-bin\apache-opennlp-1.5.1-incubating>java -jar lib\opennlp-tools-*.jar TokenNameFinderTrainer -encoding UTF-8 -lang en -data trainingFile.txt -model mymodel.bin
Indexing events using cutoff of 5
Computing event counts... java.nio.charset.MalformedInputException: Input length = 1
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:272)
at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:252)
at opennlp.maxent.GIS.trainModel(GIS.java:228)
at opennlp.maxent.GIS.trainModel(GIS.java:179)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:345)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:356)
at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:87)
at opennlp.tools.cmdline.CLI.main(CLI.java:183)
---------------------------------------------------------------------------
I do not know what the problem is. This is part of the data in my text file:
Professor<START> Michael<END>
Professor<START> Naci<END>
Dr<START> Richard<END> ( p / t )
Dr<START> David<END>
Professor<START> Vic<END>
Dr<START> Adrian<END>
Dr<START> Martin<END>
Dr<START> Timothy<END>
Dr<START> Ian<END>
Dr<START> Ali<END>
-----------------------------------------------------------------------------------------------------------------------