On 6/21/2011 9:03 PM, Amal Elmah wrote:
> Thanks 
>  
> I noticed that and corrected mine, and now it works. The problem with this
> one is that I could not find any error in the format, but the trainer does
> not accept this data:
>  
> Throughout <START> Ray <END> ’ s career , he was committed to developing 
> public engagement with sociology and ensuring the value of sociological 
> research is understood by decision makers .
>
> Thanks
The error is related to the file's encoding and to specifying the wrong
encoding on the command line.  I saved the file with Notepad's Windows
default encoding and got an error like this when I specified utf-8 as the
encoding:
> C:\Users\James Kosin\Documents\NetBeansProjects\thesis\DocCompare>opennlp.bat TokenNameFinderTrainer -lang en -encoding utf-8 -cutoff 0 -data temp2.txt -model temp.model
> Indexing events using cutoff of 0
>
>         Computing event counts...
> java.nio.charset.MalformedInputException: Input length = 1
> Incorporating indexed data for training...
> Exception in thread "main" java.lang.NullPointerException
>         at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>         at opennlp.maxent.GIS.trainModel(GIS.java:256)
>         at opennlp.model.TrainUtil.train(TrainUtil.java:170)
>         at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:381)
>         at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:453)
>         at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:476)
>         at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:188)
>         at opennlp.tools.cmdline.CLI.main(CLI.java:187)
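The first exception in that output can be reproduced outside OpenNLP with a short standalone Java sketch. It assumes Notepad's ANSI code page is windows-1252 (the default on English-language Windows). In that code page the curly apostrophe U+2019 in the training sentence is stored as the single byte 0x92, which on its own is not valid UTF-8, so a strict UTF-8 decoder rejects it. The class and method names are hypothetical, and the sketch does not claim to show how OpenNLP itself reads the data file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class AnsiVsUtf8 {

    // Assumption: Notepad's "ANSI" is windows-1252 (English-language Windows).
    static byte[] ansiBytes(String text) {
        return text.getBytes(Charset.forName("windows-1252"));
    }

    // Strict UTF-8 decoding: malformed bytes raise an exception
    // instead of being silently replaced with U+FFFD.
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // "Ray's" written with the curly apostrophe U+2019, as in the training data.
        byte[] bytes = ansiBytes("Ray\u2019s");
        System.out.printf("byte for U+2019: 0x%02X%n", bytes[3] & 0xFF);
        try {
            decodeStrictUtf8(bytes);
            System.out.println("decoded without error");
        } catch (CharacterCodingException e) {
            // Expected: a MalformedInputException for the lone 0x92 byte.
            System.out.println(e);
        }
    }
}
```

Running it should print `byte for U+2019: 0x92` followed by `java.nio.charset.MalformedInputException: Input length = 1`, the same exception text the trainer reported.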
Notepad on Windows saves as ANSI by default, which probably causes
problems with the ’ (curly apostrophe) character in the string.  You can
force UTF-8 by using Save As... and choosing UTF-8 as the encoding instead
of the normal Save.  Java doesn't support ANSI as an encoding name; at
least, it didn't accept it when I passed it as the encoding.
I'm sure there are other encoding issues if the encoding isn't specified
properly on the command line.
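As an alternative to Notepad's Save As, a file saved as ANSI can be re-encoded to UTF-8 with a small Java program. This is a minimal sketch, again assuming the ANSI code page is windows-1252; the class name `AnsiToUtf8` is hypothetical. It decodes the whole file as windows-1252 and writes it back as UTF-8 without a byte order mark.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class AnsiToUtf8 {

    // Assumption: "ANSI" means windows-1252, the default code page on
    // English-language Windows; other locales use a different code page.
    static final Charset ANSI = Charset.forName("windows-1252");

    // Re-encode a whole file from windows-1252 to UTF-8 (no BOM is written).
    static void convert(Path in, Path out) throws IOException {
        byte[] raw = Files.readAllBytes(in);
        String text = new String(raw, ANSI);
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java AnsiToUtf8 <ansi-input> <utf8-output>");
            System.exit(1);
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

For example, `java AnsiToUtf8 temp2.txt temp2-utf8.txt`, then pass `temp2-utf8.txt` to the trainer with `-encoding utf-8`. Classic Notepad's UTF-8 Save As prefixes the file with a byte order mark, and Java's UTF-8 decoder does not skip it, so it may end up attached to the first token; this sketch avoids writing one. Java's own name for this code page is `windows-1252` (alias `Cp1252`) rather than ANSI, so giving that name to `-encoding` might also work without converting, though that is an assumption not checked against this OpenNLP version.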

James
