Done, the Jira issue is here: https://issues.apache.org/jira/browse/OPENNLP-316
On Thu, Oct 6, 2011 at 5:32 PM, [email protected] <[email protected]> wrote:
> Hi Nicolas,
>
> Can you please open a Jira?
> I will investigate the issue.
>
> Thanks,
> William
>
>
> On Thu, Oct 6, 2011 at 9:46 AM, Nicolas Hernandez <[email protected]> wrote:
>
>> On Thu, Oct 6, 2011 at 2:34 PM, Jörn Kottmann <[email protected]> wrote:
>> > Looks like the Cross Validator is failing because you do
>> > not have enough data? On how many sample sentences do you
>> > run it?
>> I tested with 1 000 and 1 000 000... same results except I had to
>> extend the java heap size for one of them before getting the error...
>>
>>
>> Just to let you know, below you will find what I got for the
>> Tokenizer (here with a 1000 sentences train corpus)
>>
>> $ opennlp TokenizerMEEvaluator -encoding UTF-8 -model
>> data/model/fr-token.bin -data data/test/fr-token.test
>> Loading Tokenizer model ... done (0,428s)
>> Evaluating ... Exception in thread "main" java.lang.NullPointerException
>> at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>> at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>> at opennlp.tools.cmdline.tokenizer.TokenizerMEEvaluatorTool.run(TokenizerMEEvaluatorTool.java:81)
>> at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>>
>> $ opennlp TokenizerCrossValidator -encoding UTF-8 -lang fr -data
>> data/train/fr-token.train
>> Indexing events using cutoff of 5
>> Computing event counts... done. 100333 events
>> Indexing... done.
>> Sorting and merging events... done. Reduced 100333 events to 30168.
>> Done indexing.
>> Incorporating indexed data for training...
>> done.
>> Number of Event Tokens: 30168
>> Number of Outcomes: 2
>> Number of Predicates: 8287
>> ...done.
>> Computing model parameters ...
>> Performing 100 iterations.
>> 1: ... loglikelihood=-69545.53606709359 0.9337805108987073
>> 2: ... loglikelihood=-18987.123809719425 0.9497872085953774
>> ...
>> 98: ... loglikelihood=-607.4216932752298 0.9989534848952987
>> 99: ... loglikelihood=-603.2346954947699 0.9989734185163406
>> 100: ... loglikelihood=-599.1235213848983 0.9989833853268616
>> Exception in thread "main" java.lang.NullPointerException
>> at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>> at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>> at opennlp.tools.tokenize.TokenizerCrossValidator.evaluate(TokenizerCrossValidator.java:98)
>> at opennlp.tools.cmdline.tokenizer.TokenizerCrossValidatorTool.run(TokenizerCrossValidatorTool.java:94)
>> at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>>
>>
>> >
>> > We will investigate this further.
>> >
>> > Jörn
>> >
>> > On 10/6/11 2:26 PM, Nicolas Hernandez wrote:
>> >>
>> >> Please find below the output of two runs which lead to an error:
>> >> SentenceDetectorEvaluator without "-misclassified true" parameter and
>> >> SentenceDetectorCrossValidator (which gives the same error with or
>> >> without "-misclassified true").
>> >>
>> >> I tested on the examples from the documentation and also with my data.
>> >> Tell me if you want more details or anything else.
>> >>
>> >> $ opennlp SentenceDetectorEvaluator -encoding UTF-8 -model
>> >> data/model/fr-sent.bin -data data/test/fr-sent.test
>> >> Loading Sentence Detector model ... done (0,013s)
>> >> Evaluating ... in thread "main" java.lang.NullPointerException
>> >> at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:80)
>> >> at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>> >> at opennlp.tools.cmdline.sentdetect.SentenceDetectorEvaluatorTool.run(SentenceDetectorEvaluatorTool.java:80)
>> >> at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>> >>
>> >> $ opennlp SentenceDetectorCrossValidator -encoding UTF-8 -lang fr -data
>> >> data/train/fr-sent.train -misclassified true
>> >> Indexing events using cutoff of 5
>> >>
>> >> Computing event counts... done. 0 events
>> >> Indexing... done.
>> >> Sorting and merging events... Done indexing.
>> >> Incorporating indexed data for training...
>> >> Exception in thread "main" java.lang.NullPointerException
>> >> at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>> >> at opennlp.maxent.GIS.trainModel(GIS.java:256)
>> >> at opennlp.model.TrainUtil.train(TrainUtil.java:182)
>> >> at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:283)
>> >> at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:104)
>> >> at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:98)
>> >> at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>> >>
>> >>
>> >>
>> >> On Thu, Oct 6, 2011 at 1:02 PM, Jörn Kottmann <[email protected]> wrote:
>> >>>
>> >>> On 10/6/11 12:42 PM, Nicolas Hernandez wrote:
>> >>>>
>> >>>> I am trying to run the Evaluator and CrossValidator programs of
>> >>>> 1.5.3 from the command line.
>> >>>>
>> >>>> It seems that the SentenceDetector, Tokenizer, PosTagger and the
>> >>>> chunker (at least) throw a java.lang.NullPointerException if the
>> >>>> misclassified parameter is set to false or not present for the
>> >>>> Evaluator programs. The CrossValidator programs do not work at all.
>> >>>>
>> >>>> Before I look into it, is something (e.g. a global refactoring)
>> >>>> planned for this code?
>> >>>
>> >>> 1.5.3 is mostly the same version as the 1.5.2 RC 2.
>> >>>
>> >>> The bugs you describe here should of course not be present, and must be
>> >>> fixed for the 1.5.2 release. We just did a major refactoring of a lot of
>> >>> cmd line code. Looks like a regression.
>> >>>
>> >>> Can you please give us more details? The stack trace would be helpful,
>> >>> and the command line arguments you passed in. To find a bug I believe it
>> >>> should be enough to get this for one of the mentioned evaluators.
>> >>>
>> >>> Jörn
>> >>>
>> >
>> >
>> >

--
[email protected] # http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n # Laboratoire Informatique de Nantes Atlantique CNRS UMR 6241
tel. +33 (0)2 51 12 58 55 # Université de Nantes - Institut Universitaire de Technologie - Département Informatique
tel. +33 (0)2 40 30 60 67
