[
https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122803#comment-13122803
]
Nicolas Hernandez commented on OPENNLP-316:
-------------------------------------------
Hi William,
Just to let you know, I'm testing my data to see whether something is wrong
with it. As far as I understand, the eval file should have the same format
as the training data.
I tried the Trainer and the CrossValidator programs of the Sentence Detector,
the Tokenizer, the PosTagger and the Chunker. Each time I used the same data
for the Trainer and the CrossValidator.
It works for the Tokenizer and the PosTagger.
For the Sentence Detector and the Chunker, the Trainer works but the
CrossValidator program does not, even though I use the same data!
In these cases, 0 events are reported.
For the Sentence Detector I tried with 100, 1,000 and 1,000,000 sentences.
Same message.
For the Chunker I tried with 500 and 500,000 words.
For the Chunker, I did manage to get the line "Skipping corrupt
line:..." displayed by feeding it lines in the wrong format on purpose.
But once I think I have a clean input, no event is counted.
Below is the output for the Chunker. I am still checking my data, but soon I
will have a look at the code.
Indexing events using cutoff of 5
Computing event counts... done. 0 events
Indexing... done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
    at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
    at opennlp.maxent.GIS.trainModel(GIS.java:256)
    at opennlp.model.TrainUtil.train(TrainUtil.java:182)
    at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:208)
    at opennlp.tools.chunker.ChunkerCrossValidator.evaluate(ChunkerCrossValidator.java:78)
    at opennlp.tools.cmdline.chunker.ChunkerCrossValidatorTool.run(ChunkerCrossValidatorTool.java:102)
    at opennlp.tools.cmdline.CLI.main(CLI.java:191)
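One way to rule out formatting problems in the chunker input independently of
OpenNLP is a small stand-alone checker. The sketch below assumes the
CoNLL-2000 style layout (one "word POS-tag chunk-tag" triple per line, with a
blank line between sentences); the class and method names are hypothetical
and are not part of OpenNLP. It only reports lines that do not have exactly
three whitespace-separated fields, so it cannot explain the 0 events on its
own, but it helps confirm that the input is clean.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for checking chunker training data; not part of OpenNLP.
public class ChunkDataCheck {

    /**
     * Returns the 1-based numbers of the non-blank lines that do not consist
     * of exactly three whitespace-separated fields (word, POS tag, chunk tag).
     * Blank lines are treated as sentence separators and are skipped.
     */
    static List<Integer> malformedLines(List<String> lines) {
        List<Integer> bad = new ArrayList<Integer>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue; // sentence boundary
            }
            if (line.split("\\s+").length != 3) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Tiny in-memory sample; a real check would read the training file.
        List<String> sample = Arrays.asList(
            "He PRP B-NP",
            "reckons VBZ B-VP",
            "",
            "broken-line",
            "it PRP B-NP");
        System.out.println("Malformed lines: " + malformedLines(sample));
    }
}
```

In the sample, only the fourth line ("broken-line") is reported, since it has
one field instead of three.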
> Evaluator and CrossValidator programs of the main analyzers throw exceptions
> ----------------------------------------------------------------------------
>
> Key: OPENNLP-316
> URL: https://issues.apache.org/jira/browse/OPENNLP-316
> Project: OpenNLP
> Issue Type: Bug
> Components: Chunker, POS Tagger, Sentence Detector, Tokenizer
> Affects Versions: tools-1.5.2-incubating
> Environment: Linux version 2.6.32-34-generic (buildd@yellow) (gcc
> version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #77-Ubuntu SMP Tue Sep 13 19:39:17
> UTC 2011
> java version "1.6.0_26"
> Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
> Reporter: Nicolas Hernandez
> Assignee: William Colen
> Fix For: tools-1.5.2-incubating
>
>
> Evaluator and CrossValidator programs of the main analyzers throw an
> exception when running
> (test performed on the 1.5.3 dist via command line)
> It seems that the SentenceDetector, Tokenizer, PosTagger and the
> chunker (at least) throw a java.lang.NullPointerException if the
> misclassified parameter is set to false or not present for the
> Evaluator programs.
> The Evaluator programs work (provide a result) when the
> misclassified parameter is set.
> The CrossValidator programs do not work at all.
> I have not tested the other opennlp programs.
> See below some examples of the runs.
> I tested on the examples from the documentation and also with my data.
> For the SentenceDetector I tested with 1 000 and 1 000 000 sentences (one per
> line).
> Tell me if you want more details or anything else.
> $ opennlp SentenceDetectorEvaluator -encoding UTF-8 -model
> data/model/fr-sent.bin -data data/test/fr-sent.test
> Loading Sentence Detector model ... done (0,013s)
> Evaluating ... Exception in thread "main" java.lang.NullPointerException
> at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:80)
> at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
> at opennlp.tools.cmdline.sentdetect.SentenceDetectorEvaluatorTool.run(SentenceDetectorEvaluatorTool.java:80)
> at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> $ opennlp SentenceDetectorCrossValidator -encoding UTF-8 -lang fr -data
> data/train/fr-sent.train -misclassified true
> Indexing events using cutoff of 5
> Computing event counts... done. 0 events
> Indexing... done.
> Sorting and merging events... Done indexing.
> Incorporating indexed data for training...
> Exception in thread "main" java.lang.NullPointerException
> at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
> at opennlp.maxent.GIS.trainModel(GIS.java:256)
> at opennlp.model.TrainUtil.train(TrainUtil.java:182)
> at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:283)
> at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:104)
> at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:98)
> at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> $ opennlp TokenizerMEEvaluator -encoding UTF-8 -model
> data/model/fr-token.bin -data data/test/fr-token.test
> Loading Tokenizer model ... done (0,428s)
> Evaluating ... Exception in thread "main" java.lang.NullPointerException
> at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
> at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
> at opennlp.tools.cmdline.tokenizer.TokenizerMEEvaluatorTool.run(TokenizerMEEvaluatorTool.java:81)
> at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> $ opennlp TokenizerCrossValidator -encoding UTF-8 -lang fr -data
> data/train/fr-token.train
> Indexing events using cutoff of 5
> Computing event counts... done. 100333 events
> Indexing... done.
> Sorting and merging events... done. Reduced 100333 events to 30168.
> Done indexing.
> Incorporating indexed data for training...
> done.
> Number of Event Tokens: 30168
> Number of Outcomes: 2
> Number of Predicates: 8287
> ...done.
> Computing model parameters ...
> Performing 100 iterations.
> 1: ... loglikelihood=-69545.53606709359 0.9337805108987073
> 2: ... loglikelihood=-18987.123809719425 0.9497872085953774
> ...
> 98: ... loglikelihood=-607.4216932752298 0.9989534848952987
> 99: ... loglikelihood=-603.2346954947699 0.9989734185163406
> 100: ... loglikelihood=-599.1235213848983 0.9989833853268616
> Exception in thread "main" java.lang.NullPointerException
> at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
> at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
> at opennlp.tools.tokenize.TokenizerCrossValidator.evaluate(TokenizerCrossValidator.java:98)
> at opennlp.tools.cmdline.tokenizer.TokenizerCrossValidatorTool.run(TokenizerCrossValidatorTool.java:94)