Evaluator and CrossValidator programs of the main analyzers throw exceptions
----------------------------------------------------------------------------

                 Key: OPENNLP-316
                 URL: https://issues.apache.org/jira/browse/OPENNLP-316
             Project: OpenNLP
          Issue Type: Bug
          Components: Chunker, POS Tagger, Sentence Detector, Tokenizer
    Affects Versions: tools-1.5.2-incubating
         Environment: Linux version 2.6.32-34-generic (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #77-Ubuntu SMP Tue Sep 13 19:39:17 UTC 2011

java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

            Reporter: Nicolas Hernandez


The Evaluator and CrossValidator programs of the main analyzers throw an exception when run.

(Tests were performed on the 1.5.3 distribution via the command line.)

It seems that the Evaluator programs of at least the SentenceDetector, Tokenizer, POS tagger, and chunker throw a java.lang.NullPointerException when the misclassified parameter is set to false or omitted.
The Evaluator programs work (produce a result) when the misclassified parameter is set to true.
The CrossValidator programs do not work at all.
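The following is a hypothetical Java sketch of how an exception can depend on an optional flag such as misclassified. It is not OpenNLP's Evaluator code: the class and method names are invented for illustration, and this report does not establish that this is the actual cause. It assumes the stream for reporting misclassified samples is created only when the flag is on. In that case, any unconditional use of the stream fails when the flag is off.

```java
import java.io.PrintStream;

// Hypothetical illustration only; NOT opennlp.tools.util.eval.Evaluator.
class FlagDependentEvaluator {
    // Stream for reporting misclassified samples; assumed to be null
    // when the misclassified option is off.
    private final PrintStream misclassifiedOut;
    private int correct;
    private int total;

    FlagDependentEvaluator(PrintStream misclassifiedOut) {
        this.misclassifiedOut = misclassifiedOut;
    }

    // Faulty variant: uses the stream without a null check, so the first
    // mismatch throws NullPointerException when the flag is off.
    void evaluateSampleUnguarded(String reference, String predicted) {
        total++;
        if (reference.equals(predicted)) {
            correct++;
        } else {
            misclassifiedOut.println("Expected: " + reference + ", got: " + predicted);
        }
    }

    // Guarded variant: still scores every sample, but only reports
    // mismatches when a stream was configured.
    void evaluateSample(String reference, String predicted) {
        total++;
        if (reference.equals(predicted)) {
            correct++;
        } else if (misclassifiedOut != null) {
            misclassifiedOut.println("Expected: " + reference + ", got: " + predicted);
        }
    }

    double accuracy() {
        return total == 0 ? 0.0 : (double) correct / total;
    }
}
```

Passing `null` to the constructor models a run with the flag off. Passing `System.err` models a run with it on.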

I have not tested the other OpenNLP programs.

Some example runs are shown below.
I tested with the examples from the documentation and also with my own data.
For the SentenceDetector, I tested with 1,000 and 1,000,000 sentences per line.
Let me know if you want more details or anything else.

$ opennlp SentenceDetectorEvaluator -encoding UTF-8 -model data/model/fr-sent.bin -data data/test/fr-sent.test
Loading Sentence Detector model ... done (0,013s)
Evaluating ... Exception in thread "main" java.lang.NullPointerException
       at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:80)
       at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
       at opennlp.tools.cmdline.sentdetect.SentenceDetectorEvaluatorTool.run(SentenceDetectorEvaluatorTool.java:80)
       at opennlp.tools.cmdline.CLI.main(CLI.java:191)

$ opennlp SentenceDetectorCrossValidator -encoding UTF-8 -lang fr -data data/train/fr-sent.train -misclassified true
Indexing events using cutoff of 5

       Computing event counts...  done. 0 events
       Indexing...  done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
       at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
       at opennlp.maxent.GIS.trainModel(GIS.java:256)
       at opennlp.model.TrainUtil.train(TrainUtil.java:182)
       at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:283)
       at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:104)
       at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:98)
       at opennlp.tools.cmdline.CLI.main(CLI.java:191)

$ opennlp TokenizerMEEvaluator -encoding UTF-8 -model data/model/fr-token.bin -data data/test/fr-token.test
Loading Tokenizer model ... done (0,428s)
Evaluating ... Exception in thread "main" java.lang.NullPointerException
       at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
       at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
       at opennlp.tools.cmdline.tokenizer.TokenizerMEEvaluatorTool.run(TokenizerMEEvaluatorTool.java:81)
       at opennlp.tools.cmdline.CLI.main(CLI.java:191)

$ opennlp TokenizerCrossValidator -encoding UTF-8 -lang fr -data data/train/fr-token.train
Indexing events using cutoff of 5
       Computing event counts...  done. 100333 events
       Indexing...  done.
Sorting and merging events... done. Reduced 100333 events to 30168.
Done indexing.
Incorporating indexed data for training...
done.
       Number of Event Tokens: 30168
           Number of Outcomes: 2
         Number of Predicates: 8287
...done.
Computing model parameters ...
Performing 100 iterations.
 1:  ... loglikelihood=-69545.53606709359      0.9337805108987073
 2:  ... loglikelihood=-18987.123809719425     0.9497872085953774
...
 98:  ... loglikelihood=-607.4216932752298      0.9989534848952987
 99:  ... loglikelihood=-603.2346954947699      0.9989734185163406
100:  ... loglikelihood=-599.1235213848983      0.9989833853268616
Exception in thread "main" java.lang.NullPointerException
       at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
       at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
       at opennlp.tools.tokenize.TokenizerCrossValidator.evaluate(TokenizerCrossValidator.java:98)
       at opennlp.tools.cmdline.tokenizer.TokenizerCrossValidatorTool.run(TokenizerCrossValidatorTool.java:94)
