Hi Nicolas,

Can you please open a Jira?
I will investigate the issue.

Thanks,
William


On Thu, Oct 6, 2011 at 9:46 AM, Nicolas Hernandez <[email protected]> wrote:

> On Thu, Oct 6, 2011 at 2:34 PM, Jörn Kottmann <[email protected]> wrote:
> > Looks like the Cross Validator is failing because you do
> > not have enough data? On how many sample sentences do you
> > run it?
> I tested with 1,000 and 1,000,000 sentences and got the same results,
> except that I had to extend the Java heap size for one of them before
> getting the error.
>
>
>
> Just to let you know, below is what I got for the Tokenizer (here with
> a 1000-sentence training corpus):
>
> $ opennlp TokenizerMEEvaluator -encoding UTF-8 -model
> data/model/fr-token.bin -data data/test/fr-token.test
> Loading Tokenizer model ... done (0,428s)
> Evaluating ... Exception in thread "main" java.lang.NullPointerException
>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>        at opennlp.tools.cmdline.tokenizer.TokenizerMEEvaluatorTool.run(TokenizerMEEvaluatorTool.java:81)
>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>
> $ opennlp TokenizerCrossValidator -encoding UTF-8 -lang fr -data
> data/train/fr-token.train
> Indexing events using cutoff of 5
>         Computing event counts...  done. 100333 events
>        Indexing...  done.
> Sorting and merging events... done. Reduced 100333 events to 30168.
> Done indexing.
> Incorporating indexed data for training...
> done.
>        Number of Event Tokens: 30168
>            Number of Outcomes: 2
>          Number of Predicates: 8287
> ...done.
> Computing model parameters ...
> Performing 100 iterations.
>  1:  ... loglikelihood=-69545.53606709359      0.9337805108987073
>  2:  ... loglikelihood=-18987.123809719425     0.9497872085953774
> ...
>  98:  ... loglikelihood=-607.4216932752298      0.9989534848952987
>  99:  ... loglikelihood=-603.2346954947699      0.9989734185163406
> 100:  ... loglikelihood=-599.1235213848983      0.9989833853268616
> Exception in thread "main" java.lang.NullPointerException
>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>        at opennlp.tools.tokenize.TokenizerCrossValidator.evaluate(TokenizerCrossValidator.java:98)
>        at opennlp.tools.cmdline.tokenizer.TokenizerCrossValidatorTool.run(TokenizerCrossValidatorTool.java:94)
>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>
>
> >
> > We will investigate this further.
> >
> > Jörn
> >
> > On 10/6/11 2:26 PM, Nicolas Hernandez wrote:
> >>
> >> Please find below the output of two runs which lead to an error:
> >> SentenceDetectorEvaluator without "-misclassified true" parameter and
> >> SentenceDetectorCrossValidator (which gives the same error with or
> >> without "-misclassified true").
> >>
> >> I tested on the examples from the documentation and also with my data.
> >> Tell me if you want more details or anything else.
> >>
> >> $opennlp SentenceDetectorEvaluator -encoding UTF-8 -model
> >> data/model/fr-sent.bin -data data/test/fr-sent.test
> >> Loading Sentence Detector model ... done (0,013s)
> >> Evaluating ...  in thread "main" java.lang.NullPointerException
> >>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:80)
> >>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
> >>        at opennlp.tools.cmdline.sentdetect.SentenceDetectorEvaluatorTool.run(SentenceDetectorEvaluatorTool.java:80)
> >>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> >>
> >> $opennlp SentenceDetectorCrossValidator -encoding UTF-8 -lang fr -data
> >> data/train/fr-sent.train -misclassified true
> >> Indexing events using cutoff of 5
> >>
> >>        Computing event counts...  done. 0 events
> >>        Indexing...  done.
> >> Sorting and merging events... Done indexing.
> >> Incorporating indexed data for training...
> >> Exception in thread "main" java.lang.NullPointerException
> >>        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
> >>        at opennlp.maxent.GIS.trainModel(GIS.java:256)
> >>        at opennlp.model.TrainUtil.train(TrainUtil.java:182)
> >>        at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:283)
> >>        at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:104)
> >>        at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:98)
> >>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> >>
> >>
> >>
> >> On Thu, Oct 6, 2011 at 1:02 PM, Jörn Kottmann <[email protected]> wrote:
> >>>
> >>> On 10/6/11 12:42 PM, Nicolas Hernandez wrote:
> >>>>
> >>>> I am trying to run the Evaluator and CrossValidator programs of
> >>>> 1.5.3 from the command line.
> >>>>
> >>>> It seems that the SentenceDetector, Tokenizer, PosTagger and the
> >>>> chunker (at least) throw a java.lang.NullPointerException if the
> >>>> misclassified parameter is set to false or not present for the
> >>>> Evaluator programs. The CrossValidator programs do not work at all.
> >>>>
> >>>> Before I look into it, is something (e.g. a global refactoring)
> >>>> planned for it?
> >>>
> >>> 1.5.3 is mostly the same version as the 1.5.2 RC 2.
> >>>
> >>> The bugs you describe here should of course not be present, and must be
> >>> fixed for the 1.5.2 release. We just did a major refactoring of a lot of
> >>> cmd line code. Looks like a regression.
> >>>
> >>> Can you please give us more details? The stack trace would be helpful and
> >>> the command line arguments you passed in. To find a bug I believe it should
> >>> be enough to get this for one of the mentioned evaluators.
> >>>
> >>> Jörn
> >>>
> >
> >
>
