Re: do any of the CrossValidators work at all?

Jörn Kottmann Mon, 25 Nov 2013 14:23:22 -0800

You probably don't have enough training data. OpenNLP doesn't really
like that and can fail in some unpleasant ways.


How many sentences do you have? How many name annotations do you have?

Jörn
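One rough way to answer both questions, assuming the training data is
in the usual name finder format (one sentence per line, names wrapped
in <START:type> ... <END>). The object and method names here are made
up for illustration and are not part of OpenNLP:

```scala
object TrainingDataStats {
  // Matches the opening tag of an annotation, e.g. <START:person> or <START>.
  private val StartTag = "<START(:[^>\\s]+)?>".r

  /** Returns (number of non-empty sentence lines, total name annotations). */
  def count(lines: Seq[String]): (Int, Int) = {
    // Blank lines are skipped; they are not counted as sentences.
    val sentences = lines.filter(_.trim.nonEmpty)
    val names = sentences.map(line => StartTag.findAllIn(line).size).sum
    (sentences.length, names)
  }
}

val sample = Seq(
  "<START:person> Pierre Vinken <END> , 61 years old , will join the board .",
  "",
  "Mr . <START:person> Vinken <END> is chairman of <START:organization> Elsevier N.V. <END> ."
)
println(TrainingDataStats.count(sample)) // (2,3)
```

If the second number is zero or very small, that would be consistent
with the training failures described in this thread.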

On 11/25/2013 11:10 PM, Walrus theCat wrote:

Hi Jörn, William, and the rest of you OpenNLPers,

This problem is resurfacing.  I found out that my input didn't match the
format specified in the docs, which is one sentence per line.  After
properly sentence-breaking my input, a very similar error is cropping up,
viz., that it works with a TokenNameFinderEvaluator but not with a
CrossValidator.  I'm using the FileChannel constructor on the stream.

I've been stepping through the source, but to no avail.  The stack trace is
as follows:

Exception in thread "main" java.lang.NullPointerException
     at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
     at opennlp.maxent.GIS.trainModel(GIS.java:256)
     at opennlp.model.TrainUtil.train(TrainUtil.java:184)
     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:366)
     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:403)
     at opennlp.tools.namefind.TokenNameFinderCrossValidator.evaluate(TokenNameFinderCrossValidator.java:275)
     at walrusthecat.ml.ner.TrainNERModels$.getResults(TrainNERModels.scala:153)
     at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$3.apply(TrainNERModels.scala:58)
     at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$3.apply(TrainNERModels.scala:53)
     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
     at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:53)
     at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)


On Fri, Nov 22, 2013 at 2:57 AM, Jörn Kottmann <[email protected]> wrote:

The first exception usually indicates that you don't have enough training
data, or that it contains no names. Try to create more training data.

The second exception indicates that the stream you are using can't be
reset and therefore doesn't work with the cross validator. We should
definitely make this clearer.

Jörn
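The "Stream not marked" message itself comes from
java.io.BufferedReader, which refuses reset() unless mark() was called
first. A small stand-alone sketch using only the JDK (no OpenNLP
classes; the object and method names are hypothetical) shows both
cases:

```scala
import java.io.{BufferedReader, IOException, StringReader}

object ResetDemo {
  /** Reads one line, then calls reset() without a prior mark(); returns the error message, if any. */
  def resetWithoutMark(text: String): Option[String] = {
    val reader = new BufferedReader(new StringReader(text))
    reader.readLine()
    try { reader.reset(); None }
    catch { case e: IOException => Some(e.getMessage) }
  }

  /** Marks the start before reading, so reset() can rewind; returns the first line again. */
  def resetWithMark(text: String): String = {
    val reader = new BufferedReader(new StringReader(text))
    // The read-ahead limit must cover everything read before reset().
    reader.mark(text.length + 1)
    reader.readLine()
    reader.reset()
    reader.readLine()
  }
}

println(ResetDemo.resetWithoutMark("first\nsecond")) // Some(Stream not marked)
println(ResetDemo.resetWithMark("first\nsecond"))    // first
```

Marking a reader by hand is only meant to illustrate the mechanism.
Whether a given stream wrapper can be rewound depends on how it was
constructed.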


On 11/21/2013 06:46 PM, Walrus theCat wrote:

Jörn,

Thanks for your interest.

Here's the exception when I use the BufferedReader.  This exception is
thrown during training.  It prints a couple of "log likelihood" statements
first, before throwing this:

Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
      at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:81)
      at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:106)
      at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:374)
      at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:403)
      at walrusthecat.ml.ner.TrainNERModels$.trainModel(TrainNERModels.scala:118)
      at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:53)
      at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:49)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:49)
      at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)

And here it is when I use the ByteArrayInputStream.  This exception is
thrown when cross-validating, but not when evaluating the training data
stream:

Exception in thread "main" java.io.IOException: Stream not marked
      at java.io.BufferedReader.reset(BufferedReader.java:505)
      at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
      at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
      at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
      at opennlp.tools.namefind.TokenNameFinderCrossValidator$NameToDocumentSampleStream.reset(TokenNameFinderCrossValidator.java:99)
      at opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:264)
      at opennlp.tools.namefind.TokenNameFinderCrossValidator.evaluate(TokenNameFinderCrossValidator.java:272)
      at walrusthecat.ml.ner.TrainNERModels$.getResults(TrainNERModels.scala:129)
      at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:55)
      at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:47)
      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:47)
      at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)


On Thu, Nov 21, 2013 at 12:25 AM, Jörn Kottmann <[email protected]> wrote:

Please post the exception with stack trace here.

Jörn



On 11/21/2013 07:53 AM, Walrus theCat wrote:

To update, when I create the stream as above
(PlainTextByLineStream(ByteArrayInputStream)) I get the "Stream not
marked" error when attempting to cross validate (but not when just
evaluating on the training data).  When I instead create the
PlainTextByLineStream on a BufferedReader (see below), I get the error
"Model not compatible with name finder!" during training.  The result is
I can't cross validate, something I really need to do.


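     import java.io.{BufferedReader, ByteArrayInputStream, InputStreamReader}
     import java.nio.charset.Charset
     import opennlp.tools.namefind.NameSampleDataStream
     import opennlp.tools.util.PlainTextByLineStream

     def linesToStream(lines:Array[String]) = {
       val charset = Charset.forName(CHARSET)
       // Decode with the same charset used to encode the bytes.
       val reader = new BufferedReader(new InputStreamReader(
           new ByteArrayInputStream(lines.mkString("\n").getBytes(charset)),
           charset))
       new NameSampleDataStream(
           new PlainTextByLineStream(
               reader))
     }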


On Wed, Nov 20, 2013 at 5:42 PM, Walrus theCat <[email protected]> wrote:

Thanks for the reply, even though I was kind of rude.  I'm using the API.

The evaluator gives me suspiciously high metrics, and the cross validator
fails out as mentioned.

The code is in Scala:

     def linesToStream(lines:Array[String]) = {
       val charset = Charset.forName(CHARSET)
       new NameSampleDataStream(
           new PlainTextByLineStream(
               new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET)),
               charset))
     }

I train the model with the above:
         NameFinderME.train("en", entityName, linesToStream(lines),
               TrainingParameters.defaultParams(),
               null:Array[Byte], Collections.emptyMap[String, Object]());

When it comes time to evaluate, I recreate the stream to try to circumvent
these kinds of problems ("resetting" it also throws the same error):

       val crossValidator = new TokenNameFinderCrossValidator("en",
               entityName, TrainingParameters.defaultParams(),
               null:Array[Byte], Collections.emptyMap[String, Object](),
               listener)
       crossValidator.evaluate(sampleStream, 10)
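
For context on why a cross validator needs a rewindable stream: each
fold has to read the full data set again, so the stream is reset before
every pass. The following is only a sketch of that idea in plain Scala.
InMemoryStream and folds are hypothetical names, and this is not how
OpenNLP implements its partitioner:

```scala
object ResettableSketch {
  /** Hypothetical stand-in for a read/reset stream; not an OpenNLP class. */
  final class InMemoryStream[T](items: IndexedSeq[T]) {
    private var pos = 0

    /** Returns the next item, or None once the stream is exhausted. */
    def read(): Option[T] =
      if (pos < items.length) { val item = items(pos); pos += 1; Some(item) }
      else None

    /** Rewinds to the first item; always possible because the data is in memory. */
    def reset(): Unit = { pos = 0 }
  }

  /** Builds k (train, test) splits, rewinding the stream before each pass. */
  def folds[T](stream: InMemoryStream[T], k: Int): Seq[(Seq[T], Seq[T])] =
    (0 until k).map { fold =>
      stream.reset()
      val all = Iterator.continually(stream.read())
        .takeWhile(_.isDefined).map(_.get).zipWithIndex.toVector
      // Every k-th item (offset by the fold number) is held out for testing.
      val (test, train) = all.partition { case (_, i) => i % k == fold }
      (train.map(_._1), test.map(_._1))
    }
}

val stream = new ResettableSketch.InMemoryStream(Vector("s1", "s2", "s3", "s4"))
ResettableSketch.folds(stream, 2).foreach(println)
// (Vector(s2, s4),Vector(s1, s3))
// (Vector(s1, s3),Vector(s2, s4))
```

A stream that can only be consumed once would fail on the second pass,
which matches the reset() call visible in the stack trace.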

Thanks



On Wed, Nov 20, 2013 at 3:43 PM, William Colen <[email protected]> wrote:

Are you using the API or the command line tools? Can you send a code
snippet showing how you load the ObjectStream?


2013/11/20 Walrus theCat <[email protected]>

I'm getting "java.io.IOException: Stream not marked" when calling
TokenNameFinderCrossValidator.evaluate with a NameSampleDataStream.  This
works when I use a TokenNameFinderEvaluator instead.  I'm led to believe
that .reset isn't called on the stream in the CrossValidator.
