Re: do any of the CrossValidators work at all?

Jörn Kottmann Fri, 22 Nov 2013 02:58:45 -0800

The first exception usually indicates that you don't have enough training data, or that it contains
no names. Try to create more training data.
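A cheap way to rule out the "no names" case before calling NameFinderME.train is to count the annotations in the training lines. This is a minimal sketch. It assumes the OpenNLP name finder training format, where each entity is marked as `<START:type> ... <END>`. The `NameTagCheck` object and its `countNameTags` helper are hypothetical and not part of OpenNLP. Untyped `<START>` tags, in case the data uses them, are counted under the placeholder key "(untyped)".

```scala
// Hypothetical pre-training sanity check; not part of the OpenNLP API.
object NameTagCheck {
  // Matches "<START>" or "<START:type>"; group 1 captures the type, if any.
  private val StartTag = """<START(?::([^>]+))?>""".r

  // Counts start tags per entity type across all training lines.
  def countNameTags(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => StartTag.findAllMatchIn(line).map(m => Option(m.group(1)).getOrElse("(untyped)")))
      .groupBy(identity)
      .map { case (entityType, hits) => entityType -> hits.size }
}
```

With an `Array[String]` of lines, call it as `NameTagCheck.countNameTags(lines.toSeq)`. An empty result, or one with no entry for the entity type being trained, matches the situation described above.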

The second exception indicates that the stream you are using can't be reset, and therefore doesn't work
with the cross validator. We should definitely make this clearer.
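The training lines in this thread are already held in memory as an `Array[String]`, so one way to get a stream that can be reset is to keep the samples in a collection and simply rewind an index. This is a minimal sketch. It uses a hypothetical stand-in trait, `SampleStream`, that mirrors the `read`/`reset`/`close` methods of OpenNLP's `ObjectStream` so the example runs on its own. Against the real library you would implement `ObjectStream` instead. `InMemoryStream` is likewise a hypothetical name.

```scala
// Stand-in for the read/reset/close shape of opennlp.tools.util.ObjectStream,
// so the sketch runs without OpenNLP. OpenNLP's read() returns null at the end
// of the stream; this stand-in returns None instead.
trait SampleStream[T] {
  def read(): Option[T]
  def reset(): Unit
  def close(): Unit
}

object SampleStream {
  // Reads every remaining sample from the stream.
  def drain[T](stream: SampleStream[T]): List[T] =
    Iterator.continually(stream.read()).takeWhile(_.isDefined).map(_.get).toList
}

// Hypothetical stream that keeps all samples in memory, so reset() only has to
// rewind an index instead of re-reading an underlying reader.
class InMemoryStream[T](samples: IndexedSeq[T]) extends SampleStream[T] {
  private var position = 0

  def read(): Option[T] =
    if (position < samples.length) {
      position += 1
      Some(samples(position - 1))
    } else None

  def reset(): Unit = position = 0

  def close(): Unit = ()
}
```

The trade-off is memory: every sample is held at once. That is reasonable when the input already lives in an in-memory array.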


Jörn

On 11/21/2013 06:46 PM, Walrus theCat wrote:

Jörn,

Thanks for your interest.

Here's the exception when I use the BufferedReader. This exception is
thrown during training. It prints a couple of "log likelihood" lines
first, before throwing this:

Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
     at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:81)
     at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:106)
     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:374)
     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:403)
     at walrusthecat.ml.ner.TrainNERModels$.trainModel(TrainNERModels.scala:118)
     at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:53)
     at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:49)
     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
     at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:49)
     at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)

And here it is when I use the ByteArrayInputStream.  This exception is
thrown when cross-validating, but not when evaluating the training data
stream:

Exception in thread "main" java.io.IOException: Stream not marked
     at java.io.BufferedReader.reset(BufferedReader.java:505)
     at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
     at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
     at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
     at opennlp.tools.namefind.TokenNameFinderCrossValidator$NameToDocumentSampleStream.reset(TokenNameFinderCrossValidator.java:99)
     at opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:264)
     at opennlp.tools.namefind.TokenNameFinderCrossValidator.evaluate(TokenNameFinderCrossValidator.java:272)
     at walrusthecat.ml.ner.TrainNERModels$.getResults(TrainNERModels.scala:129)
     at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:55)
     at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:47)
     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
     at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:47)
     at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
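The top two frames show where the message originates: `java.io.BufferedReader.reset()` throws an `IOException` with the message "Stream not marked" when `mark()` was never called on that reader. Below is a small stand-alone Scala check of that JDK behavior. The `MarkResetDemo` object and its methods are hypothetical helpers.

```scala
import java.io.{BufferedReader, IOException, StringReader}

object MarkResetDemo {
  // Calls reset() on a BufferedReader that was never marked and returns the
  // exception message, or None if reset unexpectedly succeeded.
  def resetWithoutMark(text: String): Option[String] = {
    val in = new BufferedReader(new StringReader(text))
    try {
      in.readLine()
      in.reset()
      None
    } catch {
      case e: IOException => Some(e.getMessage)
    } finally in.close()
  }

  // With mark() called first, reset() rewinds to the marked position, as long
  // as no more than readAheadLimit characters were read in between.
  def readFirstLineTwice(text: String, readAheadLimit: Int): (String, String) = {
    val in = new BufferedReader(new StringReader(text))
    try {
      in.mark(readAheadLimit)
      val first = in.readLine()
      in.reset()
      (first, in.readLine())
    } finally in.close()
  }
}
```

So in the trace above, the cross validator's reset reaches a `BufferedReader` that was never marked, which means that stream can only be read once.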


On Thu, Nov 21, 2013 at 12:25 AM, Jörn Kottmann <[email protected]> wrote:

Please post the exception with stack trace here.

Jörn



On 11/21/2013 07:53 AM, Walrus theCat wrote:

To update: when I create the stream as above
(PlainTextByLineStream(ByteArrayInputStream)), I get the "Stream not
marked" error when attempting to cross-validate (but not when just
evaluating on the training data). When I instead create the
PlainTextByLineStream on a BufferedReader (see below), I get the error
"Model not compatible with name finder!" during training. The result is
that I can't cross-validate, which I really need to do.


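    import java.io.{BufferedReader, ByteArrayInputStream, InputStreamReader}
    import java.nio.charset.Charset
    import opennlp.tools.namefind.NameSampleDataStream
    import opennlp.tools.util.PlainTextByLineStream

    // CHARSET is a String constant defined elsewhere (e.g. "UTF-8")
    def linesToStream(lines: Array[String]) = {
      val charset = Charset.forName(CHARSET)
      // Decode with the same charset that was used to encode the bytes
      val reader = new BufferedReader(new InputStreamReader(
        new ByteArrayInputStream(lines.mkString("\n").getBytes(charset)), charset))
      new NameSampleDataStream(new PlainTextByLineStream(reader))
    }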


On Wed, Nov 20, 2013 at 5:42 PM, Walrus theCat <[email protected]> wrote:

Thanks for the reply, even though I was kind of rude. I'm using the API.

The evaluator gives me suspiciously high metrics, and the cross validator
fails as mentioned.

The code is in Scala:

    def linesToStream(lines: Array[String]) = {
      val charset = Charset.forName(CHARSET)
      new NameSampleDataStream(
        new PlainTextByLineStream(
          new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET)), charset))
    }

I train the model with the above:

    NameFinderME.train("en", entityName, linesToStream(lines),
      TrainingParameters.defaultParams(),
      null: Array[Byte], Collections.emptyMap[String, Object]())

When it comes time to evaluate, I recreate the stream to try to circumvent
these kinds of problems ("resetting" it also throws the same error):

    val crossValidator = new TokenNameFinderCrossValidator("en",
      entityName, TrainingParameters.defaultParams(),
      null: Array[Byte], Collections.emptyMap[String, Object](), listener)
    crossValidator.evaluate(sampleStream, 10)
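Why reset is needed at all: k-fold cross-validation passes over the same samples once per fold, training on k - 1 partitions and evaluating on the remaining one. The simplified sketch below assigns sample i to fold i % k. That assignment rule and the `kFoldSplits` helper are assumptions for illustration, not a description of how OpenNLP's `CrossValidationPartitioner` actually splits the data.

```scala
object KFold {
  // Hypothetical helper: returns k (train, test) pairs, putting sample i into
  // the test set of fold i % k. Every fold looks at the full sample list, which
  // is why a stream-based partitioner has to rewind its input between folds.
  def kFoldSplits[T](samples: IndexedSeq[T], k: Int): Seq[(IndexedSeq[T], IndexedSeq[T])] = {
    require(k >= 2, "k must be at least 2")
    require(samples.length >= k, "need at least k samples")
    (0 until k).map { fold =>
      val (test, train) = samples.zipWithIndex.partition { case (_, i) => i % k == fold }
      (train.map(_._1), test.map(_._1))
    }
  }
}
```

With 10 folds, as in the `evaluate(sampleStream, 10)` call, that means ten passes over the same data.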

Thanks



On Wed, Nov 20, 2013 at 3:43 PM, William Colen <[email protected]> wrote:

Are you using the API or the command-line tools? Can you send a code
snippet showing how you load the ObjectStream?


2013/11/20 Walrus theCat <[email protected]>

I'm getting "java.io.IOException: Stream not marked" when calling
TokenNameFinderCrossValidator.evaluate with a NameSampleDataStream. This
works when I use a TokenNameFinderEvaluator instead. I'm led to believe
that .reset isn't called on the stream in the CrossValidator.
