Re: do any of the CrossValidators work at all?

Walrus theCat Thu, 21 Nov 2013 10:28:07 -0800

Jörn,

Thanks for your interest.


Here's the exception when I use the BufferedReader. This exception is
thrown during training. It prints a couple of "log likelihood" lines
first, before throwing this:

Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
    at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:81)
    at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:106)
    at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:374)
    at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:403)
    at walrusthecat.ml.ner.TrainNERModels$.trainModel(TrainNERModels.scala:118)
    at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:53)
    at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:49)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:49)
    at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
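The "Model not compatible with name finder!" message is thrown from the TokenNameFinderModel constructor. Assuming (unverified here) that the 1.5.x constructor rejects a model whose outcomes include no `<type>-start` label, the stream handed to NameFinderME.train may simply have yielded no annotated names. A minimal sketch of a sanity check on the raw training lines; `AnnotationCheck` and `countAnnotations` are hypothetical names, not part of OpenNLP:

```scala
// Hypothetical pre-flight check (not an OpenNLP API): count the name
// annotations in the raw training lines before calling NameFinderME.train.
object AnnotationCheck {
  // Matches <START> and <START:type> markers in the name finder training format.
  private val StartTag = """<START(?::([^>\s]+))?>""".r

  /** Maps each entity type ("default" for an untyped <START>) to its number of occurrences. */
  def countAnnotations(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => StartTag.findAllMatchIn(line).map(m => Option(m.group(1)).getOrElse("default")))
      .groupBy(identity)
      .map { case (entityType, hits) => entityType -> hits.size }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "<START:person> Pierre Vinken <END> , 61 years old , will join the board .",
      "A sentence without any names ."
    )
    val counts = countAnnotations(lines)
    // An empty map means the training stream would contain no names at all.
    if (counts.isEmpty) sys.error("training data contains no <START> annotations")
    println(counts)
  }
}
```

Running the check on exactly the lines that go into linesToStream, right before training, would show whether any annotations reach the trainer at all.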

And here it is when I use the ByteArrayInputStream. This exception is
thrown when cross-validating, but not when evaluating on the training data
stream:

Exception in thread "main" java.io.IOException: Stream not marked
    at java.io.BufferedReader.reset(BufferedReader.java:505)
    at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
    at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
    at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
    at opennlp.tools.namefind.TokenNameFinderCrossValidator$NameToDocumentSampleStream.reset(TokenNameFinderCrossValidator.java:99)
    at opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:264)
    at opennlp.tools.namefind.TokenNameFinderCrossValidator.evaluate(TokenNameFinderCrossValidator.java:272)
    at walrusthecat.ml.ner.TrainNERModels$.getResults(TrainNERModels.scala:129)
    at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:55)
    at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:47)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:47)
    at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
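The second trace bottoms out in java.io.BufferedReader.reset, called from PlainTextByLineStream.reset. A minimal JDK-only sketch (no OpenNLP involved; `ResetWithoutMark` and `tryReset` are made-up names for illustration) reproduces the same message: BufferedReader.reset() throws "Stream not marked" whenever mark() was never called on that reader.

```scala
import java.io.{BufferedReader, IOException, StringReader}

// JDK-only reproduction: BufferedReader.reset() fails unless mark() was called first.
object ResetWithoutMark {
  /** Reads one line, then tries to rewind; returns the IOException message, if any. */
  def tryReset(markFirst: Boolean): Option[String] = {
    val reader = new BufferedReader(new StringReader("line one\nline two\n"))
    try {
      if (markFirst) reader.mark(1024) // read-ahead limit, in characters
      reader.readLine()
      reader.reset()
      None
    } catch {
      case e: IOException => Some(e.getMessage)
    } finally reader.close()
  }

  def main(args: Array[String]): Unit = {
    println(tryReset(markFirst = false)) // Some(Stream not marked)
    println(tryReset(markFirst = true))  // None
  }
}
```

With markFirst = true the rewind succeeds, as long as no more than the read-ahead limit has been read since mark().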

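Since the trace shows CrossValidationPartitioner.next calling reset() on the sample stream when it moves on, the underlying line stream has to support rewinding. One workaround worth trying, sketched here under assumptions: keep the lines in memory and rewind by resetting an index. `InMemoryLineStream` is a hypothetical class, not part of OpenNLP; it only mirrors the read()/reset()/close() shape of opennlp.tools.util.ObjectStream (read() returns null at the end) and is kept JDK-only so it runs on its own.

```scala
// Hypothetical resettable line stream (not an OpenNLP class). It mirrors the
// read()/reset()/close() methods of opennlp.tools.util.ObjectStream.
final class InMemoryLineStream(lines: IndexedSeq[String]) {
  private var pos = 0

  /** Returns the next line, or null once all lines have been read. */
  def read(): String =
    if (pos < lines.length) { val line = lines(pos); pos += 1; line }
    else null

  /** Rewinds to the first line; nothing needs re-opening because the data is in memory. */
  def reset(): Unit = pos = 0

  def close(): Unit = ()
}

object InMemoryLineStreamDemo {
  def main(args: Array[String]): Unit = {
    val stream = new InMemoryLineStream(Vector("<START:person> Pierre Vinken <END> .", "Nothing here ."))
    // First pass, as a training run would consume it.
    Iterator.continually(stream.read()).takeWhile(_ != null).foreach(println)
    // A second pass after reset() starts again from the first line.
    stream.reset()
    println(stream.read())
  }
}
```

To use it with OpenNLP one would presumably declare it `extends ObjectStream[String]` and pass it to NameSampleDataStream in place of PlainTextByLineStream; that adaptation is untested here and may need adjusting for the OpenNLP version in use.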

On Thu, Nov 21, 2013 at 12:25 AM, Jörn Kottmann <[email protected]> wrote:

> Please post the exception with stack trace here.
>
> Jörn
>
>
>
> On 11/21/2013 07:53 AM, Walrus theCat wrote:
>
>> To update, when I create the stream as above
>> (PlainTextByLineStream(ByteArrayInputStream)), I get the "Stream not
>> marked" error when attempting to cross-validate (but not when just
>> evaluating on the training data). When I instead create the
>> PlainTextByLineStream on a BufferedReader (see below), I get the error
>> "Model not compatible with name finder!" during training. The result is
>> that I can't cross-validate, something I really need to do.
>>
>>
>>    def linesToStream(lines: Array[String]) = {
>>      val charset = Charset.forName(CHARSET)
>>      // Decode with the same charset that was used to encode the bytes.
>>      val reader = new BufferedReader(new InputStreamReader(
>>        new ByteArrayInputStream(lines.mkString("\n").getBytes(charset)), charset))
>>      new NameSampleDataStream(new PlainTextByLineStream(reader))
>>    }
>>
>>
>> On Wed, Nov 20, 2013 at 5:42 PM, Walrus theCat <[email protected]> wrote:
>>
>>> Thanks for the reply, even though I was kind of rude. I'm using the API.
>>> The evaluator gives me suspiciously high metrics, and the cross validator
>>> fails out as mentioned.
>>>
>>> The code is in Scala:
>>>
>>>    def linesToStream(lines: Array[String]) = {
>>>      val charset = Charset.forName(CHARSET)
>>>      new NameSampleDataStream(
>>>          new PlainTextByLineStream(
>>>              new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET)), charset))
>>>    }
>>>
>>> I train the model with the above:
>>>
>>>      NameFinderME.train("en", entityName, linesToStream(lines),
>>>              TrainingParameters.defaultParams(),
>>>              null:Array[Byte], Collections.emptyMap[String, Object]());
>>>
>>> When it comes time to evaluate, I recreate the stream to try to
>>> circumvent these kinds of problems ("resetting" it also throws the same
>>> error):
>>>
>>>      val crossValidator = new TokenNameFinderCrossValidator("en",
>>>              entityName, TrainingParameters.defaultParams(),
>>>              null:Array[Byte], Collections.emptyMap[String, Object](),
>>>              listener)
>>>      crossValidator.evaluate(sampleStream, 10)
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Wed, Nov 20, 2013 at 3:43 PM, William Colen <[email protected]> wrote:
>>>
>>>> Are you using the API or the command line tools? Can you send a code
>>>> snippet showing how you load the ObjectStream?
>>>>
>>>>
>>>> 2013/11/20 Walrus theCat <[email protected]>
>>>>
>>>>> I'm getting "java.io.IOException: Stream not marked" when calling
>>>>> TokenNameFinderCrossValidator.evaluate with a NameSampleDataStream. This
>>>>> works when I use a TokenNameFinderEvaluator instead. I'm led to believe
>>>>> that .reset isn't called on the stream in the CrossValidator.
>>>>>
>>>
>
