Re: do any of the CrossValidators work at all?

Walrus theCat Mon, 25 Nov 2013 15:58:38 -0800

I'd like to get more data, but this is what I've got right now... I have
5,000 sentences, and 10-20 name annotations per label.  Would that really
cause the null pointer, though? And only when cross-validating?


Thanks


On Mon, Nov 25, 2013 at 2:22 PM, Jörn Kottmann <[email protected]> wrote:

> You probably don't have enough training data. OpenNLP doesn't really
> like that and can fail in some unpleasant ways.
>
> How many sentences do you have? How many name annotations do you have?
>
> Jörn
>
>
> On 11/25/2013 11:10 PM, Walrus theCat wrote:
>
>> Hi Jörn, William, and the rest of you OpenNLPers,
>>
>> This problem is resurfacing.  I found out that my input didn't meet the
>> input specified in the docs, that it should be 1 sentence per line.  After
>> properly sentence-breaking my input, a very similar error is cropping up,
>> viz, that it works with a TokenNameFinderEvaluator but not with a
>> CrossValidator.  I'm using the FileChannel constructor on the stream.
>>
>> I've been stepping through the source, but to no avail.  The stack trace
>> is as follows:
>>
>> Exception in thread "main" java.lang.NullPointerException
>>      at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>>      at opennlp.maxent.GIS.trainModel(GIS.java:256)
>>      at opennlp.model.TrainUtil.train(TrainUtil.java:184)
>>      at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:366)
>>      at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:403)
>>      at opennlp.tools.namefind.TokenNameFinderCrossValidator.evaluate(TokenNameFinderCrossValidator.java:275)
>>      at walrusthecat.ml.ner.TrainNERModels$.getResults(TrainNERModels.scala:153)
>>      at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$3.apply(TrainNERModels.scala:58)
>>      at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$3.apply(TrainNERModels.scala:53)
>>      at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>      at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>      at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:53)
>>      at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
>>
>>
>> On Fri, Nov 22, 2013 at 2:57 AM, Jörn Kottmann <[email protected]>
>> wrote:
>>
>>> The first exception usually indicates that you don't have enough training
>>> data, or that it contains no names. Try to create more training data.
>>>
>>> The second exception indicates that the stream you are using can't be
>>> reset and therefore doesn't work with the cross validator. We should
>>> definitely make this clearer.
>>>
>>> Jörn
>>>
>>>
>>> On 11/21/2013 06:46 PM, Walrus theCat wrote:
>>>
>>>> Jörn,
>>>>
>>>> Thanks for your interest.
>>>>
>>>> Here's the exception when I use the BufferedReader.  This exception is
>>>> thrown during training.  It prints a couple of "log likelihood" statements
>>>> first, before throwing this:
>>>>
>>>> Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
>>>>       at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:81)
>>>>       at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:106)
>>>>       at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:374)
>>>>       at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:403)
>>>>       at walrusthecat.ml.ner.TrainNERModels$.trainModel(TrainNERModels.scala:118)
>>>>       at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:53)
>>>>       at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:49)
>>>>       at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>>>       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>       at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:49)
>>>>       at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
>>>>
>>>> And here it is when I use the ByteArrayInputStream.  This exception is
>>>> thrown when cross-validating, but not when evaluating the training data
>>>> stream:
>>>>
>>>> Exception in thread "main" java.io.IOException: Stream not marked
>>>>       at java.io.BufferedReader.reset(BufferedReader.java:505)
>>>>       at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
>>>>       at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
>>>>       at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
>>>>       at opennlp.tools.namefind.TokenNameFinderCrossValidator$NameToDocumentSampleStream.reset(TokenNameFinderCrossValidator.java:99)
>>>>       at opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:264)
>>>>       at opennlp.tools.namefind.TokenNameFinderCrossValidator.evaluate(TokenNameFinderCrossValidator.java:272)
>>>>       at walrusthecat.ml.ner.TrainNERModels$.getResults(TrainNERModels.scala:129)
>>>>       at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:55)
>>>>       at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:47)
>>>>       at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>>>       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>       at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:47)
>>>>       at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
>>>>
>>>>
>>>> On Thu, Nov 21, 2013 at 12:25 AM, Jörn Kottmann <[email protected]>
>>>> wrote:
>>>>
>>>>> Please post the exception with stack trace here.
>>>>>
>>>>> Jörn
>>>>>
>>>>>
>>>>>
>>>>> On 11/21/2013 07:53 AM, Walrus theCat wrote:
>>>>>
>>>>>> To update, when I create the stream as above
>>>>>> (PlainTextByLineStream(ByteArrayInputStream)) I get the "Stream not
>>>>>> marked" error when attempting to cross validate (but not when just
>>>>>> evaluating on the training data).  When I, instead, create the
>>>>>> PlainTextByLineStream on a BufferedReader (see below), I get the error
>>>>>> "Model not compatible with name finder!" during training.  The result is
>>>>>> I can't cross validate, something I really need to do.
>>>>>>
>>>>>>
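>>>>>>      import java.io.{BufferedReader, ByteArrayInputStream, InputStreamReader}
>>>>>>      import java.nio.charset.Charset
>>>>>>      import opennlp.tools.namefind.NameSampleDataStream
>>>>>>      import opennlp.tools.util.PlainTextByLineStream
>>>>>>
>>>>>>      // CHARSET is a charset-name constant that is not shown in the post.
>>>>>>      def linesToStream(lines: Array[String]) = {
>>>>>>        val charset = Charset.forName(CHARSET)
>>>>>>        // Decode with the same charset that was used to encode the bytes.
>>>>>>        val reader = new BufferedReader(new InputStreamReader(
>>>>>>            new ByteArrayInputStream(lines.mkString("\n").getBytes(charset)), charset))
>>>>>>        new NameSampleDataStream(new PlainTextByLineStream(reader))
>>>>>>      }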
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 20, 2013 at 5:42 PM, Walrus theCat <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks for the reply, even though I was kind of rude.  I'm using the API.
>>>>>>>
>>>>>>> The evaluator gives me suspiciously high metrics, and the cross validator
>>>>>>> fails out as mentioned.
>>>>>>>
>>>>>>> The code is in Scala:
>>>>>>>
>>>>>>>      def linesToStream(lines:Array[String]) = {
>>>>>>>        val charset = Charset.forName(CHARSET)
>>>>>>>        new NameSampleDataStream(
>>>>>>>            new PlainTextByLineStream(
>>>>>>>                new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET)),
>>>>>>>                charset))
>>>>>>>      }
>>>>>>>
>>>>>>> I train the model with the above:
>>>>>>>          NameFinderME.train("en", entityName, linesToStream(lines),
>>>>>>>                TrainingParameters.defaultParams(),
>>>>>>>                null:Array[Byte], Collections.emptyMap[String, Object]());
>>>>>>>
>>>>>>> When it comes time to evaluate, I recreate the stream to try to
>>>>>>> circumvent these kinds of problems ("resetting" it also throws the same
>>>>>>> error):
>>>>>>>
>>>>>>>        val crossValidator = new TokenNameFinderCrossValidator("en",
>>>>>>>                entityName, TrainingParameters.defaultParams(),
>>>>>>>                null:Array[Byte], Collections.emptyMap[String, Object](),
>>>>>>>                listener)
>>>>>>>        crossValidator.evaluate(sampleStream, 10)
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 20, 2013 at 3:43 PM, William Colen <[email protected]> wrote:
>>>>>>>
>>>>>>>> Are you using the API or the command line tools? Can you send a code
>>>>>>>> snippet showing how you load the ObjectStream?
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/11/20 Walrus theCat <[email protected]>
>>>>>>>>
>>>>>>>>> I'm getting "java.io.IOException: Stream not marked" when calling
>>>>>>>>> TokenNameFinderCrossValidator.evaluate with a NameSampleDataStream.
>>>>>>>>> This works when I use a TokenNameFinderEvaluator instead.  I'm led to
>>>>>>>>> believe that .reset isn't called on the stream in the CrossValidator.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>
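
For reference, the "Stream not marked" message in the second trace quoted
above is raised by java.io.BufferedReader.reset itself, as the top frame of
that trace shows. Below is a minimal Scala sketch of that behaviour using only
the JDK. The names ResetSketch, newReader and tryReset are made up for this
illustration, and the sketch makes no claim about how PlainTextByLineStream
manages its reader internally.

```scala
import java.io.{BufferedReader, IOException, StringReader}

object ResetSketch {
  // Hypothetical helper: a fresh reader over in-memory text, one sentence per line.
  def newReader(lines: Array[String]): BufferedReader =
    new BufferedReader(new StringReader(lines.mkString("\n")))

  // Returns the exception message if reset() is rejected, or None if it succeeds.
  def tryReset(reader: BufferedReader): Option[String] =
    try { reader.reset(); None }
    catch { case e: IOException => Some(e.getMessage) }

  def main(args: Array[String]): Unit = {
    val lines = Array("first sentence .", "second sentence .")

    // Case 1: mark() was never called, so reset() throws an IOException.
    val unmarked = newReader(lines)
    unmarked.readLine()
    println("unmarked reset: " + tryReset(unmarked))

    // Case 2: mark() at the start, with a read-ahead limit larger than the data.
    val marked = newReader(lines)
    marked.mark(1 << 16)
    marked.readLine()
    println("marked reset: " + tryReset(marked))
    println("re-read after reset: " + marked.readLine())
  }
}
```

In the later message in this thread the stream is built with the FileChannel
constructor instead, and the trace reported there no longer shows the
"Stream not marked" error.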
