I'd like to get more data, but this is all I have right now: 5,000 sentences, with 10-20 name annotations per label. Would that really cause the NullPointerException, though? And only when cross-validating?
Thanks

On Mon, Nov 25, 2013 at 2:22 PM, Jörn Kottmann <[email protected]> wrote:

> You probably don't have enough training data, OpenNLP doesn't really
> like that and can fail in some unpleasant ways.
>
> How many sentences do you have? How many name annotations do you have?
>
> Jörn
>
> On 11/25/2013 11:10 PM, Walrus theCat wrote:
>
>> Hi Jörn, William, and the rest of you OpenNLPers,
>>
>> This problem is resurfacing. I found out that my input didn't meet the
>> input specified in the docs, that it should be 1 sentence per line. After
>> properly sentence-breaking my input, a very similar error is cropping up,
>> viz, that it works with a TokenNameFinderEvaluator but not with a
>> CrossValidator. I'm using the FileChannel constructor on the stream.
>>
>> I've been stepping through the source, but to no avail. The stack trace
>> is as follows:
>>
>> Exception in thread "main" java.lang.NullPointerException
>>     at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>>     at opennlp.maxent.GIS.trainModel(GIS.java:256)
>>     at opennlp.model.TrainUtil.train(TrainUtil.java:184)
>>     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:366)
>>     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:403)
>>     at opennlp.tools.namefind.TokenNameFinderCrossValidator.evaluate(TokenNameFinderCrossValidator.java:275)
>>     at walrusthecat.ml.ner.TrainNERModels$.getResults(TrainNERModels.scala:153)
>>     at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$3.apply(TrainNERModels.scala:58)
>>     at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$3.apply(TrainNERModels.scala:53)
>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>     at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:53)
>>     at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
>>
>> On Fri, Nov 22, 2013 at 2:57 AM, Jörn Kottmann <[email protected]> wrote:
>>
>>> The first exception usually indicates that you don't have enough training
>>> data, or it contains no names. Try to create more training data.
>>>
>>> The second exception indicates that the stream you are using can't be
>>> reset, and therefore doesn't work with the cross validator, we should
>>> definitely make this more clear.
>>>
>>> Jörn
>>>
>>> On 11/21/2013 06:46 PM, Walrus theCat wrote:
>>>
>>>> Jörn,
>>>>
>>>> Thanks for your interest.
>>>>
>>>> Here's the exception when I use the BufferedReader. This exception is
>>>> thrown during training. It does a couple "log likelihood" statements
>>>> first, before throwing this:
>>>>
>>>> Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
>>>>     at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:81)
>>>>     at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:106)
>>>>     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:374)
>>>>     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:403)
>>>>     at walrusthecat.ml.ner.TrainNERModels$.trainModel(TrainNERModels.scala:118)
>>>>     at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:53)
>>>>     at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:49)
>>>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>     at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:49)
>>>>     at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
>>>>
>>>> And here it is when I use the ByteArrayInputStream. This exception is
>>>> thrown when cross-validating, but not when evaluating the training data
>>>> stream:
>>>>
>>>> Exception in thread "main" java.io.IOException: Stream not marked
>>>>     at java.io.BufferedReader.reset(BufferedReader.java:505)
>>>>     at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
>>>>     at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
>>>>     at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
>>>>     at opennlp.tools.namefind.TokenNameFinderCrossValidator$NameToDocumentSampleStream.reset(TokenNameFinderCrossValidator.java:99)
>>>>     at opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:264)
>>>>     at opennlp.tools.namefind.TokenNameFinderCrossValidator.evaluate(TokenNameFinderCrossValidator.java:272)
>>>>     at walrusthecat.ml.ner.TrainNERModels$.getResults(TrainNERModels.scala:129)
>>>>     at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:55)
>>>>     at walrusthecat.ml.ner.TrainNERModels$$anonfun$main$2.apply(TrainNERModels.scala:47)
>>>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>     at walrusthecat.ml.ner.TrainNERModels$.main(TrainNERModels.scala:47)
>>>>     at walrusthecat.ml.ner.TrainNERModels.main(TrainNERModels.scala)
>>>>
>>>> On Thu, Nov 21, 2013 at 12:25 AM, Jörn Kottmann <[email protected]> wrote:
>>>>
>>>>> Please post the exception with stack trace here.
>>>>>
>>>>> Jörn
>>>>>
>>>>> On 11/21/2013 07:53 AM, Walrus theCat wrote:
>>>>>
>>>>>> To update, when I create the stream as above
>>>>>> (PlainTextByLineStream(ByteArrayInputStream)) I get the "Stream not
>>>>>> marked" error when attempting to cross validate (but not when just
>>>>>> evaluating on the training data). When I, instead, create the
>>>>>> PlainTextByLineStream on a BufferedReader (see below), I get the
>>>>>> error "Model not compatible with name finder!" during training. The
>>>>>> result is I can't cross validate, something I really need to do.
>>>>>>
>>>>>> def linesToStream(lines:Array[String]) = {
>>>>>>   val charset = Charset.forName(CHARSET)
>>>>>>   val reader = new BufferedReader(new InputStreamReader(
>>>>>>     new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET))))
>>>>>>   new NameSampleDataStream(
>>>>>>     new PlainTextByLineStream(
>>>>>>       reader))
>>>>>> }
>>>>>>
>>>>>> On Wed, Nov 20, 2013 at 5:42 PM, Walrus theCat <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks for the reply, even though I was kind of rude. I'm using the
>>>>>>> API.
>>>>>>>
>>>>>>> The evaluator gives me suspiciously high metrics, and the cross
>>>>>>> validator fails out as mentioned.
>>>>>>>
>>>>>>> The code is in Scala:
>>>>>>>
>>>>>>> def linesToStream(lines:Array[String]) = {
>>>>>>>   val charset = Charset.forName(CHARSET)
>>>>>>>   new NameSampleDataStream(
>>>>>>>     new PlainTextByLineStream(
>>>>>>>       new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET)),
>>>>>>>       charset))
>>>>>>> }
>>>>>>>
>>>>>>> I train the model with the above:
>>>>>>>
>>>>>>> NameFinderME.train("en", entityName, linesToStream(lines),
>>>>>>>   TrainingParameters.defaultParams(),
>>>>>>>   null:Array[Byte], Collections.emptyMap[String, Object]());
>>>>>>>
>>>>>>> When it comes time to evaluate, I recreate the stream to try to
>>>>>>> circumvent these kinds of problems ("resetting" it also throws the
>>>>>>> same error):
>>>>>>>
>>>>>>> val crossValidator = new TokenNameFinderCrossValidator("en",
>>>>>>>   entityName, TrainingParameters.defaultParams(),
>>>>>>>   null:Array[Byte], Collections.emptyMap[String, Object](),
>>>>>>>   listener)
>>>>>>> crossValidator.evaluate(sampleStream, 10)
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Wed, Nov 20, 2013 at 3:43 PM, William Colen <[email protected]> wrote:
>>>>>>>
>>>>>>>> Are you using the API or the command line tools? Can you send a
>>>>>>>> code snippet showing how do you load the ObjectStream?
>>>>>>>>
>>>>>>>> 2013/11/20 Walrus theCat <[email protected]>
>>>>>>>>
>>>>>>>>> I'm getting "java.io.IOException: Stream not marked" when calling
>>>>>>>>> TokenNameFinderCrossValidator.evaluate with a NameSampleDataStream.
>>>>>>>>> This works when I use a TokenNameFinderEvaluator instead. I'm led
>>>>>>>>> to believe that .reset isn't called on the stream in the
>>>>>>>>> CrossValidator.
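
For reference, the "Stream not marked" message in the older trace comes straight from java.io.BufferedReader.reset(), which fails whenever mark() was never called on that reader. Below is a minimal JDK-only sketch in Scala of that behaviour. The object and method names are made up for illustration, and it does not touch OpenNLP at all, so it says nothing about whether marking a reader yourself would help with PlainTextByLineStream (that depends on how the class wraps the reader internally, which I have not verified here).

```scala
import java.io.{BufferedReader, IOException, StringReader}

// Hypothetical demo object; the names are illustrative and not part of OpenNLP.
object StreamResetDemo {

  // Reads one line, then calls reset() without a prior mark().
  // Returns the exception message if reset() fails, or None if it succeeds.
  def resetWithoutMark(text: String): Option[String] = {
    val reader = new BufferedReader(new StringReader(text))
    reader.readLine()
    try {
      reader.reset()
      None
    } catch {
      case e: IOException => Some(e.getMessage)
    }
  }

  // Marks the start of the stream, reads one line, resets to the mark,
  // and returns the first line read after the reset.
  def firstLineAfterMarkedReset(text: String): String = {
    val reader = new BufferedReader(new StringReader(text))
    // The read-ahead limit must cover everything read before reset().
    reader.mark(text.length + 1)
    reader.readLine()
    reader.reset()
    reader.readLine()
  }

  def main(args: Array[String]): Unit = {
    val text = "first sentence\nsecond sentence\n"
    println(resetWithoutMark(text))
    println(firstLineAfterMarkedReset(text))
  }
}
```

The first call reports the same IOException message as in the trace; the second rewinds to the mark and reads the first line again. A cross validator has to re-read the samples for each fold, which is why a stream that cannot be reset fails there but not in a single-pass evaluator.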
