Thanks William,

I'll give it a shot.  I really need to be able to work with String[]s, as
concatenating them all into a new file and reading it back is not that
scalable.  I'll let you know how it works out.
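
For reference, a minimal sketch of the String[] idea: serve the lines
directly from memory, so that reset() just rewinds an index and never
relies on mark().  It assumes the "Stream not marked" error comes from a
java.io.BufferedReader being reset without a prior mark(), which is how
that class behaves, although that has not been confirmed as the cause
inside the cross validator.  LineStream and ArrayLineStream are
hypothetical names; the trait only mirrors the read/reset/close shape of
opennlp.tools.util.ObjectStream[String] so that the snippet runs without
OpenNLP on the classpath.

```scala
// Hypothetical stand-in for the read/reset/close shape of OpenNLP's
// ObjectStream[String]; in real code, implement that interface instead.
trait LineStream {
  def read(): String   // next line, or null once the stream is exhausted
  def reset(): Unit    // rewind to the first line
  def close(): Unit
}

// Serves lines straight from an in-memory array, so reset() never
// depends on mark() having been called on an underlying reader.
class ArrayLineStream(lines: Array[String]) extends LineStream {
  private var pos = 0

  def read(): String =
    if (pos < lines.length) {
      val line = lines(pos)
      pos += 1
      line
    } else null

  def reset(): Unit = { pos = 0 }

  def close(): Unit = ()  // nothing to release for an in-memory array
}
```

With OpenNLP available, the same class would implement
ObjectStream[String] instead of the stand-in trait and could then be
wrapped in a NameSampleDataStream.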


On Thu, Nov 21, 2013 at 5:12 AM, William Colen <[email protected]> wrote:

> Can you try using this other constructor?
>  PlainTextByLineStream(FileChannel channel, Charset encoding)
>
> I don't know if it is related, but internally we don't use the one that
> takes an InputStream.
>
> Let me know what happens.
>
>
> Thank you,
>
> William
>
>
> 2013/11/21 Jörn Kottmann <[email protected]>
>
> > Please post the exception with stack trace here.
> >
> > Jörn
> >
> >
> >
> > On 11/21/2013 07:53 AM, Walrus theCat wrote:
> >
> > >> To update, when I create the stream as above
> > >> (PlainTextByLineStream(ByteArrayInputStream)) I get the "Stream not
> > >> marked" error when attempting to cross validate (but not when just
> > >> evaluating on the training data).  When I, instead, create the
> > >> PlainTextByLineStream on a BufferedReader (see below), I get the
> > >> error "Model not compatible with name finder!" during training.  The
> > >> result is I can't cross validate, something I really need to do.
> >>
> >>
> > >>    def linesToStream(lines:Array[String]) = {
> > >>      val charset = Charset.forName(CHARSET)
> > >>      // Decode with the same charset that was used to encode the bytes.
> > >>      val reader = new BufferedReader(new InputStreamReader(
> > >>          new ByteArrayInputStream(lines.mkString("\n").getBytes(charset)),
> > >>          charset))
> > >>      new NameSampleDataStream(
> > >>          new PlainTextByLineStream(
> > >>              reader))
> > >>    }
> >>
> >>
> > >> On Wed, Nov 20, 2013 at 5:42 PM, Walrus theCat <[email protected]> wrote:
> >>
> >>  Thanks for the reply, even though I was kind of rude.  I'm using the
> API.
> >>> The evaluator gives me suspiciously high metrics, and the cross
> validator
> >>> fails out as mentioned.
> >>>
> >>> The code is in Scala:
> >>>
> > >>>    def linesToStream(lines:Array[String]) = {
> > >>>      val charset = Charset.forName(CHARSET)
> > >>>      new NameSampleDataStream(
> > >>>          new PlainTextByLineStream(
> > >>>              new ByteArrayInputStream(
> > >>>                  lines.mkString("\n").getBytes(CHARSET)),
> > >>>              charset))
> > >>>    }
> >>>
> >>> I train the model with the above:
> > >>>        NameFinderME.train("en", entityName, linesToStream(lines),
> > >>>              TrainingParameters.defaultParams(),
> > >>>              null:Array[Byte], Collections.emptyMap[String, Object]());
> >>>
> > >>> When it comes time to evaluate, I recreate the stream to try to
> > >>> circumvent these kinds of problems ("resetting" it also throws the
> > >>> same error):
> >>>
> > >>>      val crossValidator = new TokenNameFinderCrossValidator("en",
> > >>>              entityName, TrainingParameters.defaultParams(),
> > >>>              null:Array[Byte], Collections.emptyMap[String, Object](),
> > >>>              listener)
> > >>>      crossValidator.evaluate(sampleStream, 10)
> >>>
> >>> Thanks
> >>>
> >>>
> >>>
> > >>> On Wed, Nov 20, 2013 at 3:43 PM, William Colen <[email protected]> wrote:
> >>>
> >>>  Are you using the API or the command line tools? Can you send a code
> >>>> snippet showing how do you load the ObjectStream?
> >>>>
> >>>>
> >>>> 2013/11/20 Walrus theCat <[email protected]>
> >>>>
> >>>>  I'm getting  "java.io.IOException: Stream not marked" when calling
> >>>>> TokenNameFinderCrossValidator.evaluate with a NameSampleDataStream.
> >>>>>
> >>>>   This
> >>>>
> >>>>> works when I use a TokenNameFinderEvaluator instead.  I'm led to
> >>>>> believe
> >>>>> that .reset isn't called on the stream in the CrossValidator.
> >>>>>
> >>>>>
> >>>
> >
>
