OK, to conclude: the *only* way I could get the CrossValidator to work was
to construct the PlainTextByLineStream from a FileChannel (with UTF-8
encoding).  The only way I could get a FileChannel was to write my String[]
to a temporary file first.  This has solved my problem, but is obviously not
optimal.  Thanks for the suggestion, William.
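
For anyone who finds this thread later, here is roughly what the temp-file
step looks like.  This is a minimal sketch using only the JDK, not the exact
code I ran; the names `TempFileSamples` and `linesToChannel` are made up for
illustration, and the OpenNLP call is left in a comment so the snippet runs
without the library on the classpath.

```scala
import java.nio.channels.FileChannel
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}

// Hypothetical helper (not part of OpenNLP): dumps the training lines to a
// temporary UTF-8 file and opens a read-only FileChannel on it.
object TempFileSamples {
  def linesToChannel(lines: Array[String]): (Path, FileChannel) = {
    val tmp = Files.createTempFile("name-samples", ".train")
    tmp.toFile.deleteOnExit() // best-effort cleanup when the JVM exits
    Files.write(tmp, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
    (tmp, FileChannel.open(tmp, StandardOpenOption.READ))
  }
}

// Assumed usage with the constructor William suggested:
//   val (_, channel) = TempFileSamples.linesToChannel(lines)
//   new NameSampleDataStream(
//       new PlainTextByLineStream(channel, StandardCharsets.UTF_8))
```

The caller is responsible for closing the channel once cross validation has
finished.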


On Thu, Nov 21, 2013 at 11:34 AM, Walrus theCat <[email protected]> wrote:

> William,
>
> Using the FileChannel constructor seems to have done away with that
> error.  I'll post back if anything goes awry, but it's nice to have at
> least worked past that.
>
> Thanks
>
>
> On Thu, Nov 21, 2013 at 9:36 AM, Walrus theCat <[email protected]> wrote:
>
>> Thanks William,
>>
>> I'll give it a shot.  I really need to be able to work with String[]s, as
>> concatenating them all into a new file and reading it back is not that
>> scalable.  I'll let you know how it works out.
>>
>>
>> On Thu, Nov 21, 2013 at 5:12 AM, William Colen <[email protected]> wrote:
>>
>>> Can you try using this other constructor?
>>>  PlainTextByLineStream(FileChannel channel, Charset encoding)
>>>
>>> I don't know if it is related, but internally we don't use the one that
>>> takes an InputStream.
>>>
>>> Let me know what happens.
>>>
>>>
>>> Thank you,
>>>
>>> William
>>>
>>>
>>> 2013/11/21 Jörn Kottmann <[email protected]>
>>>
>>> > Please post the exception with stack trace here.
>>> >
>>> > Jörn
>>> >
>>> >
>>> >
>>> > On 11/21/2013 07:53 AM, Walrus theCat wrote:
>>> >
>>> >> To update, when I create the stream as above
>>> >> (PlainTextByLineStream(ByteArrayInputStream)) I get the "Stream not
>>> >> marked" error when attempting to cross validate (but not when just
>>> >> evaluating on the training data).  When I, instead, create the
>>> >> PlainTextByLineStream on a BufferedReader (see below), I get the error
>>> >> "Model not compatible with name finder!" during training.  The result
>>> >> is I can't cross validate, something I really need to do.
>>> >>
>>> >>
>>> >>    def linesToStream(lines:Array[String]) = {
>>> >>      val charset = Charset.forName(CHARSET)
>>> >>      val reader = new BufferedReader(new InputStreamReader(
>>> >>          new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET))))
>>> >>      new NameSampleDataStream(
>>> >>          new PlainTextByLineStream(
>>> >>              reader))
>>> >>    }
>>> >>
>>> >>
>>> >> On Wed, Nov 20, 2013 at 5:42 PM, Walrus theCat <[email protected]> wrote:
>>> >>
>>> >>> Thanks for the reply, even though I was kind of rude.  I'm using the
>>> >>> API.  The evaluator gives me suspiciously high metrics, and the cross
>>> >>> validator fails out as mentioned.
>>> >>>
>>> >>> The code is in Scala:
>>> >>>
>>> >>>    def linesToStream(lines:Array[String]) = {
>>> >>>      val charset = Charset.forName(CHARSET)
>>> >>>      new NameSampleDataStream(
>>> >>>          new PlainTextByLineStream(
>>> >>>              new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET)),
>>> >>>              charset))
>>> >>>    }
>>> >>>
>>> >>> I train the model with the above:
>>> >>>        NameFinderME.train("en", entityName, linesToStream(lines),
>>> >>>              TrainingParameters.defaultParams(),
>>> >>>              null:Array[Byte], Collections.emptyMap[String, Object]());
>>> >>>
>>> >>> When it comes time to evaluate, I recreate the stream to try to
>>> >>> circumvent these kinds of problems ("resetting" it also throws the
>>> >>> same error):
>>> >>>
>>> >>>      val crossValidator = new TokenNameFinderCrossValidator("en",
>>> >>>              entityName, TrainingParameters.defaultParams(),
>>> >>>              null:Array[Byte], Collections.emptyMap[String, Object](),
>>> >>>              listener)
>>> >>>      crossValidator.evaluate(sampleStream, 10)
>>> >>>
>>> >>> Thanks
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Wed, Nov 20, 2013 at 3:43 PM, William Colen <[email protected]> wrote:
>>> >>>
>>> >>>> Are you using the API or the command line tools? Can you send a code
>>> >>>> snippet showing how you load the ObjectStream?
>>> >>>>
>>> >>>>
>>> >>>> 2013/11/20 Walrus theCat <[email protected]>
>>> >>>>
>>> >>>>> I'm getting "java.io.IOException: Stream not marked" when calling
>>> >>>>> TokenNameFinderCrossValidator.evaluate with a NameSampleDataStream.
>>> >>>>> This works when I use a TokenNameFinderEvaluator instead.  I'm led
>>> >>>>> to believe that .reset isn't called on the stream in the
>>> >>>>> CrossValidator.
>>> >>>>>
>>> >>>>>
>>> >>>
>>> >
>>>
>>
>>
>
