Thanks William, I'll give it a shot. I really need to be able to work with String[]s, as concatenating them all into a new file and reading it back is not that scalable. I'll let you know how it works out.
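
For reference, the "Stream not marked" message looks like it comes from the JDK itself rather than from OpenNLP: as far as I can tell, java.io.BufferedReader.reset() throws an IOException with exactly that text when mark() was never called on the reader. Below is a minimal sketch of that behaviour using only java.io. The object and method names (StreamNotMarkedDemo, resetWithoutMark, resetAfterMark) are made up for illustration, and it is only an assumption that the cross validator hits the same path.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, IOException, InputStreamReader}
import java.nio.charset.StandardCharsets

// Hypothetical demo object; nothing here is part of OpenNLP.
object StreamNotMarkedDemo {
  private val lines = Array("first line", "second line")
  private val bytes = lines.mkString("\n").getBytes(StandardCharsets.UTF_8)

  // Build a fresh reader over the in-memory lines, as in linesToStream.
  private def newReader(): BufferedReader =
    new BufferedReader(
      new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))

  // Calls reset() on a reader that was never marked and returns the error message, if any.
  def resetWithoutMark(): Option[String] = {
    val reader = newReader()
    reader.readLine()
    try {
      reader.reset()
      None
    } catch {
      case e: IOException => Some(e.getMessage)
    }
  }

  // Marks the start first, so reset() rewinds and the first line can be read again.
  def resetAfterMark(): String = {
    val reader = newReader()
    reader.mark(bytes.length + 1) // read-ahead limit large enough for the whole input
    reader.readLine()
    reader.reset()
    reader.readLine()
  }

  def main(args: Array[String]): Unit = {
    println(resetWithoutMark())
    println(resetAfterMark())
  }
}
```

If that is what is going on, any stream that has to be re-read for each fold needs either a mark at the start or a source that can rewind on its own.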

On Thu, Nov 21, 2013 at 5:12 AM, William Colen <[email protected]> wrote:

> Can you try using this other constructor?
> PlainTextByLineStream(FileChannel channel, Charset encoding)
>
> I don't know if it is related, but internally we don't use the one that
> takes a InputStream.
>
> Let me know what happens.
>
> Thank you,
> William
>
> 2013/11/21 Jörn Kottmann <[email protected]>
>
> > Please post the exception with stack trace here.
> >
> > Jörn
> >
> > On 11/21/2013 07:53 AM, Walrus theCat wrote:
> >
> >> To update, when I create the stream as above
> >> (PlainTextByLineStream(ByteArrayInputStream)) I get the "Stream not
> >> marked" error when attempting to cross validate (but not when just
> >> evaluating on the training data). When I, instead, create the
> >> PlainTextByLineStream on a BufferedReader (see below), I get the error
> >> " Model not compatible with name finder!" during training. The result
> >> is I can't cross validate, something I really need to do.
> >>
> >> def linesToStream(lines:Array[String]) = {
> >>   val charset = Charset.forName(CHARSET)
> >>   val reader = new BufferedReader(new InputStreamReader(new
> >>     ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET))))
> >>   new NameSampleDataStream(
> >>     new PlainTextByLineStream(
> >>       reader))
> >> }
> >>
> >> On Wed, Nov 20, 2013 at 5:42 PM, Walrus theCat <[email protected]> wrote:
> >>
> >>> Thanks for the reply, even though I was kind of rude. I'm using the
> >>> API. The evaluator gives me suspiciously high metrics, and the cross
> >>> validator fails out as mentioned.
> >>>
> >>> The code is in Scala:
> >>>
> >>> def linesToStream(lines:Array[String]) = {
> >>>   val charset = Charset.forName(CHARSET)
> >>>   new NameSampleDataStream(
> >>>     new PlainTextByLineStream(
> >>>       new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET)), charset))
> >>> }
> >>>
> >>> I train the model with the above:
> >>>
> >>> NameFinderME.train("en", entityName, linesToStream(lines),
> >>>   TrainingParameters.defaultParams(),
> >>>   null:Array[Byte], Collections.emptyMap[String, Object]());
> >>>
> >>> When it comes time to evaluate, I recreate the stream to try to
> >>> circumvent these kinds of problems ("resetting" it also throws the
> >>> same error):
> >>>
> >>> val crossValidator = new TokenNameFinderCrossValidator("en",
> >>>   entityName, TrainingParameters.defaultParams(),
> >>>   null:Array[Byte], Collections.emptyMap[String, Object](),
> >>>   listener)
> >>> crossValidator.evaluate(sampleStream, 10)
> >>>
> >>> Thanks
> >>>
> >>> On Wed, Nov 20, 2013 at 3:43 PM, William Colen <[email protected]> wrote:
> >>>
> >>>> Are you using the API or the command line tools? Can you send a code
> >>>> snippet showing how do you load the ObjectStream?
> >>>>
> >>>> 2013/11/20 Walrus theCat <[email protected]>
> >>>>
> >>>>> I'm getting "java.io.IOException: Stream not marked" when calling
> >>>>> TokenNameFinderCrossValidator.evaluate with a NameSampleDataStream.
> >>>>> This works when I use a TokenNameFinderEvaluator instead. I'm led to
> >>>>> believe that .reset isn't called on the stream in the CrossValidator.
