Ok, to conclude, the *only* way I could get the CrossValidator to work is to construct the PlainTextByLineStream with a FileChannel (with UTF-8 encoding). I could only do this by printing my String[] to a temporary file. This has solved my problem, but is obviously not optimal. Thanks for the suggestion, William.
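
For anyone who finds this thread later, here is a rough sketch of the temp-file workaround described above. It is not my exact code: `TempFileChannel`, `linesToChannel` and the "ner-samples" file prefix are made-up names for illustration, and it assumes Java 7+ for `java.nio.file`. It only covers the part that turns the String[] into a FileChannel.

```scala
import java.nio.channels.FileChannel
import java.nio.charset.{Charset, StandardCharsets}
import java.nio.file.{Files, Path, StandardOpenOption}

object TempFileChannel {
  // Hypothetical helper (not part of OpenNLP): write the lines to a
  // temporary file, one sample per line, and open a read-only
  // FileChannel on that file.
  def linesToChannel(lines: Array[String],
                     charset: Charset = StandardCharsets.UTF_8): FileChannel = {
    val tmp: Path = Files.createTempFile("ner-samples", ".txt")
    tmp.toFile.deleteOnExit() // best-effort cleanup when the JVM exits
    Files.write(tmp, lines.mkString("\n").getBytes(charset))
    FileChannel.open(tmp, StandardOpenOption.READ)
  }
}
```

The returned channel is what I pass to the `PlainTextByLineStream(FileChannel channel, Charset encoding)` constructor William suggested, and that stream is then wrapped in a `NameSampleDataStream` as in the `linesToStream` function quoted below. The caller should close the channel when done.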
On Thu, Nov 21, 2013 at 11:34 AM, Walrus theCat <[email protected]> wrote:

> William,
>
> Using the FileChannel constructor seems to have done away with that
> error. I'll post back if anything goes awry, but it's nice to have at
> least worked past that.
>
> Thanks
>
>
> On Thu, Nov 21, 2013 at 9:36 AM, Walrus theCat <[email protected]> wrote:
>
>> Thanks William,
>>
>> I'll give it a shot. I really need to be able to work with String[]s, as
>> concatenating them all into a new file and reading it back is not that
>> scalable. I'll let you know how it works out.
>>
>>
>> On Thu, Nov 21, 2013 at 5:12 AM, William Colen <[email protected]> wrote:
>>
>>> Can you try using this other constructor?
>>> PlainTextByLineStream(FileChannel channel, Charset encoding)
>>>
>>> I don't know if it is related, but internally we don't use the one that
>>> takes a InputStream.
>>>
>>> Let me know what happens.
>>>
>>>
>>> Thank you,
>>>
>>> William
>>>
>>>
>>> 2013/11/21 Jörn Kottmann <[email protected]>
>>>
>>> > Please post the exception with stack trace here.
>>> >
>>> > Jörn
>>> >
>>> >
>>> > On 11/21/2013 07:53 AM, Walrus theCat wrote:
>>> >
>>> >> To update, when I create the stream as above
>>> >> (PlainTextByLineStream(ByteArrayInputStream)) I get the "Stream not
>>> >> marked" error when attempting to cross validate (but not when just
>>> >> evaluating on the training data). When I, instead, create the
>>> >> PlainTextByLineStream on a BufferedReader (see below), I get the
>>> >> error " Model not compatible with name finder!" during training.
>>> >> The result is I can't cross validate, something I really need to do.
>>> >>
>>> >> def linesToStream(lines:Array[String]) = {
>>> >>   val charset = Charset.forName(CHARSET)
>>> >>   val reader = new BufferedReader(new InputStreamReader(
>>> >>     new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET))))
>>> >>   new NameSampleDataStream(
>>> >>     new PlainTextByLineStream(
>>> >>       reader))
>>> >> }
>>> >>
>>> >>
>>> >> On Wed, Nov 20, 2013 at 5:42 PM, Walrus theCat <[email protected]> wrote:
>>> >>
>>> >>> Thanks for the reply, even though I was kind of rude. I'm using the
>>> >>> API. The evaluator gives me suspiciously high metrics, and the cross
>>> >>> validator fails out as mentioned.
>>> >>>
>>> >>> The code is in Scala:
>>> >>>
>>> >>> def linesToStream(lines:Array[String]) = {
>>> >>>   val charset = Charset.forName(CHARSET)
>>> >>>   new NameSampleDataStream(
>>> >>>     new PlainTextByLineStream(
>>> >>>       new ByteArrayInputStream(lines.mkString("\n").getBytes(CHARSET)),
>>> >>>       charset))
>>> >>> }
>>> >>>
>>> >>> I train the model with the above:
>>> >>>
>>> >>> NameFinderME.train("en", entityName, linesToStream(lines),
>>> >>>   TrainingParameters.defaultParams(),
>>> >>>   null:Array[Byte], Collections.emptyMap[String, Object]());
>>> >>>
>>> >>> When it comes time to evaluate, I recreate the stream to try to
>>> >>> circumvent these kinds of problems ("resetting" it also throws the
>>> >>> same error):
>>> >>>
>>> >>> val crossValidator = new TokenNameFinderCrossValidator("en",
>>> >>>   entityName, TrainingParameters.defaultParams(),
>>> >>>   null:Array[Byte], Collections.emptyMap[String, Object](),
>>> >>>   listener)
>>> >>> crossValidator.evaluate(sampleStream, 10)
>>> >>>
>>> >>> Thanks
>>> >>>
>>> >>>
>>> >>> On Wed, Nov 20, 2013 at 3:43 PM, William Colen <[email protected]> wrote:
>>> >>>
>>> >>>> Are you using the API or the command line tools? Can you send a code
>>> >>>> snippet showing how do you load the ObjectStream?
>>> >>>>
>>> >>>>
>>> >>>> 2013/11/20 Walrus theCat <[email protected]>
>>> >>>>
>>> >>>>> I'm getting "java.io.IOException: Stream not marked" when calling
>>> >>>>> TokenNameFinderCrossValidator.evaluate with a NameSampleDataStream.
>>> >>>>> This works when I use a TokenNameFinderEvaluator instead. I'm led to
>>> >>>>> believe that .reset isn't called on the stream in the CrossValidator.
