Guys, if somebody knows that part of the code well, it would be nice to take a look at:
1) TODO left there
2) .reset() raising the above exception if the PlainTextByLineStream is created with a stream.

Aliaksandr

On Tue, Jan 17, 2012 at 12:12 AM, [email protected] <[email protected]> wrote:

> Thank you, Aliaksandr!
>
> On Mon, Jan 16, 2012 at 6:13 PM, Aliaksandr Autayeu
> <[email protected]> wrote:
> > I have reproduced the problem. It boils down to different initialization
> > of PlainTextByLineStream. If it is instantiated by
> >
> >   public PlainTextByLineStream(Reader in) {
> >     this.in = new BufferedReader(in);
> >     this.channel = null;
> >     this.encoding = null;
> >   }
> >
> > it does not work. If it is instantiated with a channel:
> >
> >   public PlainTextByLineStream(FileChannel channel, String charsetName) {
> >     this.encoding = charsetName;
> >     this.channel = channel;
> >
> >     // TODO: Why isn't reset called here ?
> >     in = new BufferedReader(Channels.newReader(channel, encoding));
> >   }
> >
> > it does work, because later on in reset:
> >
> >   if (channel == null) {
> >     in.reset();
> >   }
> >   else {
> >     channel.position(0);
> >     in = new BufferedReader(Channels.newReader(channel, encoding));
> >   }
> >
> > reader is recreated instead of direct in.reset() call.
> >
> > Now, these differences come into play because WordTagSampleStreamFactory has
> > different PlainTextByLineStream initialization, which is probably my fault
> > due to work on factories in 402. Looks like a copy-paste error.
> >
> > I have tried to commit a fix, but I'm getting 403 error :( Please, apply
> > the attached patch.
> >
> > Aliaksandr
> >
> > On Mon, Jan 16, 2012 at 12:54 AM, [email protected]
> > <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> I am having an error in POS Tagger CrossValidator tool from the trunk.
> >> I tried the same command with a released version and it worked, also I
> >> tried Chunker CV tool and it is working too.
> >> I tried debugging the code and check the SVN history for some clue,
> >> but could not find anything.
> >> Any idea what is wrong?
> >>
> >> $ bin/opennlp POSTaggerCrossValidator -lang pt -encoding MacRoman
> >> -data pos1.txt -cutoff 50
> >>
> >> IO error while reading training data or indexing data: Stream not marked
> >>
> >> Stack trace:
> >> java.io.IOException: Stream not marked
> >>     at java.io.BufferedReader.reset(BufferedReader.java:485)
> >>     at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
> >>     at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
> >>     at opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:256)
> >>     at opennlp.tools.postag.POSTaggerCrossValidator.evaluate(POSTaggerCrossValidator.java:113)
> >>     at opennlp.tools.cmdline.postag.POSTaggerCrossValidatorTool.run(POSTaggerCrossValidatorTool.java:72)
> >>     at opennlp.tools.cmdline.CLI.main(CLI.java:212)
> >>
> >> Any idea what is wrong?
> >>
> >> Thanks,
> >> William
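
For anyone following the thread: the "Stream not marked" failure in the stack trace is plain java.io.BufferedReader behavior and can be reproduced without OpenNLP, since reset() throws unless mark() was called first. Here is a minimal standalone sketch of that behavior. The class MarkResetDemo and its two helper methods are made up for illustration and are not part of OpenNLP, and the 1024-character read-ahead limit is an arbitrary choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not OpenNLP code.
class MarkResetDemo {

    // Reads one line and then calls reset() on a reader that was never marked.
    // Returns the IOException message, or null if reset() unexpectedly succeeded.
    static String resetWithoutMark() {
        try (BufferedReader in = new BufferedReader(new StringReader("a\nb\n"))) {
            in.readLine();
            in.reset(); // no prior mark(): BufferedReader throws IOException here
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    // Marks the start of the stream before reading, so reset() can rewind to it.
    // Returns the line read after the reset.
    static String resetAfterMark() {
        try (BufferedReader in = new BufferedReader(new StringReader("a\nb\n"))) {
            in.mark(1024); // read-ahead limit in chars, arbitrary for this sketch
            in.readLine();
            in.reset();    // rewinds to the marked position
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without mark: " + resetWithoutMark());
        System.out.println("after mark:   " + resetAfterMark());
    }
}
```

This seems consistent with the analysis quoted above: the Reader-based constructor never marks the stream, while the channel-based path avoids in.reset() entirely by repositioning the channel and recreating the reader.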
