Guys, if somebody knows that part of the code well, it would be nice to
take a look at two things:

1) the TODO left in the FileChannel constructor
2) .reset() raising the exception below when the PlainTextByLineStream is
created with a Reader rather than a channel.
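
For anyone who wants to see the two code paths side by side without OpenNLP on
the classpath, here is a minimal standalone sketch. It only mirrors the two
branches of reset() quoted further down; the class name ResetDemo, the method
names and the temp file are made up for illustration and are not part of
OpenNLP. It is a reproduction under those assumptions, not a proposed fix.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not OpenNLP code.
public class ResetDemo {

  // Reader path: reset() is called on a BufferedReader that was never marked,
  // which is what happens when channel == null.
  static String resetUnmarkedReader() {
    BufferedReader in = new BufferedReader(new StringReader("a\nb\n"));
    try {
      in.readLine();
      in.reset(); // no prior mark(), so this throws
      return "no exception";
    } catch (IOException e) {
      return e.getMessage();
    }
  }

  // Channel path: rewind the channel and build a fresh reader on top of it.
  static String rereadViaChannel() throws IOException {
    Path tmp = Files.createTempFile("reset-demo", ".txt");
    try {
      Files.write(tmp, "a\nb\n".getBytes(StandardCharsets.UTF_8));
      try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
        BufferedReader in = new BufferedReader(Channels.newReader(channel, "UTF-8"));
        in.readLine(); // consume the first line
        channel.position(0); // rewind the underlying channel
        in = new BufferedReader(Channels.newReader(channel, "UTF-8"));
        return in.readLine(); // first line again
      }
    } finally {
      Files.deleteIfExists(tmp);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(resetUnmarkedReader()); // prints "Stream not marked"
    System.out.println(rereadViaChannel());    // prints "a"
  }
}
```

The first method fails the same way as the stack trace in William's report,
while the second one rereads the data from the start because the reader is
rebuilt after the channel is rewound.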

Aliaksandr

On Tue, Jan 17, 2012 at 12:12 AM, [email protected] <
[email protected]> wrote:

> Thank you, Aliaksandr!
>
>
>
> On Mon, Jan 16, 2012 at 6:13 PM, Aliaksandr Autayeu
> <[email protected]> wrote:
> > I have reproduced the problem. It boils down to how PlainTextByLineStream
> > is initialized. If it is instantiated with a Reader:
> >
> >   public PlainTextByLineStream(Reader in) {
> >     this.in = new BufferedReader(in);
> >     this.channel = null;
> >     this.encoding = null;
> >   }
> >
> > reset() does not work. If it is instantiated with a channel:
> >
> >   public PlainTextByLineStream(FileChannel channel, String charsetName) {
> >     this.encoding = charsetName;
> >     this.channel = channel;
> >
> >     // TODO: Why isn't reset called here ?
> >     in = new BufferedReader(Channels.newReader(channel, encoding));
> >   }
> >
> > it does work, because later on in reset:
> >
> >     if (channel == null) {
> >       in.reset();
> >     } else {
> >       channel.position(0);
> >       in = new BufferedReader(Channels.newReader(channel, encoding));
> >     }
> >
> > the reader is recreated instead of calling in.reset() directly.
> >
> >
> > Now, these differences come into play because WordTagSampleStreamFactory
> > initializes PlainTextByLineStream differently, which is probably my fault
> > due to my work on factories in 402. Looks like a copy-paste error.
> >
> > I have tried to commit a fix, but I'm getting a 403 error :(  Please apply
> > the attached patch.
> >
> > Aliaksandr
> >
> >
> > On Mon, Jan 16, 2012 at 12:54 AM, [email protected]
> > <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> I am getting an error in the POS Tagger CrossValidator tool from trunk.
> >> I tried the same command with a released version and it worked. I also
> >> tried the Chunker CV tool and it works too.
> >> I tried debugging the code and checking the SVN history for some clue,
> >> but could not find anything. Any idea what is wrong?
> >>
> >> $ bin/opennlp POSTaggerCrossValidator -lang pt -encoding MacRoman -data pos1.txt -cutoff 50
> >>
> >> IO error while reading training data or indexing data: Stream not marked
> >>
> >> Stack trace:
> >> java.io.IOException: Stream not marked
> >>        at java.io.BufferedReader.reset(BufferedReader.java:485)
> >>        at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
> >>        at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
> >>        at opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:256)
> >>        at opennlp.tools.postag.POSTaggerCrossValidator.evaluate(POSTaggerCrossValidator.java:113)
> >>        at opennlp.tools.cmdline.postag.POSTaggerCrossValidatorTool.run(POSTaggerCrossValidatorTool.java:72)
> >>        at opennlp.tools.cmdline.CLI.main(CLI.java:212)
> >>
> >>
> >> Any idea what is wrong?
> >>
> >> Thanks,
> >> William
> >
> >
>
