Getting started with OpenNLP and POS tagger

Giorgio Valoti Mon, 16 Sep 2013 23:57:34 -0700

Hi all,
this is my first post to the list. I’ve tried to gather some info from the 
documentation and by googling around, but I haven’t found a satisfying answer 
to the following questions. Please tell me where to RTFM if any of these 
questions belong in a FAQ or are off-topic.


It seems there’s no way to train the POS tagger incrementally, nor to 
parallelize the training. Is this correct?

If the only way to train the POS tagger is in a single shot, how can I 
estimate the memory requirements for the JVM? In other words, given, say, a 
1GB training corpus, is there a way to estimate how much RAM would be needed?

Finally, I have tried to use the `-ngram` switch:
> opennlp POSTaggerTrainer.conllx -type maxent -ngram 3 ... <other options as usual: -lang -model -data -encoding>

but I get this error:
> Building ngram dictionary ... IO error while building NGram Dictionary: Stream not marked
> Stream not marked
> java.io.IOException: Stream not marked
>         at java.io.BufferedReader.reset(BufferedReader.java:485)
>         at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
>         at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
>         at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
>         at opennlp.tools.cmdline.postag.POSTaggerTrainerTool.run(POSTaggerTrainerTool.java:80)
>         at opennlp.tools.cmdline.CLI.main(CLI.java:222)


But I can’t find out what I’m doing wrong.
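
The only part I could work out is the message itself: as far as I can tell it 
comes from plain java.io rather than from OpenNLP, because 
BufferedReader.reset() fails this way when mark() was never called on the 
reader. Here is a minimal sketch that reproduces it outside OpenNLP. The class 
and method names are mine and purely illustrative; this is not the trainer's 
code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class, not part of OpenNLP.
public class StreamNotMarkedDemo {

    // Calls reset() on a reader that was never marked and returns the
    // exception message, or null if no exception was thrown.
    static String resetWithoutMark() {
        try (BufferedReader reader = new BufferedReader(new StringReader("a\nb\n"))) {
            reader.readLine();
            reader.reset(); // no prior mark(), so this throws IOException
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    // Same sequence, but mark() is called first, so reset() rewinds the
    // reader and the first line can be read again.
    static String resetWithMark() throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader("a\nb\n"))) {
            reader.mark(1024); // read-ahead limit, in characters
            reader.readLine();
            reader.reset();
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(resetWithoutMark());
        System.out.println(resetWithMark());
    }
}
```

Going by the stack trace, the trainer tool resets the input stream around the 
n-gram dictionary step and the underlying reader has no mark set, but I can't 
tell whether that comes from the way I pass the data or from the tool itself.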


Any help is really appreciated.

--
Giorgio Valoti
