On 09/17/2013 09:53 AM, Giorgio Valoti wrote:
<http://www.corpusitaliano.it/en/index.html> The whole corpus is well over 9GB. It's not my plan to analyze the whole thing, of course! Do you think it would be realistic to use the evaluation tool to decide on a reasonable size for the corpus? I'm not an expert, but I guess there's no point in analyzing that much data if you can achieve good enough accuracy with a much smaller sample, right?
The model's performance depends on the quality of your training data. The description says that part of the corpus has manually corrected annotations. I would suggest training only on those parts if possible, because the other parts are probably less accurate.
Depending on how the model performs on your data, you could also annotate some of your own documents and add them to the training data; this usually helps a lot.
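
To answer the size question: one way is to train on increasing slices of the corrected data and evaluate each model on the same held-out portion; once accuracy stops improving, you have found a reasonable size. Below is a rough, untested sketch of that idea, assuming you are training a POS tagger with a recent OpenNLP Java API (swap in the corresponding classes for other components). The file names, word_TAG data format, and slice sizes are placeholders you would need to adapt.

import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import opennlp.tools.postag.*;
import opennlp.tools.util.*;

public class LearningCurve {

    // Load all word_TAG sentences (one sentence per line) into memory
    // so we can train on slices of different sizes.
    static List<POSSample> read(String path) throws Exception {
        ObjectStream<String> lines = new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File(path)),
                StandardCharsets.UTF_8);
        ObjectStream<POSSample> samples = new WordTagSampleStream(lines);
        List<POSSample> all = new ArrayList<>();
        POSSample s;
        while ((s = samples.read()) != null) {
            all.add(s);
        }
        samples.close();
        return all;
    }

    public static void main(String[] args) throws Exception {
        List<POSSample> train = read("it-train.pos"); // manually corrected part
        List<POSSample> test = read("it-test.pos");   // held out, never trained on

        // Train on growing slices and watch where accuracy flattens out.
        for (int n : new int[] {1000, 5000, 10000, 50000}) {
            POSModel model = POSTaggerME.train("it",
                    new CollectionObjectStream<>(
                            train.subList(0, Math.min(n, train.size()))),
                    TrainingParameters.defaultParams(), new POSTaggerFactory());

            POSEvaluator evaluator = new POSEvaluator(new POSTaggerME(model));
            evaluator.evaluate(new CollectionObjectStream<>(test));
            System.out.printf("%d sentences -> accuracy %.4f%n",
                    n, evaluator.getWordAccuracy());
        }
    }
}

Keep the test set fixed and separate from the training slices, otherwise the numbers will look better than they really are.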
Jörn
