On 03/09/13 17:25, Danica Damljanovic wrote:
> I was trying to find the original OpenNLP corpora used for training, but
> could not get anything apart from the binary model...
> Does anyone have any idea whether it is possible to get them, and how?
If I'm not mistaken, the original corpora cannot be redistributed due to
licensing issues. However, don't take my word for it; someone with the
appropriate authority (someone from the dev team) should answer this.
Also, if I remember correctly, you can get a pretty decent
sentence-detecting model with fewer than 100 sentences, whereas for the
rest of the components (Tokenizer, POSTagger, NER, etc.) you need
thousands of sentences!
Jim