I downloaded OpenNLP hoping that I could use it to lemmatize Spanish text.
But https://opennlp.apache.org/docs/1.9.1/manual/opennlp.html#tools.lemmatizer
seems to say that I have to train a model before I can use the lemmatizer.
Although it says "The Universal Dependencies Treebank and the CoNLL 2009
datasets distribute training data for many languages.", I am having difficulty
finding one for Spanish.
I thought
https://github.com/UniversalDependencies/UD_Spanish-GSD
might be it, but the files there are in an XML format.
Can someone point me to open-source lemmatizer training data in the format
the OpenNLP UIMA Lemmatizer can use?
Thank you in advance.
--
T. "Kuro" Kurosaka, Berkeley, California, USA