I downloaded OpenNLP hoping that I could use it to lemmatize Spanish text.

But the manual at
https://opennlp.apache.org/docs/1.9.1/manual/opennlp.html#tools.lemmatizer
seems to say that I have to train a model before I can use the lemmatizer.

Although it says "The Universal Dependencies Treebank and the CoNLL 2009 datasets distribute training data for many languages", I am having difficulty finding such a dataset.

I thought
https://github.com/UniversalDependencies/UD_Spanish-GSD
might be it, but the files there are in an XML format.

Can someone point me to open-source lemmatizer training data in a format that the OpenNLP UIMA Lemmatizer can use?

Thank you in advance.

--
T. "Kuro" Kurosaka, Berkeley, California, USA
