Hello,

It looks like the GitHub repo has files in CoNLL-U format (.conllu), which OpenNLP can read.
For example: es_gsd-ud-dev.conllu

Daniel

> On Jul 8, 2019, at 9:04 PM, T. Kuro Kurosaka <k...@bhlab.com> wrote:
>
> I downloaded OpenNLP hoping that I could use it to lemmatize Spanish text.
>
> But https://opennlp.apache.org/docs/1.9.1/manual/opennlp.html#tools.lemmatizer
> seems to be saying I have to train a model first to use the lemmatizer.
>
> Although it says "The Universal Dependencies Treebank and the CoNLL 2009
> datasets distribute training data for many languages.", I am having
> difficulty finding one.
>
> I thought
> https://github.com/UniversalDependencies/UD_Spanish-GSD
> may be it, but the files there are in an XML format.
>
> Can someone point me to an open-source lemmatizer training data in the format
> openNLP UIMA Lemmatizer can use ?
>
> Thank you in advance.
>
> --
> T. "Kuro" Kurosaka, Berkeley, California, USA