Hello,
It looks like the files in that GitHub repo are in CoNLL-U format (not XML),
which OpenNLP can read. For example:

   es_gsd-ud-dev.conllu
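
In case it helps: a CoNLL-U file is plain text. Each token gets its own line
with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD,
DEPREL, DEPS, MISC). Lines starting with # are comments, and a blank line
separates sentences. Below is a rough Python sketch showing where the lemmas
live. It is not part of OpenNLP; the function names are my own, and the two
sentences in SAMPLE are made up for illustration. It pulls out the word, POS
tag and lemma for each token and writes them as word<TAB>tag<TAB>lemma lines
with a blank line between sentences. That is my reading of the lemmatizer
training format in the 1.9.1 manual, so please double-check it there.

```python
from typing import Iterable, List, Tuple

# One token as a (form, tag, lemma) triple, in word/tag/lemma column order.
Token = Tuple[str, str, str]


def read_conllu(lines: Iterable[str]) -> List[List[Token]]:
    """Group CoNLL-U token lines into sentences of (FORM, UPOS, LEMMA)."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            # Skip multiword-token ranges ("3-4 del") and empty nodes ("5.1");
            # the syntactic words listed after a range carry the lemmas.
            continue
        form, lemma, upos = cols[1], cols[2], cols[3]
        current.append((form, upos, lemma))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


def to_word_tag_lemma(sentences: List[List[Token]]) -> str:
    """Render word<TAB>tag<TAB>lemma lines, one blank line between sentences."""
    blocks = ["\n".join("\t".join(tok) for tok in sent) for sent in sentences]
    return "\n\n".join(blocks) + "\n"


# Made-up two-sentence sample; the second contains the contraction "del".
SAMPLE = (
    "# text = Los gatos duermen.\n"
    "1\tLos\tel\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tgatos\tgato\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tduermen\tdormir\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
    "# text = Vive cerca del río.\n"
    "1\tVive\tvivir\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tcerca\tcerca\tADV\t_\t_\t1\tadvmod\t_\t_\n"
    "3-4\tdel\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "3\tde\tde\tADP\t_\t_\t5\tcase\t_\t_\n"
    "4\tel\tel\tDET\t_\t_\t5\tdet\t_\t_\n"
    "5\trío\trío\tNOUN\t_\t_\t2\tobl\t_\tSpaceAfter=No\n"
    "6\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    print(to_word_tag_lemma(read_conllu(SAMPLE.splitlines())), end="")
```

For a real file, something like
read_conllu(open("es_gsd-ud-dev.conllu", encoding="utf-8")) should work.
I used UPOS as the tag column. Since the OpenNLP lemmatizer takes POS tags
as input alongside the tokens, the tag column should match whatever tagset
your POS tagger produces, so you may want XPOS instead.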
Daniel

> On Jul 8, 2019, at 9:04 PM, T. Kuro Kurosaka <k...@bhlab.com> wrote:
> 
> I downloaded OpenNLP hoping that I could use it to lemmatize Spanish text.
> 
> But https://opennlp.apache.org/docs/1.9.1/manual/opennlp.html#tools.lemmatizer
> seems to be saying I have to train a model first to use the lemmatizer.
> 
> Although it says "The Universal Dependencies Treebank and the CoNLL 2009 
> datasets distribute training data for many languages.", I am having 
> difficulty finding one.
> 
> I thought
> https://github.com/UniversalDependencies/UD_Spanish-GSD
> may be it, but the files there are in an XML format.
> 
> Can someone point me to open-source lemmatizer training data in the format
> the OpenNLP UIMA Lemmatizer can use?
> 
> Thank you in advance.
> 
> -- 
> T. "Kuro" Kurosaka, Berkeley, California, USA
> 
