What is the licensing on that? Do we have permission from the corpus owners to distribute trained models?
On Tue, May 17, 2011 at 7:52 AM, Jörn Kottmann <[email protected]> wrote:

> It was trained on Cast3LB.
>
> Jörn
>
>
> On 5/17/11 2:31 PM, Jason Baldridge wrote:
>
>> Where is the Spanish data and what is the source?
>>
>> On Tue, May 17, 2011 at 3:00 AM, Jörn Kottmann <[email protected]> wrote:
>>
>>> Hello Jason,
>>>
>>> I do not have the training data in the correct format and I
>>> never took time to convert it.
>>> Another way to solve it would be to wrap the old models in our
>>> new model package.
>>>
>>> The sentence detector and tokenizer can now also be trained on
>>> the conll data. Should we do that instead?
>>>
>>> To train the tokenizer we need a detokenizer dictionary.
>>>
>>> Jörn
>>>
>>>
>>> On 5/13/11 10:33 PM, Jason Baldridge wrote:
>>>
>>>> It seems as though the Spanish models for tokenization and sentence
>>>> splitting are no longer around, e.g. the models download page only has
>>>> NER models:
>>>>
>>>> http://opennlp.sourceforge.net/models-1.5/
>>>>
>>>> But there were models before:
>>>>
>>>> http://opennlp.sourceforge.net/models-1.3/spanish/
>>>>
>>>> Anyone know what happened to them? Sorry if I'm forgetting something...
>>>>
>>>> Jason

--
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
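For readers following the thread: retraining the sentence detector from the CoNLL data, as Jörn suggests above, goes through the OpenNLP 1.5 training API. Below is a minimal sketch, assuming the Spanish sentences have already been converted to OpenNLP's plain one-sentence-per-line training format; the class name and the file names es-sent.train and es-sent.bin are placeholders.

    import java.io.BufferedOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.charset.Charset;

    import opennlp.tools.sentdetect.SentenceDetectorME;
    import opennlp.tools.sentdetect.SentenceModel;
    import opennlp.tools.sentdetect.SentenceSample;
    import opennlp.tools.sentdetect.SentenceSampleStream;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    public class TrainSpanishSentenceModel {

        public static void main(String[] args) throws IOException {
            Charset charset = Charset.forName("UTF-8");

            // Training data: one sentence per line, an empty line
            // marking a document boundary.
            ObjectStream<String> lineStream =
                new PlainTextByLineStream(
                    new FileInputStream("es-sent.train"), charset);
            ObjectStream<SentenceSample> sampleStream =
                new SentenceSampleStream(lineStream);

            SentenceModel model;
            try {
                // useTokenEnd = true, no abbreviation dictionary.
                model = SentenceDetectorME.train("es", sampleStream, true,
                    null, TrainingParameters.defaultParams());
            } finally {
                sampleStream.close();
            }

            // Serialize in the new 1.5 model package format.
            OutputStream modelOut =
                new BufferedOutputStream(new FileOutputStream("es-sent.bin"));
            try {
                model.serialize(modelOut);
            } finally {
                modelOut.close();
            }
        }
    }

The tokenizer trains the same way through TokenizerME.train on <SPLIT>-annotated samples. Since the CoNLL data is already tokenized, the detokenizer dictionary Jörn mentions is what would turn it back into raw text with split annotations for that training format.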
