Where is the Spanish data and what is the source?

On Tue, May 17, 2011 at 3:00 AM, Jörn Kottmann <[email protected]> wrote:
> Hello Jason,
>
> I do not have the training data in the correct format and I never
> took the time to convert it. Another way to solve it would be to
> wrap the old models in our new model package.
>
> The sentence detector and tokenizer can now also be trained on the
> CoNLL data. Should we do that instead?
>
> To train the tokenizer we need a detokenizer dictionary.
>
> Jörn
>
> On 5/13/11 10:33 PM, Jason Baldridge wrote:
>
>> It seems as though the Spanish models for tokenization and sentence
>> splitting are no longer around, e.g. the models download page only
>> has NER models:
>>
>> http://opennlp.sourceforge.net/models-1.5/
>>
>> But there were models before:
>>
>> http://opennlp.sourceforge.net/models-1.3/spanish/
>>
>> Anyone know what happened to them? Sorry if I'm forgetting something...
>>
>> Jason

--
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
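
(For reference, a minimal sketch of what tokenizer training looks like
against the OpenNLP 1.5 API this thread refers to. The file names
es-token.train and es-token.bin are placeholders, and the training file
is assumed to already be in OpenNLP's <SPLIT>-annotated format; getting
there from the CoNLL data would additionally require the detokenizer
dictionary Jörn mentions.)

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.Charset;

    import opennlp.tools.tokenize.TokenSample;
    import opennlp.tools.tokenize.TokenSampleStream;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;

    public class TrainSpanishTokenizer {

        public static void main(String[] args) throws Exception {
            // Training data: one sentence per line, with token
            // boundaries that carry no whitespace marked by <SPLIT>,
            // the format TokenSampleStream parses.
            ObjectStream<String> lines = new PlainTextByLineStream(
                    new InputStreamReader(
                            new FileInputStream("es-token.train"),
                            Charset.forName("UTF-8")));
            ObjectStream<TokenSample> samples =
                    new TokenSampleStream(lines);

            // Train with the alphanumeric optimization enabled.
            TokenizerModel model = TokenizerME.train("es", samples, true);

            // Serialize into the new (1.5) model package format.
            FileOutputStream out = new FileOutputStream("es-token.bin");
            try {
                model.serialize(out);
            } finally {
                out.close();
            }
        }
    }

Training the sentence detector would be analogous, going through
SentenceDetectorME.train with an ObjectStream of SentenceSample objects.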
