Where is the Spanish data and what is the source?

On Tue, May 17, 2011 at 3:00 AM, Jörn Kottmann <[email protected]> wrote:

> Hello Jason,
>
> I do not have the training data in the correct format, and I
> never took the time to convert it.
> Another way to solve it would be to wrap the old models in our
> new model package.
>
> The sentence detector and tokenizer can now also be trained on
> the CoNLL data. Should we do that instead?
>
> To train the tokenizer we need a detokenizer dictionary.
>
> Jörn
>
>
>
> On 5/13/11 10:33 PM, Jason Baldridge wrote:
>
>> It seems as though the Spanish models for tokenization and sentence
>> splitting are no longer around, e.g. the models download page only has NER
>> models:
>>
>> http://opennlp.sourceforge.net/models-1.5/
>>
>> But there were models before:
>>
>> http://opennlp.sourceforge.net/models-1.3/spanish/
>>
>> Anyone know what happened to them? Sorry if I'm forgetting something...
>>
>> Jason
>>
>>
>
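For anyone trying this, Jörn's suggestion above (training the sentence
detector and tokenizer on the CoNLL data) would look roughly like the
sketch below with the 1.5 API. This is only a sketch: the file names are
placeholders, and it assumes the CoNLL data has already been converted
to the trainers' default input formats (one sentence per line for the
sentence detector; whitespace-separated tokens with <SPLIT> markers for
the tokenizer). That conversion, plus the Spanish detokenizer dictionary
Jörn mentions, is the part that still needs doing.

    import java.io.BufferedOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.nio.charset.Charset;

    import opennlp.tools.sentdetect.SentenceDetectorME;
    import opennlp.tools.sentdetect.SentenceModel;
    import opennlp.tools.sentdetect.SentenceSample;
    import opennlp.tools.sentdetect.SentenceSampleStream;
    import opennlp.tools.tokenize.TokenSample;
    import opennlp.tools.tokenize.TokenSampleStream;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;

    public class TrainSpanishModels {

        public static void main(String[] args) throws Exception {
            Charset utf8 = Charset.forName("UTF-8");

            // Sentence detector: one sentence per line in es-sent.train
            // (placeholder file name).
            ObjectStream<String> sentLines = new PlainTextByLineStream(
                    new FileInputStream("es-sent.train"), utf8);
            ObjectStream<SentenceSample> sentSamples =
                    new SentenceSampleStream(sentLines);
            SentenceModel sentModel = SentenceDetectorME.train(
                    "es", sentSamples, true, null, 5, 100);
            sentSamples.close();

            OutputStream sentOut = new BufferedOutputStream(
                    new FileOutputStream("es-sent.bin"));
            sentModel.serialize(sentOut);
            sentOut.close();

            // Tokenizer: whitespace-tokenized lines with <SPLIT> markers
            // in es-token.train (placeholder file name).
            ObjectStream<String> tokLines = new PlainTextByLineStream(
                    new FileInputStream("es-token.train"), utf8);
            ObjectStream<TokenSample> tokSamples =
                    new TokenSampleStream(tokLines);
            TokenizerModel tokModel = TokenizerME.train(
                    "es", tokSamples, true, 5, 100);
            tokSamples.close();

            OutputStream tokOut = new BufferedOutputStream(
                    new FileOutputStream("es-token.bin"));
            tokModel.serialize(tokOut);
            tokOut.close();
        }
    }

Loading the detokenizer dictionary would presumably go through
opennlp.tools.tokenize.DetokenizationDictionary and
DictionaryDetokenizer, but a Spanish dictionary file would have to be
written first.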


-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
