What is the licensing on that? Do we have permission from the corpus owners
to distribute trained models?

On Tue, May 17, 2011 at 7:52 AM, Jörn Kottmann <[email protected]> wrote:

> It was trained on Cast3LB.
>
> Jörn
>
>
> On 5/17/11 2:31 PM, Jason Baldridge wrote:
>
>> Where is the Spanish data and what is the source?
>>
>> On Tue, May 17, 2011 at 3:00 AM, Jörn Kottmann<[email protected]>
>>  wrote:
>>
>>> Hello Jason,
>>>
>>> I do not have the training data in the correct format and I
>>> never took the time to convert it.
>>> Another way to solve it would be to wrap the old models in our
>>> new model package.
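>>>
>>> Roughly what I have in mind (untested; the constructor signature is
>>> from memory of the 1.5 API and the file names are made up):
>>>
>>>   import java.io.File;
>>>   import java.io.FileOutputStream;
>>>
>>>   import opennlp.maxent.io.SuffixSensitiveGISModelReader;
>>>   import opennlp.model.AbstractModel;
>>>   import opennlp.tools.tokenize.TokenizerModel;
>>>
>>>   // Read the old 1.3-style maxent model from disk.
>>>   AbstractModel maxent = new SuffixSensitiveGISModelReader(
>>>       new File("SpanishTok.bin.gz")).getModel();
>>>
>>>   // Repackage it in the new model format, assuming the old outcome
>>>   // labels are still compatible with the 1.5 TokenizerME.
>>>   TokenizerModel model = new TokenizerModel("es", maxent, false);
>>>   model.serialize(new FileOutputStream("es-token.bin"));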
>>>
>>> The sentence detector and tokenizer can now also be trained on
>>> the CoNLL data. Should we do that instead?
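>>>
>>> The tokenizer training could look something like this, once the data
>>> is converted to the OpenNLP training format (flags are from memory of
>>> the 1.5 command line, file names are made up):
>>>
>>>   bin/opennlp TokenizerTrainer -encoding UTF-8 -lang es \
>>>       -data es-tok.train -model es-token.bin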
>>>
>>> To train the tokenizer we need a detokenizer dictionary.
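>>>
>>> That is a small XML file, roughly like this (operation names are
>>> from memory, please check them against latin-detokenizer.xml):
>>>
>>>   <dictionary>
>>>     <!-- attach a period to the token on its left -->
>>>     <entry operation="MOVE_LEFT">
>>>       <token>.</token>
>>>     </entry>
>>>     <!-- attach an opening bracket to the token on its right -->
>>>     <entry operation="MOVE_RIGHT">
>>>       <token>(</token>
>>>     </entry>
>>>   </dictionary>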
>>>
>>> Jörn
>>>
>>>
>>>
>>> On 5/13/11 10:33 PM, Jason Baldridge wrote:
>>>
>>>> It seems as though the Spanish models for tokenization and sentence
>>>> splitting are no longer around, e.g. the models download page only has
>>>> NER
>>>> models:
>>>>
>>>> http://opennlp.sourceforge.net/models-1.5/
>>>>
>>>> But there were models before:
>>>>
>>>> http://opennlp.sourceforge.net/models-1.3/spanish/
>>>>
>>>> Anyone know what happened to them? Sorry if I'm forgetting something...
>>>>
>>>> Jason
>>>>
>>>>
>>>>
>>
>


-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
