(Sorry for the late reply)

I just cloned the repository, and I'll add the scripts I used to
convert the input files and train the models. This afternoon I'll
put them together in a pull request.
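For reference, the training step looks roughly like the commands below. This is a hedged sketch of the OpenNLP 1.5 command-line interface; the data and model file names are placeholders, and the exact flags may differ depending on the OpenNLP version you have installed:

```shell
# Hypothetical invocation; file names are placeholders.
# Train a maxent POS model for Spanish from the CoNLL-converted data:
opennlp POSTaggerTrainer -lang es -encoding UTF-8 \
    -data es-pos.train -model opennlp-es-pos-maxent.bin

# Train the perceptron variant by switching the model type:
opennlp POSTaggerTrainer -type perceptron -lang es -encoding UTF-8 \
    -data es-pos.train -model opennlp-es-pos-perceptron.bin
```

The conversion scripts would produce the one-sentence-per-line, `word_TAG` format that the trainer expects from the original CoNLL files.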

Should we keep a copy of the training data in GitHub? I think it could
be useful for retraining the models, and it would also be helpful in
case the original files become unavailable (e.g., 404 errors).
Otherwise, would it be enough to include links to those files?

I also have a script for generating a Maven repository for the models.
The GitHub project could also be used to host that repository;
what do you think?
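From a consumer's point of view, hosting the repository in the GitHub project would look roughly like the POM fragment below. This is only a sketch: the repository URL, group/artifact coordinates, and version are placeholders, since none of that layout has been decided yet.

```xml
<!-- Hypothetical consumer configuration; the URL and coordinates
     below are placeholders, not an agreed layout. -->
<repositories>
  <repository>
    <id>opennlp-models</id>
    <url>https://raw.github.com/utcompling/OpenNLP-Models/master/repo</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>opennlp.models</groupId>
    <artifactId>opennlp-es-pos-maxent</artifactId>
    <version>1.0</version>
  </dependency>
</dependencies>
```

The script would just need to lay the model jars out in the standard Maven directory structure inside the repo.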

On Thu, Feb 2, 2012 at 7:50 PM, Jason Baldridge
<jasonbaldri...@gmail.com> wrote:
> That's great! Would you be interested in contributing code and/or data to
> the OpenNLP Models repo?
>
> https://github.com/utcompling/OpenNLP-Models
>
>
>
> On Thu, Feb 2, 2012 at 4:02 PM, Juan Manuel Caicedo Carvajal
> <j...@cavorite.com> wrote:
>>
>> Hello everyone,
>>
>> I trained POS tagging models for Spanish using the CoNLL data [1].
>>
>> I created two versions using different model types (perceptron and
>> maxent), and I also created versions of the models using the universal
>> Part-of-Speech Tags [2].
>>
>> I uploaded the files to my server; you can read more details,
>> including the evaluation results, here:
>>
>> http://cavorite.com/labs/nlp/opennlp-models-es/
>>
>> And the files are here:
>>
>> http://files.cavorite.com/projects/opennlp-models-es/ner/models/
>>
>>
>> Feel free to host them on the OpenNLP website and do not hesitate to
>> send me your questions or comments.
>>
>> Cheers,
>>
>> Juan Manuel Caicedo
>>
>> [1] http://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html
>> [2] http://code.google.com/p/universal-pos-tags/
>
>
>
>
> --
> Jason Baldridge
> Associate Professor, Department of Linguistics
> The University of Texas at Austin
> http://www.jasonbaldridge.com
> http://twitter.com/jasonbaldridge
>
>
