On Thu, Feb 9, 2012 at 7:36 AM, Juan Manuel Caicedo Carvajal <j...@cavorite.com> wrote:
> (Sorry for the late reply)
>
> I just cloned the repository and I'll add the scripts I used to
> convert the input files and to train the models. This afternoon I'll
> put them together in a pull request.

Great!

> Should we keep a copy of the training data in GitHub? I think it could
> be useful for retraining the models, and it would also be helpful in
> case the original files are no longer available (e.g. 404 errors).
> Otherwise, would it be enough to include links to those files?

It depends on whether it is legal to do so. For example, the Norwegian
data used to train the models there cannot be distributed. If it is fine
to have it and the corpus isn't too massive, then it might make sense.

> I also have a script for generating a Maven repository for the models.
> The GitHub project could also be used for hosting that repository.
> What do you think?

+1 Sounds interesting, so if you want to set that up, it sounds good to
me.

-Jason

> On Thu, Feb 2, 2012 at 7:50 PM, Jason Baldridge
> <jasonbaldri...@gmail.com> wrote:
> > That's great! Would you be interested in contributing code and/or data to
> > the OpenNLP Models repo?
> >
> > https://github.com/utcompling/OpenNLP-Models
> >
> > On Thu, Feb 2, 2012 at 4:02 PM, Juan Manuel Caicedo Carvajal
> > <j...@cavorite.com> wrote:
> >>
> >> Hello everyone,
> >>
> >> I trained POS tagging models for Spanish using the CoNLL data [1].
> >>
> >> I created two versions using different model types (perceptron and
> >> maxent), and I also created versions of the models using the universal
> >> Part-of-Speech Tags [2].
> >>
> >> I uploaded the files to my server. You can read more details here,
> >> including the evaluation results:
> >>
> >> http://cavorite.com/labs/nlp/opennlp-models-es/
> >>
> >> And the files are here:
> >>
> >> http://files.cavorite.com/projects/opennlp-models-es/ner/models/
> >>
> >> Feel free to host them on the OpenNLP website and do not hesitate to
> >> send me your questions or comments.
> >>
> >> Cheers,
> >>
> >> Juan Manuel Caicedo
> >>
> >> [1] http://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html
> >> [2] http://code.google.com/p/universal-pos-tags/
> >
> > --
> > Jason Baldridge
> > Associate Professor, Department of Linguistics
> > The University of Texas at Austin
> > http://www.jasonbaldridge.com
> > http://twitter.com/jasonbaldridge

--
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge