I am a native Portuguese speaker. I contributed parsers for some of the Linguateca formats, and we can now train models for most of the OpenNLP tools. Coreference and the Parser are still missing, but I will have time to work on those next year. (I still have to work through the paper and data you sent about the Portuguese parser, but I had to change my priorities.)
And yes, the tools Jörn is working on are great. I hope I can start using and working on them as soon as I finish my thesis, in a couple of months. I am thinking of organizing an Apache OpenNLP event here with students from the Linguistics and CS departments to bootstrap the Portuguese annotation project. Maybe we will gain a few new contributors!

On Mon, Dec 5, 2011 at 7:33 PM, Jason Baldridge <[email protected]> wrote:

> One thing that I think might be nice moving forward is to develop a robust
> set of models and test sets that involve at least two languages. I'm
> thinking Portuguese would be a good one in addition to English since:
>
>    - several of us speak it (I'm a non-native speaker who lived in Brazil
>    for a couple of years -- who else?)
>    - there are truly free annotated resources for it:
>    http://www.linguateca.pt/
>    - it's pretty darn widely spoken in the world, both as first and second
>    language
>
> Doing something like this would help push the annotation effort forward as
> well. E.g. we commit to providing support for a language means we need to
> get at least some annotations going for each level of analysis we want to
> support, and that will in turn spur development on the tool that Jorn has
> been putting together.
>
> Jason
>
> --
> Jason Baldridge
> Associate Professor, Department of Linguistics
> The University of Texas at Austin
> http://www.jasonbaldridge.com
> http://twitter.com/jasonbaldridge
>
