I am a Portuguese native speaker.
I contributed parsers for some of the Linguateca formats, and we can now
train models for most of the OpenNLP tools. The Coreference and Parser
components are still missing, but I will have time to work on them next year.
(I still have to work through the paper and data you sent about the Portuguese
parser, but I had to change my priorities.)

And yes, the tools Jörn is working on are great. I hope I can start
using and working on them as soon as I finish my thesis, in a couple of months.
I am thinking of organizing an Apache OpenNLP event here with students from
the Linguistics and CS departments to bootstrap the Portuguese annotation
project. Maybe we will gain a few new contributors!

On Mon, Dec 5, 2011 at 7:33 PM, Jason Baldridge <[email protected]> wrote:

> One thing that I think might be nice moving forward is to develop a robust
> set of models and test sets that involve at least two languages. I'm
> thinking Portuguese would be a good one in addition to English since:
>
>   - several of us speak it (I'm a non-native speaker who lived in Brazil
>   for a couple of years -- who else?)
>   - there are truly free annotated resources for it:
>   http://www.linguateca.pt/
>   - it's pretty darn widely spoken in the world, both as first and second
>   language
>
> Doing something like this would help push the annotation effort forward as
> well. E.g., committing to support a language means we need to
> get at least some annotations going for each level of analysis we want to
> support, and that will in turn spur development on the tool that Jorn has
> been putting together.
>
> Jason
>
> --
> Jason Baldridge
> Associate Professor, Department of Linguistics
> The University of Texas at Austin
> http://www.jasonbaldridge.com
> http://twitter.com/jasonbaldridge
>
