Great!

Part of what I have in mind here is process: documenting the steps to
create the data for a new language, so that others who want to add one
can do so much more easily, basically by following a recipe and putting
in the effort.

If others want to spearhead efforts to add other languages, that's also
great. The more the merrier, as long as we use a standardized, replicable
process.

Jason

On Mon, Dec 5, 2011 at 6:26 PM, [email protected] <[email protected]> wrote:

> I am a native Portuguese speaker.
> I contributed parsers for some of the Linguateca formats, and we can now
> train models for most of the OpenNLP tools. Coreference and the Parser are
> still missing, but I will have time to work on them next year. (I still
> have to work through the paper and data you sent about the Portuguese
> parser, but I had to change my priorities.)
>
> And yes, the tools Jörn is working on are great. I hope I can start
> using/working with them as soon as I finish my thesis, in a couple of months.
> I am thinking of organizing an Apache OpenNLP event here with students from
> the Linguistics and CS departments to bootstrap the Portuguese annotation
> project. Maybe we will get a few new contributors!
>
> On Mon, Dec 5, 2011 at 7:33 PM, Jason Baldridge <[email protected]> wrote:
>
> > One thing that I think might be nice moving forward is to develop a
> > robust set of models and test sets that involve at least two languages.
> > I'm thinking Portuguese would be a good one in addition to English since:
> >
> >   - several of us speak it (I'm a non-native speaker who lived in Brazil
> >   for a couple of years -- who else?)
> >   - there are truly free annotated resources for it:
> >   http://www.linguateca.pt/
> >   - it's pretty darn widely spoken in the world, both as first and second
> >   language
> >
> > Doing something like this would help push the annotation effort forward
> > as well. E.g., committing to support a language means we need to get at
> > least some annotations going for each level of analysis we want to
> > support, and that will in turn spur development on the tool that Jörn
> > has been putting together.
> >
> > Jason
> >
> > --
> > Jason Baldridge
> > Associate Professor, Department of Linguistics
> > The University of Texas at Austin
> > http://www.jasonbaldridge.com
> > http://twitter.com/jasonbaldridge
> >
>



-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
