What about more diverse languages? Chinese, Arabic, and Russian might be good examples. In a sense, they would provide wider test coverage, and each of them has quite a large audience. Annotated resources and speakers on the team might be a problem, though.
Aliaksandr

On Mon, Dec 5, 2011 at 10:33 PM, Jason Baldridge <[email protected]> wrote:

> One thing that I think might be nice moving forward is to develop a robust
> set of models and test sets that involve at least two languages. I'm
> thinking Portuguese would be a good one in addition to English since:
>
>    - several of us speak it (I'm a non-native speaker who lived in Brazil
>    for a couple of years -- who else?)
>    - there are truly free annotated resources for it:
>    http://www.linguateca.pt/
>    - it's pretty darn widely spoken in the world, both as first and second
>    language
>
> Doing something like this would help push the annotation effort forward as
> well. E.g. we commit to providing support for a language means we need to
> get at least some annotations going for each level of analysis we want to
> support, and that will in turn spur development on the tool that Jorn has
> been putting together.
>
> Jason
>
> --
> Jason Baldridge
> Associate Professor, Department of Linguistics
> The University of Texas at Austin
> http://www.jasonbaldridge.com
> http://twitter.com/jasonbaldridge
