Instead, how about a complete Latin implementation, which we then slowly extend to support all the languages that descend from it?
I believe there are three languages (Chinese, Latin and Greek) that we'd need to build on to get a fairly complete foundation for all languages. Then we slowly simulate the syntax and language evolution across time, and voilà!

On Tue, Dec 6, 2011 at 8:33 AM, Jason Baldridge <[email protected]> wrote:

> One thing that I think might be nice moving forward is to develop a robust
> set of models and test sets that involve at least two languages. I'm
> thinking Portuguese would be a good one in addition to English since:
>
>    - several of us speak it (I'm a non-native speaker who lived in Brazil
>    for a couple of years -- who else?)
>    - there are truly free annotated resources for it:
>    http://www.linguateca.pt/
>    - it's pretty darn widely spoken in the world, both as first and second
>    language
>
> Doing something like this would help push the annotation effort forward as
> well. E.g. we commit to providing support for a language means we need to
> get at least some annotations going for each level of analysis we want to
> support, and that will in turn spur development on the tool that Jorn has
> been putting together.
>
> Jason
>
> --
> Jason Baldridge
> Associate Professor, Department of Linguistics
> The University of Texas at Austin
> http://www.jasonbaldridge.com
> http://twitter.com/jasonbaldridge
