On Fri, Jun 10, 2011 at 10:29 AM, Olivier Grisel <[email protected]> wrote:

> > > No idea. I think Jacob Perkins (and possibly others) who works with
> > > NLTK was also interested in such open corpora. See for instance this
> > > thread on metaoptimize.com/qa:
> > >
> > > http://metaoptimize.com/qa/questions/4650/what-licenses-cover-a-nltk-tagger-trained-on-treebank
> >
> > Great. I think a lot of people would benefit from a standard
> > infrastructure for annotation and training of models for different
> > languages.
> >
> > BTW, there is a lot that can be done to bootstrap POS-taggers from raw
> > data and the tags in Wiktionary, so if folks are interested in that I'm
> > happy to provide pointers.
>
> As mentioned by Tommaso, I think we should start to structure the wiki
> for this effort. Do you want me to create sub-pages of [1] for
> POS-tagging and NE detection? I could write the NE detection page
> with a description of the current effort on corpus-refiner / Walter
> and let you add pointers for the POS tags case.
>
> [1] https://cwiki.apache.org/OPENNLP/opennlp-annotations.html

Yep, that sounds great. I might not be able to get to it right away, but
I can put it on my stack!

Jason

--
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
