On 12/6/11 3:23 AM, Jason Baldridge wrote:
What I'm thinking of here is, in part, a process: we document the steps to
create the data for a new language so that others who want to add one
can do so much more easily, basically following a recipe and putting
in the effort.
If others want to spearhead efforts to add other languages, that's also
great. The more the merrier, as long as we use a standardized, replicable
process.
I think we should stick to the idea of making a web-based annotation tool
that our community can use to label data.
To get started we can use the existing tooling and create a test set.
The test set is in my opinion the first step anyway: we need some data to
determine how and what is labeled, we need some sample data to teach
new annotators, and we need a test set to evaluate models trained
on community-labeled data.
Jörn