Yes, this is very much in line with what I'm thinking. I was suggesting
starting with two languages that I know the team has specific first-hand
knowledge of, and for which we have decent resources to start the process.
And if others want to chime in with other languages, so much the better!

On Tue, Dec 6, 2011 at 3:34 AM, Jörn Kottmann <[email protected]> wrote:

> On 12/6/11 3:23 AM, Jason Baldridge wrote:
>
>> What I'm thinking of here is partly about process: knowing the steps to
>> create the data for adding new languages, so that others who want to add
>> them can do so much more easily, basically following a recipe and putting
>> in the effort.
>>
>> If others want to spearhead efforts to add other languages, that's also
>> great. The more the merrier, as long as we use a standardized, replicable
>> process.
>>
>
> I think we should stick to the idea of building a web-based annotation
> tool that our community can use to label data.
>
> To get started we can use the existing tooling and create a test set.
> The test set is, in my opinion, the first step anyway: we need some data
> to determine how and what is labeled, we need some sample data to teach
> new annotators, and we need a test set to evaluate models trained
> on community-labeled data.
>
> Jörn


-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
