You do understand correctly!

The main idea for the NLP components, taking the POS tagger as an example, is:

1. A fallback system that does unsupervised POS tagging.
2. The ability to plug in an existing POS tagger as these become available for
specific languages.
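
To make the two points above concrete, here is a minimal sketch of how the plug-in mechanism might look. Everything in it is an assumption for illustration: the names TaggerRegistry, placeholder_unsupervised_tagger and toy_english_tagger are hypothetical, and the fallback is only a stand-in that buckets tokens by surface form, not a real unsupervised tagger.

```python
from typing import Callable, Dict, List, Tuple

# A tagger maps a list of tokens to (token, tag) pairs.
Tagger = Callable[[List[str]], List[Tuple[str, str]]]


def placeholder_unsupervised_tagger(tokens: List[str]) -> List[Tuple[str, str]]:
    """Stand-in for the fallback (hypothetical): a real system would induce
    word classes from the corpus; this only buckets tokens by surface form."""
    tagged = []
    for token in tokens:
        if token.isdigit():
            tag = "C_DIGIT"
        elif token[:1].isupper():
            tag = "C_CAP"
        else:
            tag = "C_OTHER"
        tagged.append((token, tag))
    return tagged


class TaggerRegistry:
    """Picks a language-specific tagger if one is registered, else the fallback."""

    def __init__(self, fallback: Tagger) -> None:
        self._fallback = fallback
        self._taggers: Dict[str, Tagger] = {}

    def register(self, lang: str, tagger: Tagger) -> None:
        # Plug in an existing tagger once one becomes available for `lang`.
        self._taggers[lang] = tagger

    def tag(self, lang: str, tokens: List[str]) -> List[Tuple[str, str]]:
        return self._taggers.get(lang, self._fallback)(tokens)


def toy_english_tagger(tokens: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical supervised tagger, reduced to a tiny lookup table."""
    lexicon = {"the": "DET", "cat": "NOUN", "sleeps": "VERB"}
    return [(t, lexicon.get(t.lower(), "X")) for t in tokens]


registry = TaggerRegistry(fallback=placeholder_unsupervised_tagger)
registry.register("en", toy_english_tagger)
```

With this layout, registry.tag("en", ...) uses the plugged-in tagger, while a language with no registered tagger, say "he", silently falls through to the unsupervised fallback.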

As supervisor, I would recommend working with three languages:
English, Hebrew, and the GSoC student's native language.

If we could get QA from other native speakers, we would incorporate them into
the workflow.

I think that by using a deletion/reversion-based heuristic we may also be able
to build a spam corpus to boost the accuracy of the corpora.
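
One possible reading of the reversion part of that heuristic, sketched under assumptions: if a revision restores the exact text of an earlier revision, the revisions in between were undone and are candidates for the spam corpus. The Revision type and the reverted_revisions function are hypothetical names, and exact-match detection by hash is only one of several ways to spot reverts; a real pipeline would still need to filter out good-faith edits that happened to be reverted.

```python
import hashlib
from typing import Dict, List, NamedTuple, Set


class Revision(NamedTuple):
    rev_id: int
    text: str


def reverted_revisions(history: List[Revision]) -> List[Revision]:
    """Return revisions that a later revision undid by restoring an
    earlier page state exactly (history is ordered oldest first)."""
    last_seen: Dict[str, int] = {}  # text digest -> index of latest revision with that text
    reverted: Set[int] = set()
    for i, rev in enumerate(history):
        digest = hashlib.sha1(rev.text.encode("utf-8")).hexdigest()
        if digest in last_seen:
            # Revision i restores an earlier state, so everything
            # strictly between the two was reverted.
            reverted.update(range(last_seen[digest] + 1, i))
        last_seen[digest] = i
    return [history[i] for i in sorted(reverted)]


history = [
    Revision(1, "Cats are small mammals."),
    Revision(2, "Cats are small mammals. BUY CHEAP PILLS"),
    Revision(3, "Cats are small mammals."),
    Revision(4, "Cats are small, carnivorous mammals."),
]
spam_candidates = reverted_revisions(history)
```

Here revision 3 restores revision 1, so only revision 2 ends up in spam_candidates.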

Operation Manager 
Mobil: +36 30 866 6706

Római Horizon Kft. 
H-1039 Budapest 
Királyok útja  291. D. ép. fszt. 2.
Tel:   +36 1 492 1492
Fax:  +36 1 266 5529

-----Original Message-----
[] On Behalf Of Amir E. Aharoni
Sent: Tuesday, April 03, 2012 10:19 PM
To: Wikimedia developers
Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools

2012/4/3 karthik prasad <>:
> Hello,
> I am a GSoC aspirant and have compiled a proposal for one of the 
> project ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman] I 
> would sincerely appreciate if you could kindly go through it and 
> suggest corrections/additions so that I can settle with a coherent proposal.
> Link to my proposal :

Nice, but why only English?

If I understand the proposal correctly, this project is supposed to be able to 
work with almost any language with very little effort.

Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי 
‪“We're living in pieces, I want to live in peace.” – T. Moore‬

Wikitech-l mailing list
