On 24/04/13 08:29, Erik Moeller wrote:
Could open source MT be such a strategic investment? I don't know, but
I'd like to at least raise the question. I think the alternative will
be, for the foreseeable future, to accept that this piece of
technology will be proprietary, and to rely on goodwill for any
integration that concerns Wikimedia. Not the worst outcome, but also
not the best one.

Are there open source MT efforts that are close enough to merit
scrutiny? In order to be able to provide high-quality results, you
would need not only a motivated, well-intentioned group of people, but
some of the smartest people in the field working on it.  I doubt we
could more than kickstart an effort, but perhaps financial backing at
significant scale could at least help a non-profit, open source effort
to develop enough critical mass to go somewhere.

A huge and worthwhile effort in its own right, and in any case a necessary step for creating free MT software, would be to build a free (as in freedom) parallel translation corpus. This corpus could then be used as a starting point by people and groups producing free MT software, either under the WMF or on their own.

This could be done by creating a new project where volunteers could compare Wikipedia articles and other free translated texts and mark sentences that are translations of other sentences. By the way, I believe Google Translate's corpus was created in this way.

Perhaps this could best be achieved by teaming up with www.zooniverse.org or www.pgdp.net, which have experience with this kind of project. This would require specialized non-wiki software, and I don't think that the Foundation has enough experience in developing it.

(By the way, other resources that could be similarly useful include free OCR training data or free, fully annotated text.)

_______________________________________________
Wikimedia-l mailing list
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l