More specifically, the discussion aligns with the following project:
https://github.com/ankur-india/ankur-india/wiki/Project-Ideas#add-language-grammar-rules-to-a-machine-translation-system

regards
Runa

---------- Forwarded message ----------
From: Erik Moeller <[email protected]>
Date: Wed, Apr 24, 2013 at 11:59 AM
Subject: [Wikimedia-l] The case for supporting open source machine translation
To: Wikimedia Mailing List <[email protected]>

Wikimedia's mission is to make the sum of all knowledge available to every person on the planet. We do this by enabling communities in all languages to organize and collect knowledge in our projects, removing any barriers that we're able to remove.

In spite of this, there are and will always be large disparities in the amount of locally created and curated knowledge available per language, as is evident by simple statistical comparison (and most beautifully visualized in Erik Zachte's bubble chart [1]).

Google, Microsoft and others have made great strides in developing free-as-in-beer translation tools that can be used to translate from and to many different languages. Increasingly, it is possible to at least make basic sense of content in many different languages using these tools. Machine translation can also serve as a starting point for human translations.

Although free-as-in-beer for basic usage, integration can be expensive. Google Translate charges $20 per 1M characters of text for API usage. [2] These tools get better from users using them, but I've seen little evidence of sharing of open datasets that would help the field get better over time.

Undoubtedly, building the technology and the infrastructure for these translation services is a very expensive undertaking, and it's understandable that there are multiple commercial reasons that drive the major players' ambitions in this space.
But if we look at it from the perspective of "How will billions of people learn in the coming decades", it seems clear that better translation tools should at least play some part in reducing knowledge disparities in different languages, and that ideally, such tools should be "free-as-in-speech" (since they're fundamentally related to speech itself).

If we imagine a world where top notch open source MT is available, that would be a world where increasingly, language barriers to accessing human knowledge could be reduced. True, translation is no substitute for original content creation in a language -- but it could at least powerfully support and enable such content creation, and thereby help hundreds of millions of people. Beyond Wikimedia, high quality open source MT would likely be integrated in many contexts where it would do good for humanity and allow people to cross into cultural and linguistic spaces they would otherwise not have access to.

While Wikimedia is still only a medium-sized organization, it is not poor. With more than 1M donors supporting our mission and a cash position of $40M, we do now have a greater ability to make strategic investments that further our mission, as communicated to our donors. That's a serious level of trust and not to be taken lightly, either by irresponsibly spending, or by ignoring our ability to do good.

Could open source MT be such a strategic investment? I don't know, but I'd like to at least raise the question. I think the alternative will be, for the foreseeable future, to accept that this piece of technology will be proprietary, and to rely on goodwill for any integration that concerns Wikimedia. Not the worst outcome, but also not the best one.

Are there open source MT efforts that are close enough to merit scrutiny? In order to be able to provide high quality results, you would need not only a motivated, well-intentioned group of people, but some of the smartest people in the field working on it.
I doubt we could do more than kickstart an effort, but perhaps financial backing at significant scale could at least help a non-profit, open source effort to develop enough critical mass to go somewhere.

All best,
Erik

[1] http://stats.wikimedia.org/wikimedia/animations/growth/AnimationProjectsGrowthWp.html
[2] https://developers.google.com/translate/v2/pricing

--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation

Wikipedia and our other projects reach more than 500 million people every month. The world population is estimated to be >7 billion. Still a long way to go. Support us. Join us. Share: https://wikimediafoundation.org/

_______________________________________________
Wikimedia-l mailing list
[email protected]
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l

--
http://about.me/runa.bhattacharjee
http://fedoraproject.org/wiki/User:Runab
_______________________________________________
Project-ideas mailing list
[email protected]
http://lists.ankur.org.in/listinfo.cgi/project-ideas-ankur.org.in
