Hi everyone, I'm interested in working with Sugar Labs for this year's Google Summer of Code, and I wanted to get some feedback on my project proposal before I actually submit my application.
I apologize for the relatively late introduction. I've been discussing this idea on IRC since last week, but hadn't thought to post it to the mailing list until today. Comments, suggestions, and feedback on the idea would be greatly appreciated.

Pluggable Translation Server - GSoC Idea Proposal
=================================================

Sugar and OLPC are global projects, and internationalization is one of their central tenets. The aim of this project is to build a server program and a client API that activities can use to reliably obtain good-quality machine translations of arbitrary strings.

Overview
--------

Accurate machine translation is expensive in both computation and memory, so it is not reasonable to expect good results from running it directly on an XO. A server that supplies translations to a network of XOs is therefore the preferable solution.

Not every translator handles every language pair equally well, and some may be unusable in a given situation because of hardware, software, or cost limitations. It is therefore advantageous to let the translation server access multiple services through a plug-in architecture. For example, Google Translate will likely offer very high-quality translations for many language pairs, but its API costs $20 USD per 1 million translated characters, so requiring it would be irresponsible. Likewise, a FOSS project such as Apertium may well provide good es<->en translations, but it cannot translate e.g. de->ru, which limits its global usefulness. Pooling all available translation sources into a single server overcomes these obstacles and provides a convenient, consistent way to obtain reliable machine translation for any purpose.

Plan
----

This is a very general overview of my plan for completing the project and how the work will be divided.
It will be broken down into weekly goals based on the feedback I receive on this initial division of work.

### Phase I

The first order of business is to build a minimal-dependency Python HTTP server application with a plug-in architecture, so that any interested developer can add machine translation backends later on. Some initial backends will of course need to be created alongside it. I plan to add one that runs on the same machine as the server and one that uses a web service, to confirm that the server architecture is robust and general.

The first of these plug-ins will use Apertium, the FOSS project Sugar already uses in the #meeting-es IRC channel on freenode. Bing Translate will likely come next, since it is one of the major web translators that provides a free API key. Google Translate is another high-priority service because of its quality, but it will not be added initially because its API has no free usage tier. Other plug-ins will be considered for any time remaining at the end of the project, but they are far lower priority than the initial two and will only be added during GSoC if time allows. (If not, I'll likely add more after GSoC has finished.)

Though not yet finalized, the server will most likely use RESTful HTTP with JSON responses, so that it is easy to access from any programming language.

### Phase II

The next stage of the project is a Python client API to request and receive translations from a given server. It will be designed before any coding starts on the server, and it will be kept as generic and straightforward as possible so that it can be used easily and efficiently even outside of the Sugar environment.

From the point of view of the client API, the backend the server uses to translate the text is irrelevant.
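Because the client never sees which backend did the work, both sides of the exchange can stay small. The following is a minimal, standard-library-only sketch of that round trip, meant to illustrate the idea rather than to describe the planned implementation. Everything in it is an assumption: the `/translate` path, the JSON fields (`text`, `source`, `target`, `translation`, `error`), the `Backend` interface, and the two toy backends (`FailingBackend` standing in for an unreachable web service, `ReverseBackend` standing in for a local engine such as Apertium) are all hypothetical.

```python
# Minimal sketch of a pluggable translation server plus client, stdlib only.
# HYPOTHETICAL: the /translate path, the JSON field names, the Backend
# interface and both toy backends are illustrations, not a settled protocol.
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Backend:
    """Hypothetical plug-in interface: supported pairs plus a translate()."""
    pairs = frozenset()

    def translate(self, text, source, target):
        raise NotImplementedError


class FailingBackend(Backend):
    """Toy stand-in for a web service that is currently unreachable."""
    pairs = frozenset({("en", "es")})

    def translate(self, text, source, target):
        raise ConnectionError("service unreachable")


class ReverseBackend(Backend):
    """Toy stand-in for a local engine such as Apertium: reverses the text."""
    pairs = frozenset({("en", "es")})

    def translate(self, text, source, target):
        return text[::-1]


# Backends are tried in order; a failure falls through to the next one.
BACKENDS = [FailingBackend(), ReverseBackend()]


def dispatch(text, source, target):
    """Return the first successful translation, or None if nothing can help."""
    for backend in BACKENDS:
        if (source, target) not in backend.pairs:
            continue
        try:
            return backend.translate(text, source, target)
        except Exception:
            continue  # fall back to the next capable backend
    return None


class TranslateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        req = json.loads(self.rfile.read(length))
        result = dispatch(req["text"], req["source"], req["target"])
        if result is None:
            status, body = 400, {"error": "unsupported language pair"}
        else:
            status, body = 200, {"translation": result}
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # silence per-request logging for the demo


def translate(server_url, text, source, target):
    """Client API: the caller names only the text and the language pair."""
    data = json.dumps({"text": text, "source": source, "target": target})
    request = urllib.request.Request(
        server_url + "/translate",
        data=data.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())["translation"]
    except urllib.error.HTTPError as err:
        # Surface the server's error message to the caller.
        raise ValueError(json.loads(err.read())["error"]) from err


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), TranslateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d" % server.server_address[1]
    print(translate(url, "hello", "en", "es"))  # prints "olleh" after FailingBackend falls through
    server.shutdown()
    server.server_close()
```

Since `translate()` only ever sees JSON, replacing the toy backends with real Apertium or web-service wrappers would not change the client at all. The ordered `BACKENDS` list is just one simple way to express fallbacks; a real server might instead rank backends per language pair.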
The client simply sends the server a request specifying the language pair and source text, and receives either the translated string or an appropriate error. The server handles selecting the appropriate translator and any fallbacks that may be needed, so the API user only needs to supply the source text and the language pair.

### Phase III

This stage adds the client API to the Chat activity. Since Chat is a very simple activity, this should not take long, so I will also develop a new Translate activity. That activity will be very simple and, while functional, essentially a demo of how to use the client API and server. It will also let me give the programs some real-world testing, so that potential issues can be caught while there is still time to fix them.

Some Benefits
-------------

- A generic translation API that can be used anywhere within the Sugar environment.
- More internationalization and language-learning possibilities across the Sugar project.
- An easy-to-use, general-purpose translation API that can be used from any programming language that can make a network connection and parse JSON.

Deliverables
------------

- A translation server with at least one server-based and one web-based backend, possibly more if time allows or the need arises.
- A generic Python client API that is simple to use for arbitrary translations.
- A "translate message" feature in the Chat activity, to aid understanding for language learners and to let students who do not share a language communicate.
- A simple translation Activity for Sugar built on the new library.

Some Unknowns / Feedback Requested
----------------------------------

- How do clients become aware of the server? Is it configured, or is there some kind of auto-detection?
- How much technical ability is expected of server operators?
- Is it reasonable to set up large, well-resourced servers for XO users who lack access to a server or the technical ability to manage one? If so, how would abuse be prevented?

Conclusion
----------

I think this proposal has a lot of potential to make a positive impact on the Sugar and OLPC ecosystem. I'm very excited about the project's broader applicability to anyone who wants a reliable way to access translations, and I hope you will also consider it worthy of GSoC funding.

Thanks!
Erik Price

_______________________________________________
Sugar-devel mailing list
Sugar-devel@lists.sugarlabs.org
http://lists.sugarlabs.org/listinfo/sugar-devel