[Wikimediaindia-l] Updates on Google translation project in Tamil Wikipedia
Hi,

Some updates on the Google translation project in Tamil Wikipedia. (Refer to http://bit.ly/adRtak for some background info.)

The project started around August 2009, ran continuously until August 2010, and added 1000+ articles. From May 2010 to August 2010, we reduced the addition of new articles and asked the translators to rework their existing articles. We did a quality review of these articles and found that only around 50% of them have an acceptable minimum quality of translation. (We rated only the style of the language and the accuracy of the translation. We did not do a full review of the merit of each article.)

Following this, the Tamil Wiki community arrived at the following consensus with Google:

* The translators should be rated and only the select few should be allowed to continue (done now).
* Only the list of topics given by the community should be translated. Earlier, Google gave a list based on the most searched queries, from which we chose. (To begin with, we have now given a list of 25 featured articles from English Wikipedia.)
* New articles will be added in the user namespace and will only be moved to the article namespace when they reach acceptable quality.
* Earlier articles should be reworked simultaneously to improve their quality.

Google and Tamil Wikipedia collaborated intensively in order to arrive at a process that can be followed in other Wikipedias too. We feel that we have reached a stage where we have good pointers, like the above, that other Wiki communities can follow. Since the project may become unmanageable if it grows too much, we urge all Indic language communities that have Google-translated articles to start talking to Google without further delay.

* The Wiki communities should start a dialogue / process / agreement to rework existing articles.
* *New articles should only be added at a rate which is suitable for the size and activity of the community. Once all the issues are solved, the scale of the operation may be increased gradually.*

We still have some issues in this project and there is no final consensus yet regarding the future of the project. But we hope that this update helps.

Ravi
___
Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
Re: [Wikimediaindia-l] Updates on Google translation project in Tamil Wikipedia
Congrats to the Tamil community for trying to bring out a process for Google's Translation project.

I really wonder how other language communities are handling this. Apart from Tamil, the Google Translation project is going on at least in Hindi, Kannada, and Telugu. It is banned in Bengali wiki. I can see many articles being loaded to the wikis each day, and for many of them the only contributor is the Google employee who translated it.

Shiju

On Thu, Dec 2, 2010 at 7:55 PM, Arjuna Rao Chavala arjunar...@googlemail.com wrote:

Hi,

Thanks a lot for the update. I think the updated process is similar to the open source community philosophy when commercial companies (like IBM, Sun etc.) contribute source code. Tamil Wiki has that kind of rigor in quality checking and is able to do a good job. Other Wikipedias may not be in a position to engage in a similar way, due to policies and/or level of active Wikipedians. One more comment below.

On Thu, Dec 2, 2010 at 7:19 PM, Ravishankar ravidre...@gmail.com wrote:

Hi, Some updates on the Google translation project in Tamil Wikipedia.
--snip--
We did a quality review of these articles and found that only around 50% of them have an acceptable minimum quality of translation. (We rated only the style of the language and the accuracy of the translation. We did not do a full review of the merit of each article.)
--snip--

Can you elaborate more on the style? How did you measure the accuracy of translation? It may be desirable to adapt the English articles for the target Wikipedia rather than translate them verbatim. How much effort was spent to arrive at the above conclusions? # of articles, # of reviewers, time frame etc. would help.

Thanks
Arjun
Re: [Wikimediaindia-l] Updates on Google translation project in Tamil Wikipedia
Ravi,

That's a nice process for dealing with the Google translation project. In Bengali Wikipedia, if a translation is not of acceptable quality, the community also shifts the content to the user namespace of that translator and asks them to improve it. But there are very few cases where a translator rewrites or retouches the article to improve it. Translators are not coming back to take care of their articles. So a lot of untouched, badly translated articles are in the user namespace at Bengali Wikipedia.

And from my experience in Bengali Wikipedia, the translators are not consistent in their translations. If you rate someone on his first translation, you cannot be sure that the second translation will be of the same quality or better than the first one. So the community has to re-rate him every time he posts an article.

The translators are not regular at Wikipedia and they are not responsive. There is no other contact point available for the community to communicate with them. We can create a translation coordination page at the local Wikipedia, but there is no way to inform the existing or new translators to follow the page.

I am very much interested to know how the Tamil community is communicating with the translators at Google.

Belayet

On 3 December 2010 07:12, Shiju Alex shijualexonl...@gmail.com wrote:
--snip--

--
Belayet Hossain
http://www.facebook.com/bellayet
http://twitter.com/bellayet
http://bellayet.wordpress.com (Bangla)
Knowledge is universal ...so share it.
Hillel
If I am not for myself, who will be for me? If I am only for myself, what am I? If not now, when?
Re: [Wikimediaindia-l] [Wikimedia-BD] Updates on Google translation project in Tamil Wikipedia
I agree with what Belayet mentioned about the Google-translated articles on Bengali Wikipedia. So far, we have not been contacted by Google directly. Rather, we have dealt with the paid contractors who were hired by Google. As Belayet has said, the paid translators do not follow up on their translations (except in a single case), which in turn causes a lot of problems for us in fixing the articles.

Technically, we have not banned Google-translated articles or contributors who use GTT. Rather, what we have discouraged is the dump-and-run translators who just dump their malformed translations and never respond to our messages or make a second edit to the article to fix it. In most of the cases we dealt with recently, we haven't even deleted the articles ... rather, we moved them to user space, giving the user a chance to fix the article into readable Bengali.

So, basically, here are the issues:

1. We'd love to work with Google if they collaborate with us and take responsibility for producing readable content.
2. Which means translations can't be dump-and-run jobs. Since the translator toolkit is still horrible at English-to-Bengali translation, the Google team or their translators need to fix the pages to make them readable and grammatically correct.
3. We can use the current system ... i.e., translators are free to do all of the sandboxing/translation experiments in their user space. We are very picky about the content that goes into the article space and don't want half-done, incorrect-language articles to go there. So, after a translated article has been approved by the community, it can be moved to the main space.

But to do any of the above, the Google team needs to contact us. We don't know who is or who isn't working for Google. Almost all the vendors/contractors we have dealt with so far used throwaway accounts that are used only once, and never again.

Bottom line: we are happy to work with Google, but only if Google does not bypass the existing Bengali Wikipedian community.

Thanks,
Ragib
User:Ragib on bn and en

--
Ragib Hasan, Ph.D
NSF Computing Innovation Fellow and Assistant Research Scientist
Dept of Computer Science
Johns Hopkins University
3400 N Charles Street
Baltimore, MD 21218
Website: http://www.ragibhasan.com

On Thu, Dec 2, 2010 at 9:19 PM, Belayet Hossain bella...@gmail.com wrote:
--snip--
Re: [Wikimediaindia-l] [Wikimedia-BD] Updates on Google translation project in Tamil Wikipedia
Sundar,

That will be a nice idea. Please go ahead. They can discuss it on the mailing lists or on the local Wikipedia page.

Belayet

On 3 December 2010 10:56, BalaSundaraRaman sundarbe...@yahoo.com wrote:

Hi Ragib, Belayet,

Good to see you here. :) Shall I forward your email to the Google Translation team? If they wish so, they can take it from there. What do you think?

Cheers,
Sundar

That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted. - George Boole, quoted in Iverson's Turing Award Lecture

----- Original Message -----
From: Ragib Hasan ragibha...@gmail.com
To: Discussion list for Bangladeshi Wikimedians wikimedia...@lists.wikimedia.org
Cc: wikimediaindia-l@lists.wikimedia.org
Sent: Fri, December 3, 2010 10:01:15 AM
Subject: Re: [Wikimediaindia-l] [Wikimedia-BD] Updates on Google translation project in Tamil Wikipedia
--snip--
Re: [Wikimediaindia-l] [Wikimedia-BD] Updates on Google translation project in Tamil Wikipedia
Hi Sundar,

Please go ahead and forward our emails/thread to your contacts at Google. Once again, we are open to the Google translation project as long as we are kept informed and the article creation process follows our local community standards.

Ragib

On Fri, Dec 3, 2010 at 12:41 AM, Belayet Hossain bella...@gmail.com wrote:
--snip--
Re: [Wikimediaindia-l] [Wikimedia-BD] Updates on Google translation project in Tamil Wikipedia
Forwarded the thread along with links to your two emails.

- Sundar

That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted. - George Boole, quoted in Iverson's Turing Award Lecture

----- Original Message -----
From: Ragib Hasan ragibha...@gmail.com
To: Discussion list for Bangladeshi Wikimedians wikimedia...@lists.wikimedia.org
Cc: wikimediaindia-l@lists.wikimedia.org
Sent: Fri, December 3, 2010 11:27:24 AM
Subject: Re: [Wikimediaindia-l] [Wikimedia-BD] Updates on Google translation project in Tamil Wikipedia
--snip--
Re: [Wikimediaindia-l] Bangalore 23rd Wikimeetup on Dec 18 , with Erik / Danese / Alolita
Hi all,

As some of you may know, in addition to the meeting with folks from the Wikimedia Foundation on 18th December, the regular Bangalore Wikipedia monthly meetup will be held on the usual 2nd Sunday, 12th December. At the 22nd meetup we will focus on planning the 10th anniversary celebration event. We are thinking of a one-day public event for the anniversary and would like to shape it well with thoughts / planning / execution in this meetup.

Please refer to the detailed agenda and sign up at the meetup pages.

*12th Dec Sunday WPMBL22* http://en.wikipedia.org/wiki/Wikipedia:Meetup/Bangalore/Bangalore22 - CIS office, Domulur, Bangalore

*18th Dec Saturday WPMBL23* http://en.wikipedia.org/wiki/Wikipedia:Meetup/Bangalore/Bangalore23 - TERI (just a few yards before CIS), Domulur, Bangalore. This meetup will be attended by Erik Möller (Deputy Director, Wikimedia Foundation), Danese Cooper (Chief Technical Officer, Wikimedia Foundation) and Alolita Sharma (Engineering Programs Manager, Wikimedia Foundation).

Regards
Tinu Cherian

On Mon, Nov 29, 2010 at 3:48 PM, CherianTinu Abraham tinucher...@gmail.com wrote:

Hi all,

I am extremely pleased to announce the 22nd Wikipedia / Wikimedia Wikimeetup, Bangalore on *Dec 18, Saturday* with *Erik Möller* (Deputy Director, Wikimedia Foundation), *Danese Cooper* (Chief Technical Officer, Wikimedia Foundation) and *Alolita Sharma* (Engineering Programs Manager, Wikimedia Foundation).

More details can be found here http://en.wikipedia.org/wiki/Wikipedia:MBL22

Please do sign up on the meetup page and participate ...

Regards
Tinu Cherian

N.B. Please note the change of date/day of the usual Wikimeetup for the month of December.