Re: [Foundation-l] Is Google translation is good for Wikipedias?
We welcome automation in translation, but not at the expense of introducing incorrect and messy content on wikipedia. We'd rather stay small and hand-craft than allow an experimental tool and unskilled paid translators creating a big mess.

Yes. This is the answer that you will get from most of the active wiki communities (small wikis) where this project is going on. Many of the small wiki communities are not worried about the numbers the way some big wikipedias are. Quality is more important for small wikis, where the number of contributors is small. *Many of us will use this quality matrix* itself to bring in more people.

My real concern is about the rift that is happening in a language community due to this project. Issues of a language wiki are taken outside the wiki to prove some points against its contributors. Two types of communities are evolving out of this project: *Google's Wiki community* and *Wiki's wiki community*. :) This is really annoying as far as small wikis are concerned.

So, some sort of intervention is required to make sure this project runs smoothly on different wikipedias.

~Shiju

On Wed, Jul 28, 2010 at 1:38 AM, Ragib Hasan ragibha...@gmail.com wrote:

As an admin in Bengali wikipedia, I had to deal with this issue a lot (some of which was discussed in the Telegraph (India) newspaper article). But I'd like to elaborate our stance here:

(The tool used was Google Translation Toolkit, not Google Translate. There is a distinction between these two tools. Google Translation Toolkit (GTT) is a translation-memory based semi-manual translation tool. That is, it learns translation skills as you gradually translate articles by hand. Later, this can be used to automate translation.)

Issues:

1. Community involvement: First of all, the local community was not at all involved or informed about this project. All of a sudden, we found new users signing up, dropping a large article on a random topic, and moving away. These users never responded to any talk page messages, so we first assumed these were just random users experimenting with wikipedia. Even now, no one from Google has contacted us in Bengali wikipedia and informed us about Google's intentions. This is not a problem by itself, but see the following points.

2. Translation quality: The quality of the translations was awful. The translations added to Bengali wikipedia were artificial, dry, and used obscure words and phrases. It looked as if a non-native speaker sat down with a dictionary in hand, and mechanically translated each sentence word by word. That led to sentences which are hard to understand, or downright nonsensical.

The articles were half-done. Numerals were not translated at all. The punctuation symbol for the Bengali language (the danda symbol: । ) was not used (apparently, GTT and/or the Google transliteration tool does not support that).

The articles were also full of spelling mistakes. The paid translator misspelled many simple words, or even used different spellings for the same word in different parts of the article.

Finally, different languages have different sentence structures. Sometimes, a complex sentence is better expressed if broken up into two sentences in another language. We found that the translators simply translated sentences preserving their English language structure. This made the resulting Bengali sentences awkward and artificial to read. For example, we do not write "If x then y" in Bengali just by replacing "if" and "then" with the corresponding Bengali words. But the translators did that; apparently this is an artifact of using GTT.

3. Lack of follow up: When we found the above problems, naturally, we asked the contributors to fix them. We got no reply. It is NOT the task of volunteers to clean up the mess after the one-night-standish paid translators. Given the small number of volunteers active at any given moment, it would take enormous effort on our part to go through these articles and fix the punctuation, spelling, and grammar issues, not to mention the awkward language style used by the translators.

So, after getting a cold shoulder from the paid translators about fixing their mess, we had to ban such edits outright.

We didn't know who was behind this, until the Wikimania talk from Google. Not that it matters ... even now, we won't allow these half-done and badly translated articles on Bengali wikipedia. Bengali wikipedia is small (21k articles), but we do not want to populate it overnight with badly translated content, some of which won't even qualify as grammatically correct Bengali. While wikipedia may be a perpetual work in progress, that does not mean we need to be guinea pigs for some careless experiments.

So, our stance is, "Thanks, but NO Thanks!". Unless, of course, they can put enough commitment into the translations and fix mistakes.

We welcome automation in translation, but not at the expense of introducing incorrect and messy content on wikipedia. We'd rather stay small and hand-craft than allow an experimental tool and unskilled paid translators creating a big mess.
Re: [Foundation-l] Is Google translation is good for Wikipedias?
Dear colleagues,

My experiences with the Translator Toolkit are negative, too. It happened just too often that a sentence was so twisted that I did not understand it. Checking it against the original took me a lot of time, so I decided that doing the translation myself is much quicker and more reliable. It is good for nobody to read Wikipedia articles in gibberish. The idea that the translation tool does the work and that a human being has to make just some little corrections has simply failed. Especially negative, to me, was that the Translator Toolkit encourages you to translate sentence by sentence.

I don't want to do injustice to anyone, but in my view there are two groups of Wikipedians:
- those who want to see huge article numbers and believe that any article with any content is good, in any quality, and that the Wikipedians are sufficient to do the rest.
- those who believe that (at least a minimum) quality is important and that articles below a certain level do damage to a Wikipedia. The small numbers of Wikipedians cannot cope with the work. They welcome not any content, but content that meets the possible interests of their readers.

It seems to me that the first group is mainly populated by computer specialists and natives of English. The second group consists of language specialists and non-natives of English. But of course there are many exceptions.

Kind regards
Ziko van Dijk

2010/7/28 Shiju Alex shijualexonl...@gmail.com:

We welcome automation in translation, but not at the expense of introducing incorrect and messy content on wikipedia. We'd rather stay small and hand-craft than allow an experimental tool and unskilled paid translators creating a big mess.

Yes. This is the answer that you will get from most of the active wiki communities (small wikis) where this project is going on. Many of the small wiki communities are not worried about the numbers the way some big wikipedias are. Quality is more important for small wikis, where the number of contributors is small.
Re: [Foundation-l] Is Google translation is good for Wikipedias?
Consider the Malayalam sentence വിക്കിപീഡിയ ഒരു നല്ല വിജ്ഞാനകോശം ആണ്, which means "Wikipedia is a good encyclopedia". How can one understand it if a translator picks the meanings of the Malayalam words and creates an English sentence like "wikipedia one good encyclopedia is"? Please think about more complex sentences. The sentence structure of Indian languages is completely different from that of English or European languages. Google's current attempt is putting extra weight on tiny communities by pushing them into complete rewriting (the easiest way is deletion, because some sentences do not make any sense at all). I am not against machine translations, but Google must improve their tool or toolkit before trying it on small wikipedias.

On Sunday 25 July 2010 09:01 PM, Andreas Kolbe wrote:

--- On Sun, 25/7/10, Fajro fai...@gmail.com wrote: Machine translation is always unsuitable to produce usable articles, but can help to start new ones in smaller wikipedias.

I second that. About 50% of machine translation output is gibberish, or worse, plausible-sounding text that actually says the opposite of what the original said. To get it into readable form takes about as long as starting from scratch. Translation memory software only helps where content is repetitive. A.

___ foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
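The word-order point in the message above can be illustrated with a minimal sketch. It assumes a hand-written gloss table built only from the five words of the example sentence; the names `SENTENCE`, `GLOSSES`, `GLOSS_TABLE` and `word_by_word` are hypothetical and do not come from any real translation tool, and this is not how Google Translator Toolkit works internally.

```python
# Illustrative sketch only: the gloss table is hand-written from the example
# sentence in the message above, not taken from any real translation tool.
SENTENCE = "വിക്കിപീഡിയ ഒരു നല്ല വിജ്ഞാനകോശം ആണ്"
GLOSSES = ["wikipedia", "one", "good", "encyclopedia", "is"]

# Pair each Malayalam word with its English gloss, in sentence order.
GLOSS_TABLE = dict(zip(SENTENCE.split(), GLOSSES))


def word_by_word(sentence: str) -> str:
    """Replace each word with its gloss while keeping the source word order."""
    return " ".join(GLOSS_TABLE.get(word, f"<{word}>") for word in sentence.split())


# The verb stays at the end, as in the Malayalam original, so the result
# is not natural English.
print(word_by_word(SENTENCE))  # prints: wikipedia one good encyclopedia is
```

Producing "Wikipedia is a good encyclopedia" requires reordering the verb and choosing an article, which a plain word substitution cannot do.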
Re: [Foundation-l] Is Google translation is good for Wikipedias?
On Wed, Jul 28, 2010 at 10:20 AM, praveenp me.prav...@gmail.com wrote: Google's current attempt is putting extra weight on tiny communities by pushing them into complete rewriting (the easiest way is deletion, because some sentences do not make any sense at all). I am not against machine translations, but Google must improve their tool or toolkit before trying it on small wikipedias.

Neither Google nor the WMF is creating articles automatically via machine translation. Google is not pushing translated articles. The Toolkit is a page where you can see a (sometimes not good) translation, and you (if you want to) are able to complete or fix it. When you believe it is complete, you upload it to wikipedia, just like you would upload a fully manual translation when you consider it complete.
Re: [Foundation-l] Is Google translation is good for Wikipedias?
Ziko, again, we are not talking about machine translations; Google doesn't have machine translation for Bangla, Malayalam, Tamil etc. yet. This is about translation memory. One of the things about MAT, whose use in the professional translator community is still debated but most popular for translations of time-dependent things like news, is that the original is often a very rough translation that requires a _lot_ of editing. The biggest problem is not the toolkit itself (with some exceptions - punctuation and templates, for example) but the translators who do not bother to use it properly, creating poor translations with lots of spelling mistakes and leaving behind a wasteland of poor quality articles. GTTK can be used as a force of good if someone puts in the appropriate time and effort; when used _properly_ by a careful, knowledgeable translator who gives ample time for proofreading, articles created with it should be virtually indistinguishable from any other article. It is my thought that the huge problem here is lack of engagement with communities. Essentially, Google swooped down and started dropping large amounts of poor quality content on our projects without engaging the people from those communities. The people in Google's contest also didn't engage the communities, nor did they respond to requests to improve their content. -m. On Wed, Jul 28, 2010 at 7:18 AM, Ziko van Dijk zvand...@googlemail.com wrote: 2010/7/28 Nathan nawr...@gmail.com: Just to be sure I understand... It's good that you ask, indeed. :-) No, it's not about free software, and the Wikimedians are not too snobby or lazy to correct poor language. That is what I frequently do in de.WP and eo.WP, and I suppose Ragib and many others as well. The point is: The machine translated articles are often so bad that I simply don't understand them. I *cannot* correct them, because I don't know what they are saying. 
Kind regards Ziko What's happening here is that human beings, using a software tool, are translating articles from the English Wikipedia into a variety of other languages and posting them on the comparatively small Wikipedia projects in these languages. The articles, of unknown intrinsic quality, are usually mid to low quality translations. In the projects with an active community, some have rejected these articles because they are not high quality and because the community refuses to be responsible for fixing punctuation and other errors made by editors who are not members of the community. In the projects without an active community, Wikimedians (who may not speak any of the languages affected by the Google initiative) are objecting for a variety of other reasons - because the software used to assist translation isn't free, because the effort is managed by a commercial organization or because the endeavor wasn't cleared with the Wikimedia community first. Some are also concerned that these new articles will somehow deter new editors from becoming involved, despite clear evidence that a larger base of content attracts more readers, and more readers plus imperfect content leads to more editors. What I find interesting is that few seem to be interested in keeping or improving the translated articles; Google's attempt to provide content in under-served languages is actually offending Wikimedians, despite our ostensible commitment to the same goal. Concerns like bureaucratic pre-approval, using free software, etc. are somehow more important than reaching more people with more content. It all seems strange and un-Wikimedian like to me. Obviously there are things Google should have done differently. Maybe working with them to improve their process should be the focus here? 
-- Ziko van Dijk Niederlande
Re: [Foundation-l] Push translation
Yes, of course if it's not actually reviewed and corrected by a human it's going to be bad. What I said was that if it's used as it was meant to be used, the results should be indistinguishable from a normal human translation, regardless of the language involved, because all mistakes would be fixed by a person. People often neglect to do that, but that doesn't make the tool inherently evil. -m.

On Tue, Jul 27, 2010 at 2:35 PM, Aphaia aph...@gmail.com wrote: Ah, I omitted the T, and I meant Toolkit. A toolkit with garbage could still be called a toolkit, but that doesn't change the fact that it is useless; it cannot deal with syntax properly, i.e. conjugation etc., at this moment. "Intended to be reviewed and corrected by a human" doesn't assure it was really reviewed and corrected by a human to a sufficient extent. It could be enough for your target language, but not for mine. Thanks.

On Wed, Jul 28, 2010 at 5:15 AM, Casey Brown li...@caseybrown.org wrote: On Tue, Jul 27, 2010 at 3:44 PM, Mark Williamson node...@gmail.com wrote: Aphaia, Shiju Alex and I are referring to Google Translator Toolkit, not Google Translate. If the person using the Toolkit uses it as it was _meant_ to be used, the results should be as good as a human translation because they've been reviewed and corrected by a human.

But if the program were being used by a human who speaks the language, wouldn't it be *pull* translation and not *push* translation?

-- Casey Brown Cbrown1023

-- KIZU Naoko http://d.hatena.ne.jp/Britty (in Japanese) Quote of the Day (English): http://en.wikiquote.org/wiki/WQ:QOTD
Re: [Foundation-l] Is Google translation is good for Wikipedias?
Mark Williamson:

GTTK can be used as a force of good if someone puts in the appropriate time and effort; when used _properly_ by a careful, knowledgeable

It is my thought that the huge problem here is lack of engagement with communities. Essentially, Google swooped down and started dropping

Agreed. Again, in my experience it is quicker and delivers more quality to translate on your own. If others have different experiences (it may depend on the language), okay. It seems that something went very wrong when telling people how to contribute to a Wikipedia language version. Could you report more about that, Mark?

Kind regards
Ziko

-- Ziko van Dijk Niederlande
Re: [Foundation-l] Push translation
Is anyone from Google reading this thread? Because of this thread I tried to play with the Google Translator Toolkit a little and found some technical problems. When I tried to send bug reports about them through the "Contact us" form, I received after a few minutes a bounce message from the translation-editor-supp...@google.com address. I love reporting bugs, and developers are supposed to love reading them, but it looks like I'm stuck here...

2010/7/27 Mark Williamson node...@gmail.com: Aphaia, Shiju Alex and I are referring to Google Translator Toolkit, not Google Translate. If the person using the Toolkit uses it as it was _meant_ to be used, the results should be as good as a human translation because they've been reviewed and corrected by a human.

-- אָמִיר אֱלִישָׁע אַהֲרוֹנִי Amir Elisha Aharoni http://aharoni.wordpress.com "We're living in pieces, I want to live in peace." - T. Moore
[Foundation-l] I didn't know we're on the BBC!
I've just discovered that the BBC's music site [1] is using our content for their biographies of musicians/bands [2]. This makes me happy. [1] http://www.bbc.co.uk/music/ [2] http://www.bbc.co.uk/music/faqs#why_is_the_bbc_using_wikipedia
Re: [Foundation-l] Is Google translation is good for Wikipedias?
On Wednesday 28 July 2010 09:19 PM, Pedro Sanchez wrote: Neither Google nor the WMF is creating articles automatically via machine translation. Google is not pushing translated articles. The Toolkit is a page where you can see a (sometimes not good) translation, and you (if you want to) are able to complete or fix it. When you believe it is complete, you upload it to wikipedia, just like you would upload a fully manual translation when you consider it complete.

Unfortunately, that is not what is happening. It looks like somebody is hiring people, and they are creating a database of words. :(
Re: [Foundation-l] Is Google translation is good for Wikipedias?
On Wed, Jul 28, 2010 at 1:40 PM, praveenp me.prav...@gmail.com wrote: On Wednesday 28 July 2010 09:19 PM, Pedro Sanchez wrote: Neither Google nor the WMF is creating articles automatically via machine translation. Google is not pushing translated articles. The Toolkit is a page where you can see a (sometimes not good) translation, and you (if you want to) are able to complete or fix it. When you believe it is complete, you upload it to wikipedia, just like you would upload a fully manual translation when you consider it complete.

Unfortunately, that is not what is happening. It looks like somebody is hiring people, and they are creating a database of words. :(

From my experience in Bengali wikipedia, many GTT-assisted edits are unsalvageable. This is not a fault of GTT per se, but rather a fault of the model Google followed here. Of course GTT does not provide a translation magically, but the translators hired by Google did an awful job on the first draft of the translation, and never fixed it.

If you show a volunteer a one-paragraph stub with problems, they are happy to go and fix it. But when you bring a 100 KB full article where every sentence needs fixing, the volunteers just give up. Even seasoned wikipedians are not willing to devote several hours to a complete rewrite of the article ... a manual translation from scratch takes much less time.

Of course, last week, one of the translators came back with a much better version of an article, and we allowed the translator to create it in the user space. If the translation passes the community's standards, we will move it to the main namespace. So, we are not completely blocking/banning such paid translations; rather, we banned bad, unfixed, unreadable translations and translators who were not willing to fix their problems.

-- Ragib
Re: [Foundation-l] I didn't know we're on the BBC!
On Wed, Jul 28, 2010 at 10:01 AM, Bod Notbod bodnot...@gmail.com wrote: I've just discovered that the BBC's music site [1] is using our content for their biographies of musicians/bands [2]. This makes me happy. [1] http://www.bbc.co.uk/music/ [2] http://www.bbc.co.uk/music/faqs#why_is_the_bbc_using_wikipedia Marvelous! They appear to be using just the lead section from our articles, and then from Musicbrainz.org they're pulling the metadata (band members, collaborations) and external links list. Examples (they use the same unfortunate url scheme as musicbrainz...) The Beatles http://www.bbc.co.uk/music/artists/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d Ludwig van Beethoven http://www.bbc.co.uk/music/artists/1f9df192-a621-4f54-8850-2c5373b7eac9 Portishead http://www.bbc.co.uk/music/artists/8f6bd1e4-fbe1-4f50-aa9b-94c450ec0f11 - It's not fully automated. They're missing a few obscure bands (that we and musicbrainz have the relevant content for) such as Lullatone. Also they're occasionally missing Wikipedia's content when we've disambiguated the page name, such as Solex http://www.bbc.co.uk/music/artists/e064f6f6-76a8-4efe-a94b-09bec8942347 http://en.wikipedia.org/wiki/Solex_%28musician%29 On the other hand, they have pages for artists that we don't cover yet, such as Ogurusu Norihide. - Relatedly, all their album reviews (including items from c.2002) seem to be released under CC BY-NC-SA - Separately, I tried to send the BBC this bug report, but the webform refused to send because is not plain text... Gah! Maybe someone here can pass it along to their webmaster? Broken link in FAQ: The link here: http://www.bbc.co.uk/music/faqs#what_happens_if_wikipedias_vandalised currently points to: http://en.wikipedia.org/wiki/Wikipedia_vandalism which is incorrect. It should point to: http://en.wikipedia.org/wiki/Wikipedia:Vandalism Thanks for the pointer, Bod. 
Quiddity
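The remark above about the shared URL scheme can be sketched as follows. This assumes, beyond what the message states, that the last path segment of a BBC artist URL is a MusicBrainz identifier in UUID form and that MusicBrainz artist pages live under `https://musicbrainz.org/artist/`; the function names are hypothetical.

```python
import uuid
from urllib.parse import urlparse

# Assumption: MusicBrainz artist pages are addressed by the same identifier.
MUSICBRAINZ_ARTIST_BASE = "https://musicbrainz.org/artist/"


def extract_mbid(bbc_url: str) -> str:
    """Return the UUID found in the last path segment of a BBC artist URL."""
    last_segment = urlparse(bbc_url).path.rstrip("/").split("/")[-1]
    # uuid.UUID raises ValueError if the segment is not a well-formed UUID.
    return str(uuid.UUID(last_segment))


def musicbrainz_url(bbc_url: str) -> str:
    """Build the presumed MusicBrainz artist URL for a BBC artist URL."""
    return MUSICBRAINZ_ARTIST_BASE + extract_mbid(bbc_url)


# The Beatles example from the message above.
print(musicbrainz_url(
    "http://www.bbc.co.uk/music/artists/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"
))
```

The identifier itself says nothing about the artist, which is presumably why the scheme is called unfortunate: a human cannot guess the URL from the artist's name.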
Re: [Foundation-l] I didn't know we're on the BBC!
On Wed, Jul 28, 2010 at 2:51 PM, quiddity pandiculat...@gmail.com wrote: Marvelous! They appear to be using just the lead section from our articles, and then from Musicbrainz.org they're pulling the metadata (band members, collaborations) and external links list.

Did we set this up? They even link to edit, with a statement to do so if you find anything problematic with it. Very nice!

James
Re: [Foundation-l] Is Google translation is good for Wikipedias?
Fajro wrote: On Tue, Jul 27, 2010 at 8:38 PM, Ragib Hasan ragibha...@gmail.com wrote: (The tool used was Google Translation Toolkit, not Google Translate. There is a distinction between these two tools. Google Translation Toolkit (GTT) is a translation-memory based semi-manual translation tool. That is, it learns translation skills as you gradually translate articles by hand. Later, this can be used to automate translation.)

Another issue: The resulting translation memory is not free.

This is a red herring. Some real and important issues have been raised about machine translations, but this is not one of them. The fact that the source code for the translation processes is not free does not make the results of such machine translations unfree. Key to anything being copyright is that material must be original and not the result of a mechanical process. Machine translations are mechanical processes. Another person using the same software with the same text should have the same results.

It is also important that the allegedly infringing text must have been fixed in some medium. A person issuing a take-down order must show, as a necessary element of that order, where the material in question was previously published. Two identical texts by different authors need not be copies of each other. With human efforts two such identical texts are highly improbable, but this need not be the case with machine translation. Indeed, if the same software keeps producing different results I would question its reliability.

Ray
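For readers unfamiliar with the term, the translation-memory idea described in the quoted text (the tool stores segments translated by hand and offers them again later) can be sketched roughly as below. This is a toy illustration under stated assumptions, not Google Translator Toolkit's actual design: the class name `TranslationMemory`, the 0.75 threshold and the character-level similarity from Python's `difflib` are all choices made for this example, and the target strings are placeholders.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy translation memory: stores human-translated segments and
    suggests the stored translation of the most similar known source."""

    def __init__(self, threshold: float = 0.75):
        # Hypothetical cut-off below which no suggestion is offered.
        self.threshold = threshold
        self.segments = {}

    def add(self, source: str, target: str) -> None:
        """Remember a segment that a human translated by hand."""
        self.segments[source] = target

    def suggest(self, source: str):
        """Return (stored translation or None, best similarity score)."""
        best_score, best_target = 0.0, None
        for known, target in self.segments.items():
            score = SequenceMatcher(None, source, known).ratio()
            if score > best_score:
                best_score, best_target = score, target
        if best_score >= self.threshold:
            return best_target, best_score
        return None, best_score


tm = TranslationMemory()
tm.add("Wikipedia is a good encyclopedia.", "<human translation of segment 1>")

# An identical segment is reused with similarity 1.0.
print(tm.suggest("Wikipedia is a good encyclopedia."))
```

A memory like this only helps when new text resembles segments already translated, and any suggestion is only as good as the human translation that was stored in the first place.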
[Foundation-l] Pending Changes update for July 28
Hi folks,

It's been a little while since I've sent out an update (sorry about that). The Pending Changes trial continues apace, with 1,382 articles configured to use the feature as of this writing.

Most of the work on the software that powers Pending Changes is focused on refactoring and stability. Some of the performance problems associated with this feature have been fixed, and we believe we have fixed all of the user-visible performance problems. Looking at our backend systems, there are some areas where this feature is still causing more load than it should, which is where our work is focused now. Aaron Schulz, who has done the lion's share of the development to date (thanks Aaron!), continues to stay involved, but at a much reduced level as he focuses on non-Wikimedia stuff, while Chad Horohoe ramps up.

We'll be publishing some statistics soon which outline per-page metrics on revisions under Pending Changes. Nimish Gautam and Devin Finzer (Devin is an intern working for the Wikimedia Foundation this summer) are working on them. More discussion is here: http://en.wikipedia.org/wiki/Wikipedia_talk:Pending_changes/Metrics

It will be time for a vote soon about whether to keep Pending Changes enabled on en.wikipedia.org. We'll be pinging folks in the community about the post-trial discussion. If we're rigidly following the proposal, the trial will end on August 15, regardless of whether a vote has happened. However, we're probably already running late for making a decision by then. For a variety of operational reasons, we plan to leave the feature running while the community decides whether to keep the feature on, assuming that process lasts no more than a month or so after August 15.

The main discussion area for this feature is here: http://en.wikipedia.org/wiki/Wikipedia:Pending_changes/Feedback If you have comments/suggestions/questions, that's a good place to post them.

Rob
Re: [Foundation-l] Pending Changes update for July 28
On Wed, Jul 28, 2010 at 1:43 PM, Rob Lanphier ro...@wikimedia.org wrote: For a variety of operational reasons, we plan to leave the feature running while the community decides whether to keep the feature on, assuming that process lasts no more than a month or so after August 15. Wanted to expand on this point a bit. The justification is that turning off Pending Changes is quite a bit of work and would clutter the logs (all of the Pending Changes pages would be Semi-Protected so they're not all immediate targets for vandalism). It would be a lot of work with no net benefit if the community decides to keep the feature on. This applies to both the operations staff and the community (they would have to mark everything with pending changes again after it was reactivated). If the community decides to not keep the feature, a little extra time of leaving it on during the discussion period wouldn't hurt--and would give people a chance to do some last-minute evaluation if they're on the fence. -Chad
[Foundation-l] A prerequisite for the neutral, notable sum of all human knowledge
The WMF mission is to provide free knowledge to the world. Wikipedia, in particular, hopes to summarize all notable topics into a neutral sum. Accomplishing this goal means Wikipedia and the WMF will have to evolve. Consider the implications of the mission: Every single work that contains notable topics must have complete coverage in Wikipedia. While every article need not cite every work, every article must accurately summarize every notable opinion of every notable topic in every work. Some have interpreted the role of the proposed citations project as one of merely centralizing the citations that already exist in Wikipedia. The mission, however, calls for a broader vision. This new project should have a bibliography of all works since that is the scope of the mission. The nature of knowledge further calls for us to understand the links between items containing knowledge, their categorical context and their abstract relationships. This broad, unambiguous view of works and their topics will allow us to explicate them neutrally and select only the most notable ones for inclusion. It will, in the limit of time, prevent our judgment from being clouded by the limited, local view of knowledge that we currently have. The proposed new project has the following features: It is a bibliography of all kinds of works that fall under the umbrella of the WMF mission. Works and collections of works contain disambiguating user-contributed text and media. Works can link to other works. Works come together to form categories. People can use this site as their personal bibliography, encouraging participation of a much greater community of users and curation of the bibliography by them. There are many challenges to creating a project of such scale, but in order to accomplish our goals of freeing knowledge we must strive to collect it and understand it in a more nuanced way than we currently do.
Brian Mingus Graduate student Computational Cognitive Neuroscience Lab University of Colorado at Boulder
Re: [Foundation-l] Push translation
Google is, in my experience, very difficult for regular people to get in touch with. Sometimes, when a product is in beta, they give you a way to contact them. They used to have an e-mail address you could write to if you had information about bilingual corpora, so they could use it to improve Google Translate (I found one such corpus online from the Nunavut parliament for English and Inuktitut, but now it looks like they've removed the address). I think they intentionally have a relatively small support staff. I read somewhere that that had turned out to be a huge problem for the mobile phone they produced - people might not expect great support for a huge website like Google, but when they buy electronics, they certainly do expect to have someone they can call and talk to within 24 hours. I don't think that's completely unwise, though. I'm sure they get tons of crackpot e-mails all the time. I was reading an official blog about Google Translate, and in the post about their Wikipedia contests, someone wrote an angry comment that Google must hate Spain because the Spanish language wasn't mentioned in that particular post. Now multiply that by millions, and that is part of the reason (or so I imagine) that Google makes it difficult to contact them. -m. On Wed, Jul 28, 2010 at 9:14 AM, Amir E. Aharoni amir.ahar...@mail.huji.ac.il wrote: Is anyone from Google reading this thread? Because of this thread I tried to play with the Google Translator Toolkit a little and found some technical problems. When I tried to send bug reports about them through the Contact us form, I received a bounce message after a few minutes from the translation-editor-supp...@google.com address. I love reporting bugs, and developers are supposed to love reading them, but it looks like I'm stuck here... 2010/7/27 Mark Williamson node...@gmail.com Aphaia, Shiju Alex and I are referring to Google Translator Toolkit, not Google Translate.
If the person using the Toolkit uses it as it was _meant_ to be used, the results should be as good as a human translation because they've been reviewed and corrected by a human. -- אָמִיר אֱלִישָׁע אַהֲרוֹנִי Amir Elisha Aharoni http://aharoni.wordpress.com We're living in pieces, I want to live in peace. - T. Moore
Re: [Foundation-l] Is Google translation is good for Wikipedias?
Well, my impression, and I'm by no means an expert in this (I'm not associated with Google), is that they emphasized quantity over quality and forgot to mention the importance of community to our projects. I heard that for the Swahili Wikipedia contest at least, they gave away prizes... but perhaps they should've included a requirement that the articles they created be rated as good by the community, not full of errors and nonsense sentences, and that all project participants who want any chance at winning must respond to all talkpage messages within 72 hours (or something like that). I think telling a group of newbies that they'll get a big prize if they translate the most articles is a recipe for disaster. What incentive do they have to make sure their translation is of good quality? What incentive do they have to stick around afterwards? -m. On Wed, Jul 28, 2010 at 9:10 AM, Ziko van Dijk zvand...@googlemail.com wrote: Mark Williamson: GTTK can be used as a force of good if someone puts in the appropriate time and effort; when used _properly_ by a careful, knowledgeable It is my thought that the huge problem here is lack of engagement with communities. Essentially, Google swooped down and started dropping Agreed. Again, in my experience it is quicker and delivers more quality to translate on your own. If others have different experiences (it may depend on the language), okay. It seems that something went very wrong when telling people how to contribute to a Wikipedia language version. Could you report more about that, Mark? Kind regards Ziko -- Ziko van Dijk Niederlande
Re: [Foundation-l] Is Google translation is good for Wikipedias?
I'm not sure that's exactly the question. Rather, by using GTTK, people are contributing to building [[Translation memory]] for Google, which they can in turn use to build their statistical models. It's not that we're using non-free software, but rather that we're contributing to it. -m. On Wed, Jul 28, 2010 at 1:09 PM, Ray Saintonge sainto...@telus.net wrote: Fajro wrote: On Tue, Jul 27, 2010 at 8:38 PM, Ragib Hasan ragibha...@gmail.com wrote: (The tool used was Google Translation Toolkit. (not Google Translate). There is a distinction between these two tools. Google Translation Toolkit (GTT) is a translation-memory based semi-manual translation tool. That is, it learns translation skills as you gradually translate articles by hand. Later, this can be used to automate translation.) Another issue: The resulting translation memory is not free. This is a red herring. Some real and important issues have been raised about machine translations, but this is not one of them. The fact that the source code for the translation processes is not free does not make the results of such machine translations unfree. Key to anything being copyrightable is that material must be original and not the result of a mechanical process. Machine translations are mechanical processes. Another person using the same software with the same text should have the same results. It is also important that the allegedly infringing text must have been fixed in some medium. A person issuing a take down order must show, as a necessary element of that order, where the material in question was previously published. Two identical texts by different authors need not be copies of each other. With human efforts two such identical texts are highly improbable, but this need not be the case with machine translation. Indeed if the same software keeps producing different results I would question its reliability.
Ray