Re: [Foundation-l] Is Google translation is good for Wikipedias?

2010-07-28 Thread Ragib Hasan
On Wed, Jul 28, 2010 at 1:40 PM, praveenp me.prav...@gmail.com wrote:
 On Wednesday 28 July 2010 09:19 PM, Pedro Sanchez wrote:

 Neither Google nor the WMF is creating articles automatically via machine
 translations.
 Google is not pushing translated articles.

 Toolkit is a page where you can see a (sometimes not good)
 translation, and you (if you want to) are able to complete or fix it.

 When you believe it is complete, you upload it to wikipedia, just like
 you would upload a fully manual translation when you consider it's
 complete.

 Unfortunately, that is not what is happening. It looks like
 somebody is hiring people, and they are creating a database of words. :(




From my experience in Bengali wikipedia, many GTT-assisted edits are
unsalvageable. This is not a fault of GTT per se, but rather a fault
of the model Google followed here. Of course GTT does not provide a
translation magically, but the translators hired by Google did an
awful job of the first draft of translation, and never fixed that. If
you show a volunteer a 1 para stub with problems, they are happy to go
and fix it. But when you bring a 100 KB full article where every
sentence needs fixing, the volunteers just give up. Even seasoned
wikipedians are not willing to devote several hours to a complete
rewrite of the article ... a manual translation from scratch takes
much less time.

Of course, last week, one of the translators came back with a much
better version of an article, and we allowed the translator to create
it in the user space. If the translation passes the community's
standards, we will move it to the main namespace. So, we are not
completely blocking/banning such paid translations; rather, we banned
bad, unfixed, unreadable translations and translators who were not
willing to fix their problems.

--
Ragib



Re: [Foundation-l] Is Google translation is good for Wikipedias?

2010-07-27 Thread Ragib Hasan
As an admin in Bengali wikipedia, I had to deal with this issue a lot
(some of which was discussed in the Telegraph (India) newspaper
article). But I'd like to elaborate on our stance here:

(The tool used was Google Translation Toolkit, not Google Translate.
There is a distinction between these two tools. Google Translation
Toolkit (GTT) is a translation-memory based semi-manual translation
tool. That is, it learns translation skills as you gradually translate
articles by hand. Later, this can be used to automate translation.)

Issues:
1. Community involvement: First of all, the local community was not at
all involved or informed about this project. All of a sudden, we found
new users signing up, dropping a large article on a random topic, and
moving on. These users never responded to any talk page messages, so
we first assumed these were just random users experimenting with
wikipedia.

Even now, no one from Google has contacted us in Bengali wikipedia or
informed us about Google's intentions. This is not a problem by itself,
but see the following points.

2. Translation quality: The quality of the translations was awful. The
translations added to Bengali wikipedia were artificial, dry, and used
obscure words and phrases. It looked as if a non-native speaker sat
down with a dictionary in hand, and mechanically translated each
sentence word by word. That led to sentences which are hard to
understand, or downright nonsensical.

The articles were half-done. Numerals were not translated at all. The
punctuation symbol for Bengali language (the danda symbol: । ) was
not used. (Apparently, GTT and/or the Google transliteration tool does
not support that.)

The articles were also full of spelling mistakes. The paid translator
misspelled many simple words, or even used different spellings for the
same word in different parts of the article.

Finally, different languages have different sentence structures.
Sometimes, a complex sentence is better expressed if broken up into two
sentences in another language. We found that the translators simply
translated sentences preserving their English language structure. This
made the resulting Bengali sentences awkward and artificial to read.
For example, we do not write "If x then y" in Bengali just by
replacing "if" and "then" with the corresponding Bengali words. But the
translators did that; apparently this is an artifact of using GTT.


3. Lack of follow up: When we found the above problems, naturally, we
asked the contributor to fix them. Got no reply. It is NOT the task of
volunteers to clean up the mess after the one-night-standish paid
translators. Given the small number of volunteers active at any given
moment, it will take enormous effort on our part to go through these
articles and fix the punctuation, spelling, and grammar issues. Not to
mention the awkward language style used by the translators.

So, after getting a cold shoulder from the paid translators about
fixing their mess, we had to ban such edits outright. We didn't know
who was behind this, until the Wikimania talk from Google. Not that it
matters ... even now, we won't allow these half done and badly
translated articles on bengali wikipedia.

Bengali wikipedia is small (21k articles), but we do not want to
populate it overnight with badly translated content, some of which
won't even qualify as grammatically correct Bengali. While wikipedia
may be a perpetual work in progress, that does not mean we need to be
guinea-pigs of some careless experiments. So, our stance is, "Thanks,
but NO Thanks!" Unless, of course, they can put enough commitment
into the translations and fix mistakes.

We welcome automation in translation, but not at the expense of
introducing incorrect and messy content on wikipedia. We'd rather stay
small and hand-crafted than allow an experimental tool and unskilled
paid translators to create a big mess.


Thanks

Ragib (User:Ragib on en and bn)

--
Ragib Hasan, Ph.D
NSF Computing Innovation Fellow and
Assistant Research Scientist

Dept of Computer Science
Johns Hopkins University
3400 N Charles Street
Baltimore, MD 21218

Website:
http://www.ragibhasan.com




On Sun, Jul 25, 2010 at 2:12 AM, Shiju Alex shijualexonl...@gmail.com wrote:
 Hello All,

 Recently there have been a lot of discussions (on this list also) regarding the
 translation project by Google for some of the big language wikipedias. The
 Foundation also seems to have approved the efforts of Google. But I am not sure
 whether anyone is interested in consulting the respective language communities
 to learn their views.

 As far as I know only Tamil, Bengali, and Swahili Wikipedians have raised
 their concerns about Google's project. But does this mean that other
 communities are happy about Google efforts? If there is no active community
 in a wikipedia how can we expect response from communities? If there is no
 response from a community, does that mean that Google can hire some native
 speakers and use machine translation to create articles