Re: [Wikimedia-l] machine translation

Pharos Tue, 02 May 2017 13:17:44 -0700

I think it all depends on the level of engagement of the human translator.

When the tool is used in the right way, it is a fantastic tool.


Maybe we can find better methods to nudge people toward taking their time
and really doing work on their translations.

Thanks,
Pharos

On Tue, May 2, 2017 at 4:09 PM, Bodhisattwa Mandal <
bodhisattwa.rg...@gmail.com> wrote:

> Content translation with Yandex is also a problem in Bengali Wikipedia.
> Some users have developed a tendency to create meaningless machine-translated
> articles with this extension to increase their edit count and article count.
> This has increased the workload of admins, who must find and delete those
> articles.
>
> Yandex is not ready for many languages and it is better to shut it off. We
> don't need it in Bengali.
>
> Regards
> On May 3, 2017 12:17 AM, "John Erling Blad" <jeb...@gmail.com> wrote:
>
> > Actually this _is_ about turning ContentTranslation off, that is what
> > several users in the community want. They block people using the
> > extension and delete the translated articles. Use of ContentTranslation
> > has become a rather contentious case.
> >
> > Yandex as a general translation engine to be able to read some alien
> > language is quite good, but as an engine to produce written text it is
> > not very good at all. In fact it often creates quite horrible Norwegian,
> > even for closely related languages. One quite common problem is
> > reordering of words into meaningless constructs; another problem is
> > reordering lexical gender in weird ways. The English article "a" is often
> > translated as "en" in a prepositional phrase, and then the gender is
> > added to the following phrase. That gives a translation of "Oppland is a
> > county in…" into something like "Oppland er en fylket i…" This should be
> > "Oppland er et fylke i…".
> >
> > (I just checked and it seems like Yandex messes up a lot less now than
> > previously, but it is still pretty bad.)
> >
> > Apertium works because the languages are closely related, Yandex does not
> > work because it is used between very different languages. People try to
> > use Yandex and get disappointed, and falsely conclude that all language
> > translations are equally weird. They are not, but Yandex translations are
> > weird.
> >
> > The numerical threshold does not work. The reason is simple: the number
> > of fixes depends on the language constructs that fail, and that is simply
> > not a constant for small text fragments. Perhaps we could flag specific
> > language constructs that are known to give a high percentage of failures,
> > and require the translator to check those sentences. One such language
> > construct is a discrepancy between the article and the gender of the
> > following term in a prepositional phrase. If they are not similar, then
> > the sentence must be checked. It is not always wrong to write "en jenta"
> > in Norwegian, but it is likely to be wrong.
> >
> > A language model could be a statistical model for the language itself,
> > not for the translation into that language. We don't want a perfect
> > language model, but a sufficient language model to mark weird constructs.
> > A very simple solution could simply be to mark tri-grams that do not
> > already exist in the text base for the destination as possible errors. It
> > is not necessary to do a live check, but at least do it before the page
> > can be saved.
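
The tri-gram idea quoted just above can be sketched in a few lines. This is only an illustration under stated assumptions, not anything that exists in ContentTranslation: the function names (tokenize, build_model, unseen_trigrams) and the two-sentence corpus are made up for the example, and a real text base would be a dump of the destination wiki.

```python
import re

def tokenize(text):
    # Lowercased word tokens; \w also matches non-ASCII letters in Python 3.
    return re.findall(r"\w+", text.lower())

def trigrams(tokens):
    # All consecutive three-word windows, in order.
    return list(zip(tokens, tokens[1:], tokens[2:]))

def build_model(corpus_texts):
    # The "model" here is just the set of tri-grams seen in the text base.
    seen = set()
    for text in corpus_texts:
        seen.update(trigrams(tokenize(text)))
    return seen

def unseen_trigrams(model, translation):
    # Tri-grams of the translation that never occur in the text base.
    return [t for t in trigrams(tokenize(translation)) if t not in model]

# Toy text base, invented for this sketch.
corpus = [
    "Oppland er et fylke i Norge.",
    "Hedmark er et fylke i Norge.",
]
model = build_model(corpus)

# The faulty construct from the message above is flagged as a possible error.
flagged = unseen_trigrams(model, "Oppland er en fylket i Norge.")
for t in flagged:
    print("possible error:", " ".join(t))
```

With a text base this small almost everything unseen gets flagged, so in practice the check would only be useful against a large corpus, and the flags would be hints for the translator rather than proof of an error.
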
> >
> > Note the difference in what Yandex does and what we want to achieve.
> > Yandex translates a text between two different languages, without any
> > clear reason why. It is not too important if there are weird constructs
> > in the text, as long as it is usable in "some" context. We translate a
> > text for the purpose of republishing it. The text should be usable and
> > easily readable in that language.
> >
> >
> >
> > On Tue, May 2, 2017 at 7:07 PM, Amir E. Aharoni <
> > amir.ahar...@mail.huji.ac.il> wrote:
> >
> > > 2017-05-02 18:20 GMT+03:00 John Erling Blad <jeb...@gmail.com>:
> > >
> > > > Brute force solution; turn the ContentTranslation off. Really stupid
> > > > solution.
> > >
> > >
> > > ... Then I guess you don't mind that I'm changing the thread name :)
> > >
> > >
> > > > The next solution; turn the Yandex engine off. That would solve a
> > > > part of the problem. Kind of lousy solution though.
> > > >
> > >
> > > > What about adding a language model that warns when the language
> > > > constructs get too weird? It is like a "test" for the translation.
> > > > The CT is used for creating a translation, but the language model is
> > > > used for verifying whether the translation is good enough. If it does
> > > > not validate against the language model it should simply not be
> > > > published to the main name space. It will still be possible to create
> > > > a draft, but then the user is completely aware that the translation
> > > > isn't good enough.
> > > >
> > > > Such a language model should be available as a test for any article,
> > > > as it can be used as a quality measure for the article. It is really
> > > > a quantity measure for the well-spokenness of the article, but that
> > > > isn't quite so intuitive.
> > > >
> > >
> > > So, I'll allow myself to guess that you are talking about one
> > > particular language, probably Norwegian.
> > >
> > > Several technical facts:
> > >
> > > 1. In the past there were several cases in which translators to
> > > different languages reported common translation mistakes to me. I
> > > passed them on to Yandex developers, with whom I communicate quite
> > > regularly. They acknowledged receiving all of them. I am aware of at
> > > least one such common mistake that was fixed; possibly there were more.
> > > If you can give me a list of such mistakes for Norwegian, I'll be very
> > > happy to pass them on. I absolutely cannot promise that they will be
> > > fixed upstream, but it's possible.
> > >
> > > 2. In Norwegian, Apertium is used for translating between the two
> > > varieties of Norwegian itself (Bokmål and Nynorsk), and from other
> > > Scandinavian languages. That's probably why it works so well: they are
> > > similar in grammar, vocabulary, and narrative style (I'll pass it on to
> > > Apertium developers; I'm sure they'll be happy to hear it).
> > > Unfortunately, machine translation from English is not available in
> > > Apertium. Apertium works best with very similar languages, and English
> > > has two characteristics, which are unfortunate when combined: it is
> > > both the most popular source for translation into almost all other
> > > languages (including Norwegian), and it is not _very_ similar to any
> > > other languages (except maybe Scots). Machine translation from English
> > > into Norwegian is only possible with Yandex at the moment. More engines
> > > may be added in the future, but at the moment that's all we have.
> > > That's why disabling Yandex completely would indeed be a lousy
> > > solution: A lot of people say that without machine translation
> > > integration Content Translation is useless. Not all users think like
> > > that, but many do.
> > >
> > > 3. We can define a numerical threshold for the acceptable percentage of
> > > machine translation post-editing. Currently it's 75%. It's a tad
> > > embarrassing that it's hard-coded at the moment, but it can very easily
> > > be made into a variable per language. If the translator tries to
> > > publish a page in which less than that is modified, a warning will be
> > > shown.
> > >
> > > 4. I'm not sure what you mean by "language model". If it's any kind of
> > > a linguistic engine, then it's definitely not within the resources that
> > > the Language team itself can currently dedicate. However, if somebody
> > > who knows Norwegian and some programming will write a script that
> > > analyzes common bad constructs in a Wikipedia dump, this will be very
> > > useful. This would basically be an upgraded version of suggestion #1
> > > above. (In my spare time as a volunteer I'm doing something comparable
> > > for Hebrew, although not for translation, but for improving how
> > > MediaWiki link trails work.)
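
As a rough illustration of the per-language threshold described in item 3 above, here is a sketch. It does not reflect how ContentTranslation actually computes the figure: the THRESHOLDS table, the function names, and the word-level comparison via difflib are all assumptions made for the example, and the warning condition simply follows the wording of item 3 (warn when less than the threshold is modified).

```python
from difflib import SequenceMatcher

# Hypothetical configuration: one threshold per language code instead of a
# single hard-coded value. 0.75 mirrors the 75% figure quoted in item 3.
DEFAULT_THRESHOLD = 0.75
THRESHOLDS = {"nb": 0.75, "bn": 0.75}

def modified_share(machine_text, edited_text):
    # Word-level similarity between the raw machine output and the text the
    # translator wants to publish; 1 - similarity approximates the share
    # that was post-edited. This measure is an assumption of the sketch.
    mt_words = machine_text.split()
    edited_words = edited_text.split()
    matcher = SequenceMatcher(None, mt_words, edited_words, autojunk=False)
    return 1.0 - matcher.ratio()

def needs_warning(lang, machine_text, edited_text):
    # Warn when less than the configured share has been modified.
    threshold = THRESHOLDS.get(lang, DEFAULT_THRESHOLD)
    return modified_share(machine_text, edited_text) < threshold

machine = "Oppland er en fylket i"
edited = "Oppland er et fylke i"
print(modified_share(machine, edited))
print(needs_warning("nb", machine, edited))
```

The example pair also shows the objection raised earlier in the thread: fixing the one broken construct changes only a small share of a short fragment, so a fixed percentage says little about whether the important errors were corrected.
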
> > > _______________________________________________
> > > Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/
> > > wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/
> > > wiki/Wikimedia-l
> > > New messages to: Wikimedia-l@lists.wikimedia.org
> > > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > > <mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>