I think it all depends on the level of engagement of the human translator. When the tool is used in the right way, it is a fantastic tool.
Maybe we can find better methods to nudge people toward taking their time and really doing work on their translations.

Thanks,
Pharos

On Tue, May 2, 2017 at 4:09 PM, Bodhisattwa Mandal <bodhisattwa.rg...@gmail.com> wrote:

> Content translation with Yandex is also a problem in Bengali Wikipedia.
> Some users have developed a habit of creating meaningless
> machine-translated articles with this extension to increase their edit
> count and article count. This has increased the workload of admins, who
> have to find and delete those articles.
>
> Yandex is not ready for many languages and it is better to shut it off.
> We don't need it in Bengali.
>
> Regards
>
> On May 3, 2017 12:17 AM, "John Erling Blad" <jeb...@gmail.com> wrote:
>
> > Actually this _is_ about turning ContentTranslation off; that is what
> > several users in the community want. They block people using the
> > extension and delete the translated articles. Use of ContentTranslation
> > has become a rather contentious case.
> >
> > Yandex is quite good as a general translation engine for reading some
> > alien language, but as an engine for producing written text it is not
> > very good at all. In fact it often creates quite horrible Norwegian,
> > even for closely related languages. One quite common problem is
> > reordering of words into meaningless constructs; another problem is
> > rearranging lexical gender in weird ways. The English article "a" is
> > often translated as "en" in a noun phrase, and then the gender is
> > added to the following term. That gives a translation of "Oppland is a
> > county in…" into something like "Oppland er en fylket i…". This should
> > be "Oppland er et fylke i…".
> >
> > (I just checked and it seems like Yandex messes up a lot less now than
> > previously, but it is still pretty bad.)
> >
> > Apertium works because the languages are closely related; Yandex does
> > not work because it is used between very different languages.
> > People try to use Yandex and get disappointed, and falsely conclude
> > that all language translations are equally weird. They are not, but
> > Yandex translations are weird.
> >
> > The numerical threshold does not work. The reason is simple: the
> > number of fixes depends on which language constructs fail, and that is
> > simply not a constant for small text fragments. Perhaps we could flag
> > specific language constructs that are known to give a high percentage
> > of failures, and require the translator to check those sentences. One
> > such language construct is a mismatch between the article and the
> > gender of the following term in a noun phrase. If they do not agree,
> > then the sentence must be checked. It is not always wrong to write
> > "en jenta" in Norwegian, but it is likely to be wrong.
> >
> > A language model could be a statistical model for the language itself,
> > not for the translation into that language. We don't want a perfect
> > language model, but a language model sufficient to mark weird
> > constructs. A very simple solution could be to mark tri-grams that do
> > not already exist in the text base for the destination language as
> > possible errors. It is not necessary to do a live check, but at least
> > do it before the page can be saved.
> >
> > Note the difference between what Yandex does and what we want to
> > achieve: Yandex translates a text between two different languages,
> > without any clear reason why. It is not too important if there are
> > weird constructs in the text, as long as it is usable in "some"
> > context. We translate a text for the purpose of republishing it. The
> > text should be usable and easily readable in that language.
> >
> > On Tue, May 2, 2017 at 7:07 PM, Amir E. Aharoni
> > <amir.ahar...@mail.huji.ac.il> wrote:
> >
> > > 2017-05-02 18:20 GMT+03:00 John Erling Blad <jeb...@gmail.com>:
> > >
> > > > Brute force solution; turn the ContentTranslation off.
> > > > Really stupid solution.
> > >
> > > ... Then I guess you don't mind that I'm changing the thread name :)
> > >
> > > > The next solution; turn the Yandex engine off. That would solve a
> > > > part of the problem. Kind of lousy solution though.
> > > >
> > > > What about adding a language model that warns when the language
> > > > constructs get too weird? It is like a "test" for the translation.
> > > > The CT is used for creating a translation, but the language model
> > > > is used for verifying if the translation is good enough. If it does
> > > > not validate against the language model it should simply not be
> > > > published to the main name space. It will still be possible to
> > > > create a draft, but then the user is completely aware that the
> > > > translation isn't good enough.
> > > >
> > > > Such a language model should be available as a test for any
> > > > article, as it can be used as a quality measure for the article. It
> > > > is really a quantity measure for the well-spokenness of the
> > > > article, but that isn't quite so intuitive.
> > >
> > > So, I'll allow myself to guess that you are talking about one
> > > particular language, probably Norwegian.
> > >
> > > Several technical facts:
> > >
> > > 1. In the past there were several cases in which translators to
> > > different languages reported common translation mistakes to me. I
> > > passed them on to Yandex developers, with whom I communicate quite
> > > regularly. They acknowledged receiving all of them. I am aware of at
> > > least one such common mistake that was fixed; possibly there were
> > > more. If you can give me a list of such mistakes for Norwegian, I'll
> > > be very happy to pass them on. I absolutely cannot promise that they
> > > will be fixed upstream, but it's possible.
> > >
> > > 2. In Norwegian, Apertium is used for translating between the two
> > > varieties of Norwegian itself (Bokmål and Nynorsk), and from other
> > > Scandinavian languages. That's probably why it works so well: they
> > > are similar in grammar, vocabulary, and narrative style (I'll pass it
> > > on to Apertium developers; I'm sure they'll be happy to hear it).
> > > Unfortunately, machine translation from English is not available in
> > > Apertium. Apertium works best with very similar languages, and
> > > English has two characteristics, which are unfortunate when combined:
> > > it is both the most popular source for translation into almost all
> > > other languages (including Norwegian), and it is not _very_ similar
> > > to any other languages (except maybe Scots). Machine translation from
> > > English into Norwegian is only possible with Yandex at the moment.
> > > More engines may be added in the future, but at the moment that's all
> > > we have. That's why disabling Yandex completely would indeed be a
> > > lousy solution: A lot of people say that without machine translation
> > > integration Content Translation is useless. Not all users think like
> > > that, but many do.
> > >
> > > 3. We can define a numerical threshold of acceptable percentage of
> > > machine translation post-editing. Currently it's 75%. It's a tad
> > > embarrassing that it's hard-coded at the moment, but it can very
> > > easily be made into a variable per language. If the translator tries
> > > to publish a page in which less than that is modified, a warning will
> > > be shown.
> > >
> > > 4. I'm not sure what you mean by "language model". If it's any kind
> > > of a linguistic engine, then it's definitely not within the resources
> > > that the Language team itself can currently dedicate.
> > > However, if somebody who knows Norwegian and some programming writes
> > > a script that analyzes common bad constructs in a Wikipedia dump,
> > > this will be very useful. This would basically be an upgraded version
> > > of suggestion #1 above. (In my spare time as a volunteer I'm doing
> > > something comparable for Hebrew, although not for translation, but
> > > for improving how MediaWiki link trails work.)
> > > _______________________________________________
> > > Wikimedia-l mailing list, guidelines at:
> > > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> > > https://meta.wikimedia.org/wiki/Wikimedia-l
> > > New messages to: Wikimedia-l@lists.wikimedia.org
> > > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > > <mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>
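
For reference, the tri-gram check proposed in the quoted message from John Erling Blad (mark tri-grams that do not already exist in the text base for the destination language) might be sketched roughly as below. This is only an illustration, not anything ContentTranslation implements: the function names, the naive whitespace tokenization, and the two-sentence toy corpus are all assumptions made for the example. A real check would need a text base built from a Wikipedia dump and proper tokenization.

```python
from typing import Iterable, List, Set, Tuple

Trigram = Tuple[str, ...]


def trigrams(text: str) -> List[Trigram]:
    """Lowercase the text, split on whitespace, and return word tri-grams in order."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def build_known_trigrams(corpus: Iterable[str]) -> Set[Trigram]:
    """Collect every tri-gram that occurs in the (hypothetical) destination text base."""
    known: Set[Trigram] = set()
    for sentence in corpus:
        known.update(trigrams(sentence))
    return known


def unseen_trigrams(translation: str, known: Set[Trigram]) -> List[Trigram]:
    """Return the tri-grams of the translation that never occur in the text base."""
    return [t for t in trigrams(translation) if t not in known]


# Toy text base (assumption): two correct Norwegian sentences.
corpus = [
    "Oppland er et fylke i Norge",
    "Hedmark er et fylke i Norge",
]
known = build_known_trigrams(corpus)

# The mistranslation quoted in the thread versus the corrected sentence.
flagged_bad = unseen_trigrams("Oppland er en fylket i Norge", known)
flagged_good = unseen_trigrams("Oppland er et fylke i Norge", known)

print("possible errors:", flagged_bad)
print("possible errors:", flagged_good)
```

With this toy corpus, every tri-gram of the mistranslated sentence is flagged, while the corrected sentence passes with nothing flagged. With a realistic text base the check would be noisier, since rare but valid tri-grams would also be marked, which is why the proposal treats them only as possible errors for the translator to review.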