Hello all,

Quick reminder that we will be starting our monthly Research Showcase on
*Machine Translation on Wikipedia* in 30 minutes. Join us at
https://www.youtube.com/live/O7AqvHgqUVk.

Best,
Kinneret

On Fri, Jul 19, 2024 at 3:12 PM Kinneret Gordon <[email protected]>
wrote:

> Hi all,
>
> The next Research Showcase will be live-streamed next Wednesday, July 24,
> at 9:30 AM PDT / 16:30 UTC. Find your local time here
> <https://zonestamp.toolforge.org/1721838600>. The theme for this showcase
> is *Machine Translation on Wikipedia*.
>
> You are welcome to watch via the YouTube stream:
> https://www.youtube.com/live/O7AqvHgqUVk. As usual, you can join the
> conversation in the YouTube chat as soon as the showcase goes live.
>
> This month's presentations:
>
> The Promise and Pitfalls of AI Technology in Bridging Digital Language
> Divide
> By *Kai Zhu, Bocconi University*
>
> Machine translation technologies
> have the potential to bridge knowledge gaps across languages, promoting
> more inclusive access to information regardless of native languages. This
> study examines the impact of integrating Google Translate into Wikipedia's
> Content Translation system in January 2019. Employing a natural experiment
> design and difference-in-differences strategy, we analyze how this
> translation technology shock influenced the dynamics of content production
> and accessibility on Wikipedia across over a hundred languages. We find
> that this technology integration leads to a 149% increase in content
> production through translation, driven by existing editors becoming more
> productive as well as an expansion of the editor base. Moreover, we observe
> that machine translation enhances the propagation of biographical and
> geographical information, helping to close these knowledge gaps in the
> multilingual context. However, our findings also underscore the need for
> continued efforts to mitigate the preexisting systemic barriers. Our study
> contributes to our knowledge on the evolving role of artificial
> intelligence in shaping knowledge dissemination through enhanced language
> translation capabilities.
>
> Implications of Using Inorganic Content in Arabic Wikipedia Editions
> By *Saied Alshahrani and Jeanna Matthews, Clarkson University*
>
> Wikipedia articles (content pages) are one of the
> widely utilized training corpora for NLP tasks and systems, yet these
> articles are not always created, generated, or even edited organically by
> native speakers; some are automatically created, generated, or translated
> using Wikipedia bots or off-the-shelf translation tools like Google
> Translate without human revision or supervision. We first analyzed the
> three Arabic Wikipedia editions, Arabic (AR), Egyptian Arabic (ARZ), and
> Moroccan Arabic (ARY), and found that these Arabic Wikipedia editions
> suffer from a few serious issues, like large-scale automatic creations and
> translations from English to Arabic, all without human involvement,
> generating content (articles) that lack not only linguistic richness and
> diversity but also content that lacks cultural richness and meaningful
> representation of the Arabic language and its native speakers. Second, we
> studied the performance implications of using such inorganic,
> unrepresentative articles to train NLP tasks or systems, where we
> intrinsically evaluated the performance of two main NLP upstream tasks,
> namely word representation and language modeling, using word analogy and
> fill-mask evaluations. We found that most of the models trained on the
> organic and representative content outperformed or, at worst, performed on
> par with the models whose training data also included inorganic content
> generated using bots or translated using templates, demonstrating that
> training on
> unrepresentative content not only impacts the representation of native
> speakers but also impacts the performance of NLP tasks or systems. We
> recommend avoiding utilizing the automatically created, generated, or
> translated articles on Wikipedia when the task is a representation-based
> task, like measuring opinions, sentiments, or perspectives of native
> speakers, and also suggest that when registered users employ automated
> creation or translation, their contributions should be marked differently
> than “registered user” for better transparency; perhaps “registered user
> (automation-assisted)”.
> Best,
> Kinneret
>
_______________________________________________
Wiki-research-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
