Sorry I'm coming to this discussion a bit late, but I'd like to underline a
slightly different aspect of the concern that Phoebe raised:

> It concerns me that, at least in the high-level project proposals I've
> seen (I haven't been tracking this closely, and haven't read the academic
> papers) I have not yet seen discussions of ethical data, or how we might
> think about identifying bias, or even how to recruit contributors and the
> impact on existing contributors.

Using the terminology of Ibram X. Kendi (and others), I'd put this as:
"it's not enough to not be racist, you must actively be *anti-racist*."

Abstract Wikipedia is a "color blind" project.  Indeed it is often
described as advancing WMF goals by improving the amount of content
available for minority languages.

However, it is built on a huge edifice of ML and AI technology which
advantages majority languages and the already-powerful.

As Phoebe mentioned, the subtle biases of ML translation toward majority
views (selecting the "proper" gender pronoun for someone described as a
"doctor" or "professor", say) are well known, and certainly deserve to be
foregrounded from the start, as Danny has pledged to do in his response.

But the infrastructure of this project is built this way from the ground
up.  Language models for European languages are orders of magnitude better
than language models for minority languages (if the latter exist at all).
The same is true for ontologies and every other constructed abstraction,
down to choices of what topics are significant enough to include in an
abstract article---but that ground has been ably covered by Kaldari and
others.  So let me concentrate solely on language models in the remainder
(with some parenthetical asides, for which I hope you'll forgive me).

I would like to challenge Abstract Wikipedia not only to be "not racist" or
"color blind", but to be actively *antiracist*.  That is, instead of
passively accepting the status quo with respect to language models (etc.), to commit
to actively supporting a language model in *at least one* minority
language, treating it as a first-class citizen or (better) the *main*
output of the project.  That means not just looking for "a good enough
language model that happens not to be a European language" but *actively
developing the language model* so that the Abstract Wikipedia project *from
inception* has a positive effect on *at least one* community speaking an
underrepresented language with a small Wikipedia.  (Again, WLOG this could
apply to general AI/ML support for many many minority groups, but I'm
sticking with "at least one" and "language model" in order to make this as
concrete and actionable as possible.)  This of course also means committing
to hiring a speaker of that non-European language as part of the core team
(not just an "and translations" afterthought), committing to foregrounding
that language in demonstrations, and doing outreach and community building
to the language group in question.  (All the mockups I've seen have been in
German and English, and have been pitched to an English-speaking audience.)

I don't think it is wise in 2020 to pretend that "color blind" business as
usual will advance the goals of our organization.  We need to actively work
to ensure this project has effects that *work against* the significant
pre-existing biases toward highly-educated speakers of European languages.
It is not enough to say that "someday" this "may" have an effect on
minority language groups if "somebody" ever gets around to doing it.  We
must make those investments proactively and with clear intention in order
to effect the change we wish to see in the world.
  -- C. Scott Ananian
Wikimedia-l mailing list