[Wikimedia-l] Re: the question of openness in the AI context [was Re: Wikipedia at 25: A Wake-Up Call]

Jan Ainali via Wikimedia-l Mon, 04 May 2026 06:25:57 -0700

Thank you, Luis, for your thoughtful reply. Your historical examples are
particularly illustrative because they show compromises with a long-term
goal of "breaking free". They are stories of building new tools rather than
being locked into the existing technologies. In our context that may
mean that we build upon and borrow where we can, rather than being
just consumers of the Big Products of the day.


If we have any red lines in this compromise, one surely is privacy.
Even if we are, for a moment in time, building on
less-than-ideally-open tools, handing over data about users or readers
should be a no-no. (This can likely be mitigated by running things in our
own ecosystem.) I am highlighting this because even if we build some bridges,
not all bridges are good to cross.

The example of the Wikipedia Library is interesting because, even if we
are using non-free works as sources and the ethics of many
publishers deserve to be discussed, these authors have arguably at least
knowingly (albeit begrudgingly) made some sort of deal with them. It's also
not unreasonable to think that many of the authors are happy being cited
and that a citation on our projects brings them some value, though perhaps
mostly a low one. That's a stark contrast to the ethics of how the "Frontier
models" are being made.

So yes, let's think creatively about how we can exploit these tools and at the
same time minimize the exploitation of everyone else in the world.

Best regards,
Jan

On Mon, May 4, 2026, 04:24 Luis Villa <[email protected]> wrote:

> On Fri, May 1, 2026 at 1:45 PM Jan Ainali via Wikimedia-l <
> [email protected]> wrote:
>
>> ... there is one of our central values I want us to keep held front of
>> mind in this moment, and that is to focus on open source and not fall for
>> the lure of the proprietary just because it is AI. And here I would like us
>> to follow the principles of the Digital Public Goods Alliance (who made us
>> so proud when they awarded Wikipedia and Wikidata with their certification
>> of being Digital Public Goods) and go even further than the Open Source
>> Initiative's definition of open source AI. Their extension
>> means that beyond the free license on the model and the code, also the
>> dataset used for training should be freely licensed.[1]
>>
>> This would not only be the ethically right thing to do, it would also
>> ensure we aren't dependent on Big Tech when doing our adaptation to the new
>> landscape.
>>
>
> Hey, Jan! Thanks for raising this. I think it's such an important topic
> that it is worth breaking out into a separate thread.
>
> I agree with your bottom line: we can't have truly open knowledge without
> a truly open ecosystem (not just software stack). That should be the goal
> we are always, always striving to get to.
>
> But there are some important wrinkles.
>
> *Our knowledge ecosystem has never been purely open*
> Our core web services have always been FOSS from the ground up.
>
> But our knowledge ecosystem is very much not open.
>
> Our tech ecosystem has always been co-dependent on web search generally,
> and Google Web Search specifically. Google is how most people find us, and
> how most of us find knowledge to put into the encyclopedia. This is not
> *good*—it is in fact very bad—but it is, and always has been, our reality.
> Mostly we ignore this inconvenient dependency, and mostly that is fine. But
> if we’re going to try to see the world as it is, we also have to be honest
> about that dependency.
>
> LLMs are not perfect, but *at worst* the reasons they’re bad are the same
> reasons Google Web Search (and essentially every other web search, and the
> publishing industry too) is bad: controlled by an unaccountable
> corporation, hard to audit, subject to all sorts of biases.
>
> Open-weight models still aren’t perfect, but: we can audit them for bias;
> we can modify them (within boundaries); we can rebuild them with open
> knowledge (Ai2 says hi); we can even run them locally. That’s true even
> when they aren’t DPGA-open (or in many cases even when they’re not
> OSI-open).
>
> And there are still *possibilities* of truly open (training data and
> weights) models, about which more in the next point.
>
> *Open has always involved compromise*
> New open ecosystems do not just magically spring into existence—they have
> always required hard work *and strategic compromise*. The GNU folks had
> to compromise for almost two decades, running on proprietary Unices. It
> took Mozilla most of a decade to beat IE, and they had to run proprietary
> plugins starting on day one to do it. As you’re well aware, open access
> publishing is still very much a work in progress two decades in.
>
> All of those things built on each other. If Stallman hadn’t compromised by
> building his open compiler on Solaris, Linus doesn’t learn about the GPL
> and free Linux. If Linus doesn’t build Linux, Netscape doesn’t open
> Mozilla. Mozilla used a compromise open license deliberately written to
> ensure Netscape could ship proprietary plugins. Etc. Etc. Etc.
>
> We’re only five years or so into the LLM era. It is not very open. I am
> not sure what compromises will be made. But we’re probably going to need to
> make compromises, in order to learn; to gain influence; to beat back our
> competitors. In the best case that’s going to mean tech like Olmo and
> partners like Ai2, but it is also going to mean some compromises—and some 
> *hope* that our work will inspire the next generation of openness.
>
> *There’s supposed to be a third thing*
> I really want to have a third thing but uh I’m drawing a blank. So again:
>
> I have to stress, this is not a call to throw away our principles.
>
> We should absolutely be using every bit of influence and leverage (and
> money) we have to push every player in this ecosystem towards the most
> possible openness. But that’s also going to mean getting involved and
> building bridges, not sitting on the sidelines. And it’s going to mean
> building *practical* bridges, so that (like Wikipedia Library) we
> sometimes are doing deals with entities who don’t share our values. Those
> compromises will have to be done vigilantly and carefully. But the world is
> changing radically, and fast. So our compromises will have to be done
> boldly too.
>
> Sincerely—in open and in progress—
> Luis
>

_______________________________________________
Wikimedia-l mailing list -- [email protected], guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/7VOQBCOQX7JEQJUDKIFU4S63BJXUQHCD/
To unsubscribe send an email to [email protected]
