On Sun, Feb 15, 2026 at 20:45 Shani Evenstein Sigalov via
Wikimedia-l <[email protected]> wrote:

> Hi Amir,
>
> Good to see you in this discussion.
>
> While I get where you are coming from with this question, what you're
> asking is more of a general "AI vs. Wikimedia / Open Knowledge" question.
>

No, that's not what I'm asking. I really just asked what the difference
is between "making data available to 'AI'" and simply "making data
available". Don't hear what I didn't say :)


> It is important to discuss, but perhaps in a separate thread, so the focus
> of this one remains the project.
>

Perhaps I misunderstood something, but I thought that this _is_ the
project ¯\_(ツ)_/¯


> Yes, there was a reason to think AI has "special requirements" in some
> cases. At the point in time when the project was pitched, most LLMs were
> trained heavily on WP, but not WD (nor any Linked Open Data, in fact,
> despite it being "machine readable"). WD (and all Linked Open Data, for
> that matter) remained an untapped resource. This meant that when you asked
> a GenAI platform questions whose answers were not on WP, WD / WB would not
> be consulted (despite being machine readable), which led to less accurate
> information on GenAI platforms.
>

This is true. I am by no means a true expert on the internals of machine
learning and language model algorithms, but from the little I do know, it
seems like the reason for this is that the "AI" boom that began in 2022
with the release of ChatGPT is actually an LLM boom, and LLMs are built for
texts and not for databases. Wikipedia is a big collection of texts, and
Wikidata is a database. Wikidata was made to be machine-readable, but LLMs
are just not the right kind of machine to read it. This was definitely true
for the LLMs of 2022, and as far as I can tell, it is still true in 2026. I
heard about models that were trained specifically for Wikidata, but every
time I tried actually using them, they didn't do anything useful. (I'm not
saying that what LLMs do with the text of Wikipedia is totally good either,
but it is relatively more useful than what they do with Wikidata.)

Making LLMs able to better process databases in general and Wikidata in
particular is a good goal. The question, however, is what is really lacking
here: The capabilities of the Wikidata technology, the skills and the goals
of the Wikidata community, or the motivation of the LLM developers. I might
be wrong, but I suspect that it's mostly the latter. And that's why I'm
asking: What's the difference between "making data available to 'AI'" and
simply "making data available"? Did any LLM developers seriously try to use
Wikidata *as a database*, and if they did, did they have difficulties that
were specific to them?
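
For anyone unfamiliar with what "using Wikidata as a database" looks like
in practice: Wikidata exposes a public SPARQL endpoint (the Wikidata Query
Service at https://query.wikidata.org/sparql), and results come back in the
standard SPARQL 1.1 JSON results format. Here is a minimal sketch; to keep
it self-contained it parses a hard-coded sample response (the values are
illustrative) rather than making a live HTTP call:

```python
import json

# A SPARQL query one might send to the Wikidata Query Service:
# capitals of countries. P36 is Wikidata's "capital" property;
# wdt:/wd:/wikibase:/bd: are the endpoint's standard prefixes.
QUERY = """
SELECT ?countryLabel ?capitalLabel WHERE {
  ?country wdt:P36 ?capital .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

# Sample response in the SPARQL 1.1 JSON results format, standing in
# for the body of a live HTTP response (values are illustrative).
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["countryLabel", "capitalLabel"]},
  "results": {"bindings": [
    {"countryLabel": {"type": "literal", "value": "France"},
     "capitalLabel": {"type": "literal", "value": "Paris"}}
  ]}
}
"""

def rows(response_text):
    """Flatten SPARQL JSON bindings into plain dicts of var -> value."""
    data = json.loads(response_text)
    return [{var: binding[var]["value"] for var in binding}
            for binding in data["results"]["bindings"]]

for row in rows(SAMPLE_RESPONSE):
    print(row)
```

The point of the sketch is that this path is plain structured data with a
well-specified schema: no language model is involved anywhere, which is
exactly why "available to 'AI'" and "available" look like the same thing
from this side of the API.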
_______________________________________________
Wikimedia-l mailing list -- [email protected], guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/PWERGZX65KJLIMEFXAT226RGOSHO4SY2/
To unsubscribe send an email to [email protected]
