On Sun, Feb 15, 2026 at 20:45, Shani Evenstein Sigalov via Wikimedia-l <[email protected]> wrote:
> Hi Amir,
>
> Good to see you in this discussion.
>
> While I get where you are coming from with this question, what you're
> asking is more of a general "AI vs. Wikimedia / Open Knowledge" question.

No, that's not what I'm asking. I've really just asked what the difference is between "making data available to 'AI'" and simply "making data available". Don't hear what I didn't say :)

> It is important to discuss, but perhaps in a separate thread, so the focus
> of this one remains the project.

Perhaps I misunderstood something, but I thought that this _is_ the project ¯\_(ツ)_/¯

> Yes, there was a reason to think AI has "special requirements" in some
> cases. At the point in time when the project was pitched, most LLMs were
> trained heavily on WP, but not WD (nor any Linked Open Data, in fact,
> despite being "machine readable"). WD (and all Linked Open Data for that
> matter) remained an untapped resource. This meant that when you asked a
> GenAI platform questions whose answers were not on WP, WD / WB would not
> be consulted (despite being machine readable), which led to less accurate
> information on GenAI platforms.

This is true. I am by no means a true expert on the internals of machine learning and language model algorithms, but from the little I do know, it seems that the reason for this is that the "AI" boom that began in 2022 with the release of ChatGPT is actually an LLM boom, and LLMs are built for texts, not for databases. Wikipedia is a big collection of texts, and Wikidata is a database. Wikidata was made to be machine-readable, but LLMs are just not the right kind of machine to read it.

This was definitely true for the LLMs of 2022, and as far as I can tell, it is still true in 2026. I have heard about models that were trained specifically for Wikidata, but every time I tried actually using them, they didn't do anything useful.
(I'm not saying that what LLMs do with the text of Wikipedia is entirely good either, but it is relatively more useful than what they do with Wikidata.)

Making LLMs able to better process databases in general and Wikidata in particular is a good goal. The question, however, is what is really lacking here: the capabilities of the Wikidata technology, the skills and goals of the Wikidata community, or the motivation of the LLM developers. I might be wrong, but I suspect that it's mostly the latter.

And that's why I'm asking: What's the difference between "making data available to 'AI'" and simply "making data available"? Did any LLM developers seriously try to use Wikidata *as a database*, and if they did, did they have difficulties that were specific to them?
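For anyone wondering what I mean by using Wikidata *as a database*: here is a minimal sketch in Python against the public SPARQL query service. The endpoint URL and the property/item IDs (P31 "instance of", Q6256 "country", P36 "capital") are real; the particular query and the helper names are just my illustrative example, not anything any LLM developer is known to use.

```python
# A minimal sketch of "using Wikidata as a database": asking the public
# SPARQL query service a question directly, with no LLM involved.
# The query below is an illustrative example: countries and their capitals.
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# P31 = "instance of", Q6256 = "country", P36 = "capital"
EXAMPLE_QUERY = """
SELECT ?countryLabel ?capitalLabel WHERE {
  ?country wdt:P31 wd:Q6256 .   # items that are countries
  ?country wdt:P36 ?capital .   # ... and their capitals
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

def build_request_url(query: str) -> str:
    """Encode a SPARQL query as a GET request against the endpoint."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{SPARQL_ENDPOINT}?{params}"

def run_query(query: str) -> list:
    """Fetch results. Requires network access to query.wikidata.org."""
    req = urllib.request.Request(
        build_request_url(query),
        # The query service asks clients to identify themselves.
        headers={"User-Agent": "example-script/0.1"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]

# run_query(EXAMPLE_QUERY) would return bindings like
# {"countryLabel": {...}, "capitalLabel": {...}} for five countries.
```

The point being: this kind of structured, exact lookup has worked for years, and any developer who wanted their system to consult Wikidata could have done something like it. The question is whether they tried, and what got in their way.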
_______________________________________________
Wikimedia-l mailing list -- [email protected], guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/PWERGZX65KJLIMEFXAT226RGOSHO4SY2/
To unsubscribe send an email to [email protected]
