To Shani, and all the rest of the AI-BRIDGES folks, I offer an enthusiastic note of appreciation for undertaking this work!
I'm just starting to learn about the project, so these are preliminary thoughts, but I'll simply say that:

 - There are lots of ways I can imagine to make linked data from Wikidata more usable by LLMs, e.g. via high-quality MCP (Model Context Protocol) servers. They are explicitly targeted at helping LLMs query databases so that they can provide informed answers. Researchers are also actively developing other techniques to help LLMs leverage database-aware tools. Pursuing them is one of the most direct ways of "making data available to 'AI'" these days. There is a rich diversity of databases, of course, which implies a wealth of corresponding challenges in designing the best servers and bridges.
 - There are lots of other types of AI out there, which are great at using databases.
 - Making more institutional data available in ways that Wikimedia can leverage clearly makes lots of sense.

Neal

On Sun, Feb 15, 2026 at 7:39 PM Amir E. Aharoni via Wikimedia-l <[email protected]> wrote:

>
> On Sun, Feb 15, 2026 at 20:45, Shani Evenstein Sigalov via
> Wikimedia-l <[email protected]> wrote:
>
>> Hi Amir,
>>
>> Good to see you in this discussion.
>>
>> While I get where you are coming from with this question, what you're
>> asking is more of a general "AI vs. Wikimedia / Open Knowledge" question.
>>
>
> No, that's not what I'm asking. I've really just asked: What's the
> difference between "making data available to 'AI'" and simply "making data
> available"? Don't hear what I didn't say :)
>
>
>> It is important to discuss, but perhaps in a separate thread, so the
>> focus of this one remains the project.
>>
>
> Perhaps I misunderstood something, but I thought that this _is_ the
> project ¯\_(ツ)_/¯
>
>
>> Yes, there was a reason to think AI has "special requirements" in some
>> cases. At the point in time when the project was pitched, most LLMs were
>> trained heavily on WP, but not WD (all Linked Open Data, in fact, despite
>> being "machine readable").
>> WD (and all Linked Open Data for that matter)
>> remained an untapped resource. This meant that when you asked a GenAI
>> platform questions and the answers were not on WP, WD / WB would not be
>> consulted (despite being machine readable), which led to less accurate
>> information on GenAI platforms.
>>
>
> This is true. I am by no means a true expert on the internals of machine
> learning and language model algorithms, but from the little I do know, it
> seems like the reason for this is that the "AI" boom that began in 2022
> with the release of ChatGPT is actually an LLM boom, and LLMs are built for
> texts and not for databases. Wikipedia is a big collection of texts, and
> Wikidata is a database. Wikidata was made to be machine-readable, but LLMs
> are just not the right kind of machine to read it. This was definitely true
> for the LLMs of 2022, and as far as I can tell, it is still true in 2026. I
> heard about models that were trained specifically for Wikidata, but every
> time I tried actually using them, they didn't do anything useful. (I'm not
> saying that what LLMs do with the text of Wikipedia is totally good either,
> but it is relatively more useful than what they do with Wikidata.)
>
> Making LLMs able to better process databases in general and Wikidata in
> particular is a good goal. The question, however, is what is really lacking
> here: the capabilities of the Wikidata technology, the skills and the goals
> of the Wikidata community, or the motivation of the LLM developers. I might
> be wrong, but I suspect that it's mostly the latter. And that's why I'm
> asking: What's the difference between "making data available to 'AI'" and
> simply "making data available"? Did any LLM developers seriously try to use
> Wikidata *as a database*, and if they did, did they have difficulties that
> were specific to them?
> _______________________________________________
> Wikimedia-l mailing list -- [email protected], guidelines
> at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/PWERGZX65KJLIMEFXAT226RGOSHO4SY2/
> To unsubscribe send an email to [email protected]

--
Neal McBurnett https://neal.mcburnett.org/
Don't believe everything you think
_______________________________________________
Wikimedia-l mailing list -- [email protected], guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at
https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/YKFFCM2MCUE6CMUYVHGV43SBHNGHHZE2/
To unsubscribe send an email to [email protected]
