Dear all,

I fully agree with the content of Andra's response. Provenance and governance are crucial for the responsible use and propagation of information.

I work as a healthcare professional in the field of medical guidelines and biomedical research, and I am confronted daily with the question of whether a term definition is reliable. It makes a difference whether a term is defined by an authoritative body, such as a WHO expert working group, or by an unidentified institution. From my involvement with LIFES <https://www.lifes.institute/>, I also see that term definitions, and the sources of those definitions, are an overlooked aspect in the data community. This greatly complicates the machine interpretability and reuse of data, especially when Wikidata/Wikibase is used as a controlled vocabulary in a KAG. When no source is listed for a term (or its definition), lossless propagation of information is not guaranteed, and the term is therefore essentially useless for further use.

The problem has two causes. First, it is intrinsic: systems such as Wikidata are not designed to go beyond a lexicological concept, while the world is much more complex than that and needs to be described with a far more expressive encyclopedic model. In practice, knowledge-graph-like systems get stuck when more complex knowledge models are required. Second, it results from the uncontrolled growth of source silos, which gives term mapping a disproportionate role in the (poorly defined) propagation of information. It would be better to address both through an extensive federative policy.

Sincerely,
Frans van der Horst

From: Andra Waagmeester <[email protected]>
Sent: Tuesday, 6 January 2026 12:53
To: Discussion list for the Wikidata project <[email protected]>
Subject: [Wikidata] Re: RDF Linked Data responses of Wikidata URIs

Dear Martynas,

I strongly disagree that the provenance statements should be removed from the default responses, since it is exactly the provenance that makes Wikidata so valuable. Wikidata comes with a lot of noise, since references are often not provided.
Personally, I consider a Wikidata statement without a reference to be mostly without value, and best ignored. So if we remove the provenance, Wikidata becomes just a bag of noise.

Having said this, I do acknowledge that Wikidata comes with a lot of baggage or weight, but there are some decent tools out there to subset Wikidata into more manageable portions. We wrote a paper on that a few years back: https://www.semantic-web-journal.net/system/files/swj3491.pdf

Cheers,
Andra

On Tue, 6 Jan 2026 at 11:15, Martynas Jusevičius <[email protected]> wrote:

Hi all,

I hope this is the right place for this discussion :)

First of all, as a developer of software for RDF Linked Data consumption, I am naturally delighted that Wikidata serves Linked Data and supports content negotiation (not many services get it right). However, IMO, the number of meta-triples not relevant to the requested entity, and the sheer size of the RDF data this causes, make Wikidata's RDF responses pretty much unusable.

Let's take a single entity as an example:

    curl -L -H "Accept: text/turtle" 'https://www.wikidata.org/entity/Q1748'

The size of the Turtle response is 1.6MB! None of the schema metadata, such as property and class descriptions, is needed, as it can be discovered by dereferencing the respective term URIs:

    wdno:P2960 a owl:Class ;
        owl:complementOf _:e8842935d39a233def3d267ae3737d8c .
    _:e8842935d39a233def3d267ae3737d8c a owl:Restriction ;
        owl:onProperty wdt:P2960 ;
        owl:someValuesFrom owl:Thing .
    p:P518 a owl:ObjectProperty .
    psv:P518 a owl:ObjectProperty .
    pqv:P518 a owl:ObjectProperty .
    prv:P518 a owl:ObjectProperty .
    wdt:P518 a owl:ObjectProperty .
    ps:P518 a owl:ObjectProperty .
    pq:P518 a owl:ObjectProperty .
    pr:P518 a owl:ObjectProperty .
    wd:Q1775415 a wikibase:Item ;
        rdfs:label "feminine"@en ;
        skos:prefLabel "feminine"@en ;
        schema:name "feminine"@en ;
        schema:description "grammatical gender"@en .
And so on and so forth.

Then I would argue that the provenance statements, such as <http://www.wikidata.org/entity/statement/Q1748-cfb94fd5-464b-1b83-a513-dd751882b7ce>, are also *not* necessary for the majority of use cases of the majority of users. I suppose they are included to provide a complete and "truthy" response, but by doing so the usability of the data is diminished.

I think the provenance statements should be removed from the default responses and relegated to some "complete" or "truthy" profile with a distinct URI, linked to from the default response.

What do you think?

Martynas
atomgraph.com

_______________________________________________
Wikidata mailing list -- [email protected]
Public archives at https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/6CALPNUWKMID3UE2RK7OCIZIGOAKNAVK/
To unsubscribe send an email to [email protected]
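Martynas's proposal (a compact default response, with provenance relegated to a separate "complete" profile) can be illustrated with a minimal sketch. It assumes the entity data has already been retrieved as N-Triples (one triple per line, which keeps parsing trivial; which RDF media types the endpoint negotiates has not been checked here) and that entity IRIs live in the http://www.wikidata.org/entity/ namespace. The function name split_triples and the two-way split are hypothetical, not an existing Wikidata feature. Triples whose subject is the requested entity and whose object is not a statement node go into the default set. Statement, reference and value nodes, plus metadata about other terms, go into the complete-only set.

```python
# Hypothetical sketch, not an existing Wikidata feature: split an entity's
# N-Triples into a compact "default" subset and a "complete-only" remainder.

ENTITY_NS = "http://www.wikidata.org/entity/"
STATEMENT_NS = "http://www.wikidata.org/entity/statement/"


def split_triples(ntriples: str, qid: str) -> tuple[list[str], list[str]]:
    """Return (default, complete_only) lists of N-Triples lines for entity `qid`."""
    entity_iri = f"<{ENTITY_NS}{qid}>"
    statement_prefix = f"<{STATEMENT_NS}"
    default: list[str] = []
    complete_only: list[str] = []
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subjects and predicates never contain whitespace in N-Triples,
        # so splitting twice yields subject, predicate and the "object ." rest.
        subject, _predicate, obj = line.split(None, 2)
        if subject == entity_iri and not obj.startswith(statement_prefix):
            # Labels, descriptions and direct (wdt:) claims about the entity.
            default.append(line)
        else:
            # Statement/reference/value nodes and metadata about other terms.
            complete_only.append(line)
    return default, complete_only
```

The naive whitespace split is adequate only because N-Triples subjects and predicates contain no whitespace; a production consumer would use a proper RDF parser and would serve the complete-only part under its own URI, as the proposal suggests.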
_______________________________________________
Wikidata mailing list -- [email protected]
Public archives at https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/VUMUFL7D35SCR2WRN6J5IT4EYHRGNG26/
To unsubscribe send an email to [email protected]
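Andra's heuristic, treating a statement without a reference as noise, can be sketched in the same style. In Wikidata's RDF model, references hang off statement nodes via prov:wasDerivedFrom, so the references sit exactly in the provenance triples under discussion. The sketch assumes N-Triples input and the statement namespace seen in the statement URI of Martynas's message. The function name referenced_statements is hypothetical and only partitions statement IRIs; it does not judge reference quality.

```python
# Hypothetical sketch: partition statement nodes in an N-Triples dump into
# those with at least one reference (prov:wasDerivedFrom) and those without.

PROV_DERIVED = "<http://www.w3.org/ns/prov#wasDerivedFrom>"
STATEMENT_PREFIX = "<http://www.wikidata.org/entity/statement/"


def referenced_statements(ntriples: str) -> tuple[set[str], set[str]]:
    """Return (referenced, unreferenced) sets of statement-node IRIs."""
    statements: set[str] = set()
    referenced: set[str] = set()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, predicate, obj = line.split(None, 2)
        # Statement nodes appear as subjects of ps:/pq:/prov: triples ...
        if subject.startswith(STATEMENT_PREFIX):
            statements.add(subject)
            if predicate == PROV_DERIVED:
                referenced.add(subject)
        # ... and as objects of p: triples linking the entity to the statement.
        if obj.startswith(STATEMENT_PREFIX):
            # Drop the trailing " ." terminator to keep just the IRI.
            statements.add(obj.split(None, 1)[0].rstrip("."))
    return referenced, statements - referenced
```

A consumer following Andra's rule would keep only the triples of the referenced set, which is only possible if the provenance triples remain available in the response.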
