As for subsetting Wikidata, what sort of users have the resources to do that? Also, that would mean new entity URIs (due to a different hostname) that are not widely known (including by LLMs), so it is not a practical solution IMO.
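To make the client-side alternative concrete, here is a minimal sketch (Python, standard library only) of trimming an N-Triples serialization of an entity down to the direct wdt: claims about that entity, discarding statement nodes, property declarations and labels of other entities. The function name direct_claims, the regex-based line parsing and the sample triples are illustrative assumptions, not anything Wikidata provides; a real consumer would use a proper RDF parser.

```python
import re

# Wikidata namespaces (the wd: and wdt: prefixes used in the thread).
WD_ENTITY = "http://www.wikidata.org/entity/"
WDT_DIRECT = "http://www.wikidata.org/prop/direct/"

# Simplified N-Triples pattern: IRI subject, IRI predicate, any object, final dot.
# Blank-node subjects (e.g. the owl:Restriction nodes) do not match and are skipped.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.*?)\s*\.\s*$')


def direct_claims(ntriples: str, qid: str) -> list[tuple[str, str]]:
    """Keep only wdt: triples whose subject is the requested entity.

    Returns (property id, object term) pairs in document order.
    """
    subject = WD_ENTITY + qid
    kept = []
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # blank lines, comments, blank-node subjects
        s, p, o = match.groups()
        if s == subject and p.startswith(WDT_DIRECT):
            kept.append((p[len(WDT_DIRECT):], o))
    return kept


# Made-up sample in the shape of a Wikidata response, NOT actual data for Q1748.
SAMPLE = """\
<http://www.wikidata.org/entity/Q1748> <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q515> .
<http://www.wikidata.org/entity/Q1748> <http://www.wikidata.org/prop/P31> <http://www.wikidata.org/entity/statement/Q1748-cfb94fd5-464b-1b83-a513-dd751882b7ce> .
<http://www.wikidata.org/entity/Q1775415> <http://www.w3.org/2000/01/rdf-schema#label> "feminine"@en .
<http://www.wikidata.org/prop/P518> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#ObjectProperty> .
"""

if __name__ == "__main__":
    for prop, obj in direct_claims(SAMPLE, "Q1748"):
        print(prop, obj)
```

Of the four sample triples only the first survives. The point is that every consumer has to download the full response and write this kind of filter themselves before the data becomes usable.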

On Tue, Jan 6, 2026 at 12:54 PM Andra Waagmeester <[email protected]> wrote:
>
> Dear Martynas,
>
> I strongly disagree that the provenance statements should be removed
> from the default responses, since it is exactly the provenance that makes
> Wikidata so valuable. Wikidata comes with a lot of noise, since often
> references are not provided. Personally, I mostly consider a Wikidata
> statement without a reference to be without any value and best ignored. So if
> we remove the provenance, Wikidata becomes just a bag of noise.
> Having said this, I do acknowledge that Wikidata comes with a lot of baggage
> or weight, but there are some decent tools out there to subset Wikidata into
> more manageable portions.
> We did a paper on that a few years back:
> https://www.semantic-web-journal.net/system/files/swj3491.pdf
>
> Cheers,
>
> Andra
>
> On Tue, 6 Jan 2026 at 11:15, Martynas Jusevičius
> <[email protected]> wrote:
>>
>> Hi all,
>>
>> I hope this is the right place for this discussion :)
>>
>> First of all, as developer of software for RDF Linked Data
>> consumption, I am naturally delighted that Wikidata serves Linked Data
>> and supports content negotiation (not many services get it right).
>>
>> However, IMO, the amount of meta-triples not relevant to the requested
>> entity, and the sheer size of the RDF data that it causes, make
>> Wikidata's RDF responses pretty much unusable.
>>
>> Let's take a single entity as an example:
>>
>> curl -L -H "Accept: text/turtle" 'https://www.wikidata.org/entity/Q1748'
>>
>> The size of the Turtle response is 1.6MB!
>>
>> All of the schema metadata such as property and class descriptions are
>> not needed as they can be discovered by dereferencing the respective
>> term URIs:
>>
>> wdno:P2960 a owl:Class ;
>>     owl:complementOf _:e8842935d39a233def3d267ae3737d8c .
>>
>> _:e8842935d39a233def3d267ae3737d8c a owl:Restriction ;
>>     owl:onProperty wdt:P2960 ;
>>     owl:someValuesFrom owl:Thing .
>>
>> p:P518 a owl:ObjectProperty .
>> psv:P518 a owl:ObjectProperty .
>> pqv:P518 a owl:ObjectProperty .
>> prv:P518 a owl:ObjectProperty .
>> wdt:P518 a owl:ObjectProperty .
>> ps:P518 a owl:ObjectProperty .
>> pq:P518 a owl:ObjectProperty .
>> pr:P518 a owl:ObjectProperty .
>>
>> wd:Q1775415 a wikibase:Item ;
>>     rdfs:label "feminine"@en ;
>>     skos:prefLabel "feminine"@en ;
>>     schema:name "feminine"@en ;
>>     schema:description "grammatical gender"@en .
>>
>> and so on and so forth.
>>
>> Then I would argue that the provenance statements such as
>> <http://www.wikidata.org/entity/statement/Q1748-cfb94fd5-464b-1b83-a513-dd751882b7ce>
>> are also *not* necessary for the majority of use cases of the majority
>> of users.
>>
>> I suppose they are included to provide a complete and "truthy"
>> response, but by doing so the usability of the data is diminished. I
>> think the provenance statements should be removed from the default
>> responses and relegated to some "complete" or "truthy" profile with a
>> distinct URI, linked to from the default response.
>>
>> What do you think?
>>
>> Martynas
>> atomgraph.com
>> _______________________________________________
>> Wikidata mailing list -- [email protected]
>> Public archives at
>> https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/6CALPNUWKMID3UE2RK7OCIZIGOAKNAVK/
>> To unsubscribe send an email to [email protected]
>
> _______________________________________________
> Wikidata mailing list -- [email protected]
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/FR2MT2ZU3HRYOXZK3RHUVNH6HOW664PY/
> To unsubscribe send an email to [email protected]
_______________________________________________
Wikidata mailing list -- [email protected]
Public archives at
https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/LYA5B65WALWRS4ERZPIS4MIONO2HWWDC/
To unsubscribe send an email to [email protected]
