Dear all,

I fully agree with Andra's response in terms of content. Provenance and
governance are crucial for responsible use and propagation of information. 

I myself work as a healthcare professional in the field of medical
guidelines and biomedical research and am confronted daily with the question
of whether a term definition is reliable or not. 

It makes a difference whether a term is defined by an authoritative body,
such as a WHO expert working group, or by an undefined institution. 

 

Also, from my involvement with LIFES <https://www.lifes.institute/>, I see
that term definitions, and the sources of those definitions, are an
overlooked aspect in the data community, which greatly complicates machine
interpretability and reuse of data. This is especially true when
Wikidata/Wikibase is used as a controlled vocabulary in a KAG.

When no source is listed for a term definition, lossless propagation of
information is not guaranteed, and the term is therefore essentially useless
for further use.

The problem outlined has two causes.

First, it is intrinsic to the design. Systems such as Wikidata are not
designed to go beyond a lexicological concept. The world is much more complex
than that and needs to be described with a far more expressive encyclopedic
model.

In practice, knowledge-graph-like systems get stuck on more complex
knowledge models.

Second, it is a result of uncontrolled growth of source silos, which gives
term mapping a disproportionate role in the (poorly defined) propagation of
information. 

It would be better to address these through an extensive federative policy. 

 

Sincerely, 

Frans van der Horst

 

 

From: Andra Waagmeester <[email protected]>
Sent: Tuesday, 6 January 2026 12:53
To: Discussion list for the Wikidata project <[email protected]>
Subject: [Wikidata] Re: RDF Linked Data responses of Wikidata URIs

 

Dear Martynas,

I strongly disagree that the provenance statements should be removed
from the default responses, since it is exactly the provenance that makes
Wikidata so valuable. Wikidata comes with a lot of noise, since references
are often not provided. Personally, I mostly consider a Wikidata statement
without a reference to be without any value and best ignored. So if we
remove the provenance, Wikidata becomes just a bag of noise.

Having said this, I do acknowledge that Wikidata comes with a lot of baggage
or weight, but there are some decent tools out there to subset Wikidata into
more manageable portions.

We did a paper on that a few years back:
https://www.semantic-web-journal.net/system/files/swj3491.pdf

 

Cheers, 

Andra

 

On Tue, 6 Jan 2026 at 11:15, Martynas Jusevičius <[email protected]> wrote:

Hi all,

I hope this is the right place for this discussion :)

First of all, as a developer of software for RDF Linked Data
consumption, I am naturally delighted that Wikidata serves Linked Data
and supports content negotiation (not many services get it right).

However, IMO, the number of meta-triples not relevant to the requested
entity, and the sheer size of the RDF data that results, make
Wikidata's RDF responses pretty much unusable.

Let's take a single entity as an example:

    curl -L -H "Accept: text/turtle" 'https://www.wikidata.org/entity/Q1748'

The size of the Turtle response is 1.6MB!
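
To reproduce the measurement without curl, a minimal Python sketch follows.
It assumes the same URL pattern as the command above; the function names and
the User-Agent string are illustrative placeholders, not part of any Wikidata
tooling.

```python
import urllib.request


def build_request(entity_id, media_type="text/turtle"):
    """Build a content-negotiated request for a Wikidata entity URI."""
    url = f"https://www.wikidata.org/entity/{entity_id}"
    headers = {
        "Accept": media_type,
        # Placeholder User-Agent (assumption); replace with your own details.
        "User-Agent": "rdf-size-check/0.1 (example)",
    }
    return urllib.request.Request(url, headers=headers)


def response_size(entity_id, media_type="text/turtle"):
    """Return the size in bytes of the response body (redirects are followed)."""
    with urllib.request.urlopen(build_request(entity_id, media_type)) as resp:
        return len(resp.read())
```

Calling response_size("Q1748") performs the network request and returns the
byte count of the Turtle document at that moment.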

All of the schema metadata such as property and class descriptions are
not needed as they can be discovered by dereferencing the respective
term URIs:

wdno:P2960 a owl:Class ;
  owl:complementOf _:e8842935d39a233def3d267ae3737d8c .

_:e8842935d39a233def3d267ae3737d8c a owl:Restriction ;
  owl:onProperty wdt:P2960 ;
  owl:someValuesFrom owl:Thing .

p:P518 a owl:ObjectProperty .
psv:P518 a owl:ObjectProperty .
pqv:P518 a owl:ObjectProperty .
prv:P518 a owl:ObjectProperty .
wdt:P518 a owl:ObjectProperty .
ps:P518 a owl:ObjectProperty .
pq:P518 a owl:ObjectProperty .
pr:P518 a owl:ObjectProperty .

wd:Q1775415 a wikibase:Item ;
  rdfs:label "feminine"@en ;
  skos:prefLabel "feminine"@en ;
  schema:name "feminine"@en ;
  schema:description "grammatical gender"@en .

and so on and so forth.

Then I would argue that the provenance statements such as
<http://www.wikidata.org/entity/statement/Q1748-cfb94fd5-464b-1b83-a513-dd751882b7ce>
are also *not* necessary for the majority of use cases of the majority
of users.

I suppose they are included to provide a complete and "truthy"
response, but by doing so the usability of the data is diminished. I
think the provenance statements should be removed from the default
responses and relegated to some "complete" or "truthy" profile with a
distinct URI, linked to from the default response.
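
As a rough illustration of what such a slimmer default could contain, the
sketch below filters an N-Triples serialization of an entity and drops every
triple that touches a statement, reference, or value node. The three
namespace prefixes are the ones Wikidata uses for those nodes; the function
name and the line-based regex matching are assumptions for illustration, not
a real RDF parser and not how Wikidata would implement a profile.

```python
import re

# Namespaces Wikidata uses for statement, reference, and full-value nodes.
PROVENANCE_PREFIXES = (
    "http://www.wikidata.org/entity/statement/",
    "http://www.wikidata.org/reference/",
    "http://www.wikidata.org/value/",
)

# Matches every <IRI> term on an N-Triples line. Simplification: a literal
# that happens to contain "<...>" text would also be matched.
IRI = re.compile(r"<([^>]*)>")


def strip_provenance(ntriples_lines):
    """Keep only triples that do not mention a statement, reference, or value node."""
    kept = []
    for line in ntriples_lines:
        iris = IRI.findall(line)
        if any(iri.startswith(PROVENANCE_PREFIXES) for iri in iris):
            continue  # subject, predicate, or object points into provenance data
        kept.append(line)
    return kept
```

Because the links from an entity to its statement nodes have a statement IRI
as object, they are dropped as well, so only the direct triples and the
labels and descriptions remain.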

What do you think?

Martynas
atomgraph.com <http://atomgraph.com> 
_______________________________________________
Wikidata mailing list -- [email protected]
Public archives at
https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/6CALPNUWKMID3UE2RK7OCIZIGOAKNAVK/
To unsubscribe send an email to [email protected]

_______________________________________________
Wikidata mailing list -- [email protected]
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/VUMUFL7D35SCR2WRN6J5IT4EYHRGNG26/
To unsubscribe send an email to [email protected]
