Coming back to the question of P's and Q's (sorry, it's been a busy few weeks)

I read people saying "Don't worry because prefixes", but with respect I don't agree.

IMO "Don't worry because prefixes" may make sense as a response if one interacts with Wikidata primarily via RDF dumps, or SPARQL, or perhaps writing system code -- environments where those prefixes may be generally present and used.

But for anyone actually working first-hand with the data, whose work involves any substantial checking and/or manual editing of data through the wikibase user interface, I think it fails to ring true. Extensive hand-editing in this way tends to be an unavoidable aspect when curating a dataset in wikibase -- eg investigating anomalies revealed by query reports, perhaps after a large data upload or data matching procedure, and then identifying and making appropriate edits to resolve them.

For people making a lot of hand-edits like that (a process which, as I have said, I think is inevitable when actively curating datasets), certain property identifiers are encountered and used so often that they become deeply ingrained, essentially second nature -- eg P18 for image, P373 for commonscat, P131 for located in administrative territorial entity, etc, the precise properties depending on the kind of data and items one is working with. The same goes for certain item identifiers, eg Q5 for human.

If one's doing a lot of editing and looking-up through the interface, these identifications become very familiar -- as internalised, unconscious and automatic as breathing.

So I do think that reusing the same identifiers for quite different meanings in a different wikibase (but with essentially exactly the same editing interface) is to create a cognitive dissonance which (IMO) is significant, unnecessary, unfortunate, and (I believe) ought to be avoidable.


A second issue is Daniel's scenarios 2 to 4, where external repos want to use and reference some or all of Wikidata's items and properties, with the same identifiers as Wikidata, plus some additional properties and items of their own defined locally.

That's not straightforward, if they all have to be placed in the same shared numerical sequences following the same restricted set of initial letters.


I do take the point that it is useful to be able to use the initial letter to distinguish different kinds of Wikibase object -- ie Properties (P), Items (Q), Lexemes (L), MediaInfo items (M).

One solution might be to allow Wikibase instances to use additional characters in the identifier for the local properties, items etc specific to that Wikibase -- so that the Wikibase could have property identifiers like Px50 or Pz50 or Pm50 to distinguish them from Wikidata's P50, or identifiers like Qx5000 or Qz5000 or Qosm5000 to distinguish them from Wikidata's Q5000.

This would straightforwardly allow Wikidata and local items and properties to exist side by side, and avoid confusion and dissonance with internalised learnt identifier codes from the items and properties on Wikidata itself.
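
To make the idea a little more concrete, here is a rough sketch in Python of how such identifiers might be told apart. It is purely illustrative and is not how Wikibase currently parses IDs: the pattern, the EntityId type and the function names are all my own assumptions, as is the choice of lowercase letters for the local infix.

```python
import re
from typing import NamedTuple

# Assumed format: one entity-type letter (P, Q, L or M), an optional
# lowercase local infix (eg "x", "z", "osm"), then a positive number.
ID_PATTERN = re.compile(r"([PQLM])([a-z]*)([1-9][0-9]*)")


class EntityId(NamedTuple):
    kind: str       # "P", "Q", "L" or "M"
    namespace: str  # "" for a plain Wikidata-style ID, else the local infix
    number: int


def parse_entity_id(text: str) -> EntityId:
    """Split an identifier such as 'P50' or 'Qosm5000' into its parts."""
    match = ID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognised identifier: {text!r}")
    kind, namespace, number = match.groups()
    return EntityId(kind, namespace, int(number))


def is_local(text: str) -> bool:
    """True if the identifier carries a local infix, ie is not a Wikidata ID."""
    return parse_entity_id(text).namespace != ""
```

With something like this, is_local("Px50") would be true and is_local("P50") false, while the initial letter still tells you the kind of object in both cases.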


Best regards,

   James.
