[Wikidata-bugs] [Maniphest] [Retitled] T161527: Canonical data URLs for machine readable page content

daniel Sat, 01 Apr 2017 03:06:07 -0700

daniel changed the title from "Canonical data URIs and URLs for machine readable page content" to "Canonical data URLs for machine readable page content".
daniel edited the task description.

EDIT DETAILS

**Revised after public discussion, April 1 2017**

== Problem ==

Wikimedia is managing a growing amount of machine readable data as wiki page content. The latest addition is the Data namespace on Commons, which hosts tabular data like [[https://commons.wikimedia.org/wiki/Data:Dolmens_of_the_Preseli_Hills.tab|Data:Dolmens_of_the_Preseli_Hills.tab]] and geographic data like [[https://commons.wikimedia.org/wiki/Data:Avignon_City_Wall.map|Data:Avignon_City_Wall.map]].

There is currently no canonical URL for referring to and retrieving these data sets. Canonical URLs are needed as stable identifiers (URIs) in linked data.

**Concrete need:** Wikidata can reference geo-shape data from the Data namespace on Commons. To represent such references in RDF, the data set needs a canonical URI. See {T159517}

== Proposed Solution ==

* Use URLs of the form https://commons.wikimedia.org/data/Data:Avignon_City_Wall.map to identify and retrieve machine readable page content.

* The ```/data/``` path is rewritten to a special page, Special:PageData

* Special:PageData will redirect (with status 303) to an appropriate (and typically cacheable) URL for retrieving the page data. For now, this will use the ```action=raw``` interface.

* Special:PageData may apply content negotiation based on the Accept header sent by the client. In the first iteration, it will only check if any accept header sent by the client is compatible with the content model of the requested page.
* The 303 redirects are not cacheable for now, because they depend on the Accept header; complex normalization would be needed to allow the cache to vary on the Accept header without causing massive cache fragmentation.
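
The first-iteration check described above (is any media range in the client's Accept header compatible with the page's content?) could look roughly like the following. This is only a sketch: the function names are hypothetical, the MIME type ```application/json``` for Data pages is an assumption, and the matching is deliberately simpler than full HTTP content negotiation (no ranking by q-value or specificity).

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        ranges.append((media.lower(), q))
    return ranges


def accepts(header: Optional[str], mime: str) -> bool:
    """Return True if any media range in the header admits `mime`."""
    if not header or not header.strip():
        return True  # no Accept header: assume the client takes anything
    mtype, _, subtype = mime.lower().partition("/")
    for media, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means this range is explicitly refused
        rtype, _, rsub = media.partition("/")
        if rtype in ("*", mtype) and rsub in ("*", subtype):
            return True
    return False
```

Because the outcome depends on the Accept header, a cache would have to vary on that header, which is the fragmentation concern noted in the last bullet.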

Note that in contrast to Wikidata entity URIs, the above URIs identify //descriptions// (data), not the thing described by the data.
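
To make the proposed flow concrete, here is a minimal sketch of how a ```/data/``` path might be mapped to the special page and then to a 303 redirect target. The subpage form ```Special:PageData/<title>``` (by analogy with Special:EntityData), the ```index.php``` redirect target, and all function names are assumptions for illustration, not the actual implementation.

```python
from typing import Tuple
from urllib.parse import urlencode

DATA_PREFIX = "/data/"


def special_page_for(path: str) -> str:
    """Map a /data/<title> path to an (assumed) Special:PageData subpage."""
    if not path.startswith(DATA_PREFIX) or len(path) == len(DATA_PREFIX):
        raise ValueError("not a /data/<title> path: %r" % path)
    return "Special:PageData/" + path[len(DATA_PREFIX):]


def redirect_for(host: str, path: str) -> Tuple[int, str]:
    """Return (status, Location) for a request to the canonical data URL."""
    title = special_page_for(path).split("/", 1)[1]
    # Keep ":" unescaped so the namespace prefix stays readable.
    query = urlencode({"title": title, "action": "raw"}, safe=":")
    return 303, "https://%s/w/index.php?%s" % (host, query)


status, location = redirect_for(
    "commons.wikimedia.org", "/data/Data:Avignon_City_Wall.map"
)
print(status, location)
```

The canonical URL stays stable as an identifier, while the redirect target is free to change if the ```action=raw``` interface is later replaced.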

== Status Quo ==
* There is a way to get raw page data for most data types, using action=raw with the "ugly" URL form: <https://commons.wikimedia.org/w/index.php?title=Data:Avignon_City_Wall.map&action=raw>. However, this is not supported for data types that have "direct editing" disabled. E.g. <https://www.wikidata.org/w/index.php?title=Q23&action=raw> does not work.

* Wikidata uses <https://www.wikidata.org/entity/Q23> as the canonical URI of concepts, and <https://www.wikidata.org/wiki/Special:EntityData/Q23> as the canonical URI of the description. Both apply content negotiation and trigger a 303 redirect. The canonical URL for a specific serialization has the form <https://www.wikidata.org/wiki/Special:EntityData/Q23.ttl>.
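
The three Wikidata URL forms mentioned above can be summarized in a few lines. This sketch only builds the strings quoted in the text; the helper names are hypothetical and no request is made.

```python
from typing import Optional

WIKIDATA = "https://www.wikidata.org"


def concept_uri(entity_id: str) -> str:
    """URI of the concept itself (the thing being described)."""
    return "%s/entity/%s" % (WIKIDATA, entity_id)


def description_url(entity_id: str, extension: Optional[str] = None) -> str:
    """URI of the description; with an extension, one specific serialization."""
    url = "%s/wiki/Special:EntityData/%s" % (WIKIDATA, entity_id)
    return "%s.%s" % (url, extension) if extension else url


print(concept_uri("Q23"))
print(description_url("Q23"))
print(description_url("Q23", "ttl"))
```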

== Concerns and Alternatives Considered ==

* Do not include the namespace after /data/, e.g. https://commons.wikimedia.org/data/Avignon_City_Wall.map

: TBD

* Use "raw" instead of "data", e.g. https://commons.wikimedia.org/raw/Data:Avignon_City_Wall.map

: TBD

* Use REST API URLs

: TBD

Also note that these would return the "internal" serialization of the data (with the appropriate MIME type in the response header). They do not support custom serialization or apply content negotiation. 

Question:

Do we need to plan for supporting custom serialization and content negotiation? Is it sufficient to later add a query parameter to specify an alternative serialization?

* Apply content negotiation to the established page URLs using the /wiki/ path

: TBD

Example: to get .tab data as CSV instead of JSON, one would use a URL like <https://commons.wikimedia.org/data/Avignon_City_Wall.map?format=text/csv>. 

Note that specifying the format makes no sense for a "pure" URI, this is only relevant when resolving the URI as a URL and fetching the associated data.
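
As an illustration of the distinction above, the following sketch adds a ```format``` query parameter when a URL is to be fetched and strips it again to recover the plain identifier. The parameter name is taken from the example above; the helper names are hypothetical, and nothing here implies that such a parameter is actually supported.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_format(canonical_url: str, mime: str) -> str:
    """Return a retrieval URL that asks for a specific serialization."""
    parts = urlsplit(canonical_url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "format"]
    query.append(("format", mime))
    # Keep "/" unescaped so the MIME type stays readable in the URL.
    return urlunsplit(parts._replace(query=urlencode(query, safe="/")))


def strip_format(url: str) -> str:
    """Drop the format parameter to get back the bare identifier."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "format"]
    return urlunsplit(parts._replace(query=urlencode(query, safe="/")))


uri = "https://commons.wikimedia.org/data/Avignon_City_Wall.map"
fetch_url = with_format(uri, "text/csv")
print(fetch_url)
print(strip_format(fetch_url) == uri)
```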

* "URLs don't need to be pretty"

: TBD

== Resources ==

* https://www.w3.org/TR/cooluris/...

TASK DETAIL

https://phabricator.wikimedia.org/T161527


To: daniel
Cc: MZMcBride, Rybesh, Dzahn, GWicke, tstarling, Aklapper, Jonas, Smalyshev, mkroetzsch, Lydia_Pintscher, daniel, QZanden, Salgo60, D3r1ck01, Izno, suriyaa, Eevans, mobrovac, Hardikj, Wikidata-bugs, aude, jayvdb, Southparkfan, fbstj, RobLa-WMF, santhosh, Mbch331, Jay8g, Ltrlg, Glaisher, bd808, Krenair, Legoktm

