[Wikidata-bugs] [Maniphest] [Edited] T161527: Canonical data URLs for machine readable page content

daniel Thu, 13 Apr 2017 11:46:24 -0700

daniel edited the task description.

EDIT DETAILS

**Revised after public discussion, April 1 2017 and April 13 2017**

NOTE: Last call for comments! If no new pertinent concerns are raised by April 26 2017, this RFC will be approved for implementation!

== Problem ==...
* Use URLs of the form https://commons.wikimedia.org/data/main/Data:Avignon_City_Wall.map to identify and retrieve machine readable page content. "main" refers to the main slot, see T107595.

* The `/data/<slot>` path is rewritten to a special page, Special:PageData...
Note that in contrast to Wikidata entity URIs, the above URIs identify //descriptions// (data), not the thing described by the data. They also do not identify wiki pages, as the /wiki/ path does. 
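The URL form described above can be sketched as follows. This is only an illustration of the `/data/<slot>/<title>` naming convention, not the proposed implementation: the helper names `build_data_url` and `parse_data_url` are hypothetical, and the percent-encoding choices are an assumption rather than something the RFC specifies.

```python
from urllib.parse import quote, unquote, urlsplit

# Path prefix named in the proposal.
DATA_PREFIX = "/data/"


def build_data_url(base: str, slot: str, title: str) -> str:
    """Build a data URL of the form <base>/data/<slot>/<title> (hypothetical helper)."""
    # Assumption: ':' and '/' stay unescaped so namespace prefixes remain readable.
    encoded_slot = quote(slot, safe="")
    encoded_title = quote(title, safe=":/")
    return f"{base.rstrip('/')}{DATA_PREFIX}{encoded_slot}/{encoded_title}"


def parse_data_url(url: str) -> tuple:
    """Split a data URL into (slot, title); raise ValueError if it does not match."""
    path = urlsplit(url).path
    if not path.startswith(DATA_PREFIX):
        raise ValueError(f"not a data URL: {url}")
    # The first path segment after the prefix is the slot, the rest is the title.
    slot, sep, title = path[len(DATA_PREFIX):].partition("/")
    if not slot or not sep or not title:
        raise ValueError(f"missing slot or title: {url}")
    return unquote(slot), unquote(title)


url = build_data_url(
    "https://commons.wikimedia.org", "main", "Data:Avignon_City_Wall.map"
)
slot, title = parse_data_url(url)
```

A `/wiki/` URL is rejected by `parse_data_url`, which mirrors the distinction drawn above between page URLs and data URLs.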

Also note that the primary purpose of these URLs is to act as canonical stable identifiers (URIs). They should be resolvable, but they are not intended as a full-fledged data access API. They may however be implemented to redirect to such an API....
  * While URLs do not have to be pretty, they should be stable, especially when they are to be used as stable unique identifiers. Removing all application-specific information from the URL provides more stability by adding a layer of abstraction.
* We could apply content negotiation to the established page URLs using the `/wiki/` path. Such URLs are already in use for referring to Wikipedia pages in RDF.

  * The semantics of /wiki is "a wiki page", while the intended semantics of /data is "a machine readable data set".

  * The /wiki path has no room for addressing individual slots - in fact, it refers to the page as rendered using information from all slots (compare T107595).

  * The /wiki path on Wikimedia sites is well established and heavily used. It's risky to overload it with new semantics and behavior.

== Open Questions and Concerns ==

* The proposed URL scheme does not have room for slot names. We will not be able to refer to slots other than the main slot.

  * The proposal was amended to use the /data/<slot>/ prefix, for forward compatibility. The intended meaning or semantics of <slot> is not yet fixed, though it is expected to align with slot names (compare T107595).

* The proposed schemes are not stable against page renames. We could use page IDs instead of the title.

  * Page IDs are also brittle: sometimes, a page is moved to an archive-style title, and a new page is created using the old title. In such a case, the intended semantics of the data URLs is unknown.

  * Most entry points, including the REST API, rely on titles, not page IDs.

  * Page IDs will often not be known to the code that constructs the data URL. It may take a database access or API request to determine the page ID.

  * Page IDs don't allow for "eyeballing", they are not self-explanatory.

* The URL pattern should include a versioning mechanism

  * The idea of versioning is somewhat contrary to the idea of stable canonical identifiers. The canonical identifier should stay canonical, and not be replaced by a new canonical URL. The primary concern is the identity of the object identified, not the format of the data returned when resolving the URL. This is in contrast to the situation for APIs: there, it is important to know exactly the format of the data returned, and how to request which bits of data, so versioning is a good thing.

* The proposed URL pattern introduces a new API for MediaWiki; there is no need for another API beyond the old school action API, the traditional web API and the new REST API.

  * The proposed URL pattern is merely a naming convention; it can act as a front for any of the existing APIs. Its primary aim is to provide stable identifiers, not to provide fine-grained data access.

  * The concerns of identifiers and APIs are related, but dissimilar, as explained above. They can be seen as complementary.   

...

TASK DETAIL

https://phabricator.wikimedia.org/T161527

To: daniel
Cc: Scott_WUaS, bmansurov, WMDE-leszek, MZMcBride, Rybesh, GWicke, tstarling, Aklapper, Jonas, Smalyshev, mkroetzsch, Lydia_Pintscher, daniel, QZanden, Izno, suriyaa, Eevans, mobrovac, Hardikj, Wikidata-bugs, aude, jayvdb, Southparkfan, fbstj, RobLa-WMF, santhosh, Mbch331, Jay8g, Ltrlg, Glaisher, bd808, Krenair, Legoktm
