[Wikidata-bugs] [Maniphest] [Commented On] T132319: Sitelink URIs should be IRIs

Christopher Mon, 11 Apr 2016 17:09:09 -0700

Christopher added a comment.


  The main issue is string comparison between the percent-encoded and unencoded 
forms of Unicode IRIs used as **resources**: compared as plain strings, the two 
forms are different identifiers.
  
  Per https://tools.ietf.org/html/rfc3987#section-5.3.1
  
  > When comparing character by character, the comparison function MUST NOT 
map IRIs to URIs, because such a mapping would create additional spurious 
equivalences. It follows that an IRI SHOULD NOT be modified when being 
transported if there is any chance that this IRI might be used as an identifier.
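
  As a rough illustration of the point in the RFC, the sketch below compares an 
IRI with its percent-encoded form. The article URL and the `to_uri` helper are 
assumptions made for this example, not part of the Wikidata export code, and 
`to_uri` only approximates the IRI-to-URI mapping of RFC 3987 section 3.1.

```python
from urllib.parse import quote

# Hypothetical sitelink, chosen only for illustration ("\u00ed" is "í").
iri = "https://en.wikipedia.org/wiki/Reykjav\u00edk"
# The same article written in percent-encoded form.
encoded = "https://en.wikipedia.org/wiki/Reykjav%C3%ADk"


def to_uri(value: str) -> str:
    """Approximate IRI-to-URI mapping: percent-encode the UTF-8 bytes of
    non-ASCII characters, leaving reserved characters and existing
    percent-escapes untouched."""
    return quote(value, safe="%:/?#[]@!$&'()*+,;=")


# Simple string comparison: the two forms are different identifiers.
print(iri == encoded)                  # False

# After mapping both to URIs, the distinction disappears. This is the
# "spurious equivalence" the RFC warns about.
print(to_uri(iri) == to_uri(encoded))  # True
```

  A store that compares resources as plain strings therefore will not join a 
percent-encoded sitelink with the same link published elsewhere in its 
unencoded form.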
  
  For experimentation, I have created a named graph 
(http://wikidata.org/en-sitelinks) in Virtuoso that contains all 7M+ English 
sitelinks in their unencoded Unicode form.  The query below shows a side-by-side 
comparison of the two variants.
  
    
http://wdm-rdf.wmflabs.org/sparql?default-graph-uri=&query=PREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0D%0APREFIX+wd%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0D%0ASELECT+%3Fs+%3Fsitelink+%3Fenclink+%0D%0AFROM+%3Chttp%3A%2F%2Fwikidata.org%3E%0D%0AFROM+%3Chttp%3A%2F%2Fwikidata.org%2Fen-sitelinks%3E%0D%0AWHERE+%7B%3Fs+foaf%3AisPrimaryTopicOf+%3Fsitelink%3B%0D%0Awdt%3AP17+wd%3AQ189+.%0D%0A%3Fenclink+schema%3Aabout+%3Fs%0D%0A%7D+LIMIT+100&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=0&debug=on
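
  The SPARQL text is hard to read inside the link. A small sketch like the 
following, which uses only the Python standard library, recovers it from the 
`query` parameter. The variable names are illustrative. The decoded query uses 
the `wdt:` and `schema:` prefixes without declaring them, so it presumably 
relies on prefixes predefined on the endpoint.

```python
from urllib.parse import parse_qs, urlsplit

# The link above, split across string literals for readability.
url = (
    "http://wdm-rdf.wmflabs.org/sparql?default-graph-uri=&query="
    "PREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0D%0A"
    "PREFIX+wd%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0D%0A"
    "SELECT+%3Fs+%3Fsitelink+%3Fenclink+%0D%0A"
    "FROM+%3Chttp%3A%2F%2Fwikidata.org%3E%0D%0A"
    "FROM+%3Chttp%3A%2F%2Fwikidata.org%2Fen-sitelinks%3E%0D%0A"
    "WHERE+%7B%3Fs+foaf%3AisPrimaryTopicOf+%3Fsitelink%3B%0D%0A"
    "wdt%3AP17+wd%3AQ189+.%0D%0A"
    "%3Fenclink+schema%3Aabout+%3Fs%0D%0A"
    "%7D+LIMIT+100"
    "&format=text%2Fhtml&CXML_redir_for_subjs=121"
    "&CXML_redir_for_hrefs=&timeout=0&debug=on"
)

# parse_qs decodes "+" to spaces and %XX escapes to characters.
params = parse_qs(urlsplit(url).query)
sparql = params["query"][0].replace("\r\n", "\n")
print(sparql)
```

  In the decoded query, `?sitelink` is bound through `foaf:isPrimaryTopicOf` 
and `?enclink` through `schema:about`, with both `FROM` graphs merged into the 
default graph.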
  
  It seems quite obvious that the unencoded form is what should be represented 
in the RDF.  I can see no reason why the RDF should have a percent-encoded IRI 
that is not only ugly but also practically useless for connecting disparate 
external sources that may reference the canonical Wikipedia **article name**.

TASK DETAIL
  https://phabricator.wikimedia.org/T132319

To: Christopher
Cc: Smalyshev, Aklapper, Christopher, Avner, debt, Gehel, D3r1ck01, FloNight, 
Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331


