Christopher added a comment.
The main issue is with string comparison of the percent-encoded and unencoded forms of Unicode IRIs as **resources**. Per https://tools.ietf.org/html/rfc3987#section-5.3.1

> When comparing character by character, the comparison function MUST NOT map IRIs to URIs, because such a mapping would create additional spurious equivalences. It follows that an IRI SHOULD NOT be modified when being transported if there is any chance that this IRI might be used as an identifier.

For experimentation, I have created a named graph (http://wikidata.org/en-sitelinks) in Virtuoso that contains all 7M+ English sitelinks in their normal Unicode form. The query below shows a side-by-side comparison of the two variants.

http://wdm-rdf.wmflabs.org/sparql?default-graph-uri=&query=PREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0D%0APREFIX+wd%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0D%0ASELECT+%3Fs+%3Fsitelink+%3Fenclink+%0D%0AFROM+%3Chttp%3A%2F%2Fwikidata.org%3E%0D%0AFROM+%3Chttp%3A%2F%2Fwikidata.org%2Fen-sitelinks%3E%0D%0AWHERE+%7B%3Fs+foaf%3AisPrimaryTopicOf+%3Fsitelink%3B%0D%0Awdt%3AP17+wd%3AQ189+.%0D%0A%3Fenclink+schema%3Aabout+%3Fs%0D%0A%7D+LIMIT+100&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=0&debug=on

It seems quite obvious that the unencoded form is what should be represented in the RDF.
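To make the comparison problem concrete, here is a minimal Python sketch. The article IRI is only an illustrative assumption (it is not taken from the query results), and `same_resource` is a hypothetical helper that stands in for the simple string comparison a triple store applies to IRIs.

```python
from urllib.parse import quote, unquote

# Illustrative IRI (an assumption for this example, not taken from the dataset).
unicode_iri = "https://en.wikipedia.org/wiki/Reykjavík"

# Percent-encoded variant: non-ASCII characters become UTF-8 bytes written as %HH.
encoded_iri = quote(unicode_iri, safe=":/")


def same_resource(a: str, b: str) -> bool:
    """Hypothetical helper: plain character-by-character comparison,
    with no IRI-to-URI mapping applied to either side."""
    return a == b


print(encoded_iri)                              # https://en.wikipedia.org/wiki/Reykjav%C3%ADk
print(same_resource(unicode_iri, encoded_iri))  # False
print(unquote(encoded_iri) == unicode_iri)      # True, but only after an explicit decoding step
```

Under simple string comparison the two forms are different resources, so an external dataset that uses the Unicode form will not join against the percent-encoded one unless one side is rewritten first.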
I can see no reason why the RDF should have a percent-encoded IRI that is not only ugly, but practically useless for connecting disparate external sources that may reference the Wikipedia canonical **article name**.

TASK DETAIL
https://phabricator.wikimedia.org/T132319
