Re: [Wikidata] BREAKING CHANGE: sitelink encoding changing in RDF export/WDQS

2017-06-30 Thread Stas Malyshev
Hi!

>> We have decided to change the encoding, so that the encoding of the
>> sitelink above would be /wiki/Category:Stuffed_animals. The encoding now
>> should match how titles are encoded in Mediawiki codebase (non-ASCII
>> characters that Mediawiki encodes will still be encoded as before).

Continuing the saga, the change has now been deployed on Wikidata. This
means, unfortunately, that for some time some entities (newly edited)
will have the new encoding for sitelinks, and some (not edited
recently) may still have the old encoding. As soon as the new dump is
produced (next Tuesday) we will begin reloading the servers, after
which all the data will use the new encoding, matching the encoding
in MediaWiki.
I apologize in advance for the inconvenience that may be caused by
having inconsistent data for a while (shouldn't be more than a week). I
will make an announcement (hopefully the final one on the topic :) when
it is all done.
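
While both encodings coexist, a consumer of the RDF data may want to bring
every sitelink path into the new form before comparing or looking it up.
The following is a minimal sketch of that idea, not the code used by
Wikidata. The function name normalize_sitelink_path and the set of
characters left unescaped (MEDIAWIKI_SAFE) are assumptions made for
illustration; the exact set MediaWiki uses should be checked against its
source.

```python
from urllib.parse import quote, unquote

# Assumption: punctuation that MediaWiki-style title encoding leaves
# unescaped. Not taken from this thread; verify before relying on it.
MEDIAWIKI_SAFE = ";@$!*(),/~:"


def normalize_sitelink_path(path: str) -> str:
    """Return the new-style form of a /wiki/ path in either encoding."""
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"not a sitelink path: {path!r}")
    # Decode percent-escapes to recover the title text.
    title = unquote(path[len(prefix):])
    # Re-encode: spaces become underscores, non-ASCII stays percent-encoded.
    return prefix + quote(title.replace(" ", "_"), safe=MEDIAWIKI_SAFE)


# Both the old and the new form map to the same path.
print(normalize_sitelink_path("/wiki/Category%3AStuffed%20animals"))
print(normalize_sitelink_path("/wiki/Category:Stuffed_animals"))
```

Because the function is idempotent on paths already in the new form, it can
be applied to all sitelinks without knowing which entities have been
re-exported.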

Thanks,
-- 
Stas Malyshev
smalys...@wikimedia.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] BREAKING CHANGE: sitelink encoding changing in RDF export/WDQS

2017-05-24 Thread Stas Malyshev
Hi!

In RDF exports of Wikidata[1] and in Wikidata Query Service, sitelinks
have always been encoded by URL-encoding the sitelink text, i.e. a link to
"Category:Stuffed animals" was encoded as
/wiki/Category%3AStuffed%20animals.

While this encoding produces a working link, we have come to the
conclusion that it is very inconvenient: it does not match how titles
are encoded in MediaWiki, and this mismatch makes it harder to look up
the links. See more in
https://phabricator.wikimedia.org/T131960

We have decided to change the encoding, so that the encoding of the
sitelink above would be /wiki/Category:Stuffed_animals. The encoding now
should match how titles are encoded in Mediawiki codebase (non-ASCII
characters that Mediawiki encodes will still be encoded as before).
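
To make the difference concrete, here is a minimal sketch that reproduces
both forms for the example title. It is an approximation, not the actual
Wikibase export code: the function names and the MEDIAWIKI_SAFE character
set are assumptions for illustration, modeled on how MediaWiki leaves some
punctuation unescaped in title URLs.

```python
from urllib.parse import quote

# Assumption: punctuation that MediaWiki-style title encoding leaves
# unescaped. Not taken from this thread; verify before relying on it.
MEDIAWIKI_SAFE = ";@$!*(),/~:"


def old_sitelink_path(title: str) -> str:
    """Old form: percent-encode the raw sitelink text."""
    return "/wiki/" + quote(title, safe="")


def new_sitelink_path(title: str) -> str:
    """New form: spaces become underscores, then MediaWiki-style escaping."""
    return "/wiki/" + quote(title.replace(" ", "_"), safe=MEDIAWIKI_SAFE)


print(old_sitelink_path("Category:Stuffed animals"))
# /wiki/Category%3AStuffed%20animals
print(new_sitelink_path("Category:Stuffed animals"))
# /wiki/Category:Stuffed_animals
```

Non-ASCII characters are percent-encoded as UTF-8 bytes in both functions,
so a title such as "Zürich" yields the same path either way.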

Implementing this change will require a database reload, and while the
reload is in progress queries may return inconsistent results
(some entities may have the new sitelink encoding and some the old one). I
apologize in advance for any inconvenience caused by that. I will
make further announcements when the switch process has started and when it
is complete.

Thanks,
[1] https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
-- 
Stas Malyshev
smalys...@wikimedia.org
