matej_suchanek triaged this task as "High" priority.
matej_suchanek added a comment.

Yes, this is annoying. There are multiple issues:

- #Wikidata <https://phabricator.wikimedia.org/tag/wikidata/> apparently still stores the invalid data (but does not allow re-submitting it)
- #Pywikibot <https://phabricator.wikimedia.org/tag/pywikibot/> doesn't do well when deciding which claims were changed

Let's debug it on a random example from my logs: https://www.wikidata.org/wiki/Q486235

When `ItemPage.toJSON` is called with `diffto`, the following is run:

    claims = {}
    for prop in self.claims:
        if len(self.claims[prop]) > 0:
            claims[prop] = [claim.toJSON() for claim in self.claims[prop]]

    if diffto and 'claims' in diffto:
        temp = defaultdict(list)
        claim_ids = set()

        diffto_claims = diffto['claims']

        for prop in claims:
            for claim in claims[prop]:
                if (prop not in diffto_claims
                        or claim not in diffto_claims[prop]):  # <- this is the key
                    temp[prop].append(claim)

                if 'id' in claim:
                    claim_ids.add(claim['id'])

What does `claim not in diffto_claims[prop]` do? It checks whether the JSON of each claim is present in what we are diffing against (the original entity content when it was loaded). This way it can catch whether the claim was modified locally (e.g. by calling `setTarget`). If a claim hasn't been changed locally, there is no point in submitting it.

So why are coordinates submitted even if we didn't change them?
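
Before looking at real data, the behaviour of that membership test can be reproduced with plain dictionaries. The following is a minimal sketch, not Pywikibot code: the dictionaries are shortened stand-ins for claim JSON, `'placeholder-hash'` is an invented value, and all variable names are made up for illustration. It shows that key order is irrelevant to the comparison, while one extra or missing key is enough to make an otherwise identical claim look modified.

```python
import copy

# Shortened stand-in for a claim as it was loaded (hypothetical values).
loaded = {
    'mainsnak': {
        'snaktype': 'value',
        'property': 'P625',
        'datatype': 'globe-coordinate',
        'hash': 'placeholder-hash',
    },
    'type': 'statement',
    'rank': 'normal',
}
diffto_claims = {'P625': [loaded]}

# Same content, keys written in a different order.
reordered = {
    'rank': 'normal',
    'type': 'statement',
    'mainsnak': {
        'hash': 'placeholder-hash',
        'datatype': 'globe-coordinate',
        'property': 'P625',
        'snaktype': 'value',
    },
}

# `in` on a list compares with ==, and dict equality ignores key order.
same_despite_order = reordered in diffto_claims['P625']

# Same claim again, but serialized without the nested 'hash' key.
regenerated = copy.deepcopy(loaded)
del regenerated['mainsnak']['hash']

# Nothing was edited, yet the claim is treated as changed.
flagged_as_changed = regenerated not in diffto_claims['P625']

print(same_despite_order, flagged_as_changed)
```

Both printed values are `True`: reordering keys is harmless, but any key that exists on only one side defeats the check.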

    {'mainsnak': {'snaktype': 'value',
                  'property': 'P625',
                  'datatype': 'globe-coordinate',
                  'datavalue': {'value': {'latitude': 27.72334,
                                          'longitude': 109.18851,
                                          'altitude': None,
                                          'globe': 'http://www.wikidata.org/entity/Q2',
                                          'precision': None},
                                'type': 'globecoordinate'}},
     'type': 'statement',
     'id': 'q486235$1280901A-091B-4374-9E3E-66CF41212194',
     'rank': 'normal'}

    {'mainsnak': {'snaktype': 'value',
                  'property': 'P625',
                  'hash': '01e88adb61cc5b89e157d42d972804b49e55877b',
                  'datavalue': {'value': {'latitude': 27.72334,
                                          'longitude': 109.18851,
                                          'altitude': None,
                                          'precision': None,
                                          'globe': 'http://www.wikidata.org/entity/Q2'},
                                'type': 'globecoordinate'},
                  'datatype': 'globe-coordinate'},
     'type': 'statement',
     'id': 'q486235$1280901A-091B-4374-9E3E-66CF41212194',
     'rank': 'normal'}

The current and original JSONs (shown in that order) are always different. Here the only difference is the `hash` key. But for other datatypes, e.g. `wikibase-item`...

    {'mainsnak': {'snaktype': 'value',
                  'property': 'P17',
                  'datatype': 'wikibase-item',
                  'datavalue': {'value': {'entity-type': 'item',
                                          'numeric-id': 148},
                                'type': 'wikibase-entityid'}},
     'type': 'statement',
     'id': 'q486235$8B49EBB5-04AD-480A-A1F1-764B5018D076',
     'rank': 'normal',
     'references': [{'snaks': {'P143': [{'snaktype': 'value',
                                         'property': 'P143',
                                         'datatype': 'wikibase-item',
                                         'datavalue': {'value': {'entity-type': 'item',
                                                                 'numeric-id': 30239},
                                                       'type': 'wikibase-entityid'}}]},
                     'snaks-order': ['P143'],
                     'hash': '0ee3b3ba1c958f4c3dcba7ed8091fe4b57311348'}]}

    {'mainsnak': {'snaktype': 'value',
                  'property': 'P17',
                  'hash': '30e172796b0726589e92b001c327f5d55fa0782e',
                  'datavalue': {'value': {'entity-type': 'item',
                                          'numeric-id': 148,
                                          'id': 'Q148'},
                                'type': 'wikibase-entityid'},
                  'datatype': 'wikibase-item'},
     'type': 'statement',
     'id': 'q486235$8B49EBB5-04AD-480A-A1F1-764B5018D076',
     'rank': 'normal',
     'references': [{'hash': '0ee3b3ba1c958f4c3dcba7ed8091fe4b57311348',
                     'snaks': {'P143': [{'snaktype': 'value',
                                         'property': 'P143',
                                         'hash': 'cb49f6fa327b245e4a5aaf48c44b3f503bcd4265',
                                         'datavalue': {'value': {'entity-type': 'item',
                                                                 'numeric-id': 30239,
                                                                 'id': 'Q30239'},
                                                       'type': 'wikibase-entityid'},
                                         'datatype': 'wikibase-item'}]},
                     'snaks-order': ['P143']}]}

... there are even more differences: besides the `hash` keys on the main snak and the reference snak, the original entity values also carry an `id` key (`'Q148'`, `'Q30239'`) that the current JSON lacks.

So although comparing serializations is very efficient, it's currently broken and also not future-proof. A promising solution is to use T76615: Claim equality operator <https://phabricator.wikimedia.org/T76615> (it doesn't compare references, but T186200#4267477 <https://phabricator.wikimedia.org/T186200#4267477> suggests a way around that).

TASK DETAIL
  https://phabricator.wikimedia.org/T246359

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: matej_suchanek
Cc: matej_suchanek, Aklapper, Liuxinyu970226, pywikibot-bugs-list, SilentSpike, Zkhalido, Viztor, Wenyi, Tbscho, MayS, Mdupont, JJMC89, Dvorapa, Altostratus, Avicennasis, mys_721tx, jayvdb, Ricordisamoa, Masti, Alchimista, Rxy
_______________________________________________
pywikibot-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/pywikibot-bugs
