matej_suchanek triaged this task as "High" priority.
matej_suchanek added a comment.


  Yes, this is annoying. There are multiple issues:
  
  - #Wikidata <https://phabricator.wikimedia.org/tag/wikidata/> apparently still stores the invalid data (but does not allow re-submitting it)
  - #Pywikibot <https://phabricator.wikimedia.org/tag/pywikibot/> does a poor job of deciding which claims were changed
  
  Let's debug it on a random example from my logs: https://www.wikidata.org/wiki/Q486235
  
  When `ItemPage.toJSON` with `diffto` is called, the following is run:
  
    claims = {}
    for prop in self.claims:
        if len(self.claims[prop]) > 0:
            claims[prop] = [claim.toJSON() for claim in self.claims[prop]]
    
    if diffto and 'claims' in diffto:
        temp = defaultdict(list)
        claim_ids = set()
    
        diffto_claims = diffto['claims']
    
        for prop in claims:
            for claim in claims[prop]:
                if (prop not in diffto_claims
                        or claim not in diffto_claims[prop]):  # <- this is the key
                    temp[prop].append(claim)
    
                if 'id' in claim:
                    claim_ids.add(claim['id'])
  
  What does `claim not in diffto_claims[prop]` do? It checks whether the JSON of each claim is present in what we are diffing against (the original entity content as it was loaded). This way it can catch whether the claim was modified locally (e.g. by calling `setTarget`). If a claim hasn't been changed locally, there is no point in submitting it. So why are coordinates submitted even if we didn't change them?
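  The membership test relies on plain `dict` equality: `in` on a list compares elements with `==`, and two dicts are equal only if they have exactly the same keys with equal values. A minimal standalone sketch of the loop above, assuming plain dicts instead of pywikibot objects; the helper name `changed_claims` and the sample data (including the `'abc'` hash) are hypothetical:

```python
from collections import defaultdict
from typing import Any, Dict, List

Claim = Dict[str, Any]


def changed_claims(claims: Dict[str, List[Claim]],
                   diffto_claims: Dict[str, List[Claim]]) -> Dict[str, List[Claim]]:
    """Return the claims whose JSON does not appear verbatim in diffto_claims."""
    temp: Dict[str, List[Claim]] = defaultdict(list)
    for prop, prop_claims in claims.items():
        for claim in prop_claims:
            # `in` on a list compares with ==; two dicts are equal only if
            # they have exactly the same keys mapped to equal values
            if prop not in diffto_claims or claim not in diffto_claims[prop]:
                temp[prop].append(claim)
    return dict(temp)


# Illustrative data: the same claim, once as serialized locally and once
# as loaded from the server with an extra (made-up) server-computed 'hash'
local = {'P625': [{'mainsnak': {'snaktype': 'somevalue', 'property': 'P625'},
                   'type': 'statement', 'rank': 'normal'}]}
loaded = {'P625': [{'mainsnak': {'snaktype': 'somevalue', 'property': 'P625',
                                 'hash': 'abc'},
                    'type': 'statement', 'rank': 'normal'}]}

print(changed_claims(local, local))   # {}
print(changed_claims(local, loaded))  # the untouched claim is reported as changed
```

  A single extra key is enough to make an untouched claim look modified, which is exactly what happens with the real data.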
  
    {'mainsnak': {'snaktype': 'value', 'property': 'P625', 'datatype': 'globe-coordinate',
                  'datavalue': {'value': {'latitude': 27.72334, 'longitude': 109.18851,
                                          'altitude': None,
                                          'globe': 'http://www.wikidata.org/entity/Q2',
                                          'precision': None},
                                'type': 'globecoordinate'}},
     'type': 'statement', 'id': 'q486235$1280901A-091B-4374-9E3E-66CF41212194',
     'rank': 'normal'}
    
    {'mainsnak': {'snaktype': 'value', 'property': 'P625',
                  'hash': '01e88adb61cc5b89e157d42d972804b49e55877b',
                  'datavalue': {'value': {'latitude': 27.72334, 'longitude': 109.18851,
                                          'altitude': None, 'precision': None,
                                          'globe': 'http://www.wikidata.org/entity/Q2'},
                                'type': 'globecoordinate'},
                  'datatype': 'globe-coordinate'},
     'type': 'statement', 'id': 'q486235$1280901A-091B-4374-9E3E-66CF41212194',
     'rank': 'normal'}
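  To see exactly which keys make the two coordinate claims unequal, a small recursive diff helps. A sketch with a hypothetical helper `json_diff` (not part of pywikibot), applied to the two P625 dicts above:

```python
from typing import Any, List


def json_diff(a: Any, b: Any, path: str = '') -> List[str]:
    """List the paths at which two JSON-like structures differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        diffs: List[str] = []
        for key in sorted(set(a) | set(b)):
            sub = f'{path}.{key}' if path else key
            if key not in a or key not in b:
                diffs.append(sub)  # key present on one side only
            else:
                diffs.extend(json_diff(a[key], b[key], sub))
        return diffs
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return [path]
        diffs = []
        for i, (x, y) in enumerate(zip(a, b)):
            diffs.extend(json_diff(x, y, f'{path}[{i}]'))
        return diffs
    # Scalars, or values of different types: compare directly
    return [] if a == b else [path]


current = {'mainsnak': {'snaktype': 'value', 'property': 'P625', 'datatype': 'globe-coordinate',
                        'datavalue': {'value': {'latitude': 27.72334, 'longitude': 109.18851,
                                                'altitude': None,
                                                'globe': 'http://www.wikidata.org/entity/Q2',
                                                'precision': None},
                                      'type': 'globecoordinate'}},
           'type': 'statement', 'id': 'q486235$1280901A-091B-4374-9E3E-66CF41212194',
           'rank': 'normal'}
original = {'mainsnak': {'snaktype': 'value', 'property': 'P625',
                         'hash': '01e88adb61cc5b89e157d42d972804b49e55877b',
                         'datavalue': {'value': {'latitude': 27.72334, 'longitude': 109.18851,
                                                 'altitude': None, 'precision': None,
                                                 'globe': 'http://www.wikidata.org/entity/Q2'},
                                       'type': 'globecoordinate'},
                         'datatype': 'globe-coordinate'},
            'type': 'statement', 'id': 'q486235$1280901A-091B-4374-9E3E-66CF41212194',
            'rank': 'normal'}

print(json_diff(current, original))  # ['mainsnak.hash']
```

  Key order does not matter for dict equality, so the reordered keys are not reported. Fed the `wikibase-item` pair, the same helper reports `mainsnak.datavalue.value.id`, `mainsnak.hash`, and the same two keys under `references[0].snaks.P143[0]`.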
  
  Current and original JSONs are always different. Here the only difference is the `hash` key of the main snak. But for other datatypes, e.g. `wikibase-item`...
  
    {'mainsnak': {'snaktype': 'value', 'property': 'P17', 'datatype': 'wikibase-item',
                  'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 148},
                                'type': 'wikibase-entityid'}},
     'type': 'statement', 'id': 'q486235$8B49EBB5-04AD-480A-A1F1-764B5018D076',
     'rank': 'normal',
     'references': [{'snaks': {'P143': [{'snaktype': 'value', 'property': 'P143',
                                         'datatype': 'wikibase-item',
                                         'datavalue': {'value': {'entity-type': 'item',
                                                                 'numeric-id': 30239},
                                                       'type': 'wikibase-entityid'}}]},
                     'snaks-order': ['P143'],
                     'hash': '0ee3b3ba1c958f4c3dcba7ed8091fe4b57311348'}]}
    
    {'mainsnak': {'snaktype': 'value', 'property': 'P17',
                  'hash': '30e172796b0726589e92b001c327f5d55fa0782e',
                  'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 148,
                                          'id': 'Q148'},
                                'type': 'wikibase-entityid'},
                  'datatype': 'wikibase-item'},
     'type': 'statement', 'id': 'q486235$8B49EBB5-04AD-480A-A1F1-764B5018D076',
     'rank': 'normal',
     'references': [{'hash': '0ee3b3ba1c958f4c3dcba7ed8091fe4b57311348',
                     'snaks': {'P143': [{'snaktype': 'value', 'property': 'P143',
                                         'hash': 'cb49f6fa327b245e4a5aaf48c44b3f503bcd4265',
                                         'datavalue': {'value': {'entity-type': 'item',
                                                                 'numeric-id': 30239,
                                                                 'id': 'Q30239'},
                                                       'type': 'wikibase-entityid'},
                                         'datatype': 'wikibase-item'}]},
                     'snaks-order': ['P143']}]}
  
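  ... even more differences: besides the `hash` of the main snak and of each reference snak, the server adds an `id` (e.g. `'Q148'`) next to `numeric-id`. So although comparing serializations is very efficient, it's currently broken and also not future-proof, since any key the server starts adding breaks it again. A promising solution is to use T76615: Claim equality operator <https://phabricator.wikimedia.org/T76615> (it doesn't compare references, but T186200#4267477 <https://phabricator.wikimedia.org/T186200#4267477> suggests a way around that).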
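  For illustration only, a comparison in the spirit of a claim equality operator can read just the fields that carry meaning instead of comparing whole serializations. This is a sketch under assumptions, not pywikibot's implementation: the helpers `snak_signature` and `same_claim` and the per-type whitelist `SEMANTIC_KEYS` are hypothetical, only the two datavalue types from the examples above are covered, and it ignores references (like the operator from T76615 as described above) as well as qualifiers:

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical whitelist of the datavalue fields that carry meaning, per
# datavalue type; keys added by the server ('hash', the entity 'id') are never read.
SEMANTIC_KEYS: Dict[str, Tuple[str, ...]] = {
    'wikibase-entityid': ('entity-type', 'numeric-id'),
    'globecoordinate': ('latitude', 'longitude', 'altitude', 'precision', 'globe'),
}


def snak_signature(snak: Dict[str, Any]) -> Tuple[Any, ...]:
    """Reduce a snak to the parts that define its meaning."""
    datavalue: Optional[Dict[str, Any]] = snak.get('datavalue')
    if datavalue is None:  # 'somevalue' and 'novalue' snaks carry no datavalue
        return (snak['property'], snak['snaktype'])
    value = datavalue['value']
    keys = SEMANTIC_KEYS.get(datavalue['type'])
    if keys is not None:
        value = tuple(value.get(key) for key in keys)
    # Types missing from the whitelist fall back to comparing the raw value
    return (snak['property'], snak['snaktype'], datavalue['type'], value)


def same_claim(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """Compare rank and main snak only; references and qualifiers are ignored."""
    return (a.get('rank') == b.get('rank')
            and snak_signature(a['mainsnak']) == snak_signature(b['mainsnak']))


# The P17 claim from above (references left out), as serialized locally and as loaded
local = {'mainsnak': {'snaktype': 'value', 'property': 'P17', 'datatype': 'wikibase-item',
                      'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 148},
                                    'type': 'wikibase-entityid'}},
         'type': 'statement', 'rank': 'normal'}
loaded = {'mainsnak': {'snaktype': 'value', 'property': 'P17',
                       'hash': '30e172796b0726589e92b001c327f5d55fa0782e',
                       'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 148,
                                               'id': 'Q148'},
                                     'type': 'wikibase-entityid'},
                       'datatype': 'wikibase-item'},
          'type': 'statement', 'rank': 'normal'}

print(same_claim(local, loaded))  # True: only server-added keys differ
```

  Because it reads an explicit whitelist, new server-side keys do not break it; any datavalue type not listed, however, falls back to raw comparison and inherits the same problem.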

TASK DETAIL
  https://phabricator.wikimedia.org/T246359

To: matej_suchanek
Cc: matej_suchanek, Aklapper, Liuxinyu970226, pywikibot-bugs-list, SilentSpike, 
Zkhalido, Viztor, Wenyi, Tbscho, MayS, Mdupont, JJMC89, Dvorapa, Altostratus, 
Avicennasis, mys_721tx, jayvdb, Ricordisamoa, Masti, Alchimista, Rxy
_______________________________________________
pywikibot-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/pywikibot-bugs
