Rosalie_WMDE added a comment.

  I ran this python script on 
https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2 and it 
ran for about 4 hours, ( did not complete but I terminated it, as the json dump 
in about 62G of data) . Nevertheless for those 5 hours I did not get a 
malformed precision in the output ( `badprecisions.txt` ) fille. in otherwords 
they are extremely rare in the database or they are not present or //the script 
is missing something?//.
  
    import sys
    import json
    import os
    import bz2
    import gzip
    GlOBE_COORDINATE_PROPERTIES= [
        'P625', 'P626', 'P1259', 'P1332',
        'P1333', 'P1334', 'P1335', 'P2786',
        'P5140', 'P8981', 'P9149'
    ]
    def read_dump(path):
        mode = 'r'
        file_ = os.path.split(path)[-1]
        if file_.endswith('.gz'):
            f = gzip.open(path, mode)
        elif file_.endswith('.bz2'):
            f = bz2.BZ2File(path, mode)
        elif file_.endswith('.json'):
            f = open(path, mode)
        else:
            raise NotImplementedError(f'Reading file {file_} is not supported')
        try:
            for line in f:
                if isinstance(line, bytes):
                    line = line.decode('utf-8')
                try:
                    yield json.loads(line.strip().strip(','))
                except json.JSONDecodeError:
                    continue
        finally:
            f.close()
    
    with open('badprecisions.txt', 'w') as f:
        f.write('')
    
    
    for item in read_dump(sys.argv[1]):
        precision = 0
        for geoProperty in  GlOBE_COORDINATE_PROPERTIES:
            for claim in item.get('claims', {}).get(geoProperty, []):
                try:
                    precision = 
claim['mainsnak']['datavalue']['value']['precision']
                    if precision >= 360 or precision <= -360:
                        with open('badprecisions.txt', 'a') as f:
                            f.write(id_ +'\t'+precision+ '\n')
                except:
                    continue

TASK DETAIL
  https://phabricator.wikimedia.org/T283576

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Rosalie_WMDE
Cc: Aklapper, Addshore, ItamarWMDE, Invadibot, maantietaja, Akuckartz, Iflorez, 
alaa_wmde, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, 
Lydia_Pintscher, Mbch331
_______________________________________________
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org

Reply via email to