Rosalie_WMDE added a comment.
I ran this Python script on https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2 and it ran for about 4 hours (it did not complete; I terminated it, as the JSON dump is about 62G of data). Nevertheless, in that time I did not get a malformed precision in the output file (`badprecisions.txt`). In other words, they are extremely rare in the database, or they are not present, or //the script is missing something?//.

  import sys
  import json
  import os
  import bz2
  import gzip

  # Globe-coordinate properties to inspect
  GlOBE_COORDINATE_PROPERTIES = [
      'P625', 'P626', 'P1259', 'P1332', 'P1333', 'P1334',
      'P1335', 'P2786', 'P5140', 'P8981', 'P9149'
  ]


  def read_dump(path):
      """Yield one entity dict per line of a (possibly compressed) JSON dump."""
      mode = 'r'
      file_ = os.path.split(path)[-1]
      if file_.endswith('.gz'):
          f = gzip.open(path, mode)
      elif file_.endswith('.bz2'):
          f = bz2.BZ2File(path, mode)
      elif file_.endswith('.json'):
          f = open(path, mode)
      else:
          raise NotImplementedError(f'Reading file {file_} is not supported')
      try:
          for line in f:
              # The compressed readers return bytes, plain open() returns str
              if isinstance(line, bytes):
                  line = line.decode('utf-8')
              try:
                  # Each entity sits on its own line, followed by a comma
                  yield json.loads(line.strip().strip(','))
              except json.JSONDecodeError:
                  # Skips the opening '[' and closing ']' lines
                  continue
      finally:
          f.close()


  # Start with an empty output file
  with open('badprecisions.txt', 'w') as f:
      f.write('')

  for item in read_dump(sys.argv[1]):
      id_ = item.get('id', '')
      for geoProperty in GlOBE_COORDINATE_PROPERTIES:
          for claim in item.get('claims', {}).get(geoProperty, []):
              try:
                  precision = claim['mainsnak']['datavalue']['value']['precision']
                  if precision >= 360 or precision <= -360:
                      with open('badprecisions.txt', 'a') as f:
                          # precision is a number, so convert it before concatenating
                          f.write(id_ + '\t' + str(precision) + '\n')
              except (KeyError, TypeError):
                  # No datavalue (novalue/somevalue snaks) or a null precision
                  continue

TASK DETAIL
  https://phabricator.wikimedia.org/T283576

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Rosalie_WMDE
Cc: Aklapper, Addshore, ItamarWMDE, Invadibot, maantietaja, Akuckartz, Iflorez, alaa_wmde, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
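
One way to test whether the script above is missing something is to run the same precision check on a hand-made entity that is known to contain a bad value, before spending hours on the full dump. The following is only a minimal sketch under assumptions: the helper `bad_precisions`, the entity id `Q0` and the sample values are all made up for illustration and are not taken from the dump.

```python
def bad_precisions(item, properties=('P625',), limit=360):
    """Yield (entity id, property, precision) for out-of-range precisions.

    Hypothetical helper for illustration; it mirrors the >= 360 / <= -360
    check used in the script above.
    """
    for prop in properties:
        for claim in item.get('claims', {}).get(prop, []):
            datavalue = claim.get('mainsnak', {}).get('datavalue', {})
            value = datavalue.get('value', {})
            precision = value.get('precision') if isinstance(value, dict) else None
            # Skip missing, null or otherwise non-numeric precisions
            if isinstance(precision, bool) or not isinstance(precision, (int, float)):
                continue
            if precision >= limit or precision <= -limit:
                yield item.get('id'), prop, precision


def coordinate_claim(precision):
    """Build a made-up coordinate claim with the given precision."""
    return {'mainsnak': {'snaktype': 'value', 'datavalue': {
        'value': {'latitude': 1.0, 'longitude': 2.0, 'precision': precision}}}}


# Synthetic entity (assumed shape): one bad, one normal, one null precision,
# plus a snak without any datavalue
sample = {
    'id': 'Q0',
    'claims': {'P625': [
        coordinate_claim(1000),
        coordinate_claim(0.0001),
        coordinate_claim(None),
        {'mainsnak': {'snaktype': 'novalue'}},
    ]},
}

hits = list(bad_precisions(sample))
for id_, prop, precision in hits:
    # Format the number explicitly so it can be written as text
    print(f'{id_}\t{prop}\t{precision}')
```

If the check reports the synthetic bad value but the full run still finds nothing, that would suggest such precisions really are rare or absent in the dump.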