Addshore added a comment.
Using the sqooped tables and looking at the first 10 million items:

    // Find the diff
    spark.sql("""
      SELECT term_entity_id as old, wbit_item_id as new
      FROM (
        SELECT DISTINCT term_entity_id
        FROM joal.wikibase_wb_terms
        WHERE wiki_db = 'wikidatawiki'
          AND snapshot = '2019-10'
          AND term_entity_id > 0
          AND term_entity_id < 10000000
          AND term_entity_type = 'item'
      ) as old
      LEFT JOIN (
        SELECT DISTINCT wbit_item_id
        FROM joal.wikibase_wbt_item_terms
        WHERE wiki_db = 'wikidatawiki'
          AND snapshot = '2019-10'
          AND wbit_item_id > 0
          AND wbit_item_id < 10000000
      ) as new
      ON term_entity_id = wbit_item_id
      WHERE wbit_item_id IS NULL
    """).repartition(64).createOrReplaceTempView("wd_comparison_2")
    spark.table("wd_comparison_2").cache()

I find there are 14054 items that seem to appear in the old table but not in the new ones. I'll generate a list of what we need to run over.

TASK DETAIL
  https://phabricator.wikimedia.org/T239470
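For readers less familiar with the LEFT JOIN ... IS NULL pattern used in the comparison above, here is a minimal plain-Scala sketch of the same idea on in-memory data. It is only an illustration, not the query that was run: the object name `MissingItemsSketch`, the helper `missingFromNew`, and the sample IDs are all hypothetical, and the `Q` prefix is added purely for display.

```scala
// Sketch only: tiny in-memory stand-ins for the two tables (hypothetical data).
object MissingItemsSketch {

  // Hypothetical helper: IDs present in the old store but absent from the new one.
  // This mirrors "old LEFT JOIN new ... WHERE new.id IS NULL" over DISTINCT IDs.
  def missingFromNew(oldIds: Iterable[Long], newIds: Iterable[Long]): Seq[Long] = {
    val seen = newIds.toSet
    oldIds.toSeq.distinct.filterNot(seen.contains).sorted
  }

  def main(args: Array[String]): Unit = {
    val oldIds = Seq(1L, 2L, 2L, 3L, 5L) // stands in for term_entity_id
    val newIds = Seq(1L, 3L, 4L)         // stands in for wbit_item_id
    // Print one item ID per line, e.g. as input for a later rebuild run.
    missingFromNew(oldIds, newIds).foreach(id => println(s"Q$id"))
  }
}
```

With the sample data, IDs 2 and 5 are reported as missing; ID 4 exists only in the new store and is ignored, just as it would be by the left join.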