GoranSMilovanovic added a comment.
- Cutting the data set into batches (2.5M items x a varying number of external
identifiers per batch) with `pyspark`;
- hopefully, R `{data.table}` will be able to put the batches back together with
`rbindlist()` and compute the contingency table.
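
A minimal sketch of the split-and-recombine idea, in plain Python rather than the `pyspark` and `{data.table}` code actually used for the task. The toy rows, the helper names `split_by_item` and `contingency`, and the reading of "contingency table" as counts of identifier pairs that co-occur on the same item are all assumptions made for illustration. Batches are cut by item, so all rows of an item stay in one batch, and concatenating the batches plays the role of `rbindlist()`.

```python
from collections import Counter
from itertools import chain, combinations

# Hypothetical toy data: (item, external identifier property) rows.
rows = [
    ("Q1", "P214"), ("Q1", "P227"),
    ("Q2", "P214"),
    ("Q3", "P214"), ("Q3", "P227"), ("Q3", "P244"),
]

def split_by_item(rows, items_per_batch):
    """Yield batches of rows; every row of a given item lands in one batch."""
    items = sorted({item for item, _ in rows})
    for start in range(0, len(items), items_per_batch):
        keep = set(items[start:start + items_per_batch])
        yield [row for row in rows if row[0] in keep]

def contingency(rows):
    """Count, per unordered identifier pair, the items that carry both."""
    by_item = {}
    for item, identifier in rows:
        by_item.setdefault(item, set()).add(identifier)
    table = Counter()
    for ids in by_item.values():
        table.update(combinations(sorted(ids), 2))
    return table

batches = list(split_by_item(rows, items_per_batch=2))
combined = list(chain.from_iterable(batches))  # analogue of rbindlist()
table = contingency(combined)
```

Because the split is by item, the table computed from the recombined batches matches the one computed from the unsplit rows.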
TASK DETAIL
https://phabricator.wikimedia.org/T214897
To: GoranSMilovanovic
Cc: RazShuty, Addshore, JAllemandou, Aklapper, GoranSMilovanovic,
Lydia_Pintscher, alaa_wmde, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen,
rosalieper, Wikidata-bugs, aude, Mbch331