GoranSMilovanovic added a comment.

  - Cutting this into batches (2.5M items per batch, each item with a varying 
number of external identifiers) with `pyspark`;
  - hopefully, R `{data.table}` will be able to put the batches back together 
with `rbindlist()` and compute the contingency table.
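
A rough sketch of the batch-then-recombine idea, in plain Python rather than the `pyspark` and `{data.table}` code actually used here. It assumes the contingency table counts how many items carry each pair of external identifiers. The toy rows, the batch size, and the helper names `batches` and `cooccurrence` are hypothetical and for illustration only.

```python
from collections import Counter
from itertools import combinations, islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def cooccurrence(batch):
    """Count, within one batch, how many items carry each identifier pair."""
    counts = Counter()
    for _item, identifiers in batch:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(identifiers)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: (item, external identifiers found on that item).
rows = [
    ("Q1", ["P214", "P227"]),
    ("Q2", ["P214", "P227", "P244"]),
    ("Q3", ["P244"]),
    ("Q4", ["P214", "P244"]),
]

# Process each batch on its own, then sum the partial tables.
total = Counter()
for batch in batches(rows, size=2):
    total.update(cooccurrence(batch))

for pair, n in sorted(total.items()):
    print(pair, n)
```

The partial counts can simply be summed only because batches are cut by item, so all identifiers of a given item land in the same batch.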

TASK DETAIL
  https://phabricator.wikimedia.org/T214897

To: GoranSMilovanovic
Cc: RazShuty, Addshore, JAllemandou, Aklapper, GoranSMilovanovic, 
Lydia_Pintscher, alaa_wmde, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, 
rosalieper, Wikidata-bugs, aude, Mbch331