daniel added a comment.

@Ladsgroup yes, but I think there's a middle way that is much faster than a complete rebuild, and more robust than a mega-query. I imagine an algorithm like this:

Declare an empty list of row-ids to delete.
Iterate over all entities. For each entity:
  Load all terms into an array.
  In that array, find all duplicates
    and add their row-ids to the deletion list.
  When the deletion list hits some limit:
    delete the rows that are in the deletion list
    call commitAndWaitForReplication
    reset the deletion list

This can be stopped and resumed at any time, does batched deletes with a wait for replication after each batch, and only runs small, trivial select queries.
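
For illustration only, here is a rough Python sketch of the loop described above, run against an in-memory SQLite table rather than the real database. The table and column names (`terms`, `row_id`, `entity_id`, `term_type`, `language`, `text`), the batch limit, and the `wait_for_replication` callback are all assumptions made up for this example; the callback merely stands in for commitAndWaitForReplication. The sketch also adds a final flush after the loop, which the outline above leaves implicit.

```python
import sqlite3

# Hypothetical, simplified schema; NOT the real terms table.
SCHEMA = """
CREATE TABLE terms (
    row_id    INTEGER PRIMARY KEY,
    entity_id TEXT,
    term_type TEXT,
    language  TEXT,
    text      TEXT
)
"""


def find_duplicate_row_ids(terms):
    """Return the row ids of every repeated term except the first copy.

    `terms` is a list of (row_id, term_type, language, text) tuples that
    all belong to one entity. The lowest row id of each group is kept.
    """
    seen = set()
    duplicates = []
    for row_id, *rest in sorted(terms, key=lambda t: t[0]):
        key = tuple(rest)
        if key in seen:
            duplicates.append(row_id)
        else:
            seen.add(key)
    return duplicates


def flush(conn, pending, wait_for_replication):
    """Delete the pending rows, commit, wait, and reset the list."""
    if not pending:
        return
    conn.executemany(
        "DELETE FROM terms WHERE row_id = ?", [(r,) for r in pending]
    )
    conn.commit()
    wait_for_replication()  # stand-in for commitAndWaitForReplication
    pending.clear()


def dedupe_terms(conn, batch_limit=1000, wait_for_replication=lambda: None):
    """Walk all entities, delete duplicate terms in batches, return the count."""
    pending = []
    deleted = 0
    # Materialise the entity list first so deletes do not disturb the cursor.
    entity_ids = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT entity_id FROM terms ORDER BY entity_id"
        )
    ]
    for entity_id in entity_ids:
        terms = conn.execute(
            "SELECT row_id, term_type, language, text FROM terms"
            " WHERE entity_id = ?",
            (entity_id,),
        ).fetchall()
        pending.extend(find_duplicate_row_ids(terms))
        if len(pending) >= batch_limit:
            deleted += len(pending)
            flush(conn, pending, wait_for_replication)
    # Final flush for whatever is left below the limit.
    deleted += len(pending)
    flush(conn, pending, wait_for_replication)
    return deleted
```

Because duplicates are recomputed per entity on every pass, an interrupted run in this sketch can simply be started again: entities that were already cleaned yield nothing to delete.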


TASK DETAIL
https://phabricator.wikimedia.org/T171461

To: aude, daniel
Cc: Marostegui, PokestarFan, hoo, Aklapper, Ladsgroup, aude, daniel, GoranSMilovanovic, QZanden, Minhnv-2809, Izno, Luke081515, Wikidata-bugs, Mbch331, Jay8g, Krenair