[Wikidata-bugs] [Maniphest] [Commented On] T171461: Populate term_full_entity_id on test.wikidata.org

daniel Fri, 28 Jul 2017 15:01:33 -0700

daniel added a comment.

@Ladsgroup yey, but I think there's a middle way that is much faster than a complete rebuild, and more robust than a mega-query. I imagine an algorithm like this:

Declare an empty list of row-ids to delete.
Iterate over all entities. For each entity:
  Load all terms into an array.
  In that array, find all duplicates
    and add their row-ids to the deletion list.
  When the deletion list hits some limit:
    delete the rows that are in the deletion list
    call commitAndWaitForReplication.
    reset the deletion list
After the last entity, delete whatever is still in the deletion list.

This can be stopped and resumed at any time, deletes in batches and waits for replication after each batch, and only runs small, trivial select queries.
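
A minimal sketch of that loop in Python, using an in-process SQLite database so it can be run as-is. Several things here are assumptions made for illustration and are not taken from the task: the simplified wb_terms column set, the definition of a duplicate as the same (language, type, text) within one entity, the names remove_duplicate_terms and flush, and the default batch size. The plain commit() only stands in for commitAndWaitForReplication.

```python
import sqlite3

BATCH_SIZE = 1000  # hypothetical limit; tune to what the database tolerates


def flush(conn, row_ids):
    """Delete one batch of rows, commit, and reset the deletion list."""
    if not row_ids:
        return
    conn.executemany(
        "DELETE FROM wb_terms WHERE term_row_id = ?",
        [(row_id,) for row_id in row_ids],
    )
    conn.commit()  # stand-in for commitAndWaitForReplication
    row_ids.clear()


def remove_duplicate_terms(conn, batch_size=BATCH_SIZE):
    """Remove duplicate term rows entity by entity; return how many were deleted."""
    to_delete = []
    deleted = 0
    # Assumption: "all entities" is approximated by the distinct ids in the table.
    entity_ids = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT term_full_entity_id FROM wb_terms "
            "ORDER BY term_full_entity_id"
        )
    ]
    for entity_id in entity_ids:
        # Small, trivial select: all terms of a single entity.
        rows = conn.execute(
            "SELECT term_row_id, term_language, term_type, term_text "
            "FROM wb_terms WHERE term_full_entity_id = ? ORDER BY term_row_id",
            (entity_id,),
        ).fetchall()
        seen = set()
        for row_id, language, term_type, text in rows:
            key = (language, term_type, text)
            if key in seen:
                to_delete.append(row_id)  # keep the first row, drop later copies
            else:
                seen.add(key)
        if len(to_delete) >= batch_size:
            deleted += len(to_delete)
            flush(conn, to_delete)
    # Rows collected after the last full batch still need to be deleted.
    deleted += len(to_delete)
    flush(conn, to_delete)
    return deleted
```

Running it a second time finds nothing left to delete for entities that were already cleaned, which is why an interrupted run can simply be started again.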

TASK DETAIL

https://phabricator.wikimedia.org/T171461

To: aude, daniel
Cc: Marostegui, PokestarFan, hoo, Aklapper, Ladsgroup, aude, daniel, GoranSMilovanovic, QZanden, Minhnv-2809, Izno, Luke081515, Wikidata-bugs, Mbch331, Jay8g, Krenair

