daniel added a comment.
@Ladsgroup Yes, but I think there's a middle way that is much faster than a complete rebuild and more robust than a mega-query. I imagine an algorithm like this:
1. Declare an empty list of row ids to delete.
2. Iterate over all entities. For each entity:
   - Load all of its terms into an array.
   - In that array, find all duplicates and add their row ids to the deletion list.
   - When the deletion list hits some limit:
     - delete the rows in the deletion list,
     - call commitAndWaitForReplication,
     - reset the deletion list.
3. After the last entity, delete any rows still left in the deletion list.

This can be stopped and resumed at any time, deletes in batches while waiting for replication, and only runs small, trivial select queries.
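A minimal Python sketch of the loop above, under some assumptions not stated in the comment: each term row is reduced to a tuple (row_id, entity_id, language, term_type, text), two rows of the same entity count as duplicates when language, type and text match, and the lowest row id in each group is kept. The callbacks `load_terms`, `delete_rows` and `commit_and_wait` are hypothetical stand-ins for the real database access; `commit_and_wait` plays the role of commitAndWaitForReplication. The demo at the end uses an in-memory dict instead of a database.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row shape: (row_id, entity_id, language, term_type, text).
TermRow = Tuple[int, str, str, str, str]


def find_duplicate_row_ids(terms: List[TermRow]) -> List[int]:
    """Return the row ids of rows repeating an earlier (language, type, text).

    Rows are visited in row-id order, so the lowest row id of each group is kept.
    """
    seen = set()
    duplicates: List[int] = []
    for row_id, _entity, language, term_type, text in sorted(terms):
        key = (language, term_type, text)
        if key in seen:
            duplicates.append(row_id)
        else:
            seen.add(key)
    return duplicates


def dedupe_terms(
    entity_ids: Iterable[str],
    load_terms: Callable[[str], List[TermRow]],  # hypothetical: small per-entity SELECT
    delete_rows: Callable[[List[int]], None],  # hypothetical: DELETE ... WHERE row_id IN (...)
    commit_and_wait: Callable[[], None],  # stand-in for commitAndWaitForReplication
    batch_size: int = 100,
) -> None:
    pending: List[int] = []  # the deletion list

    def flush() -> None:
        # Delete the batch, wait for replicas to catch up, then reset the list.
        if pending:
            delete_rows(list(pending))
            commit_and_wait()
            pending.clear()

    for entity_id in entity_ids:
        pending.extend(find_duplicate_row_ids(load_terms(entity_id)))
        if len(pending) >= batch_size:
            flush()
    flush()  # delete whatever is left after the last entity


# Demo with an in-memory "table" keyed by row id.
table: Dict[int, TermRow] = {
    1: (1, "Q1", "en", "label", "Berlin"),
    2: (2, "Q1", "en", "label", "Berlin"),
    3: (3, "Q1", "de", "label", "Berlin"),
    4: (4, "Q2", "en", "alias", "X"),
    5: (5, "Q2", "en", "alias", "X"),
    6: (6, "Q2", "en", "alias", "X"),
}
commit_log: List[str] = []


def load_terms(entity_id: str) -> List[TermRow]:
    return [row for row in table.values() if row[1] == entity_id]


def delete_rows(row_ids: List[int]) -> None:
    for row_id in row_ids:
        del table[row_id]


def commit_and_wait() -> None:
    commit_log.append("commit")


dedupe_terms(["Q1", "Q2"], load_terms, delete_rows, commit_and_wait, batch_size=2)
print(sorted(table))  # [1, 3, 4]
```

Because each entity is handled on its own, an interrupted run can be restarted from the last entity whose batch was flushed: rows that were collected but not yet deleted are still in the table and will simply be found again.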
To: aude, daniel
Cc: Marostegui, PokestarFan, hoo, Aklapper, Ladsgroup, aude, daniel, GoranSMilovanovic, QZanden, Minhnv-2809, Izno, Luke081515, Wikidata-bugs, Mbch331, Jay8g, Krenair
_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
