[Wikidata-bugs] [Maniphest] [Commented On] T220150: Decide on initial cleanup strategy

2019-04-06 Thread JeroenDeDauw
JeroenDeDauw added a comment. The updating optimization ticket is relevant for this cleanup. We now have two main approaches: **delete and insert everything** Downside: performance penalty for re-inserting things that did not change

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Decide on initial cleanup strategy

2019-04-06 Thread JeroenDeDauw
JeroenDeDauw added a comment. @alaa_wmde seems to have a different idea of how the maintenance script would work than I do. My thought was that the script would need to go through each record and then do some query to find out if it is unused. That means the script would run the
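
For illustration only, the record-by-record check described in this comment could be sketched roughly as follows. This is not the script discussed in the ticket: the `text_record` and `text_usage` tables, the function names, and the batch size are all hypothetical, and SQLite stands in for the real database. The sketch walks every record, runs one query per record to see whether anything still references it, and then deletes the unused ids in batches.

```python
import sqlite3


def find_orphan_ids(conn, batch_size=2):
    """Walk the hypothetical text_record table and collect unreferenced ids."""
    orphans = []
    last_id = 0
    while True:
        # Page through the table by primary key instead of loading it all at once.
        rows = conn.execute(
            "SELECT id FROM text_record WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        for (text_id,) in rows:
            # One lookup per record: is this text still used anywhere?
            used = conn.execute(
                "SELECT 1 FROM text_usage WHERE text_id = ? LIMIT 1",
                (text_id,),
            ).fetchone()
            if used is None:
                orphans.append(text_id)
        last_id = rows[-1][0]
    return orphans


def delete_in_batches(conn, ids, batch_size=2):
    """Delete the given ids in small batches and return how many rows went away."""
    deleted = 0
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        placeholders = ",".join("?" * len(chunk))
        cur = conn.execute(
            f"DELETE FROM text_record WHERE id IN ({placeholders})", chunk
        )
        deleted += cur.rowcount
    conn.commit()
    return deleted


def demo():
    # Hypothetical schema and data, assumed purely for this sketch.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE text_record (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE text_usage (text_id INTEGER);
        INSERT INTO text_record VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e');
        INSERT INTO text_usage VALUES (1),(4);
        """
    )
    orphans = find_orphan_ids(conn)
    deleted = delete_in_batches(conn, orphans)
    remaining = [row[0] for row in conn.execute("SELECT id FROM text_record ORDER BY id")]
    return orphans, deleted, remaining
```

Calling `demo()` builds the in-memory tables, finds the records nothing refers to, and removes them. The per-record lookup is the cost this comment is concerned with: the number of queries grows with the size of the table.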

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Decide on initial cleanup strategy

2019-04-05 Thread Ladsgroup
Ladsgroup added a comment. In T220150#5087733, @alaa_wmde wrote: > @Ladsgroup I believe solution 3 isn't a production solution, am I right? I mean deleting in batches ought to be more performant than deleting separately? If the

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Decide on initial cleanup strategy

2019-04-05 Thread alaa_wmde
alaa_wmde added a comment. Thinking about it again: if what @JeroenDeDauw said is right, and orphaned data existing in the table for a little while isn't really a privacy issue, then solution 2 should be sufficient. @Ladsgroup I believe solution 3 isn't a production solution, am I

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Decide on initial cleanup strategy

2019-04-05 Thread alaa_wmde
alaa_wmde added a comment. What I was thinking, in order to keep it simple, is this:
- there is a general cleanup script, scheduled via cron
- on deletion, we trigger the same script as a post-request (we need not pass it anything; it will go check and clean up everything orphaned)
If

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Decide on initial cleanup strategy

2019-04-04 Thread JeroenDeDauw
JeroenDeDauw added a comment. So this is semi-blocked on figuring out what we do for labs, since that impacts the reasons for immediate cleanup. We can already think about this question though: if labs is not a concern, can we get away with not spending effort on this? (no cleanup

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Decide on initial cleanup strategy

2019-04-04 Thread JeroenDeDauw
JeroenDeDauw added a comment. I was wondering about how much extra complexity the post request approach (4) would bring. In particular, which info do we need to give to the job. Giving the property id is not sufficient. You could give the ids of the text records and then in the job check if

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Decide on initial cleanup strategy

2019-04-04 Thread Addshore
Addshore added a comment. As pointed out at some point, we probably want to remove the strings from the table fairly sharpish, as content may be removed or revdeled, and should not continue to appear in public places such as the labs DB replicas. So that rules out maintenance scripts run