[Wikidata-bugs] [Maniphest] [Commented On] T220150: Clean up unused records

2019-04-23 Thread alaa_wmde
alaa_wmde added a comment. Sweet.. let's implement it as a post-request job then. As planned, the actual cleanup logic resides in doctrine-term-store (and the upcoming mediawiki-term-store) and is then invoked in a job within Wikibase. TASK DETAIL https://phabricator.wikimedia.org/T220150 EMAIL

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Clean up unused records

2019-04-23 Thread JeroenDeDauw
JeroenDeDauw added a comment. https://www.mediawiki.org/wiki/Manual:Job_queue

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Clean up unused records

2019-04-23 Thread alaa_wmde
alaa_wmde added a comment. I second the post-request solution. Do these post-request jobs exist by default in all MediaWiki instances, or do other Wikibase instances need to set them up separately? How are these jobs actually done in MW?

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Clean up unused records

2019-04-23 Thread Ladsgroup
Ladsgroup added a comment. This needs to be a job: it's expensive compared to the main parts and will cause deadlocks (all of the transactions in a web request are rolled into one transaction and get reverted all together).

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Clean up unused records

2019-04-22 Thread JeroenDeDauw
JeroenDeDauw added a comment. This means we have to go with the "smart update using diff" approach, since otherwise we do not know which terms have been removed. Not clear to me it will make sense to do the cleanup in post-request, we might end up only delaying a few % of the cost. I suggest
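
A minimal sketch of what a "smart update using diff" could compute, assuming terms can be compared as plain (type, language, text) tuples. The `Term` type and `diff_terms` helper are hypothetical names for illustration and are not taken from doctrine-term-store or Wikibase.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Term(NamedTuple):
    """Hypothetical value object: one term of an entity."""
    term_type: str  # e.g. "label", "description", "alias"
    language: str
    text: str


def diff_terms(old: Iterable[Term], new: Iterable[Term]) -> Tuple[Set[Term], Set[Term]]:
    """Return (added, removed) between the stored and the incoming terms."""
    old_set, new_set = set(old), set(new)
    added = new_set - old_set      # only these need to be inserted
    removed = old_set - new_set    # only these are cleanup candidates
    return added, removed


old_terms = [
    Term("label", "en", "Berlin"),
    Term("description", "en", "capital of Germany"),
    Term("alias", "en", "Berlin, Germany"),
]
new_terms = [
    Term("label", "en", "Berlin"),
    Term("description", "en", "capital and largest city of Germany"),
]

added, removed = diff_terms(old_terms, new_terms)
```

Unchanged terms (here the English label) appear in neither set, so they are neither re-inserted nor considered for cleanup.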

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Clean up unused records

2019-04-16 Thread alaa_wmde
alaa_wmde added a comment. Explored in detail with @Ladsgroup .. a script that tries to identify all orphans will end up doing a full table scan. That's the case in the snippet approach in a previous comment. As pointed out by @Addshore, we will have to go with a solution that uses the

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Clean up unused records

2019-04-12 Thread Addshore
Addshore added a comment. My final thought here before heading to vacation: when terms are removed from an entity, we already know which terms might therefore need to be removed from the other tables, so we should probably use that information rather than have some secondary process cycle t
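
The idea of using the already-known removed terms can be sketched as follows, using an in-memory SQLite database. The toy schema is an assumption (in particular the `wbit_`/`wbpt_` column names and the `delete_if_orphaned` helper are illustrative), not the production table layout.

```python
import sqlite3

# Toy schema, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wbt_term_in_lang (wbtl_id INTEGER PRIMARY KEY);
    CREATE TABLE wbt_item_terms (wbit_item_id INTEGER, wbit_term_in_lang_id INTEGER);
    CREATE TABLE wbt_property_terms (wbpt_property_id INTEGER, wbpt_term_in_lang_id INTEGER);
""")


def delete_if_orphaned(conn, candidate_ids):
    """Delete only the candidate ids that nothing references any more."""
    deleted = []
    for term_id in sorted(candidate_ids):
        cur = conn.execute(
            """
            DELETE FROM wbt_term_in_lang
            WHERE wbtl_id = ?
              AND NOT EXISTS (SELECT 1 FROM wbt_item_terms
                              WHERE wbit_term_in_lang_id = ?)
              AND NOT EXISTS (SELECT 1 FROM wbt_property_terms
                              WHERE wbpt_term_in_lang_id = ?)
            """,
            (term_id, term_id, term_id),
        )
        if cur.rowcount == 1:
            deleted.append(term_id)
    conn.commit()
    return deleted


# Item 10 uses terms 1 and 2, item 11 uses term 3, property 5 uses term 2.
conn.executemany("INSERT INTO wbt_term_in_lang VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO wbt_item_terms VALUES (?, ?)", [(10, 1), (10, 2), (11, 3)])
conn.execute("INSERT INTO wbt_property_terms VALUES (5, 2)")

# An edit removes both terms from item 10, so ids 1 and 2 are the candidates.
conn.execute("DELETE FROM wbt_item_terms WHERE wbit_item_id = 10")
deleted = delete_if_orphaned(conn, {1, 2})
remaining = [row[0] for row in conn.execute(
    "SELECT wbtl_id FROM wbt_term_in_lang ORDER BY wbtl_id")]
```

Only the candidate ids are looked up, so the work is proportional to the size of the edit rather than to the size of the tables.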

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Clean up unused records

2019-04-11 Thread sarhan.alaa
sarhan.alaa added a comment. oh yeah sure that won't be running in production like that .. I just was wondering if there are any extra optimizations here that I could've missed re using indexes or using the sub-queries. re running in batches, sure it should limit deleting to an acceptable

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Clean up unused records

2019-04-11 Thread Addshore
Addshore added a comment. I have no idea how these queries would perform, but they are scary and I don't think we should be running them on the production master DB. If cleanup is wanted from some sort of maintenance script it should probably use batches of deletes and/or selects. For
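
A rough sketch of batching selects and deletes, again on an in-memory SQLite database. The schema, the column names and the `delete_orphans_in_batches` helper are assumptions for illustration; a real maintenance script would likely also pause between batches, which is omitted here.

```python
import sqlite3

# Toy schema, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wbt_term_in_lang (wbtl_id INTEGER PRIMARY KEY);
    CREATE TABLE wbt_item_terms (wbit_item_id INTEGER, wbit_term_in_lang_id INTEGER);
    CREATE TABLE wbt_property_terms (wbpt_property_id INTEGER, wbpt_term_in_lang_id INTEGER);
""")


def delete_orphans_in_batches(conn, batch_size):
    """Select and delete orphaned rows in small batches.

    Returns the size of each batch that was deleted.
    """
    batch_sizes = []
    last_id = 0
    while True:
        # Select at most batch_size orphan ids above the last one handled.
        rows = conn.execute(
            """
            SELECT wbtl_id FROM wbt_term_in_lang
            WHERE wbtl_id > ?
              AND NOT EXISTS (SELECT 1 FROM wbt_item_terms
                              WHERE wbit_term_in_lang_id = wbtl_id)
              AND NOT EXISTS (SELECT 1 FROM wbt_property_terms
                              WHERE wbpt_term_in_lang_id = wbtl_id)
            ORDER BY wbtl_id
            LIMIT ?
            """,
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        ids = [row[0] for row in rows]
        conn.executemany(
            "DELETE FROM wbt_term_in_lang WHERE wbtl_id = ?",
            [(term_id,) for term_id in ids],
        )
        conn.commit()  # one short transaction per batch
        batch_sizes.append(len(ids))
        last_id = ids[-1]
    return batch_sizes


# Seven terms; only ids 2 and 5 are still referenced.
conn.executemany("INSERT INTO wbt_term_in_lang VALUES (?)", [(i,) for i in range(1, 8)])
conn.execute("INSERT INTO wbt_item_terms VALUES (10, 2)")
conn.execute("INSERT INTO wbt_property_terms VALUES (5, 5)")

batches = delete_orphans_in_batches(conn, batch_size=2)
remaining = [row[0] for row in conn.execute(
    "SELECT wbtl_id FROM wbt_term_in_lang ORDER BY wbtl_id")]
```

Batching keeps each transaction short, but the script as a whole still walks the entire table.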

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Clean up unused records

2019-04-10 Thread alaa_wmde
alaa_wmde added a comment. a quick snippet of a cleanup SQL script: DELETE FROM wbt_term_in_lang WHERE wbtl_id NOT IN (SELECT wbpt_term_in_lang_id FROM wbt_item_terms) AND wbtl_id NOT IN (SELECT wbpt_term_in_lang_id FROM wbt_property_terms); DELETE

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Clean up unused records

2019-04-09 Thread JeroenDeDauw
JeroenDeDauw added a comment. "delete everything" means deleting all terms for an item/property in the item/property_terms table, rather than just those that actually need to be removed.

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Clean up unused records

2019-04-09 Thread Addshore
Addshore added a comment. I'm slightly confused with the description 2 comments up. In T220150#5090136 , @JeroenDeDauw wrote: > **delete and insert everything** > Downside: performance penalty for re-inserting things that did not

[Wikidata-bugs] [Maniphest] [Commented On] T220150: Clean up unused records

2019-04-09 Thread JeroenDeDauw
JeroenDeDauw added a comment. We figured we'd go with delete and insert everything. Task description updated to reflect this.