[Wikidata-bugs] [Maniphest] [Commented On] T151993: Implement ChangeDispatchCoordinator based on RedisLockManager

2017-03-27 Thread jcrespo
jcrespo added a comment. An extra reason to avoid using a db master just occurred to me: master failover is a relatively frequent operation. It will happen every time the master mysql is upgraded, or when there is a datacenter failover (2 of those will happen in April/May); probably there wasn't …

[Wikidata-bugs] [Maniphest] [Commented On] T159828: Use redis-based lock manager for dispatchChanges on test sites.

2017-03-29 Thread jcrespo
jcrespo added a comment. Sorry about this; you are not the only ones suffering from beta not being a reliable place for testing in a truly distributed fashion. We were just discussing this on IRC. I also support a test on test, and offer my help if I can provide it. Thanks again f…

[Wikidata-bugs] [Maniphest] [Created] T160887: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user

2017-03-20 Thread jcrespo
jcrespo created this task. jcrespo added projects: Wikidata, DBA. Herald added a subscriber: Aklapper. TASK DESCRIPTION: For example, I found on db1070 2 long-running queries: Server Connection User Client Database Time db1070 51133099 wikiuser mw1256 wikidatawiki 19h SELECT /* Wikibase\Repo…

[Wikidata-bugs] [Maniphest] [Commented On] T160887: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user

2017-04-04 Thread jcrespo
jcrespo added a comment. Thank you very much for working on this. Do you have an estimate of when this will be fully deployed?

[Wikidata-bugs] [Maniphest] [Commented On] T160887: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user

2017-04-04 Thread jcrespo
jcrespo added a comment. Thank you again!

[Wikidata-bugs] [Maniphest] [Commented On] T162252: Create SQL database and Tables for Cognate extension to be used on Wiktionaries

2017-04-12 Thread jcrespo
jcrespo added a comment. I would ask Addshore to confirm by running SELECTs on the empty tables from terbium/tin, etc., using mediawiki scripts.

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-04-12 Thread jcrespo
jcrespo added a comment. I did not understand your last comment: is the previous patch invalid? Do you have another patch to show me?

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-04-12 Thread jcrespo
jcrespo added a comment. To clarify, I have to be especially strict in this particular case because, in the past, wbc_entity_usage (with the exception of the linksUpdate job) was a large point of contention and a major cause of lag, and this ticket starts by saying: we'd write a lot (?) more rows …

[Wikidata-bugs] [Maniphest] [Updated] T159828: Use redis-based lock manager for dispatchChanges on test.wikidata.org

2017-04-07 Thread jcrespo
jcrespo added a comment. As a small side note, that can also happen on MySQL. Despite locks being released on session disconnection, there have been some occasions where the mysql session is not killed (it continues) but the thread on mediawiki has been. There are several known bugs about …

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-04-14 Thread jcrespo
jcrespo added a comment. "We want to collect additional information on one of these wikis for a while" If that doesn't involve a schema change, sure.

[Wikidata-bugs] [Maniphest] [Created] T163544: Wikibase\TermSqlIndex::getMatchingTerms, Wikibase\TermSqlIndex::fetchTerms have bad performance after codfw failover

2017-04-21 Thread jcrespo
jcrespo created this task. jcrespo added a project: Wikidata. TASK DESCRIPTION: These 2 queries were among the most expensive during the datacenter failover or after it. While it is normal to have lower performance than usual due to colder caches, most likely they are surfacing issues with improvements …

[Wikidata-bugs] [Maniphest] [Edited] T163544: Wikibase\TermSqlIndex::getMatchingTerms, Wikibase\TermSqlIndex::fetchTerms have bad performance after codfw failover

2017-04-21 Thread jcrespo
jcrespo edited the task description. EDIT DETAILS: These 2 queries were among the most expensive during the datacenter failover or after it. While it is normal to have lower performance than usual due to colder caches, most likely they are surfacing existing issues with improvements, which …

[Wikidata-bugs] [Maniphest] [Commented On] T163544: Wikibase\TermSqlIndex::getMatchingTerms, Wikibase\TermSqlIndex::fetchTerms have bad performance after codfw failover

2017-04-21 Thread jcrespo
jcrespo added a comment. I was thinking of https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_eq_range_index_dive_limit but that is probably more wishful thinking than practical given we are on MariaDB 10.
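For context, a minimal sketch of how that MySQL 5.6 variable is used (it caps how many equality ranges the optimizer estimates with index dives before falling back to index statistics); whether an equivalent is available on the MariaDB 10 version in use is exactly the doubt raised above:

-- Check the current value (MySQL 5.6+)
SHOW VARIABLES LIKE 'eq_range_index_dive_limit';
-- Raise it for the session so long IN() lists keep using index dives
SET SESSION eq_range_index_dive_limit = 200;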

[Wikidata-bugs] [Maniphest] [Updated] T111535: Wikibase\Repo\Store\SQL\EntityPerPageTable::{closure} creating high number of deadlocks

2017-03-02 Thread jcrespo
jcrespo added subscribers: ArielGlenn, Zppix. jcrespo merged a task: T139636: Wikidata Database contention under high edit rate.

[Wikidata-bugs] [Maniphest] [Merged] T139636: Wikidata Database contention under high edit rate

2017-03-02 Thread jcrespo
jcrespo closed this task as a duplicate of T111535: Wikibase\Repo\Store\SQL\EntityPerPageTable::{closure} creating high number of deadlocks.

[Wikidata-bugs] [Maniphest] [Commented On] T111535: Wikibase\Repo\Store\SQL\EntityPerPageTable::{closure} creating high number of deadlocks

2017-03-02 Thread jcrespo
jcrespo added a comment. See the merged ticket: this happened again when 300-400 new pages per minute were being created, with 35 parallel threads.

[Wikidata-bugs] [Maniphest] [Commented On] T159718: Evaluate if it is possible to add an empty column for full entity ID to wb_terms without affecting wikidata.org users

2017-03-06 Thread jcrespo
jcrespo added a comment. "Depending on the answer to this, we will plan further steps." I think you should add the full plan here ASAP, even if it is not 100% clear or decided; otherwise we may be adding steps to the process and making it unnecessarily long. E.g. if you plan to add an index later …

[Wikidata-bugs] [Maniphest] [Commented On] T159718: Evaluate if it is possible to add an empty column for full entity ID to wb_terms without affecting wikidata.org users

2017-03-06 Thread jcrespo
jcrespo added a comment. "Evaluate if it is feasible to add such an 'empty' column without making Wikidata readonly." "we can probably do it" How certain are you? In my experience, the biggest blocker in production is not the size, but how busy the table is. That would create metadata lock …

[Wikidata-bugs] [Maniphest] [Commented On] T159718: Evaluate feasibility of adding a column for full entity ID to wb_terms

2017-03-06 Thread jcrespo
jcrespo added a comment. @Marostegui I think they do not want it done yet, but an OK/review from us. But they should probably clarify that. "feasibility" is an ambiguous term.

[Wikidata-bugs] [Maniphest] [Commented On] T159718: Evaluate if it is possible to add an empty column for full entity ID to wb_terms without affecting wikidata.org users

2017-03-06 Thread jcrespo
jcrespo added a comment. "If we depool the slaves we should be fine, shouldn't we? And if we use the DC switchover to alter the masters we'd also get rid of that issue?" Hey, don't tell me, tell @WMDE-leszek, and see if he is ok with that schedule. :-)

[Wikidata-bugs] [Maniphest] [Commented On] T159718: Evaluate if it is possible to add an empty column for full entity ID to wb_terms without affecting wikidata.org users

2017-03-06 Thread jcrespo
jcrespo added a comment. Ok, now I have some comments against that method, logistically. I am at a meeting; let me finish it and I will have some time to properly explain myself (nothing against the spirit of the changes, I would just do it in a different way, if the code can handle it).

[Wikidata-bugs] [Maniphest] [Commented On] T159718: Evaluate if it is possible to add an empty column for full entity ID to wb_terms without affecting wikidata.org users

2017-03-06 Thread jcrespo
jcrespo added a comment. So, the comments: do not defer the creation of the indexes. Those are extra alter tables and do not make things easier in any way; just create the indexes from the start, assuming they will be used. Renaming columns is a big no, especially to an already existing name …
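As an illustration of "create the indexes from the start", this is the kind of single ALTER meant here; the column name matches the later schema-change tasks, but the exact type and index definition are assumptions, not the agreed schema:

ALTER TABLE wb_terms
  ADD COLUMN term_full_entity_id VARBINARY(32) DEFAULT NULL,
  ADD INDEX term_full_entity_id (term_full_entity_id);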

[Wikidata-bugs] [Maniphest] [Commented On] T162252: Create SQL database and Tables for Cognate extension to be used on Wiktionaries

2017-04-07 Thread jcrespo
jcrespo added a comment. @Addshore I hope labs access is not a blocker for this; that can be done at a later date.

[Wikidata-bugs] [Maniphest] [Commented On] T162252: Create SQL database and Tables for Cognate extension to be used on Wiktionaries

2017-04-07 Thread jcrespo
jcrespo added a comment. "production hosts" I was thinking of dbstore (backup) hosts, which were problematic (remember, you were the ones to set those up last time), plus private table filtering. x1 has traditionally not been replicated to labs; this can be challenging (I would start by not replicating …

[Wikidata-bugs] [Maniphest] [Commented On] T162252: Create SQL database and Tables for Cognate extension to be used on Wiktionaries

2017-04-07 Thread jcrespo
jcrespo added a comment. So that we are not a blocker: creation of tables in production, especially if we have already given the OK to the plans, is not considered a schema change, so anyone with production rights can do it; you just need to mark it on the deployments calendar. With this we (DBAs …

[Wikidata-bugs] [Maniphest] [Commented On] T162252: Create SQL database and Tables for Cognate extension to be used on Wiktionaries

2017-04-07 Thread jcrespo
jcrespo added a comment. This sentence actually confused me: x1 == extension1 :-)

[Wikidata-bugs] [Maniphest] [Commented On] T162252: Create SQL database and Tables for Cognate extension to be used on Wiktionaries

2017-04-07 Thread jcrespo
jcrespo added a comment. To clarify: it may be blocked on us right now to create the database, and because labs filtering is not well managed, but the general idea stands for normal table creations.

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-03 Thread jcrespo
jcrespo added a comment.
$ check_mariadb.py -h db1052 --slave-status --primary-dc=eqiad
{"datetime": 1501777331.898183, "ssl_expiration": 1619276854.0, "connection": "ok", "connection_latency": 0.07626748085021973, "ssl": true, "to…

[Wikidata-bugs] [Maniphest] [Updated] T171928: Wikidata and dewiki databases locked

2017-08-04 Thread jcrespo
jcrespo added a subtask: T172490: Monitor swap/memory usage on databases.

[Wikidata-bugs] [Maniphest] [Updated] T171928: Wikidata and dewiki databases locked

2017-08-04 Thread jcrespo
jcrespo added a subtask: T172489: Monitor read_only variable and/or uptime on database masters, make it page.

[Wikidata-bugs] [Maniphest] [Commented On] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-08-10 Thread jcrespo
jcrespo added a comment. I've been told that several thousand Title::invalidateCache UPDATEs per second caused trouble on s7 overnight; not sure if this is related: https://grafana.wikimedia.org/dashboard/db/mysql-replication-lag?orgId=1=eqiad%20prometheus%2Fops=7=1502261251242

[Wikidata-bugs] [Maniphest] [Commented On] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-08-10 Thread jcrespo
jcrespo added a comment. To avoid the continuous lagging on hosts that are not directly pooled (passive dc codfw, labs, other hosts replicating on a second tier), I have forced writes to slow down to the pace of the slowest slaves of eqiad with semisync replication, automatically adding a pause of up …

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T173189: Investigate separating wbc_entity_usage out to a separate mariadb shard

2017-08-12 Thread jcrespo
jcrespo added subscribers: Manuel, jcrespo. jcrespo added a comment. Yes, a dedicated database service for this could make sense to separate writes if change tracking starts taking up most of the database writes. I suggested this as a means to offload some of the load if it starts to become a problem …

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata database locked

2017-07-28 Thread jcrespo
jcrespo added a comment. The database crashed; it should be ok to edit now.

[Wikidata-bugs] [Maniphest] [Claimed] T171928: Wikidata database locked

2017-07-28 Thread jcrespo
jcrespo claimed this task. jcrespo added a comment. The investigation is not over; here is what we have found out so far about the causes: https://wikitech.wikimedia.org/wiki/Incident_documentation/20170728-s5_(WikiData_and_dewiki)_read-only

[Wikidata-bugs] [Maniphest] [Retitled] T171928: Wikidata and dewiki databases locked

2017-07-28 Thread jcrespo
jcrespo renamed this task from "Wikidata database locked" to "Wikidata and dewiki databases locked".

[Wikidata-bugs] [Maniphest] [Created] T169336: slow master queries on Wikibase\Client\Usage\Sql\EntityUsageTable::getAffectedRowIds

2017-07-01 Thread jcrespo
jcrespo created this task. jcrespo added projects: Wikidata, Performance, DBA. Herald added a subscriber: Aklapper. TASK DESCRIPTION: The following query was detected running on a master database: Host User Schema Client Source Thread Transaction Runtime Stamp db1063 wikiuser wikidatawiki …

[Wikidata-bugs] [Maniphest] [Commented On] T169336: slow master queries on Wikibase\Client\Usage\Sql\EntityUsageTable::getAffectedRowIds

2017-07-05 Thread jcrespo
jcrespo added a comment. I've set up a temporary watchdog on the s5 master:
pt-kill F=/dev/null --socket=/tmp/mysql.sock --print --kill --victims=all --match-info="EntityUsageTable" --match-db=wikidatawiki --match-user=wikiuser --busy-time=1
This will mitigate for now the close-to…

[Wikidata-bugs] [Maniphest] [Commented On] T169336: slow master queries on Wikibase\Client\Usage\Sql\EntityUsageTable::getAffectedRowIds

2017-07-05 Thread jcrespo
jcrespo added a comment. This is still ongoing, with 15-minute queries. I am going to set up a task to kill all related queries on the s5 master to prevent a potential outage of dewiki and wikidata writes.

[Wikidata-bugs] [Maniphest] [Updated] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-07-07 Thread jcrespo
jcrespo added a comment. Probably related: T169884

[Wikidata-bugs] [Maniphest] [Commented On] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-07-07 Thread jcrespo
jcrespo added a comment. I also wonder why some of those log warnings come from close() and others have the proper commitMasterChanges() bit in the stack trace. Normally there should be nothing to commit by close(); it just commits for sanity. We were theorizing the other day on IRC …

[Wikidata-bugs] [Maniphest] [Updated] T123867: Repeated reports of wikidatawiki (s5) API going read only

2017-04-24 Thread jcrespo
jcrespo edited projects, added MediaWiki-Database; removed DBA. jcrespo added a comment. That is the max lag, and it is normal on the slaves that are not waited on by mediawiki. This issue has nothing to do with databases; mediawiki does what it is programmed to do: if it detects lag, even if a few …

[Wikidata-bugs] [Maniphest] [Updated] T86530: Replace wb_terms table with more specialized mechanisms for terms (tracking)

2017-04-28 Thread jcrespo
jcrespo added a comment. I may have made this comment on the wrong ticket: T163551#3221748

[Wikidata-bugs] [Maniphest] [Commented On] T163551: Huge number of duplicate rows in wb_terms

2017-04-28 Thread jcrespo
jcrespo added a comment. This is a bit off-topic for T163551, but with the latest schema changes wb_terms has become the largest table on a wiki (with the exception of revision on enwiki and image on commons), and I think it will get bigger once the new column (I assume) gets populated with actual …

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-01 Thread jcrespo
jcrespo added a comment. I have started working on more complete monitoring, useful if we go down the route of human monitoring rather than automation; here is one example:
$ ./check_mariadb.py --icinga -h db1052.eqiad.wmnet --check_read_only=0
Version 10.0.28-MariaDB, Uptime 16295390s, read_only…

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-01 Thread jcrespo
jcrespo added a comment. I've almost finished the above incident documentation. However, I am unsure about which are the right actionables and their priorities (last section). Let's use this ticket to agree on what would be the best follow-up: a) making puppet change the read-only state of the db …

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-01 Thread jcrespo
jcrespo added a comment. "Wikidata goes into read-only [...] the subscriptions mentioned" Yes, definitely some extensions have not behaved perfectly in the past and do not respect mediawiki's read-only mode. I do not know what the state of Wikidata is, but from what you say, a ticket should be filed so its …

[Wikidata-bugs] [Maniphest] [Closed] T171928: Wikidata and dewiki databases locked

2017-08-04 Thread jcrespo
jcrespo closed this task as "Resolved".jcrespo added a comment. I have created all actionables on both the incident documentation ( https://wikitech.wikimedia.org/wiki/Incident_documentation/20170728-s5_(WikiData_and_dewiki)_read-only ) and phabricator- consequently, I have closed t

[Wikidata-bugs] [Maniphest] [Commented On] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-08-18 Thread jcrespo
jcrespo added a comment. @thcipriani @aaron I know something was done yesterday (thank you!); may I ask for an update on the state, to know if the UBN is still active?

[Wikidata-bugs] [Maniphest] [Raised Priority] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-08-17 Thread jcrespo
jcrespo raised the priority of this task from "High" to "Unbreak Now!".Herald added subscribers: Liuxinyu970226, Jay8g, TerraCodes. TASK DETAILhttps://phabricator.wikimedia.org/T164173EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/T

[Wikidata-bugs] [Maniphest] [Commented On] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-08-17 Thread jcrespo
jcrespo added a comment. Setting as unbreak now because this is preventing collaborators from editing articles, which not only slows down volunteers but also causes large frustration among them: https://grafana.wikimedia.org/dashboard/db/wikidata-edits?refresh=1m=1=1=1502980683784

[Wikidata-bugs] [Maniphest] [Edited] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-08-17 Thread jcrespo
jcrespo updated the task description. CHANGES TO TASK DESCRIPTION: **Edit**: This used to be an isolated issue at some point; now it is causing generalized edit slowdowns and the inability to read recentchanges/watchlists on several wikis. The most impacted ones are s2 and s7, but it looks …

[Wikidata-bugs] [Maniphest] [Commented On] T173269: Wikidata.org currently very slow

2017-08-17 Thread jcrespo
jcrespo added a comment. I think this is just UPDATE /* Title::invalidateCache */ based on the binlogs, not the above script.
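For readers not familiar with that code path, the statements seen in the binlogs have roughly this shape; the table and columns are MediaWiki core's page table, but the literal values are placeholders, not taken from the incident:

UPDATE /* Title::invalidateCache */ page
   SET page_touched = '20170817000000'  -- current timestamp
 WHERE page_id = 12345;                 -- hypothetical page id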

[Wikidata-bugs] [Maniphest] [Updated] T173269: Wikidata.org currently very slow

2017-08-17 Thread jcrespo
jcrespo added a comment. which most probably means this is a direct consequence of T164173 getting worse?

[Wikidata-bugs] [Maniphest] [Raised Priority] T173269: Wikidata.org currently very slow

2017-08-17 Thread jcrespo
jcrespo raised the priority of this task from "Normal" to "Unbreak Now!".jcrespo added a comment. The execution of: www-data 20284 0.0 0.1 328216 60744 ?S15:30 0:00 php5 /srv/mediawiki-staging/multiversion/MWScript.php extensions/Wikidata/extensions/Wikib

[Wikidata-bugs] [Maniphest] [Commented On] T162539: Deploy schema change for adding term_full_entity_id column to wb_terms table

2017-05-14 Thread jcrespo
jcrespo added a comment. @aude: don't run update.php on s3 to alter a table; you will create lag on 900 wikis unless connections are cleared and the table is pre-warmed (and appropriately tested). Also, the only wiki mentioned here was wikidatawiki; you have to create a separate request …

[Wikidata-bugs] [Maniphest] [Updated] T151681: DispatchChanges: Avoid long-lasting connections to the master DB

2017-05-10 Thread jcrespo
jcrespo added a comment. So do you think this had something to do with reports like T123867 and T164191? This is highly surprising; I was expecting low to no master or replication performance impact, but zero is highly suspicious. Was this expected? Couldn't this be related to a bug in monitoring due …

[Wikidata-bugs] [Maniphest] [Commented On] T151681: DispatchChanges: Avoid long-lasting connections to the master DB

2017-05-10 Thread jcrespo
jcrespo added a comment. "With the current state, we still have the same amount of connections to the master DBs, but we don't use GET_LOCK etc. on them anymore." And that for me is a huge win alone.

[Wikidata-bugs] [Maniphest] [Commented On] T165246: Add term_full_entity_id column to wb_terms table on testwikidatawiki

2017-05-16 Thread jcrespo
jcrespo added a comment. The problem usually is not the alter size, but the metadata locking, which creates way more contention.
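As a minimal illustration of the metadata-locking problem (generic MySQL/MariaDB behaviour, not a transcript from these hosts): a long-running transaction holds a metadata lock on the tables it has read, so a concurrent ALTER has to wait, and every later query on the table then queues behind the pending ALTER.

-- session 1: long transaction, holds an MDL on wb_terms until COMMIT
BEGIN;
SELECT COUNT(*) FROM wb_terms;

-- session 2: blocks in state "Waiting for table metadata lock"
ALTER TABLE wb_terms ADD COLUMN term_full_entity_id VARBINARY(32) DEFAULT NULL;

-- session 3: even a simple read now waits behind the pending ALTER
SELECT * FROM wb_terms LIMIT 1;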

[Wikidata-bugs] [Maniphest] [Commented On] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-05-26 Thread jcrespo
jcrespo added a comment. While contention is bad in general, it is the opposite of lag (more contention would create less lag). Of course there could be a common source: large updates causing contention on the master, and then lag because the transaction size is large. I wonder if instead of fixing …

[Wikidata-bugs] [Maniphest] [Commented On] T163551: Huge number of duplicate rows in wb_terms

2017-05-19 Thread jcrespo
jcrespo added a comment. To clarify: reads on a slave are not a big concern for MySQL. Of course, if you end up with better latency, that is cool (and I normally ping because it means there is an inefficiency that could be solved); but reads are easy to scale in the grand scheme of things ("…

[Wikidata-bugs] [Maniphest] [Commented On] T176273: Move the wbc_entity_usage table onto a dedicated DB shard

2017-09-20 Thread jcrespo
jcrespo added a comment. @hoo Regarding the Wikimedia setup, you should know that our priority right now is to move wikidata to a dedicated server group, which means that from the ops side no other structural change can happen at the same time. This is still needed (and doing it is blocked on code being …

[Wikidata-bugs] [Maniphest] [Commented On] T171027: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis

2017-09-22 Thread jcrespo
jcrespo added a comment. @Catrope Unfortunately it will take quite a bit of time to research and investigate the index, and I'm not able to attend to this right now due to several other infrastructure fires currently going on. I already commented on some possibilities and I am open to other …

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-09-20 Thread jcrespo
jcrespo added a comment. Cool; if you can, get some `SHOW TABLE STATUS`-like stats to capture the "before" state in bytes (even if it had started some time ago).
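The kind of baseline stats meant here can be captured like this; the table name assumes the client-side wbc_entity_usage table being discussed on this task:

SHOW TABLE STATUS LIKE 'wbc_entity_usage'\G
-- Data_length and Index_length give the current size in bytes;
-- Rows is only an InnoDB estimate, but good enough for a before/after comparison.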

[Wikidata-bugs] [Maniphest] [Commented On] T163551: Huge number of duplicate rows in wb_terms

2017-09-17 Thread jcrespo
jcrespo added a comment. We identify and delete the duplicate rows (not trivial, but not difficult either), then we add a UNIQUE constraint over that combination of columns so that it never happens again.
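A minimal sketch of that approach; the column combination treated as the "duplicate key" is an assumption for illustration, and the real cleanup would be done in controlled batches rather than in one pass:

-- 1. Find duplicated combinations
SELECT term_entity_id, term_entity_type, term_language, term_type, term_text,
       COUNT(*) AS copies
  FROM wb_terms
 GROUP BY term_entity_id, term_entity_type, term_language, term_type, term_text
HAVING COUNT(*) > 1;

-- 2. Delete every copy except the one with the lowest term_row_id (batched in practice)

-- 3. Prevent reintroduction
ALTER TABLE wb_terms
  ADD UNIQUE INDEX wb_terms_no_duplicates
      (term_entity_id, term_entity_type, term_language, term_type, term_text);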

[Wikidata-bugs] [Maniphest] [Commented On] T163551: Huge number of duplicate rows in wb_terms

2017-09-17 Thread jcrespo
jcrespo added a comment. Not sure if with "you" you mean me, but if it is safe, yes. We may have to defragment the table afterwards to reclaim disk space, but that can be done later and it is not a blocker.

[Wikidata-bugs] [Maniphest] [Commented On] T176273: Move the wbc_entity_usage table onto a dedicated DB shard

2017-09-20 Thread jcrespo
jcrespo added a comment. I know; it is only related because the wikidata migration requires moving replication channels, and that consumes DBA time, not because it contains wikidata.

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-10-05 Thread jcrespo
jcrespo added a comment. I am a bit lost with the estimation: is it realistic, is the number of usages more or less in line with your estimation, or is it going to be a quadratic expression because we do not accurately know the number of statement usages per statement? If the estimations …

[Wikidata-bugs] [Maniphest] [Changed Project Column] T177601: Deploy dropping wb_entity_per_page table

2017-10-06 Thread jcrespo
jcrespo removed a project: Blocked-on-schema-change. jcrespo moved this task from Backlog to Blocked external/Not db team on the DBA board. jcrespo added a comment. Add #blocked-on-schema-change; that way we will know it is ready :-)

[Wikidata-bugs] [Maniphest] [Created] T177772: Purge 90% of rows from recentchanges (and possibly defragment) from commonswiki and ruwiki (the ones with source:wikidata)

2017-10-09 Thread jcrespo
jcrespo created this task. jcrespo added projects: MediaWiki-Watchlist, DBA, Wikidata. Herald added subscribers: Liuxinyu970226, Jay8g, TerraCodes, Aklapper. TASK DESCRIPTION: Once rows stop coming in (see parent task), this will likely either solve or mitigate the performance issues. The rows …

[Wikidata-bugs] [Maniphest] [Block] T171027: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis

2017-10-09 Thread jcrespo
jcrespo created subtask T177772: Purge 90% of rows from recentchanges (and possibly defragment) from commonswiki and ruwiki (the ones with source:wikidata).

[Wikidata-bugs] [Maniphest] [Commented On] T171027: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis

2017-10-10 Thread jcrespo
jcrespo added a comment. I forgot to say we suspect the same thing happens on other s3 hosts, but these 2 previous wikis create so many errors that it is difficult to say until we fix these 2.

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-10-09 Thread jcrespo
jcrespo added a comment. What you write is ok, but IF you want our opinion, can you translate that into an increase in row storage and inserts/other write activity compared to the full table size/previous state?

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-10-09 Thread jcrespo
jcrespo added a comment. I have the feeling that these numbers could be meaningless on such small wikis, given that the issues on recentchanges only showed up with some large wikis such as commons and ruwiki. Could the same issue happen to those with a x10 growth? I know the case is different, pages * edits, vs …

[Wikidata-bugs] [Maniphest] [Updated] T177707: don't dispatch changes to all affected pages for highly used items

2017-10-10 Thread jcrespo
jcrespo added a subscriber: brion. jcrespo added a comment. "you are essentially proposing to get rid of the recentchanges table altogether" No, I was asking to keep recentchanges as it was before, with the previous load/meaning, and implement "new features" on separate tables, or on separa…

[Wikidata-bugs] [Maniphest] [Commented On] T177707: don't dispatch changes to all affected pages for highly used items

2017-10-10 Thread jcrespo
jcrespo added a comment. One last thought: it is early to say, but commonswiki seems to have stopped having insert spikes of 100x the normal rate: https://grafana.wikimedia.org/dashboard/db/mysql?panelId=2=1=eqiad%20prometheus%2Fops=db1068=9104=now-7d=now (click "inserts") Somethi…

[Wikidata-bugs] [Maniphest] [Commented On] T177772: Purge 90% of rows from recentchanges (and possibly defragment) from commonswiki and ruwiki (the ones with source:wikidata)

2017-10-10 Thread jcrespo
jcrespo added a comment. I have allowed codfw to lag so that we can go at around 500 deletes/s. That means the whole thing will take less than 3 hours. Shout if anyone sees any strangeness on commonswiki (you shouldn't); worst case scenario, kill the pt-archiver job in the screen session …

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-10-10 Thread jcrespo
jcrespo added a comment. 51.5M you meant, maybe?

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-10-09 Thread jcrespo
jcrespo added a comment. What I mean is that the numbers are ok to proceed (not a big deal), but I am still worried about the large wikis. I know you do not have all the answers; I was just thinking aloud.

[Wikidata-bugs] [Maniphest] [Commented On] T177772: Purge 90% of rows from recentchanges (and possibly defragment) from commonswiki and ruwiki (the ones with source:wikidata)

2017-10-09 Thread jcrespo
jcrespo added a comment. I am leaving a screen open on dbstore1002 loading a copy of recentchanges from commonswiki. Because of the size, it will take some time to copy. Tomorrow I will test purging the table with something such as:
pt-archiver --source h=dbstore1002.eqiad.wmnet,D…
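pt-archiver works by deleting the matching rows in small batches so replication lag stays bounded; a plain-SQL sketch of the same idea, assuming (as in the COUNT queries quoted later on this task) that the Wikibase-sourced rows are the ones with rc_source = 'wb':

-- Repeat until it affects 0 rows; the small LIMIT keeps each transaction cheap
DELETE FROM recentchanges
 WHERE rc_source = 'wb'
 ORDER BY rc_id
 LIMIT 1000;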

[Wikidata-bugs] [Maniphest] [Updated] T177772: Purge 90% of rows from recentchanges (and possibly defragment) from commonswiki and ruwiki (the ones with source:wikidata)

2017-10-10 Thread jcrespo
jcrespo added a comment. "That means the whole thing will take less than 3 hours" I had a mind slip... we have to delete 60M rows, not 6M, so that means 30 hours, not 3. I ran this for 6 hours and 10M rows were deleted. We will continue after the s4 maintenance tomorrow: T168661

[Wikidata-bugs] [Maniphest] [Unblock] T90435: [Epic] Wikidata watchlist improvements (client)

2017-10-16 Thread jcrespo
jcrespo closed subtask T171027: "Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis as "Resolved".

[Wikidata-bugs] [Maniphest] [Lowered Priority] T171027: "Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis

2017-10-13 Thread jcrespo
jcrespo lowered the priority of this task from "Unbreak Now!" to "Normal". jcrespo added a comment. Lowering priority unless we get more reports from other ruwiki or commons users.

[Wikidata-bugs] [Maniphest] [Unassigned] T171027: "Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis

2017-10-13 Thread jcrespo
jcrespo removed jcrespo as the assignee of this task.

[Wikidata-bugs] [Maniphest] [Unblock] T171027: "Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis

2017-10-13 Thread jcrespo
jcrespo closed subtask T177772: Purge 90% of rows from recentchanges (and possibly defragment) from commonswiki and ruwiki (the ones with source:wikidata) as "Resolved".

[Wikidata-bugs] [Maniphest] [Closed] T177772: Purge 90% of rows from recentchanges (and possibly defragment) from commonswiki and ruwiki (the ones with source:wikidata)

2017-10-13 Thread jcrespo
jcrespo closed this task as "Resolved".jcrespo added a comment. There are more things pending, like running optimize table on the non-rc replicas of eqiad or all of codfw, and checking other wikis different from ruwiki and commons, but the initial scope (emergency) has been fixed.TASK D

[Wikidata-bugs] [Maniphest] [Commented On] T171027: "Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis

2017-10-13 Thread jcrespo
jcrespo added a comment. The initial scope (query issues on ruwiki and commonswiki) has, I think, been successfully patched. These are the statistics I have now: F10184141: Screenshot_20171013_165442.png I would ask the affected users to confirm the issues are gone; probably not 100% of them …

[Wikidata-bugs] [Maniphest] [Commented On] T177772: Purge 90% of rows from recentchanges (and possibly defragment) from commonswiki and ruwiki (the ones with source:wikidata)

2017-10-11 Thread jcrespo
jcrespo added a comment. ruwiki results are more extreme:
root@db2076[ruwiki]> SELECT count(*) FROM recentchanges;
+----------+
| count(*) |
+----------+
| 37427748 |
+----------+
1 row in set (11.19 sec)
root@db2076[ruwiki]> SELECT COUNT(*) FROM recentchanges WHERE rc_source…

[Wikidata-bugs] [Maniphest] [Commented On] T178063: Decide on scalable approach for watchlist integration of Wikidata

2017-10-12 Thread jcrespo
jcrespo added a comment. Small correction/clarification on "this was found to generate too much load": as an op, I interpret load as throughput/backlog work. The insertion [load] itself was not the problem (the spikes of inserts were too large, but that is something that could be smoothed); t…

[Wikidata-bugs] [Maniphest] [Commented On] T174028: Finalize database schema for MCR content meta-data

2017-10-12 Thread jcrespo
jcrespo added a comment. @Anomie I know it is on your mind; it was a nod that, for any estimation we give you now on some schema changes, I firmly believe we could cut the time it takes to deploy them by 10 if that were already there. It is also an explanation of why they take so much time right now.

[Wikidata-bugs] [Maniphest] [Commented On] T174028: Finalize database schema for MCR content meta-data

2017-10-12 Thread jcrespo
jcrespo added a comment. One additional comment is to ask if you kind of agree, in some cases, to do the "new table and copy" pattern rather than an alter. It is much easier to add or transform columns later, so there is no need to optimize in advance. NULLs in COMPACT or newer row formats take …

[Wikidata-bugs] [Maniphest] [Commented On] T177772: Purge 90% of rows from recentchanges (and possibly defragment) from commonswiki and ruwiki (the ones with source:wikidata)

2017-10-12 Thread jcrespo
jcrespo added a comment.
root@db2076[ruwiki]> SELECT COUNT(*) FROM recentchanges WHERE rc_source = 'wb';
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (1.19 sec)

[Wikidata-bugs] [Maniphest] [Commented On] T171027: "Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis

2017-10-12 Thread jcrespo
jcrespo added a comment. This is the query error rate after ruwiki has been purged: F10158519: Screenshot_20171012_130744.png Please note that these are preliminary results, and that the tables have yet to be optimized/analyzed, but I think this confirms we are on the right track. Commons…

[Wikidata-bugs] [Maniphest] [Commented On] T177772: Purge 90% of rows from recentchanges (and possibly defragment) from commonswiki and ruwiki (the ones with source:wikidata)

2017-10-13 Thread jcrespo
jcrespo added a comment. 53156406 rows purged on commons so far of the initially estimated 58M (the final number will probably be lower because of the regular rc purge by timestamp).

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-06 Thread jcrespo
jcrespo added a comment. Of course, that doesn't apply to cases that are limited by a common resource (e.g. the database). If I could add to the ideal scenario, the jobqueue would have dedicated slaves AND would write with a different domain id (allowing parallelism) than the rest of the writes, so we …
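The "different domain id" refers to MariaDB's GTID domains, which let slaves apply independent write streams in parallel and out of order relative to each other; a minimal sketch of the mechanism, where the domain number is purely hypothetical:

-- On the job runner's connections only (assumption: domain 1 is reserved for job-queue writes)
SET SESSION gtid_domain_id = 1;
-- Subsequent job-queue INSERT/UPDATE statements replicate under domain 1,
-- so slaves with parallel replication can apply them alongside the default domain.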

[Wikidata-bugs] [Maniphest] [Updated] T173710: Job queue is increasing non-stop

2017-09-07 Thread jcrespo
jcrespo added a comment. Could that part, at least, have something to do with T164173, as a problem from the same cause, or a consequence of the fix? I also remember some tuning of some wikidata crons or job sizes, but I am not sure if it was upwards or downwards and not sure if it is related. Aaron, Ladsgroup …

[Wikidata-bugs] [Maniphest] [Commented On] T171027: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis

2017-09-12 Thread jcrespo
jcrespo added a comment. This is my view of the issue, based on the comments above. Short term: if the number of watchlist items for the user is less than N (N to be determined), join watchlist -> recentchanges, then "suffer" a small in-memory sort (I believe this is the current si…
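A rough sketch of the short-term shape described there; the watchlist and recentchanges column names are MediaWiki core's, but the query itself is an illustration of the idea, not deployed code:

-- Only taken when the user's watchlist has fewer than N titles
SELECT rc.*
  FROM watchlist wl
  JOIN recentchanges rc
    ON rc.rc_namespace = wl.wl_namespace
   AND rc.rc_title = wl.wl_title
 WHERE wl.wl_user = 12345            -- hypothetical user id
 ORDER BY rc.rc_timestamp DESC       -- the small in-memory sort mentioned above
 LIMIT 50;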

[Wikidata-bugs] [Maniphest] [Commented On] T171027: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis

2017-09-12 Thread jcrespo
jcrespo added a comment. After testing some indexes, I do not see a huge improvement; we can reduce from scanning 100M rows to 18M, but there can always be a combination of query parameters that does not filter many rows on recentchanges. Paging by id (or timestamp) is the only reliable solution …

[Wikidata-bugs] [Maniphest] [Updated] T171027: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis

2017-09-12 Thread jcrespo
jcrespo added a comment. Using STRAIGHT_JOIN and making sure an index is used more efficiently could help too (here I changed the original query to order by rc_timestamp; the explain is better, but the query is actually slower: P5988#33002).
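For reference, the shape of that rewrite; P5988 has the real query, this only shows how STRAIGHT_JOIN forces recentchanges to be read first through its timestamp index instead of starting from the watchlist:

SELECT STRAIGHT_JOIN rc.*
  FROM recentchanges rc
  JOIN watchlist wl
    ON wl.wl_namespace = rc.rc_namespace
   AND wl.wl_title = rc.rc_title
 WHERE wl.wl_user = 12345            -- hypothetical user id
 ORDER BY rc.rc_timestamp DESC
 LIMIT 50;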

[Wikidata-bugs] [Maniphest] [Commented On] T171027: "2062 Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis

2017-09-12 Thread jcrespo
jcrespo added a comment. I am testing with a new index on dbstore1002; meanwhile I had a chat with Bawolff and he mentioned that rc used to be a small table where many indexes and inefficient scanning were possible, because it was a much smaller summary of revision. Apparently with the latest …
