[Wikidata-bugs] [Maniphest] [Changed Subscribers] T151681: DispatchChanges: Avoid long-lasting connections to the master DB

2016-11-28 Thread jcrespo
jcrespo added subscribers: Manuel, aaron. jcrespo added a comment. @Manuel, @daniel Actually it is a problem, because masters have a limit of CPU count or 32 active threads on the pool of connections, which means half of the connections are reserved but doing nothing, so you are limiting the master

[Wikidata-bugs] [Maniphest] [Updated] T151681: DispatchChanges: Avoid long-lasting connections to the master DB

2016-11-28 Thread jcrespo
jcrespo removed projects: netops, Operations.

[Wikidata-bugs] [Maniphest] [Commented On] T151681: DispatchChanges: Avoid long-lasting connections to the master DB

2016-11-28 Thread jcrespo
jcrespo added a comment. max_connections is 5000; the maximum of active threads is 32, enforced on the connection pool. No idle connections should be left open, and a typical connection should take less than 1 second, otherwise it risks being killed by the watchdog looking for idle
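
A minimal illustration of the limits described above, as a hedged sketch (these are standard MariaDB thread-pool and connection variables; the example values come from this thread, not the live configuration):

  -- connection limit vs. active-thread limit on a master
  SHOW GLOBAL VARIABLES LIKE 'max_connections';   -- e.g. 5000
  SHOW GLOBAL VARIABLES LIKE 'thread_pool%';      -- thread_pool_size, thread_pool_max_threads, etc.
  SHOW GLOBAL STATUS LIKE 'Threads_connected';    -- currently open connections; long-idle ones risk the watchdog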

[Wikidata-bugs] [Maniphest] [Commented On] T151681: DispatchChanges: Avoid long-lasting connections to the master DB

2016-11-30 Thread jcrespo
jcrespo added a comment. Another example of why long running connections are a problem: I am depooling es1017 for important maintenance. I have depooled it, so I expect connections to finish within a few seconds, with the exception of wikiadmin's known long running queries, but I just

[Wikidata-bugs] [Maniphest] [Commented On] T151681: DispatchChanges: Avoid long-lasting connections to the master DB

2016-11-30 Thread jcrespo
jcrespo added a comment. I also do not want to make you work more than necessary. If you only need 1000 rows, and they contain no private data, I can give you access to a misc server shared with other resources; no need to have a dedicated server.

[Wikidata-bugs] [Maniphest] [Commented On] T151681: DispatchChanges: Avoid long-lasting connections to the master DB

2016-11-30 Thread jcrespo
jcrespo added a comment. Hm, these are both job runners, jobs (probably) shouldn't run for so long. I wonder what's causing this. Separate issue then, but heads up for it.

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-01-23 Thread jcrespo
jcrespo added a comment. Storage is not a problem. I wonder what the impact on IO activity (write QPS) is. Could we separate usage tracking to a different set of servers? These table(s) are probably very dynamic, but also probably not 100% in sync with the content edits (handled on asynchronous

[Wikidata-bugs] [Maniphest] [Updated] T151717: Usage tracking: record which statement group is used

2017-01-24 Thread jcrespo
jcrespo added a comment. > how would we generate the kind of estimates you would need in order to sign off on this type of change?
Measure the write QPS/rows written/percentage of write IOPS you have now, evaluate what the increase is with the new method, and scale with a worst-case-scenario
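
A rough way to take that "measure first" advice, sketched with generic MySQL/MariaDB status counters (sample them twice and divide the deltas by the interval to get per-second rates; the counter choice is an assumption, not a prescribed method):

  SHOW GLOBAL STATUS WHERE Variable_name IN
    ('Com_insert', 'Com_update', 'Com_delete',
     'Innodb_rows_inserted', 'Innodb_rows_updated', 'Innodb_rows_deleted');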

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-01-24 Thread jcrespo
jcrespo added a comment. To clarify, I am not saying it should be one way or another, what I am asking is: measure the write load impact. Take into account both options, and be aware of them (e.g. maybe it is not worth it now, but we can prepare things so if it is needed in the future, we do not

[Wikidata-bugs] [Maniphest] [Unblock] T150182: Deploy Cognate extension to production

2017-01-26 Thread jcrespo
jcrespo closed subtask T148988: Cognate DB review as "Resolved".

[Wikidata-bugs] [Maniphest] [Closed] T148988: Cognate DB review

2017-01-26 Thread jcrespo
jcrespo closed this task as "Resolved".jcrespo claimed this task.jcrespo added a comment. Yes, no major problem in the current state.TASK DETAILhttps://phabricator.wikimedia.org/T148988EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: jcrespo

[Wikidata-bugs] [Maniphest] [Created] T156638: Large amount of "DatabaseMysqlBase::lock failed to acquire lock 'wikidatawiki-activeusers'" timeout-errors on wikidata since 2017-01-30 8:05 UTC

2017-01-30 Thread jcrespo
jcrespo created this task. jcrespo added projects: Wikimedia-log-errors, Wikidata. Herald added a subscriber: Aklapper. TASK DESCRIPTION: I am not sure if this is wikidata or deployment related: https://logstash.wikimedia.org/goto/29bb2ebcf0ccd9e90e4f7773fdc666dd They are caused by a job, but no

[Wikidata-bugs] [Maniphest] [Updated] T111535: Wikibase\Repo\Store\SQL\EntityPerPageTable::{closure} creating high number of deadlocks

2017-03-02 Thread jcrespo
jcrespo added subscribers: ArielGlenn, Zppix. jcrespo merged a task: T139636: Wikidata Database contention under high edit rate.

[Wikidata-bugs] [Maniphest] [Merged] T139636: Wikidata Database contention under high edit rate

2017-03-02 Thread jcrespo
jcrespo closed this task as a duplicate of T111535: Wikibase\Repo\Store\SQL\EntityPerPageTable::{closure} creating high number of deadlocks.

[Wikidata-bugs] [Maniphest] [Commented On] T111535: Wikibase\Repo\Store\SQL\EntityPerPageTable::{closure} creating high number of deadlocks

2017-03-02 Thread jcrespo
jcrespo added a comment. See merged ticket, this happened again when 300-400 new pages per minute were being created, with 35 parallel threads.

[Wikidata-bugs] [Maniphest] [Commented On] T159718: Evaluate feasibility of adding a column for full entity ID to wb_terms

2017-03-06 Thread jcrespo
jcrespo added a comment. @Marostegui I think they do not want it done yet, but an ok/review from us. But they should probably clarify that. "feasibility" is an ambiguous term.

[Wikidata-bugs] [Maniphest] [Commented On] T159718: Evaluate if it is possible to add an empty column for full entity ID to wb_terms without affecting wikidata.org users

2017-03-06 Thread jcrespo
jcrespo added a comment. > Evaluate if it is feasible to add such an "empty" column without making Wikidata readonly.
> we can probably do it
How certain are you? In my experience, the biggest blocker on production is not the size, but how busy the table is. That would create metadata lock

[Wikidata-bugs] [Maniphest] [Commented On] T159718: Evaluate if it is possible to add an empty column for full entity ID to wb_terms without affecting wikidata.org users

2017-03-06 Thread jcrespo
jcrespo added a comment. > If we depool the slaves we should be fine, shouldn't we? And if we use the DC switchover to alter the masters we'd also get rid of that issue?
Hey, don't tell me, tell @WMDE-leszek, and see if he is ok with that schedule. :-)

[Wikidata-bugs] [Maniphest] [Commented On] T159718: Evaluate if it is possible to add an empty column for full entity ID to wb_terms without affecting wikidata.org users

2017-03-06 Thread jcrespo
jcrespo added a comment. > Depending on the answer to this, we will plan further steps.
I think you should add the full plan here ASAP, even if it is not 100% clear or decided, otherwise we may be adding steps to the process and making it unnecessarily long. E.g. if you plan to add an index later, it

[Wikidata-bugs] [Maniphest] [Commented On] T159718: Evaluate if it is possible to add an empty column for full entity ID to wb_terms without affecting wikidata.org users

2017-03-06 Thread jcrespo
jcrespo added a comment. Ok, now I have some comments against that method. Logistically, I am at a meeting; let me finish it and I will have some time to properly explain myself (nothing against the spirit of the changes, I would just do it in a different way, if the code can handle it).

[Wikidata-bugs] [Maniphest] [Commented On] T159718: Evaluate if it is possible to add an empty column for full entity ID to wb_terms without affecting wikidata.org users

2017-03-06 Thread jcrespo
jcrespo added a comment. So the comments: do not defer the creation of the indexes- those are extra alter tables and do not make things easier in any way- just create the indexes from the start, assuming they will be used. Renaming columns is a big no- especially to an already existing name
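
An illustrative single-pass version of that advice (hypothetical: the column name follows the term_full_entity_id column discussed later in this archive, and the type and index definition here are assumptions, not the reviewed schema):

  -- add the new column and its index in one ALTER instead of deferring the index to a second one
  ALTER TABLE wb_terms
    ADD COLUMN term_full_entity_id VARBINARY(32) DEFAULT NULL,
    ADD INDEX term_full_entity_id (term_full_entity_id);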

[Wikidata-bugs] [Maniphest] [Created] T160887: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user

2017-03-20 Thread jcrespo
jcrespo created this task. jcrespo added projects: Wikidata, DBA. Herald added a subscriber: Aklapper. TASK DESCRIPTION: For example, I found on db1070 2 long running queries:
Server Connection User Client Database Time
db1070 51133099 wikiuser mw1256 wikidatawiki 19h SELECT /* Wikibase\Repo

[Wikidata-bugs] [Maniphest] [Commented On] T151993: Implement ChangeDispatchCoordinator based on RedisLockManager

2017-03-27 Thread jcrespo
jcrespo added a comment. An extra reason to avoid using a db master just occurred to me: master failover is a relatively frequent operation; it will happen every time the master mysql is upgraded, or when there is a datacenter failover (2 of those will happen in April/May)- probably there wasn't

[Wikidata-bugs] [Maniphest] [Commented On] T160887: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user

2017-03-28 Thread jcrespo
jcrespo added a comment. This is ongoing right now, for example:
db1082 307726822 wikiuser mw1248 wikidatawiki 4m
SELECT /* Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm */ page_title AS `entity_id_serialization` FROM `page` LEFT JOIN `wb_terms` ON
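
The truncated query above follows the usual "entities without terms" anti-join shape; a hedged sketch of that general pattern (the join condition and limit are illustrative, not the exact production query):

  SELECT page_title AS entity_id_serialization
  FROM page
  LEFT JOIN wb_terms ON term_full_entity_id = page_title
  WHERE term_row_id IS NULL   -- keep only pages with no matching term row
  LIMIT 1000;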

[Wikidata-bugs] [Maniphest] [Commented On] T159828: Use redis-based lock manager for dispatchChanges on test sites.

2017-03-29 Thread jcrespo
jcrespo added a comment. Sorry about this- you are not the only "sufferers" of beta not being a reliable place for testing in a truly distributed fashion- we were just discussing this on IRC. I also support a test on test, and offer my help if I can provide it. Thanks again for

[Wikidata-bugs] [Maniphest] [Commented On] T160887: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user

2017-04-04 Thread jcrespo
jcrespo added a comment. Thank you very much for working on this- do you have an estimation on when this will be fully deployed?TASK DETAILhttps://phabricator.wikimedia.org/T160887EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: jcrespoCc: gerritbot

[Wikidata-bugs] [Maniphest] [Commented On] T160887: Wikibase\Repo\Store\Sql\SqlEntitiesWithoutTermFinder::getEntitiesWithoutTerm can take 19 hours to execute and it is run by the web requests user

2017-04-04 Thread jcrespo
jcrespo added a comment. Thank you again!TASK DETAILhttps://phabricator.wikimedia.org/T160887EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: jcrespoCc: gerritbot, Lydia_Pintscher, thiemowmde, Marostegui, aude, hoo, daniel, Aklapper, jcrespo, QZanden, Salgo60

[Wikidata-bugs] [Maniphest] [Updated] T159828: Use redis-based lock manager for dispatchChanges on test.wikidata.org

2017-04-07 Thread jcrespo
jcrespo added a comment. As a small side note- that can also happen on mysql. Despite locks being released on session disconnection, there have been some occasions where the mysql session is not killed (it continues), but the thread on mediawiki has been. There are several known bugs about that
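
For context, the MySQL-side coordination being replaced here is the named-lock API; a minimal sketch of its semantics (the lock name is made up for illustration):

  SELECT GET_LOCK('dispatchChanges.wikidatawiki', 10);   -- returns 1 if acquired, 0 after the 10s timeout
  -- ... do the coordinated work ...
  SELECT RELEASE_LOCK('dispatchChanges.wikidatawiki');
  -- the lock is also freed when the session disconnects, unless (as noted above)
  -- the MySQL session survives while the MediaWiki thread has already died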

[Wikidata-bugs] [Maniphest] [Commented On] T162252: Create SQL database and Tables for Cognate extension to be used on Wiktionaries

2017-04-07 Thread jcrespo
jcrespo added a comment. So that we are not a blocker- creation of tables in production, especially if we have already given the OK to the plans, is not considered a schema change, so anyone with production rights can do it- you just need to mark it on the deployments calendar. With this we (DBAs

[Wikidata-bugs] [Maniphest] [Commented On] T162252: Create SQL database and Tables for Cognate extension to be used on Wiktionaries

2017-04-07 Thread jcrespo
jcrespo added a comment. To clarify- it may be blocked on us right now to create the database, and because labs filtering is not well managed, but the general idea stays for normal table creations.

[Wikidata-bugs] [Maniphest] [Commented On] T162252: Create SQL database and Tables for Cognate extension to be used on Wiktionaries

2017-04-07 Thread jcrespo
jcrespo added a comment. This sentence actually confused me: x1 == extension1 :-)

[Wikidata-bugs] [Maniphest] [Commented On] T162252: Create SQL database and Tables for Cognate extension to be used on Wiktionaries

2017-04-07 Thread jcrespo
jcrespo added a comment. @Addshore I hope labs access is not a blocker for this, that can be done at a later date.

[Wikidata-bugs] [Maniphest] [Commented On] T162252: Create SQL database and Tables for Cognate extension to be used on Wiktionaries

2017-04-07 Thread jcrespo
jcrespo added a comment. > production hosts
I was thinking of dbstore (backup) hosts, which were problematic (remember you were the ones to set up those last time) + private table filtering. x1 has traditionally not been replicated to labs, this can be challenging (I would start by not replicating

[Wikidata-bugs] [Maniphest] [Commented On] T162252: Create SQL database and Tables for Cognate extension to be used on Wiktionaries

2017-04-12 Thread jcrespo
jcrespo added a comment. I would ask Addshore to confirm by running SELECT on the empty tables from terbium/tin, etc, using mediawiki scripts.

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-04-12 Thread jcrespo
jcrespo added a comment. I did not understand your last comment, is the previous patch invalid? Do you have another patch to show me?

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-04-12 Thread jcrespo
jcrespo added a comment. To clarify, I have to be especially strict in this particular case because in the past, wbc_entity_usage (with the exception of the linksupdate job) was a large point of contention and a major cause of lag, and this ticket starts by saying: we'd write a lot (?) more ro

[Wikidata-bugs] [Maniphest] [Commented On] T151717: Usage tracking: record which statement group is used

2017-04-14 Thread jcrespo
jcrespo added a comment. > We want to collect additional information on one of these wikis for a while
If that doesn't involve a schema change, sure.

[Wikidata-bugs] [Maniphest] [Created] T163544: Wikibase\TermSqlIndex::getMatchingTerms , Wikibase\TermSqlIndex::fetchTerms have bad performance after codfw failover

2017-04-21 Thread jcrespo
jcrespo created this task. jcrespo added a project: Wikidata. TASK DESCRIPTION: These 2 queries were among the most expensive during the datacenter failover or after it- while it is normal to have lower performance than usual due to colder caches, most likely they are surfacing issues with improvements. It

[Wikidata-bugs] [Maniphest] [Edited] T163544: Wikibase\TermSqlIndex::getMatchingTerms , Wikibase\TermSqlIndex::fetchTerms have bad performance after codfw failover

2017-04-21 Thread jcrespo
jcrespo edited the task description. EDIT DETAILS: these 2 queries were among the most expensive on datacenter failover or after it- while it is normal to have lower performance than usual due to colder caches, most likely they are surfacing existing issues with improvements, which

[Wikidata-bugs] [Maniphest] [Commented On] T163544: Wikibase\TermSqlIndex::getMatchingTerms , Wikibase\TermSqlIndex::fetchTerms have bad performance after codfw failover

2017-04-21 Thread jcrespo
jcrespo added a comment. I was thinking of https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_eq_range_index_dive_limit but that is probably more wishful thinking than practical given we are on MariaDB 10.

[Wikidata-bugs] [Maniphest] [Updated] T123867: Repeated reports of wikidatawiki (s5) API going read only

2017-04-24 Thread jcrespo
jcrespo edited projects, added MediaWiki-Database; removed DBA. jcrespo added a comment. That is the max lag, and it is normal on the slaves that are not waited on by mediawiki. This issue has nothing to do with databases, mediawiki does what it is programmed to do: if it detects lag even if a few

[Wikidata-bugs] [Maniphest] [Commented On] T163551: Huge number of duplicate rows in wb_terms

2017-04-28 Thread jcrespo
jcrespo added a comment. This is a bit off-topic for T163551, but with the latest schema changes, wb_terms has become the largest table on a wiki (with the exception of revision on enwiki and image on commons)- and I think it will get bigger once the new column (I assume) gets populated with actual

[Wikidata-bugs] [Maniphest] [Updated] T86530: Replace wb_terms table with more specialized mechanisms for terms (tracking)

2017-04-28 Thread jcrespo
jcrespo added a comment. I may have made this comment on the wrong ticket: T163551#3221748

[Wikidata-bugs] [Maniphest] [Updated] T151681: DispatchChanges: Avoid long-lasting connections to the master DB

2017-05-10 Thread jcrespo
jcrespo added a comment. So do you think this had something to do with reports like T123867 and T164191? This is highly surprising- I was expecting low to no master or replication performance impact, but zero is highly suspicious. Was this expected? Couldn't this be related to a bug on monitorin

[Wikidata-bugs] [Maniphest] [Commented On] T151681: DispatchChanges: Avoid long-lasting connections to the master DB

2017-05-10 Thread jcrespo
jcrespo added a comment. > With the current state, we still have the same amount of connections to the master DBs, but we don't use GET_LOCK etc. on them anymore.
And that for me is a huge win alone.

[Wikidata-bugs] [Maniphest] [Commented On] T162539: Deploy schema change for adding term_full_entity_id column to wb_terms table

2017-05-14 Thread jcrespo
jcrespo added a comment. @aude: don't run update.php on s3 for altering a table- you will create lag on 900 wikis unless connections are cleared and the table is pre-warmed-up (and appropriately tested). Also, the only wiki mentioned here was wikidatawiki, you have to create a separate reques

[Wikidata-bugs] [Maniphest] [Commented On] T165246: Add term_full_entity_id column to wb_terms table on testwikidatawiki

2017-05-16 Thread jcrespo
jcrespo added a comment. The problem usually is not the alter size, but the metadata locking, which creates way more contention.
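
A sketch of the metadata-lock pile-up being described (the sessions, table and column here are illustrative):

  -- session 1: a long-running query or open transaction holds a shared metadata lock on wb_terms
  SELECT COUNT(*) FROM wb_terms;
  -- session 2: the ALTER waits for an exclusive metadata lock behind session 1
  ALTER TABLE wb_terms ADD COLUMN term_full_entity_id VARBINARY(32) DEFAULT NULL;
  -- session 3: even trivial reads now queue behind the pending ALTER
  -- (they show up as "Waiting for table metadata lock" in SHOW PROCESSLIST)
  SELECT * FROM wb_terms LIMIT 1;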

[Wikidata-bugs] [Maniphest] [Commented On] T163551: Huge number of duplicate rows in wb_terms

2017-05-19 Thread jcrespo
jcrespo added a comment. To clarify- reads on a slave are not a big concern for MySQL- of course, if in the end you get better latency, that is cool (and I normally ping because it means there is an inefficiency that could be solved); but reads are easy to scale in the large order of things ("

[Wikidata-bugs] [Maniphest] [Commented On] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-05-26 Thread jcrespo
jcrespo added a comment. While contention is bad in general- it is the opposite of lag- more contention would create less lag - of course they could have a common source: large updates causing contention on the master, and then lag because the transaction size is large. I wonder if instead of fixing

[Wikidata-bugs] [Maniphest] [Created] T169336: slow master queries on Wikibase\Client\Usage\Sql\EntityUsageTable::getAffectedRowIds

2017-07-01 Thread jcrespo
jcrespo created this task. jcrespo added projects: Wikidata, Performance, DBA. Herald added a subscriber: Aklapper. TASK DESCRIPTION: The following query was detected running on a master database:
Host User Schema Client Source Thread Transaction Runtime Stamp
db1063 wikiuser wikidatawiki

[Wikidata-bugs] [Maniphest] [Commented On] T169336: slow master queries on Wikibase\Client\Usage\Sql\EntityUsageTable::getAffectedRowIds

2017-07-05 Thread jcrespo
jcrespo added a comment. This is still ongoing with 15-minute queries. I am going to set up a task to kill all related queries on s5-master to prevent a potential outage of dewiki and wikidata writes

[Wikidata-bugs] [Maniphest] [Commented On] T169336: slow master queries on Wikibase\Client\Usage\Sql\EntityUsageTable::getAffectedRowIds

2017-07-05 Thread jcrespo
jcrespo added a comment. I've set up a temporary watchdog on the s5 master:
pt-kill F=/dev/null --socket=/tmp/mysql.sock --print --kill --victims=all --match-info="EntityUsageTable" --match-db=wikidatawiki --match-user=wikiuser --busy-time=1
This will mitigate for now the clo

[Wikidata-bugs] [Maniphest] [Updated] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-07-06 Thread jcrespo
jcrespo added a comment. Probably related: T169884

[Wikidata-bugs] [Maniphest] [Commented On] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-07-07 Thread jcrespo
jcrespo added a comment. I also wonder why some of those log warnings come from close() and others have the proper commitMasterChanges() bit in the stack trace. Normally, there should be nothing to commit by close() and it is just commits for sanity. We were theorizing the other day on IRC that

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata database locked

2017-07-28 Thread jcrespo
jcrespo added a comment. Database crashed, it should be ok to edit now.

[Wikidata-bugs] [Maniphest] [Claimed] T171928: Wikidata database locked

2017-07-28 Thread jcrespo
jcrespo claimed this task. jcrespo added a comment. Investigation is not over; here is what we have found out so far about the causes: https://wikitech.wikimedia.org/wiki/Incident_documentation/20170728-s5_(WikiData_and_dewiki)_read-only

[Wikidata-bugs] [Maniphest] [Retitled] T171928: Wikidata and dewiki databases locked

2017-07-28 Thread jcrespo
jcrespo renamed this task from "Wikidata database locked" to "Wikidata and dewiki databases locked".

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-01 Thread jcrespo
jcrespo added a comment. I've almost finished the above incident documentation. However, I am unsure about which are the right actionables and their priorities (last section). Let's use this ticket to agree on what would be the best followup: a) making puppet change read-only state

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-01 Thread jcrespo
jcrespo added a comment. I have started working on more complete monitoring, useful if we go down the route of human monitoring rather than automation. Here is one example:
$ ./check_mariadb.py --icinga -h db1052.eqiad.wmnet --check_read_only=0
Version 10.0.28-MariaDB, Uptime 16295390s, read_only

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-01 Thread jcrespo
jcrespo added a comment. > Wikidata goes into read-only the subscriptions mentioned
Yes, definitely some extensions in the past did not behave perfectly and did not respect mediawiki's read-only mode- I do not know what the state of Wikidata is, but from what you say, a ticket should be filed s

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-03 Thread jcrespo
jcrespo added a comment.
$ check_mariadb.py -h db1052 --slave-status --primary-dc=eqiad
{"datetime": 1501777331.898183, "ssl_expiration": 1619276854.0, "connection": "ok", "connection_latency": 0.07626748085021973, "ssl": true, "to

[Wikidata-bugs] [Maniphest] [Updated] T171928: Wikidata and dewiki databases locked

2017-08-04 Thread jcrespo
jcrespo added a subtask: T172489: Monitor read_only variable and/or uptime on database masters, make it page.

[Wikidata-bugs] [Maniphest] [Updated] T171928: Wikidata and dewiki databases locked

2017-08-04 Thread jcrespo
jcrespo added a subtask: T172490: Monitor swap/memory usage on databases.

[Wikidata-bugs] [Maniphest] [Closed] T171928: Wikidata and dewiki databases locked

2017-08-04 Thread jcrespo
jcrespo closed this task as "Resolved".jcrespo added a comment. I have created all actionables on both the incident documentation ( https://wikitech.wikimedia.org/wiki/Incident_documentation/20170728-s5_(WikiData_and_dewiki)_read-only ) and phabricator- consequently, I have closed this

[Wikidata-bugs] [Maniphest] [Commented On] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-08-09 Thread jcrespo
jcrespo added a comment. I've been told that several thousand Title::invalidateCache UPDATEs per second had caused trouble on s7 overnight, not sure if this is related: https://grafana.wikimedia.org/dashboard/db/mysql-replication-lag?orgId=1&var-dc=eqiad%20prometheus%2Fops&
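
For reference, those invalidations boil down to touch-style updates on the page table, roughly of this shape (a sketch of the pattern, not the exact statement MediaWiki issues):

  UPDATE page
  SET page_touched = '20170810000000'        -- current timestamp
  WHERE page_id IN (1, 2, 3 /* affected page ids */);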

[Wikidata-bugs] [Maniphest] [Commented On] T164173: Cache invalidations coming from the JobQueue are causing lag on several wikis

2017-08-10 Thread jcrespo
jcrespo added a comment. To avoid the continuous lagging on non-directly pooled hosts (passive dc codfw, labs, other hosts replicating on a second tier), I have forced a slowdown of writes to go at the pace of the slowest slaves of eqiad with semisync replication, adding automatically a pause of up
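
The write slowdown described above leans on semi-synchronous replication; a minimal sketch of the knobs involved, assuming the semisync feature/plugin is available (the values are illustrative, not what was applied):

  -- on the master: commits wait for a replica acknowledgement before returning
  SET GLOBAL rpl_semi_sync_master_enabled = ON;
  SET GLOBAL rpl_semi_sync_master_timeout = 10000;   -- milliseconds before falling back to async
  -- on each replica taking part in the acknowledgement
  SET GLOBAL rpl_semi_sync_slave_enabled = ON;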

[Wikidata-bugs] [Maniphest] T48643: [Story] Dispatching via job queue (instead of cron script)

2021-10-01 Thread jcrespo
jcrespo added a comment. At 16:02-16:06, which would fit with the deployment of https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/725019/, the logstash baseline number of messages increased by around 50%: https://grafana.wikimedia.org/d/00561/logstash?orgId=1&viewPan

[Wikidata-bugs] [Maniphest] T186716: enable fine grained usage tracking on Commons

2021-10-05 Thread jcrespo
jcrespo added a subscriber: Kormat. jcrespo added a comment.
> the DBAs to approve this
That should be @Kormat and/or @Marostegui (he is on vacation right now).

[Wikidata-bugs] [Maniphest] T186716: enable fine grained usage tracking on Commons

2021-10-05 Thread jcrespo
jcrespo added a project: Data-Persistence (Consultation).

[Wikidata-bugs] [Maniphest] T294355: Several Wikidata Grafana boards missing data before October 2021

2021-11-24 Thread jcrespo
jcrespo added projects: bacula, Data-Persistence-Backup, Data-Persistence. jcrespo added a comment. The number of files is (within reason) a non-blocker for bacula, as files are packaged into volumes. It is true that each file is stored as a mysql record, but that should be able to scale until

[Wikidata-bugs] [Maniphest] T294355: Several Wikidata Grafana boards missing data before October 2021

2021-11-25 Thread jcrespo
jcrespo added a comment. One more question, to finally decide whether to set up weekly full backups or daily incremental ones- do all files mostly change completely, or only a subset of them? Incrementals can be done with file granularity only (it will fully back up files as long as its

[Wikidata-bugs] [Maniphest] T294355: Several Wikidata Grafana boards missing data before October 2021

2021-11-26 Thread jcrespo
jcrespo added a comment. I don't have the answer to that question, but whenever any of you have the servers and path(s), you can follow the instructions at https://wikitech.wikimedia.org/wiki/Bacula#Adding_a_new_client to send a preliminary backup proposal to Puppet, and I will assis

[Wikidata-bugs] [Maniphest] T294355: Several Wikidata Grafana boards missing data before October 2021

2021-12-10 Thread jcrespo
jcrespo added a comment. Let me give it a deeper look: while the patch by itself looks good as is, I want to check if a different (non-default) backup policy would be more advantageous in frequency and space. :-)

[Wikidata-bugs] [Maniphest] T294355: Several Wikidata Grafana boards missing data before October 2021

2021-12-13 Thread jcrespo
jcrespo added a comment. Running Jobs:
Console connected using TLS at 13-Dec-21 09:20
JobId   Type  Level  Files  Bytes    Name          Status
======
396417  Back  Full   4,568  412.9 M  graphite1004

[Wikidata-bugs] [Maniphest] T294355: Several Wikidata Grafana boards missing data before October 2021

2021-12-13 Thread jcrespo
jcrespo added a comment. Terminated Jobs:
JobId   Level  Files    Bytes    Status  Finished         Name
396417  Full   108,320  11.70 G  OK      13-Dec-21 09:34  graphite1004.eqiad.wmnet-Weekly-Mon

[Wikidata-bugs] [Maniphest] T307586: wbsearchentities produces no results on 1.39.0-wmf.10

2022-05-04 Thread jcrespo
jcrespo added a comment. In addition to the fix/rollback- could some integration test or heuristic production monitoring also be implemented, for faster detection in the future?

[Wikidata-bugs] [Maniphest] T307586: wbsearchentities produces no results on 1.39.0-wmf.10

2022-05-04 Thread jcrespo
jcrespo updated the task description. jcrespo added a project: User-notice.

[Wikidata-bugs] [Maniphest] T307586: wbsearchentities produces no results on 1.39.0-wmf.10

2022-05-04 Thread jcrespo
jcrespo added a comment. Thanks for such a quick reaction, BTW.

[Wikidata-bugs] [Maniphest] T307586: wbsearchentities produces no results on 1.39.0-wmf.10

2022-05-05 Thread jcrespo
jcrespo added a comment. In T307586#7908045 <https://phabricator.wikimedia.org/T307586#7908045>, @Quiddity wrote:
> For Tech News purposes, how should this entry be described? IIUC from the description, something like this?
>
>> There was a problem with

[Wikidata-bugs] [Maniphest] T323096: WDQS Data Reload

2023-01-11 Thread jcrespo
jcrespo added a comment. Sorry if it is the wrong ticket, but several services of wdqs2010, wdqs2011 and wdqs2012 are alerting. The service is returning 400 responses. My guess is this is due to this ongoing data reload (no issue). If that is the case, could the alerts "WDQS SPARQL"

[Wikidata-bugs] [Maniphest] T323096: WDQS Data Reload

2023-01-11 Thread jcrespo
jcrespo added a comment. I was told by @Gehel that it was unrelated to this, but related to T301167 <https://phabricator.wikimedia.org/T301167>. Sorry for the confusion.

[Wikidata-bugs] [Maniphest] T301167: Service implementation for wdqs20[09,10,11,12]

2023-01-11 Thread jcrespo
jcrespo added a comment. One tip to avoid having people on call (like me) worrying about pending implementation services is to add the hiera key `profile::monitoring::notifications_enabled: false`. This is not promoted much because most people handle stateless services that are easy and

[Wikidata-bugs] [Maniphest] T272571: Updating Wikidata's property suggester caused replica lag on all wikidata databases

2021-01-21 Thread jcrespo
jcrespo created this task. jcrespo added projects: Wikidata, MediaWiki-extensions-PropertySuggester, Wikimedia-production-error. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION We got an alert on `#wikimedia-databases` IRC saying: PROBLEM - MariaDB sustained

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2021-01-26 Thread jcrespo
jcrespo added a subscriber: Marostegui. jcrespo added a comment. Handing over to @Marostegui for him to comment, as he will be the person who knows if this continues happening or not.

[Wikidata-bugs] [Maniphest] T276762: Wikibase\Lib\Store\Sql\SiteLinkTable::getLinks can take over a minute to execute

2021-03-08 Thread jcrespo
jcrespo created this task. jcrespo added projects: Wikidata, Wikibase. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION This weekend, while an ongoing incident was being handled, I checked and saw several badly performing queries running. These didn't have (I believe

[Wikidata-bugs] [Maniphest] T276762: Wikibase\Lib\Store\Sql\SiteLinkTable::getLinks can take over a minute to execute

2021-03-08 Thread jcrespo
jcrespo updated the task description.

[Wikidata-bugs] [Maniphest] T276762: Wikibase\Lib\Store\Sql\SiteLinkTable::getLinks can take over a minute to execute

2021-03-18 Thread jcrespo
jcrespo added a comment. Off-topic- and feel free to PM me in private. What is a good way to report database-related issues to the wikidata development team? I am a bit intimidated by the amount of tags and dashboards (which will probably reflect your internal organization, but I am not too

[Wikidata-bugs] [Maniphest] T281480: Cannot access the database: Too many connections

2021-04-29 Thread jcrespo
jcrespo triaged this task as "Unbreak Now!" priority. jcrespo added a comment. This should be a blocker- es traffic has grown almost 100x since 14 April, which correlates strongly with the 19h deploy: F34434387: es_issue.png <https://phabricator.wikimedia.org/F34434387
