jcrespo added a comment.
An extra reason to avoid using a db master just occurred to me: master failover is a relatively frequent operation. It will happen every time the master MySQL is upgraded, or when there is a datacenter failover (2 of those will happen in April/May)- probably there wasn't
jcrespo added a comment.
Sorry about this- you are not the only ones "suffering" from beta not being a reliable place for testing in a truly distributed fashion- we were just discussing this on IRC. I also support a test on test, and offer my help if I can provide it. Thanks again f
jcrespo created this task. jcrespo added projects: Wikidata, DBA. Herald added a subscriber: Aklapper.
TASK DESCRIPTION: For example, I found on db1070 2 long-running queries:
Server Connection User Client Database Time
db1070 51133099 wikiuser mw1256 wikidatawiki 19h
SELECT /* Wikibase\Repo
jcrespo added a comment.
Thank you very much for working on this- do you have an estimation on when this will be fully deployed?
TASK DETAIL: https://phabricator.wikimedia.org/T160887
jcrespo added a comment.
Thank you again!
jcrespo added a comment.
I would ask Addshore to confirm by running SELECT on the empty tables from terbium/tin, etc, using mediawiki scripts.
jcrespo added a comment.
I did not understand your last comment- is the previous patch invalid? Do you have another patch to show me?
jcrespo added a comment.
To clarify, I have to be especially strict in this particular case because in the past, wbc_entity_usage (with the exception of the linksupdate job) was a large point of contention and a major cause of lag, and this ticket starts by saying: we'd write a lot (?) more rows
jcrespo added a comment.
As a small side note- that can also happen on MySQL. Despite locks being released on session disconnection, there have been some occasions where the MySQL session is not killed (it continues), but the thread on MediaWiki has been. There are several known bugs about
jcrespo added a comment.
We want to collect additional information on one of these wikis for a while
If that doesn't involve a schema change, sure.
jcrespo created this task. jcrespo added a project: Wikidata.
TASK DESCRIPTION: These 2 queries were among the most expensive during the datacenter failover or after it- while it is normal to have lower performance than usual due to colder caches, most likely they are surfacing issues with improvements
jcrespo edited the task description. (Show Details)
EDIT DETAILS: These 2 queries were among the most expensive during the datacenter failover or after it- while it is normal to have lower performance than usual due to colder caches, most likely they are surfacing existing issues with improvements, which
jcrespo added a comment.
I was thinking of https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_eq_range_index_dive_limit but that is probably more wishful thinking than practical given we are on MariaDB 10.
jcrespo added subscribers: ArielGlenn, Zppix. jcrespo merged a task: T139636: Wikidata Database contention under high edit rate.
jcrespo closed this task as a duplicate of T111535: Wikibase\Repo\Store\SQL\EntityPerPageTable::{closure} creating high number of deadlocks.
jcrespo added a comment.
See merged ticket, this happened again when 300-400 new pages per minute were being created, with 35 parallel threads.
jcrespo added a comment.
Depending on the answer to this, we will plan further steps.
I think you should add the full plan here ASAP, even if it is not 100% clear or decided, otherwise we may keep adding steps to the process and make it unnecessarily long. E.g. if you plan to add an index later
jcrespo added a comment.
Evaluate if it is feasible to add such an "empty" column without making Wikidata readonly.
we can probably do it
How certain are you? In my experience, the biggest blocker in production is not the size, but how busy the table is. That would create metadata lock
jcrespo added a comment.
@Marostegui I think they do not want it done yet, but an ok/review from us. But they should probably clarify that. "Feasibility" is an ambiguous term.
jcrespo added a comment.
If we depool the slaves we should be fine, shouldn't we? And if we use the DC switchover to alter the masters we'd also get rid of that issue?
Hey, don't tell me, tell @WMDE-leszek, and see if he is ok with that schedule. :-)
jcrespo added a comment.
Ok, now I have some comments against that method. Logistically, I am in a meeting; let me finish it and I will have some time to properly explain myself (nothing against the spirit of the changes, I would just do it in a different way, if the code can handle it).
jcrespo added a comment.
So the comments:
do not defer the creation of the indexes- those are extra alter tables and do not make things easier in any way- just create the indexes from the start, assuming they will be used.
Renaming columns is a big no- especially to an already existing name
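The advice above can be sketched roughly as follows; the table and column names are hypothetical, for illustration only:

```sql
-- Create the table with its indexes from the start, instead of
-- deferring them to later ALTER TABLEs (each ALTER is another
-- full pass over the table on a busy production host):
CREATE TABLE wb_example (
  ex_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  ex_entity_id VARBINARY(32) NOT NULL,
  ex_value VARBINARY(255) DEFAULT NULL,
  PRIMARY KEY (ex_id),
  KEY ex_entity_id (ex_entity_id)
) ENGINE=InnoDB;

-- What to avoid: deferred index creation and column renames,
-- especially renaming to a name that already existed before.
-- ALTER TABLE wb_example ADD KEY ex_value (ex_value);
-- ALTER TABLE wb_example CHANGE ex_value ex_old_name VARBINARY(255);
```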
jcrespo added a comment.
@Addshore I hope labs access is not a blocker for this, that can be done at a later date.
jcrespo added a comment.
production hosts
I was thinking of dbstore (backup) hosts, which were problematic (remember you were the ones who set those up last time) + private table filtering. x1 has traditionally not been replicated to labs; this can be challenging (I would start by not replicating
jcrespo added a comment.
So that we are not a blocker- creation of tables in production, especially if we have already given the OK to the plans, is not considered a schema change, so anyone with production rights can do it- you just need to mark it on the deployments calendar. With this we (DBAs
jcrespo added a comment.
This sentence actually confused me
x1 == extension1 :-)
jcrespo added a comment.
To clarify- it may be blocked on us right now to create the database and because labs filtering is not well managed, but the general idea stays for normal table creations.
jcrespo added a comment.
$ check_mariadb.py -h db1052 --slave-status --primary-dc=eqiad
{"datetime": 1501777331.898183, "ssl_expiration": 1619276854.0, "connection": "ok", "connection_latency": 0.07626748085021973, "ssl": true, "to
jcrespo added a subtask: T172490: Monitor swap/memory usage on databases.
jcrespo added a subtask: T172489: Monitor read_only variable and/or uptime on database masters, make it page.
jcrespo added a comment.
I've been told that several thousand UPDATE Title::invalidateCache statements per second had caused trouble on s7 over the night, not sure if this is related:
https://grafana.wikimedia.org/dashboard/db/mysql-replication-lag?orgId=1=eqiad%20prometheus%2Fops=7=1502261251242
jcrespo added a comment.
To avoid the continuous lagging on non-directly pooled hosts (passive dc codfw, labs, other hosts replicating on a second tier), I have forced a slowdown of writes to go at the pace of the slowest slaves of eqiad with semisync replication, automatically adding a pause of up
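A minimal sketch of the kind of semisync setup described; the variable and plugin names are the standard MariaDB/MySQL semisync ones, but the exact timeout value is a hypothetical placeholder, not the production setting:

```sql
-- On the master: enable semisync so each commit waits (up to a
-- timeout) for at least one slave to acknowledge the transaction.
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 100;  -- ms; illustrative value

-- On each slave that should pace the master:
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
```

The effect is that write throughput on the master is capped at what the acknowledging slaves can sustain, which is what introduces the pause mentioned above.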
jcrespo added subscribers: Manuel, jcrespo. jcrespo added a comment.
Yes, a dedicated database service for this could make sense to separate writes if tracking changes starts taking most of the database writes. I suggested this as a means to offload some of the load if it starts to become a problem
jcrespo added a comment.
Database crashed, it should be ok to edit now.
jcrespo claimed this task. jcrespo added a comment.
Investigation is not over; here is what we have found out so far about the causes:
https://wikitech.wikimedia.org/wiki/Incident_documentation/20170728-s5_(WikiData_and_dewiki)_read-only
jcrespo renamed this task from "Wikidata database locked" to "Wikidata and dewiki databases locked".
jcrespo created this task. jcrespo added projects: Wikidata, Performance, DBA. Herald added a subscriber: Aklapper.
TASK DESCRIPTION: The following query was detected running on a master database:
Host User Schema Client Source Thread Transaction Runtime Stamp
db1063 wikiuser wikidatawiki
jcrespo added a comment.
I've setup a temporary watchdog on the s5 master:
pt-kill F=/dev/null --socket=/tmp/mysql.sock --print --kill --victims=all --match-info="EntityUsageTable" --match-db=wikidatawiki --match-user=wikiuser --busy-time=1
This will mitigate for now the close-to
jcrespo added a comment.
This is still ongoing with 15-minute queries. I am going to set up a task to kill all related queries on the s5 master to prevent a potential outage of dewiki and wikidata writes.
jcrespo added a comment.
Probably related: T169884
jcrespo added a comment.
I also wonder why some of those log warnings come from close() and others have the proper commitMasterChanges() bit in the stack trace. Normally, there should be nothing to commit by close(); it just commits for sanity.
We were theorizing the other day on IRC
jcrespo edited projects, added MediaWiki-Database; removed DBA. jcrespo added a comment.
That is the max lag, and it is normal on the slaves that are not waited on by MediaWiki. This issue has nothing to do with databases; MediaWiki does what it is programmed to do: if it detects lag, even if a few
jcrespo added a comment.
I may have posted this comment on the wrong ticket: T163551#3221748
jcrespo added a comment.
This is a bit off-topic for T163551, but with the latest schema changes, wb_terms has become the largest table on a wiki (with the exception of revision on enwiki and image on commons)- and I think it will get bigger once the new column (I assume) gets populated with actual
jcrespo added a comment.
I have started working on more complete monitoring, useful if we go down the route of human monitoring rather than automation; here is one example:
$ ./check_mariadb.py --icinga -h db1052.eqiad.wmnet --check_read_only=0
Version 10.0.28-MariaDB, Uptime 16295390s, read_only
jcrespo added a comment.
I've almost finished the above incident documentation. However, I am unsure about which are the right actionables and their priorities (last section).
Let's use this ticket to agree on what would be the best follow-up: a) making puppet change the read-only state of the db
jcrespo added a comment.
Wikidata goes into read-only the subscriptions mentioned
Yes, definitely some extensions in the past did not behave perfectly and did not respect MediaWiki's read-only mode- I do not know what the state of Wikidata is, but from what you say, a ticket should be filed so its
jcrespo closed this task as "Resolved". jcrespo added a comment.
I have created all actionables on both the incident documentation ( https://wikitech.wikimedia.org/wiki/Incident_documentation/20170728-s5_(WikiData_and_dewiki)_read-only ) and phabricator- consequently, I have closed t
jcrespo added a comment.
@thcipriani @aaron I know something was done yesterday (thank you!), may I ask for an update on the state, to know if the UBN is still active?
jcrespo raised the priority of this task from "High" to "Unbreak Now!". Herald added subscribers: Liuxinyu970226, Jay8g, TerraCodes.
jcrespo added a comment.
Setting as unbreak now because this is preventing collaborators from editing articles, which not only slows down volunteers, but also causes large frustration among them: https://grafana.wikimedia.org/dashboard/db/wikidata-edits?refresh=1m=1=1=1502980683784
jcrespo updated the task description. (Show Details)
CHANGES TO TASK DESCRIPTION: **Edit**: This used to be an occasional issue at some point; now it is causing generalized edit slowdowns and the inability to read recentchanges/watchlists on several wikis.
The most impacted ones are s2 and s7, but looks
jcrespo added a comment.
I think this is just UPDATE /* Title::invalidateCache */ based on the binlogs, not the above script.
jcrespo added a comment.
Which most probably means a direct cause of T164173, getting worse?
jcrespo raised the priority of this task from "Normal" to "Unbreak Now!". jcrespo added a comment.
The execution of:
www-data 20284 0.0 0.1 328216 60744 ?S15:30 0:00 php5 /srv/mediawiki-staging/multiversion/MWScript.php extensions/Wikidata/extensions/Wikib
jcrespo added a comment.
@aude: don't run update.php on s3 for altering a table- you will create lag on 900 wikis unless connections are cleared and the table is pre-warmed-up (and appropriately tested).
Also, the only wiki mentioned here was wikidatawiki, you have to create a separate request
jcrespo added a comment.
So do you think this had something to do with reports like T123867 and T164191? This is highly surprising- I was expecting low to no master or replication performance impact, but zero is highly suspicious. Was this expected? Couldn't this be related to a bug in monitoring due
jcrespo added a comment.
With the current state, we still have the same amount of connections to the master DBs, but we don't use GET_LOCK etc. on them anymore.
And that alone is a huge win for me.
jcrespo added a comment.
The problem usually is not the alter size, but the metadata locking, which creates way more contention.
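A sketch of how that metadata-lock contention plays out, assuming a hypothetical busy table:

```sql
-- Session 1: a long-running query holds a shared metadata lock
-- on the table for its whole duration.
SELECT COUNT(*) FROM busy_table;   -- still running

-- Session 2: the ALTER queues behind it, waiting for an
-- exclusive metadata lock.
ALTER TABLE busy_table ADD COLUMN new_col INT DEFAULT NULL;

-- Session 3 (and every other client): even trivial statements now
-- queue behind the waiting ALTER, showing
-- "Waiting for table metadata lock" in SHOW PROCESSLIST.
SELECT * FROM busy_table LIMIT 1;
```

So on a busy enough table, a small and fast ALTER can still stall all traffic, which is why how busy the table is matters more than its size.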
jcrespo added a comment.
While contention is bad in general- it is the opposite of lag- more contention would create less lag - of course it could be a common source: large updates causing contention on the master, and then lag because the transaction size is large.
I wonder if instead of fixing
jcrespo added a comment.
To clarify- reads on a slave are not a big concern for MySQL- of course, if you end up getting better latency, that is cool (and I normally ping because it means there is an inefficiency that could be solved); but reads are easy to scale in the grand scheme of things ("
jcrespo added a comment.
@hoo Regarding the Wikimedia setup, you must know that it is our priority right now to move wikidata to a dedicated server group; which means that from the ops side no other structural change can happen at the same time.
This is still needed (and doing it is blocked on code being
jcrespo added a comment.
@Catrope Unfortunately it will take quite a bit of time to research and investigate the index, and I'm not able to attend to this right now due to several other infrastructure fires currently going on. I already commented on some possibilities and I am open to other
jcrespo added a comment.
Cool- if you can, get some `SHOW TABLE STATUS` stats, to get the "before" state in bytes (even if it had started some time ago).
jcrespo added a comment.
We identify and delete duplicate rows (not trivial, but not difficult either), then we add a UNIQUE restriction over that combination of columns so that it never happens again.
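A sketch of that two-step approach; the table and column names are placeholders, and a production run would delete in small batches rather than in one statement:

```sql
-- 1. Delete duplicate rows, keeping the row with the lowest id
--    for each (col_a, col_b) combination.
DELETE t1
FROM example_table t1
JOIN example_table t2
  ON  t1.col_a = t2.col_a
  AND t1.col_b = t2.col_b
  AND t1.id > t2.id;

-- 2. Add a UNIQUE restriction so duplicates cannot come back.
ALTER TABLE example_table
  ADD UNIQUE KEY uniq_col_a_col_b (col_a, col_b);
```

Step 2 will fail if step 1 missed any duplicates, which doubles as a cheap verification of the cleanup.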
jcrespo added a comment.
Not sure if with "you", you mean me, but if it is safe, yes. We may have to defragment the table later to reclaim disk space, but that can be done later and it is not a blocker.
jcrespo added a comment.
I know- it is only related because the wikidata migration requires replication channel movement and that consumes DBA time, not because it contains wikidata.
jcrespo added a comment.
I am a bit lost with the estimation- is it realistic, is the number of usages more or less right, or is it going to be a quadratic expression because we do not know accurately the number of statement usages per statement? If the estimations
jcrespo removed a project: Blocked-on-schema-change. jcrespo moved this task from Backlog to Blocked external/Not db team on the DBA board. jcrespo added a comment.
Add #blocked-on-schema-change, that way we will know it is ready :-)
jcrespo created this task. jcrespo added projects: MediaWiki-Watchlist, DBA, Wikidata. Herald added subscribers: Liuxinyu970226, Jay8g, TerraCodes, Aklapper.
TASK DESCRIPTION: Once rows stop coming in (see parent task), this will likely either solve or mitigate the performance issues.
The rows
jcrespo created subtask T12: Purge 90% of rows from recentchanges (and possibly defragment) from commonswiki and ruwiki (the ones with source:wikidata).
jcrespo added a comment.
I forgot to say that we suspect the same thing happens on other s3 hosts, but these 2 previous wikis create so many errors that it is difficult to tell until we fix these 2.
jcrespo added a comment.
What you write is ok, but IF you want our opinion, can you translate that into an increase of row storage and inserts/other write activity compared to the full table size/previous state?
jcrespo added a comment.
I have the feeling that these numbers could be meaningless on such small wikis, given the issues on recentchanges with only some large wikis such as commons and ruwiki. Could the same issue happen to those with a x10 growth? I know the case is different, pages * edits, vs
jcrespo added a subscriber: brion. jcrespo added a comment.
you are essentially proposing to get rid of the recentchanges table altogether
No, I was asking to keep recentchanges as it was before, with the previous load/meaning and implement "new features" on separate tables, or on separa
jcrespo added a comment.
One last thought- it is early to say, but commonswiki seems to have stopped having insert spikes of 100x the normal rate: https://grafana.wikimedia.org/dashboard/db/mysql?panelId=2=1=eqiad%20prometheus%2Fops=db1068=9104=now-7d=now (click "inserts"). Somethi
jcrespo added a comment.
I have allowed codfw to lag- so that we can go at around 500 deletes/s. That means the whole thing will take less than 3 hours. Shout if anyone sees any strangeness on commonswiki (you shouldn't)- worst case scenario, kill the pt-archiver job on the screen session.
jcrespo added a comment.
51.5M you meant, maybe?
jcrespo added a comment.
What I mean is that the numbers are ok to proceed (not a big deal), but I am still worried about the large wikis. I know you do not have all the answers, I was just thinking aloud.
jcrespo added a comment.
I am leaving a screen open on dbstore1002 loading a copy of recentchanges from commonswiki. Because of the size, it will take some time to be copied. Tomorrow I will test purging the table with something such as:
pt-archiver --source h=dbstore1002.eqiad.wmnet,D
jcrespo added a comment.
That means the whole thing will take less than 3 hours
I had a mind slip... we have to delete 60M rows, not 6M; that means 30 hours, not 3. I ran this for 6 hours, and 10M rows were deleted. We will continue after the s4 maintenance tomorrow: T168661
jcrespo closed subtask T171027: "Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several wikis as "Resolved".
jcrespo lowered the priority of this task from "Unbreak Now!" to "Normal". jcrespo added a comment.
Lowering priority unless we get more reports from other ruwiki or commons users.
jcrespo removed jcrespo as the assignee of this task.
jcrespo closed subtask T12: Purge 90% of rows from recentchanges (and possibly defragment) from commonswiki and ruwiki (the ones with source:wikidata) as "Resolved".
jcrespo closed this task as "Resolved". jcrespo added a comment.
There are more things pending, like running optimize table on the non-rc replicas of eqiad or all of codfw, and checking other wikis different from ruwiki and commons, but the initial scope (emergency) has been fixed.
jcrespo added a comment.
The initial scope- query issues on ruwiki and commonswiki- has, I think, been successfully patched. These are the statistics I have now:
F10184141: Screenshot_20171013_165442.png
I would ask the affected users to confirm the issues are gone- probably not 100% of them
jcrespo added a comment.
ruwiki results are more extreme:
root@db2076[ruwiki]> SELECT count(*) FROM recentchanges;
+----------+
| count(*) |
+----------+
| 37427748 |
+----------+
1 row in set (11.19 sec)
root@db2076[ruwiki]> SELECT COUNT(*) FROM recentchanges WHERE rc_source
jcrespo added a comment.
Small correction/clarification on "this was found to generate too much load": as an ops person, I interpret load as throughput/backlog work. The insertion [load] itself was not the problem (the spikes of inserts were too large, but something that could be smoothed); t
jcrespo added a comment.
@Anomie I know it is on your mind; it was a wink that, for any estimation we now give on some schema changes, I firmly believe we can cut by 10 the time it takes to deploy them if that was already there. Also an explanation of why they take so much time right now.
jcrespo added a comment.
One additional comment is that if you kind of agree, in some cases, to do the "new table and copy" pattern rather than an alter, it is much easier to add or transform columns later, so there is no need to optimize in advance. NULLs in COMPACT or above row formats take
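The "new table and copy" pattern mentioned above, roughly; the table and column names are placeholders, and in production the copy would be batched and concurrent writes handled separately (e.g. via triggers or a tool like pt-online-schema-change):

```sql
-- 1. Create the new table with the desired structure.
CREATE TABLE mytable_new LIKE mytable;
ALTER TABLE mytable_new ADD COLUMN extra_col INT DEFAULT NULL;

-- 2. Copy the rows (in reality, in id-range batches to keep
--    transaction sizes small and limit replication lag).
INSERT INTO mytable_new
SELECT *, NULL FROM mytable;

-- 3. Swap the tables atomically, keeping the old one as a backup.
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;
```

Compared with a long in-place ALTER, this keeps the exclusive metadata lock down to the instant of the RENAME.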
jcrespo added a comment.
root@db2076[ruwiki]> SELECT COUNT(*) FROM recentchanges WHERE rc_source = 'wb';
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (1.19 sec)
jcrespo added a comment.
This is the result for the query error rate after ruwiki was purged:
F10158519: Screenshot_20171012_130744.png
Please note that these are preliminary results, and that the tables have yet to be optimized/analyzed, but I think this confirms we are on the right track.
Commons
jcrespo added a comment.
53156406 rows purged on commons so far of the initial 58M estimation (it will probably be less because of the regular rc purge by timestamp).
jcrespo added a comment.
Of course, that doesn't apply to cases that are limited by a common resource (e.g. database).
If I could add to the ideal scenario, the jobqueue would have dedicated slaves AND would write with a different domain id (allowing parallelism) than the rest of the writes so we
jcrespo added a comment.
Could, at least, that part have something to do with T164173, as a problem from the same cause, or a consequence of the fix? I also remember some tuning of some wikidata crons or job sizes, but not sure if upwards or downwards and not sure if related. Aaron, Ladsgroup
jcrespo added a comment.
This is my view of the issue, based on the comments above.
Short term:
If the number of watchlist items for the user is less than N (N to be determined), join watchlist -> recentchanges, then "suffer" a small in-memory sort (I believe this is the current si
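The short-term idea could look roughly like this; the column names follow the MediaWiki watchlist/recentchanges schema, but the threshold N, the user id, and the exact query shape are illustrative assumptions:

```sql
-- Only take this path when the user's watchlist is small enough:
SELECT COUNT(*) FROM watchlist WHERE wl_user = 12345;  -- compare to N

-- Join watchlist -> recentchanges and accept a small in-memory sort.
SELECT rc.*
FROM watchlist wl
JOIN recentchanges rc
  ON  rc.rc_namespace = wl.wl_namespace
  AND rc.rc_title = wl.wl_title
WHERE wl.wl_user = 12345
ORDER BY rc.rc_timestamp DESC
LIMIT 50;
```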
jcrespo added a comment.
After testing some indexes, I do not see a huge improvement- we can reduce from scanning 100M rows to 18M, but there can always be a combination of query parameters that does not filter many rows on recentchanges. Paging by id (or timestamp) is the only reliable solution
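Paging by id could be sketched like this (a hypothetical keyset-pagination loop, not the actual MediaWiki code; column names follow the recentchanges schema):

```sql
-- First page: newest rows first.
SELECT rc_id, rc_timestamp, rc_title
FROM recentchanges
ORDER BY rc_id DESC
LIMIT 50;

-- Subsequent pages: continue from the last rc_id seen, so each
-- query walks the primary key and stops after LIMIT rows instead
-- of scanning an unbounded range.
SELECT rc_id, rc_timestamp, rc_title
FROM recentchanges
WHERE rc_id < 123456  -- last rc_id from the previous page
ORDER BY rc_id DESC
LIMIT 50;
```

Unlike filter-based indexes, the cost per page here is bounded regardless of which query parameters the user picks.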
jcrespo added a comment.
Using STRAIGHT_JOIN and making sure an index is used more efficiently could help too (here I changed the original query to order by rc_timestamp, and the explain is better, but the query is actually slower P5988#33002).
jcrespo added a comment.
I am testing with a new index on dbstore1002; meanwhile I had a chat with Bawolff and he mentioned that rc used to be a small table where many indexes and inefficient scanning were possible because it was a much smaller summary of revision. Apparently with the latest