jcrespo added a comment.
@Aklapper Probably, but I would close that one, as that should not be happening right now, unless you have reports saying it is again.
TASK DETAIL: https://phabricator.wikimedia.org/T173710
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-11-06T09:37:49Z] <_joe_> manually running htmlCacheUpdate for commonswiki and ruwiki on terbium, T173710
gerritbot added a comment.
Change 389427 merged by Giuseppe Lavagetto:
[operations/puppet@production] jobrunner: make refreshlinks jobs low-priority
https://gerrit.wikimedia.org/r/389427
gerritbot added a comment.
Change 389427 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] jobrunner: make refreshlinks jobs low-priority
https://gerrit.wikimedia.org/r/389427
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-11-03T10:39:07Z] Synchronized wmf-config/CommonSettings.php: Increase concurrency of htmlCacheUpdate jobs T173710 (duration: 00m 48s)
gerritbot added a comment.
Change 388416 merged by Giuseppe Lavagetto:
[operations/mediawiki-config@master] Increase concurrency for htmlCacheUpdate
https://gerrit.wikimedia.org/r/388416
gerritbot added a comment.
Change 388416 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/mediawiki-config@master] Increase concurrency for htmlCacheUpdate
https://gerrit.wikimedia.org/r/388416
mobrovac added a comment.
In T173710#3730359, @elukey wrote:
https://gerrit.wikimedia.org/r/#/c/385248 should be already working for commons, but from mwlog1001's runJob.log I can only see stuff like causeAction=unknown causeAgent=unknown (that probably only confirms that no authenticated
elukey added a comment.
Status:
elukey@terbium:~$ mwscript extensions/WikimediaMaintenance/getJobQueueLengths.php |sort -n -k2 | tail -n 20
euwiki 237
tgwiki 3759
cawiki 4822
enwiktionary 17148
zhwiki 19958
nowiki 21167
wikidatawiki 28257
bewiki 110296
arwiki 132139
ukwiki 132246
dewiki 155322
EBernhardson added a comment.
It was perhaps noted before, but because of the recursive nature of the refreshLinks and htmlCacheUpdate jobs, even if the backlog is being processed it may not look like it, because the jobs are just enqueuing new jobs. It will probably take some time to really know what
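This recursive fan-out can be sketched as a toy model (hypothetical names and batch size, not MediaWiki's actual backlink-job code): popping one over-sized job pushes two new ones, so the visible queue length barely moves even while leaf work completes.

```python
# Toy model of a recursive backlink job: an over-large range splits and
# re-enqueues itself instead of doing work directly. BATCH is illustrative.
BATCH = 300  # leaf batch size

def run_recursive_job(backlink_ids, enqueue):
    """Process a range; if it is too large, split it and re-enqueue halves."""
    if len(backlink_ids) <= BATCH:
        return len(backlink_ids)  # leaf job: purge/refresh these pages
    mid = len(backlink_ids) // 2
    enqueue(backlink_ids[:mid])   # each pop of one big job...
    enqueue(backlink_ids[mid:])   # ...pushes two new jobs
    return 0

queue = [list(range(2400))]
processed = 0
while queue:
    job = queue.pop()
    processed += run_recursive_job(job, queue.append)
print(processed)  # 2400: all items done, yet the queue never looked shorter
```

The queue holds up to several jobs at once the whole time, which is why raw queue length is a poor progress metric for these job types.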
elukey added a comment.
In T173710#3720358, @EBernhardson wrote:
All jobs have a requestId parameter, which is passed down through the execution chain. This is the same as the reqId field in logstash. Basically this means if the originating request logged anything to logstash, you should be able
EBernhardson added a comment.
All jobs have a requestId parameter, which is passed down through the execution chain. This is the same as the reqId field in logstash. Basically this means if the originating request logged anything to logstash, you should be able to find it with the query
elukey added a comment.
We had some relief after the last change to the jobrunner configs, namely the queue started shrinking, but then we fell back into the bad pattern of constantly enqueuing more jobs than we complete:
F10519970: Screen Shot 2017-10-30 at 6.19.11 PM.png
I am
Ladsgroup added a comment.
In T173710#3718725, @Jack_who_built_the_house wrote:
Thanks for the reply. It just surprises me that on enwiki, the job queue is very lightweight, while on ruwiki, it's 2/3 of the overall pages count, and enwiki is much more active. Is it because of wide use of Wikidata
Jack_who_built_the_house added a comment.
Thanks for the reply. It just surprises me that on enwiki, the job queue is very lightweight, while on ruwiki, it's 2/3 of the overall pages count, and enwiki is much more active. Is it because of wide use of Wikidata in ruwiki?
elukey added a comment.
In T173710#3717940, @Jack_who_built_the_house wrote:
On ruwiki, many editors are complaining about slow updating of pages with their templates. We have a huge job queue, and it keeps growing day by day, while no top-used templates/modules have been changed in the last
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-10-30T08:42:42Z] raised priority of refreshlink and htmlcacheupdate job execution on jobrunners (https://gerrit.wikimedia.org/r/#/c/386636/) - T173710
gerritbot added a comment.
Change 386636 merged by Elukey:
[operations/puppet@production] role::mediawiki::jobrunner: inc runners for refreshLinks/htmlCacheUpdate
https://gerrit.wikimedia.org/r/386636
Jack_who_built_the_house added a comment.
On ruwiki, many editors are complaining about slow updating of pages with their templates. We have a huge job queue, while no popular templates/modules have been changed in recent days.
Please tell us: is there any advice that could be given to us, as well
gerritbot added a comment.
Change 386636 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::mediawiki::jobrunner: raise temporarily runners for refreshLinks/htmlCacheUpdate
https://gerrit.wikimedia.org/r/386636
elukey added a comment.
Updated status:
elukey@terbium:~$ /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/group1.dblist showJobs.php --group | awk '{if ($3 > 1) print $_}'
cawiki: refreshLinks: 13566 queued; 6 claimed (6 active, 0 abandoned); 0 delayed
commonswiki: refreshLinks:
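The awk in the command above keeps only lines whose third whitespace-separated field (the queued count) exceeds the threshold. The same filter, sketched in Python over illustrative showJobs --group output lines:

```python
# Re-implementation of the `awk '{if ($3 > 1) print $_}'` filter from the
# command above: field 3 of showJobs --group output is the queued count.
def busy_queues(lines, threshold=1):
    """Keep 'wiki: jobType: N queued; ...' lines where N > threshold."""
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) > 2 and fields[2].isdigit() and int(fields[2]) > threshold:
            out.append(line)
    return out

sample = [  # sample lines modeled on the output quoted in this thread
    "cawiki: refreshLinks: 13566 queued; 6 claimed (6 active, 0 abandoned); 0 delayed",
    "euwiki: refreshLinks: 1 queued; 0 claimed (0 active, 0 abandoned); 0 delayed",
]
print(busy_queues(sample))  # only the cawiki line survives the filter
```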
Jack_who_built_the_house added a comment.
In T173710#3701806, @Ladsgroup wrote:
I think one of the reasons contributing to the problem is the same problem we had with T171027: "Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several
EBernhardson added a comment.
I think we might be able to add some capacity to processing those jobs on monday, but we probably have either to re-think the approach to the problem or throw more hardware at it.
I'm not sure if we need more hardware, or just more effective use of the current
Joe added a comment.
oblivian@terbium:~$ /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/group1.dblist showJobs.php --group | awk '{if ($3 > 1) print $_}'
cawiki: refreshLinks: 104355 queued; 3 claimed (3 active, 0 abandoned); 0 delayed
commonswiki: refreshLinks: 2073193 queued; 44
Ladsgroup added a comment.
The jobqueue size just bumped to 12M in two days and it's not going down. I don't know if it's related to wikidata or not but that's something people need to look into.
EBernhardson added a comment.
In T173710#3625333, @Daimona wrote:
Today on it.wiki I noticed a massive increase in search results for some queries related to errors that I'm currently trying to fix. This search:
Daimona added a comment.
Today on it.wiki I noticed a massive increase in search results for some queries related to errors that I'm currently trying to fix. This search: https://it.wikipedia.org/w/index.php?search=insource%3A%2F%27%27parlate+prego%27%27%5C%3C%5C%2F%2F=Speciale:Ricerca
now has
gerritbot added a comment.
Change 378719 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Refactor possibly fragile ChangeHandler/WikiPageUpdater hash calculations
https://gerrit.wikimedia.org/r/378719
gerritbot added a comment.
Change 377046 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Allow batch sizes for different jobs to be defined separately.
https://gerrit.wikimedia.org/r/377046
Joe added a comment.
FWIW we're seeing another almost-uncontrollable growth of jobs on commons and probably other wikis. I might decide to raise the concurrency of those jobs.
gerritbot added a comment.
Change 375819 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Pass root job params through WikiPageUpdater
https://gerrit.wikimedia.org/r/375819
gerritbot added a comment.
Change 378719 had a related patch set uploaded (by Thiemo Mättig (WMDE); owner: Thiemo Mättig (WMDE)):
[mediawiki/extensions/Wikibase@master] Refactor possibly fragile ChangeHandler/WikiPageUpdater hash calculations
https://gerrit.wikimedia.org/r/378719
gerritbot added a comment.
Change 377811 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/extensions/Wikibase@master] Split page set before constructing InjectRCRecordsJob
https://gerrit.wikimedia.org/r/377811
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-09-12T13:12:13Z] Synchronized wmf-config/Wikibase-production.php: Reduce wikiPageUpdaterDbBatchSize to 20 - T173710 (duration: 00m 45s)
gerritbot added a comment.
Change 377458 merged by jenkins-bot:
[operations/mediawiki-config@master] Reduce wikiPageUpdaterDbBatchSize to 20
https://gerrit.wikimedia.org/r/377458
gerritbot added a comment.
Change 377458 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/mediawiki-config@master] Reduce wikiPageUpdaterDbBatchSize to 20
https://gerrit.wikimedia.org/r/377458
gerritbot added a comment.
Change 376562 merged by jenkins-bot:
[operations/mediawiki-config@master] Reduce wikiPageUpdaterDbBatchSize to 20
https://gerrit.wikimedia.org/r/376562
gerritbot added a comment.
Change 377046 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/extensions/Wikibase@master] Allow batch sizes for different jobs to be defined separately.
https://gerrit.wikimedia.org/r/377046
gerritbot added a comment.
Change 376562 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/mediawiki-config@master] Reduce wikiPageUpdaterDbBatchSize to 20
https://gerrit.wikimedia.org/r/376562
Ladsgroup added a comment.
I made the batch smaller, from 100 to 50, and I can take it to 20. Let me make a patch.
mobrovac added a comment.
In T173710#3588015, @Joe wrote:
Wikibase refreshlinks jobs might benefit from being in smaller batches
+1 on this. As we have now all jobs being emitted to EventBus as well, we have had Kafka reject a portion of the jobs because they were larger than 4MB each. Upon
Joe added a comment.
I did some more number crunching on the instances of runJob.php I'm running on terbium, and found the following:
Wikibase refreshlinks jobs might benefit from being in smaller batches, as many of those are taking a long time to execute. Out of 33.4k wikibase jobs, we had the
Joe added a comment.
In T173710#3584505, @Krinkle wrote:
In T173710#3583445, @Joe wrote:
As a side comment: this is one of the cases where I would've loved to have an elastic environment to run MediaWiki-related applications: I could've spun up 10 instances of jobrunner dedicated to refreshlinks
jcrespo added a comment.
Of course, that doesn't apply to cases that are limited by a common resource (e.g. database).
If I could add to the ideal scenario, the jobqueue would have dedicated slaves AND would write with a different domain id (allowing parallelism) than the rest of the writes so we
Krinkle added a comment.
In T173710#3583445, @Joe wrote:
As a side comment: this is one of the cases where I would've loved to have an elastic environment to run MediaWiki-related applications: I could've spun up 10 instances of jobrunner dedicated to refreshlinks (or, ideally, the system could
Joe added a comment.
In T173710#3581849, @aaron wrote:
Those refreshLinks jobs (from wikibase) are the only ones that use multiple titles per job, so they will be a lot slower (seems to be 50 pages/job) than the regular ones from MediaWiki core. That is a bit on the slow side for a run time of a
aaron added a comment.
Those refreshLinks jobs (from wikibase) are the only ones that use multiple titles per job, so they will be a lot slower (seems to be 50 pages/job) than the regular ones from MediaWiki core. That is a bit on the slow side for a run time of a non-rare job type (e.g. TMH or
gerritbot added a comment.
Change 375819 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/extensions/Wikibase@master] Pass root job params through WikiPageUpdater
https://gerrit.wikimedia.org/r/375819
Joe added a comment.
We still have around 1.4 million items in queue for commons, evenly divided between htmlCacheUpdate jobs and refreshLinks jobs.
I've started a few runs of the refreshLinks job and since yesterday most jobs are just processing the same root job from August 26th.
Those jobs
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-09-05T07:03:25Z] <_joe_> launching manually 3 workers for refreshLinks jobs on commons, T173710
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-09-04T07:02:33Z] <_joe_> starting additional runJobs instance for htmlcacheupdate on commons T173710
aaron added a comment.
In T173710#3571046, @EBernhardson wrote:
In T173710#3571009, @Legoktm wrote:
Could we always bump page_touched, but only send the purges to varnish if the timestamp is within the past four days? Would that let us run the older jobs faster since if I understand correctly
GWicke added a comment.
I updated https://gerrit.wikimedia.org/r/#/c/295027/ to apply on current master. This removes CDN purges from HTMLCacheUpdate, and only performs them after RefreshLinks, and only if nothing else caused a re-render since.
With this patch applied, we should be able to reduce
EBernhardson added a comment.
In T173710#3571009, @Legoktm wrote:
Could we always bump page_touched, but only send the purges to varnish if the timestamp is within the past four days? Would that let us run the older jobs faster since if I understand correctly the throttling is to avoid
EBernhardson added a comment.
With the refresh links problem looking mostly resolved, the remaining top queues in the job queue (as of aug 31, 1am UTC):
commonswiki: htmlCacheUpdate: 809453 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
commonswiki: refreshLinks: 532823 queued; 5492
Legoktm added a comment.
Could we always bump page_touched, but only send the purges to varnish if the timestamp is within the past four days? Would that let us run the older jobs faster since if I understand correctly the throttling is to avoid overloading varnish with purges?
aaron added a comment.
In T173710#3570037, @Joe wrote:
Correcting myself after a discussion with @ema: since we have up to 4 cache layers (at most), we should process any job with a root timestamp newer than 4 times the cache TTL cap. So anything older than 4 days should be safely discardable.
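Joe's corrected rule reduces to a small predicate: with a one-day TTL cap and up to 4 cache layers, a job whose root timestamp is older than 4 days can no longer affect any live cache object. A sketch (the constants mirror the comment, not actual production config; this applies to htmlCacheUpdate only, since refreshLinks must still run regardless of age):

```python
from datetime import datetime, timedelta

CACHE_TTL = timedelta(days=1)   # TTL cap per cache layer, per the comment
CACHE_LAYERS = 4                # up to 4 cache layers

def is_discardable(root_job_timestamp, now):
    """An htmlCacheUpdate older than CACHE_LAYERS * CACHE_TTL cannot affect
    any object still in cache, so processing it is pointless."""
    return now - root_job_timestamp > CACHE_LAYERS * CACHE_TTL

now = datetime(2017, 8, 31)
print(is_discardable(datetime(2017, 8, 20), now))  # True: 11 days old
print(is_discardable(datetime(2017, 8, 29), now))  # False: 2 days old
```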
Agabi10 added a comment.
@Joe, that might be true for the htmlCacheUpdate jobs, but not for the refreshLinks jobs. From my understanding, the refreshLinks jobs should be processed even if they are older than the max TTL, because discarding those jobs only because they are old would make the
Joe added a comment.
@aaron so you're saying that when someone edits a lot of pages with a lot of backlinks, we will see the jobqueue growing for quite a long time, as the divided jobs will be executed later, and as long as the queue is long enough, we'll see jobs
gerritbot added a comment.
Change 373521 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Decrease dbBatchSize in WikiPageUpdater
https://gerrit.wikimedia.org/r/373521
aaron added a comment.
As far as retries go, the attempts hash for wikidatawiki:htmlCacheUpdate has few entries, with run counts no greater than 3. The only incrementing code is doPop() in MediaWiki, the same code that made them go up to 3 to begin with. If the same job ran many times, I'd expect
Krinkle added a comment.
Added a mitigation section to the task description. Also a summary of the impact of the mitigations so far (based on input from @aaron).
Dashboard: Job Queue Health
F9232210: Screen Shot 2017-08-31 at 00.00.31.png F9232209: Screen Shot 2017-08-31 at 00.00.18.png
Job
GWicke added a comment.
HTMLCacheUpdate root job timestamp distribution, jobs executed within the last 15 hours:
1233 20170407
8237 20170408
18 20170423
18 20170426
20 20170429
50 20170430
18 20170502
18 20170504
20 20170509
10 20170512
18
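A distribution like the one above can be produced by bucketing rootJobTimestamp values from runJobs output by calendar day. A sketch, assuming log lines carry a rootJobTimestamp=YYYYMMDDHHMMSS parameter (the format and sample lines are illustrative):

```python
from collections import Counter

def root_date_histogram(log_lines):
    """Count jobs per rootJobTimestamp calendar day (YYYYMMDD prefix)."""
    days = Counter()
    for line in log_lines:
        for token in line.split():
            if token.startswith("rootJobTimestamp="):
                days[token.split("=", 1)[1][:8]] += 1
    return days

lines = [  # invented sample lines in the assumed runJobs log format
    "htmlCacheUpdate Q1 table=pagelinks rootJobTimestamp=20170407120000",
    "htmlCacheUpdate Q2 table=pagelinks rootJobTimestamp=20170407130501",
    "htmlCacheUpdate Q3 table=pagelinks rootJobTimestamp=20170408000000",
]
print(sorted(root_date_histogram(lines).items()))
```

Heavy counts on months-old root dates, as in the listing above, indicate the runners are still chewing through very stale fan-out.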
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-08-28T14:28:13Z] Synchronized php-1.30.0-wmf.15/includes/jobqueue/jobs/HTMLCacheUpdateJob.php: SWAT: [[gerrit:373984|Disable rebound CDN purges for backlinks in HTMLCacheUpdateJob (T173710)]] (duration: 00m
gerritbot added a comment.
Change 373984 merged by jenkins-bot:
[mediawiki/core@wmf/1.30.0-wmf.15] Disable rebound CDN purges for backlinks in HTMLCacheUpdateJob
https://gerrit.wikimedia.org/r/373984
gerritbot added a comment.
Change 373984 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz):
[mediawiki/core@wmf/1.30.0-wmf.15] Disable rebound CDN purges for backlinks in HTMLCacheUpdateJob
https://gerrit.wikimedia.org/r/373984
aaron added a comment.
Though this bit is problematic:
"page_touched < " . $dbw->addQuotes( $dbw->timestamp( $touchTimestamp ) )
...seems like that comparison should use rootJobTimestamp if present.
aaron added a comment.
Ignored purges still count as work items, yes.
Rebound purges could explain some of the number. Also, given the backlog, lots of them probably actually had different rootJobTimestamps. MediaWiki can de-duplicate those when it's the same backlinked page X being edited
EBernhardson added a comment.
In T173710#3554154, @aaron wrote:
Note that for de-duplication, as long as the job has rootJobTimestamp set, it will ignore rows already touched (page_touched) to a higher/equal value, and likewise not send purges to the corresponding pages. So the CDN aspects
aaron added a comment.
Note that for de-duplication, as long as the job has rootJobTimestamp set, it will ignore rows already touched (page_touched) to a higher/equal value, and likewise not send purges to the corresponding pages. So the CDN aspects *should* already have lots of de-duplication,
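That de-duplication rule can be sketched as a filter over backlink rows: any page whose page_touched already meets or exceeds the job's rootJobTimestamp is skipped, both for the DB touch and for the CDN purge (a simplified sketch, not MediaWiki's actual HTMLCacheUpdateJob code; MW-style YYYYMMDDHHMMSS timestamp strings compare correctly as plain strings):

```python
def pages_to_purge(rows, root_job_timestamp):
    """Skip rows whose page_touched is already >= the root job timestamp:
    a newer edit or purge has superseded this job for those pages."""
    return [r["page_id"] for r in rows
            if r["page_touched"] < root_job_timestamp]

rows = [  # illustrative rows in MW timestamp format (YYYYMMDDHHMMSS)
    {"page_id": 1, "page_touched": "20170825000000"},
    {"page_id": 2, "page_touched": "20170828120000"},
]
print(pages_to_purge(rows, "20170827000000"))  # only page 1 still needs work
```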
jcrespo added a comment.
This is probably a symptom and not a cause, but I wanted to mention it anyway in case it was interesting:
There seem to be more HHVM exceptions than usual:
https://logstash.wikimedia.org/goto/80fa5708f0a5e9da4be9f4630969b72e
Most of those, at least the ones that are
gerritbot added a comment.
Change 373705 merged by jenkins-bot:
[mediawiki/core@master] Disable rebound CDN purges for backlinks in HTMLCacheUpdateJob
https://gerrit.wikimedia.org/r/373705
aaron added a comment.
In T173710#3551156, @aaron wrote:
Secondary purges were for dealing with replication lag scenarios, not lost purges. That was one extra purge (2X).
One easy change I can see to not use CdnCacheUpdate from HtmlCacheUpdateJob (but still for the pages directly being edited).
gerritbot added a comment.
Change 373705 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz):
[mediawiki/core@master] Disable rebound CDN purges for backlinks in HTMLCacheUpdateJob
https://gerrit.wikimedia.org/r/373705
EBernhardson added a comment.
Not necessarily a cause, but while looking into viwiki's backlog, I noticed this bot, which seems to be creating an incredible number of purge jobs: https://vi.wikipedia.org/wiki/%C4%90%E1%BA%B7c_bi%E1%BB%87t:%C4%90%C3%B3ng_g%C3%B3p/TuanminhBot?uselang=en
aaron added a comment.
Secondary purges were for dealing with replication lag scenarios, not lost purges. That was one extra purge (2X).
One easy change I can see to not use CdnCacheUpdate from HtmlCacheUpdateJob (but still for the pages directly being edited). There is already processing delay
EBernhardson added a comment.
In T173710#3550759, @Jdforrester-WMF wrote:
Well, it's dropped by ~1.5M jobs in the last couple of hours and seems to be now more slowly draining the pool.
That's because I ran all the htmlCacheUpdate jobs on srwiki (~2M) with throttling disabled to see what kind of
Jdforrester-WMF added a comment.
Well, it's dropped by ~1.5M jobs in the last couple of hours and seems to be now more slowly draining the pool.
EBernhardson added a comment.
Some background from bblack about the cache purge pipeline:
A) Sometime in the distant past, the way it worked is that when an edit/delete POST request came in, at the end of the request (after sending the response), there's some kind of post-response hook for async
aaron added a comment.
In T173710#3548223, @daniel wrote:
In T173710#3547580, @aaron wrote:
In other words, base jobs for entities that will divide up and purge all backlinks to the given entity. Note that each job has two entries.
Wait - each job has two entries? You mean, there are
EBernhardson added a comment.
In T173710#3547826, @Ladsgroup wrote:
The jobqueue has slowed down but is still increasing, and cirrusSearchIncomingLinkCount is still adding jobs at a rate of 100 jobs/second.
Cirrus link counting jobs are probably just a symptom of the backlog of refresh
daniel added a comment.
Now let's see what the reduced batch size does. It may actually make the problem worse, by increasing the total number of jobs. Let's hope it makes it better, by reducing the time job runners are blocked...
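The trade-off is mechanical: for N affected pages and batch size b, you get ceil(N/b) jobs, each holding a runner for roughly b times the per-page cost. A quick sanity check of the two effects (the 0.5 s per-page figure is an assumption for illustration, not a measured number):

```python
import math

def job_stats(pages, batch_size, secs_per_page=0.5):
    """Return (job count, seconds a runner is held per job) for a batch size."""
    jobs = math.ceil(pages / batch_size)
    return jobs, batch_size * secs_per_page

print(job_stats(10_000, 100))  # fewer jobs, but each blocks a runner longer
print(job_stats(10_000, 20))   # 5x the jobs; each returns its runner 5x sooner
```

Total work is unchanged; smaller batches just convert long runner-blocking jobs into more, shorter ones.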
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-08-24T14:44:42Z] Synchronized php-1.30.0-wmf.15/extensions/Wikidata/extensions/Wikibase/client/includes/Changes/WikiPageUpdater.php: Reduce batch size in WikiPageUpdater (T173710) (duration: 00m 48s)
gerritbot added a comment.
Change 373551 merged by jenkins-bot:
[mediawiki/extensions/Wikidata@wmf/1.30.0-wmf.15] Hotfix: Reduce batch size in WikiPageUpdater
https://gerrit.wikimedia.org/r/373551
gerritbot added a comment.
Change 373551 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikidata@wmf/1.30.0-wmf.15] Hotfix: Reduce batch size in WikiPageUpdater
https://gerrit.wikimedia.org/r/373551
gerritbot added a comment.
Change 373548 abandoned by Ladsgroup:
Hotfix: Reduce batch size in WikiPageUpdater
https://gerrit.wikimedia.org/r/373548
gerritbot added a comment.
Change 373548 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikidata@master] Hotfix: Reduce batch size in WikiPageUpdater
https://gerrit.wikimedia.org/r/373548
gerritbot added a comment.
Change 373547 abandoned by Ladsgroup:
Hotfix: Reduce batch size in WikiPageUpdater
https://gerrit.wikimedia.org/r/373547
gerritbot added a comment.
Change 373547 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikidata@master] Hotfix: Reduce batch size in WikiPageUpdater
https://gerrit.wikimedia.org/r/373547
gerritbot added a comment.
Change 373539 merged by jenkins-bot:
[mediawiki/extensions/Wikidata@master] Hotfix: Reduce batch size in WikiPageUpdater
https://gerrit.wikimedia.org/r/373539
gerritbot added a comment.
Change 373539 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikidata@master] Hotfix: Reduce batch size in WikiPageUpdater
https://gerrit.wikimedia.org/r/373539
daniel added a comment.
In T173710#3547580, @aaron wrote:
In other words, base jobs for entities that will divide up and purge all backlinks to the given entity. Note that each job has two entries.
Wait - each job has two entries? You mean, there are duplicates inserted, and not pruned?...
daniel added a comment.
So, @Ladsgroup told me that he observed HtmlCacheUpdate jobs for 100 pages taking more than one minute. Given that the purging process is parallelized using fork, this is quite surprising. Why is this so slow? It used to be really fast, just sending out a few UDP packets.
daniel added a comment.
In T173710#3542688, @aaron wrote:
Mostly htmlCacheUpdate jobs on wikidatawiki:
htmlCacheUpdate: 6014947 queued; 5 claimed (0 active, 5 abandoned); 0 delayed
These are HtmlCacheUpdates *on* wikidata? Really? That's quite surprising. I would have expected HtmlCacheUpdates
daniel added a comment.
In T173710#3545392, @Esc3300 wrote:
Are these originating also in clients or initially coming from Wikidata? What triggers them?
wikibase_addUsagesForPage are essentially like LinksUpdates: they get triggered by any parse, recording what entities are used on the page.
Ladsgroup added a comment.
I take that back. I ran runJobs on terbium to see what's going on there, and most jobs get processed easily (including cirrusSearchIncomingLinkCount and htmlCacheUpdate), but there are cases where jobs like this block the whole thing:
2017-08-24 09:46:14
gerritbot added a comment.
Change 373521 had a related patch set uploaded (by AnotherLadsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@master] Decrease dbBatchSize in WikiPageUpdater
https://gerrit.wikimedia.org/r/373521
Ladsgroup added a comment.
The jobqueue has slowed down but is still increasing, and cirrusSearchIncomingLinkCount is still adding jobs at a rate of 100 jobs/second.
gerritbot added a comment.
Change 373390 merged by jenkins-bot:
[mediawiki/core@wmf/1.30.0-wmf.15] Make workItemCount() smarter for htmlCacheUpdate/refreshLinks
https://gerrit.wikimedia.org/r/373390
aaron added a comment.
From
mwscript maintenance/runJobs.php wikidatawiki --type htmlCacheUpdate --nothrottle --maxjobs 100 | grep "IsSelf=1"
I can see almost all of the jobs are things like:
2017-08-24 01:15:39 htmlCacheUpdate Q36985371 table=pagelinks recursive=1 rootJobIsSelf=1