[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-11-24 Thread jcrespo
jcrespo added a comment. @Aklapper Probably, but I would close that one, as that should not be happening right now, unless you have reports saying it is again.

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-11-06 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-11-06T09:37:49Z] <_joe_> manually running htmlCacheUpdate for commonswiki and ruwiki on terbium, T173710

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-11-06 Thread gerritbot
gerritbot added a comment. Change 389427 merged by Giuseppe Lavagetto: [operations/puppet@production] jobrunner: make refreshlinks jobs low-priority https://gerrit.wikimedia.org/r/389427

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-11-06 Thread gerritbot
gerritbot added a comment. Change 389427 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto): [operations/puppet@production] jobrunner: make refreshlinks jobs low-priority https://gerrit.wikimedia.org/r/389427

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-11-03 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-11-03T10:39:07Z] Synchronized wmf-config/CommonSettings.php: Increase concurrency of htmlCacheUpdate jobs T173710 (duration: 00m 48s)

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-11-03 Thread gerritbot
gerritbot added a comment. Change 388416 merged by Giuseppe Lavagetto: [operations/mediawiki-config@master] Increase concurrency for htmlCacheUpdate https://gerrit.wikimedia.org/r/388416

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-11-03 Thread gerritbot
gerritbot added a comment. Change 388416 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto): [operations/mediawiki-config@master] Increase concurrency for htmlCacheUpdate https://gerrit.wikimedia.org/r/388416

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-11-03 Thread mobrovac
mobrovac added a comment. In T173710#3730359, @elukey wrote: https://gerrit.wikimedia.org/r/#/c/385248 should be already working for commons, but from mwlog1001's runJob.log I can only see stuff like causeAction=unknown causeAgent=unknown (that probably only confirms that no authenticated

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-11-03 Thread elukey
elukey added a comment. Status:
elukey@terbium:~$ mwscript extensions/WikimediaMaintenance/getJobQueueLengths.php | sort -n -k2 | tail -n 20
euwiki 237
tgwiki 3759
cawiki 4822
enwiktionary 17148
zhwiki 19958
nowiki 21167
wikidatawiki 28257
bewiki 110296
arwiki 132139
ukwiki 132246
dewiki 155322

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-11-02 Thread EBernhardson
EBernhardson added a comment. It was perhaps noted before, but because of the recursive nature of the refreshLinks and htmlCacheUpdate jobs, even if the backlog is being processed it may not look like it, because the jobs are just enqueuing new jobs. Will probably take some time to really know what
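A minimal, self-contained PHP sketch of the recursion EBernhardson describes (the function and queue here are illustrative, not the actual MediaWiki job classes): each job handles one batch of backlinks and re-enqueues the remainder as a new job, so the queue length can stay flat even while work is being done.

<?php
// Toy model of a recursive backlink job: process one batch, re-enqueue the rest.
function runRecursiveBacklinkJob( array $backlinkPageIds, $batchSize, array &$queue ) {
	$batch = array_splice( $backlinkPageIds, 0, $batchSize );
	foreach ( $batch as $pageId ) {
		// purge / re-parse $pageId here
	}
	if ( $backlinkPageIds ) {
		// The leftover pages become a brand-new job of the same type,
		// so "jobs completed" and "jobs enqueued" rise together.
		$queue[] = $backlinkPageIds;
	}
}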

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-10-31 Thread elukey
elukey added a comment. In T173710#3720358, @EBernhardson wrote: All jobs have a requestId parameter, which is passed down through the execution chain. This is the same as the reqId field in logstash. Basically this means if the originating request logged anything to logstash, you should be able

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-10-30 Thread EBernhardson
EBernhardson added a comment. All jobs have a requestId parameter, which is passed down through the execution chain. This is the same as the reqId field in logstash. Basically this means if the originating request logged anything to logstash, you should be able to find it with the query
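A minimal sketch of the lookup EBernhardson describes, assuming a hypothetical job payload: the job's requestId parameter is the value to search for in the logstash reqId field.

<?php
// Hypothetical job parameters; the requestId value below is made up.
$jobParams = [ 'requestId' => 'example-request-id' ];
// The corresponding logstash/Kibana filter would then be:
echo 'reqId:"' . $jobParams['requestId'] . '"' . "\n";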

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-10-30 Thread elukey
elukey added a comment. We had some relief after the last change in the configs of the jobrunners, namely the queue started shrinking, but then we got back into the bad behavior in which we have constantly more jobs enqueued vs completed: F10519970: Screen Shot 2017-10-30 at 6.19.11 PM.png I am

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-10-30 Thread Ladsgroup
Ladsgroup added a comment. In T173710#3718725, @Jack_who_built_the_house wrote: Thanks for the reply. It just surprises me that on enwiki, the job queue is very lightweight, while on ruwiki, it's 2/3 of the overall pages count, and enwiki is much more active. Is it because of wide use of Wikidata

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-10-30 Thread Jack_who_built_the_house
Jack_who_built_the_house added a comment. Thanks for the reply. It just surprises me that on enwiki, the job queue is very lightweight, while on ruwiki, it's 2/3 of the overall pages count, and enwiki is much more active. Is it because of wide use of Wikidata in ruwiki?

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-10-30 Thread elukey
elukey added a comment. In T173710#3717940, @Jack_who_built_the_house wrote: On ruwiki, many editors are complaining about slow updating of pages with their templates. We have a huge job queue, and it keeps growing day by day, while no top-used templates/modules have been changed in the last

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-10-30 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-10-30T08:42:42Z] raised priority of refreshlink and htmlcacheupdate job execution on jobrunners (https://gerrit.wikimedia.org/r/#/c/386636/) - T173710

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-10-30 Thread gerritbot
gerritbot added a comment. Change 386636 merged by Elukey: [operations/puppet@production] role::mediawiki::jobrunner: inc runners for refreshLinks/htmlCacheUpdate https://gerrit.wikimedia.org/r/386636

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-10-28 Thread Jack_who_built_the_house
Jack_who_built_the_house added a comment. On ruwiki, many editors are complaining about slow updating of pages with their templates. We have a huge job queue, while no popular templates/modules have been changed in the last few days. Please tell us: is there any advice that could be given to us, as well

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-10-26 Thread gerritbot
gerritbot added a comment. Change 386636 had a related patch set uploaded (by Elukey; owner: Elukey): [operations/puppet@production] role::mediawiki::jobrunner: raise temporarily runners for refreshLinks/hmtlCacheUpdate https://gerrit.wikimedia.org/r/386636

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-10-26 Thread elukey
elukey added a comment. Updated status:
elukey@terbium:~$ /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/group1.dblist showJobs.php --group | awk '{if ($3 > 1) print $_}'
cawiki: refreshLinks: 13566 queued; 6 claimed (6 active, 0 abandoned); 0 delayed
commonswiki: refreshLinks:

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-10-23 Thread Jack_who_built_the_house
Jack_who_built_the_house added a comment. In T173710#3701806, @Ladsgroup wrote: I think one of the reasons contributing to the problem is the same problem we had with T171027: "Read timeout is reached" DBQueryError when trying to load specific users' watchlists (with +1000 articles) on several

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-29 Thread EBernhardson
EBernhardson added a comment. I think we might be able to add some capacity to processing those jobs on Monday, but we probably have to either re-think the approach to the problem or throw more hardware at it. I'm not sure if we need more hardware, or just more effective use of the current

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-29 Thread Joe
Joe added a comment.
oblivian@terbium:~$ /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/group1.dblist showJobs.php --group | awk '{if ($3 > 1) print $_}'
cawiki: refreshLinks: 104355 queued; 3 claimed (3 active, 0 abandoned); 0 delayed
commonswiki: refreshLinks: 2073193 queued; 44

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-29 Thread Ladsgroup
Ladsgroup added a comment. The job queue size just jumped to 12M in two days and it's not going down. I don't know if it's related to Wikidata or not, but that's something people need to look into.

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-21 Thread EBernhardson
EBernhardson added a comment. In T173710#3625333, @Daimona wrote: Today on it.wiki I noticed a massive increase in search results for some queries related to errors that I'm currently trying to fix. This search:

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-21 Thread Daimona
Daimona added a comment. Today on it.wiki I noticed a massive increase in search results for some queries related to errors that I'm currently trying to fix. This search: https://it.wikipedia.org/w/index.php?search=insource%3A%2F%27%27parlate+prego%27%27%5C%3C%5C%2F%2F=Speciale:Ricerca now has

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-21 Thread gerritbot
gerritbot added a comment. Change 378719 merged by jenkins-bot: [mediawiki/extensions/Wikibase@master] Refactor possibly fragile ChangeHandler/WikiPageUpdater hash calculations https://gerrit.wikimedia.org/r/378719

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-20 Thread gerritbot
gerritbot added a comment. Change 377046 merged by jenkins-bot: [mediawiki/extensions/Wikibase@master] Allow batch sizes for different jobs to be defined separately. https://gerrit.wikimedia.org/r/377046

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-19 Thread Joe
Joe added a comment. FWIW we're seeing another almost-uncontrollable growth of jobs on commons and probably other wikis. I might decide to raise the concurrency of those jobs.

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-18 Thread gerritbot
gerritbot added a comment. Change 375819 merged by jenkins-bot: [mediawiki/extensions/Wikibase@master] Pass root job params through WikiPageUpdater https://gerrit.wikimedia.org/r/375819

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-18 Thread gerritbot
gerritbot added a comment. Change 378719 had a related patch set uploaded (by Thiemo Mättig (WMDE); owner: Thiemo Mättig (WMDE)): [mediawiki/extensions/Wikibase@master] Refactor possibly fragile ChangeHandler/WikiPageUpdater hash calculations https://gerrit.wikimedia.org/r/378719

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-13 Thread gerritbot
gerritbot added a comment. Change 377811 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler): [mediawiki/extensions/Wikibase@master] Split page set before constructing InjectRCRecordsJob https://gerrit.wikimedia.org/r/377811

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-12 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-09-12T13:12:13Z] Synchronized wmf-config/Wikibase-production.php: Reduce wikiPageUpdaterDbBatchSize to 20 - T173710 (duration: 00m 45s)

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-12 Thread gerritbot
gerritbot added a comment. Change 377458 merged by jenkins-bot: [operations/mediawiki-config@master] Reduce wikiPageUpdaterDbBatchSize to 20 https://gerrit.wikimedia.org/r/377458

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-12 Thread gerritbot
gerritbot added a comment. Change 377458 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani): [operations/mediawiki-config@master] Reduce wikiPageUpdaterDbBatchSize to 20 https://gerrit.wikimedia.org/r/377458

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-11 Thread gerritbot
gerritbot added a comment. Change 376562 merged by jenkins-bot: [operations/mediawiki-config@master] Reduce wikiPageUpdaterDbBatchSize to 20 https://gerrit.wikimedia.org/r/376562

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-10 Thread gerritbot
gerritbot added a comment. Change 377046 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler): [mediawiki/extensions/Wikibase@master] Allow batch sizes for different jobs to be defined separately. https://gerrit.wikimedia.org/r/377046

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-07 Thread gerritbot
gerritbot added a comment. Change 376562 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani): [operations/mediawiki-config@master] Reduce wikiPageUpdaterDbBatchSize to 20 https://gerrit.wikimedia.org/r/376562

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-07 Thread Ladsgroup
Ladsgroup added a comment. I made the batch smaller, from 100 to 50, and I can reduce it to 20. Let me make a patch.
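A hedged sketch of the configuration change being discussed; the exact settings array used in wmf-config/Wikibase-production.php is an assumption here, but the setting name matches the patches referenced above.

<?php
// Wikibase client splits page-update work into jobs of this many pages each.
$wgWBClientSettings['wikiPageUpdaterDbBatchSize'] = 20; // previously 100, then 50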

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-07 Thread mobrovac
mobrovac added a comment. In T173710#3588015, @Joe wrote: Wikibase refreshlinks jobs might benefit from being in smaller batches. +1 on this. As we now have all jobs being emitted to EventBus as well, we have had Kafka reject a portion of the jobs because they were larger than 4MB each. Upon

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-07 Thread Joe
Joe added a comment. I did some more number crunching on the instances of runJobs.php I'm running on terbium, and I found what follows: Wikibase refreshlinks jobs might benefit from being in smaller batches, as many of those are taking a long time to execute. Out of 33.4k wikibase jobs, we had the

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-06 Thread Joe
Joe added a comment. In T173710#3584505, @Krinkle wrote: In T173710#3583445, @Joe wrote: As a side comment: this is one of the cases where I would've loved to have an elastic environment to run MediaWiki-related applications: I could've spun up 10 instances of jobrunner dedicated to refreshlinks

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-06 Thread jcrespo
jcrespo added a comment. Of course, that doesn't apply to cases that are limited by a common resource (e.g. database). If I could add to the ideal scenario, the jobqueue would have dedicated slaves AND would write with a different domain id (allowing parallelism) than the rest of the writes so we

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-06 Thread Krinkle
Krinkle added a comment. In T173710#3583445, @Joe wrote: As a side comment: this is one of the cases where I would've loved to have an elastic environment to run MediaWiki-related applications: I could've spun up 10 instances of jobrunner dedicated to refreshlinks (or, ideally, the system could

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-06 Thread Joe
Joe added a comment. In T173710#3581849, @aaron wrote: Those refreshLinks jobs (from wikibase) are the only ones that use multiple titles per job, so they will be a lot slower (seems to be 50 pages/job) than the regular ones from MediaWiki core. That is a bit on the slow side for a run time of a

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-05 Thread aaron
aaron added a comment. Those refreshLinks jobs (from wikibase) are the only ones that use multiple titles per job, so they will be a lot slower (seems to be 50 pages/job) than the regular ones from MediaWiki core. That is a bit on the slow side for a run time of a non-rare job type (e.g. TMH or

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-05 Thread gerritbot
gerritbot added a comment. Change 375819 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler): [mediawiki/extensions/Wikibase@master] Pass root job params through WikiPageUpdater https://gerrit.wikimedia.org/r/375819

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-05 Thread Joe
Joe added a comment. We still have around 1.4 million items in queue for commons, evenly divided between htmlCacheUpdate jobs and refreshLinks jobs. I've started a few runs of the refreshLinks job and since yesterday most jobs are just processing the same root job from August 26th. Those jobs

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-05 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-09-05T07:03:25Z] <_joe_> launching manually 3 workers for refreshLinks jobs on commons, T173710

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-09-04 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-09-04T07:02:33Z] <_joe_> starting additional runJobs instance for htmlcacheupdate on commons T173710

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-31 Thread aaron
aaron added a comment. In T173710#3571046, @EBernhardson wrote: In T173710#3571009, @Legoktm wrote: Could we always bump page_touched, but only send the purges to varnish if the timestamp is within the past four days? Would that let us run the older jobs faster since if I understand correctly

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-31 Thread GWicke
GWicke added a comment. I updated https://gerrit.wikimedia.org/r/#/c/295027/ to apply on current master. This removes CDN purges from HTMLCacheUpdate, and only performs them after RefreshLinks, and only if nothing else caused a re-render since. With this patch applied, we should be able to reduce
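An illustrative PHP sketch (not the actual change 295027) of the ordering GWicke describes: HTMLCacheUpdate no longer purges the CDN itself, and RefreshLinks only issues the purge if nothing re-rendered the page after the root job was created.

<?php
// Returns true when RefreshLinks should still send the CDN purge.
function shouldPurgeAfterRefreshLinks( $lastRenderTimestamp, $rootJobTimestamp ) {
	// A render newer than the root job means some other event already
	// refreshed (and purged) the page, so this purge would be redundant.
	return $lastRenderTimestamp < $rootJobTimestamp;
}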

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-31 Thread EBernhardson
EBernhardson added a comment. In T173710#3571009, @Legoktm wrote: Could we always bump page_touched, but only send the purges to varnish if the timestamp is within the past four days? Would that let us run the older jobs faster since if I understand correctly the throttling is to avoid

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-31 Thread EBernhardson
EBernhardson added a comment. With the refresh links problem looking mostly resolved, the remaining top queues in the job queue (as of Aug 31, 1am UTC):
commonswiki: htmlCacheUpdate: 809453 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
commonswiki: refreshLinks: 532823 queued; 5492

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-31 Thread Legoktm
Legoktm added a comment. Could we always bump page_touched, but only send the purges to varnish if the timestamp is within the past four days? Would that let us run the older jobs faster, since, if I understand correctly, the throttling is to avoid overloading varnish with purges?
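An illustrative PHP sketch of Legoktm's idea (not MediaWiki code; the four-day figure comes from the adjacent cache-TTL discussion in this thread): page_touched is always bumped, but the Varnish purge is skipped once the root job is older than the longest time an object can survive in the CDN.

<?php
// Returns true when the CDN purge is still worth sending.
function shouldSendCdnPurge( $rootJobTimestamp, $now ) {
	$maxCdnLifetime = 4 * 86400; // assumed: ~4 days (4 cache layers x 1-day TTL cap)
	return ( $now - $rootJobTimestamp ) <= $maxCdnLifetime;
}
// page_touched would be updated unconditionally; only the purge itself is
// gated on shouldSendCdnPurge( $rootTs, time() ).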

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-31 Thread aaron
aaron added a comment. In T173710#3570037, @Joe wrote: Correcting myself after a discussion with @ema: since we have up to 4 cache layers (at most), we should process any job with a root timestamp newer than 4 times the cache TTL cap. So anything older than 4 days should be safely discardable.

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-31 Thread Agabi10
Agabi10 added a comment. @Joe, that might be true for the htmlCacheUpdate jobs, but not for the refreshLinks jobs. From my understanding, the refreshLinks jobs should be processed even if they are older than the max TTL, because discarding those jobs only because they are old would make the

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-31 Thread Joe
Joe added a comment. @aaron so you're saying that when someone is editing a lot of pages with a lot of backlinks, we will see the job queue growing for quite a long time, as the divided jobs will be executed at a later time, and as long as the queue is long enough, we'll see jobs

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-31 Thread gerritbot
gerritbot added a comment. Change 373521 merged by jenkins-bot: [mediawiki/extensions/Wikibase@master] Decrease dbBatchSize in WikiPageUpdater https://gerrit.wikimedia.org/r/373521

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-30 Thread aaron
aaron added a comment. As far as retries go, the attempts hash for wikidatawiki:htmlCacheUpdate has few entries with run counts no greater than 3. The only incrementing code is doPop() in MediaWiki, the same code that made them go up to 3 to begin with. If the same job ran many times, I'd expect

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-30 Thread Krinkle
Krinkle added a comment. Added a mitigation section to the task description. Also a summary of the impact of the mitigations so far (based on input from @aaron). Dashboard: Job Queue Health F9232210: Screen Shot 2017-08-31 at 00.00.31.png F9232209: Screen Shot 2017-08-31 at 00.00.18.png Job

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-30 Thread GWicke
GWicke added a comment. HTMLCacheUpdate root job timestamp distribution, jobs executed within the last 15 hours:
1233 20170407
8237 20170408
18 20170423
18 20170426
20 20170429
50 20170430
18 20170502
18 20170504
20 20170509
10 20170512
18

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-28 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-08-28T14:28:13Z] Synchronized php-1.30.0-wmf.15/includes/jobqueue/jobs/HTMLCacheUpdateJob.php: SWAT: [[gerrit:373984|Disable rebound CDN purges for backlinks in HTMLCacheUpdateJob (T173710)]] (duration: 00m

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-28 Thread gerritbot
gerritbot added a comment. Change 373984 merged by jenkins-bot: [mediawiki/core@wmf/1.30.0-wmf.15] Disable rebound CDN purges for backlinks in HTMLCacheUpdateJob https://gerrit.wikimedia.org/r/373984

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-25 Thread gerritbot
gerritbot added a comment. Change 373984 had a related patch set uploaded (by Krinkle; owner: Aaron Schulz): [mediawiki/core@wmf/1.30.0-wmf.15] Disable rebound CDN purges for backlinks in HTMLCacheUpdateJob https://gerrit.wikimedia.org/r/373984

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-25 Thread aaron
aaron added a comment. Though this bit is problematic: "page_touched < " . $dbw->addQuotes( $dbw->timestamp( $touchTimestamp ) ) ...seems like that comparison should use rootJobTimestamp if present.
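A hedged sketch of the change aaron suggests (variable names assumed, not the actual HTMLCacheUpdateJob code): prefer the job's rootJobTimestamp over the freshly generated touch timestamp when building the page_touched condition.

<?php
// Pick the timestamp to compare page_touched against.
function touchCutoff( $rootJobTimestamp, $touchTimestamp ) {
	// Rows already touched after the root job was enqueued are newer than the
	// event being processed and can be left alone.
	return $rootJobTimestamp !== null ? $rootJobTimestamp : $touchTimestamp;
}
// The condition quoted above would then become:
//   'page_touched < ' . $dbw->addQuotes( $dbw->timestamp( touchCutoff( $rootTs, $touchTs ) ) )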

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-25 Thread aaron
aaron added a comment. Ignored purges still count as work items, yes. Rebound purges could explain some of the number. Also, given the backlog, lots of them probably had actually different rootJobTimestamps. MediaWiki can de-duplicate those when it's the same backlinked page X being edited

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-25 Thread EBernhardson
EBernhardson added a comment. In T173710#3554154, @aaron wrote: Note that for de-duplication, as long as the job has rootJobTimestamp set, it will ignore rows already touched (page_touched) to a higher/equal value, and likewise not send purges to the corresponding pages. So the CDN aspects

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-25 Thread aaron
aaron added a comment. Note that for de-duplication, as long as the job has rootJobTimestamp set, it will ignore rows already touched (page_touched) to a higher/equal value, and likewise not send purges to the corresponding pages. So the CDN aspects *should* already have lots of de-duplication,

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-25 Thread jcrespo
jcrespo added a comment. This is probably a symptom and not a cause, but I wanted to mention it anyway in case it was interesting: there seem to be more HHVM exceptions than usual: https://logstash.wikimedia.org/goto/80fa5708f0a5e9da4be9f4630969b72e Most of those, at least the ones that are

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread gerritbot
gerritbot added a comment. Change 373705 merged by jenkins-bot: [mediawiki/core@master] Disable rebound CDN purges for backlinks in HTMLCacheUpdateJob https://gerrit.wikimedia.org/r/373705

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread aaron
aaron added a comment. In T173710#3551156, @aaron wrote: Secondary purges were for dealing with replication lag scenarios, not lost purges. That was one extra purge (2X). One easy change I can see is to not use CdnCacheUpdate from HtmlCacheUpdateJob (but still use it for the pages directly being edited).

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread gerritbot
gerritbot added a comment. Change 373705 had a related patch set uploaded (by Aaron Schulz; owner: Aaron Schulz): [mediawiki/core@master] Disable rebound CDN purges for backlinks in HTMLCacheUpdateJob https://gerrit.wikimedia.org/r/373705

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread EBernhardson
EBernhardson added a comment. Not necessarily a cause, but while looking into viwiki's backlog, I noticed this bot which seems to be creating an incredible number of purge jobs: https://vi.wikipedia.org/wiki/%C4%90%E1%BA%B7c_bi%E1%BB%87t:%C4%90%C3%B3ng_g%C3%B3p/TuanminhBot?uselang=en

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread aaron
aaron added a comment. Secondary purges were for dealing with replication lag scenarios, not lost purges. That was one extra purge (2X). One easy change I can see is to not use CdnCacheUpdate from HtmlCacheUpdateJob (but still use it for the pages directly being edited). There is already processing delay

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread EBernhardson
EBernhardson added a comment. In T173710#3550759, @Jdforrester-WMF wrote: Well, it's dropped by ~1.5M jobs in the last couple of hours and seems to be now more slowly draining the pool. That's because I ran all the htmlCacheUpdate jobs on srwiki (~2M) with throttling disabled to see what kind of

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread Jdforrester-WMF
Jdforrester-WMF added a comment. Well, it's dropped by ~1.5M jobs in the last couple of hours and seems to be now more slowly draining the pool.

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread EBernhardson
EBernhardson added a comment. Some background from bblack about the cache purge pipeline: A) Sometime in the distant past, the way it worked is that when an edit/delete POST request came in, at the end of the request (after sending the response), there's some kind of post-response hook for async

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread aaron
aaron added a comment. In T173710#3548223, @daniel wrote: In T173710#3547580, @aaron wrote: In other words, base jobs for entities that will divide up and purge all backlinks to the given entity. Note that each job has two entries. Wait - each job has two entries? You mean, there are

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread EBernhardson
EBernhardson added a comment. In T173710#3547826, @Ladsgroup wrote: The jobqueue has slowed down but is still increasing, and cirrusSearchIncomingLinkCount still increases the jobqueue at a rate of 100 jobs/second. Cirrus link counting jobs are probably just a symptom of the backlog of refresh

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread daniel
daniel added a comment. Now let's see what the reduced batch size does. It may actually make the problem worse, by increasing the total number of jobs. Let's hope it makes it better, by reducing the time job runners are blocked...

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2017-08-24T14:44:42Z] Synchronized php-1.30.0-wmf.15/extensions/Wikidata/extensions/Wikibase/client/includes/Changes/WikiPageUpdater.php: Reduce batch size in WikiPageUpdater (T173710) (duration: 00m 48s)

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread gerritbot
gerritbot added a comment. Change 373551 merged by jenkins-bot: [mediawiki/extensions/Wikidata@wmf/1.30.0-wmf.15] Hotfix: Reduce batch size in WikiPageUpdater https://gerrit.wikimedia.org/r/373551

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread gerritbot
gerritbot added a comment. Change 373551 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani): [mediawiki/extensions/Wikidata@wmf/1.30.0-wmf.15] Hotfix: Reduce batch size in WikiPageUpdater https://gerrit.wikimedia.org/r/373551

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread gerritbot
gerritbot added a comment. Change 373548 abandoned by Ladsgroup: Hotfix: Reduce batch size in WikiPageUpdater https://gerrit.wikimedia.org/r/373548

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread gerritbot
gerritbot added a comment. Change 373548 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani): [mediawiki/extensions/Wikidata@master] Hotfix: Reduce batch size in WikiPageUpdater https://gerrit.wikimedia.org/r/373548

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread gerritbot
gerritbot added a comment. Change 373547 abandoned by Ladsgroup: Hotfix: Reduce batch size in WikiPageUpdater https://gerrit.wikimedia.org/r/373547

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread gerritbot
gerritbot added a comment. Change 373547 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani): [mediawiki/extensions/Wikidata@master] Hotfix: Reduce batch size in WikiPageUpdater https://gerrit.wikimedia.org/r/373547

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread gerritbot
gerritbot added a comment. Change 373539 merged by jenkins-bot: [mediawiki/extensions/Wikidata@master] Hotfix: Reduce batch size in WikiPageUpdater https://gerrit.wikimedia.org/r/373539

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread gerritbot
gerritbot added a comment. Change 373539 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani): [mediawiki/extensions/Wikidata@master] Hotfix: Reduce batch size in WikiPageUpdater https://gerrit.wikimedia.org/r/373539

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread daniel
daniel added a comment. In T173710#3547580, @aaron wrote: In other words, base jobs for entities that will divide up and purge all backlinks to the given entity. Note that each job has two entries. Wait - each job has two entries? You mean, there are duplicates inserted, and not pruned?...

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread daniel
daniel added a comment. So, @Ladsgroup told me that he observed HtmlCacheUpdate jobs for 100 pages taking more than one minute. Given that the purging process is parallelized using fork, this is quite surprising. Why is this so slow? It used to be really fast, just sending out a few UDP packets.

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread daniel
daniel added a comment. In T173710#3542688, @aaron wrote: Mostly htmlCacheUpdate jobs on wikidatawiki: htmlCacheUpdate: 6014947 queued; 5 claimed (0 active, 5 abandoned); 0 delayed These are HtmlCacheUpdates *on* wikidata? Really? That's quite surprising. I would have expected HtmlCacheUpdates

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread daniel
daniel added a comment. In T173710#3545392, @Esc3300 wrote: Are these originating also in clients or initially coming from Wikidata? What triggers them? wikibase_addUsagesForPage are essentially like LinksUpdates: they get triggered by any parse, recording what entities are used on the page.

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread Ladsgroup
Ladsgroup added a comment. I take that back. I ran runJobs on terbium to see what's going on there, and most jobs get processed easily (including cirrusSearchIncomingLinkCount and htmlCacheUpdate), but there are cases where we have jobs like this that block the whole thing: 2017-08-24 09:46:14

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread gerritbot
gerritbot added a comment. Change 373521 had a related patch set uploaded (by AnotherLadsgroup; owner: Amir Sarabadani): [mediawiki/extensions/Wikibase@master] Decrease dbBatchSize in WikiPageUpdater https://gerrit.wikimedia.org/r/373521

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-24 Thread Ladsgroup
Ladsgroup added a comment. The jobqueue has slowed down but is still increasing, and cirrusSearchIncomingLinkCount still increases the jobqueue at a rate of 100 jobs/second.

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-23 Thread gerritbot
gerritbot added a comment. Change 373390 merged by jenkins-bot: [mediawiki/core@wmf/1.30.0-wmf.15] Make workItemCount() smarter for htmlCacheUpdate/refreshLinks https://gerrit.wikimedia.org/r/373390

[Wikidata-bugs] [Maniphest] [Commented On] T173710: Job queue is increasing non-stop

2017-08-23 Thread aaron
aaron added a comment. From mwscript maintenance/runJobs.php wikidatawiki --type htmlCacheUpdate --nothrottle --maxjobs 100 | grep "IsSelf=1" I can see almost all of the jobs are things like:
2017-08-24 01:15:39 htmlCacheUpdate Q36985371 table=pagelinks recursive=1 rootJobIsSelf=1
