[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2017-06-01 Thread BBlack
BBlack added a comment. Yeah that was the plan, for XKey to help here by consolidating that down to a single HTCP / PURGE per article touched. It's not useful for the mass-scale case (e.g. template/link references), as it doesn't scale well in that direction. But for the case like "1 article ==

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2017-05-31 Thread Gilles
Gilles added a comment. Would the use of xkey help here? It sounds like a single user action currently generates several purge requests. TASK DETAIL https://phabricator.wikimedia.org/T124418 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Gilles Cc: Gilles,

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2017-05-31 Thread BBlack
BBlack added a comment. We can get broader averages by dividing the values seen in the aggregate client status code graphs using eqiad's text cluster (the remote sites would expect fewer due to some of the bursts being more likely to be dropped by the network) This shows the past week's average

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2017-05-30 Thread Krinkle
Krinkle added a comment. In T124418#3302573, @BBlack wrote: In T124418#1985526, @BBlack wrote: Continuing with some stuff I was saying in IRC the other day. At the "new normal", we're seeing something in the approximate ballpark of 400/s articles purged (which is then multiplied commonly for

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2017-05-25 Thread Krinkle
Krinkle added a comment. In T124418#3276610, @BBlack wrote: Not resolved, as the purge graphs can attest! Which graphs? This issue is about a "massive increase" observed between December 2015 and January 2016, however our graphs don't go back far enough. It seems data prior to June 2016 has

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-06-29 Thread gerritbot
gerritbot added a comment. Change 294876 merged by jenkins-bot: Improve HTMLCacheUpdate job CDN purge de-duplication https://gerrit.wikimedia.org/r/294876

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-06-17 Thread gerritbot
gerritbot added a comment. Change 295027 had a related patch set uploaded (by GWicke): For discussion: Reduce purge volume by moving dependent purges to RefreshLinksJob https://gerrit.wikimedia.org/r/295027

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-06-16 Thread gerritbot
gerritbot added a comment. Change 294876 had a related patch set uploaded (by Aaron Schulz): Improve HTMLCacheUpdate job CDN purge de-duplication https://gerrit.wikimedia.org/r/294876

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-06-01 Thread ori
ori added a comment. We are definitely sending duplicate purges. I added some debug logging and saw URLs get purged a half dozen times for the same job.
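The duplicate purges described above can be illustrated with a minimal sketch. This is not the code from the related Gerrit change; it only shows one common way to collapse repeated URLs within a single job's batch before purges are sent. The function name and the example URLs are hypothetical.

```python
from typing import Iterable, List


def dedupe_purge_urls(urls: Iterable[str]) -> List[str]:
    """Return each URL once, keeping the order in which it was first seen.

    Hypothetical helper: assumes a job collects its purge URLs in a list
    before sending them, so duplicates can be dropped in one pass.
    """
    # dict preserves insertion order (Python 3.7+), so this is an
    # order-preserving de-duplication.
    return list(dict.fromkeys(urls))


if __name__ == "__main__":
    # Made-up batch in which the same URL is queued several times.
    batch = [
        "https://example.org/wiki/A",
        "https://example.org/wiki/B",
        "https://example.org/wiki/A",
        "https://example.org/wiki/A",
    ]
    for url in dedupe_purge_urls(batch):
        print(url)
```

De-duplicating inside one batch only removes repeats within the same job; it does nothing for duplicates spread across separate jobs.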

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-06-01 Thread ori
ori added a comment. Breakdown of 10,718,138 PURGEs, captured on 2016-06-01 between 18:00 and 22:00 UTC:

Top PURGE issuers by wiki

| Wiki | Percent |
| --- | --- |
| svwiki | 24.64% |
| itwiki | 12.05% |
| enwiki

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-05-03 Thread BBlack
BBlack added a comment. I really don't think it's specifically Wikidata-related either at this point. Wikidata might be a significant driver of update jobs in general, but the code changes driving the several large rate increases were probably generic to all update jobs.

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-05-03 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. So when I look at http://graphite.wikimedia.org/render/?width=586&height=308&_salt=1453466175.547&target=MediaWiki.jobqueue.inserts_actual.htmlCacheUpdate.rate&from=-180days it recently got even worse. But Adam says

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-04-07 Thread BBlack
BBlack added a comment. F3845100: Screen Shot 2016-04-07 at 7.47.28 PM.png

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-03-10 Thread Addshore
Addshore added a comment. For easy access from the ticket, the dashboard for the above patchset is at https://grafana.wikimedia.org/dashboard/db/wikipageupdater-calls

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-26 Thread gerritbot
gerritbot added a comment. Change 268588 merged by jenkins-bot: Count calls to WikiPageUpdater methods https://gerrit.wikimedia.org/r/268588

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-04 Thread gerritbot
gerritbot added a subscriber: gerritbot. gerritbot added a comment. Change 268588 had a related patch set uploaded (by Addshore): Count calls to WikiPageUpdater methods https://gerrit.wikimedia.org/r/268588

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-03 Thread daniel
daniel added a comment. In https://phabricator.wikimedia.org/T124418#1993413, @Addshore wrote: > In the last 20 minutes we had roughly 7000 edits ...and every such edit naturally causes at least one HTMLCacheUpdate on wikidata.org.

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-03 Thread Addshore
Addshore added a comment. In https://phabricator.wikimedia.org/T124418#1985370, @ori wrote: > Distribution of (indirect) callers of `HTMLCacheUpdate::__construct` for the > past 20 minutes: > > [fluorine:/a/mw-log] $ python /home/ori/cacheUpdateGrepper.py | dist >

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-03 Thread BBlack
BBlack added a comment. So, current thinking is that at least one of (maybe two of?) the bumps are from moving what used to be synchronous HTCP purge during requests to JobRunner jobs which should be doing the same thing. However, assuming it's that alone (or even just investigating that part

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-03 Thread BBlack
BBlack added a comment. Well then apparently the 10/s edits to all projects number I found before is complete bunk :) http://wikipulse.herokuapp.com/ has numbers for wikidata edits that approximately line up with yours, and then shows Wikipedias at about double that rate (which might be a

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-01 Thread BBlack
BBlack added a comment. @ori - yeah that makes sense for the initial bump, and I think there may have even been a followup to do deferred purges, which may be one of the other multipliers, but I haven't found it yet (as in, insert an immediate job and also somehow insert one that fires a

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-01 Thread BBlack
BBlack added a comment. Another data point from the weekend: In one sample I took Saturday morning, when I sampled for 300s, the top site being purged was srwiki, and something like 98% of the purges flowing for srwiki were all Talk: pages (well, with Talk: as %-encoded something in Serbian).

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-01 Thread daniel
daniel added a comment. Yea, looks like the srwiki talk pages weren't us, but an edit to a much-used template.

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-01 Thread BBlack
BBlack added a comment. Regardless, the average rate of HTCP these days is normally-flat-ish (a few scary spikes aside), and is mostly throttled by the jobqueue. The question still remains: what caused permanent, large bumps in the jobqueue htmlCacheUpdate insertion rate on ~Dec4, ~Dec11, and

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-01 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. Thanks. I looked at one of them and the only thing in the page is the template for the header they have. This was also the only edit ever to that page. In the header template itself however there was an edit on the 30th:

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-01 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. Very strange. Wikidata use on templates on talk pages isn't impossible but I'd consider it pretty unlikely. H. @addshore said he'd look into some stats as well on our side.

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-01 Thread daniel
daniel added a comment. @BBlack can you give a few examples of such pages on srwiki? Were these non-existent talk pages, or talk pages emptied by an archival process? Maybe someone on srwiki messed with their talkpage archive header template? That would purge a *lot* of pages...

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-01-31 Thread ori
ori added a comment. Distribution of purge URLs by hostname:
[fluorine:/a/mw-log] $ field 7 AdHocDebug.log | sed 's/\.m\./\./' | sort | dist | head
Key|Ct (Pct) Histogram
en.wiktionary.org|893075 (18.66%)
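For readers without the `field` and `dist` helpers used in the pipeline above, the same tally can be sketched in Python. This is an approximation under stated assumptions: the log is whitespace-separated, the hostname is the 7th field (as `field 7` suggests), and `dist` ranks keys by count with a percentage. The function name is hypothetical and nothing here reproduces the actual helper scripts.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def hostname_distribution(
    lines: Iterable[str], field: int = 7, top: int = 10
) -> List[Tuple[str, int, float]]:
    """Count hostnames in the given 1-based field, folding mobile hosts
    (en.m.wikipedia.org) into their desktop equivalents."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < field:
            continue  # skip short or malformed lines
        # Same effect as sed 's/\.m\./\./': replace only the first ".m."
        host = re.sub(r"\.m\.", ".", parts[field - 1], count=1)
        counts[host] += 1
    total = sum(counts.values())
    # most_common() is empty when nothing was counted, so no division by zero.
    return [(h, n, 100.0 * n / total) for h, n in counts.most_common(top)]


if __name__ == "__main__":
    import sys

    # Usage (assumed): python hostdist.py < AdHocDebug.log
    for host, n, pct in hostname_distribution(sys.stdin):
        print(f"{host}|{n} ({pct:.2f}%)")
```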

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-01-31 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. FYI: Wiktionary isn't supported yet by Wikidata so at least that part can't come from us.

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-01-31 Thread ori
ori added a comment. Distribution of (indirect) callers of `HTMLCacheUpdate::__construct` for the past 20 minutes:
[fluorine:/a/mw-log] $ python /home/ori/cacheUpdateGrepper.py | dist
Key|Ct (Pct) Histogram

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-01-31 Thread Legoktm
Legoktm added a subscriber: Legoktm. Legoktm added a comment. In https://phabricator.wikimedia.org/T124418#1985525, @BBlack wrote: > it's not like we gained a 5x increase in human article editing rate... If templates have been updated to use arbitrary access from Wikidata, one edit to an item

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-01-31 Thread BBlack
BBlack added a comment. Well, we have 3 different stages of rate-increase in the insert graph, so it could well be that we have 3 independent causes to look at here. And it's not necessarily true that any of them are buggy, but we need to understand what they're doing and why, because maybe

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-01-31 Thread BBlack
BBlack added a comment. Continuing with some stuff I was saying in IRC the other day. At the "new normal", we're seeing something in the approximate ballpark of 400/s articles purged (which is then multiplied commonly for ?action=history and mobile and ends up more like ~1600/s actual HTCP
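The jump from roughly 400 articles/s to roughly 1600 HTCP purges/s is consistent with each article fanning out to four URLs: the page view and its `?action=history` view, each on the desktop and the mobile hostname. The sketch below illustrates that arithmetic only. The URL shapes, the `.m.` hostname convention, and the function names are assumptions for illustration, not the actual MediaWiki purge code.

```python
from typing import List
from urllib.parse import quote


def mobile_host(host: str) -> str:
    """en.wikipedia.org -> en.m.wikipedia.org (assumed naming convention)."""
    first, rest = host.split(".", 1)
    return f"{first}.m.{rest}"


def purge_urls_for_title(host: str, title: str) -> List[str]:
    """Hypothetical fan-out: 2 views (page, history) x 2 hosts (desktop, mobile)."""
    slug = quote(title.replace(" ", "_"), safe="/:")
    urls = []
    for h in (host, mobile_host(host)):
        urls.append(f"https://{h}/wiki/{slug}")
        urls.append(f"https://{h}/w/index.php?title={slug}&action=history")
    return urls


def htcp_rate(articles_per_second: float, variants_per_article: int = 4) -> float:
    """Purge packets per second for a given article purge rate."""
    return articles_per_second * variants_per_article


if __name__ == "__main__":
    for url in purge_urls_for_title("en.wikipedia.org", "Main Page"):
        print(url)
    print(htcp_rate(400))  # 400 articles/s x 4 variants
```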

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-01-26 Thread Addshore
Addshore added a subscriber: Addshore. Addshore added a comment. As far as I can tell in Wikibase - WikiPageUpdater::scheduleRefreshLinks creates RefreshLinksJobs. - These jobs call Content::getSecondaryDataUpdates and AbstractContent which can create LinksUpdates - These in turn can

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-01-25 Thread JanZerebecki
JanZerebecki added a comment. https://grafana-admin.wikimedia.org/dashboard/db/tmp-t124418

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-01-25 Thread ori
ori added a comment. In https://phabricator.wikimedia.org/T124418#1963710, @JanZerebecki wrote: > https://grafana-admin.wikimedia.org/dashboard/db/tmp-t124418 What is that dashboard supposed to be showing? It is very hard to make sense of.

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-01-25 Thread JanZerebecki
JanZerebecki added a comment. I added it so I can look at the things related to this ticket in one graph (queue size, htmlCacheUpdate pop, htmlCacheUpdate insert/push, purge, scap). I looked at a few other stats related to Wikidata initiated purges. I found nothing more yet than what was

[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-01-25 Thread BBlack
BBlack added a comment. Yeah but the rate increase we're looking at is actually in the htmlCacheUpdate job insertion rate, regardless of magnification due to pages-affected-per-update. I'm surprised that we don't have any logs/data as to the source of those jobs.