[Wikidata-bugs] [Maniphest] T331356: Wikidata seems to still be utilizing insecure HTTP URIs

2023-04-13 Thread BBlack
BBlack added a comment.


  In T331356#8718619 <https://phabricator.wikimedia.org/T331356#8718619>, 
@MisterSynergy wrote:
  
  > Some remarks:
  >
  > - We should consider these canonical HTTP URIs to be //names// in the first 
place, which are unique worldwide and issued by the Wikidata project as the 
"owner" [1] of the wikidata.org domain. The purpose of these //names// is to 
identify things.
  
  If they're only names, that's relatively-fine.  However, there are user 
agents that end up following them as access URIs.  If we could control every 
agent, we could require that they all upconvert to HTTPS for access, but we 
can't.
  
  > - Following linked data principles, it is no coincidence that these names 
happen to be valid URIs. These are meant to be used to look up information 
about the named entity. It is okay to redirect a canonical URI to another 
location, including of course to a secure HTTPS location.
  
  The problem with relying on redirects is that they're insecure.  The 
initial request goes over the wire in the clear, as does the initial redirect 
response.  They can both be hijacked, modified, censored, and surveilled, 
before the redirect to HTTPS ever happens.  An advanced agent on the wire (like 
a national telecom) can even persistently hijack a whole session this way, by 
proxying the traffic into our servers as HTTPS.
  
  We support redirects as a "better than breakage/nothing" solution, but 
ideally UAs shouldn't ever utilize insecure HTTP to begin with.  This is why 
all of our Canonical URIs (in the HTTP/HTML sense) begin with `https`, as 
evidenced in all the normal pageviews' `https://...` tags.
  
  > - To my understanding, HSTS can be used to secure all but the first request 
of a client (that supports HSTS).
  
  It can be, and we even participate in HSTS Preload for all of our canonical 
domains as well, which protects even the first request to a domain from 
browsers which use the preload list.  However, there are many clients, 
especially bots and scripted tools, which rely on HTTP libraries or CLI tools 
which do not, by default, honor HSTS or load the preload list.
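  
  As a rough illustration (not WMF code; the domain list and helper name below 
are assumptions), a scripted client can approximate what HSTS Preload gives 
browsers by upconverting known-HTTPS domains before anything goes over the wire:
  
      from urllib.parse import urlsplit, urlunsplit
      import urllib.request
      
      # Hypothetical stand-in for the real HSTS preload list.
      FORCE_HTTPS_DOMAINS = {"wikidata.org", "wikipedia.org", "wikimedia.org"}
      
      def upgrade_to_https(uri):
          """Rewrite http:// to https:// for domains known to require TLS."""
          parts = urlsplit(uri)
          host = parts.hostname or ""
          if parts.scheme == "http" and any(
                  host == d or host.endswith("." + d) for d in FORCE_HTTPS_DOMAINS):
              parts = parts._replace(scheme="https")
          return urlunsplit(parts)
      
      # Canonical concept URI as currently emitted (insecure scheme):
      concept_uri = "http://www.wikidata.org/entity/Q42"
      # Fetch it over HTTPS directly, never touching port 80:
      with urllib.request.urlopen(upgrade_to_https(concept_uri)) as resp:
          print(resp.status, resp.url)
  
  A client written like this never exposes the request on port 80, regardless 
of whether its HTTP library honors HSTS.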
  
  > - Canonical HTTP URIs are still widespread in many other linked data 
resources, since many projects have started issuing these before everything 
transitioned to HTTPS. Some projects have transitioned to canonical HTTPS URIs, 
however, with GND doing this in 2019 being a prominent example [3].
  
  This would be the ideal end-outcome: that we're able to transition the URLs 
to be HTTPS everywhere.  Barring that, we could also look at where and how 
they're being emitted.  We may have HTML page outputs which are rendering these 
canonical URIs for access purposes, where it would make sense to convert them 
to HTTPS as part of the rendering process to cut down on the problem.
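  
  As a sketch of that last idea (illustrative only; the regex and function name 
are assumptions, not Wikibase code), an output-time rewrite could leave the 
stored identifier alone and only upgrade the emitted access links:
  
      import re
      
      CONCEPT_HREF = re.compile(r'href="http://(www\.wikidata\.org/[^"]*)"')
      
      def secure_hrefs(html):
          """Upgrade only the clickable access links, not the identifier text."""
          return CONCEPT_HREF.sub(r'href="https://\1"', html)
      
      page = '<a href="http://www.wikidata.org/entity/Q42">Q42</a>'
      print(secure_hrefs(page))
      # -> <a href="https://www.wikidata.org/entity/Q42">Q42</a>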

TASK DETAIL
  https://phabricator.wikimedia.org/T331356

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: OlafJanssen, MisterSynergy, BCornwall, Bugreporter, Ennomeijers, Nikki, 
Volans, Aklapper, BBlack, Astuthiodit_1, KOfori, karapayneWMDE, joanna_borun, 
Invadibot, Devnull, maantietaja, Muchiri124, ItamarWMDE, Akuckartz, 
Legado_Shulgin, ReaperDawn, Nandana, Davinaclare77, Techguru.pc, Lahi, Gq86, 
GoranSMilovanovic, Hfbn0, QZanden, LawExplorer, Zppix, _jensen, rosalieper, 
Scott_WUaS, Wong128hk, Wikidata-bugs, aude, faidon, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T330906: HTTP URIs do not resolve from NL and DE?

2023-03-06 Thread BBlack
BBlack closed this task as "Resolved".
BBlack added a comment.


  The redirects are neither //good// nor //bad//, they're instead both 
necessary (although that necessity is waning) and insecure.  We thought we had 
standardized on all canonical URIs being of the secure variant ~8 years ago, 
and this oversight has flown under the radar since then, only to be exposed 
recently when we intentionally (for unrelated operational reasons) partially 
degraded our port 80 services.
  
  I've made a new ticket, since that seems better all around.  Let's move the 
rest of this discussion there.

TASK DETAIL
  https://phabricator.wikimedia.org/T330906

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ennomeijers, BBlack
Cc: TheDJ, jbond, bking, akosiaris, Nikki, Vgutierrez, BBlack, Ennomeijers, 
Aklapper, Astuthiodit_1, KOfori, karapayneWMDE, joanna_borun, Invadibot, 
Devnull, maantietaja, Muchiri124, ItamarWMDE, Akuckartz, Legado_Shulgin, 
ReaperDawn, Nandana, Davinaclare77, Techguru.pc, Lahi, Gq86, GoranSMilovanovic, 
Hfbn0, QZanden, LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Wong128hk, 
Wikidata-bugs, aude, faidon, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T331356: Wikidata seems to still be utilizing insecure HTTP URIs

2023-03-06 Thread BBlack
BBlack created this task.
BBlack triaged this task as "High" priority.
BBlack added projects: Wikidata, Traffic.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: wdwb-tech.

TASK DESCRIPTION
  It has come to our attention via T330906 
<https://phabricator.wikimedia.org/T330906> that some part of the Wikidata 
software/ecosystem is emitting insecure HTTP URIs that some UAs are consuming 
for insecure access.  We need to find a way to secure these accesses.  We also 
need to understand a little more about the nature of the use of these URIs as 
identifiers and what the challenges are in changing them at some level (either 
rewriting them just for output purposes, or changing them in a deeper way).

TASK DETAIL
  https://phabricator.wikimedia.org/T331356

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Aklapper, BBlack, Astuthiodit_1, KOfori, karapayneWMDE, Invadibot, 
maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T330906: HTTP URIs do not resolve from NL and DE?

2023-03-06 Thread BBlack
BBlack reopened this task as "Open".
BBlack added a comment.


  In T330906#8661013 <https://phabricator.wikimedia.org/T330906#8661013>, 
@Ennomeijers wrote:
  
  > As I already mentioned earlier, the SPARQL endpoint and the RDF serialized 
data all use the HTTP version as the canonical identifier. This makes sense to 
me and is, as far as I know, in line with other linked data best practices. But 
there needs to be a machine readable way to access the data.
  >
  > Using a 301 to redirect to the HTTPS url is the correct approach and in 
fact this is already implemented and currently working again from my end. When 
I run the same command as mentioned in my first report I now do get a 301 
reply. I hope this will keep working in this way until HTTP URIs are no longer used 
within WD. I will close the issue for now.
  
  Please don't close this task unless we're replacing it with a more-focused 
one on the uncovered issues here.  For the reasons stated earlier, relying on 
the 301 to "fix" this is not the correct approach.  We can open a separate new 
task if you prefer, but either way we need to get this properly addressed (by 
having all live links in our control use the proper canonical URIs via 
`https://`).

TASK DETAIL
  https://phabricator.wikimedia.org/T330906

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ennomeijers, BBlack
Cc: bking, akosiaris, Nikki, Vgutierrez, BBlack, Ennomeijers, Aklapper, 
Astuthiodit_1, KOfori, karapayneWMDE, joanna_borun, Invadibot, Devnull, 
maantietaja, Muchiri124, ItamarWMDE, Akuckartz, Legado_Shulgin, ReaperDawn, 
Nandana, Davinaclare77, Techguru.pc, Lahi, Gq86, GoranSMilovanovic, Hfbn0, 
QZanden, LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Wong128hk, 
Wikidata-bugs, aude, faidon, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T330906: HTTP URIs do not resolve from NL and DE?

2023-03-02 Thread BBlack
BBlack added a comment.


  In T330906#8657917 <https://phabricator.wikimedia.org/T330906#8657917>, 
@Ennomeijers wrote:
  
  > Thanks for the replies! Advising to use HTTPS over HTTP makes sense.
  >
  > But not supporting redirection from HTTP to HTTPS will in my opinion 
introduce a fundamental problem for using Wikidata as a source for Linked Data. 
When querying Wikidata through the sparql endpoint the entities of the result 
set are all HTTP URIs. The RDF description of WD entities (accessed as 
described on https://www.wikidata.org/wiki/Wikidata:Data_access) contain many 
HTTP URIs for related entities and other resources.
  >
  > Using the HTTP as identifier for the entity is not problematic as long as 
the redirection from HTTP to HTTPS can deliver access to the data itself.
  
  
  
  In T330906#8660183 <https://phabricator.wikimedia.org/T330906#8660183>, 
@Ennomeijers wrote:
  
  > I think this touches upon a fundamental question of how to model WD 
information as Linked Data. As currently stated in 
https://www.wikidata.org/wiki/Wikidata:Data_access the //concept URI// of an 
entity is its **HTTP** version.
  
  We don't have plans to get rid of port 80 HTTP->HTTPS redirects anytime soon. 
 However, we consider that traffic pretty low priority, and in this particular 
case we partially disabled it temporarily while dealing with an operational 
incident, which luckily led to us uncovering this issue!
  
  However, the canonical (i.e. "official", "should be used in all links") URIs 
for traffic/access to all Wikimedia project domains are HTTPS URIs, not HTTP 
ones.  We shouldn't be publishing plain-HTTP URIs.
  
  The HTTP->HTTPS redirects are designed to help smooth over issues with legacy 
links we don't control out in the wild Internet, when accessed by UAs that 
don't respect HSTS.  These redirects, by their nature, are **not secure**.  
When users access content through plain-HTTP URIs, even though we try to serve a 
helpful 301 redirect to HTTPS, literally anyone on the Internet path between 
the user and the WMF can both see and modify both the request and the response 
in flight.
  
  The initial, insecure request via plain-HTTP can be censored, surveilled, and 
modified.  This means individual resources can be blocked/censored/replaced by 
bad actors.  The article names you're reading can be catalogued to build 
profiles on readers.  The intended 301 redirect can be replaced with something 
completely different, such as a redirect to a different site, an alternative 
version of our content, or even an attack payload or banner ad injection.
  
  All of that aside, HTTP access is also going to perform worse, as you have to 
do the full TCP and HTTP transaction (multiple latency roundtrips) just to get 
the redirect response, then start over with a fresh HTTPS transaction on a 
fresh TCP connection (more redundant network roundtrips).  Normal 
redirects that stay within one protocol and domainname can generally re-use the 
same connection, but not HTTP->HTTPS protocol upgrade redirects.
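  
  A quick way to see both costs (purely illustrative; the numbers depend 
entirely on your network path) is to time the same fetch once via the 
plain-HTTP redirect path and once directly over HTTPS:
  
      import time
      import urllib.request
      
      def timed_fetch(url):
          start = time.perf_counter()
          urllib.request.urlopen(url).read()
          return time.perf_counter() - start
      
      # Goes over port 80 first, gets the 301, then reconnects over TLS:
      via_redirect = timed_fetch("http://www.wikidata.org/entity/Q42")
      # Speaks TLS from the very first connection:
      direct_https = timed_fetch("https://www.wikidata.org/entity/Q42")
      print("via redirect: %.3fs, direct https: %.3fs" % (via_redirect, direct_https))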
  
  For all of these reasons, for Traffic purposes, all canonical URIs for our 
projects are HTTPS, not HTTP.  We hadn't been aware that anything 
wikidata-related was publishing canonical URIs that start with `http://`, and we're 
collectively going to need to find a way to stop doing that.
  
  > Accessing the data associated with the concept URI should be possible both 
for humans (through browsers) and for applications. Can you point me to 
examples for machine readable processing using the HTTP Strict Transport 
Security implementation or is this a browser only solution?
  
  HSTS is basically a legacy transition mechanism, much like the redirects, but 
both stronger and less-universal.  It's defined in 
https://www.rfc-editor.org/rfc/rfc6797 .  Its goal is to help paper over issues 
exactly like these - the first time you access 
`https://www.wikidata.org/`, you get an extra header that informs the 
UA that all future accesses to this whole domain should be upgraded to HTTPS 
without attempting plain HTTP at all.  Further, there's a public "HSTS Preload" 
list at https://hstspreload.org/ that all modern browsers utilize, and which 
contains all of our domains.  This avoids the problem of first access and HSTS 
caching, so that Preloading UAs transform even the first HTTP access to HTTPS 
before sending anything over the network.
  
  It's not specific to browsers; it's implemented as some generic headers that 
are intended to be honored by any UA, but obviously many less-user-focused UAs 
(various HTTP library implementations for scripts, the curl CLI tool, etc) do 
not necessarily implement it strongly.
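  
  For a non-browser UA, honoring HSTS means doing this bookkeeping yourself.  A 
minimal sketch (simplified relative to RFC 6797; the header value shown is only 
an example) of reading and interpreting the header:
  
      import urllib.request
      
      resp = urllib.request.urlopen("https://www.wikidata.org/")
      hsts = resp.headers.get("Strict-Transport-Security", "")
      print(hsts)   # e.g. "max-age=106384710; includeSubDomains; preload"
      
      max_age = 0
      for directive in (d.strip() for d in hsts.split(";")):
          if directive.lower().startswith("max-age="):
              max_age = int(directive.split("=", 1)[1])
      # The UA should now refuse plain HTTP to this domain (and, with
      # includeSubDomains, to its subdomains) for max_age seconds.
      print("upgrade everything to HTTPS for", max_age, "seconds")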

TASK DETAIL
  https://phabricator.wikimedia.org/T330906

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: akosiaris, Nikki, Vgutierrez, BBlack, Ennomeijers, 

[Wikidata-bugs] [Maniphest] T284981: SELECT query arriving to wikidatawiki db codfw hosts causing pile ups during schema change

2021-10-08 Thread BBlack
BBlack added a comment.


  We chose S:BP for those queries on the assumption that, by its nature, it 
would be a cheap page to monitor.  Is there a better option we should be using, 
or is this ticket more about fixing inefficiencies in it?

TASK DETAIL
  https://phabricator.wikimedia.org/T284981

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: BBlack, ArielGlenn, Ladsgroup, Addshore, Aklapper, Marostegui, 
joanna_borun, the0001, Invadibot, Zabe, Selby, Devnull, AndreCstr, maantietaja, 
XeroS_SkalibuR, lmata, Muchiri124, Akuckartz, RhinosF1, Legado_Shulgin, 
DannyS712, ReaperDawn, Nandana, Mirahamira, Davinaclare77, Techguru.pc, Lahi, 
Gq86, Markhalsey, GoranSMilovanovic, Jayprakash12345, Hfbn0, QZanden, 
LawExplorer, Zppix, _jensen, rosalieper, Scott_WUaS, Wong128hk, Wikidata-bugs, 
aude, mark, faidon, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T266702: Move WDQS UI to microsites

2020-10-29 Thread BBlack
BBlack added a comment.


  We can route different URI subspaces differently at the edge layer, based on 
URI regexes, as shown here for the split of the API namespace of the primary 
wiki sites:
  
  
https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/hieradata/common/profile/trafficserver/backend.yaml#263
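  
  Purely as an illustration of the mechanism (the patterns and backend names 
below are made up; the real rules live in the hieradata file linked above), 
the idea is an ordered list of path regexes mapped to backends, first match 
wins:
  
      import re
      
      # Hypothetical routing table.
      ROUTES = [
          (re.compile(r"^/w/api\.php"), "api-backend"),
          (re.compile(r"^/sparql"), "wdqs-ui-backend"),
      ]
      DEFAULT_BACKEND = "text-backend"
      
      def pick_backend(path):
          for pattern, backend in ROUTES:
              if pattern.search(path):
                  return backend
          return DEFAULT_BACKEND
      
      print(pick_backend("/sparql?query=..."))   # -> wdqs-ui-backend
      print(pick_backend("/wiki/Main_Page"))     # -> text-backend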

TASK DETAIL
  https://phabricator.wikimedia.org/T266702

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: BBlack, Gehel, Dzahn, Addshore, Aklapper, lmata, CBogen, Akuckartz, 
Legado_Shulgin, Nandana, Namenlos314, Davinaclare77, Qtn1293, Techguru.pc, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, 
Mahir256, QZanden, EBjune, merbst, LawExplorer, Salgo60, Zppix, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Lydia_Pintscher, faidon, Mbch331, Rxy, 
Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T237319: 502 errors on ATS/8.0.5

2019-11-26 Thread BBlack
BBlack added a comment.


  I think you ran into a temporary blip in some unrelated DNS work (which is 
already dealt with), not this bug (502 errors can happen for real infra failure 
reasons, too!).

TASK DETAIL
  https://phabricator.wikimedia.org/T237319

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ladsgroup, BBlack
Cc: BBlack, MoritzMuehlenhoff, darthmon_wmde, elukey, Addshore, WMDE-leszek, 
Ladsgroup, CDanis, Joe, Vgutierrez, ema, Nikerabbit, DannyS712, Aklapper, 
Legado_Shulgin, Nandana, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, 
GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, LawExplorer, Zppix, _jensen, 
rosalieper, Scott_WUaS, Jonas, Wong128hk, Wikidata-bugs, aude, Lydia_Pintscher, 
faidon, Mbch331, Rxy, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T232006: LDF service does not Vary responses by Accept, sending incorrect cached responses to clients

2019-09-18 Thread BBlack
BBlack added a comment.


  We'll also need to normalize the incoming `Accept` headers up in the edge 
cache layer to avoid pointless vary explosions.  Ideally the normalization 
should exactly match the application-layer logic that chooses the output 
content type.  Do you have a pseudo-code description (a link to real code is 
fine too) of how `Accept` is parsed to select content-types?
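  
  For illustration, a sketch of the kind of normalization meant here (the 
supported types below are assumptions, not the LDF service's actual list): 
collapse whatever clients send down to the few values the application can 
produce, so the cache stores only a handful of variants per URL:
  
      SUPPORTED = ["application/trig", "text/turtle", "application/n-triples",
                   "text/html"]
      DEFAULT = "text/turtle"
      
      def normalize_accept(accept_header):
          """Return the first supported type mentioned, else the default."""
          for part in accept_header.split(","):
              media_type = part.split(";")[0].strip().lower()
              if media_type == "*/*":
                  return DEFAULT
              if media_type in SUPPORTED:
                  return media_type
          return DEFAULT
      
      # The edge cache would overwrite the incoming header with this value
      # before hashing/forwarding the request.
      print(normalize_accept("text/turtle;q=0.9, */*;q=0.1"))  # -> text/turtle
      print(normalize_accept("application/pdf"))               # -> text/turtle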

TASK DETAIL
  https://phabricator.wikimedia.org/T232006

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: BBlack, Lucas_Werkmeister_WMDE, Aklapper, Lexie_23Kr, Hook696, Daryl-TTMG, 
RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, Meekrab2012, 
joker88john, Legado_Shulgin, DannyS712, CucyNoiD, Nandana, NebulousIris, 
thifranc, AndyTan, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, 
Davinaclare77, Adrian1985, Qtn1293, Cpaulf30, Techguru.pc, Lahi, Gq86, Af420, 
Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, 
Hfbn0, Ramalepe, Liugev6, QZanden, EBjune, merbst, LawExplorer, WSH1906, 
Lewizho99, Zppix, Maathavan, _jensen, rosalieper, Jonas, Xmlizer, Wong128hk, 
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, 
faidon, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T99531: [Task] move wikiba.se webhosting to wikimedia cluster

2019-08-14 Thread BBlack
BBlack added a comment.


  As noted in T155359 <https://phabricator.wikimedia.org/T155359> - WMDE has 
moved the hosting of this to some other platform, including the DNS hosting 
(and we never had the whois entry).  So this task can resolve as Decline I 
think (or whatever), but we should use it to track down various revert patches 
first before we close it up (revert the DNS repo stuff and whatever else we've 
got going on in various other repos supporting the wikiba.se site).

TASK DETAIL
  https://phabricator.wikimedia.org/T99531

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Abraham, Franziska_Heine, CRoslof, MasinAlDujailiWMDE, WMDE-leszek, abian, 
BBlack, Lucas_Werkmeister_WMDE, Stashbot, gerritbot, Dzahn, Lydia_Pintscher, 
mark, PokestarFan, faidon, Ladsgroup, Ivanhercaz, Addshore, Jonas, 
JeroenDeDauw, hoo, JanZerebecki, Aklapper, Hook696, Daryl-TTMG, RomaAmorRoma, 
0010318400, E.S.A-Sheild, darthmon_wmde, joker88john, Legado_Shulgin, 
DannyS712, CucyNoiD, Nandana, NebulousIris, thifranc, jijiki, AndyTan, 
Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Davinaclare77, Adrian1985, 
Qtn1293, Cpaulf30, Techguru.pc, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, 
Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Hfbn0, Ramalepe, Liugev6, 
QZanden, LawExplorer, WSH1906, Lewizho99, Zppix, Maathavan, _jensen, 
rosalieper, Wong128hk, Wikidata-bugs, aude, Jdforrester-WMF, Mbch331, Jay8g, 
fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T99531: [Task] move wikiba.se webhosting to wikimedia cluster

2019-04-25 Thread BBlack
BBlack added a comment.


  @WMDE-leszek Thanks for looking into it!  I believe @CRoslof is who you want 
to coordinate with on our end, whose last statement on this topic back in 
January was:
  
  In T99531#4878798 <https://phabricator.wikimedia.org/T99531#4878798>, 
@CRoslof wrote:
  
  > Transferring the domain name from WMDE to the Foundation requires that WMDE 
complete an ownership change form. I emailed with @Abraham and the Foundation's 
domain name registrar about it a while back, but the paperwork was never 
completed. Let me know when WMDE is ready to move forward with the transfer.

TASK DETAIL
  https://phabricator.wikimedia.org/T99531

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Dzahn, BBlack
Cc: Abraham, Franziska_Heine, CRoslof, MasinAlDujailiWMDE, WMDE-leszek, abian, 
BBlack, Lucas_Werkmeister_WMDE, Stashbot, gerritbot, Dzahn, Lydia_Pintscher, 
mark, PokestarFan, faidon, Ladsgroup, Ivanhercaz, Addshore, Jonas, 
JeroenDeDauw, hoo, JanZerebecki, Aklapper, alaa_wmde, joker88john, 
Legado_Shulgin, CucyNoiD, Nandana, NebulousIris, thifranc, AndyTan, Gaboe420, 
Versusxo, Majesticalreaper22, Giuliamocci, Davinaclare77, Adrian1985, Qtn1293, 
Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, 
GoranSMilovanovic, Adik2382, Th3d3v1ls, Hfbn0, Ramalepe, Liugev6, QZanden, 
LawExplorer, WSH1906, Lewizho99, Zppix, Maathavan, _jensen, rosalieper, 
Wong128hk, Wikidata-bugs, aude, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T99531: [Task] move wikiba.se webhosting to wikimedia cluster

2019-04-25 Thread BBlack
BBlack added a comment.


  Re: `wikibase.org`, adding it as a non-canonical redirection to catch 
confusion from those that manually type URLs is fine, but we should make sure 
everyone is clear on which domainname is canonical for this project (I assume 
`https://wikiba.se/`) and make sure that's the only one that's published, 
promoted, and used for links we control and such.  It's an important notion 
that one name is canonical!
  
  Re: the HSTS/HTTPS stuff in 
https://gerrit.wikimedia.org/r/c/operations/puppet/+/500711 :
  
  - It's policy for canonical domains we support here, so there's no real 
debate about whether we'll end up with full-value HSTS and the HTTP->HTTPS 
redirect.
  - But we don't need this separate patch at the apache level; we'll handle it 
in VCL with the same code that handles the other canonical project domains.
  
  Re: handing off registration - I really think we should stop touching this 
whole project until this gets resolved, which means stalling on the above 
HSTS/HTTPS work and on the switch of IPs.
  
  This is a policy issue as well, which we've tried to explain politely more 
than once in this thread, but if it's going to end up being a blocker there's 
no point expending further effort on this until they figure out what direction 
they want to go.  I think the original statement way back from @Faidon was that 
it was a "very strong preference" that we get ownership transfer, but in fact 
we'd already made the declaration that it's a policy requirement about a week 
before that in 
https://wikitech.wikimedia.org/wiki/HTTPS#For_the_Foundation's_canonical_domainnames
 , and honestly I really don't want to wade into the mess of having an 
exception to those rules during all the future improvements we have coming at 
the DNS and HTTPS layers.  Even just applying strong HSTS while the ownership 
issues are unresolved seems irresponsible of us at best, as the hosting where 
WMDE currently has the site lacks HTTPS entirely.

TASK DETAIL
  https://phabricator.wikimedia.org/T99531

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Dzahn, BBlack
Cc: Abraham, Franziska_Heine, CRoslof, MasinAlDujailiWMDE, WMDE-leszek, abian, 
BBlack, Lucas_Werkmeister_WMDE, Stashbot, gerritbot, Dzahn, Lydia_Pintscher, 
mark, PokestarFan, faidon, Ladsgroup, Ivanhercaz, Addshore, Jonas, 
JeroenDeDauw, hoo, JanZerebecki, Aklapper, alaa_wmde, joker88john, 
Legado_Shulgin, CucyNoiD, Nandana, NebulousIris, thifranc, AndyTan, Gaboe420, 
Versusxo, Majesticalreaper22, Giuliamocci, Davinaclare77, Adrian1985, Qtn1293, 
Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, 
GoranSMilovanovic, Adik2382, Th3d3v1ls, Hfbn0, Ramalepe, Liugev6, QZanden, 
LawExplorer, WSH1906, Lewizho99, Zppix, Maathavan, _jensen, rosalieper, 
Wong128hk, Wikidata-bugs, aude, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T217897: Reduce / remove the aggessive cache busting behaviour of wdqs-updater

2019-03-12 Thread BBlack
BBlack added a comment.


  I think it would be better, from my perspective, to really understand the 
use-cases (which I currently don't).  Why do these remote clients need 
"realtime" (no staleness) fetches of Q items?  From what I hear, it sounds like 
all clients expect everything to be perfectly synchronous, but I don't 
understand why they need to be.  In the case that led to this ticket, it was a 
remote client at Orange issuing a very high rate of these uncacheable queries, 
which looks like a bulk data load/update process, not an "I just edited this 
thing and need to see my own edits reflected" sort of case.

TASK DETAIL
  https://phabricator.wikimedia.org/T217897

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Smalyshev, BBlack, Aklapper, Gehel, alaa_wmde, Legado_Shulgin, Nandana, 
thifranc, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, 
Zppix, _jensen, rosalieper, Jonas, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T217897: Reduce / remove the aggessive cache busting behaviour of wdqs-updater

2019-03-08 Thread BBlack
BBlack added a comment.


  Looking at an internal version of the flavor=dump outputs of an entity, 
related observations:
  
  Test request from the inside: `curl -v 
'https://www.wikidata.org/wiki/Special:EntityData/Q15223487.ttl?flavor=dump' 
--resolve www.wikidata.org:443:10.2.2.1`
  
  - There is LM data, for this QID it currently says:  `last-modified: Fri, 08 
Mar 2019 06:24:59 GMT`
  - This could be used with standard HTTP conditional requests via 
`If-Modified-Since`.  That would still cause a ping through to the applayer, 
but would not transfer the body if nothing has changed (see the sketch after 
this list).
  - Or alternatively, use the same data that's informing the LM/IMS conditional 
stuff to set metadata in the dump output as well, so that your queries can use 
this as a datestamp that's shared among more clients (this is basically the 
`use event date` idea from the summary), so that it doesn't even need an LM/IMS 
roundtrip and can be a true cache hit.
  - The CC header is: `cache-control: public, s-maxage=3600, max-age=3600`
  - 1H seems short in general.  We prefer 1d+ for the actual CC times 
advertised by major cacheable production endpoints, so that everything doesn't 
go stale too quickly during minor maintenance work on a cache or a site.  Is 
there a reason it's set so low?  (Often it's set low because other issues, like 
purging and this kind of update traffic, aren't well-engineered yet.)
  - However, assuming the 1H is staying for now, can't updaters just be ok with 
up to 1H of stale data and not cache bust at all?  There's no such thing as 
async+realtime; there's always some staleness, it's just a question of how much 
is tolerable for the use-case.
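  
  A minimal sketch of the conditional-request idea from the bullets above 
(plain standard HTTP; nothing Wikidata-specific is assumed beyond the URL 
already quoted): remember `Last-Modified` and replay it as `If-Modified-Since`; 
a 304 means the cached copy is still good and no body is transferred:
  
      import urllib.request
      from urllib.error import HTTPError
      
      url = ("https://www.wikidata.org/wiki/Special:EntityData/"
             "Q15223487.ttl?flavor=dump")
      
      first = urllib.request.urlopen(url)
      body = first.read()
      last_modified = first.headers.get("Last-Modified")
      
      if last_modified:
          req = urllib.request.Request(
              url, headers={"If-Modified-Since": last_modified})
          try:
              body = urllib.request.urlopen(req).read()  # 200: changed, new body
          except HTTPError as err:
              if err.code != 304:
                  raise                                   # 304: keep cached body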

TASK DETAIL
  https://phabricator.wikimedia.org/T217897

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: BBlack, Aklapper, Gehel, alaa_wmde, Legado_Shulgin, Nandana, thifranc, 
AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, 
Zppix, _jensen, rosalieper, Jonas, Xmlizer, Wong128hk, jkroll, Smalyshev, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, 
fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T99531: [Task] move wikiba.se webhosting to wikimedia cluster

2019-02-20 Thread BBlack
BBlack added a comment.
There are different layers of "handing off" DNS management which are being conflated, but to run through them in order:


 "Point the A record to the right place" - We don't support this, and can't realistically.  We need control of the zone data directly on our nameservers for a variety of technical reasons (e.g. setting policy controls like CAA authorizations, future ESNI-related records, etc), and we don't use simple A-records, we use a dynamic system that hands out any of a number of addresses to the nearest of our global edge datacenters, and these things evolve over time on a technical level.  If we're going to host something in our production infrastructure and manage it correctly, we have to move to at least the next level of handoff:



Leaving the registration of the domain with WMDE and their registrar, but having the Nameservers pointed at WMF nameservers.  This is what we've already done earlier in the ticket and where we're at now.  Currently the domain is registered to WMDE (presumably, it's hidden in public view) via registrar "united-domains", and the nameserver values are set to the 3x WMF nameserver hosts (ns0.wikimedia.org, ns1.wikimedia.org, and ns2.wikimedia.org, at specific IP addresses for each).  This allows the WMF nameservers and SRE staff to do all the basic technical things referenced above, and is the first logical step before:



Switching the registration to WMF's registrar/ownership.  This is more on a policy/standards/legal level, and maybe @CRoslof can give more details than me on that front about legal-related things.  It would be odd in the general case to be canonicalizing a WMF domain without registrar control though, as it could be swapped out from under us at any time.  However, even on a purely technical level it matters to us as well: we have future plans to deploy more authdns servers, change their IPs, and deploy anycasted authdns as well, all of which require WMF to have tight control over the registrar settings for all the domains we host resources for so that we can get through transition periods smoothly as those ns[012] hostnames and their IPs change.  It's not scalable for those processes to involve contacting N third parties and having them all indirectly contact their registrars on our behalf, etc.
TASK DETAIL
  https://phabricator.wikimedia.org/T99531

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Dzahn, BBlack
Cc: Abraham, Franziska_Heine, CRoslof, MasinAlDujailiWMDE, WMDE-leszek, abian, BBlack, Lucas_Werkmeister_WMDE, Stashbot, gerritbot, Dzahn, Lydia_Pintscher, mark, PokestarFan, faidon, Ladsgroup, Ivanhercaz, Addshore, Jonas, JeroenDeDauw, hoo, JanZerebecki, Aklapper, Legado_Shulgin, Nandana, thifranc, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, LawExplorer, Zppix, _jensen, Wong128hk, Wikidata-bugs, aude, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T99531: [Task] move wikiba.se webhosting to wikimedia cluster

2018-12-13 Thread BBlack
BBlack added a comment.
There are still a couple of things that can be done serially at present, one of which is necessary for the cert issuance later:

1. Switch the nameservers for wikiba.se to ns[012].wikimedia.org with your current registrar (United Domains).  We have to have this to later issue the cert at all.  The cert likely won't be issued until sometime in Jan/Feb.
2. Switch the ownership/registration of wikiba.se to the Foundation and its registrar(s).  This isn't required to issue the cert on a technical level, but as a matter of general policy we'll want this done at some point before we're really hosting wikiba.se, and there's nothing blocking it after (1) is done above.
TASK DETAIL
  https://phabricator.wikimedia.org/T99531

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Dzahn, BBlack
Cc: MasinAlDujailiWMDE, WMDE-leszek, abian, BBlack, Lucas_Werkmeister_WMDE, Stashbot, gerritbot, Dzahn, Lydia_Pintscher, mark, greg, PokestarFan, faidon, Ladsgroup, Ivanhercaz, Addshore, Jonas, JeroenDeDauw, hoo, JanZerebecki, Aklapper, Legado_Shulgin, CucyNoiD, Nandana, NebulousIris, thifranc, AndyTan, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Davinaclare77, Adrian1985, Qtn1293, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Hfbn0, Ramalepe, Liugev6, QZanden, LawExplorer, Lewizho99, Zppix, Maathavan, _jensen, D3r1ck01, Wong128hk, Wikidata-bugs, aude, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T99531: [Task] move wikiba.se webhosting to wikimedia cluster

2018-11-21 Thread BBlack
BBlack added a comment.
Thanks for the data and the patch!  We'll dig into the DNS patch next week and get it merged in so we're serving wikiba.se from our DNS as-is (as in, pointing at your existing server IPs).  Then we can do handoff of the domain ownership/registration without causing any interruptions.

That gets us over the first handoff hurdle, at which point it's on #Traffic to get wikiba.se certs added to our cache clusters using DNS-based validation (again, server IPs still pointing at the current server throughout).  We're migrating our existing LE certs to a new solution this quarter ( T207050 ), and after that's done we'll look early in the next quarter at defining these new certs' slightly more-complicated case.  Once those are issued and deployed, you'll have some time (if needed!) to test and refine the data our version of the site is hosting ( https://gerrit.wikimedia.org/r/plugins/gitiles/wikibase/wikiba.se/+/master ), and then we switch server IPs and we're done.

TASK DETAIL
  https://phabricator.wikimedia.org/T99531

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Dzahn, BBlack
Cc: MasinAlDujailiWMDE, WMDE-leszek, abian, BBlack, Lucas_Werkmeister_WMDE, Stashbot, gerritbot, Dzahn, Lydia_Pintscher, mark, greg, PokestarFan, faidon, Ladsgroup, Ivanhercaz, Addshore, Jonas, JeroenDeDauw, hoo, JanZerebecki, Aklapper, Legado_Shulgin, CucyNoiD, Nandana, NebulousIris, thifranc, AndyTan, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Davinaclare77, Adrian1985, Qtn1293, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Hfbn0, Ramalepe, Liugev6, QZanden, LawExplorer, Lewizho99, Zppix, Maathavan, D3r1ck01, Wong128hk, Wikidata-bugs, aude, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T206105: Optimize networking configuration for WDQS

2018-10-15 Thread BBlack
BBlack added a comment.
Yes, let's look at this today.  I think we need better tg3 ethernet card support in interface::rps for one of our authdnses anyways, which you'll need here too.

TASK DETAIL
  https://phabricator.wikimedia.org/T206105

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Gehel, BBlack
Cc: gerritbot, BBlack, Aklapper, Gehel, CucyNoiD, Nandana, NebulousIris, thifranc, AndyTan, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Davinaclare77, Adrian1985, Qtn1293, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, Hfbn0, Ramalepe, Liugev6, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Zppix, Maathavan, Jonas, Xmlizer, Wong128hk, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T99531: [Task] move wikiba.se webhosting to wikimedia misc-cluster

2018-08-18 Thread BBlack
BBlack added a comment.
There are plans underway at this point to support multiple LE certs on our standard cache terminators via the work in T199711 due by EOQ (end of Sept), which would make this whole thing simpler, with zero cert cost.  I couldn't say for sure how fast we'll shake out all the bugs in such a system after initial deployment, but I'd hope quickly.

In the interim, our best option aside from waiting would be to purchase a commercial DV wikiba.se cert and deploy it on the caches (which requires a little bit of testing, we haven't run multiple SNI certs there in a while now).  Nobody's worked on this in a while on our end, mostly for lack of priority/time/focus.

In either case, the first few steps are relatively-trivial and would be the same:


Create a wikiba.se microsite in WMF infra (already done by @Dzahn I believe, sourcing from https://gerrit.wikimedia.org/r/plugins/gitiles/wikibase/wikiba.se/+/master )
Create a wikiba.se template in our authdns, matching the current data (including current non-WMF server IPs) - any complications here, e.g. MX service is currently to udag.de, we can mirror that setting for now I guess.  Any other service hostnames besides wikiba.se and www.wikiba.se pointing at 89.31.143.100?
Move authdns control for wikiba.se over to the WMF nameservers (no-op for users, but allows DV on our end).
[Issue commercial DV cert to caches to avoid waiting, and/or deploy automated LE DV cert to caches at a later date]
TASK DETAIL
  https://phabricator.wikimedia.org/T99531

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Dzahn, BBlack
Cc: abian, BBlack, Lucas_Werkmeister_WMDE, Liuxinyu970226, Stashbot, gerritbot, Dzahn, Lydia_Pintscher, mark, greg, PokestarFan, faidon, Ladsgroup, Ivanhercaz, Addshore, Jonas, JeroenDeDauw, thiemowmde, hoo, JanZerebecki, Aklapper, AndyTan, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Davinaclare77, Adrian1985, Qtn1293, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Hfbn0, Ramalepe, Liugev6, QZanden, LawExplorer, Lewizho99, Zppix, Maathavan, Wong128hk, Wikidata-bugs, aude, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T199219: WDQS should use internal endpoint to communicate to Wikidata

2018-07-11 Thread BBlack
BBlack added a comment.
It's a complicated topic I think, on our end.  There are ways to make it work today, but when I try to write down generic steps any internal service could take to talk to any other (esp MW or RB), it bogs down in complications that are probably less than ideal in various language/platform contexts.

For this very particular case, the simplest way would be to do your language/platform/library's equivalent of:

curl -H 'Host: www.wikidata.org' 'https://appservers-ro.discovery.wmnet/wiki/Special:EntityData/Q2408871.ttl?nocache=1530836328152&flavor=dump'

That is, use the internal service endpoint hostname in the URI for TLS connection purposes, but then explicitly set the request Host header to www.wikidata.org for use at the HTTP level.
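
For example, the Python equivalent of the curl command above (a sketch only; it assumes you are on a production host that can resolve and validate the internal endpoint name): connect to the internal service name for TLS purposes, but send the public Host header so MediaWiki routes the request to the right wiki.

    import urllib.request

    req = urllib.request.Request(
        "https://appservers-ro.discovery.wmnet/wiki/Special:EntityData/"
        "Q2408871.ttl?flavor=dump",
        headers={"Host": "www.wikidata.org"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.headers.get("content-type"))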

Whether you need the appservers-rw or api-ro or restbase-async (...) for a particular URL path for other cases underneath www.wikidata.org is the deep complication here.

TASK DETAIL
  https://phabricator.wikimedia.org/T199219

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: BBlack, Aklapper, Smalyshev, Gehel, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T199146: "Blocked" response when trying to access constraintsrdf action from production host

2018-07-09 Thread BBlack
BBlack added a comment.
This raises some questions that are probably unrelated to the problem at hand, but might affect things indirectly:


Why is an internal service (wdqs) querying a public endpoint?  It should probably use private internal endpoints like appservers.svc or api.svc, but there may be arguments about desirability of [Varnish] caching.  This is something we're grappling with in general in the longer-term (trying to understand and/or eliminate private internal service<->service traffic routing through the public edge unnecessarily).
Why is it using webproxy to access it?  It should be able to reach www.wikidata.org without any kind of proxy.
TASK DETAIL
  https://phabricator.wikimedia.org/T199146

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Mahir256, Jonas, Aklapper, BBlack, Gehel, Smalyshev, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Zppix, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T99531: [Task] move wikiba.se webhosting to wikimedia misc-cluster

2017-12-11 Thread BBlack
BBlack added a comment.
It's a pain any direction we slice this, and I'm not fond of adding new canonical domains outside the known set for individual low-traffic projects.  We didn't add new domains for a variety of other public-facing efforts (e.g. wdqs, ORES, maps, etc).

We don't have clear standards about these things, and frankly some of the existing legacy canonical project domains currently bloating our unified certs probably wouldn't comply with any standard I'd want to put in place here going forward, either.  Those other projects domains are "special" though, in that they can only reasonably be handled via commercial wildcards today due to the language-subdomain issues.

We're not structured in such a way that adding new domain registrations to our termination is trivial, and I'm not sure it's ever going to be completely trivial.  There is always going to be some overhead associated with it, and we don't in the general case want to build up a pile of canonical domains that are mostly low-traffic and/or defunct but kept around for historical compatibility.

The three paths forward to support the unique wikiba.se domainname on our termination are:


1. Add it to our unified certs that we just renewed.  This costs some $$ ongoing per-year, and will require us to prove ownership of the domain first before we integrate it (we need it in our DNS control either way).  It also bloats the unified certificate size sent with every session on all projects (e.g. every TLS handshake for enwiki), which makes it really unpalatable to add new things here that could just as easily have been wikimedia.org subdomains for smaller projects "for free".

2. Add it as a separate commercial cert deployed alongside the unified.  Same $$ ongoing.  More maintenance burden on our end (e.g. accounting for it in OCSP Stapling and nginx server configs, etc).  We've had multiple certificates deployed like this in the past (for wmfusercontent.org before it was integrated into the unified wildcards cert), but there have been several refactors during the era of one-cert-only, and so I'm not sure there isn't some debt to clean up before we successfully switch back to multiple, independent certs.

3. Add it separately as above, but using LetsEncrypt.  This avoids the $$ cost, but adds additional complexities to deal with initially, as our current puppetized deployment of LE certs isn't robust enough for our primary traffic terminators, only for smaller single-host/one-off sites.  It lacks dual-cert support (as in ECDSA+RSA), it lacks OCSP Stapling integration, and most-importantly it doesn't know how to do renewal updates for a service with many global traffic termination points (i.e. we need to add support for centralizing the renewal process with updates out to all the global edge terminators, as well as support on all the terminators to forward challenge requests to the central renewer, etc).

TASK DETAIL
  https://phabricator.wikimedia.org/T99531

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Dzahn, BBlack
Cc: BBlack, Lucas_Werkmeister_WMDE, Liuxinyu970226, Stashbot, gerritbot, Dzahn, Lydia_Pintscher, mark, greg, PokestarFan, faidon, Ladsgroup, Ivanhercaz, Addshore, Jonas, JeroenDeDauw, thiemowmde, hoo, JanZerebecki, Aklapper, Lahi, Gq86, Baloch007, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Hfbn0, Ramalepe, Liugev6, QZanden, Lewizho99, Zppix, Maathavan, Wikidata-bugs, aude, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-11-22 Thread BBlack
BBlack added a comment.
No, we never made an incident report on this one, and I don't think it would be 

All we know for sure is that switching Varnish's default behavior from streaming to store-and-forward of certain applayer responses (which was our normal mode over a year ago) broke things, probably because some services are violating assumptions we hold.  Unfortunately proper investigation of this will stall for quite a while on our end, but we'll probably eventually come back with some analysis on that later and push for some fixups in various services so that we can move forward on that path again.  The RB timeouts mentioned earlier seem a more-likely candidate for what we'll eventually uncover than ORES at this point.

TASK DETAIL
  https://phabricator.wikimedia.org/T179156

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: daniel, Peachey88, ema, Gehel, Smalyshev, TerraCodes, Jay8g, Liuxinyu970226, Paladox, Zppix, Stashbot, gerritbot, thiemowmde, aude, Marostegui, Lucas_Werkmeister_WMDE, Legoktm, tstarling, awight, Ladsgroup, Lydia_Pintscher, ori, BBlack, demon, greg, Aklapper, hoo, Abo00tamr, Lahi, Gq86, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Hfbn0, Ramalepe, Liugev6, QZanden, Lewizho99, Maathavan, Mkdw, Liudvikas, srodlund, Luke081515, Wikidata-bugs, ArielGlenn, faidon, zeljkofilipin, Alchimista, He7d3r, Mbch331, Rxy, fgiunchedi, mmodell
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Changed Status] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-11-06 Thread BBlack
BBlack lowered the priority of this task from "High" to "Normal".
BBlack changed the task status from "Open" to "Stalled".
BBlack added a comment.
The timeout changes above will offer some insulation, and as time passes we're not seeing evidence of this problem recurring with the do_stream=false patch reverted.

Some related investigations on slow requests have turned up some pointers to 120-240s timeouts on requests to the REST API at /api/rest_v1/transform/wikitext/to/html, which are eerily similar to the kinds of problems we saw a while back in T150247 .  RB was dropping the connection from Varnish, and doing so in a way that Varnish would retry it indefinitely internally.  We patched Varnish to mitigate that particular problem in the past, but something related may be surfacing here...

We have a few steps to go here, but there are going to be considerable delays before we get to the end of all of this:


We have a preliminary patch to Varnish to limit the total response transaction time on backend requests (if the backend is dribbling response bytes often enough to evade hitting the between_bytes_timeout) at https://gerrit.wikimedia.org/r/#/c/387236/ .  However, the patch is built on Varnish v5, and cache_text currently runs Varnish v4.  We weren't planning to do any more Varnish v4 releases before moving all the clusters to v5 unless an emergency arose, as it complicates our process and timelines considerably, and this isn't enough of an emergency to justify it.  Therefore, this part is blocked on https://phabricator.wikimedia.org/T168529 .
We want to log slow backend queries so that we have a better handle on these cases in general.  There's ongoing work for this in https://gerrit.wikimedia.org/r/#/c/389515/ , https://gerrit.wikimedia.org/r/#/c/389516 , and more to come.  One of those patches also has the v4/v5 issues above and blocks on upgrading cache_text to v5.
With those measures in place, we should be able to definitively identify (and/or workaround) the problematic transactions and figure out what needs fixing at the application layer, at which point we can un-revert the do_stream=false and move forward with our other VCL plans around exp(-size/c) admission policies on cache_text frontends as part of T144187 (but none of this ties up doing the same on cache_upload).
TASK DETAILhttps://phabricator.wikimedia.org/T179156EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: BBlackCc: daniel, Peachey88, ema, Gehel, Smalyshev, TerraCodes, Jay8g, Liuxinyu970226, Paladox, Zppix, Stashbot, gerritbot, thiemowmde, aude, Marostegui, Lucas_Werkmeister_WMDE, Legoktm, tstarling, awight, Ladsgroup, Lydia_Pintscher, ori, BBlack, demon, greg, Aklapper, hoo, Lahi, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Hfbn0, Ramalepe, Liugev6, QZanden, Lewizho99, Maathavan, Mkdw, Liudvikas, srodlund, Luke081515, Wikidata-bugs, ArielGlenn, faidon, zeljkofilipin, Alchimista, He7d3r, Mbch331, Rxy, fgiunchedi, mmodell___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-30 Thread BBlack
BBlack added a comment.

In T179156#3720392, @BBlack wrote:

In T179156#3719995, @BBlack wrote:
We have an obvious case of normal slow chunked uploads of large files to commons to look at for examples to observe, though.


Rewinding a little: this is false, I was just getting confused by terminology.  Commons "chunked" uploads through UploadWizard are not HTTP chunked transfer encoding, which is what I meant by "chunked uploads" in the rest of this conversation.


In T179156#3720290, @daniel wrote:
"pass" means stream, right? wouldn't that also grab a backend connection from the pool, and hog it if throughput is slow?


I'm pretty sure non-piped client request bodies are not streamed to backends, looking at the code (even in the pass case), and we don't use pipe-mode in cache_text at all.  There's still an open question about whether that allows for resource exhaustion on the frontend (plain memory consumption, or slowloris-like), but again it's not the problem we're looking at here.

We've gathered those manually in specific cases so far. Aggregating them across the varnishes to somewhere central all the time will require some significant work I think.

How about doing this on the app servers instead of varnish? We do track MediaWiki execution time, right? Would it be possible to also track overall execution time, from the moment php starts receiving data, before giving control to mediawiki?

That would be nice too I think.  But at the end of the day, probably our Varnishes should assume that we won't necessarily have sane execution timeouts at all possible underlying applayer services (if nothing else because Bugs).  So we probably still want to capture this at the Varnish level as well.

Relatedly, I know hhvm has a max_execution_time parameter which we've set to 60s, so you'd *think* that would be a limit for the MediaWiki requests in question.  But on the other hand, I know during the weekend I logged requests going into (through?) the MW API for flow-parsoid stuff that timed out at ~80s (when we had the super-short Varnish timeouts configured as emergency workaround, which helped somewhat).


TASK DETAIL
  https://phabricator.wikimedia.org/T179156

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: daniel, Peachey88, ema, Gehel, Smalyshev, TerraCodes, Jay8g, Liuxinyu970226, Paladox, Zppix, Stashbot, gerritbot, thiemowmde, aude, Marostegui, Lucas_Werkmeister_WMDE, Legoktm, tstarling, awight, Ladsgroup, Lydia_Pintscher, ori, BBlack, demon, greg, Aklapper, hoo, Lahi, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Hfbn0, Ramalepe, Liugev6, QZanden, Lewizho99, Maathavan, Mkdw, Liudvikas, srodlund, Luke081515, Wikidata-bugs, ArielGlenn, faidon, zeljkofilipin, Alchimista, He7d3r, Mbch331, Rxy, fgiunchedi, mmodell
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-30 Thread BBlack
BBlack added a comment.

In T179156#3719995, @BBlack wrote:
We have an obvious case of normal slow chunked uploads of large files to commons to look at for examples to observe, though.


Rewinding a little: this is false, I was just getting confused by terminology.  Commons "chunked" uploads through UploadWizard are not HTTP chunked transfer encoding, which is what I meant by "chunked uploads" in the rest of this conversation.


In T179156#3720290, @daniel wrote:
"pass" means stream, right? wouldn't that also grab a backend connection from the pool, and hog it if throughput is slow?


I'm pretty sure non-piped client request bodies are not streamed to backends, looking at the code (even in the pass case), and we don't use pipe-mode in cache_text at all.  There's still an open question about whether that allows for resource exhaustion on the frontend (plain memory consumption, or slowloris-like), but again it's not the problem we're looking at here.

We've gathered those manually in specific cases so far. Aggregating them across the varnishes to somewhere central all the time will require some significant work I think.

How about doing this on the app servers instead of varnish? We do track MediaWiki execution time, right? Would it be possible to also track overall execution time, from the moment php starts receiving data, before giving control to mediawiki?

That would be nice too I think.  But at the end of the day, probably our Varnishes should assume that we won't necessarily have sane execution timeouts at all possible underlying applayer services (if nothing else because Bugs).  So we probably still want to capture this at the Varnish level as well.

Relatedly, I know hhvm has a max_execution_time parameter which we've set to 60s, so you'd *think* that would be a limit for the MediaWiki requests in question.  But on the other hand, I know during the weekend I logged requests going into (through?) the MW API for flow-parsoid stuff that timed out at ~80s (when we had the super-short Varnish timeouts configured as emergency workaround, which helped somewhat).

TASK DETAIL
  https://phabricator.wikimedia.org/T179156

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: daniel, Peachey88, ema, Gehel, Smalyshev, TerraCodes, Jay8g, Liuxinyu970226, Paladox, Zppix, Stashbot, gerritbot, thiemowmde, aude, Marostegui, Lucas_Werkmeister_WMDE, Legoktm, tstarling, awight, Ladsgroup, Lydia_Pintscher, ori, BBlack, demon, greg, Aklapper, hoo, Lahi, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, Hfbn0, Ramalepe, Liugev6, QZanden, Lewizho99, Maathavan, Mkdw, Liudvikas, srodlund, Luke081515, Wikidata-bugs, ArielGlenn, faidon, zeljkofilipin, Alchimista, He7d3r, Mbch331, Rxy, fgiunchedi, mmodell
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Lowered Priority] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-30 Thread BBlack
BBlack lowered the priority of this task from "Unbreak Now!" to "High".
BBlack added a comment.
Reducing this from UBN->High, because the current best working theory is that this problem is gone so long as we keep the VCL do_stream=false change reverted.  Obviously, there are still some related investigations ongoing, and I'm going to write up an Incident_Report about the 503s later today as well.


[Wikidata-bugs] [Maniphest] [Commented On] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-30 Thread BBlack
BBlack added a comment.

In T179156#3719928, @daniel wrote:
In any case, this would consume front-edge client connections, but wouldn't trigger anything deeper into the stack

That's assuming varnish always caches the entire request, and never "streams" to the backend, even for file uploads. When discussing this with @hoo he told me that this should be the case - but is it? That would make it easy to exhaust RAM on the varnish boxes, no?


Maybe? I haven't really delved deeply into this angle yet, because it seems less-likely to be the cause of the current issues.  We have an obvious case of normal slow chunked uploads of large files to commons to look at for examples to observe, though.  Because they're POST they'd be handled as an immediate pass through the varnish layers, so I don't think this would cause what we're looking at now.  GETs with request-bodies that were slowly-chunked-out might be different, I don't know yet.

this is definitely on the receiving end of responses from the applayer.

So a slow-request-log would help?

Yes.  We've gathered those manually in specific cases so far.  Aggregating them across the varnishes to somewhere central all the time will require some significant work I think.  Right now I'm more-worried about the fact that, since varnish doesn't log a transaction at all until the transaction is complete, without overall transaction timeouts on the backend connections there are cases that would slip through all possible logging (if they stayed open virtually-indefinitely and sent data often enough to evade the between_bytes_timeout check).


[Wikidata-bugs] [Maniphest] [Commented On] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-30 Thread BBlack
BBlack added a comment.
Trickled-in POST on the client side would be something else.  Varnish's timeout_idle, which is set to 5s on our frontends, acts as the limit for receiving all client request headers, but I'm not sure that it has such a limitation that applies to client-sent bodies.  In any case, this would consume front-edge client connections, but wouldn't trigger anything deeper into the stack.  We could/should double-check varnish's behavior there, but that's not what's causing this, this is definitely on the receiving end of responses from the applayer.
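
(For anyone double-checking this later: the client-side timeout parameters mentioned above can be inspected on a cache host roughly as follows; this is only a sketch, and the "-n frontend" instance name is an assumption.)

  # Sketch: show the timeout parameters relevant to this discussion (Varnish 4 names).
  varnishadm -n frontend param.show timeout_idle
  varnishadm -n frontend param.show send_timeout
  varnishadm -n frontend param.show between_bytes_timeout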


[Wikidata-bugs] [Maniphest] [Commented On] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-30 Thread BBlack
BBlack added a comment.

In T179156#3718772, @ema wrote:
There's a timeout limiting the total amount of time varnish is allowed to spend on a single request, send_timeout, defaulting to 10 minutes. Unfortunately there's no counter tracking when the timer kicks in,  although a debug line is logged to VSL when that happens. We can identify requests causing the "unreasonable" behavior as follows:

varnishlog -q 'Debug ~ "Hit total send timeout"'


Yeah, this might help find the triggering clients.  However, I don't know if the backend side of Varnish would actually abandon the backend request on send_timeout to the client.
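
(A sketch of how to catch the offenders when that debug line fires, extending the command above so the whole transaction group, including the URL and Host, is shown; options as in standard varnishlog:)

  # Sketch: group by request so both the client-side and backend-side records
  # of the offending transactions are shown together.
  varnishlog -g request -q 'Debug ~ "Hit total send timeout"'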


In T179156#3718957, @Lucas_Werkmeister_WMDE wrote:
Another thing that might be similar: for certain queries, the Wikidata Query Service can push out lots of results for up to sixty seconds (at which point the query is killed by timeout) or even longer (if the server returned results faster than they could be transferred – it seems the timeout only applies to the query itself). The simplest such query would be SELECT * WHERE { ?s ?p ?o. }; when I just tried it out (curl -d 'query=SELECT * WHERE { ?s ?p ?o. }' https://query.wikidata.org/sparql -o /dev/null), I received 1468M in 5 minutes (at which point I killed curl – I have no idea how much longer it would have continued to receive results). However, if I understand it correctly, WDQS’ proxy seems to be running in do_stream mode, since I’m receiving results immediately and continuously.


That's probably not causing the problem on text-lb, since query.wikidata.org goes through cache_misc at present.  But if there's no known actual-push traffic, the next-best hypothesis is behavior exactly like the above: something that's doing a legitimate request->response cycle, but trickling out the bytes of it over a very long period.  This would wrap back around to why we were looking at some of these cases before I think: could other services on text-lb be making these kinds of queries to WDQS on behalf of the client and basically proxying the same behavior through?


[Wikidata-bugs] [Maniphest] [Updated] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-29 Thread BBlack
BBlack added a comment.
Now that I'm digging deeper, it seems there are one or more projects in progress built around Push-like things, in particular T113125.  I don't see any evidence that there's been a live deploy of them yet, but maybe I'm missing something or other.  If we have a live deploy of any kind of push-like functionality through the text cluster, it's a likely candidate for the issues above in the short term.

In the long term, discussions about push services need to loop in #traffic much earlier in the process.  This kind of thing is a definite No through the current traffic termination architecture as it's configured today.  I've even seen some tickets mention the possibility of push for anonymous users (!).  The changes on our end to sanely accommodate various push technologies reliably at wiki-scale could potentially be very large and costly, and could involve carving out a separate parallel edge-facing architecture for this stuff, distinct from the edge architecture we use for simpler transactions.  We don't have any kind of long-term planning at the #traffic level around supporting this in our annual plans and headcounts, either.  It may seem like a small thing from some perspectives, but push notifications at wiki-scale is a huge sea-change on our end from simple HTTP transactions.


[Wikidata-bugs] [Maniphest] [Commented On] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-29 Thread BBlack
BBlack added a comment.
Does Echo have any kind of push notification going on, even in light testing yet?


[Wikidata-bugs] [Maniphest] [Commented On] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-28 Thread BBlack
BBlack added a comment.
A while after the above, @hoo started focusing on a different aspect of this we've been somewhat ignoring as more of a side-symptom: that there tend to be a lot of sockets in a strange state on the "target" varnish, to various MW nodes.  They look strange on both sides, in that they spend significant time in the CLOSE_WAIT state on the varnish side and FIN_WAIT_2 on the MW side.  This is a consistent state between the two nodes, but it's not usually one that non-buggy application code spends much time in.  In this state, the MW side has sent FIN, Varnish has seen that and sent FIN+ACK, but Varnish has not yet decided to send its own FIN to finish the active closing process, and MW is still waiting on it.
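
(For reference, a quick sketch of how to see that socket-state pattern from the varnish side; 10.2.2.1/10.2.2.22 are the appservers/api service IPs used elsewhere in this thread:)

  # Sketch: count TCP states for connections from this cache host toward the
  # MW appserver/API service IPs; a large CLOSE_WAIT pile-up matches the
  # behavior described above.
  netstat -ant | awk '/10\.2\.2\.(1|22):80/ {print $6}' | sort | uniq -c | sort -n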

While staring at the relevant Varnish code to figure out why or how it would delay closing in this case, it seemed possible in certain cases related to connections in the VBC_STOLEN state.  Instead of closing immediately, in some such cases it defers killing the socket until some future eventloop event fires, which could explain the closing delays under heavy load (and we know Varnish is backlogged in some senses when the problem is going on, because mailbox lag rises indefinitely).  All of that aside, at some point while staring at related code I realized that do_stream behaviors can influence some related things as well, and we had a recent related VCL patch.

The patch in question was https://gerrit.wikimedia.org/r/#/c/386616/ , which was merged around 14:13 Oct 26, about 4.5 hours before the problems were first noticed (!).

I manually reverted the patch on cp1067 (current target problem node) as a test, and all of the CLOSE_WAIT sockets disappeared shortly, never to return.  I reverted the whole patch through gerrit shortly afterwards, so that's permanent now across the board.

I think there's a strong chance this patch was the catalyst for the start of the problems.  At the very least, it was exacerbating the impact of the problems.  If it turns out to be the problem, I think we still have more post-mortem investigation to do here, because the issues it raises are tricky.  If it's just exacerbating, I think it's still useful to think about why it would, because that may help pin down the real problem.

Operating on the assumption that it's the catalyst and diving a little deeper on that angle:

The patch simply turned off do_stream behavior when the backend-most Varnish was talking to application layer services, when the applayer response did not contain a Content-Length header.  Turning off do_stream makes Varnish act in a store-and-forward mode for the whole response, rather than forwarding bytes onwards to upstream clients as they arrive from the application.  The benefit we were aiming for there was to have Varnish calculate the value of the missing Content-Length so that we can make more-informed cache-tuning decisions at higher layers.  Minor performance tradeoffs aside, turning off do_stream shouldn't be harmful to any of our HTTP transactions under "reasonable" assumptions (more later on what "reasonable" is here).  In fact, that was the default/only mode our caches operated in back when we were running Varnish 3, but streaming became the default for the text cluster when it switched to Varnish 4 just under a year ago.  So this was "OK" a year ago, but clearly isn't ok for some requests today.
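
(As an aside, the difference is observable from the client side: with store-and-forward, the first response byte doesn't arrive until the whole applayer response has been fetched.  A crude check, sketched below with a placeholder URL, compares time-to-first-byte against total time for a slowly-generated response:)

  # Sketch: if time_starttransfer is close to time_total for a response the
  # backend generates slowly, the path buffered (store-and-forward) rather
  # than streamed it.
  curl -sN -o /dev/null -w 'ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
      'https://www.wikidata.org/w/api.php?action=query&meta=siteinfo'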

That there was always a singular chash target within the text cluster for the problems also resonates here: there's probably only one special URI out there which breaks the "reasonable" assumption.  Another oddity that we didn't delve into much before was that when we restarted the problematic varnish, it only depooled for a short period (<1 min), yet the problem would move *permanently* to its next chash target node and stay there even after the previous target node was repooled.  This might indicate that the clients making these requests are doing so over very-long-lived connections, and even that the request->response cycle itself must be very-long-lived.  It moves via re-chashing when a backend is restarted, but doesn't move on repool because everything's still connected and transacting...

My best hypothesis for the "unreasonable" behavior that would break under do_stream=false is that we have some URI which is abusing HTTP chunked responses to stream an indefinite response.  Sort of like websockets, but using the normal HTTP protocol primitives.  Client sends a request for "give me a live stream of some events or whatever", and the server periodically sends new HTTP response chunks to the client containing new bits of the event feed.  Varnish has no way to distinguish this behavior from normal chunked HTTP (where the response chunks will eventually reach a natural end in a reasonable timeframe), and in the do_stream=false store-and-forward mode, Varnish would consume this chunk st

[Wikidata-bugs] [Maniphest] [Commented On] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-28 Thread BBlack
BBlack added a comment.
Updates from the Varnish side of things today (since I've been bad about getting commits/logs tagged onto this ticket):


18:15 - I took over looking at today's outburst on the Varnish side
The current target at the time was cp1053 (after elukey's earlier restart of cp1055 varnish-be above)
18:21 - I manually reduced the backend timeouts for api+appservers from the defaults of connect/firstbyte/betweenbytes of 5/180/60 to 3/20/10 (see the parameter sketch after this timeline)
cp1053 had already been under assault for quite a while though, and this didn't seem to help much.
18:39 - restarted varnish-be on cp1053 to clear it of issues and move to a new target
18:41 - identified cp1065 as the new target (more on identifying these below!)
18:42 - Merged->deployed https://gerrit.wikimedia.org/r/#/c/387024/ to apply the shorter-timeouts workaround above to all text caches
at this point, cp1065 was showing various signs of the issue (rising connection counts + mailbox lag), but connection counts stabilized much lower than before, ~200-300 instead of rising towards ~3K, an apparent success of the timeout-reduction mitigation.
18:56 - Identified the first slow-running requests in cp1065 logs with the reduced timeouts:


18:56 < bblack> -   BereqURL   
/w/api.php?action=""
18:56 < bblack> -   BereqHeaderHost: www.wikidata.org
18:56 < bblack> -   Timestamp  Bereq: 1509216884.761646 0.42 0.42
18:56 < bblack> -   Timestamp  Beresp: 1509216965.538549 80.776945 80.776903
18:56 < bblack> -   Timestamp  Error: 1509216965.538554 80.776950 0.05
18:56 < bblack> -   Timestamp  Start: 1509216970.911803 0.00 0.00


after this, identified several other slow requests.  All were for the same basic flow-parsoid-utils API + www.wikidata.org
19:39 - hoo's parsoid timeout reduction for Flow (above) hits
19:39 - restarted varnish-backend on cp1065 due to rising mailbox lag
19:41 - new target seems to be cp1067, briefly, but within a minute or two it recovers to normal state and stops exhibiting the symptoms much?  Apparently the problem-causing traffic may have temporarily died off on its own.
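
(The connect/firstbyte/betweenbytes knobs referenced at 18:21 above map onto Varnish's standard backend timeouts.  Below is only a sketch of the equivalent instance-wide runtime defaults, not the actual gerrit change, and per-backend VCL settings override these parameters.)

  # Sketch only: runtime-parameter equivalents of the shortened backend
  # timeouts above (values in seconds).
  varnishadm param.set connect_timeout 3
  varnishadm param.set first_byte_timeout 20
  varnishadm param.set between_bytes_timeout 10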


For future reference by another opsen who might be looking at this: one of the key metrics that identifies what we've been calling the "target cache" in eqiad, the one that will (eventually) have issues due to whatever bad traffic is currently mapped through it, is the count of connections to appservers.svc.eqiad.wmnet + api-appservers.svc.eqiad.wmnet on each of the eqiad cache nodes.  For this, I've been using:

bblack@neodymium:~$ sudo cumin A:cp-text_eqiad 'netstat -an|egrep "10\.2\.2\.(1|22)"|awk "{print \$5}"|sort|uniq -c|sort -n'

Which during the latter/worst part of cp1053's earlier target-period produced output like:

= NODE GROUP = 
(1) cp1068.eqiad.wmnet 
- OUTPUT of 'netstat -an|egre...|uniq -c|sort -n' -
  1 10.2.2.18:8080 
 15 10.2.2.17:7231 
 79 10.2.2.1:80
111 10.2.2.22:80
= NODE GROUP = 
(1) cp1066.eqiad.wmnet 
- OUTPUT of 'netstat -an|egre...|uniq -c|sort -n' -
  1 10.2.2.18:8080 
 14 10.2.2.17:7231 
 92 10.2.2.1:80
111 10.2.2.22:80
= NODE GROUP = 

[Wikidata-bugs] [Maniphest] [Commented On] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-27 Thread BBlack
BBlack added a comment.



In T179156#3715432, @hoo wrote:
I think I found the root cause now, it seems it's actually related to the WikibaseQualityConstraints extension:


Isn't that the same extension referenced in the suspect commits mentioned above?

18:51 ladsgroup@tin: Synchronized php-1.31.0-wmf.5/extensions/Wikidata/extensions/Constraints/includes/ConstraintCheck/DelegatingConstraintChecker.php: Fix sorting of NullResults (T179038) (duration: 01m 04s)
18:52 ladsgroup@tin: Synchronized php-1.31.0-wmf.5/extensions/Wikidata/extensions/Constraints/tests/phpunit/DelegatingConstraintCheckerTest.php: Fix sorting of NullResults (T179038) (duration: 00m 49s)
19:12 ladsgroup@tin: Synchronized php-1.31.0-wmf.5/extensions/WikibaseQualityConstraints/tests/phpunit/DelegatingConstraintCheckerTest.php: Fix sorting of NullResults (T179038) (duration: 00m 50s)
19:14 ladsgroup@tin: Synchronized php-1.31.0-wmf.5/extensions/WikibaseQualityConstraints/includes/ConstraintCheck/DelegatingConstraintChecker.php: Fix sorting of NullResults (T179038) (duration: 00m 49s)

Or is it an unrelated problem in the same area?


[Wikidata-bugs] [Maniphest] [Commented On] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-27 Thread BBlack
BBlack added a comment.
Unless anyone objects, I'd like to start with reverting our emergency varnish max_connections changes from https://gerrit.wikimedia.org/r/#/c/386756 .  Since the end of the log above, connection counts have returned to normal (~100, roughly 1/10th of the usual 1K limit, which normally isn't a problem).  If we leave the 10K limit in place, it will only serve to mask (for a time) any recurrence of the issue, making it only possible to detect it early by watching varnish socket counts on all the text cache machines.


[Wikidata-bugs] [Maniphest] [Commented On] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-27 Thread BBlack
BBlack added a comment.
My gut instinct remains what it was at the end of the log above.  I think something in the revert of wikidatawiki to wmf.4 fixed this.  And given the timing alignment of the "Fix sorting of NullResults" changes + the initial ORES->wikidata fatals, I think those in particular are a strong candidate.  I would start by undoing all of the other emergency changes first, leaving the wikidatawiki->wmf.4 bit for last.


[Wikidata-bugs] [Maniphest] [Commented On] T179156: 503 spikes and resulting API slowness starting 18:45 October 26

2017-10-27 Thread BBlack
BBlack added a comment.
Copying this in from etherpad (this is less awful than 6 hours of raw IRC+SAL logs, but still pretty verbose):

# cache servers work ongoing here, ethtool changes that require short depooled downtimes around short ethernet port outages:
17:49 bblack: ulsfo cp servers: rolling quick depool -> repool around ethtool parameter changes for -lro,-pause
17:57 bblack@neodymium: conftool action : set/pooled=no; selector: name=cp4024.ulsfo.wmnet
17:59 bblack: codfw cp servers: rolling quick depool -> repool around ethtool parameter changes for -lro,-pause
18:00 <+jouncebot> Amir1: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will 
   be rewarded with a sticker.
18:27 bblack: esams cp servers: rolling quick depool -> repool around ethtool parameter changes for -lro,-pause
18:41 bblack: eqiad cp servers: rolling quick depool -> repool around ethtool parameter changes for -lro,-pause

# 5xx alerts start appearing.  initial assumption is related to ethtool work above
18:44 <+icinga-wm> PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
18:46 <+icinga-wm> PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
18:47 <+icinga-wm> PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0]
18:48 <+icinga-wm> PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0]
# ...but once the MW exceptions hit, seems less-likely to be related to the ethtool work

# notices hit IRC for these wikidata sorting changes:
18:51 ladsgroup@tin: Synchronized php-1.31.0-wmf.5/extensions/Wikidata/extensions/Constraints/includes/ConstraintCheck/DelegatingConstraintChecker.php: Fix sorting of NullResults (T179038) (duration: 01m 04s)
18:52 ladsgroup@tin: Synchronized php-1.31.0-wmf.5/extensions/Wikidata/extensions/Constraints/tests/phpunit/DelegatingConstraintCheckerTest.php: Fix sorting of NullResults (T179038) (duration: 00m 49s)
19:12 ladsgroup@tin: Synchronized php-1.31.0-wmf.5/extensions/WikibaseQualityConstraints/tests/phpunit/DelegatingConstraintCheckerTest.php: Fix sorting of NullResults (T179038) (duration: 00m 50s)
19:14 ladsgroup@tin: Synchronized php-1.31.0-wmf.5/extensions/WikibaseQualityConstraints/includes/ConstraintCheck/DelegatingConstraintChecker.php: Fix sorting of NullResults (T179038) (duration: 00m 49s)

# Lots of discussion and digging ensues on all sides...
# bblack figures out that while the logs implicate a single eqiad text backend cache, depooling said cache moves the problem to a different cache host (repeatedly), so it doesn't seem to be a faulty cache node.
# One cache just happens to be the unlucky chash destination for more of the problematic traffic than the others at any given time.
# The problematic traffic load/patterns consumes all of the 1K connection slots varnish allows to api.svc+appservers.svc, and then this causes many unrelated 503s for lack of available backend connection slots to service requests.

# The Fatals logs seem to be related to ORES fetching from Wikidata
# So, a timeout is increased there to cope with slow wikidata responses:

19:33 awight@tin: Started deploy [ores/deploy@0adae70]: Increase extractor wikidata API timeout to 15s, T179107
19:33 awight@tin: Finished deploy [ores/deploy@0adae70]: Increase extractor wikidata API timeout to 15s, T179107 (duration: 00m 10s)
19:34 awight@tin: Started deploy [ores/deploy@0adae70]: Increase extractor wikidata API timeout to 15s, T179107
19:36 aaron@tin: Started restart [jobrunner/jobrunner@a20d043]: (no justification provided)
19:41 awight@tin: Finished deploy [ores/deploy@0adae70]: Increase extractor wikidata API timeout to 15s, T179107 (duration: 07m 25s)

# Still doesn't fix the problem, so the next attempted fix is to disable ores+wikidata entirely:

20:02 ladsgroup@tin: Synchronized wmf-config/InitialiseSettings.php: UBN! disbale ores for wikidata (T179107) (duration: 00m 50s)
20:00 ladsgroup@tin: Synchronized wmf-config/InitialiseSettings.php: UBN! disbale ores for wikidata (T179107) (duration: 00m 50s)

# Things are still borked, try reverting some other recent Wikidata-related changes:

20:59 hoo@tin: Synchronized wmf-config/Wikibase.php: Revert "Add property for RDF mapping of external identifiers for Wikidata" (T178180) (duration: 00m 50s)
21:00 hoo: Fully revert all changes related to T178180

# Still borked.  Tried reverting something else that looks dangerous in the logstash errors, but also wasn't the cause:

21:30 hoo@tin: Synchronized wmf-config/InitialiseSettings.php: Temporary disable remex html (T178632) (duration: 00m 50s)
21:32 hoo

[Wikidata-bugs] [Maniphest] [Commented On] T175588: Server overloaded .. can't save (only remove or cancel)

2017-09-11 Thread BBlack
BBlack added a comment.
Can you explain in more detail? Is the subject of this ticket what was shown as an error in your browser window?  I doubt this is related to varnish and/or "mailbox lag".


[Wikidata-bugs] [Maniphest] [Updated] T175588: Server overloaded .. can't save (only remove or cancel)

2017-09-11 Thread BBlack
BBlack removed parent tasks: T174932: Recurrent 'mailbox lag' critical alerts and 500s, T175473: Multiple 503 Errors.


[Wikidata-bugs] [Maniphest] [Updated] T99531: [Task] move wikiba.se webhosting to wikimedia misc-cluster

2017-07-27 Thread BBlack
BBlack added a project: Traffic.


[Wikidata-bugs] [Maniphest] [Updated] T153563: Consider switching to HTTPS for Wikidata query service links

2017-06-26 Thread BBlack
BBlack removed a parent task: T104681: HTTPS Plans (tracking / high-level info).


[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2017-06-01 Thread BBlack
BBlack added a comment.
Yeah that was the plan, for XKey to help here by consolidating that down to a single HTCP / PURGE per article touched.  It's not useful for the mass-scale case (e.g. template/link references), as it doesn't scale well in that direction.  But for the case like "1 article == 7 URLs for different formats/variants/derivatives" it should work great.  The varnish module for it is deployed, but we haven't ever found/made the time to loop back to actually using it (defining standards for how to transmit it over the existing HTCP protocol or the new EventBus and pushing developers to make use of it).  I think last we talked we were going to move cache-purge traffic over to EventBus before tackling this (with kafka consumers on the cache nodes pulling the purges), but I'm not sure what the relative timelines on all related projects look like anymore.


[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2017-05-31 Thread BBlack
BBlack added a comment.
We can get broader averages by dividing the values seen in the aggregate client status code graphs for eqiad's text cluster (the remote sites would be expected to see fewer, due to some of the bursts being more likely to be dropped by the network).

This shows the past week's average at 33.6K/s avg PURGE rate for text@eqiad: https://grafana.wikimedia.org/dashboard/db/varnish-aggregate-client-status-codes?panelId=6=1=eqiad_type=text_type=1_type=2_type=3_type=4_type=5

There are 7 active servers there the past week (cp1053 has been depooled for the past couple of weeks), so that puts us at ~4800/sec raw rate of HTCP purges.  The numbers from before (100, 400) were htmlCacheUpdate numbers though, before they're multiplied by 4 (desktop/mobile, action="") for actual HTCP purging.  So the comparable number now would be something like ~1200/sec.

It's a little blurrier than that now, though, because in the meantime we've added RB purges as well (e.g. for the mobile content sections).  I think these are 3x per article for mobile-sections, mobile-sections-lead, mobile-sections-remaining, and I'm not sure exactly how it hooks into the updating pipeline.  I would suspect that, indirectly, all 3 of those are triggered for many of the same conditions as regular wiki purges, so we may be seeing a ~7x HTCP multiplier overall for title->URLs, which would divide the 4800/s down to ~685/s on the htmlCacheUpdate side as perhaps a more-comparable number to the earlier 100 and 400 numbers?
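
(Spelling out the arithmetic above; all rates are approximate:)

  # Rough arithmetic behind the numbers above.
  echo $((33600 / 7))   # ~4800/s: raw HTCP purge rate, given 7 active eqiad text caches
  echo $((4800 / 4))    # ~1200/s: in htmlCacheUpdate terms with the old 4x per-title URL multiplier
  echo $((4800 / 7))    # ~685/s: if the per-title multiplier is now ~7x with the RB purges added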


[Wikidata-bugs] [Maniphest] [Updated] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2017-05-30 Thread BBlack
BBlack added a comment.
The lack of graph data from falling off the history is a sad commentary on how long this has remained unresolved :(

Some salient points from earlier within this ticket, to recap:


In T124418#1985526, @BBlack wrote:
Continuing with some stuff I was saying in IRC the other day.  At the "new normal", we're seeing something in the approximate ballpark of 400/s articles purged (which is then multiplied commonly for ?action="" and mobile and ends up more like ~1600/s actual HTCP packets), whereas the edit rate across all projects is something like 10/s.  That 400/s number used to be somewhere south of 100 before December.





In T124418#1986594, @BBlack wrote:
Regardless, the average rate of HTCP these days is normally-flat-ish (a few scary spikes aside), and is mostly throttled by the jobqueue.  The question still remains: what caused permanent, large bumps in the jobqueue htmlCacheUpdate insertion rate on ~Dec4, ~Dec11, and ~Jan20?


Re: the outstanding patch that's been seeing some bumps ( https://gerrit.wikimedia.org/r/#/c/295027/ ) - The situation has evolved since that patch was first uploaded.  Our current maximum TTLs are capped at a single day in all cache layers.  However, they can still add up across layers if the race to refresh content plays out just right, with the worst theoretical edge case being 4 total days (fetching from ulsfo when eqiad is the primary).

Those edge cases are also bounded by the actual Cache-Control (s-)max-age, but that's currently at two weeks still, I believe, so they don't really come into play.  We should probably look at moving the mediawiki-config wgSquidMaxAge (and similar) down to something around 5-7 days, so that it's more reflective of the reality of the situation on the caches.

We'll eventually get to a point where we've eliminated the corner-case refreshes and can definitely say that the whole of the cache infrastructure has a hard cap at one full day, but there's more work to do there in T124954 + T50835 (Surrogate-Control) first.

I think even now, and especially once we reach that point later, purging Varnish for mass invalidations like refreshLinks and templating stops making sense.  Those would be spooled out over a fairly long asynchronous period anyways.  They can simply get updated as the now-short TTLs expire, reserving immediate HTCP invalidation for actual content edits of specific articles.  Those kinds of ideas may need to be a separate discussion in another ticket?


[Wikidata-bugs] [Maniphest] [Reopened] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2017-05-19 Thread BBlack
BBlack reopened this task as "Open".
BBlack added a comment.
Not resolved, as the purge graphs can attest!


[Wikidata-bugs] [Maniphest] [Commented On] T142944: Performance and caching considerations for article placeholders accesses

2016-11-08 Thread BBlack
BBlack added a comment.
I clicked Submit too soon :) Continuing:

We'd expect content to be at minimum a day, if not significantly longer.  MW currently emits 2-week cache headers (with plans to eventually bring that down closer to a day, but those plans are still further off).  Cache invalidation is a hard problem, but it's not something we can just ignore, either.  Perhaps this should be tied into the broader X-Key effort to sweep these up when the underlying wikidata is updated?
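
(For reference, the cache lifetime MW currently emits can be checked directly; a sketch with an illustrative URL, looking at the s-maxage value in the response:)

  # Sketch: inspect the Cache-Control header MediaWiki emits for an anonymous
  # page view; s-maxage is the relevant lifetime here.
  curl -sI 'https://www.wikidata.org/wiki/Q42' | grep -i '^cache-control'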


[Wikidata-bugs] [Maniphest] [Commented On] T142944: Performance and caching considerations for article placeholders accesses

2016-11-08 Thread BBlack
BBlack added a comment.
Nothing was ever resolved here.  30 minutes seems like an arbitrary number with no formal basis or reasoning, and is way shorter than we'd like for anything article-like.


[Wikidata-bugs] [Maniphest] [Closed] T132457: Move wdqs to an LVS service

2016-10-12 Thread BBlack
BBlack closed this task as "Resolved".
BBlack claimed this task.


[Wikidata-bugs] [Maniphest] [Updated] T132457: Move wdqs to an LVS service

2016-10-11 Thread BBlack
BBlack added a parent task: T147844: Standardize varnish applayer backend definitions.


[Wikidata-bugs] [Maniphest] [Commented On] T142944: Performance and caching considerations for article placeholders accesses

2016-08-17 Thread BBlack
BBlack added a comment.
I think I'm lacking a lot of context here about these special pages and placeholders.  But my bottom line thoughts are currently along these lines:


How do actual, real-world, anonymous users interact with these placeholders and special pages?  What value is it providing the average reader, in what way?  How does the scope of the new code and new invalidation problems (esp potential purge traffic) compare to that?  Because I tend to think (with what little context I have) that this sounds like a ton of churn on our end for very little real value to the user.  Maybe most of the value is to logged-in editors, who don't face invalidation problems in the first place?



For the most part, we can categorize the invalidation model of page content into one of two bins: either it's purged on relevant update nearly-immediately (at most, a few seconds' delay for asynchronicity and such), or it's something that sometimes goes stale for some real amount of time, where we really have to think about what happens when users read a stale page, and we need an upper bound on staleness to consider that question properly.  Once you're in the latter bin of stale things, there needs to be a rational way to quantify the fallout of a stale view.  Is a stale page broken itself, or does it have broken links, or simply outdated content?  I tend to think that, in the examples I've seen so far, either something requires immediate invalidation, or staleness isn't a real issue within a reasonable (e.g. hours, days) timeframe.  30 minutes seems arbitrary and probably not tied to a real-world constraint on how broken a stale view is.  It sounds more like a compromise because we really want immediate purging but we know the purge volume will be unreasonable.


[Wikidata-bugs] [Maniphest] [Commented On] T142944: Performance and caching considerations for article placeholders accesses

2016-08-16 Thread BBlack
BBlack added a comment.
30 minutes isn't really reasonable, and neither is spamming more purge traffic.  If there's a constant risk of the page content breaking without invalidation, how is even 30 minutes acceptable?  Doesn't this mean that on average they'll be broken for 15 minutes after an affecting change?


[Wikidata-bugs] [Maniphest] [Changed Subscribers] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-06-17 Thread BBlack
BBlack added a subscriber: GWicke.
BBlack added a comment.
@aaron and @GWicke - both patches sound promising, thanks for digging into this topic!


[Wikidata-bugs] [Maniphest] [Commented On] T134989: WDQS empty response - transfer clsoed with 15042 bytes remaining to read

2016-05-16 Thread BBlack
BBlack added a comment.


  cache_maps cluster switched to the new varnish package today



[Wikidata-bugs] [Maniphest] [Commented On] T134989: WDQS empty response - transfer clsoed with 15042 bytes remaining to read

2016-05-13 Thread BBlack
BBlack added a comment.


  Current State:
  
  - cp3007 and cp1045 are depooled from user traffic, icinga-downtimed for 
several days, and have puppet disabled.  Please do not re-enable puppet on 
these!  They also have confd shut down, and are running custom configs to 
continue debugging this issue under varnish4.
  - The rest of cache_misc is reverted to varnish3, which should temporarily 
resolve this issue for user traffic over the weekend and into next week while 
we continue isolated investigation using the two nodes above.
  - Please do **not** resolve this ticket - this is a bandaid, and we still 
have a lot of digging to do.
  - Please **do** report any similar user-facing failures from here forward, as 
there shouldn't be any while the cluster is reverted to varnish3.



[Wikidata-bugs] [Maniphest] [Block] T133490: Wikidata Query Service REST endpoint returns truncated results

2016-05-13 Thread BBlack
BBlack reopened blocking task T131501: Convert misc cluster to Varnish 4 as 
"Open".



[Wikidata-bugs] [Maniphest] [Updated] T134989: WDQS empty response - transfer clsoed with 15042 bytes remaining to read

2016-05-13 Thread BBlack
BBlack added a blocked task: T131501: Convert misc cluster to Varnish 4.



[Wikidata-bugs] [Maniphest] [Commented On] T134989: WDQS empty response - transfer clsoed with 15042 bytes remaining to read

2016-05-13 Thread BBlack
BBlack added a comment.


  I forgot one of our temporary hacks in the list above in 
https://phabricator.wikimedia.org/T134989#2290254:
  
  4. https://gerrit.wikimedia.org/r/#/c/288656/ - we also enabled a critical 
small bit here in v4 vcl_hit.  I reverted this for now during the varnish3 
downgrade.  Need to remember that once we find the right bug and start cleaning 
everything up for upgrade again...



[Wikidata-bugs] [Maniphest] [Commented On] T134989: WDQS empty response - transfer clsoed with 15042 bytes remaining to read

2016-05-12 Thread BBlack
BBlack added a comment.


  Has anyone been able to reproduce any of the problems in the tickets merged 
into here, since roughly the timestamp of the above message?



[Wikidata-bugs] [Maniphest] [Commented On] T134989: WDQS empty response - transfer clsoed with 15042 bytes remaining to read

2016-05-12 Thread BBlack
BBlack added a comment.


  So we currently have several experiments in play trying to figure this out:
  
  1. We've got 2x upstream bugfixes applied to our varnishd on cache_misc: 
https://github.com/varnishcache/varnish-cache/commit/d828a042b3fc2c2b4f1fea83021f0d5508649e50
 + 
https://github.com/varnishcache/varnish-cache/commit/e142a199c53dd9331001cb29678602e726a35690
  
  2. We've removed all of our Content-Length sensitive VCL that was on 
cache_misc temporarily (basically https://gerrit.wikimedia.org/r/#/c/288231/ , 
which at one point we partially put back, but then removed again)
  
  3. We've switched from persistent to file storage on the misc backends, 
manually with puppet disabled.  Puppetization to make that semi-permanent if 
need be: https://gerrit.wikimedia.org/r/288440 (untested)
  
  I don't think anyone has reproduced the problem since (3) went live 
everywhere.  So we're in a new state and needing proof that things are still 
messed up (or not!).



[Wikidata-bugs] [Maniphest] [Commented On] T134989: WDQS empty response - transfer clsoed with 15042 bytes remaining to read

2016-05-12 Thread BBlack
BBlack added a comment.


  In the merged ticket above, it's browser access to status.wm.o, and the 
browser's getting a 304 Not Modified and complaining about it (due to missing 
character encoding supposedly, but it's entirely likely it's missing everything 
and that's just the first thing it notices).



[Wikidata-bugs] [Maniphest] [Merged] T134989: WDQS empty response - transfer clsoed with 15042 bytes remaining to read

2016-05-12 Thread BBlack
BBlack added a subscriber: Kghbln.
BBlack merged a task: T135121: stats.wikimedia.org down.



[Wikidata-bugs] [Maniphest] [Commented On] T134989: WDQS empty response - transfer closed with 15042 bytes remaining to read

2016-05-11 Thread BBlack
BBlack added a comment.


  Status update: we've been debugging this off and on all day.  It's some kind 
of bug fallout from cache_misc's upgrade to Varnish 4.  It's a very complicated 
bug, and we don't really understand it yet.  We've made some band-aid fixes to 
VCL for now which should keep the problem at bay while investigating further.

TASK DETAIL
  https://phabricator.wikimedia.org/T134989

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Gehel, BBlack
Cc: Luke081515, matmarex, TerraCodes, Urbanecm, KDDLB, hashar, Jonas, 
gerritbot, BBlack, Aklapper, Zppix, Lydia_Pintscher, Gehel, Avner, Lewizho99, 
Maathavan, debt, D3r1ck01, FloNight, Izno, jkroll, Smalyshev, Wikidata-bugs, 
Jdouglas, aude, Deskana, Manybubbles, Mbch331, Jay8g



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T134989: WDQS empty response - transfer closed with 15042 bytes remaining to read

2016-05-11 Thread BBlack
BBlack added a comment.


  Assuming there was no transient issue (which became cached) on the wdqs end 
of things, then this was likely a transient thing from nginx experiments or the 
cache_misc varnish4 upgrade.  I banned all wdqs objects from cache_misc and now 
your test URL works fine.  Can you confirm?
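  
  For anyone re-checking: a rough Python sketch of what "works fine" should mean 
here, i.e. the full advertised body now arrives instead of the transfer being 
closed early.  The URL in the usage line is only a placeholder for the actual 
failing query URL.

    import http.client
    import urllib.request

    def body_is_complete(url: str) -> bool:
        # curl reports this failure as "transfer closed with N bytes remaining
        # to read"; urllib surfaces the same condition as IncompleteRead.
        try:
            resp = urllib.request.urlopen(url)
            resp.read()
            return True
        except http.client.IncompleteRead:
            return False

    # Usage, substituting the actual failing query URL:
    # print(body_is_complete("https://query.wikidata.org/..."))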

TASK DETAIL
  https://phabricator.wikimedia.org/T134989

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: BBlack, Aklapper, Zppix, Lydia_Pintscher, Gehel, Avner, debt, D3r1ck01, 
FloNight, Izno, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, 
Manybubbles, Mbch331, Jay8g



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Closed] T133866: Varnish seems to sometimes mangle uncompressed API results

2016-05-09 Thread BBlack
BBlack closed this task as "Resolved".
BBlack added a comment.


  My test cases on cache_text work now, should be resolved!

TASK DETAIL
  https://phabricator.wikimedia.org/T133866

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: gerritbot, mahmoud, Slaporte, Zppix, Ricordisamoa, Trung.anh.dinh, 
MZMcBride, Anomie, Yurivict, TerraCodes, Orlodrim, BBlack, akosiaris, 
zhuyifei1999, elukey, ema, Aklapper, hoo, Lewizho99, Maathavan, D3r1ck01, Izno, 
Wikidata-bugs, aude, Mbch331, Jay8g, jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Closed] T133490: Wikidata Query Service REST endpoint returns truncated results

2016-05-09 Thread BBlack
BBlack closed this task as "Resolved".
BBlack claimed this task.
BBlack added a comment.


  This works now.  There's a significant pause at the start of the transfer 
from the user's perspective if it's not a cache hit, because streaming is 
disabled as a workaround (so it has to completely load the data into each cache 
layer before starting the data stream to the user), but it does function 
correctly.  The non-streamed pause behavior will go away with 
https://phabricator.wikimedia.org/T131501 .

TASK DETAIL
  https://phabricator.wikimedia.org/T133490

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: MZMcBride, gerritbot, BBlack, Bovlb, Aklapper, Mushroom, Avner, Lewizho99, 
Maathavan, debt, Gehel, D3r1ck01, FloNight, Izno, jkroll, Smalyshev, 
Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Mbch331, Jay8g



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T133490: Wikidata Query Service REST endpoint returns truncated results

2016-05-09 Thread BBlack
BBlack added a comment.


  We now have some understanding of the mechanism of this bug ( 
https://phabricator.wikimedia.org/T133866#2275985 ).  It should go away in the 
imminent varnish 4 upgrade of the misc cluster in 
https://phabricator.wikimedia.org/T131501.

TASK DETAIL
  https://phabricator.wikimedia.org/T133490

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: BBlack, Bovlb, Aklapper, Mushroom, Avner, debt, Gehel, D3r1ck01, FloNight, 
Izno, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, 
Mbch331, Jay8g



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T133490: Wikidata Query Service REST endpoint returns truncated results

2016-05-09 Thread BBlack
BBlack added a blocking task: T131501: Convert misc cluster to Varnish 4.

TASK DETAIL
  https://phabricator.wikimedia.org/T133490

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Bovlb, Aklapper, Mushroom, Avner, debt, Gehel, D3r1ck01, FloNight, Izno, 
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, 
Mbch331, Jay8g



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T133866: Varnish seems to sometimes mangle uncompressed API results

2016-05-09 Thread BBlack
BBlack added a comment.


  So, as it turns out, this is a general varnishd bug in our specific varnishd 
build.  For purposes of this bug, our varnishd code is essentially 3.0.7 plus a 
bunch of ancient forward-ported 'plus' patches related to streaming, and we're 
missing 
https://github.com/varnishcache/varnish-cache/commit/72981734a141a0a52172b85bae55f8877f69ff42
 (do_gzip + do_stream content-length bug for HTTP/1.0 reqs, which is eerily 
similar to this issue, but not quite the same) because it doesn't apply 
cleanly/sanely to our codebase due to conflicts with the former.
  
  What I can reliably and predictably observe and control for now is: we have a 
response-length-specific response corruption bug, only when both of these 
conditions are met:
  
  1. do_stream is in effect for this request (for text cluster, this means it's 
pass or initial miss+chfp(Created-Hit-For-Pass) traffic)
  2. the response has to be gunzipped for the client (client does not advertise 
gzip support, but backend response is gzipped by the applayer, or gzipped by 
varnish due to do_gzip rules).
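  
  A rough Python sketch for controlling condition (2) directly while testing 
(the URL would be whatever pass/miss test URL is in play, so condition (1) still 
has to be arranged separately; md5 is just for quick comparison):

    import gzip
    import hashlib
    import urllib.request

    def body_digest(url: str, accept_gzip: bool) -> str:
        # Fetch with or without AE:gzip and hash the *decoded* body, so the
        # gzip-capable and gzip-incapable client paths can be compared.
        req = urllib.request.Request(url)
        if accept_gzip:
            req.add_header("Accept-Encoding", "gzip")
        resp = urllib.request.urlopen(req)
        body = resp.read()
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
        return hashlib.md5(body).hexdigest()

    # A mismatch here is the corruption described above:
    # body_digest(url, accept_gzip=True) != body_digest(url, accept_gzip=False)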
  
  In a lot of the test scenarios/requests that others and I were using 
previously, we weren't necessarily controlling for these variables well, which 
led to a lot of inconsistent results (notably, X-Wikimedia-Debug effectively 
turns non-pass traffic into pass traffic when debugging, but the same might not 
be true when testing directly from varnish to mw1017 without X-Wikimedia-Debug).
  
  The do_gzip (and related gunzip) behaviors have been in place for a long 
time.  What's new lately is the do_stream behaviors.  These were added to the 
cache_text cluster in the past couple of months for the pass-traffic cases.  
cache_upload has had do_stream for certain requests for a very long time, but 
various constraints there conspire to make it accidentally-unlikely we'll 
observe this bug on cache_upload for legitimate traffic.  cache_misc probably 
suffers from this as well, though the conditions under which it will or won't 
are trickier there; almost surely this is related to 
https://phabricator.wikimedia.org/T133490 as well.
  
  So the basic game plan for this bug is:
  
  - cache_text: revert the relatively-recent do_stream-enabling VCL patches.
  - cache_misc: will resolve itself with the varnish 4 upgrade, which is 
imminent for this cluster.
  - cache_upload: keep ignoring what is probably a non-problem in practice 
there for now; it will eventually get fixed up with the varnish 4 upgrade.
  - cache_maps: already varnish 4, so it wouldn't have this issue.

TASK DETAIL
  https://phabricator.wikimedia.org/T133866

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Ricordisamoa, Trung.anh.dinh, MZMcBride, Anomie, Yurivict, TerraCodes, 
Orlodrim, BBlack, akosiaris, zhuyifei1999, elukey, ema, Aklapper, hoo, 
D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331, Jay8g, jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Unblock] T133866: Varnish seems to sometimes mangle uncompressed API results

2016-05-09 Thread BBlack
BBlack closed blocking task Restricted Task as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T133866

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Ricordisamoa, Trung.anh.dinh, MZMcBride, Anomie, Yurivict, TerraCodes, 
Orlodrim, BBlack, akosiaris, zhuyifei1999, elukey, ema, Aklapper, hoo, 
D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331, Jay8g, jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Triaged] T133866: Varnish seems to sometimes mangle uncompressed API results

2016-05-07 Thread BBlack
BBlack triaged this task as "High" priority.

TASK DETAIL
  https://phabricator.wikimedia.org/T133866

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Trung.anh.dinh, MZMcBride, Anomie, Yurivict, TerraCodes, Orlodrim, BBlack, 
akosiaris, zhuyifei1999, elukey, ema, Aklapper, hoo, D3r1ck01, Izno, 
Wikidata-bugs, aude, Mbch331, Jay8g, jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T133866: Varnish seems to sometimes mangle uncompressed API results

2016-05-07 Thread BBlack
BBlack added a blocking task: Restricted Task.

TASK DETAIL
  https://phabricator.wikimedia.org/T133866

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Trung.anh.dinh, MZMcBride, Anomie, Yurivict, TerraCodes, Orlodrim, BBlack, 
akosiaris, zhuyifei1999, elukey, ema, Aklapper, hoo, D3r1ck01, Izno, 
Wikidata-bugs, aude, Mbch331, Jay8g, jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T133866: Varnish seems to sometimes mangle uncompressed API results

2016-05-07 Thread BBlack
BBlack added a comment.


  Thanks for merging in the probably-related tasks.  I had somehow missed 
really noticing T123159 earlier...  So probably digging into gunzip itself 
isn't a fruitful path.  I'm going to open a separate blocker for this that's 
private, so we can keep merging public tickets into this...

TASK DETAIL
  https://phabricator.wikimedia.org/T133866

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Trung.anh.dinh, MZMcBride, Anomie, Yurivict, TerraCodes, Orlodrim, BBlack, 
akosiaris, zhuyifei1999, elukey, ema, Aklapper, hoo, D3r1ck01, Izno, 
Wikidata-bugs, aude, Mbch331, Jay8g, jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T133866: Varnish seems to sometimes mangle uncompressed API results

2016-05-05 Thread BBlack
BBlack added a comment.


  Did some further testing on an isolated test machine, using our current 
varnish3 package.
  
  - Got 2833-byte test file from uncorrupted (--compressed) output on prod.  
This is the exact compressed content bytes emitted by MW/Apache for the broken 
test URL.
  - Configured a test backend (nginx) to serve static files, and to always set 
CE:gzip.
  - Placed the gzipped 2833 byte file in test directory, fetched over curl w/ 
--compressed, md5sum comes out right.
  - When fetched through our varnishd with a default config and do_gzip turned 
on, varnish does decompress this file for the curl client, and there is no 
corruption (still same md5sum).
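  
  For reference, a rough Python stand-in for the backend half of that setup 
(the real test used nginx, with varnishd + do_gzip in front; the payload file 
name here is hypothetical):

    import gzip
    import hashlib
    import http.server
    import threading
    import urllib.request

    # Hypothetical path to the captured 2833-byte pre-gzipped payload.
    GZ_BODY = open("testfile.gz", "rb").read()

    class GzipBackend(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            # Serve the pre-gzipped bytes verbatim, always with CE:gzip set.
            self.send_response(200)
            self.send_header("Content-Encoding", "gzip")
            self.send_header("Content-Length", str(len(GZ_BODY)))
            self.end_headers()
            self.wfile.write(GZ_BODY)

    srv = http.server.HTTPServer(("127.0.0.1", 8080), GzipBackend)
    threading.Thread(target=srv.serve_forever, daemon=True).start()

    # Fetch and gunzip client-side; md5 should match a direct gunzip of the file.
    fetched = urllib.request.urlopen("http://127.0.0.1:8080/").read()
    assert hashlib.md5(gzip.decompress(fetched)).hexdigest() == \
           hashlib.md5(gzip.decompress(GZ_BODY)).hexdigest()
    srv.shutdown()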
  
  This rules out the possibility that this is some pure, data-sensitive varnish 
bug in gunzipping the content itself.  However, the notable difference between 
this test and reality is that nginx serving a static pre-gzipped file (a) is 
not emitting it as TE:chunked and (b) even if it did, it probably wouldn't use 
the same chunk boundaries, nor is it likely to share any TE:chunked encoding 
bugs or varnish-bug triggers...

TASK DETAIL
  https://phabricator.wikimedia.org/T133866

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: elukey, ema, Aklapper, hoo, D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331, 
Jay8g, jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T133866: Varnish seems to sometimes mangle uncompressed API results

2016-05-03 Thread BBlack
BBlack added a comment.


  Just jotting down the things I know so far from investigating this morning.  
I still don't have a good answer yet.
  
  Based on just the test URL, debugging it extensively at various layers:
  
  1. The response size of that URL is in the ballpark of 32KB uncompressed, so 
this is not a large-objects issue.  It also streams out of the backend reliably 
and quickly, there are no significant timing pauses.
  2. Without a doubt, anytime the client doesn't use AE:gzip when talking to 
the public endpoint, the response is corrupt.
  3. Slightly deeper, even when testing requests against a single layer of 
varnish internally, the response to a non-AE:gzip client request is corrupt.
  4. It's definitely happening at the Varnish<->Apache/MediaWiki boundary 
(disagreement) or within a single varnishd process as it prepares the response. 
 It's not a Varnish<->Varnish, Varnish<->Nginx, or Nginx<->client issue.
  5. All of the gzip/chunked stuff looks basically correct in the headers at 
varnish/apache boundary and varnish/client boundary.  We do send AE:gzip when 
expected, we do get CE:gzip when expected (only when asked for), the gzip-ness 
of MW/Apache's output always correctly follows its CE:gzip header (or lack 
thereof), etc.
  6. Curl has no problem parsing the output of a direct fetch from 
Apache/MediaWiki, whether using `--compressed` to set AE:gzip or not, and the 
results hash the same (identical content).
  7. Varnish emits the corrupt content for the non-AE:gzip client regardless of 
whether I tweak the test scenario to ensure that varnish is the one gzipping 
the content, or that Apache/Mediawiki are the ones gzipping the content.  So 
it's not an error in gzip compression of the response by just one party or the 
other.  The error happens when gunzipping the response for the non-AE:gzip 
client.
  8. However, when I run through a similar set of fully-debugged test scenarios 
for https://en.wikipedia.org/wiki/Barack_Obama , which is ~1MB in uncompressed 
length, and similarly TE:chunked with backend gzip capabilities and 
do_gzip=true, and on the same cluster and VCL (and even same test machine), I 
don't get the corruption for a non-AE:gzip client, even though varnish is 
decompressing that on the fly as with the bad test URL.
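  
  A sketch of how to localize that kind of disagreement (Python; the host/netloc 
values in the usage comments are placeholders, not real production names): fetch 
the same path without AE:gzip both directly from the applayer and through a 
varnish frontend, and compare body hashes.

    import hashlib
    import urllib.request

    def md5_via(netloc: str, path: str, host_header: str) -> str:
        # netloc is either an applayer address or a cache frontend (placeholders
        # here); no Accept-Encoding: gzip is sent, to match the failing case.
        req = urllib.request.Request("http://" + netloc + path,
                                     headers={"Host": host_header})
        return hashlib.md5(urllib.request.urlopen(req).read()).hexdigest()

    # If these differ, the corruption is happening at or below the varnish
    # layer rather than in the applayer output (points 4 and 6 above):
    # md5_via("<appserver>", "/w/api.php?...", "www.wikidata.org")
    # md5_via("<cache frontend>", "/w/api.php?...", "www.wikidata.org")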
  
  The obvious distinctions here between the Barack article and the failing test 
URL aren't much: api.svc vs appservers.svc shouldn't matter (right?), since 
they're both behind the same apache and hhvm configs.  This leaves me guessing that 
there's something special about the specific output of the test URL that's 
causing this.
  
  There's almost certainly a varnish bug involved here, but the question is: is 
this a pure varnish gunzip bug that's sensitive to certain conditions which 
exist for the test URL but not the Barack one?  Is the output of Apache/MW 
buggy in some way for the test URL such that it's tripping the bug (in which 
case it's still a varnish bug that it doesn't reject the buggy response and 
turn it into a 503 or similar), or is it non-buggy, but "special" in a way that 
trips a varnish bug?

TASK DETAIL
  https://phabricator.wikimedia.org/T133866

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: elukey, ema, Aklapper, hoo, D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331, 
jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T133866: Varnish seems to sometimes mangle uncompressed API results

2016-05-03 Thread BBlack
BBlack added a comment.


  Do you know if some normal traffic is affected, such that we'd know a start 
date for a recent change in behavior?  Or is it suspected that it was always 
this way?
  
  I've been digging through some debugging on this URL (which is an applayer 
chunked-response with no-cache headers), and it's definitely happening at the 
varnish<->MW boundary (as opposed to further up, at varnish<->varnish or 
nginx<->varnish), and only for non-AE:gzip requests.  The length of the result 
is correct, but there's corruption in the trailing bytes.

TASK DETAIL
  https://phabricator.wikimedia.org/T133866

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: ema, Aklapper, hoo, D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331, jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-05-03 Thread BBlack
BBlack added a comment.


  I really don't think it's specifically Wikidata-related either at this point. 
 Wikidata might be a significant driver of update jobs in general, but the code 
changes driving the several large rate increases were probably generic to all 
update jobs.

TASK DETAIL
  https://phabricator.wikimedia.org/T124418

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Smalyshev, gerritbot, Legoktm, Addshore, daniel, hoo, aude, 
Lydia_Pintscher, JanZerebecki, MZMcBride, Luke081515, aaron, faidon, Joe, ori, 
BBlack, Aklapper, Lewizho99, Maathavan, D3r1ck01, Izno, Wikidata-bugs, Mbch331, 
Jay8g



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-04-27 Thread BBlack
BBlack added a blocked task: T133821: Content purges are unreliable.

TASK DETAIL
  https://phabricator.wikimedia.org/T124418

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Smalyshev, gerritbot, Legoktm, Addshore, daniel, hoo, aude, 
Lydia_Pintscher, JanZerebecki, MZMcBride, Luke081515, aaron, faidon, Joe, ori, 
BBlack, Aklapper, Lewizho99, Maathavan, D3r1ck01, Izno, Wikidata-bugs, Mbch331, 
Jay8g



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T102476: RFC: Requirements for change propagation

2016-04-27 Thread BBlack
BBlack added a blocked task: T133821: Content purges are unreliable.

TASK DETAIL
  https://phabricator.wikimedia.org/T102476

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: ArielGlenn, hoo, Addshore, RobLa-WMF, StudiesWorld, intracer, JanZerebecki, 
brion, Ltrlg, Anomie, Milimetric, mark, BBlack, aaron, daniel, Eevans, 
mobrovac, GWicke, D3r1ck01, Izno, Hardikj, Wikidata-bugs, aude, jayvdb, fbstj, 
Mbch331, Jay8g, bd808, Legoktm



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T133490: Wikidata Query Service REST endpoint returns truncated results

2016-04-24 Thread BBlack
BBlack edited projects, added Traffic; removed Varnish.

TASK DETAIL
  https://phabricator.wikimedia.org/T133490

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Aklapper, Mushroom, Avner, debt, TerraCodes, Gehel, D3r1ck01, FloNight, 
Izno, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, 
Mbch331, Jay8g



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-04-07 Thread BBlack
BBlack added a comment.


  F3845100: Screen Shot 2016-04-07 at 7.47.28 PM.png 
<https://phabricator.wikimedia.org/F3845100>

TASK DETAIL
  https://phabricator.wikimedia.org/T124418

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Smalyshev, gerritbot, Legoktm, Addshore, daniel, hoo, aude, 
Lydia_Pintscher, JanZerebecki, MZMcBride, Luke081515, Denniss, aaron, faidon, 
Joe, ori, BBlack, Aklapper, Lewizho99, TerraCodes, D3r1ck01, Izno, 
Wikidata-bugs, Mbch331, Jay8g



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Edited] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-04-07 Thread BBlack
BBlack edited the task description.

TASK DETAIL
  https://phabricator.wikimedia.org/T124418

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Smalyshev, gerritbot, Legoktm, Addshore, daniel, hoo, aude, 
Lydia_Pintscher, JanZerebecki, MZMcBride, Luke081515, Denniss, aaron, faidon, 
Joe, ori, BBlack, Aklapper, Lewizho99, TerraCodes, D3r1ck01, Izno, 
Wikidata-bugs, Mbch331, Jay8g



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T121135: Banners fail to show up occassionally on Russian Wikivoyage

2016-04-05 Thread BBlack
BBlack added a comment.


  In https://phabricator.wikimedia.org/T121135#1910435, @Atsirlin wrote:
  
  > @Legoktm: Frankly speaking, for a small project like Wikivoyage the cache 
brings no obvious benefits, but triggers many serious issues including the 
problem of page banners and ToC. The dirty trick of automatic cache purge 
worked perfectly fine in the last 3 weeks, and I believe that it could be used 
further even if it violates some general philosophy of the Mediawiki software. 
I am fine with having this cache purge feature reverted, but you have to 
propose another solution. At this point, having stable page banners and ToC is 
very important for us, while anything related to the cache is of minor 
relevance to the project.
  
  
  That JS hack, if I'm reading it correctly, effectively sends us a purge on 
every pageview?  That's horrendous and abusive of our infrastructure, and the 
problem could grow if people start copying it to other wikis to work around 
other perceived problems.

TASK DETAIL
  https://phabricator.wikimedia.org/T121135

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Jdlrobson, BBlack
Cc: BBlack, faidon, Tgr, jcrespo, Krenair, Legoktm, LtPowers, mark, Wrh2, 
Sumit, Jdlrobson, Atsirlin, Aklapper, TerraCodes, D3r1ck01, Izno, 
Wikidata-bugs, aude, Lydia_Pintscher, Arlolra, Jackmcbarn, Mbch331, Jay8g



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T127014: Empty result on a tree query

2016-03-23 Thread BBlack
BBlack added a blocking task: T128813: cache_misc's misc_fetch_large_objects 
has issues.

TASK DETAIL
  https://phabricator.wikimedia.org/T127014

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Gehel, BBlack
Cc: gerritbot, BBlack, Gehel, Nikki, Mbch331, Magnus, JanZerebecki, Smalyshev, 
Aklapper, StudiesWorld, Bugreporter, debt, TerraCodes, D3r1ck01, FloNight, 
Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Jay8g



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T127014: Empty result on a tree query

2016-03-23 Thread BBlack
BBlack edited projects, added Traffic; removed Varnish.

TASK DETAIL
  https://phabricator.wikimedia.org/T127014

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Gehel, BBlack
Cc: gerritbot, BBlack, Gehel, Nikki, Mbch331, Magnus, JanZerebecki, Smalyshev, 
Aklapper, StudiesWorld, Bugreporter, debt, TerraCodes, D3r1ck01, FloNight, 
Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Jay8g



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T127014: Empty result on a tree query

2016-03-23 Thread BBlack
BBlack added a comment.


  I did some live experimentation with manual edits to the VCL.  It is the 
`between_bytes_timeout`, but the situation is complex.  The timeout that's 
failing is on the varnish frontend fetching from the varnish backend.  These 
are fixed at 2s, but because this is all in do_stream mode, the stream delays 
come through directly.  So, as I guessed earlier, this is all inter-related 
with https://phabricator.wikimedia.org/T128813.  We should fix that issue first 
before deciding what to do about the between bytes timeout for 
varnish<->varnish (where it's not per-service...).

TASK DETAIL
  https://phabricator.wikimedia.org/T127014

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Gehel, BBlack
Cc: gerritbot, BBlack, Gehel, Nikki, Mbch331, Magnus, JanZerebecki, Smalyshev, 
Aklapper, StudiesWorld, Bugreporter, debt, TerraCodes, D3r1ck01, FloNight, 
Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles, Jay8g, 
jeremyb



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T127014: Empty result on a tree query

2016-03-23 Thread BBlack
BBlack added a comment.


  This is probably due to backend timeouts, I would guess?  The default 
applayer settings being applied to wdqs include `between_bytes_timeout` at only 
4s, whereas `first_byte_timeout` is 185s.  So if wdqs delayed all output, it 
would have 3 minutes or so, but once it outputs the first byte, any delay over 
4s will kill it.  Although I'm surprised that doesn't result in a 503.  There's 
probably also multi-layer interaction with 
https://phabricator.wikimedia.org/T128813 and the do_stream and such...
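  
  A quick way to sanity-check that theory from the client side (sketch in 
Python; the query URL in the usage comment is a stand-in for the failing tree 
query): measure the worst gap between successive reads of the streamed response 
and compare it against the ~4s between_bytes_timeout.

    import time
    import urllib.request

    def worst_read_gap(url: str, chunk_size: int = 16384) -> float:
        # Longest pause (seconds) between successive reads of a streamed body.
        resp = urllib.request.urlopen(url)
        last = time.monotonic()
        worst = 0.0
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            now = time.monotonic()
            worst = max(worst, now - last)
            last = now
        return worst

    # Usage with the (hypothetical) slow tree-query URL:
    # print(worst_read_gap("https://query.wikidata.org/sparql?query=..."))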

TASK DETAIL
  https://phabricator.wikimedia.org/T127014

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Gehel, BBlack
Cc: BBlack, Gehel, Nikki, Mbch331, Magnus, JanZerebecki, Smalyshev, Aklapper, 
StudiesWorld, Bugreporter, debt, D3r1ck01, FloNight, Izno, jkroll, 
Wikidata-bugs, Jdouglas, aude, Deskana, Manybubbles



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidata Sparql queries

2016-02-17 Thread BBlack
BBlack added a comment.

In https://phabricator.wikimedia.org/T126730#2034900, @Christopher wrote:

> I may be wrong, but the headers that are returned from a request to the nginx 
> server wdqs1002 say that varnish 1.1 is already being used there.


It's varnish 3.0.6 currently (4.x is coming down the road).

> And, for whatever reason, **it misses**, because repeating the same query 
> gives the same response time.

It misses because the response is sent with `Transfer-Encoding: chunked`.  If 
it were sent un-chunked with a Content-Length, the varnish would have a chance 
at caching it.  However, the next thing you'd run into is that the response 
doesn't contain any caching-relevant headers (e.g. `Expires`, `Cache-Control`, 
`Age`).  Lacking these, varnish would cache it with our configured default_ttl, 
which on the misc cluster where `query.wikidata.org` is currently hosted, is 
only 120 seconds.
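
For what it's worth, the relevant headers are easy to inspect from the client 
side; a minimal Python sketch (the query URL is illustrative only):

  import urllib.request

  # Illustrative SPARQL GET; any WDQS query URL works for this check.
  url = ("https://query.wikidata.org/sparql"
         "?query=SELECT%20%2A%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%201")

  resp = urllib.request.urlopen(url)
  hdrs = {k.lower(): v for k, v in resp.getheaders()}
  # TE:chunked (no Content-Length) is what prevents caching here at all; lacking
  # Cache-Control/Expires/Age, a cacheable response would only get default_ttl.
  for h in ("transfer-encoding", "content-length",
            "cache-control", "expires", "age"):
      print(h, "=>", hdrs.get(h, "<absent>"))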

> Even though Varnish cache **should work** to proxy nginx for optimizing 
> delivery of static query results, it lacks several important features of an 
> object broker.  Namely, client control of object expiration (TTL) and 
> retrieval of "named query results" from persistent storage.
> 
>   A WDQS service use case may in fact be to compare results from several days 
> ago with current results.   Thus, assuming the latest results state is what 
> the client wants may actually not be true.

I think all of this is doable.  Named query results is something we talked 
about in the previous discussion re `GET` length restrictions.  `POST`ing 
(and/or server-side configuring, either way!) a complex query and saving it as 
a named query through a separate query-setup interface, then executing the 
query for results with a `GET` on just the query name.

I don't think we really want client control of object expiration (at least, not 
"varnish cache object expiration"), but what we want is the ability to 
parameterize named queries based on time, right?  e.g. a named query that gives 
a time-series graph might have parameters for start time and duration.  You 
might initially post the complex SPARQL template and save it as `fooquery`, 
then later have a client get it as 
`/sparql?saved_query=fooquery=201601011234=1w`.  Varnish would 
have the chance to cache those based on the query args as separate results, and 
you could limit the time resolution if you want to enhance cacheability.

If it's for inclusion from a page that wants to graph that data and always show 
a "current" graph rather than hardcoded start/duration (and I could see 
use-cases for both in articles), you could support a start time of `now` with 
an optional resolution specifier that defaults to 1 day, like `start=now/1d`.  
The response to such a query would set cache-control headers that allow caching 
at varnish up to 24H (based on `now/1d` resolution), which means everyone 
executing that query gets new results about once a day and they all shared a 
single cached result per day.
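
To make that concrete, a hedged sketch (Python; the parameter name, the 
day-alignment rule, and the s-maxage value are all illustrative, not an existing 
API) of how a `start=now/1d` request could be normalized so everyone in the same 
UTC day shares one cached result:

  from datetime import datetime, timezone

  def normalize_start(start: str):
      # Map a "now/1d"-style start onto a day-aligned timestamp plus a cache
      # lifetime capped at the resolution, so everyone asking within the same
      # UTC day shares one varnish object.
      if start == "now/1d":
          day = datetime.now(timezone.utc).replace(hour=0, minute=0,
                                                   second=0, microsecond=0)
          return day.strftime("%Y%m%d%H%M"), "public, s-maxage=86400"
      # Fully-specified historical starts rarely change; a modest TTL still
      # bounds staleness from late data updates.
      return start, "public, s-maxage=86400"

  print(normalize_start("now/1d"))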

The important thing here is there's no need for a client to have control over 
result object expiration if the query encodes everything that's relevant to 
expiration and the maximum cache lifetime is set small enough that other 
effects (e.g. data updates to existing historical data) are negligible in the 
big picture.

> Possibly, the optimal solution would use the varnish-api-engine 
> (http://info.varnish-software.com/blog/introducing-varnish-api-engine) in 
> conjunction with a WDQS REST API (provided with a modified RESTBase?).   Is 
> the varnish-api-engine being used anywhere in WMF?  Also, delegating query 
> requests to an API could allow POSTs.  Simply with Varnish cache, the POST 
> problem would remain unresolved.

We're not using the Varnish API Engine, and I don't see us pursuing that 
anytime soon.  Most of what it does can be done other ways, and more 
importantly it's commercial software.

There seems to be some confusion as to whether `POST` is or isn't still an 
issue here...

Also, a whole separate issue is that WDQS is currently mapped through our 
`cache_misc` cluster.  That cluster is for small lightweight miscellaneous 
infrastructure.  WDQS was probably always a poor match for that, but we put it 
there because at the time it was seen as being a lightweight / low-rate service 
that would mostly be used directly by humans to execute one-off complicated 
queries.  The plans in this ticket sound nothing like that, and `cache_misc` 
probably isn't an appropriate home for a complex query service that's going to 
backend serious query load from wikis and the rest of the world...


TASK DETAIL
  https://phabricator.wikimedia.org/T126730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: BBlack, GWicke, Bene, Ricordisamoa, daniel, Lydia_Pintscher, Smalyshev, 
Jonas, Christopher, Yurik, hoo, Aklapper, aude, debt, Gehe

[Wikidata-bugs] [Maniphest] [Commented On] T125392: [Task] figure out the ratio of page views by logged-in vs. logged-out users

2016-02-16 Thread BBlack
BBlack added a comment.

In https://phabricator.wikimedia.org/T125392#1994242, @Milimetric wrote:

> @BBlack - so you think cache_status is not even close to accurate?  Do we 
> have other accurate measurements of it so we could compare to what extent 
> it's misleading?  I'm happy to remove it from the data if it's really bad.


On this topic, it is pretty misleading, and we do have other stats we look at 
more-manually to compare.  We don't have a good singular, simple replacement 
for cache_status to include in analytics yet, though.  What we do have (that 
we've looked at manually in some cases lately) is the `X-Cache` response 
header.  That header has evolved a bit in how it's generated over the past 
couple of months so that it's less-misleading than it was before, but it still 
requires various regex operations to bin responses according to what exactly 
one is trying to measure.  But for an example, this pseudo-code would be an 
accurate way to put all requests into 3 distinct non-overlapping bins based on 
X-Cache regex:

  if (X-Cache ~ / hit/) {
      print "This is a real cache object hit";
  }
  else if (X-Cache ~ / int/) {
      print "This response was generated internally by varnish (e.g. 301 redirect for HTTPS, desktop->mobile redirect on UA detect, some kinds of error response, etc)";
  }
  else {
      print "This is a cache miss or a cache pass (pass would be due to uncacheable content, which is more often true for logged-in users than others, but exists in both cases in notable numbers)";
  }
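
The same binning in Python, for anyone slicing sampled request logs offline (a 
sketch; the header value in the usage comment is made up):

  import re

  def xcache_bin(x_cache: str) -> str:
      if re.search(r" hit", x_cache):
          return "hit"          # a real cache object hit somewhere in the chain
      if re.search(r" int", x_cache):
          return "int"          # generated internally by varnish
      return "miss-or-pass"     # miss, or pass due to uncacheable content

  # e.g. xcache_bin("cp1066 miss, cp3041 hit/3, cp3030 miss") == "hit"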

However, I think X-Cache's raw data is still open to further modification.  
Ideally we'll build on top of this and start emitting some standard, simple 
header that can be one of N simple strings and reflects overall cache status 
bins (and hopefully with better detail as to miss-vs-pass and the nature of the 
pass to some degree).


TASK DETAIL
  https://phabricator.wikimedia.org/T125392

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore, BBlack
Cc: JanZerebecki, Milimetric, BBlack, ori, gerritbot, hoo, daniel, Aklapper, 
Addshore, Lydia_Pintscher, Izno, Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidata Sparql queries

2016-02-15 Thread BBlack
BBlack added a comment.

IIRC, the problem we've beat our heads against in past SPARQL-related tickets 
is the fact that SPARQL clients are using `POST` method for readonly queries, 
due to argument length issues and whatnot.  On the surface, that's a 
dealbreaker for caching them as `POST` isn't cacheable.  The conflict here 
comes from a fundamental limitation of HTTP: the only idempotent/readonly 
methods have un-ignorable input data length restrictions.  There are probably 
ways to design around that in a scratch design, but SPARQL software is 
already-written...


TASK DETAIL
  https://phabricator.wikimedia.org/T126730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: BBlack, GWicke, Bene, Ricordisamoa, daniel, Lydia_Pintscher, Smalyshev, 
Jonas, Christopher, Yurik, hoo, Aklapper, aude, debt, Gehel, Izno, Luke081515, 
jkroll, Wikidata-bugs, Jdouglas, Deskana, Manybubbles, Mbch331, Jay8g, Ltrlg



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-03 Thread BBlack
BBlack added a comment.

So, current thinking is that at least one (maybe two?) of the bumps is from 
moving what used to be synchronous HTCP purging during requests to JobRunner jobs 
that should be doing the same thing.  However, assuming it's that alone (or 
even just investigating that part in isolation), we're still left with "why did 
the resulting rate of HTCP purges go up by unexpected multiples just from 
moving them to the jobqueue?".


TASK DETAIL
  https://phabricator.wikimedia.org/T124418

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Legoktm, Addshore, daniel, hoo, aude, Lydia_Pintscher, JanZerebecki, 
MZMcBride, Luke081515, Denniss, aaron, faidon, Joe, ori, BBlack, Aklapper, 
Izno, Wikidata-bugs, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-03 Thread BBlack
BBlack added a comment.

Well then apparently the 10/s edits to all projects number I found before is 
complete bunk :)

http://wikipulse.herokuapp.com/ has numbers for wikidata edits that 
approximately line up with yours, and then shows Wikipedias at about double 
that rate (which might be a reasonable interpretation of the distribution 
shown).


TASK DETAIL
  https://phabricator.wikimedia.org/T124418

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Legoktm, Addshore, daniel, hoo, aude, Lydia_Pintscher, JanZerebecki, 
MZMcBride, Luke081515, Denniss, aaron, faidon, Joe, ori, BBlack, Aklapper, 
Izno, Wikidata-bugs, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-03 Thread BBlack
BBlack added a comment.

heh so: https://phabricator.wikimedia.org/T113192 -> 
https://gerrit.wikimedia.org/r/#/c/258365/5 is probably the Jan 20 bump.


TASK DETAIL
  https://phabricator.wikimedia.org/T124418

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Legoktm, Addshore, daniel, hoo, aude, Lydia_Pintscher, JanZerebecki, 
MZMcBride, Luke081515, Denniss, aaron, faidon, Joe, ori, BBlack, Aklapper, 
Izno, Wikidata-bugs, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T125392: [Task] figure out the ratio of page views by logged-in vs. logged-out users

2016-02-03 Thread BBlack
BBlack added a comment.

FYI - "cache_status" is not an accurate reflection of anything.  I'm not sure 
why we really even log it for analytics.  The problem is that it only reflects 
some varnish state about the first of up to 3 layers of caching, and even then 
it does so poorly.


TASK DETAIL
  https://phabricator.wikimedia.org/T125392

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore, BBlack
Cc: JanZerebecki, Milimetric, BBlack, ori, gerritbot, hoo, daniel, Aklapper, 
Addshore, Lydia_Pintscher, Izno, Wikidata-bugs, aude, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-01 Thread BBlack
BBlack added a comment.

@ori - yeah that makes sense for the initial bump, and I think there may have 
even been a followup to do deferred purges, which may be one of the other 
multipliers, but I haven't found it yet (as in, insert an immediate job and 
also somehow insert one that fires a little later to cover race conditions).


TASK DETAIL
  https://phabricator.wikimedia.org/T124418

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Legoktm, Addshore, daniel, hoo, aude, Lydia_Pintscher, JanZerebecki, 
MZMcBride, Luke081515, Denniss, aaron, faidon, Joe, ori, BBlack, Aklapper, 
Izno, Wikidata-bugs, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-01 Thread BBlack
BBlack added a comment.

Another data point from the weekend: In one sample I took Saturday morning, 
when I sampled for 300s, the top site being purged was srwiki, and something 
like 98% of the purges flowing for srwiki were all Talk: pages (well, with 
Talk: as %-encoded something in Serbian).  When I visited random examples of 
the purged Talk pages, the vast majority of the ones I checked were 
content-free (as in, nobody had talked about the given article at all yet, it 
was showing the generic initial blob).  These had to be coming from a job 
obviously, the question is what kind of job wants to rip through (probably) 
every talk page on a wiki, blank ones included, for purging (or alternatively, 
it was ripping through the entire page list, and I just happened to catch it on 
a batch of Talk: ones)?  @faidon suggested a template used in those pages, but 
then what's triggering the template?  Surely not wikidata on a template for 
blank talk pages?


TASK DETAIL
  https://phabricator.wikimedia.org/T124418

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Legoktm, Addshore, daniel, hoo, aude, Lydia_Pintscher, JanZerebecki, 
MZMcBride, Luke081515, Denniss, aaron, faidon, Joe, ori, BBlack, Aklapper, 
Izno, Wikidata-bugs, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-01 Thread BBlack
BBlack added a comment.

Regardless, the average rate of HTCP these days is normally-flat-ish (a few 
scary spikes aside), and is mostly throttled by the jobqueue.  The question 
still remains: what caused permanent, large bumps in the jobqueue 
htmlCacheUpdate insertion rate on ~Dec4, ~Dec11, and ~Jan20?


TASK DETAIL
  https://phabricator.wikimedia.org/T124418

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Legoktm, Addshore, daniel, hoo, aude, Lydia_Pintscher, JanZerebecki, 
MZMcBride, Luke081515, Denniss, aaron, faidon, Joe, ori, BBlack, Aklapper, 
Izno, Wikidata-bugs, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-02-01 Thread BBlack
BBlack added a comment.

@daniel - Sorry I should have linked this earlier, I made a paste at the time: 
https://phabricator.wikimedia.org/P2547 .  Note that 
`/%D0%A0%D0%B0%D0%B7%D0%B3%D0%BE%D0%B2%D0%BE%D1%80:` is the Serbian srwiki 
version of `/Talk:`.


TASK DETAIL
  https://phabricator.wikimedia.org/T124418

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Legoktm, Addshore, daniel, hoo, aude, Lydia_Pintscher, JanZerebecki, 
MZMcBride, Luke081515, Denniss, aaron, faidon, Joe, ori, BBlack, Aklapper, 
Izno, Wikidata-bugs, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-01-31 Thread BBlack
BBlack added a comment.

Well, we have 3 different stages of rate-increase in the insert graph, so it 
could well be that we have 3 independent causes to look at here.  And it's not 
necessarily true that any of them are buggy, but we need to understand what 
they're doing and why, because maybe something or other can be tweaked or tuned 
to be less wasteful.  Fundamentally nothing really changed in the past month or 
two; it's not like we gained a 5x increase in human article editing rate...


TASK DETAIL
  https://phabricator.wikimedia.org/T124418

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Addshore, daniel, hoo, aude, Lydia_Pintscher, JanZerebecki, MZMcBride, 
Luke081515, Denniss, aaron, faidon, Joe, ori, BBlack, Aklapper, Izno, 
Wikidata-bugs, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan

2016-01-31 Thread BBlack
BBlack added a comment.

Continuing with some stuff I was saying in IRC the other day.  At the "new 
normal", we're seeing something in the approximate ballpark of 400/s articles 
purged (which is then multiplied commonly for ?action=history and mobile and 
ends up more like ~1600/s actual HTCP packets), whereas the edit rate across 
all projects is something like 10/s.  That 400/s number used to be somewhere 
south of 100 before December.


TASK DETAIL
  https://phabricator.wikimedia.org/T124418

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: BBlack
Cc: Addshore, daniel, hoo, aude, Lydia_Pintscher, JanZerebecki, MZMcBride, 
Luke081515, Denniss, aaron, faidon, Joe, ori, BBlack, Aklapper, Izno, 
Wikidata-bugs, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

