Re: Large difference between response time as measured by nginx and as measured by HAProxy
On Thu, Aug 27, 2020 at 05:39:34PM +0200, Tim Düsterhus, WoltLab GmbH wrote:
> Willy and I discussed this in private. My understanding is that this
> issue is caused by an inherent limitation of H2 with regard to head of
> the line blocking.
>
> In simple terms:
>
> If the stuff at the beginning of the backend connection belongs to a
> client that is reading very slowly or not reading at all from its
> client connection, then everything within that backend connection will
> stall until the slow client finally gets around to reading the stuff
> that's intended for it.
>
> Basically a very slow client can block *other* clients that are
> unfortunate enough to share the backend connection.

And to complete this, one theoretical workaround would consist in using
very small initial windows, but that would not be practical. A test with
just the buffer size didn't help.

> Our resolution is simple: We disable H2 for the backend connections and
> use good old H1. We don't *need* end to end H2.

Also that raised an idea about adding a new "http-reuse" mode, or
slightly changing the way "never" works for H2, which would consist in
only grouping together requests from the same session (client
connection). It would be what "never" already does, except that after
the last response, instead of closing the connection, it would be sent
back to the idle pool for any other client to use. It would not be
perfect, but it would keep streams in the same order for a given client
based on their performance.

Willy
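The "never" mode referred to above already exists; a minimal sketch of
what it looks like in a configuration (backend and server names here are
illustrative, not taken from the thread):

```
backend bk_nginx
    # "never": a backend connection is only ever used by the client
    # session that created it. The idea discussed above would keep that
    # grouping, but return the connection to the idle pool after the
    # last response instead of closing it.
    http-reuse never
    server nginx 127.0.0.1:8080 proto h2
```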
Re: Large difference between response time as measured by nginx and as measured by HAProxy
Hi List,

On 27.08.20 at 11:45, Tim Düsterhus, WoltLab GmbH wrote:
>> You can see two h2c connections, one is on the frontend (hence the 'F')
>> and the other one is on the backend. The h2s is the H2 stream, unique to
>> a request (though it comes from a pool, so it will be reused for new
>> streams on any connection once this one is finished). Thus by just
>> filtering on the h2c and correlating with what you see in the logs (with
>> your source ports), and the date of course, you can quite reliably match
>> connections, hence requests.
>
> Okay, that works sufficiently well for me. I've extracted the relevant
> parts, redacted any private information and will check with a coworker
> to see if I missed anything. I'll send you the traces + logs in private
> after the contents have been ACKed :-)

Willy and I discussed this in private. My understanding is that this
issue is caused by an inherent limitation of H2 with regard to head of
the line blocking.

In simple terms:

If the stuff at the beginning of the backend connection belongs to a
client that is reading very slowly or not reading at all from its client
connection, then everything within that backend connection will stall
until the slow client finally gets around to reading the stuff that's
intended for it.

Basically a very slow client can block *other* clients that are
unfortunate enough to share the backend connection.

Our resolution is simple: We disable H2 for the backend connections and
use good old H1. We don't *need* end to end H2.

Best regards
Tim Düsterhus
Developer WoltLab GmbH

--

WoltLab GmbH
Nedlitzer Str. 27B
14469 Potsdam

Tel.: +49 331 96784338

duester...@woltlab.com
www.woltlab.com

Managing director:
Marcel Werk

AG Potsdam HRB 26795 P
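In configuration terms, the resolution described above amounts to no
longer negotiating H2 on the server line; a hedged sketch (names and
addresses are illustrative, not taken from the thread):

```
backend bk_nginx
    # Good old H1 to nginx: omit "proto h2" on a clear-text server
    # line, or use "alpn http/1.1" instead of "alpn h2" over TLS.
    server nginx 127.0.0.1:8080
```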
Re: Large difference between response time as measured by nginx and as measured by HAProxy
On Thu, Aug 27, 2020 at 11:45:13AM +0200, Tim Düsterhus, WoltLab GmbH wrote:
> Willy,
>
> On 27.08.20 at 10:29, Willy Tarreau wrote:
> >> I now have the trace results and my HAProxy log where I can correlate
> >> the slow requests using the timestamp and path. Unfortunately the trace
> >> does not appear to contain the unique-id of the request.
> >
> > Sadly, due to the way HPACK works we can't decode the headers being sent.
> > There's already an ugly hack being done for the URI and method. I seem to
> > remember we pick them from the HTX's start line, but it could not reliably
> > be done for all headers. I'd still like to improve this :-/
>
> The unique-id is a property of the stream (struct stream).unique_id.
> So that one should technically be available, no? I would also be happy
> with using the internal request_counter (the %rt log value), because
> that one is part of my unique ID format anyway.

In the stream output yes, but not in the h1 or h2 output, which do not
necessarily have a stream above them (e.g.: checks). However if we had
it in the stream output, enabling traces at both levels would be enough
to trace them. What I'm missing in both is the conn_stream, which joins
the h1/h2 stream and the application layer stream, but that shouldn't be
hard to add.

For the request counter, same problem. I'm trying to think of an
identifier that would be present everywhere but can't think of any.
Maybe we could try to attack the issue the other way around and enable
the production of a trace_id when traces are enabled; this trace_id
would be passed as much as possible between layers and could also be
logged. The problem with muxed connections is that it would have to be
generated on a new request, and for a number of issues that's already
too late. Thus maybe a composite trace ID would help instead, i.e.
front connection ID, front stream ID, back stream ID, back connection
ID.
Missing some of these parts would then not be a problem because it would
just mean that we didn't reach that stage.

> > You can see two h2c connections, one is on the frontend (hence the 'F')
> > and the other one is on the backend. The h2s is the H2 stream, unique to
> > a request (though it comes from a pool, so it will be reused for new
> > streams on any connection once this one is finished). Thus by just
> > filtering on the h2c and correlating with what you see in the logs (with
> > your source ports), and the date of course, you can quite reliably match
> > connections, hence requests.
>
> Okay, that works sufficiently well for me.

Cool! I do imagine that it has been painful, so thanks for doing it!

> I've extracted the relevant parts, redacted any private information and
> will check with a coworker to see if I missed anything. I'll send you
> the traces + logs in private after the contents have been ACKed :-)

No problem.

> There's even a PHP request within there with an 800ms difference between
> nginx and HAProxy.

Ah excellent! We'll have something to study at least.

I'm suddenly thinking about something: there is a known case of
head-of-line blocking in H2 when stream windows are larger than buffers:
a slow stream can cause some of its data to block in the connection's
buffer for some time. It could very well be what's happening to you:
merge two clients on the same backend connection, have one slowly read
its data, and sometimes the other may experience some hiccups. This
should be significantly reduced by reducing the number of concurrent
streams, however.

One way to alleviate this is to reduce tune.h2.initial-window-size to
16kB, but it's used on both sides so it would also mean very slow posts.
The other solution is to increase the default buffer size to 64kB so
that all of an H2 incoming data stream's window can be flushed into a
client's buffer. It may increase your memory consumption a bit if you're
mostly dealing with slow clients.
A test could consist in setting it to 32kB, which will generally be
enough as you have one buffer in the mux and another one in the channel
;-)

> > Do not hesitate to let me know if you think of something that can be
> > improved, traces are still quite young and underexploited, I'm quite
> > open to studying potential improvements.
>
> Nothing else for now. grepping for the pointers worked sufficiently
> well. The bigger issue is identifying the request in question in the
> first place.

Yes I know, that's always the same for me as well. That's the reason
why the unique-id feature was implemented so long ago!

> I was lucky that the file in this case was fairly low-trafficked so
> that I was able to accurately identify it based on the timestamp and
> path.

Usually with patience it does work. There's still a lot of room for
improvement, but I often tend to imagine what it would have been
without, and that helps me be tolerant :-)

Regards,
Willy
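The two knobs discussed in this mail could be tried from the global
section, for example (the values are the ones suggested above; the
defaults are the documented ones):

```
global
    # Option 1: shrink the per-stream window (also slows down uploads)
    tune.h2.initial-window-size 16384   # default: 65535
    # Option 2: grow buffers so a full stream window can be drained
    # (one buffer in the mux plus one in the channel: 32k + 32k)
    tune.bufsize 32768                  # default: 16384
```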
Re: Large difference between response time as measured by nginx and as measured by HAProxy
Willy,

On 27.08.20 at 10:29, Willy Tarreau wrote:
>> I now have the trace results and my HAProxy log where I can correlate
>> the slow requests using the timestamp and path. Unfortunately the trace
>> does not appear to contain the unique-id of the request.
>
> Sadly, due to the way HPACK works we can't decode the headers being sent.
> There's already an ugly hack being done for the URI and method. I seem to
> remember we pick them from the HTX's start line, but it could not reliably
> be done for all headers. I'd still like to improve this :-/

The unique-id is a property of the stream (struct stream).unique_id.
So that one should technically be available, no? I would also be happy
with using the internal request_counter (the %rt log value), because
that one is part of my unique ID format anyway.

>> Can I somehow filter down the trace file to just the offending requests
>> + possibly the requests within the same H2 connection? For privacy
>> reasons I would not like to provide the full trace log, even if it's in
>> a non-public email.
>
> Of course I understand. You can rely on the "h2c" pointer that appears
> in the trace, it's the H2 connection.
> For example below, I'm forwarding an H2 request from a frontend to a
> backend:
>
> <0>2020-08-27T10:23:26.117936+02:00 [01|h2|0|mux_h2.c:2541] rcvd H2 request : h2c=0x7f59f0026880(F,FRH) : [1] H2 REQ: GET http://127.0.0.1:4446/ HTTP/2.0
> <0>2020-08-27T10:23:26.128922+02:00 [01|h2|0|mux_h2.c:5152] sent H2 request : h2c=0x7f59f00382a0(B,FRH) h2s=0x7f59f0037fa0(1,IDL) : [1] H2 REQ: GET http://mail.google.com/ HTTP/2.0
> <0>2020-08-27T10:23:26.149012+02:00 [01|h2|0|mux_h2.c:2632] rcvd H2 response : h2c=0x7f59f00382a0(B,FRH) : [1] H2 RES: HTTP/2.0 301
> <0>2020-08-27T10:23:26.149041+02:00 [01|h2|0|mux_h2.c:4790] sent H2 response : h2c=0x7f59f0026880(F,FRH) h2s=0x7f59f0027dc0(1,HCR) : [1] H2 RES: HTTP/2.0 301
>
> You can see two h2c connections, one is on the frontend (hence the 'F')
> and the other one is on the backend. The h2s is the H2 stream, unique to
> a request (though it comes from a pool, so it will be reused for new
> streams on any connection once this one is finished). Thus by just
> filtering on the h2c and correlating with what you see in the logs (with
> your source ports), and the date of course, you can quite reliably match
> connections, hence requests.

Okay, that works sufficiently well for me. I've extracted the relevant
parts, redacted any private information and will check with a coworker
to see if I missed anything. I'll send you the traces + logs in private
after the contents have been ACKed :-)

There's even a PHP request within there with an 800ms difference between
nginx and HAProxy.

> Do not hesitate to let me know if you think of something that can be
> improved, traces are still quite young and underexploited, I'm quite
> open to studying potential improvements.

Nothing else for now. grepping for the pointers worked sufficiently
well. The bigger issue is identifying the request in question in the
first place.
I was lucky that the file in this case was fairly low-trafficked so that
I was able to accurately identify it based on the timestamp and path.

Best regards
Tim Düsterhus
Developer WoltLab GmbH
Re: Large difference between response time as measured by nginx and as measured by HAProxy
Hi Tim,

On Thu, Aug 27, 2020 at 10:05:08AM +0200, Tim Düsterhus, WoltLab GmbH wrote:
> My definition of reproducible is "I can write up a list of steps that
> makes the issue happen somewhat reliably".

OK, that's fine by me :-)

> This morning I re-enabled H2 for the backend communication and then
> plugged in the tracing. In the half hour since I re-enabled H2 I'm
> seeing 160 static requests taking longer than 45ms, with the worst ones
> being > 800ms.

So there's definitely something related to H2.

> I now have the trace results and my HAProxy log where I can correlate
> the slow requests using the timestamp and path. Unfortunately the trace
> does not appear to contain the unique-id of the request.

Sadly, due to the way HPACK works we can't decode the headers being
sent. There's already an ugly hack being done for the URI and method. I
seem to remember we pick them from the HTX's start line, but it could
not reliably be done for all headers. I'd still like to improve this :-/

It's possible to also log the streams (at level developer) and get the
full requests/responses, but they will not necessarily correlate with
the H2 traffic (though it could give you high enough confidence).

> Can I somehow filter down the trace file to just the offending requests
> + possibly the requests within the same H2 connection? For privacy
> reasons I would not like to provide the full trace log, even if it's in
> a non-public email.

Of course I understand. You can rely on the "h2c" pointer that appears
in the trace, it's the H2 connection.
For example below, I'm forwarding an H2 request from a frontend to a
backend:

<0>2020-08-27T10:23:26.117936+02:00 [01|h2|0|mux_h2.c:2541] rcvd H2 request : h2c=0x7f59f0026880(F,FRH) : [1] H2 REQ: GET http://127.0.0.1:4446/ HTTP/2.0
<0>2020-08-27T10:23:26.128922+02:00 [01|h2|0|mux_h2.c:5152] sent H2 request : h2c=0x7f59f00382a0(B,FRH) h2s=0x7f59f0037fa0(1,IDL) : [1] H2 REQ: GET http://mail.google.com/ HTTP/2.0
<0>2020-08-27T10:23:26.149012+02:00 [01|h2|0|mux_h2.c:2632] rcvd H2 response : h2c=0x7f59f00382a0(B,FRH) : [1] H2 RES: HTTP/2.0 301
<0>2020-08-27T10:23:26.149041+02:00 [01|h2|0|mux_h2.c:4790] sent H2 response : h2c=0x7f59f0026880(F,FRH) h2s=0x7f59f0027dc0(1,HCR) : [1] H2 RES: HTTP/2.0 301

You can see two h2c connections, one is on the frontend (hence the 'F')
and the other one is on the backend. The h2s is the H2 stream, unique to
a request (though it comes from a pool, so it will be reused for new
streams on any connection once this one is finished). Thus by just
filtering on the h2c and correlating with what you see in the logs (with
your source ports), and the date of course, you can quite reliably match
connections, hence requests.

Do not hesitate to let me know if you think of something that can be
improved, traces are still quite young and underexploited, I'm quite
open to studying potential improvements.

Willy
Re: Large difference between response time as measured by nginx and as measured by HAProxy
Willy,

On 26.08.20 at 17:51, Willy Tarreau wrote:
>> As I said: It's not really reproducible.
>
> Yeah, but we all have a definition of "not really reproducible". As you
> consider that it doesn't happen in HTTP/1 after only 4 hours, for me
> this means you expect it to usually happen at least once in 4 hours.
> If the load is not too high (no more than a few thousand requests per
> second) and you have some disk space, the h2 trace method could prove
> to be useful once you have isolated a culprit in the logs. You could
> even gzip the output on the fly to take less space, they compress very
> well.

My definition of reproducible is "I can write up a list of steps that
makes the issue happen somewhat reliably". This is not the case here. If
I test with e.g. nghttp -a to pull the HTML + all resources then
everything works smoothly. Same if I attempt to pull down the same
static file after I see an issue within the logs.

An update regarding the H1 numbers: In the 20 hours or so with HTTP/1
enabled, a total of 15 (!) static requests took longer than 45ms, the
maximum being 77ms. This is still more than I'd like, but nothing
compared to the H2 performance.

This morning I re-enabled H2 for the backend communication and then
plugged in the tracing. In the half hour since I re-enabled H2 I'm
seeing 160 static requests taking longer than 45ms, with the worst ones
being > 800ms.

I now have the trace results and my HAProxy log where I can correlate
the slow requests using the timestamp and path. Unfortunately the trace
does not appear to contain the unique-id of the request.

Can I somehow filter down the trace file to just the offending requests
+ possibly the requests within the same H2 connection? For privacy
reasons I would not like to provide the full trace log, even if it's in
a non-public email.
>>> Another thing you can try is to artificially limit
>>> tune.h2.max-concurrent-streams just in case there is contention in
>>> the server's connection buffers. By default it's 100, you can try with
>>> much less (e.g. 20) and see if it seems to make any difference at all.
>>
>> The fact that disabling HTTP/2 helps could indicate that something like
>> this is the case here. I'll try that tomorrow, thanks.

I've not done this yet, I'd first like to hear how we go about with the
trace I've collected.

Best regards
Tim Düsterhus
Developer WoltLab GmbH
Re: Large difference between response time as measured by nginx and as measured by HAProxy
On Wed, Aug 26, 2020 at 05:38:52PM +0200, Tim Düsterhus, WoltLab GmbH wrote:
> There are even worse ones:
>
> nginx:
>
> > Aug 26 09:56:50 *snip* abd21141cdb4[5615]: 5F4631E22E407 "GET *snip*.gif HTTP/2.0" 200 139 0.000
>
> HAProxy:
>
> > Aug 26 09:57:17 *snip* haproxy[3066]: *snip*:57086 [26/Aug/2020:09:56:50.265] https~ *snip*/nginx 0/0/0/27340/27340 200 335 - - 82/82/17/17/0 0/0 {*snip*|*snip*} {*snip*|} "GET *snip*.gif HTTP/2.0" 5F4631E22E407
>
> Yes, that's 27340ms. I don't have any more details regarding the
> connection reuse for that, unfortunately :-(

Ouch!

> FWIW: I disabled HTTP/2 for the backend connection after sending my
> initial email and I am not seeing anything worse than 45ms for static
> files any more.

OK, that already narrows the issue down a bit.

> I just arbitrarily checked 3 PHP requests and they match. The static
> ones *mostly* do as well. It's just that static files taking ages stick
> out like a sore thumb, while PHP tends to be all over the place anyway.
> It basically was a "grep for static file requests" and then "halog -pct"
> and wonder why there's a large number at the bottom.

I can understand indeed.

> See above: HTTP/1 did not show the issue within the last 4 hours, so I
> would rule out nginx / the disk being the culprit here.

That's perfect, it definitely rules out that part indeed.

> Static files are served from fast local SSDs with plenty of page cache.
> I don't expect disk access to bottleneck into the 3-digit millisecond
> range.

OK. I've been hit in the past by machines swapping out while uploading
some logs over HTTP, and taking ages to even allocate network buffers
after that! They used to slowly recover during the day and start their
madness again the next night. It was even causing packet losses and
retransmissions! That's why I asked :-)

> As I said: It's not really reproducible.

Yeah, but we all have a definition of "not really reproducible".
As you consider that it doesn't happen in HTTP/1 after only 4 hours, for
me this means you expect it to usually happen at least once in 4 hours.
If the load is not too high (no more than a few thousand requests per
second) and you have some disk space, the h2 trace method could prove to
be useful once you have isolated a culprit in the logs. You could even
gzip the output on the fly to take less space, they compress very well.

> I'm mostly seeing it after the fact. I can't even tell if HAProxy's
> measuring or nginx' measuring is correct (i.e. which latency the client
> actually experienced).

Unless there's a bug causing haproxy to report a random value (or a
timer related to the previous request for example, which would explain
the large values), I'd be tempted to say that it's likely the client
experiences it :-/

> However tcpdump definitely is not going to be fun, because it's all TLS
> with PFS ciphers.

That's what I was suspecting, sadly, but preferred to ask.

> > Another thing you can try is to artificially limit
> > tune.h2.max-concurrent-streams just in case there is contention in
> > the server's connection buffers. By default it's 100, you can try with
> > much less (e.g. 20) and see if it seems to make any difference at all.
>
> The fact that disabling HTTP/2 helps could indicate that something like
> this is the case here. I'll try that tomorrow, thanks.

You're welcome!

Willy
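As a configuration sketch, the suggested experiment would look like this
in the global section (the test value is the one proposed above):

```
global
    # Fewer concurrent streams per H2 connection, to limit contention
    # in the server connection's buffers (suggested test value)
    tune.h2.max-concurrent-streams 20   # default: 100
```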
Re: Large difference between response time as measured by nginx and as measured by HAProxy
Willy,

On 26.08.20 at 16:59, Willy Tarreau wrote:
> On Wed, Aug 26, 2020 at 12:59:55PM +0200, Tim Düsterhus, WoltLab GmbH wrote:
>> Within the HAProxy logs we're sometimes seeing largish 'Tr' values for
>> static file requests. For example 734ms for the example request below.
>
> That sounds huge. I could have imagined a 200ms time, possibly indicating
> that somewhere between the two, an extra MSG_MORE was present and prevented
> a request and/or response from leaving the machine. But 734ms doesn't
> correspond to anything usual.

There are even worse ones:

nginx:

> Aug 26 09:56:50 *snip* abd21141cdb4[5615]: 5F4631E22E407 "GET *snip*.gif HTTP/2.0" 200 139 0.000

HAProxy:

> Aug 26 09:57:17 *snip* haproxy[3066]: *snip*:57086 [26/Aug/2020:09:56:50.265] https~ *snip*/nginx 0/0/0/27340/27340 200 335 - - 82/82/17/17/0 0/0 {*snip*|*snip*} {*snip*|} "GET *snip*.gif HTTP/2.0" 5F4631E22E407

Yes, that's 27340ms. I don't have any more details regarding the
connection reuse for that, unfortunately :-(

FWIW: I disabled HTTP/2 for the backend connection after sending my
initial email and I am not seeing anything worse than 45ms for static
files any more.

>> The two requests in nginx' log:
>>
>>> Aug 26 10:23:21 *snip* abd21141cdb4[5615]: 5F46381900A0 "GET *snip* HTTP/2.0" 200 27049 0.070
>>> Aug 26 10:23:26 *snip* abd21141cdb4[5615]: 5F46381E00F5 "GET *snip*.gif HTTP/2.0" 200 6341 0.000
>>
>> Note how the timing for the PHP request matches up with what HAProxy was
>> seeing: roughly 70ms / 0.070s. The timing for the GIF request is *way*
>> off, though.
>
> Stupid question, do the PHP requests *always* match? I'm asking because

I just arbitrarily checked 3 PHP requests and they match. The static
ones *mostly* do as well. It's just that static files taking ages stick
out like a sore thumb, while PHP tends to be all over the place anyway.
It basically was a "grep for static file requests" and then "halog -pct"
and wonder why there's a large number at the bottom.
> one point is that haproxy cannot tell the difference between the two.
> However the static one very likely responds instantly, which could justify
> a less common code path being taken and a bug being triggered.
>
> However static files might also cause some disk access, and not knowing
> anything about the environment, I'd say it's also possible that the whole
> nginx process gets frozen for the time it takes open() to respond, and if
> for whatever reason there are lots of I/Os and it takes that long, it might
> be possible that it doesn't notice it, depending on where the total time is
> retrieved.

See above: HTTP/1 did not show the issue within the last 4 hours, so I
would rule out nginx / the disk being the culprit here.

Static files are served from fast local SSDs with plenty of page cache.
I don't expect disk access to bottleneck into the 3-digit millisecond
range.

> But it could definitely be a bug in the connection reuse or mux code, of
> course. It's just that I have no idea what could cause this for now.
>
> If you see it often enough, it might be useful to enable h2 traces and
> watch them on the CLI, or even run a tcpdump between the two to observe
> what's going on. The traces might be easier as a first step as you'll
> see already decoded stuff. For example:
>
>   (echo "trace h2 sink buf0; trace h2 level user; trace h2 verbosity \
>          complete; trace h2 start now; show events buf0 -w -n" ; cat) | \
>       socat - /path/to/socket > /path/to/logs
>
> Once you're confident you got it, you can disable all traces using the
> short form "trace 0".

As I said: It's not really reproducible. I'm mostly seeing it after the
fact. I can't even tell if HAProxy's measuring or nginx' measuring is
correct (i.e. which latency the client actually experienced).

However tcpdump definitely is not going to be fun, because it's all TLS
with PFS ciphers.
> Another thing you can try is to artificially limit
> tune.h2.max-concurrent-streams just in case there is contention in
> the server's connection buffers. By default it's 100, you can try with
> much less (e.g. 20) and see if it seems to make any difference at all.

The fact that disabling HTTP/2 helps could indicate that something like
this is the case here. I'll try that tomorrow, thanks.

Best regards
Tim Düsterhus
Developer WoltLab GmbH
Re: Large difference between response time as measured by nginx and as measured by HAProxy
Hi Tim,

On Wed, Aug 26, 2020 at 12:59:55PM +0200, Tim Düsterhus, WoltLab GmbH wrote:
> Within the HAProxy logs we're sometimes seeing largish 'Tr' values for
> static file requests. For example 734ms for the example request below.

That sounds huge. I could have imagined a 200ms time, possibly
indicating that somewhere between the two, an extra MSG_MORE was present
and prevented a request and/or response from leaving the machine. But
734ms doesn't correspond to anything usual.

> The two requests in nginx' log:
>
> > Aug 26 10:23:21 *snip* abd21141cdb4[5615]: 5F46381900A0 "GET *snip* HTTP/2.0" 200 27049 0.070
> > Aug 26 10:23:26 *snip* abd21141cdb4[5615]: 5F46381E00F5 "GET *snip*.gif HTTP/2.0" 200 6341 0.000
>
> Note how the timing for the PHP request matches up with what HAProxy was
> seeing: roughly 70ms / 0.070s. The timing for the GIF request is *way*
> off, though.

Stupid question, do the PHP requests *always* match? I'm asking because
one point is that haproxy cannot tell the difference between the two.
However the static one very likely responds instantly, which could
justify a less common code path being taken and a bug being triggered.

However static files might also cause some disk access, and not knowing
anything about the environment, I'd say it's also possible that the
whole nginx process gets frozen for the time it takes open() to respond,
and if for whatever reason there are lots of I/Os and it takes that
long, it might be possible that it doesn't notice it, depending on where
the total time is retrieved.

But it could definitely be a bug in the connection reuse or mux code, of
course. It's just that I have no idea what could cause this for now.

If you see it often enough, it might be useful to enable h2 traces and
watch them on the CLI, or even run a tcpdump between the two to observe
what's going on. The traces might be easier as a first step as you'll
see already decoded stuff.
For example:

  (echo "trace h2 sink buf0; trace h2 level user; trace h2 verbosity \
         complete; trace h2 start now; show events buf0 -w -n" ; cat) | \
      socat - /path/to/socket > /path/to/logs

Once you're confident you got it, you can disable all traces using the
short form "trace 0".

Another thing you can try is to artificially limit
tune.h2.max-concurrent-streams just in case there is contention in the
server's connection buffers. By default it's 100, you can try with much
less (e.g. 20) and see if it seems to make any difference at all.

Hoping this helps,
Willy
Re: Large difference between response time as measured by nginx and as measured by HAProxy
Ilya,

On 26.08.20 at 13:13, Илья Шипицин wrote:
> nginx buffers output by default.
>
> can you try
>
> proxy_request_buffering off;
> proxy_buffering off;

The GIF file with the 734ms difference in response time between the two
logs is a static file served from the local file system. The proxy_*
options do not apply here. In fact nginx is not performing any proxying
at all, it's just doing FastCGI. But FastCGI is not the thing I am
concerned about.

Best regards
Tim Düsterhus
Developer WoltLab GmbH
Re: Large difference between response time as measured by nginx and as measured by HAProxy
nginx buffers output by default.

can you try

proxy_request_buffering off;
proxy_buffering off;

?

On Wed, 26 Aug 2020 at 16:02, Tim Düsterhus, WoltLab GmbH
<duester...@woltlab.com> wrote:

> Hi List,
>
> please keep this email in Cc for all responses, I am not subscribed to
> the list.
>
> We are running HAProxy 2.1.8 in front of nginx serving static files and
> acting as a FastCGI gateway. HAProxy is communicating with nginx using
> HTTP/2. Clients are regular web browsers, mostly using HTTP/2.
>
> Within the HAProxy logs we're sometimes seeing largish 'Tr' values for
> static file requests. For example 734ms for the example request below.
> The corresponding request in the nginx log shows a $request_time of
> 0.000s
> (http://nginx.org/en/docs/http/ngx_http_log_module.html#var_request_time).
>
> HAProxy log line for a static GIF file:
>
> > Aug 26 10:23:27 *snip* haproxy[3066]: *snip*:1167 [26/Aug/2020:10:23:26.028] https~ *snip*/nginx 0/0/0/734/1719 200 6540 - - 19/19/12/12/0 0/0 {*snip*|*snip*} {*snip*|} "GET *snip*.gif HTTP/2.0" *snip*:59348 5F46381E00F5
>
> Some notes regarding the log format: This is the standard HTTP log
> format with '%bi:%bp %ID' appended.
>
> According to the '%bp' value this was not the first request within the
> backend connection. The backend connection was established 5 seconds
> earlier for an HTTP/1.1 request going to PHP. According to the request
> ID this was the 160th (00A0) request after a reload (I just recently
> added the logging of %bp).
>
> > Aug 26 10:23:21 *snip* haproxy[3066]: *snip*:11594 [26/Aug/2020:10:23:21.148] https~ *snip*/nginx 0/0/2/69/72 200 27432 - - 19/19/0/0/0 0/0 {*snip*|*snip*} {*snip*|} "GET *snip* HTTP/1.1" *snip*:59348 5F46381900A0
>
> The two requests in nginx' log:
>
> > Aug 26 10:23:21 *snip* abd21141cdb4[5615]: 5F46381900A0 "GET *snip* HTTP/2.0" 200 27049 0.070
> > Aug 26 10:23:26 *snip* abd21141cdb4[5615]: 5F46381E00F5 "GET *snip*.gif HTTP/2.0" 200 6341 0.000
>
> Note how the timing for the PHP request matches up with what HAProxy was
> seeing: roughly 70ms / 0.070s. The timing for the GIF request is *way*
> off, though.
>
> Any idea what is happening there? It's not really reproducible in any
> way. I'll be happy to add more stuff to my logs if that would help. I
> can also provide specific configuration settings on request.
>
> Best regards
> Tim Düsterhus
> Developer WoltLab GmbH
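For reference, the log setup described in the quoted message ("the
standard HTTP log format with '%bi:%bp %ID' appended") would look
roughly like this; note that the unique-id-format shown is the example
from the documentation, not necessarily the one actually used here:

```
frontend https
    # documented default HTTP log format, plus the backend-side
    # source address:port and the unique request ID
    log-format "%ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r %bi:%bp %ID"
    # %rt (the internal request counter) is what yields the trailing
    # hex counter such as 00A0 / 00F5 in the IDs above
    unique-id-format %{+X}o\ %ci:%cp_%fi:%fp_%Ts_%rt:%pid
```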