Re: Throughput slow with kernel 4.9.0
Hi Brendon,

On Wed, Sep 26, 2018 at 02:45:29PM -0500, Brendon Colby wrote:
> Tc mean: 0.59 ms
> Tc std dev: 17.49 ms
> Tc max: 1033.00 ms
> Tc median: 0.00 ms

I don't know if all your servers are local, but if that's the case, Tc
should always be very small, and a stddev of 17 ms and a max of 1 s are
huge. This could indicate random pauses in the hypervisor, which would
support the possibility of traffic bursts I was talking about.

> So it looks like Tc isn't the issue here. Everything else looks good
> to my eyes. I still think something else changed, because on Jessie
> this never happened like I said.

As I said, it's very possible that before the kernel change your limits
were slightly above the minimum required settings, and now they are
slightly below.

> > But I also confess I've not run a VM test myself for a
> > while because each time I feel like I'm going to throw up in the middle
> > of the test :-/
>
> I know that's always been your position on VMs (ha) but one day I
> decided to try it for myself and haven't had a single issue until now.
> Our old hardware sat nearly 100% idle most of the time, so it was hard
> to justify the expense.

Oh, don't get me wrong, I know how this happens and am not complaining
about it. I'm just saying that using VMs is mostly a cost-saving
solution, and that if you cut costs you often have to expect a
sacrifice somewhere else. For those who can stand a slight degradation
of performance or latency, or can spend some time chasing issues which
do not exist, that's fine. Others can't afford this at all and will
prefer bare metal. It's just a matter of trade-offs.

At least if you found a way to tune your system to work around this
issue, you should simply document it somewhere for you and your
coworkers, and consider yourself lucky to have saved $93/mo without
degrading performance :-)

Willy
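[Editor's note: a minimal sketch of how one could flag the Tc outliers Willy describes, assuming the default haproxy HTTP log format where the timers appear as a space-delimited Tq/Tw/Tc/Tr/Tt field; the sample lines and threshold are illustrative, not taken from the thread.]

```python
import re

# Timer field of haproxy's default HTTP log format: Tq/Tw/Tc/Tr/Tt
# (phases aborted early may be logged as -1, hence the optional sign).
TIMERS = re.compile(r" (-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)/(-?\d+) ")

def tc_outliers(lines, threshold_ms=100):
    """Return (line_number, Tc) for lines whose connect time exceeds the threshold."""
    hits = []
    for n, line in enumerate(lines, 1):
        m = TIMERS.search(line)
        if m:
            tc = int(m.group(3))  # third timer is Tc, the TCP connect time
            if tc > threshold_ms:
                hits.append((n, tc))
    return hits

sample = [
    'haproxy[21758]: 10.0.0.1:1234 [26/Sep/2018:09:47:17.102] fe be/srv '
    '0/0/1033/12/1100 200 663 - - 1/1/0/0/0 0/0 "GET / HTTP/1.1"',
    'haproxy[21758]: 10.0.0.2:1235 [26/Sep/2018:09:47:18.000] fe be/srv '
    '0/0/0/5/6 200 663 - - 1/1/0/0/0 0/0 "GET / HTTP/1.1"',
]
print(tc_outliers(sample))  # → [(1, 1033)]
```

Clusters of such outliers at the same timestamps across several frontends would be consistent with hypervisor pauses rather than per-connection problems.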
Re: lua haproxy-auth-request - round 2
Bob,

On 25.09.2018 at 01:08, Computerisms Corporation wrote:
> if you are still with me, thank you so much for reading this far. I
> would truly appreciate any thoughts you might have on how to diagnose
> what is causing this issue...

While I don't have any idea off the top of my head, your configuration
would surely be helpful. You might also want to check whether the web
browser is able to:

a) Set up the WebSocket with auth-request in between (you should see a
   101 Switching Protocols in its network console).
b) Send credentials for basic authentication for WebSockets.

Best regards
Tim Düsterhus
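[Editor's note: the two checks above can also be reproduced outside the browser. A minimal sketch, assuming a plain-HTTP endpoint; the request builder and the hostnames/credentials are hypothetical, and the fixed Sec-WebSocket-Key is the example value from RFC 6455.]

```python
import base64

def ws_upgrade_request(host: str, path: str, username: str, password: str) -> bytes:
    """Build a minimal WebSocket upgrade request carrying Basic auth credentials."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        f"Authorization: Basic {token}\r\n"
        "\r\n"
    ).encode()

def handshake_ok(response: bytes) -> bool:
    """True if the server agreed to switch protocols (status 101)."""
    status_line = response.split(b"\r\n", 1)[0]
    return b" 101 " in status_line

print(handshake_ok(b"HTTP/1.1 101 Switching Protocols\r\n\r\n"))  # → True
print(handshake_ok(b"HTTP/1.1 401 Unauthorized\r\n\r\n"))         # → False
```

Sending this over a raw socket and inspecting the status line tells you whether the auth-request hop rejects the upgrade (e.g. 401/403) or lets the 101 through.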
Slow backend request processing?
Hi,

I have a problem where I don't know where to look further, and I hope I
can get some pointers. The problem is that during times with a high
request count, answers for some requests take a long time. So I
identified a request that took a long time and got the log entries from
haproxy and the backend nginx server:

haproxy:

Sep 26 09:47:25 heliosa haproxy[21758]: 109.41.0.201:18679
[26/Sep/2018:09:47:17.102] ext-www-shared-auction1-https~
www-shared-auction1-http/auction1-shared 0/0/0/5607/7748 200 663 - -
6870/423/292/267/0 0/0 "GET
/include/ajax/hotbid.php?PHPSESSID=qkmm012o23u8i9esj6nksfo445&_=1537945522572
HTTP/1.1"

nginx (the log format is non-standard):

"time": "2018-09-26T09:47:17+02:00"
"body_bytes_sent": "237"
"request_time": "0.036"
"status": "200"
"request": "GET /include/ajax/hotbid.php?PHPSESSID=qkmm012o23u8i9esj6nksfo445&_=1537945522572 HTTP/1.1"

If I understand the haproxy default log format correctly, the part
0/0/0/5607/7748 means that the request to the backend took around
7.748 s (Tq, Tw and Tc are zero, Tr = 5607 ms and Td = 2141 ms, which
means Tt = 7748 ms was more or less completely spent talking to the
backend). But the backend server says it only needed 0.036 s to handle
the request.

I made a tcpdump on the server running the haproxy instance (the
tcpdump of the request is attached), and if the timestamps are correct
the request was indeed received in 0.039 s. So either I made some
mistake, or it took around 5 seconds transferring/processing the
request from the network interface into the haproxy instance.

I looked into the statistics recorded on the server running haproxy and
nothing seems noteworthy (no dropped packets or errors on the network
interfaces, acceptable CPU and memory usage). The system is running
haproxy 1.6.14 on an old Ubuntu trusty with a 3.13.0-158 Ubuntu kernel.
It worked in the past without these problems.
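[Editor's note: the timer arithmetic above can be made explicit. A small sketch of splitting haproxy's Tq/Tw/Tc/Tr/Tt field and deriving Td, matching the figures quoted in the message.]

```python
def parse_timers(field: str) -> dict:
    """Split haproxy's Tq/Tw/Tc/Tr/Tt timer field (default HTTP log format)."""
    tq, tw, tc, tr, tt = (int(x) for x in field.split("/"))
    # Td is the data transfer time after the server's response headers:
    # total time minus the request, queue, connect, and response phases.
    td = tt - (tq + tw + tc + tr)
    return {"Tq": tq, "Tw": tw, "Tc": tc, "Tr": tr, "Tt": tt, "Td": td}

t = parse_timers("0/0/0/5607/7748")
print(t["Tr"], t["Td"])  # → 5607 2141
```

With Tq = Tw = Tc = 0, the entire 7748 ms is indeed split between waiting for the server's response headers (Tr) and the remaining data transfer (Td).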
I am out of ideas where to look further for the cause of this issue,
and as said in the beginning, any pointers or ideas on where to look
further would be greatly appreciated.

Thanks
Reinhard

Attachment: request_qkmm012o23u8i9esj6nksfo445_1537945522572.pcap
(application/vnd.tcpdump.pcap)
Re: Throughput slow with kernel 4.9.0
On Tue, Sep 25, 2018 at 10:51 PM Willy Tarreau wrote:

Hi Willy,

> Just be careful as you are allocating 64GB of RAM to the TCP stack.

Yeah, after I figured that out I set it down to 1M / 4GB RAM, which is
enough to handle peak traffic on only one proxy VM.

> As a hint, take a look at the connection timers in your logs.

I calculated some stats from a sample of about 400K requests:

Tw mean: 0.00 ms
Tw std dev: 0.00 ms
Tw max: 0.00 ms
Tw median: 0.00 ms

Tt mean: 296.70 ms
Tt std dev: 4724.12 ms
Tt max: 570127.00 ms
Tt median: 1.00 ms

Tr mean: 22.90 ms
Tr std dev: 129.48 ms
Tr max: 19007.00 ms
Tr median: 1.00 ms

Tc mean: 0.59 ms
Tc std dev: 17.49 ms
Tc max: 1033.00 ms
Tc median: 0.00 ms

Tq mean: 0.01 ms
Tq std dev: 7.90 ms
Tq max: 4980.00 ms
Tq median: 0.00 ms

So it looks like Tc isn't the issue here. Everything else looks good
to my eyes. I still think something else changed, because on Jessie
this never happened, like I said.

> But I also confess I've not run a VM test myself for a
> while because each time I feel like I'm going to throw up in the middle
> of the test :-/

I know that's always been your position on VMs (ha), but one day I
decided to try it for myself and haven't had a single issue until now.
Our old hardware sat nearly 100% idle most of the time, so it was hard
to justify the expense. Performance on hardware is much better, I'm
sure, but none of my tests show enough of a performance boost to
justify even the $93/mo servers I was looking at renting. Properly
tuned VMs have worked really well for us.

> We're mostly saying this because everywhere on the net we find copies of
> bad values for this field, resulting in out of memory issues for those
> who blindly copy-paste them.

Yep, I totally understand that. I think I was just saying that since
everyone says "never change this," there was no discussion around what
it is exactly, what it does, what happens during memory pressure mode,
how to measure whether you need to change it, etc.
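[Editor's note: a minimal sketch of the per-timer statistics computed above, using Python's standard `statistics` module; the sample values are illustrative, not the actual 400K-request data set.]

```python
import statistics

def timer_stats(values_ms):
    """Mean / std dev / max / median for one timer column extracted from logs."""
    return {
        "mean": statistics.mean(values_ms),
        "std dev": statistics.pstdev(values_ms),  # population stddev
        "max": max(values_ms),
        "median": statistics.median(values_ms),
    }

tc_sample = [0, 0, 1, 0, 2, 0, 1033, 0]  # illustrative Tc values, not real data
s = timer_stats(tc_sample)
print(round(s["mean"], 2), s["max"], s["median"])  # → 129.5 1033 0.0
```

As in the thread's numbers, a single large outlier dominates the mean and stddev while the median stays near zero, which is exactly the signature of rare long pauses rather than uniformly slow connects.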
> Regards,
> Willy

Thanks for chiming in on this, Willy.

--
Brendon Colby
Senior DevOps Engineer
Newgrounds.com
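[Editor's note on the tcp_mem sizing discussed in this thread: the `net.ipv4.tcp_mem` sysctl is counted in pages, not bytes, which is presumably how "1M" maps to 4 GB above. A minimal sketch of the conversion, assuming the common 4 KiB page size.]

```python
# tcp_mem thresholds are expressed in system memory pages.
# Assuming a 4 KiB page size (the usual value on x86-64 Linux):
PAGE_SIZE = 4096

def tcp_mem_bytes(pages: int) -> int:
    """Convert a tcp_mem page count into bytes."""
    return pages * PAGE_SIZE

print(tcp_mem_bytes(1_048_576) // 2**30)   # 1M pages  → 4 (GiB)
print(tcp_mem_bytes(16_777_216) // 2**30)  # 16M pages → 64 (GiB)
```

This is why a value copy-pasted in bytes (or simply too large) silently hands tens of gigabytes to the TCP stack, as Willy warned earlier in the thread.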
Re: Crash report
Hi Anton,

On Wed, Sep 26, 2018 at 12:09:07PM +0300, prog 76 wrote:
> Hi
> First of all, thank you for this great product. We have been very
> happy to use it for years.
> Unfortunately, since version 1.8.12 we have an issue: sometimes
> haproxy crashes. We tried to upgrade to 1.8.13 and it also crashes.
> Not often: we have an almost constant load and it crashes about once
> per week. We have a number of instances on different machines and
> enough of them remain available. We also use the Monit service to
> restart crashed instances.
> But it is still not good, because we break connections.
> Our config haproxy.cfg is attached.
> Crash data haproxy.core.zip is attached.

Beware when sending a core on the mailing list, it may contain
sensitive information.

> It seems it always crashes on the same URL; last log lines from two
> crashes:
>
> proxy1web-next haproxy[12763]: x.x.x.x:6733 [07/Sep/2018:10:16:56.924] https~
> oldpool/web1-next 2634/0/1/82/3649 500 1434 - - 2/1/0/1/0 0/0 "POST
> /Mobile/UpdateTechnicianPositionBatched HTTP/1.1"
> proxy2web-next haproxy[88028]: x.x.x.x:6750 [12/Sep/2018:10:55:56.696] https~
> oldpool/web2-next 1597/0/1/78/3802 500 1434 - - 4/3/0/1/0 0/0 "POST
> /Mobile/UpdateTechnicianPositionBatched HTTP/1.1"
>
> The backtrace looks like:
>
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -D -p
> /var/run/haproxy.pid'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x7f5e6c27981d in __memmove_ssse3_back ()
>     at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1881
> 1881  ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: No such file or
> directory.
> (gdb) bt
> #0  0x7f5e6c27981d in __memmove_ssse3_back ()
>     at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1881
> #1  0x556549c98e9b in memmove (__len=, __src=0x7f5e480594c7,
>     __dest=0x7f5e480594e1) at /usr/include/x86_64-linux-gnu/bits/string3.h:57
> #2  buffer_insert_line2 (b=0x7f5e48055f70,
>     pos=0x7f5e480594c7
>     "\r\n{\"location\":[{\"coords\":{\"speed\":-1,\"longitude\":-122.31412169240122,\"latitude\":47.311481963215847,\"accuracy\":70.4",
>     '0' ,
>     "6,\"altitude_accuracy\":10,\"altitude\":121.7,\"heading\":-1},\"extras\":{},\"is_m"...,
>     str=str@entry=0x556549ff0340 "X-Forwarded-Proto: https", len=len@entry=24)
>     at src/buffer.c:158
> #3  0x556549bd35b7 in http_header_add_tail (msg=msg@entry=0x7f5e5004a570,
>     hdr_idx=hdr_idx@entry=0x7f5e5004a510,
>     text=0x556549ff0340 "X-Forwarded-Proto: https")
>     at src/proto_http.c:538
> #4  0x556549bdc9d8 in http_process_req_common (s=s@entry=0x7f5e50027470,
>     req=req@entry=0x7f5e50027480, an_bit=an_bit@entry=16, px=0x556549fed430)
>     at src/proto_http.c:3516
> #5  0x556549c0dc72 in process_stream (t=) at src/stream.c:1905
> #6  0x556549c8ba07 in process_runnable_tasks () at src/task.c:317
> #7  0x556549c3ec85 in run_poll_loop () at src/haproxy.c:2403
> #8  run_thread_poll_loop (data=) at src/haproxy.c:2470
> #9  0x7f5e6cfa4184 in start_thread (arg=0x7f5e68412700)
>     at pthread_create.c:312
> #10 0x7f5e6c22237d in eventfd (count=26, flags=1208320820)
>     at ../sysdeps/unix/sysv/linux/eventfd.c:55
> #11 0x in ?? ()
> (gdb)
>
> And unfortunately we were not able to reproduce the issue manually. We
> tried slowhttptest and curl; so far no luck.
>
> Could you please suggest next steps that we can take to fix this?

Any chance you can give us the binary that generated that core as
well? That would be really useful.

Thanks!
Olivier