Re: Throughput slow with kernel 4.9.0

2018-09-26 Thread Willy Tarreau
Hi Brendon,

On Wed, Sep 26, 2018 at 02:45:29PM -0500, Brendon Colby wrote:
>   Tc mean: 0.59 ms
>   Tc std dev: 17.49 ms
>   Tc max: 1033.00 ms
>   Tc median: 0.00 ms

I don't know if all your servers are local, but if that's the case,
Tc should always be very small, and a stddev of 17ms and a max of 1s are
huge. This could indicate random pauses in the hypervisor, which would
be consistent with the traffic bursts I was talking about.

> So it looks like Tc isn't the issue here. Everything else looks good
> to my eyes. I still think something else changed, because on Jessie
> this never happened like I said.

As I said, it's very possible that you used to be slightly above the
minimum required settings, and that after the kernel change you're now
slightly below them.

> > But I also confess I've not run a VM test myself for a
> > while because each time I feel like I'm going to throw up in the middle
> > of the test :-/
> 
> I know that's always been your position on VMs (ha) but one day I
> decided to try it for myself and haven't had a single issue until now.
> Our old hardware sat nearly 100% idle most of the time, so it was hard
> to justify the expense.

Oh don't get me wrong, I know how this happens and am not complaining
about it. I'm just saying that using VMs is mostly a cost-saving
solution, and that if you cut costs you often have to expect a sacrifice
somewhere else. For those who can stand a slight degradation of
performance or latency, or who can occasionally spend time chasing
issues which turn out not to exist, that's fine. Others can't afford
this at all and will prefer bare metal. It's just a matter of
trade-offs.

At least if you found a way to tune your system to work around this
issue, you should simply document it somewhere for yourself or your
coworkers, and consider yourself lucky to have saved $93/mo without
degrading performance :-)

Willy



Re: lua haproxy-auth-request - round 2

2018-09-26 Thread Tim Düsterhus
Bob,

On 25.09.2018 at 01:08, Computerisms Corporation wrote:
> if you are still with me, thank you so much for reading this far.  I
> would truly appreciate any thoughts you might have on how to diagnose
> what is causing this issue...
> 

While I don't have any idea off the top of my head, your configuration
would surely be helpful.

You might also want to check whether the web browser is able to:
a) set up the WebSocket with auth-request in between (you should see a
101 Switching Protocols in its network console);
b) send credentials for basic authentication for WebSockets.
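
If you want to take the browser out of the equation, the handshake can
also be checked from a small script (a sketch only; host, path and
credentials below are placeholders you would need to replace):

  import base64
  import http.client
  import os

  HOST = "example.com"   # placeholder: your haproxy frontend
  PATH = "/ws"           # placeholder: your websocket endpoint
  USER, PASSWORD = "user", "secret"

  conn = http.client.HTTPSConnection(HOST)
  conn.request("GET", PATH, headers={
      "Host": HOST,
      "Upgrade": "websocket",
      "Connection": "Upgrade",
      "Sec-WebSocket-Version": "13",
      # random 16-byte nonce, base64-encoded, as RFC 6455 requires
      "Sec-WebSocket-Key": base64.b64encode(os.urandom(16)).decode(),
      # basic credentials, so auth-request has something to validate
      "Authorization": "Basic " + base64.b64encode(
          f"{USER}:{PASSWORD}".encode()).decode(),
  })
  resp = conn.getresponse()
  # 101 means the upgrade passed; 401/403 points at the auth leg,
  # anything else at the proxy configuration.
  print(resp.status, resp.reason)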

Best regards
Tim Düsterhus



Slow backend request processing?

2018-09-26 Thread Reinhard Vicinus
Hi,

I have a problem where I don't know where to look further, and I hope I
can get some pointers. The problem is that during times with a high
request count, responses to some requests take a long time. So I
identified a request that took a long time and pulled the corresponding
log entries from haproxy and the backend nginx server:

haproxy:

Sep 26 09:47:25 heliosa haproxy[21758]: 109.41.0.201:18679
[26/Sep/2018:09:47:17.102] ext-www-shared-auction1-https~
www-shared-auction1-http/auction1-shared 0/0/0/5607/7748 200 663 - -
 6870/423/292/267/0 0/0 "GET
/include/ajax/hotbid.php?PHPSESSID=qkmm012o23u8i9esj6nksfo445&_=1537945522572
HTTP/1.1"

nginx (the logformat is non standard):

"time": "2018-09-26T09:47:17+02:00"
"body_bytes_sent": "237"
"request_time": "0.036"
"status": "200"
"request": "GET
/include/ajax/hotbid.php?PHPSESSID=qkmm012o23u8i9esj6nksfo445&_=1537945522572
HTTP/1.1"

If I understand the haproxy default logformat correctly, the part
0/0/0/5607/7748 means that the request to the backend took around 7.748s
(Tq, Tw and Tc are zero, Tr = 5607ms and Td = 2141ms, so Tt = 7748ms is
more or less entirely spent talking to the backend). But the backend
server says it only took 0.036s to handle the request. I made a tcpdump
on the server running the haproxy instance (the tcpdump of the request
is attached), and if the timestamps are correct, the request was indeed
received in 0.039s.

So either I made some mistake, or it took around 5 seconds to
transfer/process the request from the network interface into the haproxy
instance. I looked at the statistics recorded on the server running
haproxy and nothing seems noteworthy (no dropped packets or errors on
the network interfaces, acceptable CPU and memory usage). The system is
running haproxy 1.6.14 on an old Ubuntu Trusty with the 3.13.0-158
Ubuntu kernel. It worked in the past without these problems.

I am out of ideas where to look further for the cause of this issue,
and as said in the beginning, any pointers or ideas on where to look
next would be greatly appreciated.

Thanks
Reinhard


request_qkmm012o23u8i9esj6nksfo445_1537945522572.pcap
Description: application/vnd.tcpdump.pcap


Re: Throughput slow with kernel 4.9.0

2018-09-26 Thread Brendon Colby
On Tue, Sep 25, 2018 at 10:51 PM Willy Tarreau  wrote:

Hi Willy,

> Just be careful as you are allocating 64GB of RAM to the TCP stack.

Yeah, after I figured that out I set it down to 1M pages / 4GB of RAM,
which is enough to handle peak traffic on a single proxy VM.
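
For reference, tcp_mem is counted in memory pages rather than bytes, so
the conversion to RAM looks roughly like this (a quick sketch; 4KiB is
the usual x86_64 page size):

  import os

  PAGE = os.sysconf("SC_PAGE_SIZE")          # typically 4096 bytes
  pages = 1024 * 1024                        # the 1M figure above
  print(f"{pages * PAGE / 2**30:.0f} GiB")   # -> 4 GiB (16M pages -> 64 GiB)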

> As a hint, take a look at the connection timers in your logs.

I calculated some stats from a sample of about 400K requests (the
sketch after the numbers shows roughly how):

  Tw mean: 0.00 ms
  Tw std dev: 0.00 ms
  Tw max: 0.00 ms
  Tw median: 0.00 ms

  Tt mean: 296.70 ms
  Tt std dev: 4724.12 ms
  Tt max: 570127.00 ms
  Tt median: 1.00 ms

  Tr mean: 22.90 ms
  Tr std dev: 129.48 ms
  Tr max: 19007.00 ms
  Tr median: 1.00 ms

  Tc mean: 0.59 ms
  Tc std dev: 17.49 ms
  Tc max: 1033.00 ms
  Tc median: 0.00 ms

  Tq mean: 0.01 ms
  Tq std dev: 7.90 ms
  Tq max: 4980.00 ms
  Tq median: 0.00 ms
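
The computation was along these lines (a rough sketch, not the exact
script; note it doesn't filter out the -1 timers that aborted requests
log):

  import re
  import statistics

  # The timer field is the first run of five slash-separated integers
  # in each line (Tq/Tw/Tc/Tr/Tt); the connection counters later in the
  # line have the same shape, so only the first match is taken.
  TIMERS = re.compile(r"(-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)")

  samples = {name: [] for name in ("Tq", "Tw", "Tc", "Tr", "Tt")}
  with open("haproxy.log") as log:           # hypothetical log path
      for line in log:
          m = TIMERS.search(line)
          if not m:
              continue
          for name, value in zip(samples, m.groups()):
              samples[name].append(int(value))

  for name, values in samples.items():
      print(f"  {name} mean: {statistics.mean(values):.2f} ms")
      print(f"  {name} std dev: {statistics.pstdev(values):.2f} ms")
      print(f"  {name} max: {max(values):.2f} ms")
      print(f"  {name} median: {statistics.median(values):.2f} ms")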

So it looks like Tc isn't the issue here. Everything else looks good
to my eyes. I still think something else changed, because on Jessie
this never happened like I said.

> But I also confess I've not run a VM test myself for a
> while because each time I feel like I'm going to throw up in the middle
> of the test :-/

I know that's always been your position on VMs (ha) but one day I
decided to try it for myself and haven't had a single issue until now.
Our old hardware sat nearly 100% idle most of the time, so it was hard
to justify the expense.

Performance on hardware is much better, I'm sure, but none of my tests
show enough of a performance boost to justify even the $93/mo servers
I was looking at renting. Properly tuned VMs have worked really well
for us.

> We're mostly saying this because everywhere on the net we find copies of
> bad values for this field, resulting in out of memory issues for those
> who blindly copy-paste them.

Yep, I totally understand that. I was just saying that since everyone
says "never change this", there was never any discussion around what it
is exactly, what it does, what happens in memory pressure mode, how to
measure whether you need to change it, etc.
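
For what it's worth, the closest thing to a measurement I found is
comparing the TCP stack's live page count against the thresholds,
something like this sketch:

  def tcp_mem_status():
      with open("/proc/sys/net/ipv4/tcp_mem") as f:
          low, pressure, high = (int(v) for v in f.read().split())
      with open("/proc/net/sockstat") as f:
          for line in f:
              if line.startswith("TCP:"):
                  parts = line.split()
                  in_use = int(parts[parts.index("mem") + 1])  # pages
                  break
      print(f"TCP pages in use: {in_use}")
      print(f"thresholds: low={low} pressure={pressure} high={high}")
      # between 'pressure' and 'high' the kernel squeezes socket
      # buffers; above 'high', allocations start to fail outright.
      if in_use >= pressure:
          print("-> memory pressure mode")

  tcp_mem_status()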

> Regards,
> Willy

Thanks for chiming in on this, Willy.

-- 
Brendon Colby
Senior DevOps Engineer
Newgrounds.com



Re: Crash report

2018-09-26 Thread Olivier Houchard
Hi Anton,

On Wed, Sep 26, 2018 at 12:09:07PM +0300, prog 76 wrote:
> 
> Hi
> First of all, thank you for this great product. We have been very happy
> to use it for years.
> Unfortunately, since version 1.8.12 we have an issue: sometimes haproxy
> crashes. We tried to upgrade to 1.8.13 and it also crashes.
> Not often: we have an almost constant load and it crashes about once per
> week. We have a number of instances on different machines and enough of
> them stay available, and we use the Monit service to restart crashed
> instances. But it's still not good, because connections get broken.
> Our config haproxy.cfg is attached.
> Crash data haproxy.core.zip is attached.
> 

Beware when sending a core on the mailing list: it may contain sensitive
information.

> It seems it always crashes on the same URL. Last log lines from two crashes:
> 
> proxy1web-next haproxy[12763]: x.x.x.x:6733 [07/Sep/2018:10:16:56.924] https~ 
> oldpool/web1-next 2634/0/1/82/3649 500 1434 - -  2/1/0/1/0 0/0 "POST 
> /Mobile/UpdateTechnicianPositionBatched HTTP/1.1"
> proxy2web-next haproxy[88028]: x.x.x.x:6750 [12/Sep/2018:10:55:56.696] https~ 
> oldpool/web2-next 1597/0/1/78/3802 500 1434 - -  4/3/0/1/0 0/0 "POST 
> /Mobile/UpdateTechnicianPositionBatched HTTP/1.1"
> 
> Back trace looks like
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -D -p 
> /var/run/haproxy.pid'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x7f5e6c27981d in __memmove_ssse3_back ()
> at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1881
> 1881 ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: No such file or 
> directory.
> (gdb) bt
> #0 0x7f5e6c27981d in __memmove_ssse3_back ()
> at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1881
> #1 0x556549c98e9b in memmove (__len=, __src=0x7f5e480594c7,
> __dest=0x7f5e480594e1) at /usr/include/x86_64-linux-gnu/bits/string3.h:57
> #2 buffer_insert_line2 (b=0x7f5e48055f70,
> pos=0x7f5e480594c7 
> "\r\n{\"location\":[{\"coords\":{\"speed\":-1,\"longitude\":-122.31412169240122,\"latitude\":47.311481963215847,\"accuracy\":70.4",
>  '0' , 
> "6,\"altitude_accuracy\":10,\"altitude\":121.7,\"heading\":-1},\"extras\":{},\"is_m"...,
> str=str@entry=0x556549ff0340 "X-Forwarded-Proto: https", len=len@entry=24)
> at src/buffer.c:158
> #3 0x556549bd35b7 in http_header_add_tail (msg=msg@entry=0x7f5e5004a570,
> hdr_idx=hdr_idx@entry=0x7f5e5004a510, text=0x556549ff0340 "X-Forwarded-Proto: 
> https")
> at src/proto_http.c:538
> #4 0x556549bdc9d8 in http_process_req_common (s=s@entry=0x7f5e50027470,
> req=req@entry=0x7f5e50027480, an_bit=an_bit@entry=16, px=0x556549fed430)
> at src/proto_http.c:3516
> #5 0x556549c0dc72 in process_stream (t=) at 
> src/stream.c:1905
> #6 0x556549c8ba07 in process_runnable_tasks () at src/task.c:317
> #7 0x556549c3ec85 in run_poll_loop () at src/haproxy.c:2403
> #8 run_thread_poll_loop (data=) at src/haproxy.c:2470
> #9 0x7f5e6cfa4184 in start_thread (arg=0x7f5e68412700) at 
> pthread_create.c:312
> #10 0x7f5e6c22237d in eventfd (count=26, flags=1208320820)
> at ../sysdeps/unix/sysv/linux/eventfd.c:55
> #11 0x in ?? ()
> (gdb)
>  
> And unfortunately we were not able to reproduce the issue manually. We
> tried slowhttptest and curl; no luck so far.
> 
> Could you please suggest next steps we could take to fix this?
> 

Any chance you could give us the binary that generated that core as
well? That would be really useful.

Thanks !

Olivier