Re: CPU Spikes

2019-07-15 Thread Sander Klein

On 2019-07-09 08:53, Sander Klein wrote:


> > It could be useful to issue "show activity" twice 1 second apart when
> > this happens, and maybe even "show fd" and "show sess all" if you don't
> > have too many connections.
>
> Right, I will do the above steps. But, since this only happens on
> Mondays we have to wait a bit ;-)


Drat, the harvester was early this week. So, it was already done when I 
arrived at work. I hope I can catch it next week.


Sander

0x2E78FBE8.asc
Description: application/pgp-keys


signature.asc
Description: OpenPGP digital signature


Re: CPU Spikes

2019-07-09 Thread Sander Klein

Hey Willy,


On 2019-07-09 08:09, Willy Tarreau wrote:
> What's your CPU like between the peaks ? 1%, 10%, 50% ? Just to get a rough
> estimate of whether it's something reaching a critical point or if it's
> something doing its mess alone in its corner.


In between the spikes it's about 7% System, 11% User, 6% Softirq, 76% 
Idle. Bandwidth is then about 500Mbit/s, mostly outbound.


What I didn't notice before, but just saw while staring at my graphs, is
that I get more incoming traffic during the CPU spikes. So, I'm doing about
500Mbit/s, then the incoming traffic rises to about 100Mbit/s (probably
an HTTP POST), the CPU spikes, total traffic drops to about 200Mbit/s, and
everything starts getting slow.


I had HAProxy running on physical hardware with an E5-2407 and 1Gbit 
NIC. Now it is running as a VM on an E5-2650 with 10Gbit NIC. With the 
same issues.



> Are you using threads ? I'm asking because I'm currently working on an
> issue which I found could cause exactly this behaviour. I'm fairly certain
> we've met it in the past without being able to attribute it to exactly
> this.


Yes, I'm using threads.


> If you're using threads, attaching gdb to the process and issuing "info
> threads" will tell us where they are. If many of them are in
> fd_update_events() or fd_may_recv(), you're likely on the one I've been
> working on.
>
> Other possibilities (due to the regularity of your observation) are :
>   - timeouts (check in your conf if a 10s timeout appears somewhere,
>     maybe it triggers and is improperly caught)


I have the following timeouts in defaults:
    timeout client          60s
    timeout connect         10s
    timeout http-keep-alive 4s
    timeout http-request    15s
    timeout queue           30s
    timeout server          60s
    timeout tarpit          120s

Looking at the spikes again, it's more like 20 seconds up, 20 seconds
down. But that probably has more to do with the POST taking that long.
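
The suggested check for a timeout matching the observed period can be done mechanically. The following is a minimal sketch, not something taken from the thread: the helper names `parse_duration_ms` and `timeouts_matching` are hypothetical, and it assumes simple `timeout <name> <value>` lines using HAProxy's usual time suffixes (a bare number meaning milliseconds).

```python
import re

# Multipliers to milliseconds for HAProxy-style time suffixes.
UNITS_MS = {"us": 0.001, "ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_duration_ms(value):
    """Convert a value such as '10s' or '500' to milliseconds."""
    match = re.fullmatch(r"(\d+)(us|ms|s|m|h|d)?", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    number, unit = match.groups()
    # A bare number is treated as milliseconds (assumption stated above).
    return int(number) * UNITS_MS[unit or "ms"]


def timeouts_matching(config_text, period_ms):
    """Return the names of 'timeout' directives whose value equals period_ms."""
    names = []
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "timeout":
            if parse_duration_ms(parts[2]) == period_ms:
                names.append(parts[1])
    return names
```

Run against the values listed above with a period of 10000 ms, this flags only `timeout connect`. Whether that timeout has anything to do with the spikes is not established here.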



>   - health checks (maybe you have 10s checks, or 2s checks with 4
>     retries or I don't know what, which causes a special event to
>     occur after 10s)


Checks are every 2s with a rise of 3 and a fall of 3.


> In any case you're clearly facing a bug, but it's always difficult to
> tell.
>
> It could be useful to issue "show activity" twice 1 second apart when
> this happens, and maybe even "show fd" and "show sess all" if you don't
> have too many connections.


Right, I will do the above steps. But, since this only happens on 
Mondays we have to wait a bit ;-)


Regards,

Sander




Re: CPU Spikes

2019-07-09 Thread Willy Tarreau
Hi Sander,

On Mon, Jul 08, 2019 at 02:44:44PM +0200, Sander Klein wrote:
> Hi,
> 
> I'm having an issue with HAProxy causing CPU spikes with certain traffic.

We've actually fixed quite a number of issues causing this over the last
few years, though most of them are already addressed by the versions you're
running.

> We have a client who is downloading lots of URLs during the night. When the
> download starts there is not much other traffic going on and there doesn't
> seem to be any problem. But, when the morning comes, 'normal' traffic starts
> hitting HAProxy and every 10 seconds or so, HAProxy starts eating 100% of
> CPU while network traffic drops. When HAProxy stops eating CPU after 10
> seconds, network traffic rises again. When the crawler is finished
> everything returns to normal. So it looks like some kind of mix of traffic
> which causes it.

What's your CPU like between the peaks ? 1%, 10%, 50% ? Just to get a rough
estimate of whether it's something reaching a critical point or if it's
something doing its mess alone in its corner.

> I've tested it with HAProxy 1.8.20, 1.9.8 (which I am running by default)
> and 2.0.1. They all show the same behaviour. I also tried with 2 different
> kernels to see if anything happens there. With kernel 4.9 top shows HAProxy
> using 100% CPU where 50% is user and 50% is system. With kernel 4.19 I see
> 100% CPU usage with 70% user and 50% system.

In fact once something starts to loop, all calls are so short that it's very
difficult for the system to measure an accurate time spent in user/sys, so
I am not surprised that it changes with the kernel.

> I also tried with disabling H2, splicing, and some regexes I use. Even tried
> new hardware, and moved it to a VM just to see if I could find any
> difference, but none...

Are you using threads ? I'm asking because I'm currently working on an
issue which I found could cause exactly this behaviour. I'm fairly certain
we've met it in the past without being able to attribute it to exactly
this.

> Does anyone have a good idea how to troubleshoot this any further?

If you're using threads, attaching gdb to the process and issuing "info
threads" will tell us where they are. If many of them are in
fd_update_events() or fd_may_recv(), you're likely on the one I've been
working on.
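
To make the "where are the threads" check repeatable, the gdb output can be tallied per function. This is only a sketch under assumptions: the helper names `count_functions` and `gdb_info_threads` are hypothetical, the regular expression targets the usual Linux layout of "info threads" lines, and other gdb versions may format them differently.

```python
import re
import subprocess
from collections import Counter

# Matches lines such as:
#   * 1  Thread 0x7f... (LWP 1234) "haproxy" fd_update_events (fd=12) at ...
#     2  Thread 0x7f... (LWP 1235) "haproxy" 0x00007f... in epoll_wait () from ...
FRAME_RE = re.compile(
    r'^\*?\s*\d+\s+Thread\s.*?\)\s+'   # thread number, address and (LWP n)
    r'(?:"[^"]*"\s+)?'                 # optional thread name in quotes
    r'(?:0x[0-9a-fA-F]+\s+in\s+)?'     # optional "0x... in" prefix
    r'([A-Za-z_][\w.]*)\s*\('          # function name followed by "("
)


def count_functions(info_threads_output):
    """Count how many threads are currently in each function."""
    counts = Counter()
    for line in info_threads_output.splitlines():
        match = FRAME_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def gdb_info_threads(pid):
    """Attach gdb to pid in batch mode and return the 'info threads' output."""
    result = subprocess.run(
        ["gdb", "-p", str(pid), "-batch", "-ex", "info threads"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

Calling `count_functions(gdb_info_threads(pid))` during a spike and looking for a high count on `fd_update_events` or `fd_may_recv` corresponds to the check described above. Attaching gdb briefly pauses the process, so it is best done while the spike is actually happening.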

Other possibilities (due to the regularity of your observation) are :
  - timeouts (check in your conf if a 10s timeout appears somewhere,
    maybe it triggers and is improperly caught)
  - health checks (maybe you have 10s checks, or 2s checks with 4
    retries or I don't know what, which causes a special event to
    occur after 10s)

In any case you're clearly facing a bug, but it's always difficult to
tell.

It could be useful to issue "show activity" twice 1 second apart when
this happens, and maybe even "show fd" and "show sess all" if you don't
have too many connections.
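
Taking the two "show activity" samples one second apart can be scripted so that it is ready when the spike happens. What follows is a rough sketch, not an official tool: it assumes a stats socket is enabled on a Unix socket, the function names are hypothetical, and the parser simply collects the integers on each "name: values" line because the exact output layout differs between HAProxy versions.

```python
import socket
import time


def query(sock_path, command):
    """Send one command to the stats socket and return the full reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.connect(sock_path)
        conn.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection after the reply
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")


def parse_activity(text):
    """Map each 'name: values' line to the list of integers in its value."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        numbers = [int(tok) for tok in value.split() if tok.isdigit()]
        if numbers:
            counters[name.strip()] = numbers
    return counters


def diff_activity(before, after):
    """Return element-wise deltas for counters that changed between samples."""
    deltas = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None or len(old) != len(new):
            continue
        delta = [n - o for n, o in zip(new, old)]
        if any(delta):
            deltas[name] = delta
    return deltas


def sample_twice(sock_path, interval=1.0):
    """Take two 'show activity' samples interval seconds apart and diff them."""
    first = parse_activity(query(sock_path, "show activity"))
    time.sleep(interval)
    second = parse_activity(query(sock_path, "show activity"))
    return diff_activity(first, second)
```

For example, `sample_twice("/var/run/haproxy.sock")` would return the counters that moved during that second, where the socket path is a placeholder for whatever the `stats socket` line in the configuration specifies. The same `query` helper can fetch "show fd" and "show sess all" as raw text.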

Thanks,
Willy



CPU Spikes

2019-07-08 Thread Sander Klein

Hi,

I'm having an issue with HAProxy causing CPU spikes with certain 
traffic.


We have a client who is downloading lots of URLs during the night. When 
the download starts there is not much other traffic going on and there 
doesn't seem to be any problem. But, when the morning comes, 'normal' 
traffic starts hitting HAProxy and every 10 seconds or so, HAProxy 
starts eating 100% of CPU while network traffic drops. When HAProxy 
stops eating CPU after 10 seconds, network traffic rises again. When the 
crawler is finished everything returns to normal. So it looks like some 
kind of mix of traffic which causes it.


I've tested it with HAProxy 1.8.20, 1.9.8 (which I am running by 
default) and 2.0.1. They all show the same behaviour. I also tried with 
2 different kernels to see if anything happens there. With kernel 4.9 
top shows HAProxy using 100% CPU where 50% is user and 50% is system. 
With kernel 4.19 I see 100% CPU usage with 70% user and 50% system.


I also tried with disabling H2, splicing, and some regexes I use. Even 
tried new hardware, and moved it to a VM just to see if I could find any 
difference, but none...


Does anyone have a good idea how to troubleshoot this any further?

Regards,

Sander
