Re: HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-12 Thread Willy Tarreau
Hi Lukas,

On Sun, Nov 12, 2017 at 08:09:29PM +0100, Lukas Tribus wrote:
> Hello,
> 
> 
> 2017-11-10 23:43 GMT+01:00 PiBa-NL :
> > Okay have been running with haproxy-ss-20171017 for a day now. Sofar it
> > sticks to <1% cpu usage.
> 
> FYI a similar report is on discourse, on linux without splicing
> involved. Upgrading from 1.7.9 to haproxy-ss-20171017 appears to have
> solved the problem:
> 
> https://discourse.haproxy.org/t/haproxy-consuming-100-cpu/1758

Thanks for the feedback. We'll release 1.7.10 ASAP with the pending
fixes. It's been waiting too long now.

Cheers,
Willy



Re: HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-12 Thread Lukas Tribus
Hello,


2017-11-10 23:43 GMT+01:00 PiBa-NL :
> Okay have been running with haproxy-ss-20171017 for a day now. Sofar it
> sticks to <1% cpu usage.

FYI a similar report is on discourse, on linux without splicing
involved. Upgrading from 1.7.9 to haproxy-ss-20171017 appears to have
solved the problem:

https://discourse.haproxy.org/t/haproxy-consuming-100-cpu/1758


Lukas



Re: HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-10 Thread Willy Tarreau
Hi Pieter,

On Fri, Nov 10, 2017 at 11:43:41PM +0100, PiBa-NL wrote:
> Okay have been running with haproxy-ss-20171017 for a day now. Sofar it
> sticks to <1% cpu usage.
> 
> Will report if anything changes, cant tell for sure as don't have a clear
> reproduction of the issue, but my issue 'seems' fixed.

Great! If anything changes and you want to report what is wrong, please
don't forget to issue the following :
   - "show sess all" on the cli
   - attach gdb to the running process and issue a few series of "bt full",
 "cont", ctrl-c, "bt full", ... to try to catch the code path commonly
 called.

Cheers,
Willy



Re: HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-10 Thread PiBa-NL

Hi Willy,

Op 9-11-2017 om 5:45 schreef Willy Tarreau:

Hi Pieter,

We had something similar on Linux in relation with TCP splicing and the
fd cache, for which a fix was emitted. But yesterday Christopher explained
me that the fix has an impact on the way applets are scheduled in 1.8, so
actually it could mean that the initial bug might possibly cover a larger
scope than splicing only, and that recv+send might also be affected. If
you're interested in testing, the commit in 1.7 is
c040c1f ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") and
is present in the latest snapshot (we really need to emit 1.7.10 BTW).

I'd be curious to know if it fixes it or not. At least it will tell us
if that's related to this fd cache thing or to something completely
different such as Lua.

I also need to check with Thierry if we could find a way to add some
stats about the time spent in Lua to "show info" to help debugging such
cases where Lua is involved.

By the way, thanks for your dump, we'll check the sessions' statuses.
There are not that many, and maybe it will give us a useful indication!

Cheers,
Willy


Okay have been running with haproxy-ss-20171017 for a day now. Sofar it 
sticks to <1% cpu usage.


Will report if anything changes, cant tell for sure as don't have a 
clear reproduction of the issue, but my issue 'seems' fixed.


Regards,

PiBa-NL / Pieter




Re: HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-09 Thread Willy Tarreau
On Fri, Nov 10, 2017 at 05:25:11AM +0100, Willy Tarreau wrote:
> I just checked the contents and its OK. I think we have something wrong
> on the script which generates the snapshots. The git version and the date
> are normally updated on the fly by git when building the archive, thanks
> to attributes. I'm not seeing the attribute file anymore in the repository
> so it's possible we've lost it during the last server migration, I'll take
> care of it.

Now fixed, 1.4 to 1.7 were affected. Future snapshots should be OK starting
from tomorrow.

Willy



Re: HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-09 Thread Willy Tarreau
Hi Pieter,

On Fri, Nov 10, 2017 at 12:18:30AM +0100, PiBa-NL wrote:
> Hi Willy, List,
> 
> Is it correct that when i build a haproxy-ss-20171017 snapshot that the
> version still shows up as:
> "HAProxy version 1.7.9, released 2017/08/18"
> on both haproxy -vv and stats page.?
> 
> Or did i do it wrong?

I just checked the contents and its OK. I think we have something wrong
on the script which generates the snapshots. The git version and the date
are normally updated on the fly by git when building the archive, thanks
to attributes. I'm not seeing the attribute file anymore in the repository
so it's possible we've lost it during the last server migration, I'll take
care of it.

Thanks for the report.

> p.s. I changed the Makefile like this:
> PORTNAME=    haproxy-ss
> PORTVERSION=    20171017
> CATEGORIES=    net www
> MASTER_SITES=    http://www.haproxy.org/download/1.7/src/snapshot/
> 
> And then ran:
>     make clean build install NO_CHECKSUM=yes
> 
> Which did 'seem' to download the 'intended?' file..

I don't know these files above, they're probably related to your system,
but in the snapshot file I'm clearly seeing that this is wrong :

bash-4.2$ cd haproxy-ss-20171017/
bash-4.2$ cat SUBVERS 
-$Format:%h$

bash-4.2$ cat VERDATE 
$Format:%ci$
2017/08/18

Normally these ones are changed on the fly by git-archive. Don't worry,
the problem is on our side :-)

Thanks for reporting this,
Willy



Re: HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-09 Thread PiBa-NL

Hi Willy, List,

Is it correct that when i build a haproxy-ss-20171017 snapshot that the 
version still shows up as:

"HAProxy version 1.7.9, released 2017/08/18"
on both haproxy -vv and stats page.?

Or did i do it wrong?

p.s. I changed the Makefile like this:
PORTNAME=    haproxy-ss
PORTVERSION=    20171017
CATEGORIES=    net www
MASTER_SITES=    http://www.haproxy.org/download/1.7/src/snapshot/

And then ran:
    make clean build install NO_CHECKSUM=yes

Which did 'seem' to download the 'intended?' file..

Thanks,
PiBa-NL / Pieter



Re: HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-09 Thread PiBa-NL

Hi Willy,

Op 9-11-2017 om 5:45 schreef Willy Tarreau:

Hi Pieter,

On Thu, Nov 09, 2017 at 02:28:46AM +0100, PiBa-NL wrote:

Actually haproxy has been running for a few weeks with 100% and i didnt
notice.. it does keep working it seems..

Anyhow thought i would try and capture the next event if it would happen
again. It did after a few hours..

After the truss output below the last line keeps repeating fast lots and
lots of times.

kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
kevent(0,0x0,0,{ 1,EVFILT_READ,EV_EOF,0x0,0x0,0x0 },7,{ 0.99400 }) = 1
(0x1)
recvfrom(1,0x8024ed972,16290,0,NULL,0x0)     = 0 (0x0)
kevent(0,{ 1,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)

We had something similar on Linux in relation with TCP splicing and the
fd cache, for which a fix was emitted. But yesterday Christopher explained
me that the fix has an impact on the way applets are scheduled in 1.8, so
actually it could mean that the initial bug might possibly cover a larger
scope than splicing only, and that recv+send might also be affected. If
you're interested in testing, the commit in 1.7 is
c040c1f ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") and
is present in the latest snapshot (we really need to emit 1.7.10 BTW).

I'd be curious to know if it fixes it or not. At least it will tell us
if that's related to this fd cache thing or to something completely
different such as Lua.

I also need to check with Thierry if we could find a way to add some
stats about the time spent in Lua to "show info" to help debugging such
cases where Lua is involved.

By the way, thanks for your dump, we'll check the sessions' statuses.
There are not that many, and maybe it will give us a useful indication!

Cheers,
Willy


Thanks for your time, i didn't think the 'splice' problem mentioned on 
mailing-list would be relevant for my case so i didn't see a need to try 
latest snapshot. Couldn't find much other recent cpu issues there. But 
ill try and compile haproxy 1.7 latest snapshot or perhaps just 1.7.9 
with this extra patch and see if it keeps running with low cpu usage for 
a few days.. I have not compiled haproxy for a while, ill see what works 
the easiest for me, assuming can make it work build on a separate 
FreeBSD machine and packaged/copied to the actual 'problem machine' that 
doesn't have compilation tools on it.. hopefully my build binary will be 
'compatible'..


Will report back in a few day's..

Thanks,
PiBa-NL / Pieter




Re: HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-08 Thread Willy Tarreau
Hi Pieter,

On Thu, Nov 09, 2017 at 02:28:46AM +0100, PiBa-NL wrote:
> Actually haproxy has been running for a few weeks with 100% and i didnt
> notice.. it does keep working it seems..
> 
> Anyhow thought i would try and capture the next event if it would happen
> again. It did after a few hours..
> 
> After the truss output below the last line keeps repeating fast lots and
> lots of times.
> 
> kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
> kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
> kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
> kevent(0,0x0,0,{ 1,EVFILT_READ,EV_EOF,0x0,0x0,0x0 },7,{ 0.99400 }) = 1
> (0x1)
> recvfrom(1,0x8024ed972,16290,0,NULL,0x0)     = 0 (0x0)
> kevent(0,{ 1,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
> kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
> kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
> kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
> kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
> kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)

We had something similar on Linux in relation with TCP splicing and the
fd cache, for which a fix was emitted. But yesterday Christopher explained
me that the fix has an impact on the way applets are scheduled in 1.8, so
actually it could mean that the initial bug might possibly cover a larger
scope than splicing only, and that recv+send might also be affected. If
you're interested in testing, the commit in 1.7 is
c040c1f ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") and
is present in the latest snapshot (we really need to emit 1.7.10 BTW).

I'd be curious to know if it fixes it or not. At least it will tell us
if that's related to this fd cache thing or to something completely
different such as Lua.

I also need to check with Thierry if we could find a way to add some
stats about the time spent in Lua to "show info" to help debugging such
cases where Lua is involved.

By the way, thanks for your dump, we'll check the sessions' statuses.
There are not that many, and maybe it will give us a useful indication!

Cheers,
Willy



HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-08 Thread PiBa-NL

Hi List,

I've experienced a issue where its using 100% cpu usage with haproxy 
1.7.9 on FreeBSD 11.1p3 / pfSense 2.4.2dev.


There is very little traffic actually hitting this haproxy instance. But 
it happened for the second time in a few days now.
Actually haproxy has been running for a few weeks with 100% and i didnt 
notice.. it does keep working it seems..


Anyhow thought i would try and capture the next event if it would happen 
again. It did after a few hours..


After the truss output below the last line keeps repeating fast lots and 
lots of times.


kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
kevent(0,0x0,0,{ 1,EVFILT_READ,EV_EOF,0x0,0x0,0x0 },7,{ 0.99400 }) = 
1 (0x1)

recvfrom(1,0x8024ed972,16290,0,NULL,0x0)     = 0 (0x0)
kevent(0,{ 1,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)

I tried to gather all possible relevant info in attached file. Not using 
much special configuration options.. but i am using lua to server a 
small simple static response.. I'm not sure if its a problem that might 
be related to LUA, or perhaps there is some other issue.?.
I've got tcpdump and complete truss output from before and while it 
happened after a few hours, but actually just a few request (+- 29).. 
But i would prefer to send these off list though, Willy if you desire i 
send em to your mail address? But maybe i have overlooked it on the 
mailinglist and its a known issue already..? Last connection which i 
think caused/triggered the issue is in screenshot(if it attaches right 
to the mail..)basically just a GET request which gets a ack, followed by 
a FYN,ACK packet from the client 30 seconds later again followed by a ack..


The LetsEncrypt backend that is part of the configuration never got a 
single request according to stats..


Is it a known issue?
Are tcpdump/truss output desired ..? (where should i send em?)
Is there any other output that can try to gather next time?

Regards,
PiBa-NL

HA-Proxy version 1.7.9 2017/08/18
  TARGET  = freebsd

[2.4.2-DEVELOPMENT][admin@pfsense.local]/root: 
/usr/local/pkg/haproxy/haproxy_socket.sh show sess all
show sess all 0x80242b800: [08/Nov/2017:19:40:18.868158] id=15 
proto=tcpv4 source=45.76.a.b:53752

  flags=0x48a, conn_retries=0, srv_conn=0x0, pend_pos=0x0
  frontend=www (id=3 mode=http), listener=37.97.x.y:80 (id=1) 
addr=37.97.x.y:80

  backend= (id=-1 mode=-)
  server= (id=-1)
  task=0x80248f380 (state=0x04 nice=0 calls=4 exp= age=4h23m)
  txn=0x802421800 flags=0x820 meth=1 status=-1 req.st=MSG_BODY 
rsp.st=MSG_RPBEFORE waiting=0
  si[0]=0x80242ba38 (state=EST flags=0x08 endp0=CONN:0x8024ca480 
exp=, et=0x000)
  si[1]=0x80242ba60 (state=EST flags=0x4010 endp1=APPCTX:0x8024ca600 
exp=, et=0x000)

  co0=0x8024ca480 ctrl=tcpv4 xprt=RAW data=STRM target=LISTENER:0x8024ca300
  flags=0x0025b300 fd=1 fd.state=22 fd.cache=0 updt=0
  app1=0x8024ca600 st0=0 st1=0 st2=0 applet=
  req=0x80242b810 (f=0x80c020 an=0x0 pipe=0 tofwd=-1 total=94)
  an_exp= rex= wex=
  buf=0x8024ed900 data=0x8024ed914 o=94 p=94 req.next=94 i=0 size=16384
  res=0x80242b850 (f=0x8040 an=0xa0 pipe=0 tofwd=0 total=0)
  an_exp= rex= wex=
  buf=0x783160 data=0x783174 o=0 p=0 rsp.next=0 i=0 size=0
0x80242ac00: [09/Nov/2017:00:04:24.403636] id=31 proto=unix_stream 
source=unix:1

  flags=0x88, conn_retries=0, srv_conn=0x0, pend_pos=0x0
  frontend=GLOBAL (id=0 mode=tcp), listener=? (id=1) addr=unix:1
  backend= (id=-1 mode=-)
  server= (id=-1)
  task=0x80248f4d0 (state=0x0a nice=-64 calls=1 exp=10s age=?)
  si[0]=0x80242ae38 (state=EST flags=0x08 endp0=CONN:0x8024ca900 
exp=, et=0x000)
  si[1]=0x80242ae60 (state=EST flags=0x4018 endp1=APPCTX:0x8024ca780 
exp=, et=0x000)
  co0=0x8024ca900 ctrl=unix_stream xprt=RAW data=STRM 
target=LISTENER:0x8024ca000

  flags=0x0020b306 fd=2 fd.state=25 fd.cache=0 updt=0
  app1=0x8024ca780 st0=7 st1=0 st2=3 applet=
  req=0x80242ac10 (f=0xc08200 an=0x0 pipe=0 tofwd=-1 total=15)
  an_exp= rex=10s wex=
  buf=0x8024e7dc0 data=0x8024e7dd4 o=0 p=0 req.next=0 i=0 size=16384
  res=0x80242ac50 (f=0x80008002 an=0x0 pipe=0 tofwd=-1 total=1198)
  an_exp= rex= wex=
  buf=0x8025603c0 data=0x8025603d4 o=1198 p=1198 rsp.next=0 i=0 
size=16384


FreeBSD pfsense.local 11.1-RELEASE-p3 FreeBSD 11.1-RELEASE-p3 #362 
r313908+9cf44ec5484(RELENG_2_4): Fri Nov  3 08:23:14 CDT 2017

[2.4.2-DEVELOPMENT][admin@pfsense.local]/root: haproxy -vv
HA-Proxy version 1.7.9 2017/08/18
Copyright 2000-2017 Willy Tarreau 

Build options :
  TARGET  = freebsd
  CPU = generic
  CC  = cc
  CFLAGS  = -O2 -pipe -fstack-protector -fno-s