Re: connection leak (stuck in CLOSE_WAIT) on master

2018-08-21 Thread Patrick Hemmer


On 2018/8/9 01:01, Patrick Hemmer wrote:
> There's an issue on current master (287527a) where haproxy is losing
> track of its connections, and they're getting stuck in CLOSE_WAIT. And
> it's counting these connections towards limits (such as maxconn).
> Eventually maxconn is reached and everything starts queuing, and
> timing out.
>
> Here's an example:
> # lsof -p 25520
> COMMAND   PID   USER   FD   TYPE   DEVICE   SIZE/OFF   NODE   NAME
> 
> haproxy 25520 phemmer0u CHR   16,5
> 0x1409380c1797 /dev/ttys005
> haproxy 25520 phemmer1u CHR   16,5
> 0x1409380c1797 /dev/ttys005
> haproxy 25520 phemmer2u CHR   16,5
> 0x1409380c1797 /dev/ttys005
> haproxy 25520 phemmer3u 
> KQUEUE   count=0, state=0xa
> haproxy 25520 phemmer4uunix 0x933b1c4c9ed396b7   
> 0t0 /tmp/haproxy.sock.25520.tmp
> haproxy 25520 phemmer5uIPv4 0x933b1c4cbf5fc73f   
> 0t0 TCP *:8001 (LISTEN)
> haproxy 25520 phemmer6uIPv4 0x933b1c4c925fe437   
> 0t0 UDP *:51977
> haproxy 25520 phemmer7 PIPE 0x933b1c4c9f89c457 
> 16384 ->0x933b1c4c9f89bf17
> haproxy 25520 phemmer8 PIPE 0x933b1c4c9f89bf17 
> 16384 ->0x933b1c4c9f89c457
> haproxy 25520 phemmer9uunix 0x933b1c4c9548c96f   
> 0t0 /tmp/haproxy.sock.25520.tmp
> haproxy 25520 phemmer   11uIPv4 0x933b1c4cc0cca73f   
> 0t0 TCP 127.0.0.1:8001->127.0.0.1:59199 (CLOSE_WAIT)
> haproxy 25520 phemmer   12uIPv4 0x933b1c4cbfd20ddf   
> 0t0 TCP 127.0.0.1:59200->127.0.0.1:8081 (CLOSE_WAIT)
>
>
> FD 11 was the client's connection to haproxy, and FD 12 was haproxy's
> connection to the server.
>
> > show fd
> 4 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [lc]
> cnext=-13 cprev=-2 tmask=0x1 umask=0x0 owner=0x7f852af08430
> iocb=0x10bb89c40(listener_accept) l.st=RDY fe=GLOBAL
> 5 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [lc]
> cnext=-13 cprev=-2 tmask=0x1 umask=0x0 owner=0x7f852af08a40
> iocb=0x10bb89c40(listener_accept) l.st=RDY fe=f1
> 7 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [lc]
> cnext=-3 cprev=0 tmask=0x1 umask=0x0 owner=0x10bbc2ca0
> iocb=0x10bbc2ca0(poller_pipe_io_handler)
> 9 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [lc]
> cnext=-3 cprev=-2 tmask=0x1 umask=0x1 owner=0x7f852ae14550
> iocb=0x10bba60a0(conn_fd_handler) cflg=0x00201300 fe=GLOBAL mux=PASS
> mux_ctx=0x7f852ca0
>  11 : st=0x22(R:pRa W:pRa) ev=0x10(Heopi) [lc]
> cnext=-3 cprev=-2 tmask=0x1 umask=0x0 owner=0x7f852ae13ef0
> iocb=0x10bba60a0(conn_fd_handler) cflg=0x80241300 fe=f1 mux=PASS
> mux_ctx=0x7f852ae14080
>  12 : st=0x25(R:PrA W:pRa) ev=0x00(heopi) [Lc]
> cnext=-17 cprev=-2 tmask=0x1 umask=0x0 owner=0x7f852ca000c0
> iocb=0x10bba60a0(conn_fd_handler) cflg=0x00202306 sv=s1/b1 mux=PASS
> mux_ctx=0x7f852ae14bb0
>
> > show sess
> 0x7f852ae14140: proto=tcpv4 src=127.0.0.1:59199 fe=f1 be=b1
> srv=s1 ts=00 age=2m9s calls=6 rq[f=8840020h,i=0,an=8000h,rx=,wx=,ax=]
> rp[f=8040h,i=0,an=a0h,rx=,wx=,ax=] s0=[7,8h,fd=11,ex=]
> s1=[7,118h,fd=12,ex=] exp=
> 0x7f852ae14830: proto=unix_stream src=unix:1 fe=GLOBAL
> be= srv= ts=00 age=20s calls=3
> rq[f=40c08202h,i=0,an=00h,rx=,wx=,ax=]
> rp[f=c0048202h,i=0,an=00h,rx=,wx=,ax=] s0=[7,ch,fd=9,ex=]
> s1=[7,4018h,fd=-1,ex=] exp=23h59m
>
>
>
> Here's the config I'm using:
> global
> stats socket /tmp/haproxy.sock level admin
> defaults
> log 127.0.0.1:1234 daemon
> mode http
> option httplog
> timeout queue 5s
> frontend f1
> bind :8001
> default_backend b1
> backend b1
> server s1 127.0.0.1:8081 maxconn 1
>
>
> Log output around the time it broke:
> <30>Aug  9 00:45:56 haproxy[25520]: 127.0.0.1:59186
> [09/Aug/2018:00:45:56.620] f1 b1/s1 0/167/1/107/275 200 121 - - 
> 2/2/1/1/0 0/2 "GET / HTTP/1.1"
> <30>Aug  9 00:45:57 haproxy[25520]: 127.0.0.1:59189
> [09/Aug/2018:00:45:56.726] f1 b1/s1 0/169/1/108/278 200 121 - - 
> 2/2/1/1/0 0/2 "GET / HTTP/1.1"
> <30>Aug  9 00:45:57 haproxy[25520]: 127.0.0.1:59193
> [09/Aug/2018:00:45:56.948] f1 b1/s1 0/56/1/109/167 200 121 - - 
> 2/2/1/1/0 0/1 "GET / HTTP/1.1"
> <30>Aug  9 00:45:57 haproxy[25520]: 127.0.0.1:59196
> [09/Aug/2018:00:45:57.056] f1 b1/s1 0/58/1/101/161 200 121 - - 
> 2/2/1/1/0 0/1 "GET / HTTP/1.1"
> <30>Aug  9 00:46:02 haproxy[25520]: 127.0.0.1:59202
> [09/Aug/2018:00:45:57.351] f1 b1/ 0/5001/-1/-1/5007 503 212 - -
> sQ-- 3/3/2/0/0 0/1 "GET / HTTP/1.1"
> <30>Aug  9 00:46:02 haproxy[25520]: 127.0.0.1:59204
> [09/Aug/2018:00:45:57.546] f1 b1/ 0/5005/-1/-1/5005 503 21

Re: haproxy processing request of a disconnected client

2018-08-21 Thread Patrick Hemmer


On 2018/8/9 13:00, Patrick Hemmer wrote:
> So I just noticed that when a request is queued and the client closes
> the connection, the request is still sent to the server once a slot
> frees up; the server processes it and sends a response back.
> What's even more interesting is that the log indicates that everything
> was processed normally. It basically says the response was sent back
> to the client just fine.
>
> Example config:
> global
> stats socket /tmp/haproxy.sock level admin
> defaults
> log 127.0.0.1:1234 daemon
> mode http
> option httplog
> timeout queue 5s
> frontend f1
> bind :8001
> default_backend b1
> backend b1
> server s1 127.0.0.1:8081 maxconn 1
>
> Log output:
> <30>Aug  9 12:50:40 haproxy[12384]: 127.0.0.1:64723
> [09/Aug/2018:12:50:35.167] f1 b1/s1 0/0/0/5106/5106 200 118 - - 
> 2/2/1/1/0 0/0 "GET /y HTTP/1.1"
> <30>Aug  9 12:50:45 haproxy[12384]: 127.0.0.1:64726
> [09/Aug/2018:12:50:35.526] f1 b1/s1 0/4749/0/5102/9851 200 118 - -
>  1/1/0/1/0 0/1 "GET /x HTTP/1.1"
>
> ^ In this, the server sleeps for 5 seconds, and then replies. I sent
> the request for /y, and then a request for /x, but killed the client
> on the /x request after 1 second. The request for /y was processed,
> but then so was the request for /x. The close flags ("") indicate
> everything went fine.
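One mitigation worth noting (not raised in the thread, so treat it as a suggestion rather than a confirmed fix): "option abortonclose" asks haproxy to abort a request that is still sitting in the queue when its client disconnects, instead of forwarding it anyway. Applied to the reproducer config above:

```
defaults
    mode http
    option httplog
    option abortonclose   # drop queued requests whose client already left
    timeout queue 5s
```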
>
> Information:
>
> lsof:
> COMMAND   PID   USER   FD   TYPE   DEVICE   SIZE/OFF   NODE   NAME
> ...
> haproxy 12384 phemmer0u CHR   16,5
> 0x140daf691797 /dev/ttys005
> haproxy 12384 phemmer1u CHR   16,5
> 0x140daf691797 /dev/ttys005
> haproxy 12384 phemmer2u CHR   16,5
> 0x140daf691797 /dev/ttys005
> haproxy 12384 phemmer3u 
> KQUEUE   count=0, state=0xa
> haproxy 12384 phemmer4uunix 0x933b1c4c8f54438f   
> 0t0 /tmp/haproxy.sock.12384.tmp
> haproxy 12384 phemmer5uIPv4 0x933b1c4cc16309ff   
> 0t0 TCP *:8001 (LISTEN)
> haproxy 12384 phemmer6uIPv4 0x933b1c4c8fef8977   
> 0t0 UDP *:62112
> haproxy 12384 phemmer7uIPv4 0x933b1c4cbf5fd9ff   
> 0t0 TCP 127.0.0.1:8001->127.0.0.1:64723 (ESTABLISHED)
> haproxy 12384 phemmer8uIPv4 0x933b1c4cc1546ddf   
> 0t0 TCP 127.0.0.1:64724->127.0.0.1:8081 (ESTABLISHED)
> haproxy 12384 phemmer9uIPv4 0x933b1c4cc1c6209f   
> 0t0 TCP 127.0.0.1:8001->127.0.0.1:64726 (CLOSE_WAIT)
>
> admin socket 'show fd':
>   4 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [nlc] cache=0
> owner=0x7f93f3705010 iocb=0x107f82c40(listener_accept)
> tmask=0x umask=0xfffe l.st=RDY fe=GLOBAL
>   5 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [nlc] cache=0
> owner=0x7f93f3705650 iocb=0x107f82c40(listener_accept)
> tmask=0x umask=0xfffe l.st=RDY fe=f1
>   7 : st=0x25(R:PrA W:pRa) ev=0x00(heopi) [nlc] cache=0
> owner=0x7f93f3708280 iocb=0x107f9e2e0(conn_fd_handler) tmask=0x1
> umask=0x0 cflg=0x80201306 fe=f1 mux=PASS mux_ctx=0x7f93f3708490
>   8 : st=0x25(R:PrA W:pRa) ev=0x00(heopi) [nLc] cache=0
> owner=0x7f93f340ede0 iocb=0x107f9e2e0(conn_fd_handler) tmask=0x1
> umask=0x0 cflg=0x00202306 sv=s1/b1 mux=PASS mux_ctx=0x7f93f350d0e0
>   9 : st=0x22(R:pRa W:pRa) ev=0x10(Heopi) [nlc] cache=0
> owner=0x7f93f510 iocb=0x107f9e2e0(conn_fd_handler) tmask=0x1
> umask=0x0 cflg=0x80241300 fe=f1 mux=PASS mux_ctx=0x7f93f5100210
>  10 : st=0x25(R:PrA W:pRa) ev=0x00(heopi) [nlc] cache=0
> owner=0x7f93f3708100 iocb=0x107f9e2e0(conn_fd_handler) tmask=0x1
> umask=0x0 cflg=0x00201306 fe=GLOBAL mux=PASS mux_ctx=0x7f93f350d1a0
>
> admin socket 'show sess':
> 0x7f93f37084c0: proto=tcpv4 src=127.0.0.1:64723 fe=f1 be=b1 srv=s1
> ts=08 age=2s calls=3 rq[f=48840200h,i=0,an=8000h,rx=,wx=,ax=]
> rp[f=8040h,i=0,an=a0h,rx=,wx=,ax=] s0=[7,8h,fd=7,ex=]
> s1=[7,118h,fd=8,ex=] exp=
> 0x7f93f5100240: proto=tcpv4 src=127.0.0.1:64726 fe=f1 be=b1
> srv= ts=08 age=2s calls=3
> rq[f=c800020h,i=0,an=8000h,rx=,wx=,ax=]
> rp[f=8000h,i=0,an=00h,rx=,wx=,ax=] s0=[7,8h,fd=9,ex=]
> s1=[2,110h,fd=-1,ex=2s] exp=2s
> 0x7f93f350d1d0: proto=unix_stream src=unix:1 fe=GLOBAL be=
> srv= ts=02 age=0s calls=1
> rq[f=40c08202h,i=0,an=00h,rx=10s,wx=,ax=]
> rp[f=80008002h,i=0,an=00h,rx=,wx=,ax=] s0=[7,8h,fd=10,ex=]
> s1=[7,4018h,fd=-1,ex=] exp=10s
>
> Version info:
> # ./haproxy -vv
> HA-Proxy version 1.8.13 2018/07/30
> Copyright 2000-2018 Willy Tarreau 
>
> Build options :
> TARGET  = osx
> CPU = generic
> CC  = gcc
> CFLAGS  = -O0 -g -fno-strict-aliasing
> -Wdeclaration-after-statement -fwrapv -fno-strict-overflow
> -Wno-address-of-packed-member -Wno-null-dereference -Wno-unused-label
> OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1
>

Re: [PATCH 0/2] sample fetches for available connections

2018-08-21 Thread Patrick Hemmer


On 2018/8/9 18:46, Patrick Hemmer wrote:
> These are 2 new sample fetches which provide the available connections.
> The be_conn_free fetch is similar to connslots, but differs
> significantly in that it does not count queue slots or backup servers
> (unless all servers are down).
>
> These are intended to be useful in combination with the priority
> queuing, so you can see how many connections are available, perhaps for
> taking action when the number is low. For example by reserving
> connections for high-priority requests, and rejecting low priority ones.
>
> -Patrick
>
> Patrick Hemmer (2):
>   MINOR: add be_conn_free sample fetch
>   MINOR: Add srv_conn_free sample fetch
>
>  doc/configuration.txt | 34 +---
>  src/backend.c | 60 +++
>  2 files changed, 91 insertions(+), 3 deletions(-)
>
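The intended use described above might look like the following config fragment (the fetch name comes from the patch subjects; the exact syntax, the threshold, and the /important path are assumptions for illustration, not confirmed by the thread):

```
backend b1
    server s1 127.0.0.1:8081 maxconn 100
    # Shed low-priority traffic while free connection slots are scarce,
    # keeping headroom for high-priority requests.
    http-request deny if { be_conn_free lt 10 } !{ path_beg /important }
```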

Is there any interest in this?

Sorry for all the pings, lots of emails which look like they might have
fallen off the radar.

-Patrick


Re: [PATCH] MINOR: crypto: Add digest and hmac converters

2018-08-21 Thread Patrick Hemmer


On 2018/6/17 14:02, Baptiste wrote:
>
>
> On Sun, 17 Jun 2018 at 14:10, Patrick Gansterer wrote:
>
>
> > On 17 Jun 2018, at 13:36, Baptiste wrote:
> >
> > Can they be used to validate oauth tokens too?
>
> Depends on the implementation of the tokens, but if they are
> HMAC-SHA256-signed JWTs, it's very easy to validate them in a lua
> script now.
>
>
> That's how I am doing it now.
> Just wanted to know if a combination of http rules could do the job.
> And maybe be faster.
>
> Baptiste

Any progress on this? I'd also be interested in seeing native HMAC
functions added, as I'm doing something similar to an HMAC that I'd love
to replace.

-Patrick


Re: BUG: Tw is negative with lua sleep

2018-08-21 Thread Patrick Hemmer


On 2018/7/18 09:03, Frederic Lecaille wrote:
> Hello Patrick,
>
> On 07/17/2018 03:59 PM, Patrick Hemmer wrote:
>> Ping?
>>
>> -Patrick
>>
>> On 2018/6/22 15:10, Patrick Hemmer wrote:
>>> When using core.msleep in lua, the %Tw metric is a negative value.
>>>
>>> For example with the following config:
>>> haproxy.cfg:
>>> global
>>> lua-load /tmp/haproxy.lua
>>>
>>> frontend f1
>>> mode http
>>> bind :8000
>>> default_backend b1
>>> log 127.0.0.1:1234 daemon
>>> log-format Ta=%Ta\ Tc=%Tc\ Td=%Td\ Th=%Th\ Ti=%Ti\
>>> Tq=%Tq\ TR=%TR\ Tr=%Tr\ Tt=%Tt\ Tw=%Tw
>>>
>>> backend b1
>>> mode http
>>> http-request use-service lua.foo
>>>
>>> haproxy.lua:
>>> core.register_service("foo", "http", function(applet)
>>> core.msleep(100)
>>> applet:set_status(200)
>>> applet:start_response()
>>> end)
>>>
>>> The log contains:
>>> Ta=104 Tc=0 Td=0 Th=0 Ti=0 Tq=104 TR=104 Tr=104 Tt=104 Tw=-104
>>>
>>> ^ TR also looks wrong, as it did not take 104ms to receive the full
>>> request.
>>>
>>> This is built from the commit before current master: d8fd2af
>>>
>>> -Patrick
>>
>
> The patch attached to this mail fixes this issue at least for %TR field.
>
> But I am not sure at all it is correct or if there is no remaining
> issues. For instance the LUA tcp callback also updates the tv_request
> log field.
>
> So, let's wait for Thierry's validation.
>
> Regards.
>

Any update on this?

-Patrick



BUG: LUA txn:get_priv() scoped to connection, not transaction

2018-08-21 Thread Patrick Hemmer
There is a bug in the current stable haproxy (1.8.13) where the LUA
function txn:get_priv() is returning data stored from other
transactions. This was discovered as we have code that triggers on
certain requests, and it was triggering on requests it should not have been.

You can reproduce with this config:
global
lua-load haproxy.lua

defaults
mode http

frontend f1
bind :8000
default_backend b1
http-request lua.test

backend b1
http-request use-service lua.fakeserv

And this lua file:
core.register_action("test", { "http-req" }, function(txn)
data = txn:get_priv()
if not data then
data = 0
end
data = data + 1
print(string.format("set to %d", data))
txn:set_priv(data)
end)

core.register_service("fakeserv", "http", function(applet)
applet:set_status(200)
applet:start_response()
end)

And this curl command:
curl http://localhost:8000 http://localhost:8000

Which provides this output:
set to 1
set to 2

Since each request is a separate transaction, the expected output is "set
to 1" twice; the counter reached 2 because curl reused the same TCP
connection for both requests and the private data followed the connection.



Version information:
HA-Proxy version 1.8.13 2018/07/30
Copyright 2000-2018 Willy Tarreau 

Build options :
TARGET  = osx
CPU = generic
CC  = gcc
CFLAGS  = -O0 -g -fno-strict-aliasing
-Wdeclaration-after-statement -fwrapv -fno-strict-overflow
-Wno-address-of-packed-member -Wno-null-dereference -Wno-unused-label
OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024,
maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.0h  27 Mar 2018
Running on OpenSSL version : OpenSSL 1.1.0h  27 Mar 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using:
Encrypted password support via crypt(3): yes
Built with PCRE version : 8.42 2018-03-20
Running on PCRE version : 8.42 2018-03-20
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
 kqueue : pref=300,  test result OK
 poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace


-Patrick


Re: HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Willy Tarreau
On Wed, Aug 22, 2018 at 04:32:49AM +0200, Willy Tarreau wrote:
> Excellent, I think I found it :
> 
> trash.data = recv(conn->handle.fd, trash.area, trash.size,
>  MSG_PEEK);
> if (trash.data < 0) {
> if (errno == EINTR)
> continue;
> if (errno == EAGAIN) {
> fd_cant_recv(conn->handle.fd);
> return 0;
> }
>   ...
> 
> trash.data is a size_t now so it cannot be negative. Thus it's believed
> that recv() never fails. So it's clearly related to the buffer changes.
> I'm seeing a few other such places that require using an intermediate
> variable for the test. After all it's not that bad because we've inherited
> such assignments from a good decade, and it's time to clean this up as well.

So I've now addressed all those I could find (quite a bunch in fact).
I think everything's OK now regarding this. I haven't checked for the
capture yet but since it was already broken in 1.6, it can still wait
a bit ;-)

Thanks for your reports!
Willy



Re: HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Willy Tarreau
On Wed, Aug 22, 2018 at 12:46:47AM +0200, Cyril Bonté wrote:
> On 22/08/2018 at 00:40, Cyril Bonté wrote:
> > Hi again Willy,
> > 
> > On 21/08/2018 at 22:55, Cyril Bonté wrote:
> > > > Thanks for the diag. I don't remember changing anything around the proxy
> > > > protocol, but it's possible that something subtle changed. Also it's not
> > > > on the regular send/receive path so maybe I overlooked certain parts and
> > > > broke it by accident when changing the buffers.
> > > > 
> > > > Same here, if you have a small reproducer it will really help.
> > > 
> > > I'm trying to find a configuration that could help identify the
> > > issue, but so far I fail (timing seems to play a role). I'll let you
> > > know once I have a good reproducer.
> > 
> > OK, I have a small reproducer that triggers the issue quite often on my
> > laptop:
> >      global
> >      log /dev/log len 2048 local7 debug err
> > 
> >      nbproc 4
> > 
> >      defaults
> >      mode http
> >      log global
> >      option log-health-checks
> > 
> >      listen ssl-offload-http
> >      bind :4443 ssl crt localhost.pem ssl no-sslv3 alpn h2,http/1.1
> >      bind-process 2-4
> > 
> >      server http abns@http send-proxy
> > 
> >      listen http
> >      bind-process 1
> >      bind abns@http accept-proxy name ssl-offload-http
> > 
> >      option forwardfor
> > 
> > then execute several H2 requests on the same connection, for example:
> > $ curl -k -d 'x=x' $(printf 'https://localhost:4443/%s ' {1..8})
> > 503 Service Unavailable
> > No server is available to handle this request.
> > 
> > 503 Service Unavailable
> > No server is available to handle this request.
> > 
> > 503 Service Unavailable
> > No server is available to handle this request.
> > 
> > curl: (16) Error in the HTTP2 framing layer
> > 503 Service Unavailable
> > No server is available to handle this request.
> > 
> > curl: (16) Error in the HTTP2 framing layer
> > curl: (16) Error in the HTTP2 framing layer
> > 503 Service Unavailable
> > No server is available to handle this request.
> > 
> > 
> > In the logs, I can see:
> > Aug 22 00:34:19 asus-wifi haproxy[12623]: 127.0.0.1:37538
> > [22/Aug/2018:00:34:19.459] http http/ 0/-1/-1/-1/0 503 212 - -
> > SC-- 1/1/0/0/0 0/0 "POST /1 HTTP/1.1"
> > Aug 22 00:34:19 asus-wifi haproxy[12625]: 127.0.0.1:37538
> > [22/Aug/2018:00:34:19.458] ssl-offload-http~ ssl-offload-http/http
> > 0/0/0/0/0 503 212 - -  1/1/0/0/0 0/0 "POST /1 HTTP/1.1"
> > Aug 22 00:34:19 asus-wifi haproxy[12623]: 127.0.0.1:37538
> > [22/Aug/2018:00:34:19.459] http http/ 0/-1/-1/-1/0 503 212 - -
> > SC-- 1/1/0/0/0 0/0 "POST /2 HTTP/1.1"
> > Aug 22 00:34:19 asus-wifi haproxy[12625]: 127.0.0.1:37538
> > [22/Aug/2018:00:34:19.459] ssl-offload-http~ ssl-offload-http/http
> > 0/0/0/0/0 503 212 - -  1/1/0/0/0 0/0 "POST /2 HTTP/1.1"
> > Aug 22 00:34:19 asus-wifi haproxy[12623]: 127.0.0.1:37538
> > [22/Aug/2018:00:34:19.460] http http/ 0/-1/-1/-1/0 503 212 - -
> > SC-- 1/1/0/0/0 0/0 "POST /3 HTTP/1.1"
> > Aug 22 00:34:19 asus-wifi haproxy[12625]: 127.0.0.1:37538
> > [22/Aug/2018:00:34:19.459] ssl-offload-http~ ssl-offload-http/http
> > 0/0/0/0/0 503 212 - -  1/1/0/0/0 0/0 "POST /3 HTTP/1.1"
> > Aug 22 00:34:19 asus-wifi haproxy[12623]: PROXY SIG ERROR
> > X-Forwarded-For: 127.0.0.1
> > Aug 22 00:34:19 asus-wifi haproxy[12623]: unix:1
> > [22/Aug/2018:00:34:19.460] http/ssl-offload-http: Received something
> > which does not look like a PROXY protocol header
> > Aug 22 00:34:19 asus-wifi haproxy[12625]: 127.0.0.1:37538
> > [22/Aug/2018:00:34:19.459] ssl-offload-http~ ssl-offload-http/http
> > 0/0/0/-1/0 -1 0 - - SD-- 1/1/0/0/0 0/0 "POST /4 HTTP/1.1"
> > Aug 22 00:34:19 asus-wifi haproxy[12623]: PROXY SIG ERROR unix:1
> > [22/Aug/2018:00:34:19.460] http/ssl-offload-http
> > Aug 22 00:34:19 asus-wifi haproxy[12623]: unix:1
> > [22/Aug/2018:00:34:19.461] http/ssl-offload-http: Received something
> > which does not look like a PROXY protocol header
> > Aug 22 00:34:20 asus-wifi haproxy[12623]: 127.0.0.1:37542
> > [22/Aug/2018:00:34:20.462] http http/ 0/-1/-1/-1/0 503 212 - -
> > SC-- 1/1/0/0/0 0/0 "POST /5 HTTP/1.1"
> > Aug 22 00:34:20 asus-wifi haproxy[12625]: 127.0.0.1:37542
> > [22/Aug/2018:00:34:19.460] ssl-offload-http~ ssl-offload-http/http
> > 0/0/1002/0/1002 503 212 - -  1/1/0/0/1 0/0 "POST /5 HTTP/1.1"
> > Aug 22 00:34:20 asus-wifi haproxy[12623]: PROXY SIG ERROR
> > X-Forwarded-For: 127.0.0.1
> > Aug 22 00:34:20 asus-wifi haproxy[12623]: unix:1
> > [22/Aug/2018:00:34:20.463] http/ssl-offload-http: Received something
> > which does not look like a PROXY protocol header
> > Aug 22 00:34:20 asus-wifi haproxy[12625]: 127.0.0.1:37542
> > [22/Aug/2018:00:34:20.463] ssl-offload-http~ ssl-offload-http/http
> > 0/0/0/-1/0 -1 0 - - SD-- 1/1/0/0/0 0/0 "POST /6 HTTP/1.1"
> > Aug 22 00:34:20 asus-wifi haproxy[12623]: PROXY SIG ERROR unix:1
> > [22/Aug/2018:00:34:20.463] http/ssl-offload-http
> > Aug 22 00:34:20 asus

Re: HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Willy Tarreau
On Tue, Aug 21, 2018 at 10:55:40PM +0200, Cyril Bonté wrote:
> > > OK, that one is my fault, I used a "listen" section for use_backend (maybe
> > > we'll have to secure such configuration errors to prevent segfaults).
> > 
> > I don't see how it could be an issue, that's a completely supported
> > config! You can even have this :
> > 
> >listen a
> >   use_backend b if { prefer_b }
> > 
> >listen b
> >   use_backend a if { prefer_a }
> > 
> > So any reproducer would be interesting.
> 
> I could extract this minimal configuration to reproduce the issue:
>   defaults
> mode http
> 
>   listen proxy1
> bind :9001
> http-request set-header X-Forwarded-Proto http
> http-request capture hdr(X-Forwarded-Proto) len 8
> 
>   listen proxy2
> bind :9002
> use_backend proxy1
> 
> then, curl http://localhost:9002/ will cause a segfault.

Very interesting! Even 1.6 crashes on it! Even with proxy2 being a pure
frontend it dies. Apparently it seems caused by the http-request capture
rule that uses some space that ought to have been allocated in the frontend
but was not. The rule is accepted since proxy1 is a full proxy. Apparently
"capture request header" doesn't die, so it is able to protect itself against
this. I'll have a look to see how to do the same for http-request capture.
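The protection "capture request header" gets comes from its capture slots being declared on the frontend side; the crash-free way to write the reproducer is to declare the slot in the proxy that accepts the connection and reference it by id (syntax per the configuration manual; shown here as a sketch, not a tested fix for this crash):

```
listen proxy2
    bind :9002
    declare capture request len 8   # allocate the slot in the frontend
    use_backend proxy1

listen proxy1
    bind :9001
    http-request set-header X-Forwarded-Proto http
    http-request capture hdr(X-Forwarded-Proto) id 0  # reference slot 0
```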

> > > The good news : I kept a configuration with abns, but removed the proxy
> > > protocol. Everything is working as expected. So, the proxy protocol looks
> > > to have a main role in the buffer issue.
> > 
> > Thanks for the diag. I don't remember changing anything around the proxy
> > protocol, but it's possible that something subtle changed. Also it's not
> > on the regular send/receive path so maybe I overlooked certain parts and
> > broke it by accident when changing the buffers.
> > 
> > Same here, if you have a small reproducer it will really help.
> 
> I'm trying to find a configuration that could help identify the issue,
> but so far I fail (timing seems to play a role). I'll let you know once
> I have a good reproducer.

OK thank you!

Willy



Re: HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Cyril Bonté

On 22/08/2018 at 00:40, Cyril Bonté wrote:

Hi again Willy,

On 21/08/2018 at 22:55, Cyril Bonté wrote:

Thanks for the diag. I don't remember changing anything around the proxy
protocol, but it's possible that something subtle changed. Also it's not
on the regular send/receive path so maybe I overlooked certain parts and
broke it by accident when changing the buffers.

Same here, if you have a small reproducer it will really help.


I'm trying to find a configuration that could help identify the issue,
but so far I fail (timing seems to play a role). I'll let you know once
I have a good reproducer.


OK, I have a small reproducer that triggers the issue quite often on my 
laptop:

     global
     log /dev/log len 2048 local7 debug err

     nbproc 4

     defaults
     mode http
     log global
     option log-health-checks

     listen ssl-offload-http
     bind :4443 ssl crt localhost.pem ssl no-sslv3 alpn h2,http/1.1
     bind-process 2-4

     server http abns@http send-proxy

     listen http
     bind-process 1
     bind abns@http accept-proxy name ssl-offload-http

     option forwardfor

then execute several H2 requests on the same connection, for example:
$ curl -k -d 'x=x' $(printf 'https://localhost:4443/%s ' {1..8})
503 Service Unavailable
No server is available to handle this request.

503 Service Unavailable
No server is available to handle this request.

503 Service Unavailable
No server is available to handle this request.

curl: (16) Error in the HTTP2 framing layer
503 Service Unavailable
No server is available to handle this request.

curl: (16) Error in the HTTP2 framing layer
curl: (16) Error in the HTTP2 framing layer
503 Service Unavailable
No server is available to handle this request.


In the logs, I can see:
Aug 22 00:34:19 asus-wifi haproxy[12623]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.459] http http/ 0/-1/-1/-1/0 503 212 - - 
SC-- 1/1/0/0/0 0/0 "POST /1 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12625]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.458] ssl-offload-http~ ssl-offload-http/http 
0/0/0/0/0 503 212 - -  1/1/0/0/0 0/0 "POST /1 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12623]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.459] http http/ 0/-1/-1/-1/0 503 212 - - 
SC-- 1/1/0/0/0 0/0 "POST /2 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12625]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.459] ssl-offload-http~ ssl-offload-http/http 
0/0/0/0/0 503 212 - -  1/1/0/0/0 0/0 "POST /2 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12623]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.460] http http/ 0/-1/-1/-1/0 503 212 - - 
SC-- 1/1/0/0/0 0/0 "POST /3 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12625]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.459] ssl-offload-http~ ssl-offload-http/http 
0/0/0/0/0 503 212 - -  1/1/0/0/0 0/0 "POST /3 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12623]: PROXY SIG ERROR 
X-Forwarded-For: 127.0.0.1
Aug 22 00:34:19 asus-wifi haproxy[12623]: unix:1 
[22/Aug/2018:00:34:19.460] http/ssl-offload-http: Received something 
which does not look like a PROXY protocol header
Aug 22 00:34:19 asus-wifi haproxy[12625]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.459] ssl-offload-http~ ssl-offload-http/http 
0/0/0/-1/0 -1 0 - - SD-- 1/1/0/0/0 0/0 "POST /4 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12623]: PROXY SIG ERROR unix:1 
[22/Aug/2018:00:34:19.460] http/ssl-offload-http
Aug 22 00:34:19 asus-wifi haproxy[12623]: unix:1 
[22/Aug/2018:00:34:19.461] http/ssl-offload-http: Received something 
which does not look like a PROXY protocol header
Aug 22 00:34:20 asus-wifi haproxy[12623]: 127.0.0.1:37542 
[22/Aug/2018:00:34:20.462] http http/ 0/-1/-1/-1/0 503 212 - - 
SC-- 1/1/0/0/0 0/0 "POST /5 HTTP/1.1"
Aug 22 00:34:20 asus-wifi haproxy[12625]: 127.0.0.1:37542 
[22/Aug/2018:00:34:19.460] ssl-offload-http~ ssl-offload-http/http 
0/0/1002/0/1002 503 212 - -  1/1/0/0/1 0/0 "POST /5 HTTP/1.1"
Aug 22 00:34:20 asus-wifi haproxy[12623]: PROXY SIG ERROR 
X-Forwarded-For: 127.0.0.1
Aug 22 00:34:20 asus-wifi haproxy[12623]: unix:1 
[22/Aug/2018:00:34:20.463] http/ssl-offload-http: Received something 
which does not look like a PROXY protocol header
Aug 22 00:34:20 asus-wifi haproxy[12625]: 127.0.0.1:37542 
[22/Aug/2018:00:34:20.463] ssl-offload-http~ ssl-offload-http/http 
0/0/0/-1/0 -1 0 - - SD-- 1/1/0/0/0 0/0 "POST /6 HTTP/1.1"
Aug 22 00:34:20 asus-wifi haproxy[12623]: PROXY SIG ERROR unix:1 
[22/Aug/2018:00:34:20.463] http/ssl-offload-http
Aug 22 00:34:20 asus-wifi haproxy[12623]: unix:1 
[22/Aug/2018:00:34:20.469] http/ssl-offload-http: Received something 
which does not look like a PROXY protocol header
Aug 22 00:34:20 asus-wifi haproxy[12625]: 127.0.0.1:37546 
[22/Aug/2018:00:34:20.469] ssl-offload-http~ ssl-offload-http/http 
0/0/0/-1/0 -1 0 - - SD-- 1/1/0/0/0 0/0 "POST /7 HTTP/1.1"
Aug 22 00:34:20 asus-wifi haproxy[12623]: 127.0.0.1:37550 
[22/Aug/2018:00:34:20.472] http http/ 0/-1/-1/-1/0 503 212 - - 
SC-- 1/1/0/0/0 0/0 "POST /8 HTT

Re: HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Cyril Bonté

Hi again Willy,

On 21/08/2018 at 22:55, Cyril Bonté wrote:

Thanks for the diag. I don't remember changing anything around the proxy
protocol, but it's possible that something subtle changed. Also it's not
on the regular send/receive path so maybe I overlooked certain parts and
broke it by accident when changing the buffers.

Same here, if you have a small reproducer it will really help.


I'm trying to find a configuration that could help identify the issue,
but so far I fail (timing seems to play a role). I'll let you know once
I have a good reproducer.


OK, I have a small reproducer that triggers the issue quite often on my 
laptop:

global
log /dev/log len 2048 local7 debug err

nbproc 4

defaults
mode http
log global
option log-health-checks

listen ssl-offload-http
bind :4443 ssl crt localhost.pem ssl no-sslv3 alpn h2,http/1.1
bind-process 2-4

server http abns@http send-proxy

listen http
bind-process 1
bind abns@http accept-proxy name ssl-offload-http

option forwardfor

then execute several H2 requests on the same connection, for example:
$ curl -k -d 'x=x' $(printf 'https://localhost:4443/%s ' {1..8})
503 Service Unavailable
No server is available to handle this request.

503 Service Unavailable
No server is available to handle this request.

503 Service Unavailable
No server is available to handle this request.

curl: (16) Error in the HTTP2 framing layer
503 Service Unavailable
No server is available to handle this request.

curl: (16) Error in the HTTP2 framing layer
curl: (16) Error in the HTTP2 framing layer
503 Service Unavailable
No server is available to handle this request.


In the logs, I can see:
Aug 22 00:34:19 asus-wifi haproxy[12623]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.459] http http/ 0/-1/-1/-1/0 503 212 - - 
SC-- 1/1/0/0/0 0/0 "POST /1 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12625]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.458] ssl-offload-http~ ssl-offload-http/http 
0/0/0/0/0 503 212 - -  1/1/0/0/0 0/0 "POST /1 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12623]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.459] http http/ 0/-1/-1/-1/0 503 212 - - 
SC-- 1/1/0/0/0 0/0 "POST /2 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12625]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.459] ssl-offload-http~ ssl-offload-http/http 
0/0/0/0/0 503 212 - -  1/1/0/0/0 0/0 "POST /2 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12623]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.460] http http/ 0/-1/-1/-1/0 503 212 - - 
SC-- 1/1/0/0/0 0/0 "POST /3 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12625]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.459] ssl-offload-http~ ssl-offload-http/http 
0/0/0/0/0 503 212 - -  1/1/0/0/0 0/0 "POST /3 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12623]: PROXY SIG ERROR 
X-Forwarded-For: 127.0.0.1
Aug 22 00:34:19 asus-wifi haproxy[12623]: unix:1 
[22/Aug/2018:00:34:19.460] http/ssl-offload-http: Received something 
which does not look like a PROXY protocol header
Aug 22 00:34:19 asus-wifi haproxy[12625]: 127.0.0.1:37538 
[22/Aug/2018:00:34:19.459] ssl-offload-http~ ssl-offload-http/http 
0/0/0/-1/0 -1 0 - - SD-- 1/1/0/0/0 0/0 "POST /4 HTTP/1.1"
Aug 22 00:34:19 asus-wifi haproxy[12623]: PROXY SIG ERROR unix:1 
[22/Aug/2018:00:34:19.460] http/ssl-offload-http
Aug 22 00:34:19 asus-wifi haproxy[12623]: unix:1 
[22/Aug/2018:00:34:19.461] http/ssl-offload-http: Received something 
which does not look like a PROXY protocol header
Aug 22 00:34:20 asus-wifi haproxy[12623]: 127.0.0.1:37542 
[22/Aug/2018:00:34:20.462] http http/ 0/-1/-1/-1/0 503 212 - - 
SC-- 1/1/0/0/0 0/0 "POST /5 HTTP/1.1"
Aug 22 00:34:20 asus-wifi haproxy[12625]: 127.0.0.1:37542 
[22/Aug/2018:00:34:19.460] ssl-offload-http~ ssl-offload-http/http 
0/0/1002/0/1002 503 212 - -  1/1/0/0/1 0/0 "POST /5 HTTP/1.1"
Aug 22 00:34:20 asus-wifi haproxy[12623]: PROXY SIG ERROR 
X-Forwarded-For: 127.0.0.1
Aug 22 00:34:20 asus-wifi haproxy[12623]: unix:1 
[22/Aug/2018:00:34:20.463] http/ssl-offload-http: Received something 
which does not look like a PROXY protocol header
Aug 22 00:34:20 asus-wifi haproxy[12625]: 127.0.0.1:37542 
[22/Aug/2018:00:34:20.463] ssl-offload-http~ ssl-offload-http/http 
0/0/0/-1/0 -1 0 - - SD-- 1/1/0/0/0 0/0 "POST /6 HTTP/1.1"
Aug 22 00:34:20 asus-wifi haproxy[12623]: PROXY SIG ERROR unix:1 
[22/Aug/2018:00:34:20.463] http/ssl-offload-http
Aug 22 00:34:20 asus-wifi haproxy[12623]: unix:1 
[22/Aug/2018:00:34:20.469] http/ssl-offload-http: Received something 
which does not look like a PROXY protocol header
Aug 22 00:34:20 asus-wifi haproxy[12625]: 127.0.0.1:37546 
[22/Aug/2018:00:34:20.469] ssl-offload-http~ ssl-offload-http/http 
0/0/0/-1/0 -1 0 - - SD-- 1/1/0/0/0 0/0 "POST /7 HTTP/1.1"
Aug 22 00:34:20 asus-wifi haproxy[12623]: 127.0.0.1:37550 
[22/Aug/2018:00:34:20.472] http http/ 0/-1/-1/-1/0 503 212 - - 
SC-- 1/1/0/0/0 0/0 "POST /8 HTTP/1.1"
Aug 22 00:34:20 asus-wifi haproxy[12625]: 127.0.0.1:37

Re: [PATCH] MEDIUM: lua: Add stick table support for Lua

2018-08-21 Thread Adis Nezirovic
On Tue, Aug 21, 2018 at 11:32:34AM +0200, Thierry Fournier wrote:
> Some remark about the documentation and the formats:
> 
>
> js:function:: StickTable.info()
> js:function:: StickTable.lookup(key)
>
> Maybe the specification and an example of the returned values
> would be welcome, because I guess that the table keys are hardcoded.
> Something like:

That makes sense, I've updated the docs.

> js:function:: StickTable.dump([filter])
>
> The exact list of allowed operators will help the user to use
> your class. It seems that string or regex operators are not
> allowed; only the integer operators are taken into account. This
> list is "eq", "ne", "le", "lt", "ge", "gt".
Since only integers are available as data columns, we can only use
numeric operators. Noted in the docs now, with a pointer to "show table".

> Same remark for the allowed data types. Maybe a link to the HAProxy
> documentation would be sufficient for the data types.

> I see in the code that the filter cannot exceed 4 entries. This
> limitation must be written in the doc.
Noted.

> I also miss the relation between operators and between the contents
> of an operator, I mean AND or OR. Here is how I understand your example:
>
> +local filter = {
> +  lt={{"gpc0", 1}, {"gpc1", 2}},
> +  gt={{"conn_rate", 3}},
> +  eq={{"conn_cur", 4}}
> +}
>
> Are you sure that the operator= syntax
> is a good format? Maybe something like the following, with the operator
> as an argument between the two operands (lines are implicitly OR, and
> columns are AND):
>
> +local filter = {
> +  {{"gpc0", "lt", 1}, {"gpc1", "lt", 2}},
> +  {{"conn_rate", "gt", 3}},
> +  {{"conn_cur", "eq", 4}}
> +}
Actually, I was playing with some other ideas, and it was useful to be
able to "preselect" filter operators.
However, the CLI doesn't even support more than one operator, so maybe we
don't need to complicate things too much. Maybe we can simplify to this:

  local filter = {
{"gpc0", "lt", 1},
{"gpc1", "lt", 2},
{"conn_rate", "gt", 3},
{"conn_cur", "eq", 4}
  }

The default operator would be AND, and we would not support other
operators (to keep things simple). An example use case for the filter
would be to filter on gpc0 > X AND gpc1 > Y.

If this sounds good, I can update/simplify the code.
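For illustration, the implicit-AND semantics described above could be evaluated like this (a minimal C sketch; the names `filter_cond`, `cond_matches`, and `filter_matches` are invented for this example and are not the actual HAProxy Lua binding code):

```c
#include <string.h>

/* Illustrative sketch (not the actual HAProxy code): evaluate one
 * {column, operator, value} condition; a filter matches only if every
 * condition matches, i.e. conditions are implicitly AND-ed. */
struct filter_cond {
    const char *op;   /* "eq", "ne", "le", "lt", "ge", "gt" */
    long long lhs;    /* current value of the data column */
    long long rhs;    /* user-supplied comparison value */
};

static int cond_matches(const struct filter_cond *c)
{
    if (!strcmp(c->op, "eq")) return c->lhs == c->rhs;
    if (!strcmp(c->op, "ne")) return c->lhs != c->rhs;
    if (!strcmp(c->op, "le")) return c->lhs <= c->rhs;
    if (!strcmp(c->op, "lt")) return c->lhs <  c->rhs;
    if (!strcmp(c->op, "ge")) return c->lhs >= c->rhs;
    if (!strcmp(c->op, "gt")) return c->lhs >  c->rhs;
    return 0; /* unknown operator: reject the entry */
}

static int filter_matches(const struct filter_cond *conds, int n)
{
    for (int i = 0; i < n; i++)
        if (!cond_matches(&conds[i]))
            return 0; /* implicit AND: one failing condition rejects */
    return 1;
}
```

With this shape, adding OR support later would only require grouping conditions, without changing the per-condition evaluation.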

> Idea of an extension for the future: maybe it would be safer to compile
> the stick table filter during the initialisation of the Lua code, to avoid
> runtime errors?
I'm returning runtime errors since it can be easy to mix up data from
the client side (most probably the data would come as a JSON table, then
be transformed into a Lua table).


> Another point for the doc of this function: you must specify that the
> execution of this function can be very long for millions of entries, even
> if a filter is specified, because the stick table is entirely scanned.
Noted.


> some remarks about the code and the logs:
> -
>
> Line 182 of the patch contains spaces in place of tabs.
Sorry, I generally try to be careful with code style; too many different
vim profiles :-D. Fixed.

> Line 274 of your patch, I don't see any HA_SPIN_LOCK(STK_TABLE_LOCK
> I don't know threads very well, so maybe locks are useless there, maybe not.
hlua_stktable_lookup() uses stktable_lookup_key(), which does take locks,
so I guess it should be fine?

> Line 311: I see the decrement of the refcount without an ATOMIC function
> and without a lock. Once again, I don't know threads very well, but
> I'm raising a warning. Maybe locks are useless, maybe not.
You are totally right, locks are needed when decrementing references.
Fixed.

> Line 286 of your patch. It seems that smp.flags is used uninitialized.
> Maybe you should apply this change:
>
>-  smp.flags |= SMP_F_CONST;
>+  smp.flags = SMP_F_CONST;
Fixed, and cleaned up a bit (unnecessary 'char *key' variable).

> l.365, 369: The user doesn't have context about the error. Is it the
> first entry of the table, or the second? Which operator doesn't exist?
>
> L.380, 384: Which line is wrong?
Yes, it is somewhat cryptic. I've tried to avoid returning user-supplied
data in the error messages. We can revisit this if/when we change the
filter table format.

> L.431: You release the lock, so the next element relative to the current
> "n" can disappear, and ebmb_next() can return wrong memory.
I was under the impression that we only have to acquire the lock and
increment ref_cnt (so we can be sure our current node n is not deleted).
ebmb_next() is only called while we're holding the lock, on the first and
every subsequent iteration, i.e.:

  HA_SPIN_LOCK(STK_TABLE_LOCK, &t->lock);
  eb = ebmb_first(&t->keys);
  for (n = eb; n; n = ebmb_next(n)) {
  ...
  ts->ref_cnt++;
  HA_SPIN_UNLOCK(STK_TABLE_LOCK, &t->lock);
  ...

  HA_SPIN_LOCK(STK_TABLE_LOCK, &t->lock);
  }

Or I didn't get your point?
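The pattern being discussed above can be sketched generically (a hypothetical C example using a plain singly linked list and a pthread mutex in place of HAProxy's ebtree and spinlock; all names are invented and this is not the actual stick-table code):

```c
#include <pthread.h>
#include <stddef.h>

/* Sketch of the iteration pattern: the lock is held whenever the
 * iterator advances, and a reference count pins the current node so it
 * cannot be freed while the lock is dropped for slow per-node work. */
struct entry {
    struct entry *next;
    int ref_cnt;
    long value;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

long sum_entries(struct entry *head)
{
    long total = 0;

    pthread_mutex_lock(&table_lock);
    for (struct entry *n = head; n; n = n->next) {
        n->ref_cnt++;                   /* pin: n survives the unlock */
        pthread_mutex_unlock(&table_lock);

        total += n->value;              /* slow work, lock released */

        pthread_mutex_lock(&table_lock);
        n->ref_cnt--;                   /* unpin before advancing */
    }
    pthread_mutex_unlock(&table_lock);
    return total;
}
```

The open question in the thread is whether pinning the current node is enough, given that a deleter could still unlink nodes *after* it, which is why advancing the iterator only while holding the lock matters.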
>From 1f7912cbdf9a664a43ab64505d8dd0415984ca4e Mon Sep 17 00:00:00 2001

Re: HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Cyril Bonté

Le 21/08/2018 à 21:56, Willy Tarreau a écrit :

On Tue, Aug 21, 2018 at 07:00:39PM +0200, Cyril Bonté wrote:

Le 21/08/2018 à 18:36, Willy Tarreau a écrit :

Cyril,

if you want to test, please be sure to update to at least 1b13bfd as
we've just added another fix on top of this series.


OK, I was performing some tests, I've now updated with this patch.
It looks better, as I no longer see hangs, but some issues remain.


OK, FWIW I've finally addressed the remaining potential deadlock causes
that could happen between checks and other operations on a same server
around the rdv point. But your issues seem unrelated.


Ah great !


(...)

OK, that one is my fault, I used a "listen" section for use_backend (maybe
we'll have to guard against such configuration errors to prevent segfaults).


I don't see how it could be an issue, that's a completely supported
config! You can even have this :

   listen a
  use_backend b if { prefer_b }

   listen b
  use_backend a if { prefer_a }

So any reproducer would be interesting.


I could extract this minimal configuration to reproduce the issue:
  defaults
mode http

  listen proxy1
bind :9001
http-request set-header X-Forwarded-Proto http
http-request capture hdr(X-Forwarded-Proto) len 8

  listen proxy2
bind :9002
use_backend proxy1

then, curl http://localhost:9002/ will cause a segfault.


The good news: I kept a configuration with abns, but removed the proxy
protocol. Everything is working as expected. So, the proxy protocol seems
to play a main role in the buffer issue.


Thanks for the diag. I don't remember changing anything around the proxy
protocol, but it's possible that something subtle changed. Also it's not
on the regular send/receive path so maybe I overlooked certain parts and
broke it by accident when changing the buffers.

Same here, if you have a small reproducer it will really help.


I'm trying to find a configuration that could help identify the issue, but 
so far I've failed (timing seems to play a role). I'll let you know once I 
have a good reproducer.



--
Cyril Bonté



Need Clarification

2018-08-21 Thread Jordan Finsbel
Hello, my name is Jordan Finsbell and I'm interested in getting involved.



Re: HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Willy Tarreau
On Tue, Aug 21, 2018 at 07:00:39PM +0200, Cyril Bonté wrote:
> Le 21/08/2018 à 18:36, Willy Tarreau a écrit :
> > Cyril,
> > 
> > if you want to test, please be sure to update to at least 1b13bfd as
> > we've just added another fix on top of this series.
> 
> OK, I was performing some tests, I've now updated with this patch.
> It looks better, as I no longer see hangs, but some issues remain.

OK, FWIW I've finally addressed the remaining potential deadlock causes
that could happen between checks and other operations on a same server
around the rdv point. But your issues seem unrelated.

(...)
> OK, that one is my fault, I used a "listen" section for use_backend (maybe
> we'll have to guard against such configuration errors to prevent segfaults).

I don't see how it could be an issue, that's a completely supported
config! You can even have this :

  listen a
 use_backend b if { prefer_b }

  listen b
 use_backend a if { prefer_a }

So any reproducer would be interesting.

> The good news: I kept a configuration with abns, but removed the proxy
> protocol. Everything is working as expected. So, the proxy protocol seems
> to play a main role in the buffer issue.

Thanks for the diag. I don't remember changing anything around the proxy
protocol, but it's possible that something subtle changed. Also it's not
on the regular send/receive path so maybe I overlooked certain parts and
broke it by accident when changing the buffers.

Same here, if you have a small reproducer it will really help. I don't
promise to work on it tomorrow though, I think I figured some of the
stuff that needs to be done on the native HTTP part and I expected to
work on it today but granted myself a distraction by looking at scary
bugs. I must admit I was properly served :-)  But now I really need to
get back to the code to validate my ideas before they escape my mind.

Cheers,
Willy



Re: Need Clarification

2018-08-21 Thread Jonathan Matthews
On Tue, 21 Aug 2018 at 17:53, Jordan Finsbel 
wrote:

> Hello, my name is Jordan Finsbell and I'm interested in getting involved.


That's great! What areas are you interested in?

J
-- 
Jonathan Matthews
London, UK
http://www.jpluscplusm.com/contact.html


Re: HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Cyril Bonté

Le 21/08/2018 à 19:00, Cyril Bonté a écrit :

Le 21/08/2018 à 18:36, Willy Tarreau a écrit :

Cyril,

if you want to test, please be sure to update to at least 1b13bfd as
we've just added another fix on top of this series.


OK, I was performing some tests, I've now updated with this patch.
It looks better, as I no longer see hangs, but some issues 
remain. It seems that some data in the buffers is not processed correctly.

Some requests made from the Grafana dashboard fail to receive a response.
Note that in that test, I'm in the case where SSL is performed on a 
frontend, then sent to a backend through abns and send-proxy, the backend 
makes some captures, etc.


From time to time, it results in logs such as:
Aug 21 18:52:25 intense.local haproxy[5894]: unix:2 
[21/Aug/2018:18:52:25.169] http/ssl-offload-http: Received something 
which does not look like a PROXY protocol header


To exclude send-proxy from the equation, I've replaced the line
   server http abns@http send-proxy
with
   use_backend http

Sadly, it was worse, with a new segfault on header captures.
#0  0x563f9a94bee7 in http_action_req_capture (rule=, 
px=, sess=, s=, 
flags=) at src/proto_http.c:12268

12268   if (cap[h->index] == NULL)
(gdb) bt
#0  0x563f9a94bee7 in http_action_req_capture (rule=, 
px=, sess=, s=, 
flags=) at src/proto_http.c:12268
#1  0x563f9a95076b in http_req_get_intercept_rule 
(px=px@entry=0x563f9b34f440, rules=rules@entry=0x563f9b34f488, 
s=s@entry=0x563f9e49f3f0, deny_status=deny_status@entry=0x7ffd1da4e20c) 
at src/proto_http.c:2767
#2  0x563f9a956eec in http_process_req_common 
(s=s@entry=0x563f9e49f3f0, req=req@entry=0x563f9e49f400, 
an_bit=an_bit@entry=256, px=0x563f9b34f440) at src/proto_http.c:3494
#3  0x563f9a98ea3a in process_stream (t=, 
context=0x563f9e49f3f0, state=) at src/stream.c:1932

#4  0x563f9aa14978 in process_runnable_tasks () at src/task.c:381
#5  0x563f9a9c3371 in run_poll_loop () at src/haproxy.c:2386
#6  run_thread_poll_loop (data=) at src/haproxy.c:2451
#7  0x563f9a91d9de in main (argc=, 
argv=0x7ffd1da4e748) at src/haproxy.c:3053


OK, that one is my fault, I used a "listen" section for use_backend 
(maybe we'll have to guard against such configuration errors to prevent segfaults).


The good news: I kept a configuration with abns, but removed the proxy 
protocol. Everything is working as expected. So, the proxy protocol 
seems to play a main role in the buffer issue.



--
Cyril Bonté



Re: Observations about reloads and DNS SRV records

2018-08-21 Thread francis Lavalliere
This is related to the Discourse thread below (and I'll also discuss the
issues on the ML instead of Discourse).

https://discourse.haproxy.org/t/config-reload-with-dynamic-service-discovery-via-dns/2625/10


Here are the findings from my end:



1 - State file / Port range invalid conversion

In the HAProxy state file, the port is written as an unsigned long value,
i.e. port 32777 is actually written as 4294934537:

112 defaultback_failsaife 1 varnish1 10.100.20.78 2 0 1 1 203910 15 3 4 6 0
0 0 ip-10-100-20-78.node.aws-us-east-1.consul 4294934537 _tcp_.varnish
.service.consul


To allow the seamless reload, I had to comment out the following lines in
src/server.c:


if (port > USHRT_MAX) {
  chunk_appendf(msg, ", invalid srv_port value '%s'", port_str);
  port_str = NULL;
}
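For what it's worth, the observed value is consistent with a 16-bit sign-extension: 32777 (0x8009) is negative when read as a signed 16-bit integer, and sign-extending it to 32 bits yields 4294934537. A hypothetical C sketch of that conversion (the function names are invented; this is not HAProxy code):

```c
#include <stdint.h>

/* 32777 = 0x8009 is negative as int16_t (-32759); sign-extending it and
 * reinterpreting as unsigned 32-bit gives 4294967296 - 32759 = 4294934537,
 * matching the value seen in the state file. */
uint32_t sign_extended_port(uint16_t port)
{
    int16_t as_signed = (int16_t)port;   /* 32777 -> -32759 */
    return (uint32_t)(int32_t)as_signed; /* sign-extend, reinterpret */
}

/* Recovering the original port is a simple truncation back to 16 bits,
 * which is why rejecting every value above USHRT_MAX loses valid ports. */
uint16_t recovered_port(uint32_t stored)
{
    return (uint16_t)stored;
}
```

If this is indeed the cause, a cleaner fix than removing the range check would be to truncate (or fix the write side) rather than reject the value.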


2 - Backend responds with 503 only after reload

In order to make it functional, I had to comment out the following lines
in the patch:

/*
// prepare DNS resolution for this server (but hasn't this already been
// done by the server-template function?)
res = srv_prepare_for_resolution(srv, fqdn);
if (res == -1) {
ha_alert("could not allocate memory for DNS resolution for server …
'%s'\n", srv->id);
chunk_appendf(msg, ", can't allocate memory for DNS resolution for server
'%s'", srv->id);
HA_SPIN_UNLOCK(SERVER_LOCK, &srv->lock);
goto out;
}
*/


Re: HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Cyril Bonté

Le 21/08/2018 à 18:36, Willy Tarreau a écrit :

Cyril,

if you want to test, please be sure to update to at least 1b13bfd as
we've just added another fix on top of this series.


OK, I was performing some tests, I've now updated with this patch.
It looks better, as I no longer see hangs, but some issues 
remain. It seems that some data in the buffers is not processed correctly.

Some requests made from the Grafana dashboard fail to receive a response.
Note that in that test, I'm in the case where SSL is performed on a 
frontend, then sent to a backend through abns and send-proxy, the backend 
makes some captures, etc.


From time to time, it results in logs such as:
Aug 21 18:52:25 intense.local haproxy[5894]: unix:2 
[21/Aug/2018:18:52:25.169] http/ssl-offload-http: Received something 
which does not look like a PROXY protocol header


To exclude send-proxy from the equation, I've replaced the line
  server http abns@http send-proxy
with
  use_backend http

Sadly, it was worse, with a new segfault on header captures.
#0  0x563f9a94bee7 in http_action_req_capture (rule=, 
px=, sess=, s=, 
flags=) at src/proto_http.c:12268

12268   if (cap[h->index] == NULL)
(gdb) bt
#0  0x563f9a94bee7 in http_action_req_capture (rule=, 
px=, sess=, s=, 
flags=) at src/proto_http.c:12268
#1  0x563f9a95076b in http_req_get_intercept_rule 
(px=px@entry=0x563f9b34f440, rules=rules@entry=0x563f9b34f488, 
s=s@entry=0x563f9e49f3f0, deny_status=deny_status@entry=0x7ffd1da4e20c) 
at src/proto_http.c:2767
#2  0x563f9a956eec in http_process_req_common 
(s=s@entry=0x563f9e49f3f0, req=req@entry=0x563f9e49f400, 
an_bit=an_bit@entry=256, px=0x563f9b34f440) at src/proto_http.c:3494
#3  0x563f9a98ea3a in process_stream (t=, 
context=0x563f9e49f3f0, state=) at src/stream.c:1932

#4  0x563f9aa14978 in process_runnable_tasks () at src/task.c:381
#5  0x563f9a9c3371 in run_poll_loop () at src/haproxy.c:2386
#6  run_thread_poll_loop (data=) at src/haproxy.c:2451
#7  0x563f9a91d9de in main (argc=, 
argv=0x7ffd1da4e748) at src/haproxy.c:3053



--
Cyril Bonté



Need Clarification

2018-08-21 Thread Jordan Finsbel
Hello, my name is Jordan Finsbell and I'm interested in getting involved.



Re: HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Willy Tarreau
Cyril,

if you want to test, please be sure to update to at least 1b13bfd as
we've just added another fix on top of this series.

Cheers,
Willy



Re: haproxy-1.9-dev [0c026f49e]: 100% CPU when a server goes DOWN with option redispatch

2018-08-21 Thread Willy Tarreau
Hi Cyril,

I've just committed and pushed a fix for the double locking that happens
when calling pendconn_redistribute() from srv_update_status() and it fixes
your report (just checked). Actually I'm realising that I should have
credited you and Francis for the report since it's the same one :-(

However be careful, as for me there's still a concurrency issue on
lbprm.set_server_status_{up,down} and set_server_weight(). I'm now
checking how to address this.

Thanks!
Willy



Re: HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Willy Tarreau
On Tue, Aug 21, 2018 at 06:01:59PM +0200, Olivier Houchard wrote:
> Hi Cyril,
> 
> On Tue, Aug 21, 2018 at 12:51:13PM +0200, Cyril Bonté wrote:
> > Hi Willy,
> > 
> > I'm afraid there are still some issues with HTTP/2 in the current dev branch 
> > :-(
> > 
> > This morning, I've upgraded a test server and discovered that some HTTPS 
> > sites did not work anymore (content hangs and is not sent to the client), 
> > I've also noticed some segfaults in haproxy.
> > As this is a test server that I've used for several years with haproxy, the 
> > configuration begins to be quite ugly, it won't be helpful to provide it in 
> > its current state.
> > 
> 
> I think I figured those out, well at least I don't have hangs anymore, and
> I think I understood those segfaults.
> The attached patchset should do the trick.
> 
> Willy, those should be mergeable.

All merged, thank you!

Willy



Re: HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Olivier Houchard
Hi Cyril,

On Tue, Aug 21, 2018 at 12:51:13PM +0200, Cyril Bonté wrote:
> Hi Willy,
> 
> I'm afraid there are still some issues with HTTP/2 in the current dev branch :-(
> 
> This morning, I've upgraded a test server and discovered that some HTTPS 
> sites did not work anymore (content hangs and is not sent to the client), 
> I've also noticed some segfaults in haproxy.
> As this is a test server that I've used for several years with haproxy, the 
> configuration begins to be quite ugly, it won't be helpful to provide it in 
> its current state.
> 

I think I figured those out, well at least I don't have hangs anymore, and
I think I understood those segfaults.
The attached patchset should do the trick.

Willy, those should be mergeable.

Thanks !

Olivier
>From 159f4653cffdd26fb62a26d047ac3f87f7506e59 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 21 Aug 2018 14:25:52 +0200
Subject: [PATCH 1/5] BUG/MEDIUM: streams: Don't forget to remove the si from
 the wait list.

When freeing the stream, make sure we remove the stream interfaces from the
wait lists, in case it was in there.
---
 src/stream.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/stream.c b/src/stream.c
index bc0f1ac7..0f2ccf01 100644
--- a/src/stream.c
+++ b/src/stream.c
@@ -409,7 +409,9 @@ static void stream_free(struct stream *s)
session_free(sess);
 
tasklet_free(s->si[0].wait_list.task);
+   LIST_DEL(&s->si[0].wait_list.list);
tasklet_free(s->si[1].wait_list.task);
+   LIST_DEL(&s->si[1].wait_list.list);
pool_free(pool_head_stream, s);
 
/* We may want to free the maximum amount of pools if the proxy is 
stopping */
-- 
2.14.3

>From 3c93f5ad9f6dd14114c01217a11d56455c1fe62d Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Fri, 17 Aug 2018 18:57:51 +0200
Subject: [PATCH 2/5] BUG/MEDIUM: tasklets: Add the thread as active when
 waking a tasklet.

Set the flag for the current thread in active_threads_mask when waking a
tasklet, or we will never run it if no tasks are available.
---
 include/proto/task.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/proto/task.h b/include/proto/task.h
index 13398c53..eef4ded4 100644
--- a/include/proto/task.h
+++ b/include/proto/task.h
@@ -226,6 +226,7 @@ static inline void tasklet_wakeup(struct tasklet *tl)
return;
LIST_ADDQ(&task_list[tid], &tl->list);
task_list_size[tid]++;
+   HA_ATOMIC_OR(&active_tasks_mask, tid_bit);
HA_ATOMIC_ADD(&tasks_run_queue, 1);
 
 }
-- 
2.14.3

>From 722665b5f60b65ad0eb035b6dd2162062dcbf42f Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 21 Aug 2018 15:59:43 +0200
Subject: [PATCH 3/5] BUG/MEDIUM: Check if the conn_stream exist in
 si_cs_io_cb.

It is possible that the conn_stream gets detached from the stream_interface,
and as it subscribed to the wait list, si_cs_io_cb() gets called anyway,
so make sure we have a conn_stream before attempting to send more data.
---
 src/stream_interface.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/stream_interface.c b/src/stream_interface.c
index 4b5b760c..52aa7c43 100644
--- a/src/stream_interface.c
+++ b/src/stream_interface.c
@@ -760,8 +760,12 @@ wake_others:
 struct task *si_cs_io_cb(struct task *t, void *ctx, unsigned short state)
 {
struct stream_interface *si = ctx;
+   struct conn_stream *cs = objt_cs(si->end);
+
+   if (!cs)
+   return NULL;
if (!(si->wait_list.wait_reason & SUB_CAN_SEND))
-   si_cs_send(__objt_cs(si->end));
+   si_cs_send(cs);
return (NULL);
 }
 
-- 
2.14.3

>From bee74b66b702126dd7bd808d0f4037ba885441a2 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 21 Aug 2018 16:36:10 +0200
Subject: [PATCH 4/5] BUG/MEDIUM: H2: Activate polling after successful
 h2_snd_buf().

Make sure h2_send() is called after h2_snd_buf() by activating polling.
---
 src/mux_h2.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mux_h2.c b/src/mux_h2.c
index 4a3150a2..7824cfe4 100644
--- a/src/mux_h2.c
+++ b/src/mux_h2.c
@@ -3531,6 +3531,8 @@ static size_t h2_snd_buf(struct conn_stream *cs, struct 
buffer *buf, size_t coun
}
 
b_del(buf, total);
+   if (total > 0)
+   conn_xprt_want_send(h2s->h2c->conn);
return total;
 }
 
-- 
2.14.3

>From 64220c175bdc43fc82efc73c633be3b0212ab3e2 Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 21 Aug 2018 16:37:06 +0200
Subject: [PATCH 5/5] BUG/MEDIUM: stream_interface: Call the wake callback
 after sending.

If we subscribed to send, and the callback is called, call the wake callback
after, so that process_stream() may be woken up if needed.
---
 src/stream_interface.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/stream_interface.c b/src/stream_interface.c
index 52aa7c43..51f2300d 100644
--- a/src/stream_interface.c
+++ b/src/stream_interface.c
@@ -764,8 +764,10

Re: haproxy-1.9-dev [0c026f49e]: 100% CPU when a server goes DOWN with option redispatch

2018-08-21 Thread Willy Tarreau
On Tue, Aug 21, 2018 at 04:09:55PM +0200, Cyril Bonté wrote:
> Hi Willy,
> Here is another issue seen today with the current dev branch [tests were also 
> made after pulling recent commit 3bcc2699b].
> 
> Since 0c026f49e, when a server status is set to DOWN and option redispatch is 
> enabled, the haproxy process hits 100% CPU.
> Even more, with the latest commits, if haproxy is compiled with DEBUG_FULL, 
> it will simply segfault.
> 
> Here is the minimal configuration for the test:
> listen crash
> bind :9000
> option redispatch
> server non-existent 127.0.0.1: check

OK so this one is related to the first part of the problem that I spotted,
which is that pendconn_redistribute() takes the server lock, which is already
held when entering srv_update_status(). I'm currently studying the other
similar corner cases but it seems for now that it's the only one trying to
take the lock from the callees we have there, so I'll add an unlocked
version.

However I'm more concerned by the calls to lb.set_server_{up,down} that
definitely do not expect to be called concurrently. It looks like at least
for the roundrobin algo it supports a lock that we should use there, but I
have to study the other ones as well.

The rendez-vous point was a much bigger carpet than I imagined it seems...

Willy



Re: haproxy-1.9-dev [0c026f49e]: 100% CPU when a server goes DOWN with option redispatch

2018-08-21 Thread Willy Tarreau
Hi Cyril,

On Tue, Aug 21, 2018 at 04:13:38PM +0200, Cyril Bonté wrote:
> > De: "Cyril Bonté" 
> > À: "Willy Tarreau" 
> > Cc: "HAProxy" 
> > Envoyé: Mardi 21 Août 2018 16:09:55
> > Objet: haproxy-1.9-dev [0c026f49e]: 100% CPU when a server goes DOWN with 
> > option redispatch
> > 
> > Hi Willy,
> > Here is another issue seen today with the current dev branch [tests
> > were also made after pulling recent commit 3bcc2699b].
> > 
> > Since 0c026f49e, when a server status is set to DOWN and option
> > redispatch is enabled, the haproxy process hits 100% CPU.
> > Even more, with the latest commits, if haproxy is compiled with
> > DEBUG_FULL, it will simply segfault.
> 
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x7703f2f1 in __GI_abort () at abort.c:79
> #2  0x5558040f in __spin_lock (line=, file= out>, func=, l=, lbl=) at 
> include/common/hathreads.h:725
> #3  pendconn_redistribute (s=0x55786d80) at src/queue.c:411
> #4  0x555e7842 in srv_update_status () at src/server.c:4680
> #5  0x555e89a1 in srv_set_stopped.part () at src/server.c:966
> #6  0x555e8bd1 in srv_set_stopped (s=, 
> reason=, check=) at src/server.c:948
> #7  0x55639358 in process_chk_conn (state=, 
> context=0x557871d0, t=0x55783290) at src/checks.c:2265
> #8  process_chk () at src/checks.c:2304
> #9  0x556a2293 in process_runnable_tasks () at src/task.c:384
> #10 0x556408b9 in run_poll_loop () at src/haproxy.c:2386
> #11 run_thread_poll_loop () at src/haproxy.c:2451
> #12 0x55581e4b in main () at src/haproxy.c:3053
> #13 0x7702ab17 in __libc_start_main (main=0x55580e50 , 
> argc=3, argv=0x7fffdfa8, init=, fini=, 
> rtld_fini=, stack_end=0x7fffdf98) at 
> ../csu/libc-start.c:310
> #14 0x55583d0a in _start ()

Thank you for this trace. I'm currently debugging a very long chain of
insufficient locking in the server check code. Most operations from the
CLI simply take no lock, some status update functions assume they are
locked while they can be called from various places. This remained
partially hidden by the asynchronous mode called from the rendez-vous
point, but now that they are called synchronously everything breaks.

I have good hopes to sort this out, by starting to place comments at
the top of the functions to detail their expectations and to detect
all the wrong assumptions. If I can't sort it out like this, I can
easily revert the patch that made everything synchronous and decide
to go back to the asynchronous mode and hide all the dust under the
carpet. But as you know I really hate doing this so I prefer to take
it face to face for now.

Cheers,
Willy



Clarification re Timeouts and Session State in the Logs

2018-08-21 Thread Daniel Schneller
Hi!

I am trying to wrap my head around an issue we are seeing where there are many 
HTTP 504 responses sent out to clients.

I suspect that, due to a client bug, the clients stop sending data midway 
through the data phase of the request, but keep the connection open.

What I see in the haproxy logs is a 504 response with termination flags "sHNN".
I read that as haproxy getting impatient (timeout server) while waiting for 
response headers from the backend. The backend, not having seen the complete 
request yet, can't really answer at this point, of course.
I am wondering, though, why I don't see a termination state indicating a 
client problem.

So my question (for now ;-)) boils down to these points:

1) When does the server timeout actually start counting? Am I right to assume 
it is from the last moment the server sent or (in this case) received some data?

2) If both "timeout server" and "timeout client" are set to the same value, and 
the input stalls (after the headers) longer than that, is it just that the 
implementation is such that the server side timeout "wins" when it comes to 
setting the termination flags?

3) If I set the client timeout shorter than the server timeout and produced 
this situation, should I then see a cD state?  If so, would I be right to 
assume that if the server were now to stall, the log could again be misleading 
in telling me that the client timeout expired first?

I understand it is difficult to tell "who's to blame" for an inactivity timeout 
without knowledge of the content or final size of the request -- I just need 
some clarity on how to read the logs :)


Thanks!
Daniel




--
Daniel Schneller
Principal Cloud Engineer

CenterDevice GmbH
Rheinwerkallee 3
53227 Bonn
www.centerdevice.com







Re: haproxy-1.9-dev [0c026f49e]: 100% CPU when a server goes DOWN with option redispatch

2018-08-21 Thread Cyril Bonté
> De: "Cyril Bonté" 
> À: "Willy Tarreau" 
> Cc: "HAProxy" 
> Envoyé: Mardi 21 Août 2018 16:09:55
> Objet: haproxy-1.9-dev [0c026f49e]: 100% CPU when a server goes DOWN with 
> option redispatch
> 
> Hi Willy,
> Here is another issue seen today with the current dev branch [tests
> were also made after pulling recent commit 3bcc2699b].
> 
> Since 0c026f49e, when a server status is set to DOWN and option
> redispatch is enabled, the haproxy process hits 100% CPU.
> Even more, with the latest commits, if haproxy is compiled with
> DEBUG_FULL, it will simply segfault.

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7703f2f1 in __GI_abort () at abort.c:79
#2  0x5558040f in __spin_lock (line=, file=, func=, l=, lbl=) at 
include/common/hathreads.h:725
#3  pendconn_redistribute (s=0x55786d80) at src/queue.c:411
#4  0x555e7842 in srv_update_status () at src/server.c:4680
#5  0x555e89a1 in srv_set_stopped.part () at src/server.c:966
#6  0x555e8bd1 in srv_set_stopped (s=, reason=, check=) at src/server.c:948
#7  0x55639358 in process_chk_conn (state=, 
context=0x557871d0, t=0x55783290) at src/checks.c:2265
#8  process_chk () at src/checks.c:2304
#9  0x556a2293 in process_runnable_tasks () at src/task.c:384
#10 0x556408b9 in run_poll_loop () at src/haproxy.c:2386
#11 run_thread_poll_loop () at src/haproxy.c:2451
#12 0x55581e4b in main () at src/haproxy.c:3053
#13 0x7702ab17 in __libc_start_main (main=0x55580e50 , 
argc=3, argv=0x7fffdfa8, init=, fini=, 
rtld_fini=, stack_end=0x7fffdf98) at ../csu/libc-start.c:310
#14 0x55583d0a in _start ()


> Here is the minimal configuration for the test:
> listen crash
> bind :9000
> option redispatch
> server non-existent 127.0.0.1: check
> 
> I'll look at it later if nobody has fixed it in the mean time ;)
> Cyril
> 
> 



haproxy-1.9-dev [0c026f49e]: 100% CPU when a server goes DOWN with option redispatch

2018-08-21 Thread Cyril Bonté
Hi Willy,
Here is another issue seen today with the current dev branch [tests were also 
made after pulling recent commit 3bcc2699b].

Since 0c026f49e, when a server status is set to DOWN and option redispatch is 
enabled, the haproxy process hits 100% CPU.
Moreover, with the latest commits, if haproxy is compiled with DEBUG_FULL, it 
will simply segfault.

Here is the minimal configuration for the test:
listen crash
bind :9000
option redispatch
server non-existent 127.0.0.1: check

I'll look at it later if nobody has fixed it in the meantime ;)
Cyril



Re: Fwd: [haproxy/haproxy] MAJOR: server: make server state changes synchronous again (3ff577e)

2018-08-21 Thread Willy Tarreau
Hi Tim,

On Tue, Aug 21, 2018 at 12:51:05PM +0200, Tim Düsterhus wrote:
> Hi list
> 
> the following comment has been posted to GitHub on commit
> 3ff577e165486de87d1c0923e84f72cc7ad790c0
> "MAJOR: server: make server state changes synchronous again"
> 
> https://github.com/haproxy/haproxy/commit/3ff577e165486de87d1c0923e84f72cc7ad790c0#commitcomment-30217983

Yep, thanks. This message was also posted on Discourse here :

   https://discourse.haproxy.org/t/deadlock-in-one-of-the-recent-commits/2881/3

Where I started to respond. There's a big problem with locking around
server states unfortunately, which I need to take care of because 1.8
is also affected. I suspect that once completely solved we may not even
need to use the rendez-vous point anymore, but it's a bit early to tell.

Cheers,
Willy



Fwd: [haproxy/haproxy] MAJOR: server: make server state changes synchronous again (3ff577e)

2018-08-21 Thread Tim Düsterhus
Hi list

the following comment has been posted to GitHub on commit
3ff577e165486de87d1c0923e84f72cc7ad790c0
"MAJOR: server: make server state changes synchronous again"

https://github.com/haproxy/haproxy/commit/3ff577e165486de87d1c0923e84f72cc7ad790c0#commitcomment-30217983

I'll make the author aware of the list once my email lands in the mail
archive.

Best regards
Tim Düsterhus



HTTP/2 issues and segfaults with current 1.9-dev [7ee465]

2018-08-21 Thread Cyril Bonté
Hi Willy,

I'm afraid there are still some issues with HTTP/2 in the current dev branch :-(

This morning, I've upgraded a test server and discovered that some HTTPS sites 
did not work anymore (content hangs and is not sent to the client), I've also 
noticed some segfaults in haproxy.
As this is a test server that I've used for several years with haproxy, the 
configuration has become quite ugly, so it won't be helpful to provide it in 
its current state.

Here is a backtrace of a recent segfault:
#0  si_cs_send (cs=0x0) at src/stream_interface.c:648
#1  0x557260d4c6cd in si_cs_io_cb (t=, ctx=, 
state=) at src/stream_interface.c:764
#2  0x557260d7d237 in process_runnable_tasks () at src/task.c:384
#3  0x557260d2bf61 in run_poll_loop () at src/haproxy.c:2386
#4  run_thread_poll_loop (data=) at src/haproxy.c:2451
#5  0x557260c869de in main (argc=, argv=0x7fff3770b8d8) at 
src/haproxy.c:3053

I found that it was easy to reproduce with a grafana server behind haproxy 
(loading css/js resources seems to hang).
It seems the issues began with commit d54a8ceb9 "MAJOR: start to change buffer 
API".

Here is an example of a configuration which allows reproducing the hanging 
issue (I could not reproduce the segfault with that one):
defaults http
mode http
timeout connect 5s
timeout client  300s
timeout server  300s
timeout http-request 10s
timeout http-keep-alive 15s

listen http
bind :4080 name http  # OK
bind :4443 ssl crt localhost.pem ssl no-sslv3 alpn h2,http/1.1  # FAIL
bind :6443 ssl crt localhost.pem ssl no-sslv3  # OK

bind abns@http accept-proxy

server grafana 127.0.0.1:3000

listen https
bind :8443 ssl crt localhost.pem ssl no-sslv3 alpn h2,http/1.1  # FAIL

http-reuse never
server http abns@http send-proxy

From the browser, requesting http://localhost:4080/ or 
https://localhost:6443/, it will work.
But once HTTP/2 is used, it hangs: https://localhost:4443/ and 
https://localhost:8443/

Some details:
# haproxy -vv
HA-Proxy version 1.9-dev1-7ee465-56 2018/08/19
Copyright 2000-2018 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv 
-fno-strict-overflow -Wno-null-dereference -Wno-unused-label
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.0f  25 May 2017
Running on OpenSSL version : OpenSSL 1.1.0f  25 May 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.3
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT 
IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), 
raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTTP   side=FE
       <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace

I'll try to investigate more tonight,
Cyril



Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)

2018-08-21 Thread Olivier Houchard
Hi Pieter,

On Mon, Aug 20, 2018 at 09:33:49PM +0200, PiBa-NL wrote:
> Hi Olivier,
> 
> Op 17-8-2018 om 14:51 schreef Willy Tarreau:
> > On Fri, Aug 17, 2018 at 01:41:54PM +0200, Olivier Houchard wrote:
> > > That is true, this one is not a bug, but a pessimization, by using the 
> > > global
> > > update_list which is more costly than the local one.
> > > 
> > > Patches attached to do as suggested.
> > Applied, thank you!
> > willy
> 
> Just a little update :)
> 
> The '1.9-dev0-48d92ee 2018/07/30'+your initial 2 patches, is running
> correctly using as little cpu as can be expected for its little workload for
> 4+ days now. I think I can call it 'fix confirmed', as you already knew ;) ,
> previously the issue would have likely returned in this time period..
> 
> I'll keep it running for a few more days, and switch back to nbthread 3
> then.. Till next time ;)
> 
> Thanks again!
> 
> Best regards,
> 
> PiBa-NL (Pieter)
> 

Thanks for the feedback :)
Let us know how it goes with 3 threads !

Regards,

Olivier



Re: [PATCH] MEDIUM: lua: Add stick table support for Lua

2018-08-21 Thread Thierry Fournier
Hi Adis,

Thanks for this patch, it is a very useful class.

Some remarks about the documentation and the formats:


js:function:: StickTable.info()
js:function:: StickTable.lookup(key)

Maybe the specification and an example of the returned values
would be welcome, because I guess that the table keys are hardcoded.
Something like:

   {
  type =  -- list of given string,
  length = ,
  ...
   }

or

   {
   expire =  -- Specify the unit (ms or ?)
   }

js:function:: StickTable.dump([filter])

The exact list of allowed operators will help the user to use
your class. It seems that string or regex operators are not
allowed; only the integer operators are taken into account. This
list is "eq", "ne", "le", "lt", "ge", "gt".

Same remark for the allowed data types. Maybe a link to the HAProxy
documentation will be sufficient for the data types.

I see in the code that the filters cannot exceed 4 entries. This
limitation must be written in the doc.

I also miss the relation between operators and between the contents
of an operator. I mean AND or OR. Here is how I understand your example:

+local filter = {
+  lt={{"gpc0", 1}, {"gpc1", 2}},
+  gt={{"conn_rate", 3}},
+  eq={{"conn_cur", 4}}
+}

( gpc0 < 1 AND gpc1 < 2 ) OR (conn_rate > 3) OR (conn_cur = 4)
   or
( gpc0 < 1 OR gpc1 < 2 ) OR (conn_rate > 3) OR (conn_cur = 4)
   or 
( gpc0 < 1 AND gpc1 < 2 ) AND (conn_rate > 3) AND (conn_cur = 4)
   or
( gpc0 < 1 OR gpc1 < 2 ) AND (conn_rate > 3) AND (conn_cur = 4)

Are you sure that the syntax =
is a good format? Maybe something like the following, with the operator
as an argument between the two operands; lines are implicitly OR'ed, and
columns are AND'ed:

+local filter = {
+  {{"gpc0", "lt", 1}, {"gpc1", "lt", 2}},
+  {{"conn_rate", "gt", 3}},
+  {{"conn_cur", "eq", 4}}
+}

It is read:

   ( gpc0 < 1 OR gpc1 < 2 ) AND (conn_rate > 3) AND (conn_cur = 4)

You could also implement a parser for natural language, in order to understand
the previous string directly. This algorithm is very simple to implement:

   https://en.wikipedia.org/wiki/Shunting-yard_algorithm

Idea of an extension for the future: maybe it would be safer to compile the
stick table filter during the initialisation of the Lua code, to avoid
runtime errors?


Another point about the doc of this function: you must specify that its
execution can take a very long time for millions of entries, even if a filter
is specified, because the stick table is entirely scanned.


Some remarks about the code and the logs:
-----------------------------------------

Line 182 of the patch contains spaces in place of tabs.

Line 274 of your patch, I don't see any HA_SPIN_LOCK(STK_TABLE_LOCK
I don't know the threading code very well, so maybe they are useless, maybe not.

Line 311: I see the decrement of the refcount without an ATOMIC function
and without a lock. Once again, I don't know the threading code very well, but
I raise a warning. Maybe locks are useless, maybe not.

Line 286 of your patch: it seems that smp.flags is used uninitialized.
Maybe you should apply this change:

   -smp.flags |= SMP_F_CONST;
   +smp.flags = SMP_F_CONST;

l.365, 369: The user doesn't have any context about the error. Is it the
first entry of the table, the second? Which operator doesn't exist?

L.380, 384: Which line is wrong?

L.431: You release the lock, so the next element relative to the current
"n" can disappear, and ebmb_next() can return wrong memory.

That's all.
Thierry





> On 20 Aug 2018, at 15:15, Adis Nezirovic  wrote:
> 
> On Mon, Aug 20, 2018 at 02:11:13PM +0200, Adis Nezirovic wrote:
>> Hi guys,
>> 
>> I've attached a patch to add stick table support to Lua. Operations are
>> mostly similar to "show table" functionality from admin socket, namely:
>> 
>> - Show basic table info
>> - Key lookup
>> - Table dump, optionally using data/column filter
>> 
>> One side note, the code provides support for multiple filters
>> (4 by default) while the CLI doesn't support that, nor does it complain about
>> multiple "  " clauses.
>> 
>> Also, if this patch is accepted, maybe we can use provided helper
>> functions in other places in the code.
>> 
> 
> It's always funny to reply to oneself; right after sending the email to the
> public list I spotted a bug. New patch attached.
> 
> SMP_T_SINT should be treated as an ordinary signed integer, and should use
> lua_pushinteger() on it?
> 
> On many places in the code it is noted as "64" bit integer, but in
> stick_table.c it is defined as 32bit integer:
> 
> struct stktable_type stktable_types[SMP_TYPES] = {
>   [SMP_T_SINT] = { "integer", 0, 4 },
> 
> 
> Best regards,
> Adis
> <0001-MEDIUM-lua-Add-stick-table-support-for-Lua-read-only.patch>




Re: [PATCH] MEDIUM: lua: Add stick table support for Lua

2018-08-21 Thread Thierry Fournier


> On 20 Aug 2018, at 15:15, Adis Nezirovic  wrote:
> 
> On Mon, Aug 20, 2018 at 02:11:13PM +0200, Adis Nezirovic wrote:
>> Hi guys,
>> 
>> I've attached a patch to add stick table support to Lua. Operations are
>> mostly similar to "show table" functionality from admin socket, namely:
>> 
>> - Show basic table info
>> - Key lookup
>> - Table dump, optionally using data/column filter
>> 
>> One side note, the code provides support for multiple filters
>> (4 by default) while the CLI doesn't support that, nor does it complain about
>> multiple "  " clauses.


Hi Adis,

This is a great feature. I'll look at this ASAP.


>> Also, if this patch is accepted, maybe we can use provided helper
>> functions in other places in the code.
>> 
> 
> It's always funny to reply to oneself; right after sending the email to the
> public list I spotted a bug. New patch attached.
> 
> SMP_T_SINT should be treated as an ordinary signed integer, and should use
> lua_pushinteger() on it?


There is a function which converts haproxy types into Lua types: hlua_arg2lua(),
and SINT is converted with lua_pushinteger(). I guess that stick table SINTs
are handled the same way.


> On many places in the code it is noted as "64" bit integer, but in
> stick_table.c it is defined as 32bit integer:
> 
> struct stktable_type stktable_types[SMP_TYPES] = {
>   [SMP_T_SINT] = { "integer", 0, 4 },


The lua_pushinteger function takes a lua_Integer, which is defined as a known C
type (int, long or long long). So, when lua_pushinteger() is called, the
compiler performs a conversion between the type passed as an argument and the
expected type. So I'm confident in the correct behavior.

br,
Thierry


> Best regards,
> Adis
> <0001-MEDIUM-lua-Add-stick-table-support-for-Lua-read-only.patch>