Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Confirmed on my side as well. No segfault, and no spinning CPU with the latest patch. Thanks! Steve Ruiz Manager - Hosting Operations Mirth ste...@mirth.com On Fri, Jan 17, 2014 at 10:25 AM, Cyril Bonté wrote: > On 17/01/2014 11:14, Willy Tarreau wrote: > > On Fri, Jan 17, 2014 at 11:03:51AM +0100, Willy Tarreau wrote: >> >>> On Fri, Jan 17, 2014 at 10:47:01AM +0100, Willy Tarreau wrote: >>> So I might have broken something in the way the "try" value is counted, ending up with zero being selected and nothing done. Unfortunately it works fine here. >>> >>> OK I can reproduce it in 32-bit now. Let's see what happens... >>> >> >> OK here's the fix. I'm ashamed for not having noticed this mistake during >> the change. I ported the raw_sock changes to ssl_sock, it was pretty >> straightforward but I missed the condition in the while () loop. And >> unfortunately, the variable happened to be non-zero on the stack, >> resulting in something working well for me :-/ >> >> I've pushed the fix. >> > > Great! I didn't have time to try to fix it yesterday. > Everything is working well now, we can definitely close this bug, that's a > good thing ;-) > > -- > Cyril Bonté > -- CONFIDENTIALITY NOTICE: The information contained in this electronic transmission may be confidential. If you are not an intended recipient, be aware that any disclosure, copying, distribution or use of the information contained in this transmission is prohibited and may be unlawful. If you have received this transmission in error, please notify us by email reply and then erase it from your computer system.
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
On 17/01/2014 11:14, Willy Tarreau wrote: On Fri, Jan 17, 2014 at 11:03:51AM +0100, Willy Tarreau wrote: On Fri, Jan 17, 2014 at 10:47:01AM +0100, Willy Tarreau wrote: So I might have broken something in the way the "try" value is counted, ending up with zero being selected and nothing done. Unfortunately it works fine here. OK I can reproduce it in 32-bit now. Let's see what happens... OK here's the fix. I'm ashamed for not having noticed this mistake during the change. I ported the raw_sock changes to ssl_sock, it was pretty straightforward but I missed the condition in the while () loop. And unfortunately, the variable happened to be non-zero on the stack, resulting in something working well for me :-/ I've pushed the fix. Great! I didn't have time to try to fix it yesterday. Everything is working well now, we can definitely close this bug, that's a good thing ;-) -- Cyril Bonté
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
On Fri, Jan 17, 2014 at 11:03:51AM +0100, Willy Tarreau wrote: > On Fri, Jan 17, 2014 at 10:47:01AM +0100, Willy Tarreau wrote: > > So I might have broken something in the way the "try" value is counted, > > ending up with zero being selected and nothing done. Unfortunately it > > works fine here. > > OK I can reproduce it in 32-bit now. Let's see what happens... OK here's the fix. I'm ashamed for not having noticed this mistake during the change. I ported the raw_sock changes to ssl_sock, it was pretty straightforward but I missed the condition in the while () loop. And unfortunately, the variable happened to be non-zero on the stack, resulting in something working well for me :-/ I've pushed the fix. Thanks guys. Willy From 00b0fb9349b8842a5ec2cee9dc4f286c8d3a3685 Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Fri, 17 Jan 2014 11:09:40 +0100 Subject: BUG/MAJOR: ssl: fix breakage caused by recent fix abf08d9 MIME-Version: 1.0 Content-Type: text/plain; charset=latin1 Content-Transfer-Encoding: 8bit Recent commit abf08d9 ("BUG/MAJOR: connection: fix mismatch between rcv_buf's API and usage") accidentally broke SSL by relying on an uninitialized value to enter the read loop. Many thanks to Cyril Bonté and Steve Ruiz for reporting this issue. --- src/ssl_sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/ssl_sock.c b/src/ssl_sock.c index 7120ff8..87a2a58 100644 --- a/src/ssl_sock.c +++ b/src/ssl_sock.c @@ -1353,7 +1353,7 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int coun * in which case we accept to do it once again. A new attempt is made on * EINTR too. */ - while (try) { + while (count > 0) { /* first check if we have some room after p+i */ try = buf->data + buf->size - (buf->p + buf->i); /* otherwise continue between data and p-o */ -- 1.7.12.1
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
On Fri, Jan 17, 2014 at 10:47:01AM +0100, Willy Tarreau wrote: > So I might have broken something in the way the "try" value is counted, > ending up with zero being selected and nothing done. Unfortunately it > works fine here. OK I can reproduce it in 32-bit now. Let's see what happens... Willy
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Hi Cyril, On Thu, Jan 16, 2014 at 10:48:10PM +0100, Cyril Bonté wrote: > Hi Willy, > > On 15/01/2014 01:08, Willy Tarreau wrote: > >On Tue, Jan 14, 2014 at 12:25:37PM -0800, Steve Ruiz wrote: > >>Patched and confirmed in our environment that this is now working / seems > >>to have fixed the issue. Thanks! > > > >Great, many thanks to you both guys. We've got rid of another pretty > >old bug, these are the ones that make me the happiest once fixed! > > > >I'm currently unpacking my laptop to push the fix so that it appears > >in today's snapshot. > > > >Excellent work! > > I fear there is some more work to do on this patch. > I made some tests on ssl and it looks broken since this commit :-( > > The shortest configuration I could find to reproduce the issue is: > listen test > bind 0.0.0.0:443 ssl crt cert.pem > mode http > timeout server 5s > timeout client 5s > > When a request is received by haproxy, the CPU rises to 100% in an > epoll_wait loop (timeouts are here to prevent an unlimited loop). > > $ curl -k https://localhost/ > ... > epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 > epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 > epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 > epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 > epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 > ... So I might have broken something in the way the "try" value is counted, ending up with zero being selected and nothing done. Unfortunately it works fine here. Could you try to single-step in gdb through ssl_sock_to_buf? I'll continue to test if I can reproduce it and also to understand in which case we could end up with a wrong size computation. Thanks, Willy
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Cyril is correct - I simply waited for a segfault, but didn't actually test through the load balancer. I'm using SSL on haproxy, and yes, when I try to hit a web page behind haproxy, CPU spins at 100% for a good while. Steve Ruiz Manager - Hosting Operations Mirth ste...@mirth.com On Thu, Jan 16, 2014 at 1:48 PM, Cyril Bonté wrote: > Hi Willy, > > On 15/01/2014 01:08, Willy Tarreau wrote: > > On Tue, Jan 14, 2014 at 12:25:37PM -0800, Steve Ruiz wrote: >> >>> Patched and confirmed in our environment that this is now working / seems >>> to have fixed the issue. Thanks! >>> >> >> Great, many thanks to you both guys. We've got rid of another pretty >> old bug, these are the ones that make me the happiest once fixed! >> >> I'm currently unpacking my laptop to push the fix so that it appears >> in today's snapshot. >> >> Excellent work! >> > > I fear there is some more work to do on this patch. > I made some tests on ssl and it looks broken since this commit :-( > > The shortest configuration I could find to reproduce the issue is: > listen test > bind 0.0.0.0:443 ssl crt cert.pem > mode http > timeout server 5s > timeout client 5s > > When a request is received by haproxy, the CPU rises to 100% in an > epoll_wait loop (timeouts are here to prevent an unlimited loop). > > $ curl -k https://localhost/ > ... > epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 > epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 > epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 > epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 > epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 > ... > The same issue occurs when a server is declared. > > The same also occurs when the proxy is in clear http and a server is in > https: > listen test > bind 0.0.0.0:80 > mode http > timeout server 5s > timeout client 5s > server ssl_backend 127.0.0.1:443 ssl > > $ curl http://localhost/ > ...
> epoll_wait(3, {}, 200, 0) = 0 > epoll_wait(3, {}, 200, 0) = 0 > epoll_wait(3, {}, 200, 0) = 0 > epoll_wait(3, {}, 200, 0) = 0 > epoll_wait(3, {}, 200, 0) = 0 > ... > > > -- > Cyril Bonté
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Hi Willy, On 15/01/2014 01:08, Willy Tarreau wrote: On Tue, Jan 14, 2014 at 12:25:37PM -0800, Steve Ruiz wrote: Patched and confirmed in our environment that this is now working / seems to have fixed the issue. Thanks! Great, many thanks to you both guys. We've got rid of another pretty old bug, these are the ones that make me the happiest once fixed! I'm currently unpacking my laptop to push the fix so that it appears in today's snapshot. Excellent work! I fear there is some more work to do on this patch. I made some tests on ssl and it looks broken since this commit :-( The shortest configuration I could find to reproduce the issue is: listen test bind 0.0.0.0:443 ssl crt cert.pem mode http timeout server 5s timeout client 5s When a request is received by haproxy, the CPU rises to 100% in an epoll_wait loop (timeouts are here to prevent an unlimited loop). $ curl -k https://localhost/ ... epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1 ... The same issue occurs when a server is declared. The same also occurs when the proxy is in clear http and a server is in https: listen test bind 0.0.0.0:80 mode http timeout server 5s timeout client 5s server ssl_backend 127.0.0.1:443 ssl $ curl http://localhost/ ... epoll_wait(3, {}, 200, 0) = 0 epoll_wait(3, {}, 200, 0) = 0 epoll_wait(3, {}, 200, 0) = 0 epoll_wait(3, {}, 200, 0) = 0 epoll_wait(3, {}, 200, 0) = 0 ... -- Cyril Bonté
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
On Tue, Jan 14, 2014 at 12:25:37PM -0800, Steve Ruiz wrote: > Patched and confirmed in our environment that this is now working / seems > to have fixed the issue. Thanks! Great, many thanks to you both guys. We've got rid of another pretty old bug, these are the ones that make me the happiest once fixed! I'm currently unpacking my laptop to push the fix so that it appears in today's snapshot. Excellent work! Willy
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Patched and confirmed in our environment that this is now working / seems to have fixed the issue. Thanks! Steve Ruiz On Tue, Jan 14, 2014 at 3:22 AM, Willy Tarreau wrote: > OK here's a proposed fix which addresses the API issue for both > raw_sock and ssl_sock. > > Steve, it would be nice if you could give it a try just to confirm > I didn't miss anything. > > Thanks, > Willy >
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Willy, have you validated this version in our lab as well? Baptiste On 14 Jan 2014 19:21, "Cyril Bonté" wrote: > Hi again Willy, > > On 14/01/2014 12:22, Willy Tarreau wrote: > >> OK here's a proposed fix which addresses the API issue for both >> raw_sock and ssl_sock. >> >> Steve, it would be nice if you could give it a try just to confirm >> I didn't miss anything. >> > > OK, from my side, now I'm on the laptop where I can reproduce the > segfault, I confirm it doesn't crash anymore once the patch is applied > (which was predictable from the quick test I made this afternoon). > > Let's see if it's OK for Steve too ;-) > > -- > Cyril Bonté > >
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Hi again Willy, On 14/01/2014 12:22, Willy Tarreau wrote: OK here's a proposed fix which addresses the API issue for both raw_sock and ssl_sock. Steve, it would be nice if you could give it a try just to confirm I didn't miss anything. OK, from my side, now I'm on the laptop where I can reproduce the segfault, I confirm it doesn't crash anymore once the patch is applied (which was predictable from the quick test I made this afternoon). Let's see if it's OK for Steve too ;-) -- Cyril Bonté
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
OK here's a proposed fix which addresses the API issue for both raw_sock and ssl_sock. Steve, it would be nice if you could give it a try just to confirm I didn't miss anything. Thanks, Willy From 3e499a6da1ca070f23083c874aa48895f00d0d6f Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Tue, 14 Jan 2014 11:31:27 +0100 Subject: BUG/MAJOR: connection: fix mismatch between rcv_buf's API and usage MIME-Version: 1.0 Content-Type: text/plain; charset=latin1 Content-Transfer-Encoding: 8bit Steve Ruiz reported some reproducible crashes with HTTP health checks on a certain page returning a huge length. The traces he provided clearly showed that the recv() call was performed twice for a total size exceeding the buffer's length. Cyril Bonté tracked down the problem to be caused by the full buffer size being passed to rcv_buf() in event_srv_chk_r() instead of passing just the remaining amount of space. Indeed, this change happened during the connection rework in 1.5-dev13 with the following commit: f150317 MAJOR: checks: completely use the connection transport layer But one of the problems is also that the comments at the top of the rcv_buf() functions suggest that the caller only has to ensure the requested size doesn't overflow the buffer's size. Also, these functions already have to care about the buffer's size to handle wrapping free space when there are pending data in the buffer. So let's change the API instead to more closely match what could be expected from these functions: - the caller asks for the maximum amount of bytes it wants to read; this means that only the caller is responsible for enforcing the reserve if it wants to (eg: checks don't). - the rcv_buf() functions fix their computations to always consider this size as a max, and always perform validity checks based on the buffer's free space. As a result, the code is simplified and reduced, and made more robust for callers which now just have to care about whether they want the buffer to be filled or not. 
Since the bug was introduced in 1.5-dev13, no backport to stable versions is needed. --- src/checks.c | 2 +- src/raw_sock.c | 31 --- src/ssl_sock.c | 29 +++-- 3 files changed, 32 insertions(+), 30 deletions(-) diff --git a/src/checks.c b/src/checks.c index 3237304..2274136 100644 --- a/src/checks.c +++ b/src/checks.c @@ -2065,7 +2065,7 @@ static void tcpcheck_main(struct connection *conn) goto out_end_tcpcheck; if ((conn->flags & CO_FL_WAIT_RD) || - conn->xprt->rcv_buf(conn, check->bi, buffer_total_space(check->bi)) <= 0) { + conn->xprt->rcv_buf(conn, check->bi, check->bi->size) <= 0) { if (conn->flags & (CO_FL_ERROR | CO_FL_SOCK_RD_SH | CO_FL_DATA_RD_SH)) { done = 1; if ((conn->flags & CO_FL_ERROR) && !check->bi->i) { diff --git a/src/raw_sock.c b/src/raw_sock.c index 4dc1c7a..2e3a0cb 100644 --- a/src/raw_sock.c +++ b/src/raw_sock.c @@ -226,8 +226,7 @@ int raw_sock_from_pipe(struct connection *conn, struct pipe *pipe) /* Receive up to <count> bytes from connection <conn>'s socket and store them - * into buffer <buf>. The caller must ensure that <count> is always smaller - * than the buffer's size. Only one call to recv() is performed, unless the + * into buffer <buf>. Only one call to recv() is performed, unless the * buffer wraps, in which case a second call may be performed. The connection's * flags are updated with whatever special event is detected (error, read0, * empty). The caller is responsible for taking care of those events and @@ -239,7 +238,7 @@ int raw_sock_from_pipe(struct connection *conn, struct pipe *pipe) static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int count) { int ret, done = 0; - int try = count; + int try; if (!(conn->flags & CO_FL_CTRL_READY)) return 0; @@ -258,24 +257,27 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int coun } } - /* compute the maximum block size we can read at once. 
*/ - if (buffer_empty(buf)) { - /* let's realign the buffer to optimize I/O */ + /* let's realign the buffer to optimize I/O */ + if (buffer_empty(buf)) buf->p = buf->data; - } - else if (buf->data + buf->o < buf->p && -buf->p + buf->i < buf->data + buf->size) { - /* remaining space wraps at the end, with a moving limit */ - if (try > buf->data + buf->size - (buf->p + buf->i)) - try = buf->data + buf->size - (buf->p + buf->i); - } /* read the largest possible block. For this, we perform only one call * to recv() unless the buffer wraps and we exactly fill the first hunk,
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Hi Cyril, On Tue, Jan 14, 2014 at 08:23:00AM +0100, Willy Tarreau wrote: > Hey, excellent catch! You're absolutely right. I'm totally ashamed > for not having found it while reading the code. I was searching for > a place where a wrong computation could lead to something larger > than the buffer and forgot to check for multiple reads of the > buffer's size :-) Now thinking about it a little bit more, I think we have an API problem in fact. The raw_sock_to_buf() function says: /* Receive up to <count> bytes from connection <conn>'s socket and store them * into buffer <buf>. The caller must ensure that <count> is always smaller * than the buffer's size. */ But as you found, this is misleading as it doesn't work that well, since the caller needs to take care of not asking for too much data. So I'm thinking about changing the API instead so that the caller doesn't have to care about this and that only the read functions do. Anyway, they already care about free space wrapping at the end of the buffer. So I'd rather fix raw_sock_to_buf() and ssl_sock_to_buf() with a patch like this one, and simplify the logic at some call places. It would make the code much more robust and protect us against such bugs in the future. Could you please give it a try in your environment? Thanks, Willy diff --git a/src/raw_sock.c b/src/raw_sock.c index 4dc1c7a..2e3a0cb 100644 --- a/src/raw_sock.c +++ b/src/raw_sock.c @@ -226,8 +226,7 @@ int raw_sock_from_pipe(struct connection *conn, struct pipe *pipe) /* Receive up to <count> bytes from connection <conn>'s socket and store them - * into buffer <buf>. The caller must ensure that <count> is always smaller - * than the buffer's size. Only one call to recv() is performed, unless the + * into buffer <buf>. Only one call to recv() is performed, unless the * buffer wraps, in which case a second call may be performed. The connection's * flags are updated with whatever special event is detected (error, read0, * empty). 
The caller is responsible for taking care of those events and @@ -239,7 +238,7 @@ int raw_sock_from_pipe(struct connection *conn, struct pipe *pipe) static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int count) { int ret, done = 0; - int try = count; + int try; if (!(conn->flags & CO_FL_CTRL_READY)) return 0; @@ -258,24 +257,27 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int coun } } - /* compute the maximum block size we can read at once. */ - if (buffer_empty(buf)) { - /* let's realign the buffer to optimize I/O */ + /* let's realign the buffer to optimize I/O */ + if (buffer_empty(buf)) buf->p = buf->data; - } - else if (buf->data + buf->o < buf->p && -buf->p + buf->i < buf->data + buf->size) { - /* remaining space wraps at the end, with a moving limit */ - if (try > buf->data + buf->size - (buf->p + buf->i)) - try = buf->data + buf->size - (buf->p + buf->i); - } /* read the largest possible block. For this, we perform only one call * to recv() unless the buffer wraps and we exactly fill the first hunk, * in which case we accept to do it once again. A new attempt is made on * EINTR too. */ - while (try) { + while (count > 0) { + /* first check if we have some room after p+i */ + try = buf->data + buf->size - (buf->p + buf->i); + /* otherwise continue between data and p-o */ + if (try <= 0) { + try = buf->p - (buf->data + buf->o); + if (try <= 0) + break; + } + if (try > count) + try = count; + ret = recv(conn->t.sock.fd, bi_end(buf), try, 0); if (ret > 0) { @@ -291,7 +293,6 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int coun break; } count -= ret; - try = count; } else if (ret == 0) { goto read0;
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Hi Cyril! On Tue, Jan 14, 2014 at 02:51:41AM +0100, Cyril Bonté wrote: > On 14/01/2014 00:51, Cyril Bonté wrote: > Well, I couldn't leave my debug session in its current state. I know what it's like when you go to bed and cannot sleep with eyes wide open thinking about your last gdb output :-) > Can you confirm that this patch could fix the issue? I think this > prevents a buffer overflow when waiting for more data. > Currently, I can't reproduce segfaults anymore when applied. Hey, excellent catch! You're absolutely right. I'm totally ashamed for not having found it while reading the code. I was searching for a place where a wrong computation could lead to something larger than the buffer and forgot to check for multiple reads of the buffer's size :-) > Now it's time to sleep some hours ;-) Yeah you deserve it. Steve, please also confirm that Cyril's patch fixes your segfault (I'm sure it does given the traces you provided). Cyril, feel free to send it to me with a few lines of commit message, I'll merge it. Just for the record, the bug was introduced in 1.5-dev13 by this patch: f150317 MAJOR: checks: completely use the connection transport layer Thanks! Willy
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Hi again Willy, On 14/01/2014 00:51, Cyril Bonté wrote: I don't know if this is of any help because I don't have enough details yet, but I just reproduced segfaults while playing with the configuration provided by Steve. To reproduce it on my laptop, it's quite easy: generate a lot of headers, and send the content of 404.html. Here is a PHP script I used to emulate the check: There's something strange in the values I sent to the debug output. In bo_putblk(), the "half" variable could have a negative value, which then segfaults when calling memcpy(). Now I can reproduce a segfault, I'll try to make some more tests tomorrow (only after work). But I believe you'll already find the reason before ;-) Well, I couldn't leave my debug session in its current state. Can you confirm that this patch could fix the issue? I think this prevents a buffer overflow when waiting for more data. Currently, I can't reproduce segfaults anymore when applied. Now it's time to sleep some hours ;-) -- Cyril Bonté diff --git a/src/checks.c b/src/checks.c index 115cc85..abdc333 100644 --- a/src/checks.c +++ b/src/checks.c @@ -1031,7 +1031,7 @@ static void event_srv_chk_r(struct connection *conn) done = 0; - conn->xprt->rcv_buf(conn, check->bi, check->bi->size); + conn->xprt->rcv_buf(conn, check->bi, buffer_total_space(check->bi)); if (conn->flags & (CO_FL_ERROR | CO_FL_SOCK_RD_SH | CO_FL_DATA_RD_SH)) { done = 1; if ((conn->flags & CO_FL_ERROR) && !check->bi->i) {
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Hi Willy, On 13/01/2014 19:19, Willy Tarreau wrote: There's something excellent in your trace: 09:52:29.759117 sendto(1, "GET /cp/testcheck.php HTTP/1.0\r\n"..., 34, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 34 09:52:29.759357 epoll_ctl(0, EPOLL_CTL_MOD, 1, {EPOLLIN|0x2000, {u32=1, u64=1}}) = 0 09:52:29.759487 gettimeofday({1389635549, 759527}, NULL) = 0 09:52:29.759603 epoll_wait(0, {{EPOLLIN, {u32=1, u64=1}}}, 200, 1000) = 1 09:52:29.768407 gettimeofday({1389635549, 768449}, NULL) = 0 09:52:29.768529 recvfrom(1, "HTTP/1.1 404 Not Found\r\nDate: Mo"..., 16384, 0, NULL, NULL) = 4344 09:52:29.768754 gettimeofday({1389635549, 768796}, NULL) = 0 09:52:29.768873 epoll_wait(0, {{EPOLLIN, {u32=1, u64=1}}}, 200, 1000) = 1 09:52:29.769096 gettimeofday({1389635549, 769137}, NULL) = 0 09:52:29.769309 recvfrom(1, "l .2s ease-in-out}.img-circle{bo"..., 16384, 0, NULL, NULL) = 16384 09:52:29.769597 recvfrom(1, NULL, 2147483647, MSG_TRUNC|MSG_DONTWAIT|MSG_NOSIGNAL, NULL, NULL) = 31400 09:52:29.769751 recvfrom(1, 0, 2147483647, 16480, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 09:52:29.769933 setsockopt(1, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0 09:52:29.770087 close(1) = 0 As you can see, we first read 4kB, then read an extra 16kB on top of it, so for sure we overflow the read buffer. How this is possible is still a mystery but now I'll dig along this track. I suspect we erroneously start to flush the buffer at some point where we should not. I don't know if this is of any help because I don't have enough details yet, but I just reproduced segfaults while playing with the configuration provided by Steve. To reproduce it on my laptop, it's quite easy: generate a lot of headers, and send the content of 404.html. Here is a PHP script I used to emulate the check: There's something strange in the values I sent to the debug output. In bo_putblk(), the "half" variable could have a negative value, which then segfaults when calling memcpy(). 
Now I can reproduce a segfault, I'll try to make some more tests tomorrow (only after work). But I believe you'll already find the reason before ;-) -- Cyril Bonté
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
There's something excellent in your trace : 09:52:29.759117 sendto(1, "GET /cp/testcheck.php HTTP/1.0\r\n"..., 34, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 34 09:52:29.759357 epoll_ctl(0, EPOLL_CTL_MOD, 1, {EPOLLIN|0x2000, {u32=1, u64=1}}) = 0 09:52:29.759487 gettimeofday({1389635549, 759527}, NULL) = 0 09:52:29.759603 epoll_wait(0, {{EPOLLIN, {u32=1, u64=1}}}, 200, 1000) = 1 09:52:29.768407 gettimeofday({1389635549, 768449}, NULL) = 0 09:52:29.768529 recvfrom(1, "HTTP/1.1 404 Not Found\r\nDate: Mo"..., 16384, 0, NULL, NULL) = 4344 09:52:29.768754 gettimeofday({1389635549, 768796}, NULL) = 0 09:52:29.768873 epoll_wait(0, {{EPOLLIN, {u32=1, u64=1}}}, 200, 1000) = 1 09:52:29.769096 gettimeofday({1389635549, 769137}, NULL) = 0 09:52:29.769309 recvfrom(1, "l .2s ease-in-out}.img-circle{bo"..., 16384, 0, NULL, NULL) = 16384 09:52:29.769597 recvfrom(1, NULL, 2147483647, MSG_TRUNC|MSG_DONTWAIT|MSG_NOSIGNAL, NULL, NULL) = 31400 09:52:29.769751 recvfrom(1, 0, 2147483647, 16480, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) 09:52:29.769933 setsockopt(1, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0 09:52:29.770087 close(1)= 0 As you can see, we first read 4kB, then read an extra 16kB on top of it, so for sure we overflow the read buffer. How this is possible is still a mystery but now I'll dig along this track. I suspect we erroneously start to flush the buffer at some point where we should not. Thank you! Willy
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
On Mon, Jan 13, 2014 at 10:10:45AM -0800, Steve Ruiz wrote: > sure thing, trace attached. Looking at the page returned, the only strange > thing I can see is that there are extremely long lines in the response - > I'm guessing on the order of 100k / line. I also tried this but failed to see the issue. The string is looked up using strstr() so it's insensitive to this. I've looked at how status messages were reported and did not find a place where a copy of the output was returned. But I'll keep digging with these elements in mind. > I'm attaching our error doc as well, please don't share this as it's > proprietary. Steve, you're posting to a public mailing list! Unfortunately it's too late now :-( > I'm guessing if you're > allocating a certain buffer space, and doing a read-line() that could do it. That's exactly why I was interested in the very useful information you provided above. > Let me know if you need anything else. I'll check what I can do with this, thank you very much! Best regards, Willy
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Hi Tim, On Mon, Jan 13, 2014 at 12:25:30PM -0500, Tim Prepscius wrote: > Willy, > > Can you take me off of this list? done! > Unsubscribing doesn't work. I have no idea why. I've tried many times. > The last time I tried, I got back a message that gmail was identified > as a spammer. This is the reason. From time to time, gmail seems to be marked as a spammer by some RBLs. I absolutely hate RBLs beyond imagination for this exact reason, clueless bots marking any sender as spammer, sometimes helped by stupid or arrogant people. But anyway they help keeping the spam rate low enough so we keep them. Overall it doesn't work too bad anyway. Regards, Willy
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Willy, Can you take me off of this list? Unsubscribing doesn't work. I have no idea why. I've tried many times. The last time I tried, I got back a message that gmail was identified as a spammer. Here is a sample of my unsubscribe message: - snip MIME-Version: 1.0 Received: by 10.140.86.244 with HTTP; Thu, 9 Jan 2014 19:38:56 -0800 (PST) Date: Thu, 9 Jan 2014 22:38:56 -0500 Delivered-To: timprepsc...@gmail.com Message-ID: Subject: unsubscribe From: Tim Prepscius To: haproxy+unsubscr...@formilux.org Content-Type: text/plain; charset=ISO-8859-1 unsubscribe - snip Thank you, -tim On 1/13/14, Willy Tarreau wrote: > Hi again Steve, > > On Mon, Jan 13, 2014 at 08:44:08AM +0100, Willy Tarreau wrote: >> Hi Steve, >> >> On Fri, Jan 10, 2014 at 02:16:48PM -0800, Steve Ruiz wrote: >> > I'm experimenting with haproxy on a centos6 VM here. I found that when >> > I >> > specified a health check page (option httpchk GET /url), and that page >> > didn't exist, we have a large 404 page returned, and that causes haproxy >> > to >> > quickly segfault (seems like on the second try GET'ing and parsing the >> > page). I couldn't figure out from the website where to submit a bug, so >> > I >> > figure I'll try here first. >> > >> > Steps to reproduce: >> > - setup http backend, with option httpchk and httpcheck expect string >> > x. >> > Make option httpchk point to a non-existent page >> > - On backend server, set it up to serve large 404 response (in my case, >> > the >> > 404 page is 186kB, as it has an inline graphic and inline css) >> > - Start haproxy, and wait for it to segfault >> > >> > I wasn't sure exactly what was causing this at first, so I did some work >> > to >> > narrow it down with GDB. The variable values from gdb led me to the >> > cause >> > on my side, and hopefully can help you fix the issue. 
I could not make >> > this work with simply a large page for the http response - in that case, >> > it >> > seems to work as advertised, only inspecting the response up to >> > tune.chksize (default 16384 as i've left it). But if I do this with a >> > 404, >> > it seems to kill it. Let me know what additional information you need >> > if >> > any. Thanks and kudos for the great bit of software! >> >> Thanks for all these details. I remember that the http-expect code puts >> a zero at the end of the received buffer prior to looking up the string. >> But it might be possible that there would be some cases where it doesn't >> do it, or maybe it dies after restoring it. Another thing I'm thinking >> about is that we're using the trash buffer for many operations and I'm >> realizing that the check buffer's size might possibly be larger :-/ > > I'm a bit puzzled, not only I cannot reproduce the issue, but also I do > not see in the code how this could happen, so I must be missing something. > Could you please post the output of "strace -tt" on haproxy when it does > this ? Especially the last checks ? I'm suspecting an anomaly in the > receive > buffer size calculation but all I read here seems fine, which puzzles me. > > Thanks! > Willy > > >
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Hi again Steve,

On Mon, Jan 13, 2014 at 08:44:08AM +0100, Willy Tarreau wrote:
> Hi Steve,
>
> On Fri, Jan 10, 2014 at 02:16:48PM -0800, Steve Ruiz wrote:
> > I'm experimenting with haproxy on a centos6 VM here. I found that when I
> > specified a health check page (option httpchk GET /url), and that page
> > didn't exist, we have a large 404 page returned, and that causes haproxy to
> > quickly segfault (seems like on the second try GET'ing and parsing the
> > page). I couldn't figure out from the website where to submit a bug, so I
> > figure I'll try here first.
> >
> > Steps to reproduce:
> > - setup http backend, with option httpchk and httpcheck expect string x.
> >   Make option httpchk point to a non-existent page
> > - On backend server, set it up to serve large 404 response (in my case, the
> >   404 page is 186kB, as it has an inline graphic and inline css)
> > - Start haproxy, and wait for it to segfault
> >
> > I wasn't sure exactly what was causing this at first, so I did some work to
> > narrow it down with GDB. The variable values from gdb led me to the cause
> > on my side, and hopefully can help you fix the issue. I could not make
> > this work with simply a large page for the http response - in that case, it
> > seems to work as advertised, only inspecting the response up to
> > tune.chksize (default 16384 as i've left it). But if I do this with a 404,
> > it seems to kill it. Let me know what additional information you need if
> > any. Thanks and kudos for the great bit of software!
>
> Thanks for all these details. I remember that the http-expect code puts
> a zero at the end of the received buffer prior to looking up the string.
> But it might be possible that there are some cases where it doesn't
> do it, or maybe it dies after restoring it. Another thing I'm thinking
> about is that we're using the trash buffer for many operations and I'm
> realizing that the check buffer's size might possibly be larger :-/

I'm a bit puzzled: not only can I not reproduce the issue, but I also do not see in the code how this could happen, so I must be missing something. Could you please post the output of "strace -tt" on haproxy when it does this? Especially the last checks? I'm suspecting an anomaly in the receive buffer size calculation, but all I read here seems fine, which puzzles me.

Thanks!
Willy
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Hi Steve,

On Fri, Jan 10, 2014 at 02:16:48PM -0800, Steve Ruiz wrote:
> I'm experimenting with haproxy on a centos6 VM here. I found that when I
> specified a health check page (option httpchk GET /url), and that page
> didn't exist, we have a large 404 page returned, and that causes haproxy to
> quickly segfault (seems like on the second try GET'ing and parsing the
> page). I couldn't figure out from the website where to submit a bug, so I
> figure I'll try here first.
>
> Steps to reproduce:
> - setup http backend, with option httpchk and httpcheck expect string x.
>   Make option httpchk point to a non-existent page
> - On backend server, set it up to serve large 404 response (in my case, the
>   404 page is 186kB, as it has an inline graphic and inline css)
> - Start haproxy, and wait for it to segfault
>
> I wasn't sure exactly what was causing this at first, so I did some work to
> narrow it down with GDB. The variable values from gdb led me to the cause
> on my side, and hopefully can help you fix the issue. I could not make
> this work with simply a large page for the http response - in that case, it
> seems to work as advertised, only inspecting the response up to
> tune.chksize (default 16384 as i've left it). But if I do this with a 404,
> it seems to kill it. Let me know what additional information you need if
> any. Thanks and kudos for the great bit of software!

Thanks for all these details. I remember that the http-expect code puts a zero at the end of the received buffer prior to looking up the string. But it might be possible that there are some cases where it doesn't do it, or maybe it dies after restoring it. Another thing I'm thinking about is that we're using the trash buffer for many operations, and I'm realizing that the check buffer's size might possibly be larger :-/

In your case the check indeed died on the second request, so it's out of context. I'll try to reproduce this and fix it. Thanks very much for your valuable information!

Willy
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Thanks for the workaround + super fast response, and glad to help :). Steve Ruiz Manager - Hosting Operations Mirth ste...@mirth.com On Fri, Jan 10, 2014 at 3:53 PM, Baptiste wrote: > Well, let say this is a workaround... > We'll definitively have to fix the bug ;) > > Baptiste > > On Sat, Jan 11, 2014 at 12:24 AM, Steve Ruiz wrote: > > Made those changes, and it seems to be working properly, no segfault yet > > after ~2 minutes of checks. Thanks! > > > > Steve Ruiz > > Manager - Hosting Operations > > Mirth > > ste...@mirth.com > > > > > > On Fri, Jan 10, 2014 at 3:06 PM, Baptiste wrote: > >> > >> Hi Steve, > >> > >> Could you give a try to the tcp-check and tell us if your have the same > >> issue. > >> In your backend, turn your httpchk related directives into: > >> option tcp-check > >> tcp-check send GET\ /cp/testcheck.html\ HTTP/1.0\r\n > >> tcp-check send \r\n > >> tcp-check expect string good > >> > >> Baptiste > >> > >> > >> On Fri, Jan 10, 2014 at 11:16 PM, Steve Ruiz wrote: > >> > I'm experimenting with haproxy on a centos6 VM here. I found that > when > >> > I > >> > specified a health check page (option httpchk GET /url), and that page > >> > didn't exist, we have a large 404 page returned, and that causes > haproxy > >> > to > >> > quickly segfault (seems like on the second try GET'ing and parsing the > >> > page). I couldn't figure out from the website where to submit a bug, > so > >> > I > >> > figure I'll try here first. > >> > > >> > Steps to reproduce: > >> > - setup http backend, with option httpchk and httpcheck expect string > x. > >> > Make option httpchk point to a non-existent page > >> > - On backend server, set it up to serve large 404 response (in my > case, > >> > the > >> > 404 page is 186kB, as it has an inline graphic and inline css) > >> > - Start haproxy, and wait for it to segfault > >> > > >> > I wasn't sure exactly what was causing this at first, so I did some > work > >> > to > >> > narrow it down with GDB. 
The variable values from gdb led me to the > >> > cause > >> > on my side, and hopefully can help you fix the issue. I could not > make > >> > this > >> > work with simply a large page for the http response - in that case, it > >> > seems > >> > to work as advertised, only inspecting the response up to tune.chksize > >> > (default 16384 as i've left it). But if I do this with a 404, it > seems > >> > to > >> > kill it. Let me know what additional information you need if any. > >> > Thanks > >> > and kudos for the great bit of software! > >> > > >> > > >> > #haproxy config: > >> > #- > >> > # Example configuration for a possible web application. See the > >> > # full configuration options online. > >> > # > >> > # http://haproxy.1wt.eu/download/1.4/doc/configuration.txt > >> > # > >> > #- > >> > > >> > # Help in developing config here: > >> > # https://www.twilio.com/engineering/2013/10/16/haproxy > >> > > >> > > >> > #- > >> > # Global settings > >> > #- > >> > global > >> > # to have these messages end up in /var/log/haproxy.log you will > >> > # need to: > >> > # > >> > # 1) configure syslog to accept network log events. This is done > >> > #by adding the '-r' option to the SYSLOGD_OPTIONS in > >> > #/etc/sysconfig/syslog > >> > # > >> > # 2) configure local2 events to go to the /var/log/haproxy.log > >> > # file. 
A line like the following can be added to > >> > # /etc/sysconfig/syslog > >> > # > >> > #local2.* /var/log/haproxy.log > >> > # > >> > log 127.0.0.1 local2 info > >> > > >> > chroot /var/lib/haproxy > >> > pidfile /var/run/haproxy.pid > >> > maxconn 4000 > >> > userhaproxy > >> > group haproxy > >> > daemon > >> > > >> > #enable stats > >> > stats socket /tmp/haproxy.sock > >> > > >> > listen ha_stats :8088 > >> > balance source > >> > mode http > >> > timeout client 3ms > >> > stats enable > >> > stats auth haproxystats:foobar > >> > stats uri /haproxy?stats > >> > > >> > #- > >> > # common defaults that all the 'listen' and 'backend' sections will > >> > # use if not designated in their block > >> > #- > >> > defaults > >> > modehttp > >> > log global > >> > option httplog > >> > option dontlognull > >> > #keep persisten client connection open > >> > option http-server-close > >> > option forwardfor except 127.0.0.0
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Well, let's say this is a workaround... We'll definitely have to fix the bug ;) Baptiste On Sat, Jan 11, 2014 at 12:24 AM, Steve Ruiz wrote: > Made those changes, and it seems to be working properly, no segfault yet > after ~2 minutes of checks. Thanks! > > Steve Ruiz > Manager - Hosting Operations > Mirth > ste...@mirth.com > > > On Fri, Jan 10, 2014 at 3:06 PM, Baptiste wrote: >> >> Hi Steve, >> >> Could you give a try to the tcp-check and tell us if you have the same >> issue. >> In your backend, turn your httpchk related directives into: >> option tcp-check >> tcp-check send GET\ /cp/testcheck.html\ HTTP/1.0\r\n >> tcp-check send \r\n >> tcp-check expect string good >> >> Baptiste >> >> >> On Fri, Jan 10, 2014 at 11:16 PM, Steve Ruiz wrote: >> > I'm experimenting with haproxy on a centos6 VM here. I found that when >> > I >> > specified a health check page (option httpchk GET /url), and that page >> > didn't exist, we have a large 404 page returned, and that causes haproxy >> > to >> > quickly segfault (seems like on the second try GET'ing and parsing the >> > page). I couldn't figure out from the website where to submit a bug, so >> > I >> > figure I'll try here first. >> > >> > Steps to reproduce: >> > - setup http backend, with option httpchk and httpcheck expect string x. >> > Make option httpchk point to a non-existent page >> > - On backend server, set it up to serve large 404 response (in my case, >> > the >> > 404 page is 186kB, as it has an inline graphic and inline css) >> > - Start haproxy, and wait for it to segfault >> > >> > I wasn't sure exactly what was causing this at first, so I did some work >> > to >> > narrow it down with GDB. The variable values from gdb led me to the >> > cause >> > on my side, and hopefully can help you fix the issue.
I could not make >> > this >> > work with simply a large page for the http response - in that case, it >> > seems >> > to work as advertised, only inspecting the response up to tune.chksize >> > (default 16384 as i've left it). But if I do this with a 404, it seems >> > to >> > kill it. Let me know what additional information you need if any. >> > Thanks >> > and kudos for the great bit of software! >> > >> > >> > #haproxy config: >> > #- >> > # Example configuration for a possible web application. See the >> > # full configuration options online. >> > # >> > # http://haproxy.1wt.eu/download/1.4/doc/configuration.txt >> > # >> > #- >> > >> > # Help in developing config here: >> > # https://www.twilio.com/engineering/2013/10/16/haproxy >> > >> > >> > #- >> > # Global settings >> > #- >> > global >> > # to have these messages end up in /var/log/haproxy.log you will >> > # need to: >> > # >> > # 1) configure syslog to accept network log events. This is done >> > #by adding the '-r' option to the SYSLOGD_OPTIONS in >> > #/etc/sysconfig/syslog >> > # >> > # 2) configure local2 events to go to the /var/log/haproxy.log >> > # file. 
A line like the following can be added to >> > # /etc/sysconfig/syslog >> > # >> > #local2.* /var/log/haproxy.log >> > # >> > log 127.0.0.1 local2 info >> > >> > chroot /var/lib/haproxy >> > pidfile /var/run/haproxy.pid >> > maxconn 4000 >> > userhaproxy >> > group haproxy >> > daemon >> > >> > #enable stats >> > stats socket /tmp/haproxy.sock >> > >> > listen ha_stats :8088 >> > balance source >> > mode http >> > timeout client 3ms >> > stats enable >> > stats auth haproxystats:foobar >> > stats uri /haproxy?stats >> > >> > #- >> > # common defaults that all the 'listen' and 'backend' sections will >> > # use if not designated in their block >> > #- >> > defaults >> > modehttp >> > log global >> > option httplog >> > option dontlognull >> > #keep persisten client connection open >> > option http-server-close >> > option forwardfor except 127.0.0.0/8 >> > option redispatch >> > # Limit number of retries - total time trying to connect = connect >> > timeout * (#retries + 1) >> > retries 2 >> > timeout http-request10s >> > timeout queue 1m >> > #timeout opening a tcp connection to server - should be shorter than >> > timeout client and server >> > timeout connect 3100 >> > timeout client 30s >> > ti
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Made those changes, and it seems to be working properly, no segfault yet after ~2 minutes of checks. Thanks! Steve Ruiz Manager - Hosting Operations Mirth ste...@mirth.com On Fri, Jan 10, 2014 at 3:06 PM, Baptiste wrote: > Hi Steve, > > Could you give a try to the tcp-check and tell us if your have the same > issue. > In your backend, turn your httpchk related directives into: > option tcp-check > tcp-check send GET\ /cp/testcheck.html\ HTTP/1.0\r\n > tcp-check send \r\n > tcp-check expect string good > > Baptiste > > > On Fri, Jan 10, 2014 at 11:16 PM, Steve Ruiz wrote: > > I'm experimenting with haproxy on a centos6 VM here. I found that when I > > specified a health check page (option httpchk GET /url), and that page > > didn't exist, we have a large 404 page returned, and that causes haproxy > to > > quickly segfault (seems like on the second try GET'ing and parsing the > > page). I couldn't figure out from the website where to submit a bug, so > I > > figure I'll try here first. > > > > Steps to reproduce: > > - setup http backend, with option httpchk and httpcheck expect string x. > > Make option httpchk point to a non-existent page > > - On backend server, set it up to serve large 404 response (in my case, > the > > 404 page is 186kB, as it has an inline graphic and inline css) > > - Start haproxy, and wait for it to segfault > > > > I wasn't sure exactly what was causing this at first, so I did some work > to > > narrow it down with GDB. The variable values from gdb led me to the > cause > > on my side, and hopefully can help you fix the issue. I could not make > this > > work with simply a large page for the http response - in that case, it > seems > > to work as advertised, only inspecting the response up to tune.chksize > > (default 16384 as i've left it). But if I do this with a 404, it seems > to > > kill it. Let me know what additional information you need if any. > Thanks > > and kudos for the great bit of software! 
> > > > > > #haproxy config: > > #- > > # Example configuration for a possible web application. See the > > # full configuration options online. > > # > > # http://haproxy.1wt.eu/download/1.4/doc/configuration.txt > > # > > #- > > > > # Help in developing config here: > > # https://www.twilio.com/engineering/2013/10/16/haproxy > > > > > > #- > > # Global settings > > #- > > global > > # to have these messages end up in /var/log/haproxy.log you will > > # need to: > > # > > # 1) configure syslog to accept network log events. This is done > > #by adding the '-r' option to the SYSLOGD_OPTIONS in > > #/etc/sysconfig/syslog > > # > > # 2) configure local2 events to go to the /var/log/haproxy.log > > # file. A line like the following can be added to > > # /etc/sysconfig/syslog > > # > > #local2.* /var/log/haproxy.log > > # > > log 127.0.0.1 local2 info > > > > chroot /var/lib/haproxy > > pidfile /var/run/haproxy.pid > > maxconn 4000 > > userhaproxy > > group haproxy > > daemon > > > > #enable stats > > stats socket /tmp/haproxy.sock > > > > listen ha_stats :8088 > > balance source > > mode http > > timeout client 3ms > > stats enable > > stats auth haproxystats:foobar > > stats uri /haproxy?stats > > > > #- > > # common defaults that all the 'listen' and 'backend' sections will > > # use if not designated in their block > > #- > > defaults > > modehttp > > log global > > option httplog > > option dontlognull > > #keep persisten client connection open > > option http-server-close > > option forwardfor except 127.0.0.0/8 > > option redispatch > > # Limit number of retries - total time trying to connect = connect > > timeout * (#retries + 1) > > retries 2 > > timeout http-request10s > > timeout queue 1m > > #timeout opening a tcp connection to server - should be shorter than > > timeout client and server > > timeout connect 3100 > > timeout client 30s > > timeout server 30s > > timeout http-keep-alive 10s > > timeout check 10s > > maxconn 3000 > > > > #- > > # main frontend 
which proxies to the backends > > #
Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
Hi Steve, Could you give a try to the tcp-check and tell us if you have the same issue. In your backend, turn your httpchk related directives into: option tcp-check tcp-check send GET\ /cp/testcheck.html\ HTTP/1.0\r\n tcp-check send \r\n tcp-check expect string good Baptiste On Fri, Jan 10, 2014 at 11:16 PM, Steve Ruiz wrote: > I'm experimenting with haproxy on a centos6 VM here. I found that when I > specified a health check page (option httpchk GET /url), and that page > didn't exist, we have a large 404 page returned, and that causes haproxy to > quickly segfault (seems like on the second try GET'ing and parsing the > page). I couldn't figure out from the website where to submit a bug, so I > figure I'll try here first. > > Steps to reproduce: > - setup http backend, with option httpchk and httpcheck expect string x. > Make option httpchk point to a non-existent page > - On backend server, set it up to serve large 404 response (in my case, the > 404 page is 186kB, as it has an inline graphic and inline css) > - Start haproxy, and wait for it to segfault > > I wasn't sure exactly what was causing this at first, so I did some work to > narrow it down with GDB. The variable values from gdb led me to the cause > on my side, and hopefully can help you fix the issue. I could not make this > work with simply a large page for the http response - in that case, it seems > to work as advertised, only inspecting the response up to tune.chksize > (default 16384 as i've left it). But if I do this with a 404, it seems to > kill it. Let me know what additional information you need if any. Thanks > and kudos for the great bit of software! > > > #haproxy config: > #- > # Example configuration for a possible web application. See the > # full configuration options online.
> # > # http://haproxy.1wt.eu/download/1.4/doc/configuration.txt > # > #- > > # Help in developing config here: > # https://www.twilio.com/engineering/2013/10/16/haproxy > > > #- > # Global settings > #- > global > # to have these messages end up in /var/log/haproxy.log you will > # need to: > # > # 1) configure syslog to accept network log events. This is done > #by adding the '-r' option to the SYSLOGD_OPTIONS in > #/etc/sysconfig/syslog > # > # 2) configure local2 events to go to the /var/log/haproxy.log > # file. A line like the following can be added to > # /etc/sysconfig/syslog > # > #local2.* /var/log/haproxy.log > # > log 127.0.0.1 local2 info > > chroot /var/lib/haproxy > pidfile /var/run/haproxy.pid > maxconn 4000 > userhaproxy > group haproxy > daemon > > #enable stats > stats socket /tmp/haproxy.sock > > listen ha_stats :8088 > balance source > mode http > timeout client 3ms > stats enable > stats auth haproxystats:foobar > stats uri /haproxy?stats > > #- > # common defaults that all the 'listen' and 'backend' sections will > # use if not designated in their block > #- > defaults > modehttp > log global > option httplog > option dontlognull > #keep persisten client connection open > option http-server-close > option forwardfor except 127.0.0.0/8 > option redispatch > # Limit number of retries - total time trying to connect = connect > timeout * (#retries + 1) > retries 2 > timeout http-request10s > timeout queue 1m > #timeout opening a tcp connection to server - should be shorter than > timeout client and server > timeout connect 3100 > timeout client 30s > timeout server 30s > timeout http-keep-alive 10s > timeout check 10s > maxconn 3000 > > #- > # main frontend which proxys to the backends > #- > frontend https_frontend > bind :80 > redirect scheme https if !{ ssl_fc } > > #config help: > https://github.com/observing/balancerbattle/blob/master/haproxy.cfg > bind *:443 ssl crt /etc/certs/mycert.pem ciphers > 
RC4-SHA:AES128-SHA:AES:!ADH:!aNULL:!DH:!EDH:!eNULL > mode http > default_backend webapp > > #- > # Main backend for web application servers > #
Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)
I'm experimenting with haproxy on a centos6 VM here. I found that when I specified a health check page (option httpchk GET /url), and that page didn't exist, we have a large 404 page returned, and that causes haproxy to quickly segfault (it seems to happen on the second try GET'ing and parsing the page). I couldn't figure out from the website where to submit a bug, so I figure I'll try here first.

Steps to reproduce:
- setup an http backend, with option httpchk and http-check expect string x. Make option httpchk point to a non-existent page
- On the backend server, set it up to serve a large 404 response (in my case, the 404 page is 186kB, as it has an inline graphic and inline css)
- Start haproxy, and wait for it to segfault

I wasn't sure exactly what was causing this at first, so I did some work to narrow it down with GDB. The variable values from gdb led me to the cause on my side, and hopefully can help you fix the issue. I could not make this happen with simply a large page for the http response - in that case, it seems to work as advertised, only inspecting the response up to tune.chksize (default 16384 as I've left it). But if I do this with a 404, it seems to kill it. Let me know what additional information you need, if any. Thanks and kudos for the great bit of software!

#haproxy config:
#-
# Example configuration for a possible web application. See the
# full configuration options online.
#
# http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#-

# Help in developing config here:
# https://www.twilio.com/engineering/2013/10/16/haproxy

#-
# Global settings
#-
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events. This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #    file. A line like the following can be added to
    #    /etc/sysconfig/syslog
    #
    #    local2.*    /var/log/haproxy.log
    #
    log 127.0.0.1 local2 info

    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon

    #enable stats
    stats socket /tmp/haproxy.sock

listen ha_stats :8088
    balance source
    mode http
    timeout client 3ms
    stats enable
    stats auth haproxystats:foobar
    stats uri /haproxy?stats

#-
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#-
defaults
    mode http
    log global
    option httplog
    option dontlognull
    #keep persistent client connection open
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    # Limit number of retries - total time trying to connect = connect timeout * (#retries + 1)
    retries 2
    timeout http-request 10s
    timeout queue 1m
    #timeout opening a tcp connection to server - should be shorter than timeout client and server
    timeout connect 3100
    timeout client 30s
    timeout server 30s
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

#-
# main frontend which proxies to the backends
#-
frontend https_frontend
    bind :80
    redirect scheme https if !{ ssl_fc }

    #config help: https://github.com/observing/balancerbattle/blob/master/haproxy.cfg
    bind *:443 ssl crt /etc/certs/mycert.pem ciphers RC4-SHA:AES128-SHA:AES:!ADH:!aNULL:!DH:!EDH:!eNULL
    mode http
    default_backend webapp

#-
# Main backend for web application servers
#-
backend webapp
    balance roundrobin
    #Insert cookie SERVERID to pin it to one leg
    cookie SERVERID insert nocache indirect
    #http check should pull url below
    option httpchk GET /cp/testcheck.html HTTP/1.0
    #option httpchk GET /cp/testcheck.php HTTP/1.0
    #http check should find string below in response to be considered up
    http-check expect string good
    #Define servers - inter=interval of 5s, rise 2=become avail after 2 successful checks, fall 3=take out after 3 fails