Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-17 Thread Willy Tarreau
Hi Cyril,

On Thu, Jan 16, 2014 at 10:48:10PM +0100, Cyril Bonté wrote:
 Hi Willy,
 
 On 15/01/2014 01:08, Willy Tarreau wrote:
 On Tue, Jan 14, 2014 at 12:25:37PM -0800, Steve Ruiz wrote:
 Patched and confirmed in our environment that this is now working / seems
 to have fixed the issue. Thanks!
 
 Great, many thanks to you both, guys. We've got rid of another pretty
 old bug; these are the ones that make me the happiest once fixed!
 
 I'm currently unpacking my laptop to push the fix so that it appears
 in today's snapshot.
 
 Excellent work!
 
 I fear there is some more work to do on this patch.
 I made some tests on ssl and it looks to be broken since this commit :-(
 
 The shortest configuration I could find to reproduce the issue is:
   listen test
 bind 0.0.0.0:443 ssl crt cert.pem
 mode http
 timeout server 5s
 timeout client 5s
 
 When a request is received by haproxy, the CPU rises to 100% in an
 epoll_wait loop (the timeouts are here to prevent an unlimited loop).
 
 $ curl -k https://localhost/
 ...
 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
 ...

So I might have broken something in the way the try value is computed,
ending up with zero being selected and nothing done. Unfortunately it
works fine here.

Could you try to single-step through ssl_sock_to_buf in gdb?
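
For anyone who wants to follow along, a typical session for that (a sketch,
assuming a local debug build and an illustrative config file name) would be:

  $ gdb --args ./haproxy -d -f test.cfg
  (gdb) break ssl_sock_to_buf
  (gdb) run
  ... then trigger a request with: curl -k https://localhost/ ...
  (gdb) step
  (gdb) print try
  (gdb) print count

stepping repeatedly to see how try and count evolve in the read loop.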

I'll continue to test if I can reproduce it and also to understand
in which case we could end up with a wrong size computation.

Thanks,
Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-17 Thread Willy Tarreau
On Fri, Jan 17, 2014 at 10:47:01AM +0100, Willy Tarreau wrote:
 So I might have broken something in the way the try value is computed,
 ending up with zero being selected and nothing done. Unfortunately it
 works fine here.

OK I can reproduce it in 32-bit now. Let's see what happens...

Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-17 Thread Willy Tarreau
On Fri, Jan 17, 2014 at 11:03:51AM +0100, Willy Tarreau wrote:
 On Fri, Jan 17, 2014 at 10:47:01AM +0100, Willy Tarreau wrote:
  So I might have broken something in the way the try value is computed,
  ending up with zero being selected and nothing done. Unfortunately it
  works fine here.
 
 OK I can reproduce it in 32-bit now. Let's see what happens...

OK here's the fix. I'm ashamed for not having noticed this mistake during
the change. I ported the raw_sock changes to ssl_sock; it was pretty
straightforward, but I missed the condition in the while () loop. And
unfortunately, the variable happened to be non-zero on the stack,
resulting in something working well for me :-/
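
To illustrate the failure mode, a minimal standalone sketch (not the HAProxy
code itself): with the size computation moved inside the loop, the old
while (try) condition tests a variable that nothing has assigned yet on the
first pass:

#include <stdio.h>

/* Sketch of the bug pattern: 'try' is only assigned inside the loop,
 * so the very first evaluation of the condition reads whatever value
 * happens to sit in its stack slot (undefined behaviour). */
static int read_loop_sketch(int count)
{
    int done = 0;
    int try;                /* uninitialized: the bug */

    while (try) {           /* should have been: while (count > 0) */
        try = count;        /* computed too late for the first test */
        done += try;        /* stands in for the recv() work */
        count -= try;
        if (count <= 0)
            break;
    }
    /* If the stale stack value happens to be 0, the loop never runs,
     * nothing is ever read, and the caller keeps polling: the 100%
     * CPU epoll_wait loop seen above. */
    return done;
}

int main(void)
{
    printf("%d bytes read\n", read_loop_sketch(16384));
    return 0;
}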

I've pushed the fix.

Thanks guys.

Willy



From 00b0fb9349b8842a5ec2cee9dc4f286c8d3a3685 Mon Sep 17 00:00:00 2001
From: Willy Tarreau w...@1wt.eu
Date: Fri, 17 Jan 2014 11:09:40 +0100
Subject: BUG/MAJOR: ssl: fix breakage caused by recent fix abf08d9
MIME-Version: 1.0
Content-Type: text/plain; charset=latin1
Content-Transfer-Encoding: 8bit

Recent commit abf08d9 (BUG/MAJOR: connection: fix mismatch between rcv_buf's
API and usage) accidentally broke SSL by relying on an uninitialized value to
enter the read loop.

Many thanks to Cyril Bonté and Steve Ruiz for reporting this issue.
---
 src/ssl_sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 7120ff8..87a2a58 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -1353,7 +1353,7 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int count
 * in which case we accept to do it once again. A new attempt is made on
 * EINTR too.
 */
-	while (try) {
+	while (count > 0) {
 		/* first check if we have some room after p+i */
 		try = buf->data + buf->size - (buf->p + buf->i);
 		/* otherwise continue between data and p-o */
-- 
1.7.12.1




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-17 Thread Cyril Bonté

On 17/01/2014 11:14, Willy Tarreau wrote:

On Fri, Jan 17, 2014 at 11:03:51AM +0100, Willy Tarreau wrote:

On Fri, Jan 17, 2014 at 10:47:01AM +0100, Willy Tarreau wrote:

So I might have broken something in the way the try value is computed,
ending up with zero being selected and nothing done. Unfortunately it
works fine here.


OK I can reproduce it in 32-bit now. Let's see what happens...


OK here's the fix. I'm ashamed for not having noticed this mistake during
the change. I ported the raw_sock changes to ssl_sock; it was pretty
straightforward, but I missed the condition in the while () loop. And
unfortunately, the variable happened to be non-zero on the stack,
resulting in something working well for me :-/

I've pushed the fix.


Great! I didn't have time to try to fix it yesterday.
Everything is working well now; we can definitely close this bug, that's
a good thing ;-)


--
Cyril Bonté



Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-17 Thread Steve Ruiz
Confirmed on my side as well. No segfault, and no spinning CPU with the
latest patch.

thanks!

Steve Ruiz
Manager - Hosting Operations
Mirth
ste...@mirth.com ste...@mirthcorp.com


On Fri, Jan 17, 2014 at 10:25 AM, Cyril Bonté cyril.bo...@free.fr wrote:

 On 17/01/2014 11:14, Willy Tarreau wrote:

  On Fri, Jan 17, 2014 at 11:03:51AM +0100, Willy Tarreau wrote:

 On Fri, Jan 17, 2014 at 10:47:01AM +0100, Willy Tarreau wrote:

  So I might have broken something in the way the try value is computed,
 ending up with zero being selected and nothing done. Unfortunately it
 works fine here.


 OK I can reproduce it in 32-bit now. Let's see what happens...


 OK here's the fix. I'm ashamed for not having noticed this mistake during
 the change. I ported the raw_sock changes to ssl_sock; it was pretty
 straightforward, but I missed the condition in the while () loop. And
 unfortunately, the variable happened to be non-zero on the stack,
 resulting in something working well for me :-/

 I've pushed the fix.


 Great! I didn't have time to try to fix it yesterday.
 Everything is working well now; we can definitely close this bug, that's a
 good thing ;-)

 --
 Cyril Bonté




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-16 Thread Cyril Bonté

Hi Willy,

On 15/01/2014 01:08, Willy Tarreau wrote:

On Tue, Jan 14, 2014 at 12:25:37PM -0800, Steve Ruiz wrote:

Patched and confirmed in our environment that this is now working / seems
to have fixed the issue. Thanks!


Great, many thanks to you both, guys. We've got rid of another pretty
old bug; these are the ones that make me the happiest once fixed!

I'm currently unpacking my laptop to push the fix so that it appears
in today's snapshot.

Excellent work!


I fear there is some more work to do on this patch.
I made some tests on ssl and it looks to be broken since this commit :-(

The shortest configuration I could find to reproduce the issue is:
  listen test
bind 0.0.0.0:443 ssl crt cert.pem
mode http
timeout server 5s
timeout client 5s

When a request is received by haproxy, the CPU rises to 100% in an
epoll_wait loop (the timeouts are here to prevent an unlimited loop).


$ curl -k https://localhost/
...
epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
...
The same issue occurs when a server is declared.

The same also occurs when the proxy is in clear HTTP and a server is in
HTTPS:

  listen test
bind 0.0.0.0:80
mode http
timeout server 5s
timeout client 5s
server ssl_backend 127.0.0.1:443 ssl

$ curl http://localhost/
...
epoll_wait(3, {}, 200, 0)   = 0
epoll_wait(3, {}, 200, 0)   = 0
epoll_wait(3, {}, 200, 0)   = 0
epoll_wait(3, {}, 200, 0)   = 0
epoll_wait(3, {}, 200, 0)   = 0
...


--
Cyril Bonté



Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-16 Thread Steve Ruiz
Cyril is correct - I simply waited for a segfault, but didn't actually test
through the load balancer. I'm using SSL on haproxy, and yes, when I try to
hit a web page behind haproxy, CPU spins at 100% for a good while.

Steve Ruiz
Manager - Hosting Operations
Mirth
ste...@mirth.com ste...@mirthcorp.com


On Thu, Jan 16, 2014 at 1:48 PM, Cyril Bonté cyril.bo...@free.fr wrote:

 Hi Willy,

 On 15/01/2014 01:08, Willy Tarreau wrote:

  On Tue, Jan 14, 2014 at 12:25:37PM -0800, Steve Ruiz wrote:

 Patched and confirmed in our environment that this is now working / seems
 to have fixed the issue. Thanks!


 Great, many thanks to you both, guys. We've got rid of another pretty
 old bug; these are the ones that make me the happiest once fixed!

 I'm currently unpacking my laptop to push the fix so that it appears
 in today's snapshot.

 Excellent work!


 I fear there is some more work to do on this patch.
 I made some tests on ssl and it looks to be broken since this commit :-(

 The shortest configuration I could find to reproduce the issue is:
   listen test
 bind 0.0.0.0:443 ssl crt cert.pem
 mode http
 timeout server 5s
 timeout client 5s

 When a request is received by haproxy, the CPU rises to 100% in an
 epoll_wait loop (the timeouts are here to prevent an unlimited loop).

 $ curl -k https://localhost/
 ...
 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
 epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
 ...
 The same issue occurs when a server is declared.

 The same also occurs when the proxy is in clear HTTP and a server is in
 HTTPS:
   listen test
 bind 0.0.0.0:80
 mode http
 timeout server 5s
 timeout client 5s
 server ssl_backend 127.0.0.1:443 ssl

 $ curl http://localhost/
 ...
 epoll_wait(3, {}, 200, 0)   = 0
 epoll_wait(3, {}, 200, 0)   = 0
 epoll_wait(3, {}, 200, 0)   = 0
 epoll_wait(3, {}, 200, 0)   = 0
 epoll_wait(3, {}, 200, 0)   = 0
 ...


 --
 Cyril Bonté




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-14 Thread Willy Tarreau
Hi Cyril,

On Tue, Jan 14, 2014 at 08:23:00AM +0100, Willy Tarreau wrote:
 Hey, excellent catch! You're absolutely right. I'm totally ashamed
 for not having found it while reading the code. I was searching for
 a place where a wrong computation could lead to something larger
 than the buffer and forgot to check for multiple reads of the
 buffer's size :-)

Now thinking about it a little bit more, I think we have an API problem
in fact. The raw_sock_to_buf() function says:

/* Receive up to count bytes from connection conn's socket and store them
 * into buffer buf. The caller must ensure that count is always smaller
 * than the buffer's size.
 */
 
But as you found, this is misleading as it doesn't work that well, since
the caller needs to take care of not asking for too much data. So I'm
thinking about changing the API instead so that the caller doesn't have
to care about this and that only the read functions do. Anyway, they
already care about free space wrapping at the end of the buffer.

So I'd rather fix raw_sock_to_buf() and ssl_sock_to_buf() with a patch
like this one, and simplify the logic at some call places. It would make
the code much more robust and protect us against such bugs in the future.
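
Concretely, the read function itself can compute how much contiguous room is
really available on each iteration. A standalone sketch of that computation
(struct fields abbreviated from 1.5's struct buffer; the authoritative
version is in the patch below):

#include <stdio.h>

/* Abbreviated from HAProxy 1.5's struct buffer: 'o' output bytes sit
 * before p, 'i' pending input bytes follow it, inside data[0..size). */
struct buffer_sketch {
    char *p;       /* beginning of input data */
    int   size;    /* total storage size */
    int   i;       /* pending input bytes following p */
    int   o;       /* output bytes preceding p */
    char *data;    /* start of storage */
};

/* Contiguous room the next recv() may use, capped by the caller's count:
 * first the tail after p+i, otherwise the wrapped area between data+o
 * and p. Returns 0 when the buffer is full. */
static int contiguous_room(const struct buffer_sketch *buf, int count)
{
    int try = buf->data + buf->size - (buf->p + buf->i);

    if (try <= 0) {
        try = buf->p - (buf->data + buf->o);
        if (try <= 0)
            return 0;
    }
    return try > count ? count : try;
}

int main(void)
{
    char storage[16384];
    struct buffer_sketch b = { storage, sizeof(storage), 4344, 0, storage };

    /* 4344 bytes already pending: only 12040 may be requested next */
    printf("%d\n", contiguous_room(&b, 16384));
    return 0;
}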

Could you please give it a try in your environment?

Thanks,
Willy


diff --git a/src/raw_sock.c b/src/raw_sock.c
index 4dc1c7a..2e3a0cb 100644
--- a/src/raw_sock.c
+++ b/src/raw_sock.c
@@ -226,8 +226,7 @@ int raw_sock_from_pipe(struct connection *conn, struct pipe *pipe)
 
 
 /* Receive up to count bytes from connection conn's socket and store them
- * into buffer buf. The caller must ensure that count is always smaller
- * than the buffer's size. Only one call to recv() is performed, unless the
+ * into buffer buf. Only one call to recv() is performed, unless the
  * buffer wraps, in which case a second call may be performed. The connection's
  * flags are updated with whatever special event is detected (error, read0,
  * empty). The caller is responsible for taking care of those events and
@@ -239,7 +238,7 @@ int raw_sock_from_pipe(struct connection *conn, struct pipe *pipe)
 static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int count)
 {
int ret, done = 0;
-   int try = count;
+   int try;
 
	if (!(conn->flags & CO_FL_CTRL_READY))
return 0;
@@ -258,24 +257,27 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int count
}
}
 
-	/* compute the maximum block size we can read at once. */
-	if (buffer_empty(buf)) {
-		/* let's realign the buffer to optimize I/O */
+	/* let's realign the buffer to optimize I/O */
+	if (buffer_empty(buf))
 		buf->p = buf->data;
-	}
-	else if (buf->data + buf->o < buf->p &&
-		 buf->p + buf->i < buf->data + buf->size) {
-		/* remaining space wraps at the end, with a moving limit */
-		if (try > buf->data + buf->size - (buf->p + buf->i))
-			try = buf->data + buf->size - (buf->p + buf->i);
-	}
 
/* read the largest possible block. For this, we perform only one call
 * to recv() unless the buffer wraps and we exactly fill the first hunk,
 * in which case we accept to do it once again. A new attempt is made on
 * EINTR too.
 */
-	while (try) {
+	while (count > 0) {
+		/* first check if we have some room after p+i */
+		try = buf->data + buf->size - (buf->p + buf->i);
+		/* otherwise continue between data and p-o */
+		if (try <= 0) {
+			try = buf->p - (buf->data + buf->o);
+			if (try <= 0)
+				break;
+		}
+		if (try > count)
+			try = count;
+
 		ret = recv(conn->t.sock.fd, bi_end(buf), try, 0);
 
 		if (ret > 0) {
@@ -291,7 +293,6 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int count
break;
}
count -= ret;
-   try = count;
}
else if (ret == 0) {
goto read0;



Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-14 Thread Willy Tarreau
OK here's a proposed fix which addresses the API issue for both
raw_sock and ssl_sock.

Steve, it would be nice if you could give it a try just to confirm
I didn't miss anything.

Thanks,
Willy

From 3e499a6da1ca070f23083c874aa48895f00d0d6f Mon Sep 17 00:00:00 2001
From: Willy Tarreau w...@1wt.eu
Date: Tue, 14 Jan 2014 11:31:27 +0100
Subject: BUG/MAJOR: connection: fix mismatch between rcv_buf's API and usage
MIME-Version: 1.0
Content-Type: text/plain; charset=latin1
Content-Transfer-Encoding: 8bit

Steve Ruiz reported some reproducible crashes with HTTP health checks
on a certain page returning a huge length. The traces he provided
clearly showed that the recv() call was performed twice for a total
size exceeding the buffer's length.

Cyril Bonté tracked down the problem to be caused by the full buffer
size being passed to rcv_buf() in event_srv_chk_r() instead of passing
just the remaining amount of space. Indeed, this change happened during
the connection rework in 1.5-dev13 with the following commit:

f150317 MAJOR: checks: completely use the connection transport layer

But one of the problems is also that the comments at the top of the
rcv_buf() functions suggest that the caller only has to ensure the
requested size doesn't overflow the buffer's size.

Also, these functions already have to care about the buffer's size to
handle wrapping free space when there are pending data in the buffer.
So let's change the API instead to more closely match what could be
expected from these functions:

- the caller asks for the maximum number of bytes it wants to read;
This means that only the caller is responsible for enforcing the
reserve if it wants to (eg: checks don't).

- the rcv_buf() functions fix their computations to always consider
this size as a max, and always perform validity checks based on
the buffer's free space.

As a result, the code is simplified and reduced, and made more robust
for callers which now just have to care about whether they want the
buffer to be filled or not.

Since the bug was introduced in 1.5-dev13, no backport to stable versions
is needed.
---
 src/checks.c   |  2 +-
 src/raw_sock.c | 31 ---
 src/ssl_sock.c | 29 +++--
 3 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/src/checks.c b/src/checks.c
index 3237304..2274136 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -2065,7 +2065,7 @@ static void tcpcheck_main(struct connection *conn)
goto out_end_tcpcheck;
 
 		if ((conn->flags & CO_FL_WAIT_RD) ||
-		    conn->xprt->rcv_buf(conn, check->bi, buffer_total_space(check->bi)) <= 0) {
+		    conn->xprt->rcv_buf(conn, check->bi, check->bi->size) <= 0) {
 			if (conn->flags & (CO_FL_ERROR | CO_FL_SOCK_RD_SH | CO_FL_DATA_RD_SH)) {
 				done = 1;
 				if ((conn->flags & CO_FL_ERROR) && !check->bi->i) {
diff --git a/src/raw_sock.c b/src/raw_sock.c
index 4dc1c7a..2e3a0cb 100644
--- a/src/raw_sock.c
+++ b/src/raw_sock.c
@@ -226,8 +226,7 @@ int raw_sock_from_pipe(struct connection *conn, struct pipe *pipe)
 
 
 /* Receive up to count bytes from connection conn's socket and store them
- * into buffer buf. The caller must ensure that count is always smaller
- * than the buffer's size. Only one call to recv() is performed, unless the
+ * into buffer buf. Only one call to recv() is performed, unless the
  * buffer wraps, in which case a second call may be performed. The connection's
  * flags are updated with whatever special event is detected (error, read0,
  * empty). The caller is responsible for taking care of those events and
@@ -239,7 +238,7 @@ int raw_sock_from_pipe(struct connection *conn, struct pipe *pipe)
 static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int count)
 {
int ret, done = 0;
-   int try = count;
+   int try;
 
	if (!(conn->flags & CO_FL_CTRL_READY))
return 0;
@@ -258,24 +257,27 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int count
}
}
 
-	/* compute the maximum block size we can read at once. */
-	if (buffer_empty(buf)) {
-		/* let's realign the buffer to optimize I/O */
+	/* let's realign the buffer to optimize I/O */
+	if (buffer_empty(buf))
 		buf->p = buf->data;
-	}
-	else if (buf->data + buf->o < buf->p &&
-		 buf->p + buf->i < buf->data + buf->size) {
-		/* remaining space wraps at the end, with a moving limit */
-		if (try > buf->data + buf->size - (buf->p + buf->i))
-			try = buf->data + buf->size - (buf->p + buf->i);
-	}
 
/* read the largest possible block. For this, we perform only one call
 * to recv() unless the buffer wraps and we exactly fill the first hunk,
 * in which 

Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-14 Thread Cyril Bonté

Hi again Willy,

On 14/01/2014 12:22, Willy Tarreau wrote:

OK here's a proposed fix which addresses the API issue for both
raw_sock and ssl_sock.

Steve, it would be nice if you could give it a try just to confirm
I didn't miss anything.


OK, from my side: now that I'm on the laptop where I can reproduce the
segfault, I confirm it doesn't crash anymore once the patch is applied
(which was predictable from the quick test I made this afternoon).


Let's see if it's OK for Steve too ;-)

--
Cyril Bonté



Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-14 Thread Baptiste
Willy, have you validated this version in our lab as well?

Baptiste
 On 14 Jan 2014 19:21, Cyril Bonté cyril.bo...@free.fr wrote:

 Hi again Willy,

 On 14/01/2014 12:22, Willy Tarreau wrote:

 OK here's a proposed fix which addresses the API issue for both
 raw_sock and ssl_sock.

 Steve, it would be nice if you could give it a try just to confirm
 I didn't miss anything.


 OK, from my side: now that I'm on the laptop where I can reproduce the
 segfault, I confirm it doesn't crash anymore once the patch is applied
 (which was predictable from the quick test I made this afternoon).

 Let's see if it's OK for Steve too ;-)

 --
 Cyril Bonté




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-14 Thread Steve Ruiz
Patched and confirmed in our environment that this is now working / seems
to have fixed the issue. Thanks!

Steve Ruiz


On Tue, Jan 14, 2014 at 3:22 AM, Willy Tarreau w...@1wt.eu wrote:

 OK here's a proposed fix which addresses the API issue for both
 raw_sock and ssl_sock.

 Steve, it would be nice if you could give it a try just to confirm
 I didn't miss anything.

 Thanks,
 Willy





Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-14 Thread Willy Tarreau
On Tue, Jan 14, 2014 at 12:25:37PM -0800, Steve Ruiz wrote:
 Patched and confirmed in our environment that this is now working / seems
 to have fixed the issue. Thanks!

Great, many thanks to you both, guys. We've got rid of another pretty
old bug; these are the ones that make me the happiest once fixed!

I'm currently unpacking my laptop to push the fix so that it appears
in today's snapshot.

Excellent work!
Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Willy Tarreau
Hi again Steve,

On Mon, Jan 13, 2014 at 08:44:08AM +0100, Willy Tarreau wrote:
 Hi Steve,
 
 On Fri, Jan 10, 2014 at 02:16:48PM -0800, Steve Ruiz wrote:
  I'm experimenting with haproxy on a centos6 VM here.  I found that when I
  specified a health check page (option httpchk GET /url), and that page
  didn't exist, we have a large 404 page returned, and that causes haproxy to
  quickly segfault (seems like on the second try GET'ing and parsing the
  page).  I couldn't figure out from the website where to submit a bug, so I
  figure I'll try here first.
  
  Steps to reproduce:
  - setup http backend, with option httpchk and httpcheck expect string x.
  Make option httpchk point to a non-existent page
  - On backend server, set it up to serve large 404 response (in my case, the
  404 page is 186kB, as it has an inline graphic and inline css)
  - Start haproxy, and wait for it to segfault
  
  I wasn't sure exactly what was causing this at first, so I did some work to
  narrow it down with GDB.  The variable values from gdb led me to the cause
  on my side, and hopefully can help you fix the issue.  I could not make
  this work with simply a large page for the http response - in that case, it
  seems to work as advertised, only inspecting the response up to
  tune.chksize (default 16384 as i've left it).  But if I do this with a 404,
  it seems to kill it.  Let me know what additional information you need if
  any.  Thanks and kudos for the great bit of software!
 
 Thanks for all these details. I remember that the http-expect code puts
 a zero at the end of the received buffer prior to looking up the string.
 But it might be possible that there would be some cases where it doesn't
 do it, or maybe it dies after restoring it. Another thing I'm thinking
 about is that we're using the trash buffer for many operations and I'm
 realizing that the check buffer's size might possibly be larger :-/

I'm a bit puzzled: not only can I not reproduce the issue, but I also do
not see in the code how this could happen, so I must be missing something.
Could you please post the output of strace -tt on haproxy when it does
this? Especially the last checks? I'm suspecting an anomaly in the receive
buffer size calculation, but all I read here seems fine, which puzzles me.

Thanks!
Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Tim Prepscius
Willy,

Can you take me off of this list?

Unsubscribing doesn't work. I have no idea why.  I've tried many times.
The last time I tried, I got back a message that gmail was identified
as a spammer.


Here is a sample of my unsubscribe message:

- snip 

MIME-Version: 1.0
Received: by 10.140.86.244 with HTTP; Thu, 9 Jan 2014 19:38:56 -0800 (PST)
Date: Thu, 9 Jan 2014 22:38:56 -0500
Delivered-To: timprepsc...@gmail.com
Message-ID: CAAJ3AvX7XuqRAQkDGZ4k8DqVDsp2Mt=2mnwsyrhsvnb_bwh...@mail.gmail.com
Subject: unsubscribe
From: Tim Prepscius timprepsc...@gmail.com
To: haproxy+unsubscr...@formilux.org
Content-Type: text/plain; charset=ISO-8859-1

unsubscribe

- snip 

Thank you,

-tim

On 1/13/14, Willy Tarreau w...@1wt.eu wrote:
 Hi again Steve,

 On Mon, Jan 13, 2014 at 08:44:08AM +0100, Willy Tarreau wrote:
 Hi Steve,

 On Fri, Jan 10, 2014 at 02:16:48PM -0800, Steve Ruiz wrote:
  I'm experimenting with haproxy on a centos6 VM here.  I found that when
  I
  specified a health check page (option httpchk GET /url), and that page
  didn't exist, we have a large 404 page returned, and that causes haproxy
  to
  quickly segfault (seems like on the second try GET'ing and parsing the
  page).  I couldn't figure out from the website where to submit a bug, so
  I
  figure I'll try here first.
 
  Steps to reproduce:
  - setup http backend, with option httpchk and httpcheck expect string
  x.
  Make option httpchk point to a non-existent page
  - On backend server, set it up to serve large 404 response (in my case,
  the
  404 page is 186kB, as it has an inline graphic and inline css)
  - Start haproxy, and wait for it to segfault
 
  I wasn't sure exactly what was causing this at first, so I did some work
  to
  narrow it down with GDB.  The variable values from gdb led me to the
  cause
  on my side, and hopefully can help you fix the issue.  I could not make
  this work with simply a large page for the http response - in that case,
  it
  seems to work as advertised, only inspecting the response up to
  tune.chksize (default 16384 as i've left it).  But if I do this with a
  404,
  it seems to kill it.  Let me know what additional information you need
  if
  any.  Thanks and kudos for the great bit of software!

 Thanks for all these details. I remember that the http-expect code puts
 a zero at the end of the received buffer prior to looking up the string.
 But it might be possible that there would be some cases where it doesn't
 do it, or maybe it dies after restoring it. Another thing I'm thinking
 about is that we're using the trash buffer for many operations and I'm
 realizing that the check buffer's size might possibly be larger :-/

 I'm a bit puzzled: not only can I not reproduce the issue, but I also do
 not see in the code how this could happen, so I must be missing something.
 Could you please post the output of strace -tt on haproxy when it does
 this? Especially the last checks? I'm suspecting an anomaly in the receive
 buffer size calculation, but all I read here seems fine, which puzzles me.

 Thanks!
 Willy






Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Willy Tarreau
Hi Tim,

On Mon, Jan 13, 2014 at 12:25:30PM -0500, Tim Prepscius wrote:
 Willy,
 
 Can you take me off of this list?

done!

 Unsubscribing doesn't work. I have no idea why.  I've tried many times.
 The last time I tried, I got back a message that gmail was identified
 as a spammer.

This is the reason. From time to time, gmail seems to be marked as a spammer
by some RBLs. I absolutely hate RBLs beyond imagination for this exact reason:
clueless bots marking any sender as a spammer, sometimes helped by stupid or
arrogant people. But anyway, they help keep the spam rate low enough, so we
keep them. Overall it doesn't work too badly anyway.

Regards,
Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Willy Tarreau
On Mon, Jan 13, 2014 at 10:10:45AM -0800, Steve Ruiz wrote:
 sure thing, trace attached.  Looking at the page returned, the only strange
 thing I can see is that there are extremely long lines in the response -
 I'm guessing on the order of 100k / line.

I also tried this but failed to see the issue. The string is looked up
using strstr() so it's insensitive to this. I've looked at how status
messages were reported and did not find a place where a copy of the
output was returned. But I'll keep digging with these elements in mind.

 I'm attaching our error doc as well; please don't share this as it's
 proprietary.

Steve, you're posting to a public mailing list! Unfortunately it's too late
now :-(

 I'm guessing if you're allocating a certain buffer space and doing a
 read-line(), that could do it.

That's exactly why I was interested in the very useful information you provided
above.

 Let me know if you need anything else.

I'll check what I can do with this, thank you very much!

Best regards,
Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Willy Tarreau
There's something excellent in your trace:

09:52:29.759117 sendto(1, "GET /cp/testcheck.php HTTP/1.0\r\n"..., 34, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 34
09:52:29.759357 epoll_ctl(0, EPOLL_CTL_MOD, 1, {EPOLLIN|0x2000, {u32=1, u64=1}}) = 0
09:52:29.759487 gettimeofday({1389635549, 759527}, NULL) = 0
09:52:29.759603 epoll_wait(0, {{EPOLLIN, {u32=1, u64=1}}}, 200, 1000) = 1
09:52:29.768407 gettimeofday({1389635549, 768449}, NULL) = 0
09:52:29.768529 recvfrom(1, "HTTP/1.1 404 Not Found\r\nDate: Mo"..., 16384, 0, NULL, NULL) = 4344
09:52:29.768754 gettimeofday({1389635549, 768796}, NULL) = 0
09:52:29.768873 epoll_wait(0, {{EPOLLIN, {u32=1, u64=1}}}, 200, 1000) = 1
09:52:29.769096 gettimeofday({1389635549, 769137}, NULL) = 0
09:52:29.769309 recvfrom(1, "l .2s ease-in-out}.img-circle{bo"..., 16384, 0, NULL, NULL) = 16384
09:52:29.769597 recvfrom(1, NULL, 2147483647, MSG_TRUNC|MSG_DONTWAIT|MSG_NOSIGNAL, NULL, NULL) = 31400
09:52:29.769751 recvfrom(1, 0, 2147483647, 16480, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
09:52:29.769933 setsockopt(1, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
09:52:29.770087 close(1) = 0

As you can see, we first read 4kB, then read an extra 16kB on top of it, so
for sure we overflow the read buffer. How this is possible is still a mystery
but now I'll dig along this track. I suspect we erroneously start to flush
the buffer at some point where we should not.
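
To spell out the arithmetic (sizes taken from the recvfrom() lines above;
a trivial sketch):

#include <stdio.h>

int main(void)
{
    int size  = 16384;  /* check buffer size (tune.chksize default)  */
    int first = 4344;   /* stored by the first recvfrom() above      */
    int asked = 16384;  /* full size requested again the second time */

    printf("room left after first read: %d\n", size - first);           /* 12040 */
    printf("bytes past the buffer end : %d\n", asked - (size - first)); /* 4344  */
    return 0;
}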

Thank you!

Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Cyril Bonté

Hi Willy,

On 13/01/2014 19:19, Willy Tarreau wrote:

There's something excellent in your trace:

09:52:29.759117 sendto(1, "GET /cp/testcheck.php HTTP/1.0\r\n"..., 34, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 34
09:52:29.759357 epoll_ctl(0, EPOLL_CTL_MOD, 1, {EPOLLIN|0x2000, {u32=1, u64=1}}) = 0
09:52:29.759487 gettimeofday({1389635549, 759527}, NULL) = 0
09:52:29.759603 epoll_wait(0, {{EPOLLIN, {u32=1, u64=1}}}, 200, 1000) = 1
09:52:29.768407 gettimeofday({1389635549, 768449}, NULL) = 0
09:52:29.768529 recvfrom(1, "HTTP/1.1 404 Not Found\r\nDate: Mo"..., 16384, 0, NULL, NULL) = 4344
09:52:29.768754 gettimeofday({1389635549, 768796}, NULL) = 0
09:52:29.768873 epoll_wait(0, {{EPOLLIN, {u32=1, u64=1}}}, 200, 1000) = 1
09:52:29.769096 gettimeofday({1389635549, 769137}, NULL) = 0
09:52:29.769309 recvfrom(1, "l .2s ease-in-out}.img-circle{bo"..., 16384, 0, NULL, NULL) = 16384
09:52:29.769597 recvfrom(1, NULL, 2147483647, MSG_TRUNC|MSG_DONTWAIT|MSG_NOSIGNAL, NULL, NULL) = 31400
09:52:29.769751 recvfrom(1, 0, 2147483647, 16480, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
09:52:29.769933 setsockopt(1, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
09:52:29.770087 close(1) = 0

As you can see, we first read 4kB, then read an extra 16kB on top of it, so
for sure we overflow the read buffer. How this is possible is still a mystery
but now I'll dig along this track. I suspect we erroneously start to flush
the buffer at some point where we should not.


I don't know if this is of any help because I don't have enough details 
yet, but I just reproduced segfaults while playing with the configuration 
provided by Steve.


To reproduce it on my laptop, it's quite easy: generate a lot of 
headers, and send the content of 404.html.


Here is a PHP script I used to emulate the check:
<?php
for ($i = 0; $i < 640; $i++) {
    header("X-h$i: $i");
}
readfile("404.html");
?>

There's something strange in the values I sent to the debug output. In 
bo_putblk(), the half variable could have a negative value, which then 
segfaults when calling memcpy().
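
(That is fatal because memcpy() takes a size_t: a negative int length
converts to a huge unsigned value. A minimal standalone sketch of the
failure mode, not the actual bo_putblk() code:)

#include <stdio.h>
#include <string.h>

int main(void)
{
    char dst[16], src[16] = "0123456789abcde";
    int half = -8;      /* stands in for a miscomputed block length */

    /* -8 becomes 18446744073709551608 with a 64-bit size_t */
    printf("as size_t: %zu\n", (size_t)half);

    if (half > 0)       /* without such a guard, memcpy() runs far off the buffer */
        memcpy(dst, src, (size_t)half);
    return 0;
}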


Now that I can reproduce a segfault, I'll try to make some more tests 
tomorrow (only after work). But I believe you'll already have found the 
reason by then ;-)


--
Cyril Bonté



Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Cyril Bonté

Hi again Willy,

On 14/01/2014 00:51, Cyril Bonté wrote:

I don't know if this is of any help because I don't have enough details
yet, but I just reproduced segfaults while playing with the configuration
provided by Steve.

To reproduce it on my laptop, it's quite easy: generate a lot of
headers, and send the content of 404.html.

Here is a PHP script I used to emulate the check:
<?php
for ($i = 0; $i < 640; $i++) {
    header("X-h$i: $i");
}
readfile("404.html");
?>

There's something strange in the values I sent to the debug output. In
bo_putblk(), the half variable could have a negative value, which then
segfaults when calling memcpy().

Now that I can reproduce a segfault, I'll try to make some more tests
tomorrow (only after work). But I believe you'll already have found the
reason by then ;-)


Well, I couldn't leave my debug session in its current state.
Can you confirm that this patch could fix the issue? I think this 
prevents a buffer overflow when waiting for more data.

Currently, I can't reproduce segfaults anymore once it is applied.

Now it's time to sleep some hours ;-)

--
Cyril Bonté
diff --git a/src/checks.c b/src/checks.c
index 115cc85..abdc333 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -1031,7 +1031,7 @@ static void event_srv_chk_r(struct connection *conn)
 
 	done = 0;
 
-	conn->xprt->rcv_buf(conn, check->bi, check->bi->size);
+	conn->xprt->rcv_buf(conn, check->bi, buffer_total_space(check->bi));
 	if (conn->flags & (CO_FL_ERROR | CO_FL_SOCK_RD_SH | CO_FL_DATA_RD_SH)) {
 		done = 1;
 		if ((conn->flags & CO_FL_ERROR) && !check->bi->i) {
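
(For readers unfamiliar with the 1.5 buffer API: buffer_total_space()
returns the free room left in the buffer, which is the most that may safely
be requested. A sketch of the idea, assuming the 1.5 field meanings of
i (pending input) and o (pending output):)

/* Sketch of the idea behind buffer_total_space(): free room is the total
 * size minus all pending data (assumed field meanings from 1.5). */
static inline int buffer_total_space_sketch(int size, int i, int o)
{
    return size - (i + o);
}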


Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Willy Tarreau
Hi Cyril!

On Tue, Jan 14, 2014 at 02:51:41AM +0100, Cyril Bonté wrote:
 On 14/01/2014 00:51, Cyril Bonté wrote:
 Well, I couldn't leave my debug session in its current state.

I know what it's like when you go to bed and cannot sleep, eyes
wide open, thinking about your last gdb output :-)

 Can you confirm that this patch could fix the issue? I think this 
 prevents a buffer overflow when waiting for more data.
 Currently, I can't reproduce segfaults anymore once it is applied.

Hey, excellent catch! You're absolutely right. I'm totally ashamed
for not having found it while reading the code. I was searching for
a place where a wrong computation could lead to something larger
than the buffer and forgot to check for multiple reads of the
buffer's size :-)

 Now it's time to sleep some hours ;-)

Yeah you deserve it.

Steve, please also confirm that Cyril's patch fixes your segfault
(I'm sure it does given the traces you provided).

Cyril, feel free to send it to me with a few lines of commit message,
I'll merge it. Just for the record, the bug was introduced in 1.5-dev13
by this patch:

   f150317 MAJOR: checks: completely use the connection transport layer

Thanks!
Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-12 Thread Willy Tarreau
Hi Steve,

On Fri, Jan 10, 2014 at 02:16:48PM -0800, Steve Ruiz wrote:
 I'm experimenting with haproxy on a centos6 VM here.  I found that when I
 specified a health check page (option httpchk GET /url), and that page
 didn't exist, we have a large 404 page returned, and that causes haproxy to
 quickly segfault (seems like on the second try GET'ing and parsing the
 page).  I couldn't figure out from the website where to submit a bug, so I
 figure I'll try here first.
 
 Steps to reproduce:
 - setup http backend, with option httpchk and httpcheck expect string x.
 Make option httpchk point to a non-existent page
 - On backend server, set it up to serve large 404 response (in my case, the
 404 page is 186kB, as it has an inline graphic and inline css)
 - Start haproxy, and wait for it to segfault
 
 I wasn't sure exactly what was causing this at first, so I did some work to
 narrow it down with GDB.  The variable values from gdb led me to the cause
 on my side, and hopefully can help you fix the issue.  I could not make
 this work with simply a large page for the http response - in that case, it
 seems to work as advertised, only inspecting the response up to
 tune.chksize (default 16384 as i've left it).  But if I do this with a 404,
 it seems to kill it.  Let me know what additional information you need if
 any.  Thanks and kudos for the great bit of software!

Thanks for all these details. I remember that the http-expect code puts
a zero at the end of the received buffer prior to looking up the string.
But it might be possible that there would be some cases where it doesn't
do it, or maybe it dies after restoring it. Another thing I'm thinking
about is that we're using the trash buffer for many operations and I'm
realizing that the check buffer's size might possibly be larger :-/
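
(For reference, the lookup pattern described above, sketched with
hypothetical names; note it needs one spare byte past the received data,
which is exactly where a size miscalculation would hurt:)

#include <string.h>

/* Sketch of the http-expect lookup pattern: temporarily NUL-terminate
 * the received bytes, search, then restore the overwritten byte.
 * Assumes data[] has at least one spare byte after 'len'. */
static int expect_string_sketch(char *data, int len, const char *pattern)
{
    char saved = data[len];
    int found;

    data[len] = '\0';
    found = strstr(data, pattern) != NULL;
    data[len] = saved;
    return found;
}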

In your case the check indeed died on the second request so it's out of
context.

I'll try to reproduce this and fix it, thanks very much for your valuable
information!

Willy




Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-10 Thread Steve Ruiz
I'm experimenting with haproxy on a centos6 VM here.  I found that when I
specified a health check page (option httpchk GET /url), and that page
didn't exist, we have a large 404 page returned, and that causes haproxy to
quickly segfault (seems like on the second try GET'ing and parsing the
page).  I couldn't figure out from the website where to submit a bug, so I
figure I'll try here first.

Steps to reproduce:
- set up an http backend, with option httpchk and http-check expect string x.
Make option httpchk point to a non-existent page
- On backend server, set it up to serve large 404 response (in my case, the
404 page is 186kB, as it has an inline graphic and inline css)
- Start haproxy, and wait for it to segfault

I wasn't sure exactly what was causing this at first, so I did some work to
narrow it down with GDB.  The variable values from gdb led me to the cause
on my side, and hopefully can help you fix the issue.  I could not make
this work with simply a large page for the http response - in that case, it
seems to work as advertised, only inspecting the response up to
tune.chksize (default 16384 as i've left it).  But if I do this with a 404,
it seems to kill it.  Let me know what additional information you need if
any.  Thanks and kudos for the great bit of software!


#haproxy config:
#-
# Example configuration for a possible web application.  See the
# full configuration options online.
#
#   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#-

# Help in developing config here:
# https://www.twilio.com/engineering/2013/10/16/haproxy


#-
# Global settings
#-
global
# to have these messages end up in /var/log/haproxy.log you will
# need to:
#
# 1) configure syslog to accept network log events.  This is done
#by adding the '-r' option to the SYSLOGD_OPTIONS in
#/etc/sysconfig/syslog
#
# 2) configure local2 events to go to the /var/log/haproxy.log
#   file. A line like the following can be added to
#   /etc/sysconfig/syslog
#
#local2.*   /var/log/haproxy.log
#
log 127.0.0.1 local2 info

chroot  /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
userhaproxy
group   haproxy
daemon

#enable stats
stats socket /tmp/haproxy.sock

listen ha_stats :8088
balance source
mode http
timeout client 3ms
stats enable
stats auth haproxystats:foobar
stats uri /haproxy?stats

#-
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#-
defaults
modehttp
log global
option  httplog
option  dontlognull
#keep persistent client connection open
option  http-server-close
option forwardfor   except 127.0.0.0/8
option  redispatch
# Limit number of retries - total time trying to connect = connect timeout * (#retries + 1)
retries 2
timeout http-request10s
timeout queue   1m
#timeout opening a tcp connection to server - should be shorter than timeout client and server
timeout connect 3100
timeout client  30s
timeout server  30s
timeout http-keep-alive 10s
timeout check   10s
maxconn 3000

#-
# main frontend which proxys to the backends
#-
frontend https_frontend
bind :80
 redirect scheme https if !{ ssl_fc }

#config help:
https://github.com/observing/balancerbattle/blob/master/haproxy.cfg
 bind *:443 ssl crt /etc/certs/mycert.pem ciphers
RC4-SHA:AES128-SHA:AES:!ADH:!aNULL:!DH:!EDH:!eNULL
mode http
 default_backend webapp

#-
# Main backend for web application servers
#-
backend webapp
balance roundrobin
#Insert cookie SERVERID to pin it to one leg
cookie SERVERID insert nocache indirect
#http check should pull url below
option httpchk GET /cp/testcheck.html HTTP/1.0
#option httpchk GET /cp/testcheck.php HTTP/1.0
#http check should find string below in response to be considered up
http-check expect string good
#Define servers - inter=interval of 5s, rise 2=become avail after 2 successful checks, fall 3=take out after 3 fails
  

Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-10 Thread Baptiste
Hi Steve,

Could you give tcp-check a try and tell us if you have the same issue.
In your backend, turn your httpchk related directives into:
  option tcp-check
  tcp-check send GET\ /cp/testcheck.html\ HTTP/1.0\r\n
  tcp-check send \r\n
  tcp-check expect string good

Baptiste


On Fri, Jan 10, 2014 at 11:16 PM, Steve Ruiz ste...@mirth.com wrote:
 I'm experimenting with haproxy on a centos6 VM here.  I found that when I
 specified a health check page (option httpchk GET /url), and that page
 didn't exist, we have a large 404 page returned, and that causes haproxy to
 quickly segfault (seems like on the second try GET'ing and parsing the
 page).  I couldn't figure out from the website where to submit a bug, so I
 figure I'll try here first.

 Steps to reproduce:
 - setup http backend, with option httpchk and httpcheck expect string x.
 Make option httpchk point to a non-existent page
 - On backend server, set it up to serve large 404 response (in my case, the
 404 page is 186kB, as it has an inline graphic and inline css)
 - Start haproxy, and wait for it to segfault

 I wasn't sure exactly what was causing this at first, so I did some work to
 narrow it down with GDB.  The variable values from gdb led me to the cause
 on my side, and hopefully can help you fix the issue.  I could not make this
 work with simply a large page for the http response - in that case, it seems
 to work as advertised, only inspecting the response up to tune.chksize
 (default 16384 as i've left it).  But if I do this with a 404, it seems to
 kill it.  Let me know what additional information you need if any.  Thanks
 and kudos for the great bit of software!


 #haproxy config:
 #-
 # Example configuration for a possible web application.  See the
 # full configuration options online.
 #
 #   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
 #
 #-

 # Help in developing config here:
 # https://www.twilio.com/engineering/2013/10/16/haproxy


 #-
 # Global settings
 #-
 global
 # to have these messages end up in /var/log/haproxy.log you will
 # need to:
 #
 # 1) configure syslog to accept network log events.  This is done
 #by adding the '-r' option to the SYSLOGD_OPTIONS in
 #/etc/sysconfig/syslog
 #
 # 2) configure local2 events to go to the /var/log/haproxy.log
 #   file. A line like the following can be added to
 #   /etc/sysconfig/syslog
 #
 #local2.*   /var/log/haproxy.log
 #
 log 127.0.0.1 local2 info

 chroot  /var/lib/haproxy
 pidfile /var/run/haproxy.pid
 maxconn 4000
 userhaproxy
 group   haproxy
 daemon

 #enable stats
 stats socket /tmp/haproxy.sock

 listen ha_stats :8088
 balance source
 mode http
 timeout client 3ms
 stats enable
 stats auth haproxystats:foobar
 stats uri /haproxy?stats

 #-
 # common defaults that all the 'listen' and 'backend' sections will
 # use if not designated in their block
 #-
 defaults
 modehttp
 log global
 option  httplog
 option  dontlognull
  #keep persistent client connection open
 option  http-server-close
 option forwardfor   except 127.0.0.0/8
 option  redispatch
 # Limit number of retries - total time trying to connect = connect
 timeout * (#retries + 1)
 retries 2
 timeout http-request10s
 timeout queue   1m
 #timeout opening a tcp connection to server - should be shorter than
 timeout client and server
 timeout connect 3100
 timeout client  30s
 timeout server  30s
 timeout http-keep-alive 10s
 timeout check   10s
 maxconn 3000

 #-
 # main frontend which proxys to the backends
 #-
 frontend https_frontend
 bind :80
 redirect scheme https if !{ ssl_fc }

 #config help:
 https://github.com/observing/balancerbattle/blob/master/haproxy.cfg
 bind *:443 ssl crt /etc/certs/mycert.pem ciphers
 RC4-SHA:AES128-SHA:AES:!ADH:!aNULL:!DH:!EDH:!eNULL
 mode http
 default_backend webapp

 #-
 # Main backend for web application servers
 #-
 backend webapp
 balance roundrobin

Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-10 Thread Steve Ruiz
Made those changes, and it seems to be working properly, no segfault yet
after ~2 minutes of checks.  Thanks!

Steve Ruiz
Manager - Hosting Operations
Mirth
ste...@mirth.com ste...@mirthcorp.com


On Fri, Jan 10, 2014 at 3:06 PM, Baptiste bed...@gmail.com wrote:

 Hi Steve,

 Could you give tcp-check a try and tell us if you have the same
 issue.
 In your backend, turn your httpchk related directives into:
   option tcp-check
   tcp-check send GET\ /cp/testcheck.html\ HTTP/1.0\r\n
   tcp-check send \r\n
   tcp-check expect string good

 Baptiste


 On Fri, Jan 10, 2014 at 11:16 PM, Steve Ruiz ste...@mirth.com wrote:
  I'm experimenting with haproxy on a centos6 VM here.  I found that when I
  specified a health check page (option httpchk GET /url), and that page
  didn't exist, we have a large 404 page returned, and that causes haproxy
 to
  quickly segfault (seems like on the second try GET'ing and parsing the
  page).  I couldn't figure out from the website where to submit a bug, so
 I
  figure I'll try here first.
 
  Steps to reproduce:
  - setup http backend, with option httpchk and httpcheck expect string x.
  Make option httpchk point to a non-existent page
  - On backend server, set it up to serve large 404 response (in my case,
 the
  404 page is 186kB, as it has an inline graphic and inline css)
  - Start haproxy, and wait for it to segfault
 
  I wasn't sure exactly what was causing this at first, so I did some work
 to
  narrow it down with GDB.  The variable values from gdb led me to the
 cause
  on my side, and hopefully can help you fix the issue.  I could not make
 this
  work with simply a large page for the http response - in that case, it
 seems
  to work as advertised, only inspecting the response up to tune.chksize
  (default 16384 as i've left it).  But if I do this with a 404, it seems
 to
  kill it.  Let me know what additional information you need if any.
  Thanks
  and kudos for the great bit of software!
 
 
  #haproxy config:
  #-
  # Example configuration for a possible web application.  See the
  # full configuration options online.
  #
  #   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
  #
  #-
 
  # Help in developing config here:
  # https://www.twilio.com/engineering/2013/10/16/haproxy
 
 
  #-
  # Global settings
  #-
  global
  # to have these messages end up in /var/log/haproxy.log you will
  # need to:
  #
  # 1) configure syslog to accept network log events.  This is done
  #by adding the '-r' option to the SYSLOGD_OPTIONS in
  #/etc/sysconfig/syslog
  #
  # 2) configure local2 events to go to the /var/log/haproxy.log
  #   file. A line like the following can be added to
  #   /etc/sysconfig/syslog
  #
  #local2.*   /var/log/haproxy.log
  #
  log 127.0.0.1 local2 info
 
  chroot  /var/lib/haproxy
  pidfile /var/run/haproxy.pid
  maxconn 4000
  userhaproxy
  group   haproxy
  daemon
 
  #enable stats
  stats socket /tmp/haproxy.sock
 
  listen ha_stats :8088
  balance source
  mode http
  timeout client 3ms
  stats enable
  stats auth haproxystats:foobar
  stats uri /haproxy?stats
 
  #-
  # common defaults that all the 'listen' and 'backend' sections will
  # use if not designated in their block
  #-
  defaults
  modehttp
  log global
  option  httplog
  option  dontlognull
  #keep persistent client connection open
  option  http-server-close
  option forwardfor   except 127.0.0.0/8
  option  redispatch
  # Limit number of retries - total time trying to connect = connect
  timeout * (#retries + 1)
  retries 2
  timeout http-request10s
  timeout queue   1m
  #timeout opening a tcp connection to server - should be shorter than
  timeout client and server
  timeout connect 3100
  timeout client  30s
  timeout server  30s
  timeout http-keep-alive 10s
  timeout check   10s
  maxconn 3000
 
  #-
  # main frontend which proxys to the backends
  #-
  frontend https_frontend
  bind :80
  redirect scheme https if !{ ssl_fc }
 
  #config help:
  

Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-10 Thread Baptiste
Well, let's say this is a workaround...
We'll definitely have to fix the bug ;)

Baptiste

On Sat, Jan 11, 2014 at 12:24 AM, Steve Ruiz ste...@mirth.com wrote:
 Made those changes, and it seems to be working properly, no segfault yet
 after ~2 minutes of checks.  Thanks!

 Steve Ruiz
 Manager - Hosting Operations
 Mirth
 ste...@mirth.com


 On Fri, Jan 10, 2014 at 3:06 PM, Baptiste bed...@gmail.com wrote:

 Hi Steve,

  Could you give tcp-check a try and tell us if you have the same
 issue.
 In your backend, turn your httpchk related directives into:
   option tcp-check
   tcp-check send GET\ /cp/testcheck.html\ HTTP/1.0\r\n
   tcp-check send \r\n
   tcp-check expect string good

 Baptiste


 On Fri, Jan 10, 2014 at 11:16 PM, Steve Ruiz ste...@mirth.com wrote:
  I'm experimenting with haproxy on a centos6 VM here.  I found that when
  I
  specified a health check page (option httpchk GET /url), and that page
  didn't exist, we have a large 404 page returned, and that causes haproxy
  to
  quickly segfault (seems like on the second try GET'ing and parsing the
  page).  I couldn't figure out from the website where to submit a bug, so
  I
  figure I'll try here first.
 
  Steps to reproduce:
  - setup http backend, with option httpchk and httpcheck expect string x.
  Make option httpchk point to a non-existent page
  - On backend server, set it up to serve large 404 response (in my case,
  the
  404 page is 186kB, as it has an inline graphic and inline css)
  - Start haproxy, and wait for it to segfault
 
  I wasn't sure exactly what was causing this at first, so I did some work
  to
  narrow it down with GDB.  The variable values from gdb led me to the
  cause
  on my side, and hopefully can help you fix the issue.  I could not make
  this
  work with simply a large page for the http response - in that case, it
  seems
  to work as advertised, only inspecting the response up to tune.chksize
  (default 16384 as i've left it).  But if I do this with a 404, it seems
  to
  kill it.  Let me know what additional information you need if any.
  Thanks
  and kudos for the great bit of software!
 
 
  #haproxy config:
  #-
  # Example configuration for a possible web application.  See the
  # full configuration options online.
  #
  #   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
  #
  #-
 
  # Help in developing config here:
  # https://www.twilio.com/engineering/2013/10/16/haproxy
 
 
  #-
  # Global settings
  #-
  global
  # to have these messages end up in /var/log/haproxy.log you will
  # need to:
  #
  # 1) configure syslog to accept network log events.  This is done
  #by adding the '-r' option to the SYSLOGD_OPTIONS in
  #/etc/sysconfig/syslog
  #
  # 2) configure local2 events to go to the /var/log/haproxy.log
  #   file. A line like the following can be added to
  #   /etc/sysconfig/syslog
  #
  #local2.*   /var/log/haproxy.log
  #
  log 127.0.0.1 local2 info
 
  chroot  /var/lib/haproxy
  pidfile /var/run/haproxy.pid
  maxconn 4000
  userhaproxy
  group   haproxy
  daemon
 
  #enable stats
  stats socket /tmp/haproxy.sock
 
  listen ha_stats :8088
  balance source
  mode http
  timeout client 3ms
  stats enable
  stats auth haproxystats:foobar
  stats uri /haproxy?stats
 
  #-
  # common defaults that all the 'listen' and 'backend' sections will
  # use if not designated in their block
  #-
  defaults
  modehttp
  log global
  option  httplog
  option  dontlognull
   #keep persistent client connection open
  option  http-server-close
  option forwardfor   except 127.0.0.0/8
  option  redispatch
  # Limit number of retries - total time trying to connect = connect
  timeout * (#retries + 1)
  retries 2
  timeout http-request10s
  timeout queue   1m
  #timeout opening a tcp connection to server - should be shorter than
  timeout client and server
  timeout connect 3100
  timeout client  30s
  timeout server  30s
  timeout http-keep-alive 10s
  timeout check   10s
  maxconn 3000
 
  #-
  # main frontend which proxys to the backends
  

Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-10 Thread Steve Ruiz
Thanks for the workaround + super fast response, and glad to help :).

Steve Ruiz
Manager - Hosting Operations
Mirth
ste...@mirth.com ste...@mirthcorp.com


On Fri, Jan 10, 2014 at 3:53 PM, Baptiste bed...@gmail.com wrote:

 Well, let's say this is a workaround...
 We'll definitely have to fix the bug ;)

 Baptiste

 On Sat, Jan 11, 2014 at 12:24 AM, Steve Ruiz ste...@mirth.com wrote:
  Made those changes, and it seems to be working properly, no segfault yet
  after ~2 minutes of checks.  Thanks!
 
  Steve Ruiz
  Manager - Hosting Operations
  Mirth
  ste...@mirth.com
 
 
  On Fri, Jan 10, 2014 at 3:06 PM, Baptiste bed...@gmail.com wrote:
 
  Hi Steve,
 
   Could you give tcp-check a try and tell us if you have the same
  issue.
  In your backend, turn your httpchk related directives into:
option tcp-check
tcp-check send GET\ /cp/testcheck.html\ HTTP/1.0\r\n
tcp-check send \r\n
tcp-check expect string good
 
  Baptiste
 
 
  On Fri, Jan 10, 2014 at 11:16 PM, Steve Ruiz ste...@mirth.com wrote:
   I'm experimenting with haproxy on a centos6 VM here.  I found that
 when
   I
   specified a health check page (option httpchk GET /url), and that page
   didn't exist, we have a large 404 page returned, and that causes
 haproxy
   to
   quickly segfault (seems like on the second try GET'ing and parsing the
   page).  I couldn't figure out from the website where to submit a bug,
 so
   I
   figure I'll try here first.
  
   Steps to reproduce:
   - setup http backend, with option httpchk and httpcheck expect string
 x.
   Make option httpchk point to a non-existent page
   - On backend server, set it up to serve large 404 response (in my
 case,
   the
   404 page is 186kB, as it has an inline graphic and inline css)
   - Start haproxy, and wait for it to segfault
  
   I wasn't sure exactly what was causing this at first, so I did some
 work
   to
   narrow it down with GDB.  The variable values from gdb led me to the
   cause
   on my side, and hopefully can help you fix the issue.  I could not
 make
   this
   work with simply a large page for the http response - in that case, it
   seems
   to work as advertised, only inspecting the response up to tune.chksize
   (default 16384 as i've left it).  But if I do this with a 404, it
 seems
   to
   kill it.  Let me know what additional information you need if any.
   Thanks
   and kudos for the great bit of software!
  
  
   #haproxy config:
   #-
   # Example configuration for a possible web application.  See the
   # full configuration options online.
   #
   #   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
   #
   #-
  
   # Help in developing config here:
   # https://www.twilio.com/engineering/2013/10/16/haproxy
  
  
   #-
   # Global settings
   #-
   global
   # to have these messages end up in /var/log/haproxy.log you will
   # need to:
   #
   # 1) configure syslog to accept network log events.  This is done
   #by adding the '-r' option to the SYSLOGD_OPTIONS in
   #/etc/sysconfig/syslog
   #
   # 2) configure local2 events to go to the /var/log/haproxy.log
   #   file. A line like the following can be added to
   #   /etc/sysconfig/syslog
   #
   #local2.*   /var/log/haproxy.log
   #
   log 127.0.0.1 local2 info
  
   chroot  /var/lib/haproxy
   pidfile /var/run/haproxy.pid
   maxconn 4000
   userhaproxy
   group   haproxy
   daemon
  
   #enable stats
   stats socket /tmp/haproxy.sock
  
   listen ha_stats :8088
   balance source
   mode http
   timeout client 3ms
   stats enable
   stats auth haproxystats:foobar
   stats uri /haproxy?stats
  
   #-
   # common defaults that all the 'listen' and 'backend' sections will
   # use if not designated in their block
   #-
   defaults
   modehttp
   log global
   option  httplog
   option  dontlognull
    #keep persistent client connection open
   option  http-server-close
   option forwardfor   except 127.0.0.0/8
   option  redispatch
   # Limit number of retries - total time trying to connect = connect
   timeout * (#retries + 1)
   retries 2
   timeout http-request10s
   timeout queue   1m
   #timeout opening a tcp connection to server - should be shorter
 than
   timeout client and server
   timeout connect