Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-17 Thread Steve Ruiz
Confirmed on my side as well. No segfault, and no spinning CPU with the
latest patch.

thanks!

Steve Ruiz
Manager - Hosting Operations
Mirth
ste...@mirth.com 


On Fri, Jan 17, 2014 at 10:25 AM, Cyril Bonté  wrote:

> On 17/01/2014 11:14, Willy Tarreau wrote:
>
>> On Fri, Jan 17, 2014 at 11:03:51AM +0100, Willy Tarreau wrote:
>>
>>> On Fri, Jan 17, 2014 at 10:47:01AM +0100, Willy Tarreau wrote:
>>>
>>>> So I might have broken something in the way the "try" value is counted,
>>>> ending up with zero being selected and nothing done. Unfortunately it
>>>> works fine here.
>>>
>>> OK I can reproduce it in 32-bit now. Let's see what happens...
>>
>> OK here's the fix. I'm ashamed for not having noticed this mistake during
>> the change. I ported the raw_sock changes to ssl_sock; it was pretty
>> straightforward, but I missed the condition in the while () loop. And
>> unfortunately, the variable happened to be non-zero on the stack,
>> resulting in something that worked well for me :-/
>>
>> I've pushed the fix.
>
> Great! I didn't have time to try to fix it yesterday.
> Everything is working well now, we can definitely close this bug, that's a
> good thing ;-)
>
> --
> Cyril Bonté
>



Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-17 Thread Cyril Bonté

On 17/01/2014 11:14, Willy Tarreau wrote:

On Fri, Jan 17, 2014 at 11:03:51AM +0100, Willy Tarreau wrote:

On Fri, Jan 17, 2014 at 10:47:01AM +0100, Willy Tarreau wrote:

So I might have broken something in the way the "try" value is counted,
ending up with zero being selected and nothing done. Unfortunately it
works fine here.


OK I can reproduce it in 32-bit now. Let's see what happens...


OK here's the fix. I'm ashamed for not having noticed this mistake during
the change. I ported the raw_sock changes to ssl_sock; it was pretty
straightforward, but I missed the condition in the while () loop. And
unfortunately, the variable happened to be non-zero on the stack,
resulting in something that worked well for me :-/

I've pushed the fix.


Great! I didn't have time to try to fix it yesterday.
Everything is working well now, we can definitely close this bug, that's
a good thing ;-)


--
Cyril Bonté



Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-17 Thread Willy Tarreau
On Fri, Jan 17, 2014 at 11:03:51AM +0100, Willy Tarreau wrote:
> On Fri, Jan 17, 2014 at 10:47:01AM +0100, Willy Tarreau wrote:
> > So I might have broken something in the way the "try" value is counted,
> > ending up with zero being selected and nothing done. Unfortunately it
> > works fine here.
> 
> OK I can reproduce it in 32-bit now. Let's see what happens...

OK here's the fix. I'm ashamed for not having noticed this mistake during
the change. I ported the raw_sock changes to ssl_sock; it was pretty
straightforward, but I missed the condition in the while () loop. And
unfortunately, the variable happened to be non-zero on the stack,
resulting in something that worked well for me :-/

I've pushed the fix.

Thanks guys.

Willy



From 00b0fb9349b8842a5ec2cee9dc4f286c8d3a3685 Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Fri, 17 Jan 2014 11:09:40 +0100
Subject: BUG/MAJOR: ssl: fix breakage caused by recent fix abf08d9
MIME-Version: 1.0
Content-Type: text/plain; charset=latin1
Content-Transfer-Encoding: 8bit

Recent commit abf08d9 ("BUG/MAJOR: connection: fix mismatch between rcv_buf's
API and usage") accidentally broke SSL by relying on an uninitialized value to
enter the read loop.

Many thanks to Cyril Bonté and Steve Ruiz for reporting this issue.
---
 src/ssl_sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 7120ff8..87a2a58 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -1353,7 +1353,7 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int count)
 	 * in which case we accept to do it once again. A new attempt is made on
 	 * EINTR too.
 	 */
-	while (try) {
+	while (count > 0) {
 		/* first check if we have some room after p+i */
 		try = buf->data + buf->size - (buf->p + buf->i);
 		/* otherwise continue between data and p-o */
-- 
1.7.12.1
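
To see why the regression could stay invisible on one machine and spin the CPU on another: after the API change, "try" is only assigned inside the loop, so the old "while (try)" entry test read an uninitialized variable. The sketch below is a simplified stand-in (not HAProxy code; the function and variable names are made up for illustration) that models the two observable outcomes, with "stack_garbage" standing in for whatever value the uninitialized variable inherited:

#include <stdio.h>

/* Models the broken ssl_sock read loop's entry decision. Reading the
 * uninitialized "try" in the real code is undefined behavior; this sketch
 * only simulates the two outcomes reported in this thread. */
static int broken_loop_read_count(int stack_garbage, int count)
{
    int try = stack_garbage;   /* the real code left this uninitialized */
    int reads = 0;

    while (try) {              /* broken entry condition from the port */
        /* ... recv() and buffer bookkeeping would happen here ... */
        reads++;
        count--;
        try = (count > 0);     /* simplified continuation check */
    }
    return reads;
}

int main(void)
{
    /* non-zero garbage: the loop runs and everything appears to work,
     * which is what happened on Willy's machine */
    printf("garbage != 0: %d reads\n", broken_loop_read_count(7, 3));

    /* zero garbage: the loop is never entered, no data is consumed, and
     * a level-triggered epoll_wait() keeps reporting EPOLLIN, matching
     * the 100% CPU spin Cyril and Steve observed */
    printf("garbage == 0: %d reads\n", broken_loop_read_count(0, 3));
    return 0;
}

The committed one-liner makes the entry condition "while (count > 0)", which depends only on the caller-supplied argument and is always initialized.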




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-17 Thread Willy Tarreau
On Fri, Jan 17, 2014 at 10:47:01AM +0100, Willy Tarreau wrote:
> So I might have broken something in the way the "try" value is counted,
> ending up with zero being selected and nothing done. Unfortunately it
> works fine here.

OK I can reproduce it in 32-bit now. Let's see what happens...

Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-17 Thread Willy Tarreau
Hi Cyril,

On Thu, Jan 16, 2014 at 10:48:10PM +0100, Cyril Bonté wrote:
> Hi Willy,
> 
> On 15/01/2014 01:08, Willy Tarreau wrote:
> >On Tue, Jan 14, 2014 at 12:25:37PM -0800, Steve Ruiz wrote:
> >>Patched and confirmed in our environment that this is now working / seems
> >>to have fixed the issue. Thanks!
> >
> >Great, many thanks to both of you. We've got rid of another pretty
> >old bug; these are the ones that make me happiest once fixed!
> >
> >I'm currently unpacking my laptop to push the fix so that it appears
> >in today's snapshot.
> >
> >Excellent work!
> 
> I fear there is some more work to do on this patch.
> I ran some tests on SSL and it looks to have been broken since this commit :-(
> 
> The shortest configuration I could find to reproduce the issue is:
>   listen test
> bind 0.0.0.0:443 ssl crt cert.pem
> mode http
> timeout server 5s
> timeout client 5s
> 
> When a request is received by haproxy, the CPU rises to 100% in an
> epoll_wait loop (the timeouts are there to prevent an endless loop).
> 
> $ curl -k https://localhost/
> ...
> epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
> epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
> epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
> epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
> epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
> ...

So I might have broken something in the way the "try" value is counted,
ending up with zero being selected and nothing done. Unfortunately it
works fine here.

Could you try to single-step through ssl_sock_to_buf in gdb?

I'll keep testing to see if I can reproduce it, and also trying to understand
in which case we could end up with a wrong size computation.

Thanks,
Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-16 Thread Steve Ruiz
Cyril is correct - I simply waited for a segfault, but didn't actually test
through the load balancer. I'm using SSL on haproxy, and yes, when I try to
hit a web page behind haproxy, CPU spins at 100% for a good while.

Steve Ruiz
Manager - Hosting Operations
Mirth
ste...@mirth.com 


On Thu, Jan 16, 2014 at 1:48 PM, Cyril Bonté  wrote:

> Hi Willy,
>
> On 15/01/2014 01:08, Willy Tarreau wrote:
>
>> On Tue, Jan 14, 2014 at 12:25:37PM -0800, Steve Ruiz wrote:
>>
>>> Patched and confirmed in our environment that this is now working / seems
>>> to have fixed the issue. Thanks!
>>>
>>
>> Great, many thanks to both of you. We've got rid of another pretty
>> old bug; these are the ones that make me happiest once fixed!
>>
>> I'm currently unpacking my laptop to push the fix so that it appears
>> in today's snapshot.
>>
>> Excellent work!
>>
>
> I fear there is some more work to do on this patch.
> I ran some tests on SSL and it looks to have been broken since this commit :-(
>
> The shortest configuration I could find to reproduce the issue is:
>   listen test
> bind 0.0.0.0:443 ssl crt cert.pem
> mode http
> timeout server 5s
> timeout client 5s
>
> When a request is received by haproxy, the CPU rises to 100% in an
> epoll_wait loop (the timeouts are there to prevent an endless loop).
>
> $ curl -k https://localhost/
> ...
> epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
> epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
> epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
> epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
> epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
> ...
> The same issue occurs when a server is declared.
>
> The same also occurs when the proxy is in clear http and a server is in
> https:
>   listen test
> bind 0.0.0.0:80
> mode http
> timeout server 5s
> timeout client 5s
> server ssl_backend 127.0.0.1:443 ssl
>
> $ curl http://localhost/
> ...
> epoll_wait(3, {}, 200, 0)   = 0
> epoll_wait(3, {}, 200, 0)   = 0
> epoll_wait(3, {}, 200, 0)   = 0
> epoll_wait(3, {}, 200, 0)   = 0
> epoll_wait(3, {}, 200, 0)   = 0
> ...
>
>
> --
> Cyril Bonté
>



Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-16 Thread Cyril Bonté

Hi Willy,

On 15/01/2014 01:08, Willy Tarreau wrote:

On Tue, Jan 14, 2014 at 12:25:37PM -0800, Steve Ruiz wrote:

Patched and confirmed in our environment that this is now working / seems
to have fixed the issue. Thanks!


Great, many thanks to both of you. We've got rid of another pretty
old bug; these are the ones that make me happiest once fixed!

I'm currently unpacking my laptop to push the fix so that it appears
in today's snapshot.

Excellent work!


I fear there is some more work to do on this patch.
I ran some tests on SSL and it looks to have been broken since this commit :-(

The shortest configuration I could find to reproduce the issue is:
  listen test
bind 0.0.0.0:443 ssl crt cert.pem
mode http
timeout server 5s
timeout client 5s

When a request is received by haproxy, the CPU rises to 100% in an
epoll_wait loop (the timeouts are there to prevent an endless loop).


$ curl -k https://localhost/
...
epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
epoll_wait(3, {{EPOLLIN, {u32=5, u64=5}}}, 200, 0) = 1
...
The same issue occurs when a server is declared.

The same also occurs when the proxy is in clear http and a server is in
https:

  listen test
bind 0.0.0.0:80
mode http
timeout server 5s
timeout client 5s
server ssl_backend 127.0.0.1:443 ssl

$ curl http://localhost/
...
epoll_wait(3, {}, 200, 0)   = 0
epoll_wait(3, {}, 200, 0)   = 0
epoll_wait(3, {}, 200, 0)   = 0
epoll_wait(3, {}, 200, 0)   = 0
epoll_wait(3, {}, 200, 0)   = 0
...


--
Cyril Bonté



Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-14 Thread Willy Tarreau
On Tue, Jan 14, 2014 at 12:25:37PM -0800, Steve Ruiz wrote:
> Patched and confirmed in our environment that this is now working / seems
> to have fixed the issue. Thanks!

Great, many thanks to both of you. We've got rid of another pretty
old bug; these are the ones that make me happiest once fixed!

I'm currently unpacking my laptop to push the fix so that it appears
in today's snapshot.

Excellent work!
Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-14 Thread Steve Ruiz
Patched and confirmed in our environment that this is now working / seems
to have fixed the issue. Thanks!

Steve Ruiz


On Tue, Jan 14, 2014 at 3:22 AM, Willy Tarreau  wrote:

> OK here's a proposed fix which addresses the API issue for both
> raw_sock and ssl_sock.
>
> Steve, it would be nice if you could give it a try just to confirm
> I didn't miss anything.
>
> Thanks,
> Willy
>
>



Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-14 Thread Baptiste
Willy, have you validated this version in our lab as well?

Baptiste
On 14 Jan 2014 19:21, "Cyril Bonté"  wrote:

> Hi again Willy,
>
> On 14/01/2014 12:22, Willy Tarreau wrote:
>
>> OK here's a proposed fix which addresses the API issue for both
>> raw_sock and ssl_sock.
>>
>> Steve, it would be nice if you could give it a try just to confirm
>> I didn't miss anything.
>>
>
> OK, from my side: now that I'm on the laptop where I can reproduce the
> segfault, I confirm it doesn't crash anymore once the patch is applied
> (which was predictable from the quick test I made this afternoon).
>
> Let's see if it's OK for Steve too ;-)
>
> --
> Cyril Bonté
>
>


Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-14 Thread Cyril Bonté

Hi again Willy,

On 14/01/2014 12:22, Willy Tarreau wrote:

OK here's a proposed fix which addresses the API issue for both
raw_sock and ssl_sock.

Steve, it would be nice if you could give it a try just to confirm
I didn't miss anything.


OK, from my side: now that I'm on the laptop where I can reproduce the
segfault, I confirm it doesn't crash anymore once the patch is applied
(which was predictable from the quick test I made this afternoon).


Let's see if it's OK for Steve too ;-)

--
Cyril Bonté



Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-14 Thread Willy Tarreau
OK here's a proposed fix which addresses the API issue for both
raw_sock and ssl_sock.

Steve, it would be nice if you could give it a try just to confirm
I didn't miss anything.

Thanks,
Willy

From 3e499a6da1ca070f23083c874aa48895f00d0d6f Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Tue, 14 Jan 2014 11:31:27 +0100
Subject: BUG/MAJOR: connection: fix mismatch between rcv_buf's API and usage
MIME-Version: 1.0
Content-Type: text/plain; charset=latin1
Content-Transfer-Encoding: 8bit

Steve Ruiz reported some reproducible crashes with HTTP health checks
on a certain page returning a huge length. The traces he provided
clearly showed that the recv() call was performed twice for a total
size exceeding the buffer's length.

Cyril Bonté tracked down the problem to be caused by the full buffer
size being passed to rcv_buf() in event_srv_chk_r() instead of passing
just the remaining amount of space. Indeed, this change happened during
the connection rework in 1.5-dev13 with the following commit :

f150317 MAJOR: checks: completely use the connection transport layer

But one of the problems is also that the comments at the top of the
rcv_buf() functions suggest that the caller only has to ensure the
requested size doesn't overflow the buffer's size.

Also, these functions already have to care about the buffer's size to
handle wrapping free space when there are pending data in the buffer.
So let's change the API instead to more closely match what could be
expected from these functions:

- the caller asks for the maximum amount of bytes it wants to read;
This means that only the caller is responsible for enforcing the
reserve if it wants to (eg: checks don't).

- the rcv_buf() functions fix their computations to always consider
this size as a max, and always perform validity checks based on
the buffer's free space.

As a result, the code is simplified and reduced, and made more robust
for callers which now just have to care about whether they want the
buffer to be filled or not.

Since the bug was introduced in 1.5-dev13, no backport to stable versions
is needed.
---
 src/checks.c   |  2 +-
 src/raw_sock.c | 31 ---
 src/ssl_sock.c | 29 +++--
 3 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/src/checks.c b/src/checks.c
index 3237304..2274136 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -2065,7 +2065,7 @@ static void tcpcheck_main(struct connection *conn)
 			goto out_end_tcpcheck;
 
 		if ((conn->flags & CO_FL_WAIT_RD) ||
-		    conn->xprt->rcv_buf(conn, check->bi, buffer_total_space(check->bi)) <= 0) {
+		    conn->xprt->rcv_buf(conn, check->bi, check->bi->size) <= 0) {
 			if (conn->flags & (CO_FL_ERROR | CO_FL_SOCK_RD_SH | CO_FL_DATA_RD_SH)) {
 				done = 1;
 				if ((conn->flags & CO_FL_ERROR) && !check->bi->i) {
diff --git a/src/raw_sock.c b/src/raw_sock.c
index 4dc1c7a..2e3a0cb 100644
--- a/src/raw_sock.c
+++ b/src/raw_sock.c
@@ -226,8 +226,7 @@ int raw_sock_from_pipe(struct connection *conn, struct pipe *pipe)
 
 
 /* Receive up to <count> bytes from connection <conn>'s socket and store them
- * into buffer <buf>. The caller must ensure that <count> is always smaller
- * than the buffer's size. Only one call to recv() is performed, unless the
+ * into buffer <buf>. Only one call to recv() is performed, unless the
  * buffer wraps, in which case a second call may be performed. The connection's
  * flags are updated with whatever special event is detected (error, read0,
  * empty). The caller is responsible for taking care of those events and
@@ -239,7 +238,7 @@ int raw_sock_from_pipe(struct connection *conn, struct pipe *pipe)
 static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int count)
 {
 	int ret, done = 0;
-	int try = count;
+	int try;
 
 	if (!(conn->flags & CO_FL_CTRL_READY))
 		return 0;
@@ -258,24 +257,27 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int count)
 		}
 	}
 
-	/* compute the maximum block size we can read at once. */
-	if (buffer_empty(buf)) {
-		/* let's realign the buffer to optimize I/O */
+	/* let's realign the buffer to optimize I/O */
+	if (buffer_empty(buf))
 		buf->p = buf->data;
-	}
-	else if (buf->data + buf->o < buf->p &&
-		 buf->p + buf->i < buf->data + buf->size) {
-		/* remaining space wraps at the end, with a moving limit */
-		if (try > buf->data + buf->size - (buf->p + buf->i))
-			try = buf->data + buf->size - (buf->p + buf->i);
-	}
 
 	/* read the largest possible block. For this, we perform only one call
 	 * to recv() unless the buffer wraps and we exactly fill the first hunk,

Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-14 Thread Willy Tarreau
Hi Cyril,

On Tue, Jan 14, 2014 at 08:23:00AM +0100, Willy Tarreau wrote:
> Hey, excellent catch! You're absolutely right. I'm totally ashamed
> for not having found it while reading the code. I was searching for
> a place where a wrong computation could lead to something larger
> than the buffer and forgot to check for multiple reads of the
> buffer's size :-)

Now, thinking about it a little bit more, I think we have an API problem
in fact. The raw_sock_to_buf() function says:

/* Receive up to <count> bytes from connection <conn>'s socket and store them
 * into buffer <buf>. The caller must ensure that <count> is always smaller
 * than the buffer's size.
 */
 
But as you found, this is misleading as it doesn't work that well, since
the caller needs to take care of not asking for too much data. So I'm
thinking about changing the API instead so that the caller doesn't have
to care about this and only the read functions do. Anyway, they
already care about free space wrapping at the end of the buffer.

So I'd rather fix raw_sock_to_buf() and ssl_sock_to_buf() with a patch
like this one, and simplify the logic at some call places. It would make
the code much more robust and protect us against such bugs in the future.

Could you please give it a try in your environment?

Thanks,
Willy


diff --git a/src/raw_sock.c b/src/raw_sock.c
index 4dc1c7a..2e3a0cb 100644
--- a/src/raw_sock.c
+++ b/src/raw_sock.c
@@ -226,8 +226,7 @@ int raw_sock_from_pipe(struct connection *conn, struct pipe *pipe)
 
 
 /* Receive up to <count> bytes from connection <conn>'s socket and store them
- * into buffer <buf>. The caller must ensure that <count> is always smaller
- * than the buffer's size. Only one call to recv() is performed, unless the
+ * into buffer <buf>. Only one call to recv() is performed, unless the
  * buffer wraps, in which case a second call may be performed. The connection's
  * flags are updated with whatever special event is detected (error, read0,
  * empty). The caller is responsible for taking care of those events and
@@ -239,7 +238,7 @@ int raw_sock_from_pipe(struct connection *conn, struct pipe *pipe)
 static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int count)
 {
 	int ret, done = 0;
-	int try = count;
+	int try;
 
 	if (!(conn->flags & CO_FL_CTRL_READY))
 		return 0;
@@ -258,24 +257,27 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int count)
 		}
 	}
 
-	/* compute the maximum block size we can read at once. */
-	if (buffer_empty(buf)) {
-		/* let's realign the buffer to optimize I/O */
+	/* let's realign the buffer to optimize I/O */
+	if (buffer_empty(buf))
 		buf->p = buf->data;
-	}
-	else if (buf->data + buf->o < buf->p &&
-		 buf->p + buf->i < buf->data + buf->size) {
-		/* remaining space wraps at the end, with a moving limit */
-		if (try > buf->data + buf->size - (buf->p + buf->i))
-			try = buf->data + buf->size - (buf->p + buf->i);
-	}
 
 	/* read the largest possible block. For this, we perform only one call
 	 * to recv() unless the buffer wraps and we exactly fill the first hunk,
 	 * in which case we accept to do it once again. A new attempt is made on
 	 * EINTR too.
 	 */
-	while (try) {
+	while (count > 0) {
+		/* first check if we have some room after p+i */
+		try = buf->data + buf->size - (buf->p + buf->i);
+		/* otherwise continue between data and p-o */
+		if (try <= 0) {
+			try = buf->p - (buf->data + buf->o);
+			if (try <= 0)
+				break;
+		}
+		if (try > count)
+			try = count;
+
 		ret = recv(conn->t.sock.fd, bi_end(buf), try, 0);
 
 		if (ret > 0) {
@@ -291,7 +293,6 @@ static int raw_sock_to_buf(struct connection *conn, struct buffer *buf, int count)
 				break;
 			}
 			count -= ret;
-			try = count;
 		}
 		else if (ret == 0) {
 			goto read0;


Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Willy Tarreau
Hi Cyril!

On Tue, Jan 14, 2014 at 02:51:41AM +0100, Cyril Bonté wrote:
> On 14/01/2014 00:51, Cyril Bonté wrote:
> Well, I couldn't leave my debug session in its current state.

I know what it's like when you go to bed and cannot sleep, eyes
wide open, thinking about your last gdb output :-)

> Can you confirm that this patch could fix the issue? I think it
> prevents a buffer overflow when waiting for more data.
> Currently, I can't reproduce segfaults anymore with it applied.

Hey, excellent catch! You're absolutely right. I'm totally ashamed
for not having found it while reading the code. I was searching for
a place where a wrong computation could lead to something larger
than the buffer and forgot to check for multiple reads of the
buffer's size :-)

> Now it's time to sleep some hours ;-)

Yeah you deserve it.

Steve, please also confirm that Cyril's patch fixes your segfault
(I'm sure it does given the traces you provided).

Cyril, feel free to send it to me with a few lines of commit message,
I'll merge it. Just for the record, the bug was introduced in 1.5-dev13
by this patch:

   f150317 MAJOR: checks: completely use the connection transport layer

Thanks!
Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Cyril Bonté

Hi again Willy,

On 14/01/2014 00:51, Cyril Bonté wrote:

I don't know if this is of any help because I don't have enough details
yet, but I just reproduced segfaults while playing with the configuration
provided by Steve.

To reproduce it on my laptop, it's quite easy: generate a lot of
headers, and send the content of 404.html.

Here is a PHP script I used to emulate the check:


There's something strange in the values I sent to the debug output. In
bo_putblk(), the "half" variable could have a negative value, which then
segfaults when calling memcpy().

Now that I can reproduce a segfault, I'll try to run some more tests
tomorrow (only after work). But I believe you'll already have found the
reason by then ;-)


Well, I couldn't leave my debug session in its current state.
Can you confirm that this patch could fix the issue? I think it
prevents a buffer overflow when waiting for more data.

Currently, I can't reproduce segfaults anymore with it applied.

Now it's time to sleep some hours ;-)

--
Cyril Bonté
diff --git a/src/checks.c b/src/checks.c
index 115cc85..abdc333 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -1031,7 +1031,7 @@ static void event_srv_chk_r(struct connection *conn)
 
 	done = 0;
 
-	conn->xprt->rcv_buf(conn, check->bi, check->bi->size);
+	conn->xprt->rcv_buf(conn, check->bi, buffer_total_space(check->bi));
 	if (conn->flags & (CO_FL_ERROR | CO_FL_SOCK_RD_SH | CO_FL_DATA_RD_SH)) {
 		done = 1;
 		if ((conn->flags & CO_FL_ERROR) && !check->bi->i) {


Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Cyril Bonté

Hi Willy,

On 13/01/2014 19:19, Willy Tarreau wrote:

There's something excellent in your trace:

09:52:29.759117 sendto(1, "GET /cp/testcheck.php HTTP/1.0\r\n"..., 34, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 34
09:52:29.759357 epoll_ctl(0, EPOLL_CTL_MOD, 1, {EPOLLIN|0x2000, {u32=1, u64=1}}) = 0
09:52:29.759487 gettimeofday({1389635549, 759527}, NULL) = 0
09:52:29.759603 epoll_wait(0, {{EPOLLIN, {u32=1, u64=1}}}, 200, 1000) = 1
09:52:29.768407 gettimeofday({1389635549, 768449}, NULL) = 0
09:52:29.768529 recvfrom(1, "HTTP/1.1 404 Not Found\r\nDate: Mo"..., 16384, 0, NULL, NULL) = 4344
09:52:29.768754 gettimeofday({1389635549, 768796}, NULL) = 0
09:52:29.768873 epoll_wait(0, {{EPOLLIN, {u32=1, u64=1}}}, 200, 1000) = 1
09:52:29.769096 gettimeofday({1389635549, 769137}, NULL) = 0
09:52:29.769309 recvfrom(1, "l .2s ease-in-out}.img-circle{bo"..., 16384, 0, NULL, NULL) = 16384
09:52:29.769597 recvfrom(1, NULL, 2147483647, MSG_TRUNC|MSG_DONTWAIT|MSG_NOSIGNAL, NULL, NULL) = 31400
09:52:29.769751 recvfrom(1, 0, 2147483647, 16480, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
09:52:29.769933 setsockopt(1, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
09:52:29.770087 close(1) = 0

As you can see, we first read 4kB, then read an extra 16kB on top of it, so
for sure we overflow the read buffer. How this is possible is still a mystery
but now I'll dig along this track. I suspect we erroneously start to flush
the buffer at some point where we should not.


I don't know if this is of any help because I don't have enough details
yet, but I just reproduced segfaults while playing with the configuration
provided by Steve.


To reproduce it on my laptop, it's quite easy: generate a lot of
headers, and send the content of 404.html.


Here is a PHP script I used to emulate the check:


There's something strange in the values I sent to the debug output. In 
bo_putblk(), the "half" variable could have a negative value, which then 
segfaults when calling memcpy().


Now that I can reproduce a segfault, I'll try to run some more tests
tomorrow (only after work). But I believe you'll already have found the
reason by then ;-)


--
Cyril Bonté



Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Willy Tarreau
There's something excellent in your trace:

09:52:29.759117 sendto(1, "GET /cp/testcheck.php HTTP/1.0\r\n"..., 34, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 34
09:52:29.759357 epoll_ctl(0, EPOLL_CTL_MOD, 1, {EPOLLIN|0x2000, {u32=1, u64=1}}) = 0
09:52:29.759487 gettimeofday({1389635549, 759527}, NULL) = 0
09:52:29.759603 epoll_wait(0, {{EPOLLIN, {u32=1, u64=1}}}, 200, 1000) = 1
09:52:29.768407 gettimeofday({1389635549, 768449}, NULL) = 0
09:52:29.768529 recvfrom(1, "HTTP/1.1 404 Not Found\r\nDate: Mo"..., 16384, 0, NULL, NULL) = 4344
09:52:29.768754 gettimeofday({1389635549, 768796}, NULL) = 0
09:52:29.768873 epoll_wait(0, {{EPOLLIN, {u32=1, u64=1}}}, 200, 1000) = 1
09:52:29.769096 gettimeofday({1389635549, 769137}, NULL) = 0
09:52:29.769309 recvfrom(1, "l .2s ease-in-out}.img-circle{bo"..., 16384, 0, NULL, NULL) = 16384
09:52:29.769597 recvfrom(1, NULL, 2147483647, MSG_TRUNC|MSG_DONTWAIT|MSG_NOSIGNAL, NULL, NULL) = 31400
09:52:29.769751 recvfrom(1, 0, 2147483647, 16480, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
09:52:29.769933 setsockopt(1, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
09:52:29.770087 close(1) = 0

As you can see, we first read 4kB, then read an extra 16kB on top of it, so
for sure we overflow the read buffer. How this is possible is still a mystery
but now I'll dig along this track. I suspect we erroneously start to flush
the buffer at some point where we should not.
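
The arithmetic behind that observation, using the numbers from this strace and the default tune.chksize of 16384 mentioned earlier in the thread (a worked illustration; as the rest of the thread goes on to establish, the check code was allowing each recv() to use the full buffer size instead of the remaining free space):

#include <stdio.h>

/* Worked numbers for the trace above: a 16384-byte check buffer, a first
 * recvfrom() returning 4344 bytes, and a second one again allowed to read
 * up to the full 16384. */
int main(void)
{
    int buf_size    = 16384;   /* tune.chksize default */
    int first_read  = 4344;    /* first recvfrom() above */
    int second_read = 16384;   /* second recvfrom() above */

    printf("bytes stored: %d into a %d-byte buffer\n",
           first_read + second_read, buf_size);      /* prints 20728 */
    printf("overflow: %d bytes past the end\n",
           first_read + second_read - buf_size);     /* prints 4344 */
    return 0;
}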

Thank you!

Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Willy Tarreau
On Mon, Jan 13, 2014 at 10:10:45AM -0800, Steve Ruiz wrote:
> sure thing, trace attached.  Looking at the page returned, the only strange
> thing I can see is that there are extremely long lines in the response -
> I'm guessing on the order of 100k / line.

I also tried this but failed to see the issue. The string is looked up
using strstr() so it's insensitive to this. I've looked at how status
messages were reported and did not find a place where a copy of the
output was returned. But I'll keep digging with these elements in mind.

> I'm attaching our error doc as well, please don't share this as it's
> proprietary.

Steve, you're posting to a public mailing list! Unfortunately it's too late
now :-(

> I'm guessing if you're allocating a certain buffer space and doing a
> read-line(), that could do it.

That's exactly why I was interested in the very useful information you provided
above.

> Let me know if you need anything else.

I'll check what I can do with this, thank you very much!

Best regards,
Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Willy Tarreau
Hi Tim,

On Mon, Jan 13, 2014 at 12:25:30PM -0500, Tim Prepscius wrote:
> Willy,
> 
> Can you take me off of this list?

done!

> Unsubscribing doesn't work. I have no idea why.  I've tried many times.
> The last time I tried, I got back a message that gmail was identified
> as a spammer.

This is the reason. From time to time, gmail seems to be marked as a spammer
by some RBLs. I absolutely hate RBLs beyond imagination for this exact reason:
clueless bots marking any sender as a spammer, sometimes helped by stupid or
arrogant people. But they help keep the spam rate low enough, so we keep
them. Overall it doesn't work too badly anyway.

Regards,
Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Tim Prepscius
Willy,

Can you take me off of this list?

Unsubscribing doesn't work. I have no idea why.  I've tried many times.
The last time I tried, I got back a message that gmail was identified
as a spammer.


Here is a sample of my unsubscribe message:

- snip 

MIME-Version: 1.0
Received: by 10.140.86.244 with HTTP; Thu, 9 Jan 2014 19:38:56 -0800 (PST)
Date: Thu, 9 Jan 2014 22:38:56 -0500
Delivered-To: timprepsc...@gmail.com
Message-ID: 
Subject: unsubscribe
From: Tim Prepscius 
To: haproxy+unsubscr...@formilux.org
Content-Type: text/plain; charset=ISO-8859-1

unsubscribe

- snip 

Thank you,

-tim

On 1/13/14, Willy Tarreau  wrote:
> Hi again Steve,
>
> On Mon, Jan 13, 2014 at 08:44:08AM +0100, Willy Tarreau wrote:
>> Hi Steve,
>>
>> On Fri, Jan 10, 2014 at 02:16:48PM -0800, Steve Ruiz wrote:
>> > I'm experimenting with haproxy on a centos6 VM here.  I found that when
>> > I
>> > specified a health check page (option httpchk GET /url), and that page
>> > didn't exist, we have a large 404 page returned, and that causes haproxy
>> > to
>> > quickly segfault (seems like on the second try GET'ing and parsing the
>> > page).  I couldn't figure out from the website where to submit a bug, so
>> > I
>> > figure I'll try here first.
>> >
>> > Steps to reproduce:
>> > - setup http backend, with option httpchk and httpcheck expect string
>> > x.
>> > Make option httpchk point to a non-existent page
>> > - On backend server, set it up to serve large 404 response (in my case,
>> > the
>> > 404 page is 186kB, as it has an inline graphic and inline css)
>> > - Start haproxy, and wait for it to segfault
>> >
>> > I wasn't sure exactly what was causing this at first, so I did some work
>> > to
>> > narrow it down with GDB.  The variable values from gdb led me to the
>> > cause
>> > on my side, and hopefully can help you fix the issue.  I could not make
>> > this work with simply a large page for the http response - in that case,
>> > it
>> > seems to work as advertised, only inspecting the response up to
>> > tune.chksize (default 16384 as i've left it).  But if I do this with a
>> > 404,
>> > it seems to kill it.  Let me know what additional information you need
>> > if
>> > any.  Thanks and kudos for the great bit of software!
>>
>> Thanks for all these details. I remember that the http-expect code puts
>> a zero at the end of the received buffer prior to looking up the string.
>> But there might be some cases where it doesn't do it, or maybe it dies
>> after restoring it. Another thing I'm thinking
>> about is that we're using the trash buffer for many operations and I'm
>> realizing that the check buffer's size might possibly be larger :-/
>
> I'm a bit puzzled: not only can I not reproduce the issue, but I also do
> not see in the code how this could happen, so I must be missing something.
> Could you please post the output of "strace -tt" on haproxy when it does
> this? Especially the last checks? I'm suspecting an anomaly in the receive
> buffer size calculation, but all I read here seems fine, which puzzles me.
>
> Thanks!
> Willy
>
>
>



Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-13 Thread Willy Tarreau
Hi again Steve,

On Mon, Jan 13, 2014 at 08:44:08AM +0100, Willy Tarreau wrote:
> Hi Steve,
> 
> On Fri, Jan 10, 2014 at 02:16:48PM -0800, Steve Ruiz wrote:
> > I'm experimenting with haproxy on a centos6 VM here.  I found that when I
> > specified a health check page (option httpchk GET /url), and that page
> > didn't exist, we have a large 404 page returned, and that causes haproxy to
> > quickly segfault (seems like on the second try GET'ing and parsing the
> > page).  I couldn't figure out from the website where to submit a bug, so I
> > figure I'll try here first.
> > 
> > Steps to reproduce:
> > - setup http backend, with option httpchk and httpcheck expect string x.
> > Make option httpchk point to a non-existent page
> > - On backend server, set it up to serve large 404 response (in my case, the
> > 404 page is 186kB, as it has an inline graphic and inline css)
> > - Start haproxy, and wait for it to segfault
> > 
> > I wasn't sure exactly what was causing this at first, so I did some work to
> > narrow it down with GDB.  The variable values from gdb led me to the cause
> > on my side, and hopefully can help you fix the issue.  I could not make
> > this work with simply a large page for the http response - in that case, it
> > seems to work as advertised, only inspecting the response up to
> > tune.chksize (default 16384 as i've left it).  But if I do this with a 404,
> > it seems to kill it.  Let me know what additional information you need if
> > any.  Thanks and kudos for the great bit of software!
> 
> Thanks for all these details. I remember that the http-expect code puts
> a zero at the end of the received buffer prior to looking up the string.
> But there might be some cases where it doesn't do it, or maybe it dies
> after restoring it. Another thing I'm thinking
> about is that we're using the trash buffer for many operations and I'm
> realizing that the check buffer's size might possibly be larger :-/

I'm a bit puzzled: not only can I not reproduce the issue, but I also do
not see in the code how this could happen, so I must be missing something.
Could you please post the output of "strace -tt" on haproxy when it does
this? Especially the last checks? I'm suspecting an anomaly in the receive
buffer size calculation, but all I read here seems fine, which puzzles me.

Thanks!
Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-12 Thread Willy Tarreau
Hi Steve,

On Fri, Jan 10, 2014 at 02:16:48PM -0800, Steve Ruiz wrote:
> I'm experimenting with haproxy on a centos6 VM here.  I found that when I
> specified a health check page (option httpchk GET /url), and that page
> didn't exist, we have a large 404 page returned, and that causes haproxy to
> quickly segfault (seems like on the second try GET'ing and parsing the
> page).  I couldn't figure out from the website where to submit a bug, so I
> figure I'll try here first.
> 
> Steps to reproduce:
> - setup http backend, with option httpchk and httpcheck expect string x.
> Make option httpchk point to a non-existent page
> - On backend server, set it up to serve large 404 response (in my case, the
> 404 page is 186kB, as it has an inline graphic and inline css)
> - Start haproxy, and wait for it to segfault
> 
> I wasn't sure exactly what was causing this at first, so I did some work to
> narrow it down with GDB.  The variable values from gdb led me to the cause
> on my side, and hopefully can help you fix the issue.  I could not make
> this work with simply a large page for the http response - in that case, it
> seems to work as advertised, only inspecting the response up to
> tune.chksize (default 16384 as i've left it).  But if I do this with a 404,
> it seems to kill it.  Let me know what additional information you need if
> any.  Thanks and kudos for the great bit of software!

Thanks for all these details. I remember that the http-expect code puts
a zero at the end of the received buffer prior to looking up the string.
But there might be some cases where it doesn't do it, or maybe it dies
after restoring it. Another thing I'm thinking
about is that we're using the trash buffer for many operations and I'm
realizing that the check buffer's size might possibly be larger :-/
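
The terminate-scan-restore pattern described here looks roughly like the following sketch (a hedged illustration with made-up names, not the actual checks.c code, which operates on the check's receive buffer):

#include <stdio.h>
#include <string.h>

/* Sketch of "put a zero at the end of the received buffer prior to looking
 * up the string": NUL-terminate so strstr() can scan safely, then restore
 * the saved byte. <size> must leave one spare byte for the terminator. */
static int expect_string(char *data, int len, int size, const char *pattern)
{
    char saved;
    int found;

    if (len >= size)
        len = size - 1;

    saved = data[len];              /* byte the terminator overwrites */
    data[len] = '\0';
    found = strstr(data, pattern) != NULL;
    data[len] = saved;              /* restore before anyone else reads it */
    return found;
}

int main(void)
{
    char buf[64] = "HTTP/1.1 404 Not Found\r\n\r\n...good...";
    printf("match: %d\n",
           expect_string(buf, (int)strlen(buf), sizeof(buf), "good"));
    return 0;
}

If the restore step is skipped, or the terminator lands past the buffer because the caller read too much, exactly the kind of corruption discussed in this thread can follow; this is only a model of the idea, not a claim about where the real bug was.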

In your case the check indeed died on the second request so it's out of
context.

I'll try to reproduce this and fix it, thanks very much for your valuable
information!

Willy




Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-10 Thread Steve Ruiz
Thanks for the workaround + super fast response, and glad to help :).

Steve Ruiz
Manager - Hosting Operations
Mirth
ste...@mirth.com 


On Fri, Jan 10, 2014 at 3:53 PM, Baptiste  wrote:

> Well, let's say this is a workaround...
> We'll definitely have to fix the bug ;)
>
> Baptiste
>
> On Sat, Jan 11, 2014 at 12:24 AM, Steve Ruiz  wrote:
> > Made those changes, and it seems to be working properly, no segfault yet
> > after ~2 minutes of checks.  Thanks!
> >
> > Steve Ruiz
> > Manager - Hosting Operations
> > Mirth
> > ste...@mirth.com
> >
> >
> > On Fri, Jan 10, 2014 at 3:06 PM, Baptiste  wrote:
> >>
> >> Hi Steve,
> >>
> >> Could you give tcp-check a try and tell us if you have the same
> >> issue.
> >> In your backend, turn your httpchk related directives into:
> >>   option tcp-check
> >>   tcp-check send GET\ /cp/testcheck.html\ HTTP/1.0\r\n
> >>   tcp-check send \r\n
> >>   tcp-check expect string good
> >>
> >> Baptiste
> >>
> >>
> >> On Fri, Jan 10, 2014 at 11:16 PM, Steve Ruiz  wrote:
> >> > I'm experimenting with haproxy on a centos6 VM here.  I found that
> when
> >> > I
> >> > specified a health check page (option httpchk GET /url), and that page
> >> > didn't exist, we have a large 404 page returned, and that causes
> haproxy
> >> > to
> >> > quickly segfault (seems like on the second try GET'ing and parsing the
> >> > page).  I couldn't figure out from the website where to submit a bug,
> so
> >> > I
> >> > figure I'll try here first.
> >> >
> >> > Steps to reproduce:
> >> > - setup http backend, with option httpchk and httpcheck expect string
> x.
> >> > Make option httpchk point to a non-existent page
> >> > - On backend server, set it up to serve large 404 response (in my
> case,
> >> > the
> >> > 404 page is 186kB, as it has an inline graphic and inline css)
> >> > - Start haproxy, and wait for it to segfault
> >> >
> >> > I wasn't sure exactly what was causing this at first, so I did some
> work
> >> > to
> >> > narrow it down with GDB.  The variable values from gdb led me to the
> >> > cause
> >> > on my side, and hopefully can help you fix the issue.  I could not
> make
> >> > this
> >> > work with simply a large page for the http response - in that case, it
> >> > seems
> >> > to work as advertised, only inspecting the response up to tune.chksize
> >> > (default 16384 as i've left it).  But if I do this with a 404, it
> seems
> >> > to
> >> > kill it.  Let me know what additional information you need if any.
> >> > Thanks
> >> > and kudos for the great bit of software!
> >> >
> >> >
> >> > #haproxy config:
> >> > #-
> >> > # Example configuration for a possible web application.  See the
> >> > # full configuration options online.
> >> > #
> >> > #   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
> >> > #
> >> > #-
> >> >
> >> > # Help in developing config here:
> >> > # https://www.twilio.com/engineering/2013/10/16/haproxy
> >> >
> >> >
> >> > #-
> >> > # Global settings
> >> > #-
> >> > global
> >> > # to have these messages end up in /var/log/haproxy.log you will
> >> > # need to:
> >> > #
> >> > # 1) configure syslog to accept network log events.  This is done
> >> > #by adding the '-r' option to the SYSLOGD_OPTIONS in
> >> > #/etc/sysconfig/syslog
> >> > #
> >> > # 2) configure local2 events to go to the /var/log/haproxy.log
> >> > #   file. A line like the following can be added to
> >> > #   /etc/sysconfig/syslog
> >> > #
> >> > #local2.*   /var/log/haproxy.log
> >> > #
> >> > log 127.0.0.1 local2 info
> >> >
> >> > chroot  /var/lib/haproxy
> >> > pidfile /var/run/haproxy.pid
> >> > maxconn 4000
> >> > user    haproxy
> >> > group   haproxy
> >> > daemon
> >> >
> >> > #enable stats
> >> > stats socket /tmp/haproxy.sock
> >> >
> >> > listen ha_stats :8088
> >> > balance source
> >> > mode http
> >> > timeout client 3ms
> >> > stats enable
> >> > stats auth haproxystats:foobar
> >> > stats uri /haproxy?stats
> >> >
> >> > #-
> >> > # common defaults that all the 'listen' and 'backend' sections will
> >> > # use if not designated in their block
> >> > #-
> >> > defaults
> >> > mode    http
> >> > log global
> >> > option  httplog
> >> > option  dontlognull
> >> > #keep persistent client connection open
> >> > option  http-server-close
> >> > option forwardfor   except 127.0.0.0

Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-10 Thread Baptiste
Well, let's say this is a workaround...
We'll definitely have to fix the bug ;)

Baptiste

On Sat, Jan 11, 2014 at 12:24 AM, Steve Ruiz  wrote:
> Made those changes, and it seems to be working properly, no segfault yet
> after ~2 minutes of checks.  Thanks!
>
> Steve Ruiz
> Manager - Hosting Operations
> Mirth
> ste...@mirth.com
>
>
> On Fri, Jan 10, 2014 at 3:06 PM, Baptiste  wrote:
>>
>> Hi Steve,
>>
>> Could you give tcp-check a try and tell us if you have the same
>> issue.
>> In your backend, turn your httpchk related directives into:
>>   option tcp-check
>>   tcp-check send GET\ /cp/testcheck.html\ HTTP/1.0\r\n
>>   tcp-check send \r\n
>>   tcp-check expect string good
>>
>> Baptiste
>>
>>
>> On Fri, Jan 10, 2014 at 11:16 PM, Steve Ruiz  wrote:
>> > I'm experimenting with haproxy on a centos6 VM here.  I found that when
>> > I
>> > specified a health check page (option httpchk GET /url), and that page
>> > didn't exist, we have a large 404 page returned, and that causes haproxy
>> > to
>> > quickly segfault (seems like on the second try GET'ing and parsing the
>> > page).  I couldn't figure out from the website where to submit a bug, so
>> > I
>> > figure I'll try here first.
>> >
>> > Steps to reproduce:
>> > - setup http backend, with option httpchk and httpcheck expect string x.
>> > Make option httpchk point to a non-existent page
>> > - On backend server, set it up to serve large 404 response (in my case,
>> > the
>> > 404 page is 186kB, as it has an inline graphic and inline css)
>> > - Start haproxy, and wait for it to segfault
>> >
>> > I wasn't sure exactly what was causing this at first, so I did some work
>> > to
>> > narrow it down with GDB.  The variable values from gdb led me to the
>> > cause
>> > on my side, and hopefully can help you fix the issue.  I could not make
>> > this
>> > work with simply a large page for the http response - in that case, it
>> > seems
>> > to work as advertised, only inspecting the response up to tune.chksize
>> > (default 16384 as i've left it).  But if I do this with a 404, it seems
>> > to
>> > kill it.  Let me know what additional information you need if any.
>> > Thanks
>> > and kudos for the great bit of software!
>> >
>> >
>> > #haproxy config:
>> > #-
>> > # Example configuration for a possible web application.  See the
>> > # full configuration options online.
>> > #
>> > #   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
>> > #
>> > #-
>> >
>> > # Help in developing config here:
>> > # https://www.twilio.com/engineering/2013/10/16/haproxy
>> >
>> >
>> > #-
>> > # Global settings
>> > #-
>> > global
>> > # to have these messages end up in /var/log/haproxy.log you will
>> > # need to:
>> > #
>> > # 1) configure syslog to accept network log events.  This is done
>> > #by adding the '-r' option to the SYSLOGD_OPTIONS in
>> > #/etc/sysconfig/syslog
>> > #
>> > # 2) configure local2 events to go to the /var/log/haproxy.log
>> > #   file. A line like the following can be added to
>> > #   /etc/sysconfig/syslog
>> > #
>> > #local2.*   /var/log/haproxy.log
>> > #
>> > log 127.0.0.1 local2 info
>> >
>> > chroot  /var/lib/haproxy
>> > pidfile /var/run/haproxy.pid
>> > maxconn 4000
> > user    haproxy
>> > group   haproxy
>> > daemon
>> >
>> > #enable stats
>> > stats socket /tmp/haproxy.sock
>> >
>> > listen ha_stats :8088
>> > balance source
>> > mode http
>> > timeout client 3ms
>> > stats enable
>> > stats auth haproxystats:foobar
>> > stats uri /haproxy?stats
>> >
>> > #-
>> > # common defaults that all the 'listen' and 'backend' sections will
>> > # use if not designated in their block
>> > #-
>> > defaults
> > mode    http
>> > log global
>> > option  httplog
>> > option  dontlognull
> > #keep persistent client connection open
>> > option  http-server-close
>> > option forwardfor   except 127.0.0.0/8
>> > option  redispatch
> > # Limit number of retries - total time trying to connect = connect timeout * (#retries + 1)
>> > retries 2
>> > timeout http-request10s
>> > timeout queue   1m
> > #timeout opening a tcp connection to server - should be shorter than timeout client and server
>> > timeout connect 3100
>> > timeout client  30s
>> > ti

Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-10 Thread Steve Ruiz
Made those changes, and it seems to be working properly, no segfault yet
after ~2 minutes of checks.  Thanks!

Steve Ruiz
Manager - Hosting Operations
Mirth
ste...@mirth.com 


On Fri, Jan 10, 2014 at 3:06 PM, Baptiste  wrote:

> Hi Steve,
>
> Could you give tcp-check a try and tell us if you have the same
> issue.
> In your backend, turn your httpchk related directives into:
>   option tcp-check
>   tcp-check send GET\ /cp/testcheck.html\ HTTP/1.0\r\n
>   tcp-check send \r\n
>   tcp-check expect string good
>
> Baptiste
>
>
> On Fri, Jan 10, 2014 at 11:16 PM, Steve Ruiz  wrote:
> > I'm experimenting with haproxy on a centos6 VM here.  I found that when I
> > specified a health check page (option httpchk GET /url), and that page
> > didn't exist, we have a large 404 page returned, and that causes haproxy
> to
> > quickly segfault (seems like on the second try GET'ing and parsing the
> > page).  I couldn't figure out from the website where to submit a bug, so
> I
> > figure I'll try here first.
> >
> > Steps to reproduce:
> > - setup http backend, with option httpchk and httpcheck expect string x.
> > Make option httpchk point to a non-existent page
> > - On backend server, set it up to serve large 404 response (in my case,
> the
> > 404 page is 186kB, as it has an inline graphic and inline css)
> > - Start haproxy, and wait for it to segfault
> >
> > I wasn't sure exactly what was causing this at first, so I did some work
> to
> > narrow it down with GDB.  The variable values from gdb led me to the
> cause
> > on my side, and hopefully can help you fix the issue.  I could not make
> this
> > work with simply a large page for the http response - in that case, it
> seems
> > to work as advertised, only inspecting the response up to tune.chksize
> > (default 16384 as i've left it).  But if I do this with a 404, it seems
> to
> > kill it.  Let me know what additional information you need if any.
>  Thanks
> > and kudos for the great bit of software!
> >
> >
> > #haproxy config:
> > #-
> > # Example configuration for a possible web application.  See the
> > # full configuration options online.
> > #
> > #   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
> > #
> > #-
> >
> > # Help in developing config here:
> > # https://www.twilio.com/engineering/2013/10/16/haproxy
> >
> >
> > #-
> > # Global settings
> > #-
> > global
> > # to have these messages end up in /var/log/haproxy.log you will
> > # need to:
> > #
> > # 1) configure syslog to accept network log events.  This is done
> > #by adding the '-r' option to the SYSLOGD_OPTIONS in
> > #/etc/sysconfig/syslog
> > #
> > # 2) configure local2 events to go to the /var/log/haproxy.log
> > #   file. A line like the following can be added to
> > #   /etc/sysconfig/syslog
> > #
> > #local2.*   /var/log/haproxy.log
> > #
> > log 127.0.0.1 local2 info
> >
> > chroot  /var/lib/haproxy
> > pidfile /var/run/haproxy.pid
> > maxconn 4000
> > user    haproxy
> > group   haproxy
> > daemon
> >
> > #enable stats
> > stats socket /tmp/haproxy.sock
> >
> > listen ha_stats :8088
> > balance source
> > mode http
> > timeout client 3ms
> > stats enable
> > stats auth haproxystats:foobar
> > stats uri /haproxy?stats
> >
> > #-
> > # common defaults that all the 'listen' and 'backend' sections will
> > # use if not designated in their block
> > #-
> > defaults
> > mode    http
> > log global
> > option  httplog
> > option  dontlognull
> > #keep persistent client connection open
> > option  http-server-close
> > option forwardfor   except 127.0.0.0/8
> > option  redispatch
> > # Limit number of retries - total time trying to connect = connect timeout * (#retries + 1)
> > retries 2
> > timeout http-request10s
> > timeout queue   1m
> > #timeout opening a tcp connection to server - should be shorter than timeout client and server
> > timeout connect 3100
> > timeout client  30s
> > timeout server  30s
> > timeout http-keep-alive 10s
> > timeout check   10s
> > maxconn 3000
> >
> > #-
> > # main frontend which proxys to the backends
> > #

Re: Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-10 Thread Baptiste
Hi Steve,

Could you give tcp-check a try and tell us if you have the same issue.
In your backend, turn your httpchk related directives into:
  option tcp-check
  tcp-check send GET\ /cp/testcheck.html\ HTTP/1.0\r\n
  tcp-check send \r\n
  tcp-check expect string good

Baptiste


On Fri, Jan 10, 2014 at 11:16 PM, Steve Ruiz  wrote:
> I'm experimenting with haproxy on a centos6 VM here.  I found that when I
> specified a health check page (option httpchk GET /url), and that page
> didn't exist, we have a large 404 page returned, and that causes haproxy to
> quickly segfault (seems like on the second try GET'ing and parsing the
> page).  I couldn't figure out from the website where to submit a bug, so I
> figure I'll try here first.
>
> Steps to reproduce:
> - setup http backend, with option httpchk and httpcheck expect string x.
> Make option httpchk point to a non-existent page
> - On backend server, set it up to serve large 404 response (in my case, the
> 404 page is 186kB, as it has an inline graphic and inline css)
> - Start haproxy, and wait for it to segfault
>
> I wasn't sure exactly what was causing this at first, so I did some work to
> narrow it down with GDB.  The variable values from gdb led me to the cause
> on my side, and hopefully can help you fix the issue.  I could not make this
> work with simply a large page for the http response - in that case, it seems
> to work as advertised, only inspecting the response up to tune.chksize
> (default 16384 as i've left it).  But if I do this with a 404, it seems to
> kill it.  Let me know what additional information you need if any.  Thanks
> and kudos for the great bit of software!
>
>
> #haproxy config:
> #-
> # Example configuration for a possible web application.  See the
> # full configuration options online.
> #
> #   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
> #
> #-
>
> # Help in developing config here:
> # https://www.twilio.com/engineering/2013/10/16/haproxy
>
>
> #-
> # Global settings
> #-
> global
> # to have these messages end up in /var/log/haproxy.log you will
> # need to:
> #
> # 1) configure syslog to accept network log events.  This is done
> #by adding the '-r' option to the SYSLOGD_OPTIONS in
> #/etc/sysconfig/syslog
> #
> # 2) configure local2 events to go to the /var/log/haproxy.log
> #   file. A line like the following can be added to
> #   /etc/sysconfig/syslog
> #
> #local2.*   /var/log/haproxy.log
> #
> log 127.0.0.1 local2 info
>
> chroot  /var/lib/haproxy
> pidfile /var/run/haproxy.pid
> maxconn 4000
> user    haproxy
> group   haproxy
> daemon
>
> #enable stats
> stats socket /tmp/haproxy.sock
>
> listen ha_stats :8088
> balance source
> mode http
> timeout client 3ms
> stats enable
> stats auth haproxystats:foobar
> stats uri /haproxy?stats
>
> #-
> # common defaults that all the 'listen' and 'backend' sections will
> # use if not designated in their block
> #-
> defaults
> mode    http
> log global
> option  httplog
> option  dontlognull
> #keep persistent client connection open
> option  http-server-close
> option forwardfor   except 127.0.0.0/8
> option  redispatch
> # Limit number of retries - total time trying to connect = connect timeout * (#retries + 1)
> retries 2
> timeout http-request10s
> timeout queue   1m
> #timeout opening a tcp connection to server - should be shorter than timeout client and server
> timeout connect 3100
> timeout client  30s
> timeout server  30s
> timeout http-keep-alive 10s
> timeout check   10s
> maxconn 3000
>
> #-
> # main frontend which proxys to the backends
> #-
> frontend https_frontend
> bind :80
> redirect scheme https if !{ ssl_fc }
>
> #config help: https://github.com/observing/balancerbattle/blob/master/haproxy.cfg
> bind *:443 ssl crt /etc/certs/mycert.pem ciphers RC4-SHA:AES128-SHA:AES:!ADH:!aNULL:!DH:!EDH:!eNULL
> mode http
> default_backend webapp
>
> #-
> # Main backend for web application servers
> #

Bug report for latest dev release, 1.5.21, segfault when using http expect string x and large 404 page (includes GDB output)

2014-01-10 Thread Steve Ruiz
I'm experimenting with haproxy on a centos6 VM here.  I found that when I
specified a health check page (option httpchk GET /url), and that page
didn't exist, we have a large 404 page returned, and that causes haproxy to
quickly segfault (seems like on the second try GET'ing and parsing the
page).  I couldn't figure out from the website where to submit a bug, so I
figure I'll try here first.

Steps to reproduce:
- setup http backend, with option httpchk and httpcheck expect string x.
Make option httpchk point to a non-existent page
- On backend server, set it up to serve large 404 response (in my case, the
404 page is 186kB, as it has an inline graphic and inline css)
- Start haproxy, and wait for it to segfault

I wasn't sure exactly what was causing this at first, so I did some work to
narrow it down with GDB.  The variable values from gdb led me to the cause
on my side, and hopefully can help you fix the issue.  I could not make
this work with simply a large page for the http response - in that case, it
seems to work as advertised, only inspecting the response up to
tune.chksize (default 16384 as i've left it).  But if I do this with a 404,
it seems to kill it.  Let me know what additional information you need if
any.  Thanks and kudos for the great bit of software!


#haproxy config:
#-
# Example configuration for a possible web application.  See the
# full configuration options online.
#
#   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#-

# Help in developing config here:
# https://www.twilio.com/engineering/2013/10/16/haproxy


#-
# Global settings
#-
global
# to have these messages end up in /var/log/haproxy.log you will
# need to:
#
# 1) configure syslog to accept network log events.  This is done
#by adding the '-r' option to the SYSLOGD_OPTIONS in
#/etc/sysconfig/syslog
#
# 2) configure local2 events to go to the /var/log/haproxy.log
#   file. A line like the following can be added to
#   /etc/sysconfig/syslog
#
#local2.*   /var/log/haproxy.log
#
log 127.0.0.1 local2 info

chroot  /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user    haproxy
group   haproxy
daemon

#enable stats
stats socket /tmp/haproxy.sock

listen ha_stats :8088
balance source
mode http
timeout client 3ms
stats enable
stats auth haproxystats:foobar
stats uri /haproxy?stats

#-
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#-
defaults
mode    http
log global
option  httplog
option  dontlognull
#keep persistent client connection open
option  http-server-close
option forwardfor   except 127.0.0.0/8
option  redispatch
# Limit number of retries - total time trying to connect = connect timeout * (#retries + 1)
retries 2
timeout http-request10s
timeout queue   1m
#timeout opening a tcp connection to server - should be shorter than timeout client and server
timeout connect 3100
timeout client  30s
timeout server  30s
timeout http-keep-alive 10s
timeout check   10s
maxconn 3000

#-
# main frontend which proxys to the backends
#-
frontend https_frontend
bind :80
redirect scheme https if !{ ssl_fc }

#config help: https://github.com/observing/balancerbattle/blob/master/haproxy.cfg
bind *:443 ssl crt /etc/certs/mycert.pem ciphers RC4-SHA:AES128-SHA:AES:!ADH:!aNULL:!DH:!EDH:!eNULL
mode http
default_backend webapp

#-
# Main backend for web application servers
#-
backend webapp
balance roundrobin
#Insert cookie SERVERID to pin it to one leg
cookie SERVERID insert nocache indirect
#http check should pull url below
option httpchk GET /cp/testcheck.html HTTP/1.0
#option httpchk GET /cp/testcheck.php HTTP/1.0
#http check should find string below in response to be considered up
http-check expect string good
#Define servers - inter=interval of 5s, rise 2=become avail after 2
successful checks, fall 3=take out after 3 fails