Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions

2016-06-26 Thread Andrey Galkin
Hi guys,

Sorry for late response. I've just tested current git master. It seems
the issue is gone now.

Wily, thank you for looking into the issue so close.

Anticipating latest 1.6.x in official Debian/Ubuntu packages soon.

Thanks,
Andrey

On Tue, Jun 21, 2016 at 7:50 PM, Willy Tarreau  wrote:
> Thus I backported all of this to 1.6 planning for 1.6.6. It would be nice
> if the people affected with issues could give it a try this week (either
> from git or just wait for tomorrow morning to get the latest snapshot).



Re: epoll and external-check leads to dropped connections

2016-06-20 Thread Andrey Galkin
Hi Willy,

OK, I see your points.

Re:
1) there is a known problem with epoll: child process affects parent
process. That was clearly visible in tests even with close() loop in
child.
I expect we may try to mitigate that by closing epoll FD before all others.

2) posix_spawn() actually helps with large VM size (optionally,
constrained by limits of RAM). I guess, a significant part of running
HAProxy VM is a constantly changing socket data. So, each fork() on a
large instance should stress the system quite hard due to almost full
copy-on-write. However, only benchmarks under full load can tell the
truth.


To avoid issues mentioned above, what do you think about adding a
separate "clean" process to invoke external-checks and then update
connection handling processes "set server"-like way? That's mostly a
solution with backward compatibility in mind.

For better efficiency and flexibility, another long-term approach
would be to create a httpchk-like check with support for arbitrary
request address (including UNIX socket) and providing backend server
in question as part of URI and/or POST data. That way, a local
efficient stateless simple scripted daemon (e.g. Python, Perl,
Node.js) can be created with integration-specific checks - it would
get requests from HAProxy, check requested server health and return
result back to HAProxy. The daemon can also be started and controlled
by HAProxy itself based on a new configuration option.

At the moment, I understand there is a less integrated approach to use
admin socket for similar purposes, but with no check control from
HAProxy side.
And existing httpchk approach with inetd on database nodes easily
leads to self-DoS. A long-running daemon on database side may open
other security issues as well. Built-in database checks are not really
cluster aware and do not support authentication.


Thanks,
Andrey


On Mon, Jun 20, 2016 at 11:31 AM, Willy Tarreau <w...@1wt.eu> wrote:
> Hi Andrey,
>
> On Tue, Jun 14, 2016 at 06:39:19AM +0300, Andrey Galkin wrote:
>> Hi Willy,
>>
>> It seems I fixed the issues:
>>
>> 1. Already mentioned fix for epoll
>> 2. Implemented posix_spawn() almost for all platform with fallback to
>> fork() (USE_POSIX_SPAWN).
>>- A temporary solution to close all file descriptors (not efficient)
>> 3. Implemented proper SOCK_CLOEXEC / FD_CLOEXEC on all open sockets
>> (USE_SOCK_CLOEXEC)
>
> I finally found some time to review your work, thought a bit about it and
> checked some of the portability stuff and overall I'd rather not fix it
> this way for several reasons :
>
> 1) epoll_create1() was only brought in kernel 2.6.27, so this means that
>on earlier versions we'll still need to perform the close by hand.
>
> 2) posix_spawn() doesn't provide *any* benefit over fork()+execve(), it's
>just a libc-provided wrapper which is supposed to also work on minimal
>systems which do not have a working fork() implementation. All the systems
>we support do have a working fork() since it's used early to daemonize the
>process, thus their posix_spawn() simply is fork+execve.
>
> 3) there are actually two problems with CLOEXEC. The first one is that it
>requires to be done *everywhere* and that it is quite hard to ensure none
>was missed. For example in your case you missed the pipe() calls. If we
>later implement fd passing using SCM_RIGHTS, there's a risk to miss it
>as well. Also we rely on extra libs like openssl or lua, and you don't
>know whether they have their own FDs (eg: for /dev/random or /dev/crypto).
>The second issue is the performance impact : on all systems where there's
>no CLOEXEC flag available on all syscalls, an extra fcntl() syscall will
>be needed with every socket creation, thus on accept() and connect(). We
>already spend most of our time in syscalls, and slowing down every syscall
>so that an execve() performed once in a while doesn't have to close its fds
>is not a good balance in my opinion.
>
> 4) relying on two completely different behaviours depending on the operating
>system combinations will just add more trouble and make bug reports more
>complicated to troubleshoot.
>
> 5) it doesn't protect against any FDs inherited from any calling process, so
>it's still not 100% safe as-is.
>
> I ran some performance tests on the close() loop. On my machine I can close
> about 17M FD per second, which means that when you run with 100k sockets,
> you can perform 170 fork/exec per second, which is already huge. Also people
> who implement external checks know that it comes with a cost. For all of
> these reasons I'd rather simply close all FDs after the fork(). On some
> systems (BSD, Solaris), we may even later switch to closefrom() which does
> exactly this.
>
> Regards,
> Willy



Re: epoll and external-check leads to dropped connections

2016-06-13 Thread Andrey Galkin
Hi Willy,

It seems I fixed the issues:

1. Already mentioned fix for epoll
2. Implemented posix_spawn() almost for all platform with fallback to
fork() (USE_POSIX_SPAWN).
   - A temporary solution to close all file descriptors (not efficient)
3. Implemented proper SOCK_CLOEXEC / FD_CLOEXEC on all open sockets
(USE_SOCK_CLOEXEC)

You can merge the changes from:
https://github.com/andvgal/haproxy/tree/20160614-fix-cloexec
Comparison against master:
https://github.com/andvgal/haproxy/compare/20160614-fix-cloexec?expand=1

I did as much testing as I could in scope of the problem. My original
test case with 10ms inter generates some false positive by itself -
that's where less than 1% of fails came from.


Patches are below:

>From eaa39496fd68afeff3b2cd1e4294518c3503db84 Mon Sep 17 00:00:00 2001
From: Andrey Galkin <and...@futoin.org>
Date: Tue, 14 Jun 2016 03:25:11 +0300
Subject: [PATCH 1/3] BUG/MINOR: checks: fixed to create epoll with CLOEXEC

This is only part of fixed to general issue of leaking file descriptors
to child external-check processes and leading to connection data corruption
and disconnects.
---
 src/ev_epoll.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/ev_epoll.c b/src/ev_epoll.c
index ccb7c33..dbfded1 100644
--- a/src/ev_epoll.c
+++ b/src/ev_epoll.c
@@ -176,7 +176,7 @@ REGPRM1 static int _do_init(struct poller *p)
 {
 p->private = NULL;

-epoll_fd = epoll_create(global.maxsock + 1);
+epoll_fd = epoll_create1(EPOLL_CLOEXEC);
 if (epoll_fd < 0)
 goto fail_fd;

@@ -222,7 +222,7 @@ REGPRM1 static int _do_test(struct poller *p)
 {
 int fd;

-fd = epoll_create(global.maxsock + 1);
+fd = epoll_create1(EPOLL_CLOEXEC);
 if (fd < 0)
 return 0;
 close(fd);
@@ -239,7 +239,7 @@ REGPRM1 static int _do_fork(struct poller *p)
 {
 if (epoll_fd >= 0)
 close(epoll_fd);
-epoll_fd = epoll_create(global.maxsock + 1);
+epoll_fd = epoll_create1(EPOLL_CLOEXEC);
 if (epoll_fd < 0)
 return 0;
 return 1;
-- 
2.1.4

>From fd60e2493b6df8d5bdfa610261464095b7b2b070 Mon Sep 17 00:00:00 2001
From: Andrey Galkin <and...@futoin.org>
Date: Tue, 14 Jun 2016 03:53:48 +0300
Subject: [PATCH 2/3] BUG/MEDIUM: checks: fixed to close all FDs on
 external-check OPTIM/MEDIUM: checks: conditional support of posix_spawn() for
 external-check

Calling fork() is suboptimal due VM management expense. posix_spawn() is
relatively thin call spawning a new process from clean image what can
make significant difference in heavy deployments.

posix_spawn() is controlled by new USE_POSIX_SPAWN setting which is enabled
by default for newer Linux, Solaris, OSX and *BSD.
---
 Makefile | 13 +
 src/checks.c | 45 +
 2 files changed, 58 insertions(+)

diff --git a/Makefile b/Makefile
index 08d6108..02037ee 100644
--- a/Makefile
+++ b/Makefile
@@ -37,6 +37,7 @@
 #   USE_TFO  : enable TCP fast open. Supported on Linux >= 3.7.
 #   USE_NS   : enable network namespace support.
Supported on Linux >= 2.6.24.
 #   USE_DL   : enable it if your system requires -ldl.
Automatic on Linux.
+#   USE_POSIX_SPAWN  : enable posix_spawn() support instead of fork()
 #   USE_DEVICEATLAS  : enable DeviceAtlas api.
 #   USE_51DEGREES: enable third party device detection
library from 51Degrees
 #
@@ -283,6 +284,7 @@ ifeq ($(TARGET),linux2628)
   ASSUME_SPLICE_WORKS= implicit
   EXTRA  += haproxy-systemd-wrapper
   USE_DL  = implicit
+  USE_POSIX_SPAWN = implicit
 else
 ifeq ($(TARGET),solaris)
   # This is for Solaris 8
@@ -294,6 +296,7 @@ ifeq ($(TARGET),solaris)
   USE_LIBCRYPT= implicit
   USE_CRYPT_H = implicit
   USE_GETADDRINFO = implicit
+  USE_POSIX_SPAWN = implicit
 else
 ifeq ($(TARGET),freebsd)
   # This is for FreeBSD
@@ -301,6 +304,7 @@ ifeq ($(TARGET),freebsd)
   USE_KQUEUE = implicit
   USE_TPROXY = implicit
   USE_LIBCRYPT   = implicit
+  USE_POSIX_SPAWN = implicit
 else
 ifeq ($(TARGET),osx)
   # This is for Mac OS/X
@@ -308,18 +312,21 @@ ifeq ($(TARGET),osx)
   USE_KQUEUE = implicit
   USE_TPROXY = implicit
   USE_LIBCRYPT   = implicit
+  USE_POSIX_SPAWN = implicit
 else
 ifeq ($(TARGET),openbsd)
   # This is for OpenBSD >= 3.0
   USE_POLL   = implicit
   USE_KQUEUE = implicit
   USE_TPROXY = implicit
+  USE_POSIX_SPAWN = implicit
 else
 ifeq ($(TARGET),netbsd)
   # This is for NetBSD
   USE_POLL   = implicit
   USE_KQUEUE = implicit
   USE_TPROXY = implicit
+  USE_POSIX_SPAWN = implicit
 else
 ifeq ($(TARGET),aix51)
   # This is for AIX 5.1
@@ -551,6 +558,12 @@ BUILD_OPTIONS   += $(call ignore_implicit,USE_DL)
 OPTIONS_LDFLAGS += -ldl
 endif

+ifneq ($(USE_POSIX_SPAWN),)
+OPTIONS_CFLAGS += -DUSE_POSIX_SPAWN
+BUILD_OPTIONS  += $(call ignore_implicit,USE_POSIX_SPAWN)
+endif
+
+
 # report DLMALLOC_SRC only if e