
HAProxy 1.9.6 unresponsive

2019-05-03 Thread Patrick Hemmer
We are running HAProxy 1.9.6 and managed to get into a state where 
HAProxy was completely unresponsive. It was pegged at 100% CPU like many of 
the other reports on the mailing list lately, but in addition it 
wouldn't respond to anything; the stats socket wasn't even responsive.


When I attached an strace, it sat there with no activity. When I 
attached GDB I got the following stack:


        (gdb) bt full
        #0  htx_get_head (htx=0x7fbeb666eba0) at include/common/htx.h:357
        No locals.
        #1  h2s_htx_make_trailers (h2s=h2s@entry=0x7fbeb625f9f0, htx=htx@entry=0x7fbeb666eba0) at src/mux_h2.c:4975
                        list = {{n = {ptr = 0x0, len = 0}, v = {ptr = 0x0, len = 0}} }
                        h2c = 0x7fbeb6372320
                        blk = <optimized out>
                        blk_end = 0x0
                        outbuf = {size = 140722044755807, area = 0x0, data = 140457080712096, head = 140457060939041}
                        h1m = {state = H1_MSG_HDR_NAME, flags = 2056, curr_len = 140457077580664, body_len = 16384, next = 2, err_pos = 0, err_state = -1237668736}
                        type = <optimized out>
                        ret = 0
                        hdr = 0
                        idx = <optimized out>
                        start = <optimized out>
        #2  0x7fbeb50f2ef5 in h2_snd_buf (cs=0x7fbeb63ea9a0, buf=0x7fbeb6127048, count=2, flags=<optimized out>) at src/mux_h2.c:5372
                        h2s = <optimized out>
                        orig_count = <optimized out>
                        total = 15302
                        ret = <optimized out>
                        htx = 0x7fbeb666eba0
                        blk = <optimized out>
                        btype = <optimized out>
                        idx = <optimized out>
        #3  0x7fbeb5180be4 in si_cs_send (cs=0x7fbeb63ea9a0) at src/stream_interface.c:691
                        send_flag = <optimized out>
                        conn = 0x7fbeb6051a70
                        si = 0x7fbeb6127268
                        oc = 0x7fbeb6127040
                        ret = <optimized out>
                        did_send = 0
        #4  0x7fbeb51817c8 in si_update_both (si_f=si_f@entry=0x7fbeb6127268, si_b=si_b@entry=0x7fbeb61272a8) at src/stream_interface.c:850
                        req = 0x7fbeb6126fe0
                        res = <optimized out>
                        cs = <optimized out>
        #5  0x7fbeb50ea2e1 in process_stream (t=<optimized out>, context=0x7fbeb6126fd0, state=<optimized out>) at src/stream.c:2502
                        srv = <optimized out>
                        s = 0x7fbeb6126fd0
                        sess = <optimized out>
                        rqf_last = <optimized out>
                        rpf_last = 3255042562
                        rq_prod_last = <optimized out>
                        rq_cons_last = <optimized out>
                        rp_cons_last = 7
                        rp_prod_last = 7
                        req_ana_back = <optimized out>
                        req = 0x7fbeb6126fe0
                        res = 0x7fbeb6127040
                        si_f = 0x7fbeb6127268
                        si_b = 0x7fbeb61272a8
        #6  0x7fbeb51b20a8 in process_runnable_tasks () at src/task.c:434
                        t = <optimized out>
                        state = <optimized out>
                        ctx = <optimized out>
                        process = <optimized out>
                        t = <optimized out>
                        max_processed = <optimized out>
        #7  0x7fbeb512b6ff in run_poll_loop () at src/haproxy.c:2642
                        next = <optimized out>
                        exp = <optimized out>
        #8  run_thread_poll_loop (data=data@entry=0x7fbeb5d84620) at src/haproxy.c:2707
                        ptif = <optimized out>
                        ptdf = <optimized out>
                        start_lock = 0
        #9  0x7fbeb507d2b5 in main (argc=<optimized out>, argv=0x7ffc677d73b8) at src/haproxy.c:3343
                        tids = 0x7fbeb5d84620
                        threads = 0x7fbeb5eb6d90
                        i = <optimized out>
                        old_sig = {__val = {68097, 0, 511101108338, 0, 140722044760335, 140457059422467, 140722044760392, 140454020513805, 124, 140457064304960, 390842023936, 140457064395072, 48, 140457035994976, 18446603351664791121, 140454020513794}}
        ---Type <return> to continue, or q <return> to quit---
                        blocked_sig = {__val = {1844674406710583, 18446744073709551615 }}
                        err = <optimized out>
                        retry = <optimized out>
                        limit = {rlim_cur = 131300, rlim_max = 131300}
                        errmsg = "\000@\000\000\000\000\000\000\002\366\210\263\276\177\000\000\300\364m\265\276\177\000\000`\227\274\263\276\177\000\000\030\000\000\000\000\000\000\000>\001\000\024\000\000\000\000p$o\265\276\177\000\000@>k\265\276\177\000\000\000\320$\265\276\177\000\000\274\276\177\000\000 t}g\374\177\000\000\000\000\000\000\000\000\000\000P\367m\265"
                        pidfd = <optimized out>

Our config is big and complex, and not something I want to post here (I 
may be able to provide it directly if required). However, I think the 
important bit is that we have a frontend and backend which are used 
for load balancing gRPC traffic (thus h2). The backend servers are 

Re: Zero RTT in backend server side

2019-05-03 Thread Илья Шипицин
LibreSSL is known to report a version number higher than openssl-1.1.1 (while
lacking many features).
Let us wait for the libressl+travis-ci patch approval.

Sat, 4 May 2019 at 00:09, Olivier Houchard :

> Hi Igor,
>
> On Fri, May 03, 2019 at 05:21:50PM +0800, Igor Pav wrote:
> > Just tested with openssl 1.1.1b and haproxy 1.9.7, it appears no
> > success, you are right :)
> >
>
> Indeed :)
> I just pushed commit 010941f87605e8219d25becdbc652350a687d6a2 to master,
> that
> let me do 0RTT both as server and as client. This should be backported to
> 1.8 and 1.9 soon.
> Please note, however, that we will only attempt to connect to a server
> using 0RTT if the client did so, as we have to be sure the client supports
> it, in case it receives a 425.
> This may change in 2.0, if we add the ability to retry failed requests.
>
> Regards,
>
> Olivier
>
>
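For readers wanting to try this, here is a minimal sketch of such a setup. The `allow-0rtt` bind/server keyword is a real HAProxy option (1.8+ on the frontend, with backend support per the commit above), but the proxy names, addresses, and certificate path below are made up for illustration:

```
# Hypothetical haproxy.cfg fragment: enable TLS early data on both sides.
frontend fe_tls
    bind :443 ssl crt /etc/haproxy/site.pem allow-0rtt   # accept client early data
    default_backend be_tls

backend be_tls
    # HAProxy only attempts 0-RTT to the server when the client itself used
    # it, so a 425 (Too Early) response can safely be relayed back.
    server s1 10.0.0.10:443 ssl verify none allow-0rtt
```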


Re: reg-tests are broken when running osx + openssl

2019-05-03 Thread Илья Шипицин
I would even use shorter path, i.e.

mkdir ~/t

Fri, 3 May 2019 at 22:27, Frederic Lecaille :

> On 5/3/19 5:35 PM, Frederic Lecaille wrote:
> > On 5/3/19 3:44 PM, Илья Шипицин wrote:
> >>
> >>
> >> Fri, 3 May 2019 at 18:42, Tim Düsterhus :
> >>
> >> Ilya,
> >>
> >> Am 03.05.19 um 15:39 schrieb Илья Шипицин:
> >>  > when I played with enabling travis-ci, I tried to set TMPDIR
> >> directly,
> >>  > however I was not lucky enough.
> >>  > Later Tim added "sed" magic to .travis.yml
> >>  >
> >>  > personally, I do not understand why "sed" is better than
> >> assigning TMPDIR
> >>  > directly.
> >>
> >> I did not try using TMPDIR=/tmp or something like that, because I
> >> thought there must be a reason why it's that strange long path.
> >>
> >>
> >> I tried /tmp and /var/tmp
> >> it seems that not any filesystem on osx can hold network socket (at
> >> least from my point of view)
> >
> > try to create a working directory owned by the user which run the reg
> > test :
> >
> > $ mkdir -p ~/tmp/
> > $ TMPDIR=~/tmp make reg-tests
>
> I confirm that with such a value everything works on all OSes
> (https://travis-ci.com/haproxyFred/haproxy)
>
> The attached patch should fix this issue.
>
> Thank you Tim, Ilya.
>
> Fred.
>


Re: reg-tests are broken when running osx + openssl

2019-05-03 Thread Frederic Lecaille

On 5/3/19 5:35 PM, Frederic Lecaille wrote:

On 5/3/19 3:44 PM, Илья Шипицин wrote:



Fri, 3 May 2019 at 18:42, Tim Düsterhus:


    Ilya,

    Am 03.05.19 um 15:39 schrieb Илья Шипицин:
 > when I played with enabling travis-ci, I tried to set TMPDIR
    directly,
 > however I was not lucky enough.
 > Later Tim added "sed" magic to .travis.yml
 >
 > personally, I do not understand why "sed" is better than
    assigning TMPDIR
 > directly.

    I did not try using TMPDIR=/tmp or something like that, because I
    thought there must be a reason why it's that strange long path.


I tried /tmp and /var/tmp
it seems that not every filesystem on osx can hold a network socket (at 
least from my point of view)


try to create a working directory owned by the user which runs the reg 
tests:


    $ mkdir -p ~/tmp/
    $ TMPDIR=~/tmp make reg-tests


I confirm that with such a value everything works on all OSes 
(https://travis-ci.com/haproxyFred/haproxy)


The attached patch should fix this issue.

Thank you Tim, Ilya.

Fred.
>From fc9decae9ec679038dc494ad612dd3eb144de408 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fr=C3=A9d=C3=A9ric=20L=C3=A9caille?= 
Date: Fri, 3 May 2019 19:16:02 +0200
Subject: [PATCH] BUILD: travis: TMPDIR replacement.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

TMPDIR default value is too long especially on OSX systems.
We decided to shorten it for all the OS'es.

Thank you to Tim Düsterhus and Ilya for having helped on this issue.
---
 .travis.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index f689fe982..7475ad028 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -31,12 +31,12 @@ before_script:
   # This is a fix for the super long TMPDIR on Mac making
   # the unix socket path names exceed the maximum allowed
   # length.
-  - sed -i'.original' '/TESTDIR=.*haregtests/s/haregtests-.*XX/regtest.XXX/' scripts/run-regtests.sh
+  - mkdir ~/tmp
 
 script:
   - make CC=$CC V=1 TARGET=$TARGET $FLAGS
   - ./haproxy -vv
-  - env VTEST_PROGRAM=../vtest/vtest make reg-tests
+  - env TMPDIR=~/tmp VTEST_PROGRAM=../vtest/vtest make reg-tests
 
 after_failure:
   - |
-- 
2.11.0



Re: [External] Re: QAT intermittent healthcheck errors

2019-05-03 Thread Emeric Brun
Hi Marcin,

On 5/3/19 4:56 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> On 5/3/19 4:50 PM, Emeric Brun wrote:
> 
>> I've a testing platform here but I don't use the usdm_drv but the 
>> qat_contig_mem and I don't reproduce this issue (I'm using QAT 1.5, as the 
>> doc says to use with my chip) .
> 
> I see. I use qat 1.7 and qat-engine 0.5.40.
> 
>> Anyway, could you re-compile a haproxy's binary if I provide you a testing 
>> patch?
> 
> Sure, that should not be a problem.

The patch in attachment.
> 
>> The idea is to perform a deinit in the master to force a close of those 
>> '/dev's at each reload. Perhaps It won't fix our issue but this leak of fd 
>> should not be.
> 
> Hope this will give us at least some more insight..
> Regards,
> 
> Marcin Deranek

R,
Emeric
>From ca57857a492e898759ef211a8fd9714d0f7dd7fa Mon Sep 17 00:00:00 2001
From: Emeric Brun 
Date: Fri, 3 May 2019 17:06:59 +0200
Subject: [PATCH] BUG/MEDIUM: ssl: fix ssl engine's open fds are leaking.

The master didn't call the engine deinit, resulting
in a leak of fd opened by the engine during init. The
workers inherit of these accumulated fds at each reload.

This patch add a call to engine deinit on the master just
before reloading with an exec.
---
 src/haproxy.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/haproxy.c b/src/haproxy.c
index 603f084c..f77eb1b4 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -588,6 +588,13 @@ void mworker_reload()
 	if (fdtab)
 		deinit_pollers();
 
+#if defined(USE_OPENSSL)
+#ifndef OPENSSL_NO_ENGINE
+	/* Engines may have opened fds and we must close them */
+	ssl_free_engines();
+#endif
+#endif
+
 	/* restore the initial FD limits */
 	limit.rlim_cur = rlim_fd_cur_at_boot;
 	limit.rlim_max = rlim_fd_max_at_boot;
-- 
2.17.1
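The mechanism behind the leak that this patch addresses can be illustrated outside HAProxy. The sketch below is not HAProxy code: it uses Python and /dev/null as a stand-in for the engine's /dev/usdm_drv handle, and shows that a file descriptor left inheritable survives into a child process image, just as the engine fds survived the master's re-exec on each reload:

```python
import os
import subprocess
import sys

# An fd opened without close-on-exec survives into a new process image.
# /dev/null stands in for the engine device the master kept re-opening.
fd = os.open("/dev/null", os.O_RDWR)
os.set_inheritable(fd, True)

# Ask a fresh interpreter whether the inherited fd is still open.
child = subprocess.run(
    [sys.executable, "-c", f"import os; print(os.fstat({fd}) is not None)"],
    close_fds=False, capture_output=True, text=True)
print(child.stdout.strip())  # prints: True
```

Closing (or marking close-on-exec) such fds before the re-exec, as the patch does via the engine deinit, is what stops them accumulating.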



Re: reg-tests are broken when running osx + openssl

2019-05-03 Thread Frederic Lecaille

On 5/3/19 3:44 PM, Илья Шипицин wrote:



Fri, 3 May 2019 at 18:42, Tim Düsterhus:


Ilya,

Am 03.05.19 um 15:39 schrieb Илья Шипицин:
 > when I played with enabling travis-ci, I tried to set TMPDIR
directly,
 > however I was not lucky enough.
 > Later Tim added "sed" magic to .travis.yml
 >
 > personally, I do not understand why "sed" is better than
assigning TMPDIR
 > directly.

I did not try using TMPDIR=/tmp or something like that, because I
thought there must be a reason why it's that strange long path.


I tried /tmp and /var/tmp
it seems that not every filesystem on osx can hold a network socket (at 
least from my point of view)


try to create a working directory owned by the user which runs the reg tests:

   $ mkdir -p ~/tmp/
   $ TMPDIR=~/tmp make reg-tests






Re: [External] Re: QAT intermittent healthcheck errors

2019-05-03 Thread Marcin Deranek

Hi Emeric,

On 5/3/19 4:50 PM, Emeric Brun wrote:


I've a testing platform here but I don't use the usdm_drv but the 
qat_contig_mem and I don't reproduce this issue (I'm using QAT 1.5, as the doc 
says to use with my chip) .


I see. I use qat 1.7 and qat-engine 0.5.40.


Anyway, could you re-compile a haproxy's binary if I provide you a testing 
patch?


Sure, that should not be a problem.



The idea is to perform a deinit in the master to force a close of those '/dev's 
at each reload. Perhaps It won't fix our issue but this leak of fd should not 
be.


Hope this will give us at least some more insight..
Regards,

Marcin Deranek


On 5/3/19 4:21 PM, Marcin Deranek wrote:

Hi Emeric,

It looks like on every reload the master leaks a /dev/usdm_drv device:

# systemctl restart haproxy.service
# ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
lr-x-- 1 root root 64 May  3 15:40 0 -> /dev/null
lrwx-- 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv

# systemctl reload haproxy.service
# ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
lr-x-- 1 root root 64 May  3 15:40 0 -> /dev/null
lrwx-- 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv
lrwx-- 1 root root 64 May  3 15:40 9 -> /dev/usdm_drv

# systemctl reload haproxy.service
# ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
lr-x-- 1 root root 64 May  3 15:40 0 -> /dev/null
lrwx-- 1 root root 64 May  3 15:40 10 -> /dev/usdm_drv
lrwx-- 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv
lrwx-- 1 root root 64 May  3 15:40 9 -> /dev/usdm_drv

Obviously workers do inherit this from the master. Looking at workers I see the 
following:

* 1st gen:

# ls -al /proc/36083/fd|awk '/dev/ {print $NF}'|sort
/dev/null
/dev/null
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_dev_processes
/dev/uio19
/dev/uio3
/dev/uio35
/dev/usdm_drv

* 2nd gen:

# ls -al /proc/41637/fd|awk '/dev/ {print $NF}'|sort
/dev/null
/dev/null
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_dev_processes
/dev/uio23
/dev/uio39
/dev/uio7
/dev/usdm_drv
/dev/usdm_drv

Looks like only /dev/usdm_drv is leaked.

Cheers,

Marcin Deranek

On 5/3/19 2:22 PM, Emeric Brun wrote:

Hi Marcin,

On 4/29/19 6:41 PM, Marcin Deranek wrote:

Hi Emeric,

On 4/29/19 3:42 PM, Emeric Brun wrote:

Hi Marcin,




I've also a contact at intel who told me to try this option on the qat engine:


--disable-qat_auto_engine_init_on_fork/--enable-qat_auto_engine_init_on_fork
    Disable/Enable the engine from being initialized automatically 
following a
    fork operation. This is useful in a situation where you want to tightly
    control how many instances are being used for processes. For instance 
if an
    application forks to start a process that does not utilize QAT currently
    the default behaviour is for the engine to still automatically get 
started
    in the child using up an engine instance. After using this flag either 
the
    engine needs to be initialized manually using the engine message:
    INIT_ENGINE or will automatically get initialized on the first QAT 
crypto
    operation. The initialization on fork is enabled by default.


I tried to build QAT Engine with disabled auto init, but that did not help. Now 
I get the following during startup:

2019-04-29T15:13:47.142297+02:00 host1 hapee-lb[16604]: qaeOpenFd:753 Unable to 
initialize memory file handle /dev/usdm_drv
2019-04-29T15:13:47+02:00 localhost hapee-lb[16611]: 127.0.0.1:60512 
[29/Apr/2019:15:13:47.139] vip1/23: SSL handshake failure


" INIT_ENGINE or will automatically get initialized on the first QAT crypto 
operation"

Perhaps the init appears "with first qat crypto operation" and is delayed after 
the fork so if a chroot is configured, it doesn't allow some accesses
to /dev. Could you perform a test in that case without chroot enabled in the 
haproxy config ?


Removed chroot and now it initializes properly. Unfortunately reload still causes 
"stuck" HAProxy process :-(

Marcin Deranek


Could you check with "ls -l /proc//fd" if the "/dev/" is 
open multiple times after a reload?

Emeric







Re: [External] Re: QAT intermittent healthcheck errors

2019-05-03 Thread Emeric Brun
Hi Marcin,

Good so we progress!

I've a testing platform here but I don't use the usdm_drv but the 
qat_contig_mem and I don't reproduce this issue (I'm using QAT 1.5, as the doc 
says to use with my chip) .

Anyway, could you re-compile a haproxy's binary if I provide you a testing 
patch?

The idea is to perform a deinit in the master to force a close of those '/dev's 
at each reload. Perhaps it won't fix our issue, but this fd leak should not 
exist.

R,
Emeric

On 5/3/19 4:21 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> It looks like on every reload master leaks /dev/usdm_drv device:
> 
> # systemctl restart haproxy.service
> # ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
> lr-x-- 1 root root 64 May  3 15:40 0 -> /dev/null
> lrwx-- 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv
> 
> # systemctl reload haproxy.service
> # ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
> lr-x-- 1 root root 64 May  3 15:40 0 -> /dev/null
> lrwx-- 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv
> lrwx-- 1 root root 64 May  3 15:40 9 -> /dev/usdm_drv
> 
> # systemctl reload haproxy.service
> # ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
> lr-x-- 1 root root 64 May  3 15:40 0 -> /dev/null
> lrwx-- 1 root root 64 May  3 15:40 10 -> /dev/usdm_drv
> lrwx-- 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv
> lrwx-- 1 root root 64 May  3 15:40 9 -> /dev/usdm_drv
> 
> Obviously workers do inherit this from the master. Looking at workers I see 
> the following:
> 
> * 1st gen:
> 
> # ls -al /proc/36083/fd|awk '/dev/ {print $NF}'|sort
> /dev/null
> /dev/null
> /dev/qat_adf_ctl
> /dev/qat_adf_ctl
> /dev/qat_adf_ctl
> /dev/qat_dev_processes
> /dev/uio19
> /dev/uio3
> /dev/uio35
> /dev/usdm_drv
> 
> * 2nd gen:
> 
> # ls -al /proc/41637/fd|awk '/dev/ {print $NF}'|sort
> /dev/null
> /dev/null
> /dev/qat_adf_ctl
> /dev/qat_adf_ctl
> /dev/qat_adf_ctl
> /dev/qat_dev_processes
> /dev/uio23
> /dev/uio39
> /dev/uio7
> /dev/usdm_drv
> /dev/usdm_drv
> 
> Looks like only /dev/usdm_drv is leaked.
> 
> Cheers,
> 
> Marcin Deranek
> 
> On 5/3/19 2:22 PM, Emeric Brun wrote:
>> Hi Marcin,
>>
>> On 4/29/19 6:41 PM, Marcin Deranek wrote:
>>> Hi Emeric,
>>>
>>> On 4/29/19 3:42 PM, Emeric Brun wrote:
 Hi Marcin,

>
>> I've also a contact at intel who told me to try this option on the qat 
>> engine:
>>
>>> --disable-qat_auto_engine_init_on_fork/--enable-qat_auto_engine_init_on_fork
>>>    Disable/Enable the engine from being initialized automatically 
>>> following a
>>>    fork operation. This is useful in a situation where you want to 
>>> tightly
>>>    control how many instances are being used for processes. For 
>>> instance if an
>>>    application forks to start a process that does not utilize QAT 
>>> currently
>>>    the default behaviour is for the engine to still automatically 
>>> get started
>>>    in the child using up an engine instance. After using this flag 
>>> either the
>>>    engine needs to be initialized manually using the engine message:
>>>    INIT_ENGINE or will automatically get initialized on the first 
>>> QAT crypto
>>>    operation. The initialization on fork is enabled by default.
>
> I tried to build QAT Engine with disabled auto init, but that did not 
> help. Now I get the following during startup:
>
> 2019-04-29T15:13:47.142297+02:00 host1 hapee-lb[16604]: qaeOpenFd:753 
> Unable to initialize memory file handle /dev/usdm_drv
> 2019-04-29T15:13:47+02:00 localhost hapee-lb[16611]: 127.0.0.1:60512 
> [29/Apr/2019:15:13:47.139] vip1/23: SSL handshake failure

 " INIT_ENGINE or will automatically get initialized on the first QAT 
 crypto operation"

 Perhaps the init appears "with first qat crypto operation" and is delayed 
 after the fork so if a chroot is configured, it doesn't allow some accesses
 to /dev. Could you perform a test in that case without chroot enabled in 
 the haproxy config ?
>>>
>>> Removed chroot and now it initializes properly. Unfortunately reload still 
>>> causes "stuck" HAProxy process :-(
>>>
>>> Marcin Deranek
>>
>> Could you check with "ls -l /proc//fd" if the "/dev/" 
>> is open multiple times after a reload?
>>
>> Emeric
>>




Re: [External] Re: QAT intermittent healthcheck errors

2019-05-03 Thread Marcin Deranek

Hi Emeric,

It looks like on every reload the master leaks a /dev/usdm_drv device:

# systemctl restart haproxy.service
# ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
lr-x-- 1 root root 64 May  3 15:40 0 -> /dev/null
lrwx-- 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv

# systemctl reload haproxy.service
# ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
lr-x-- 1 root root 64 May  3 15:40 0 -> /dev/null
lrwx-- 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv
lrwx-- 1 root root 64 May  3 15:40 9 -> /dev/usdm_drv

# systemctl reload haproxy.service
# ls -la /proc/$(cat haproxy.pid)/fd|fgrep dev
lr-x-- 1 root root 64 May  3 15:40 0 -> /dev/null
lrwx-- 1 root root 64 May  3 15:40 10 -> /dev/usdm_drv
lrwx-- 1 root root 64 May  3 15:40 7 -> /dev/usdm_drv
lrwx-- 1 root root 64 May  3 15:40 9 -> /dev/usdm_drv

Obviously workers do inherit this from the master. Looking at workers I 
see the following:


* 1st gen:

# ls -al /proc/36083/fd|awk '/dev/ {print $NF}'|sort
/dev/null
/dev/null
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_dev_processes
/dev/uio19
/dev/uio3
/dev/uio35
/dev/usdm_drv

* 2nd gen:

# ls -al /proc/41637/fd|awk '/dev/ {print $NF}'|sort
/dev/null
/dev/null
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_adf_ctl
/dev/qat_dev_processes
/dev/uio23
/dev/uio39
/dev/uio7
/dev/usdm_drv
/dev/usdm_drv

Looks like only /dev/usdm_drv is leaked.

Cheers,

Marcin Deranek

On 5/3/19 2:22 PM, Emeric Brun wrote:

Hi Marcin,

On 4/29/19 6:41 PM, Marcin Deranek wrote:

Hi Emeric,

On 4/29/19 3:42 PM, Emeric Brun wrote:

Hi Marcin,




I've also a contact at intel who told me to try this option on the qat engine:


--disable-qat_auto_engine_init_on_fork/--enable-qat_auto_engine_init_on_fork
   Disable/Enable the engine from being initialized automatically following 
a
   fork operation. This is useful in a situation where you want to tightly
   control how many instances are being used for processes. For instance if 
an
   application forks to start a process that does not utilize QAT currently
   the default behaviour is for the engine to still automatically get 
started
   in the child using up an engine instance. After using this flag either 
the
   engine needs to be initialized manually using the engine message:
   INIT_ENGINE or will automatically get initialized on the first QAT crypto
   operation. The initialization on fork is enabled by default.


I tried to build QAT Engine with disabled auto init, but that did not help. Now 
I get the following during startup:

2019-04-29T15:13:47.142297+02:00 host1 hapee-lb[16604]: qaeOpenFd:753 Unable to 
initialize memory file handle /dev/usdm_drv
2019-04-29T15:13:47+02:00 localhost hapee-lb[16611]: 127.0.0.1:60512 
[29/Apr/2019:15:13:47.139] vip1/23: SSL handshake failure


" INIT_ENGINE or will automatically get initialized on the first QAT crypto 
operation"

Perhaps the init appears "with first qat crypto operation" and is delayed after 
the fork so if a chroot is configured, it doesn't allow some accesses
to /dev. Could you perform a test in that case without chroot enabled in the 
haproxy config ?


Removed chroot and now it initializes properly. Unfortunately reload still causes 
"stuck" HAProxy process :-(

Marcin Deranek


Could you check with "ls -l /proc//fd" if the "/dev/" is 
open multiple times after a reload?

Emeric





Re: leak of handle to /dev/urandom since 1.8?

2019-05-03 Thread Lukas Tribus
Hello,


On Fri, 3 May 2019 at 14:15, Emeric Brun  wrote:
> >> Please do not commit this yet.
> >>
> >> We need those random devices open in openssl 1.1.1. We specifically
> >> pushed for this and had very long conversations with openssl folks.
> >>
> >> I don't have time to dig up the entire history right now, will do that
> >> later for context, however, please do not commit this yet.
> >>
> >>
> >
> > Lukas,
> >
> > This is the code of deinitilisation of the master, which is launched before
> > the re-execution of the master, it does not impact the workers.
> >
>
> Indeed if the workers keep the fd open it should work, the master is outside 
> the chroot and doesn't need to keep the fd open.

Ok, thanks for clarifying to both of you, I imagined something like
this but wanted to be sure.


cheers,
lukas



findings of gcc address sanitizer

2019-05-03 Thread Илья Шипицин
Hello,

I run reg-tests on gcc-9 (fedora 30).
I built haproxy the following way

make CC=gcc V=1 TARGET=$TARGET $FLAGS DEBUG_CFLAGS="-fsanitize=address
-ggdb" LDFLAGS="-lasan"

asan found couple of things

***  h10.1 debug|#0 0x6db986 in update_log_hdr src/log.c:1399
***  h10.1 debug|#1 0x6db986 in __do_send_log src/log.c:1547
***  h10.1 debug|#2 0x6db986 in __send_log src/log.c:1764
***  h10.1 debug|#3 0x6e274e in strm_log src/log.c:2959
***  h10.1 debug|#4 0x559753 in process_stream src/stream.c:2665
***  h10.1 debug|#5 0x7b66b6 in process_runnable_tasks
src/task.c:389
***  h10.1 debug|#6 0x6127f9 in run_poll_loop src/haproxy.c:2447
***  h10.1 debug|#7 0x6127f9 in run_thread_poll_loop
src/haproxy.c:2512
***  h10.1 debug|#8 0x42241d in main src/haproxy.c:3183
***  h10.1 debug|#9 0x7f8ebc8aff32 in __libc_start_main
(/lib64/libc.so.6+0x23f32)
***  h10.1 debug|#10 0x4250bd in _start
(/home/ilia/haproxy-1/haproxy+0x4250bd)
***  h10.1 debug|
***  h10.1 debug|0x61903c95 is located 21 bytes inside of 1025-byte
region [0x61903c80,0x61904081)
***  h10.1 debug|freed by thread T0 here:
***  h10.1 debug|#0 0x7f8ebd15c5de in realloc
(/lib64/libasan.so.5+0x10e5de)
***  h10.1 debug|#1 0x6dbd31 in my_realloc2
include/common/standard.h:1432
***  h10.1 debug|#2 0x6dbd31 in init_log_buffers src/log.c:1880
***  h10.1 debug|
***  h10.1 debug|previously allocated by thread T0 here:
***  h10.1 debug|#0 0x7f8ebd15c5de in realloc
(/lib64/libasan.so.5+0x10e5de)
***  h10.1 debug|#1 0x6dbd31 in my_realloc2
include/common/standard.h:1432
***  h10.1 debug|#2 0x6dbd31 in init_log_buffers src/log.c:1880
***  h10.1 debug|
***  h10.1 debug|SUMMARY: AddressSanitizer: heap-use-after-free
src/log.c:1399 in update_log_hdr






***  h10.1
debug|=
***  h10.1 debug|==23684==ERROR: LeakSanitizer: detected memory leaks
***  h10.1 debug|
***  h10.1 debug|Direct leak of 24 byte(s) in 1 object(s) allocated
from:
***  h10.1 debug|#0 0x7f9ac626f1a8 in __interceptor_malloc
(/lib64/libasan.so.5+0x10e1a8)
***  h10.1 debug|#1 0x7f9ac6076b1b  (/lib64/libssl.so.1.1+0x33b1b)
***  h10.1 debug|
***  h10.1 debug|SUMMARY: AddressSanitizer: 24 byte(s) leaked in 1
allocation(s).


Re: reg-tests are broken when running osx + openssl

2019-05-03 Thread Илья Шипицин
Fri, 3 May 2019 at 18:42, Tim Düsterhus :

> Ilya,
>
> Am 03.05.19 um 15:39 schrieb Илья Шипицин:
> > when I played with enabling travis-ci, I tried to set TMPDIR directly,
> > however I was not lucky enough.
> > Later Tim added "sed" magic to .travis.yml
> >
> > personally, I do not understand why "sed" is better than assigning TMPDIR
> > directly.
>
> I did not try using TMPDIR=/tmp or something like that, because I
> thought there must be a reason why it's that strange long path.
>

I tried /tmp and /var/tmp
it seems that not every filesystem on osx can hold a network socket (at least
from my point of view)


>
> Best regards
> Tim Düsterhus
>


Re: reg-tests are broken when running osx + openssl

2019-05-03 Thread Tim Düsterhus
Ilya,

Am 03.05.19 um 15:39 schrieb Илья Шипицин:
> when I played with enabling travis-ci, I tried to set TMPDIR directly,
> however I was not lucky enough.
> Later Tim added "sed" magic to .travis.yml
> 
> personally, I do not understand why "sed" is better than assigning TMPDIR
> directly.

I did not try using TMPDIR=/tmp or something like that, because I
thought there must be a reason why it's that strange long path.

Best regards
Tim Düsterhus



Re: reg-tests are broken when running osx + openssl

2019-05-03 Thread Илья Шипицин
Fri, 3 May 2019 at 18:33, Frederic Lecaille :

> On 5/3/19 1:34 PM, Tim Düsterhus wrote:
> > Fred,
> > Ilya,
>
> Hello Tim,
>
> > Am 03.05.19 um 13:20 schrieb Frederic Lecaille:
> >> About the test which fail, I would say that such errors are not
> >> negligible :
> >>
> >>  Starting frontend GLOBAL: cannot change UNIX socket ownership
> >> [/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg8gn/T//regtest.zHu/
> >
> > I believe this is an issue with the long TMPDIR that I tried to mitigate
> > with this:
> https://github.com/haproxy/haproxy/blob/master/.travis.yml#L34
>
> With your patch, vtest is able to create the LOG files at the same place
> $TMPDIR/ where the UNIX stats socket should be
> created. So this does not interfere with the test.
>
> > While debugging I noticed that the validation did not properly account
> > for the temporary extension of the filename during start-up, causing
> > HAProxy to accept the filename during the check, but fail to set it up.
> > This leads to the misleading error message.
>
> Yes, perhaps the UNIX stats socket filename is too long (I have found a
> 104-byte max length for sun_path on Mac OS X, 108 on Linux).
>
> So, I propose you revert your fix, and try to find another way to set
> TMPDIR with a shorter value than the default one which is too long for
> UNIX sockets. At least this is the correct way to change the working
> directory for vtest.
>
> For instance we have:
>
>
> /var/folders/nz/vv4_9tw56nv9k3tkvyszvwg8gn/T//regtest.zHu/vtc.23058.0fa4d8bc/h1/stats.sock
>
> which is 94 bytes long. Should work only if we do not add an ..tmp
> extension bigger than 10 bytes. I guess this is not the case when the
> PID is big. Now I understand why some test may pass.
>
> I have also noted that there is a missing closing bracket in this log line:
>
> ***  h10.0 debug|[ALERT] 122/093540 (23139) : Starting frontend
> GLOBAL: cannot change UNIX socket ownership
> [/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg8gn/T//regtest.zHu/
>
> which is built like that:
>
>  snprintf(errmsg, errlen, "%s [%s]", msg, path);
>
> with 100 as errlen value: "cannot change UNIX socket ownership
> [/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg8gn/T//regtest.zHu/" is
> exactly a 100 bytes long string. So here the path for the UNIX socket is
> truncated in the log.
>
> So let's try with a shorter TMPDIR variable please. This should fix the
> issue.
>

when I played with enabling travis-ci, I tried to set TMPDIR directly,
however I was not lucky enough.
Later Tim added "sed" magic to .travis.yml

personally, I do not understand why "sed" is better than assigning TMPDIR
directly.

please enable travis-ci.com on your accounts and try your ideas (with osx).


>
> > I did not get around to investigating this further and filing a bug
> > report, however.
> >
> > Best regards
> > Tim Düsterhus
> >
>
>


Re: reg-tests are broken when running osx + openssl

2019-05-03 Thread Frederic Lecaille

On 5/3/19 1:34 PM, Tim Düsterhus wrote:

Fred,
Ilya,


Hello Tim,


Am 03.05.19 um 13:20 schrieb Frederic Lecaille:

About the test which fail, I would say that such errors are not
negligible :

     Starting frontend GLOBAL: cannot change UNIX socket ownership
[/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg8gn/T//regtest.zHu/


I believe this is an issue with the long TMPDIR that I tried to mitigate
with this: https://github.com/haproxy/haproxy/blob/master/.travis.yml#L34


With your patch, vtest is able to create the LOG files at the same place 
$TMPDIR/ where the UNIX stats socket should be 
created. So this does not interfere with the test.



While debugging I noticed that the validation did not properly account
for the temporary extension of the filename during start-up, causing
HAProxy to accept the filename during the check, but fail to set it up.
This leads to the misleading error message.


Yes, perhaps the UNIX stats socket filename is too long (I have found a 
104-byte max length for sun_path on Mac OS X, 108 on Linux).


So, I propose you revert your fix, and try to find another way to set 
TMPDIR with a shorter value than the default one, which is too long for 
UNIX sockets. At least this is the correct way to change the working 
directory for vtest.


For instance we have:

/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg8gn/T//regtest.zHu/vtc.23058.0fa4d8bc/h1/stats.sock

which is 94 bytes long. It should work only if we do not add an ..tmp 
extension bigger than 10 bytes. I guess this is not the case when the 
PID is big. Now I understand why some tests may pass.


I have also noted that there is a missing closing bracket in this log line:

***  h10.0 debug|[ALERT] 122/093540 (23139) : Starting frontend 
GLOBAL: cannot change UNIX socket ownership 
[/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg8gn/T//regtest.zHu/


which is built like that:

snprintf(errmsg, errlen, "%s [%s]", msg, path);

with 100 as the errlen value: "cannot change UNIX socket ownership 
[/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg8gn/T//regtest.zHu/" is 
exactly 100 bytes long. So the path for the UNIX socket is truncated 
in the log.


So let's try with a shorter TMPDIR variable please. This should fix the 
issue.



I did not get around to investigating this further and filing a bug
report, however.

Best regards
Tim Düsterhus






Re: [External] Re: QAT intermittent healthcheck errors

2019-05-03 Thread Emeric Brun
Hi Marcin,

On 4/29/19 6:41 PM, Marcin Deranek wrote:
> Hi Emeric,
> 
> On 4/29/19 3:42 PM, Emeric Brun wrote:
>> Hi Marcin,
>>
>>>
 I've also a contact at intel who told me to try this option on the qat 
 engine:

> --disable-qat_auto_engine_init_on_fork/--enable-qat_auto_engine_init_on_fork
>   Disable/Enable the engine from being initialized automatically following
>   a fork operation. This is useful in a situation where you want to tightly
>   control how many instances are being used for processes. For instance if an
>   application forks to start a process that does not utilize QAT currently
>   the default behaviour is for the engine to still automatically get started
>   in the child using up an engine instance. After using this flag either the
>   engine needs to be initialized manually using the engine message:
>   INIT_ENGINE or will automatically get initialized on the first QAT crypto
>   operation. The initialization on fork is enabled by default.
>>>
>>> I tried to build QAT Engine with disabled auto init, but that did not help. 
>>> Now I get the following during startup:
>>>
>>> 2019-04-29T15:13:47.142297+02:00 host1 hapee-lb[16604]: qaeOpenFd:753 
>>> Unable to initialize memory file handle /dev/usdm_drv
>>> 2019-04-29T15:13:47+02:00 localhost hapee-lb[16611]: 127.0.0.1:60512 
>>> [29/Apr/2019:15:13:47.139] vip1/23: SSL handshake failure
>>
>> " INIT_ENGINE or will automatically get initialized on the first QAT crypto 
>> operation"
>>
>> Perhaps the init appears "with first qat crypto operation" and is delayed 
>> after the fork so if a chroot is configured, it doesn't allow some accesses
>> to /dev. Could you perform a test in that case without chroot enabled in the 
>> haproxy config ?
> 
> Removed chroot and now it initializes properly. Unfortunately reload still 
> causes "stuck" HAProxy process :-(
> 
> Marcin Deranek

Could you check with "ls -l /proc//fd" if the "/dev/" is 
open multiple times after a reload?

Emeric



Re: leak of handle to /dev/urandom since 1.8?

2019-05-03 Thread Emeric Brun
Hi Lukas,

On 5/3/19 1:49 PM, William Lallemand wrote:
> On Fri, May 03, 2019 at 01:38:00PM +0200, Lukas Tribus wrote:
>> Hello everyone,
>>
>>
>> On Fri, 3 May 2019 at 12:50, Robert Allen1  wrote:
>>> +#if defined(USE_OPENSSL) && (OPENSSL_VERSION_NUMBER >= 0x10101000L)
>>> +   if (global.ssl_used_frontend || global.ssl_used_backend)
>>> +   /* close random device FDs */
>>> +   RAND_keep_random_devices_open(0);
>>> +#endif
>>>
>>> and requests a backport to 1.8 and 1.9 where we noticed this issue (and
>>> which
>>> include the re-exec for reload code, if I followed its history
>>> thoroughly).
>>
>> Please do not commit this yet.
>>
>> We need those random devices open in openssl 1.1.1. We specifically
>> pushed for this and had very long conversations with openssl folks.
>>
>> I don't have time to dig up the entire history right now, will do that
>> later for context, however, please do not commit this yet.
>>
>>
> 
> Lukas,
> 
> This is the deinitialisation code of the master, which runs before
> the re-execution of the master; it does not impact the workers.
> 

Indeed, if the workers keep the fd open it should work; the master is outside the 
chroot and doesn't need to keep the fd open.

Emeric



Re: leak of handle to /dev/urandom since 1.8?

2019-05-03 Thread William Lallemand
On Fri, May 03, 2019 at 01:38:00PM +0200, Lukas Tribus wrote:
> Hello everyone,
> 
> 
> On Fri, 3 May 2019 at 12:50, Robert Allen1  wrote:
> > +#if defined(USE_OPENSSL) && (OPENSSL_VERSION_NUMBER >= 0x10101000L)
> > +   if (global.ssl_used_frontend || global.ssl_used_backend)
> > +   /* close random device FDs */
> > +   RAND_keep_random_devices_open(0);
> > +#endif
> >
> > and requests a backport to 1.8 and 1.9 where we noticed this issue (and
> > which
> > include the re-exec for reload code, if I followed its history
> > thoroughly).
> 
> Please do not commit this yet.
> 
> We need those random devices open in openssl 1.1.1. We specifically
> pushed for this and had very long conversations with openssl folks.
> 
> I don't have time to dig up the entire history right now, will do that
> later for context, however, please do not commit this yet.
> 
> 

Lukas,

This is the deinitialisation code of the master, which runs before
the re-execution of the master; it does not impact the workers.

-- 
William Lallemand



Re: reg-tests are broken when running osx + openssl

2019-05-03 Thread Tim Düsterhus
Fred,
Ilya,

On 03.05.19 at 13:20, Frederic Lecaille wrote:
> About the test which fail, I would say that such errors are not
> negligible :
> 
>     Starting frontend GLOBAL: cannot change UNIX socket ownership
> [/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg8gn/T//regtest.zHu/

I believe this is an issue with the long TMPDIR that I tried to mitigate
with this: https://github.com/haproxy/haproxy/blob/master/.travis.yml#L34

While debugging I noticed that the validation did not properly account
for the temporary extension of the filename during start-up, causing
HAProxy to accept the filename during the check, but fail to set it up.
This leads to the misleading error message.

I did not get around to investigating this further and filing a bug
report, however.

Best regards
Tim Düsterhus



Re: leak of handle to /dev/urandom since 1.8?

2019-05-03 Thread Lukas Tribus
Hello everyone,


On Fri, 3 May 2019 at 12:50, Robert Allen1  wrote:
> +#if defined(USE_OPENSSL) && (OPENSSL_VERSION_NUMBER >= 0x10101000L)
> +   if (global.ssl_used_frontend || global.ssl_used_backend)
> +   /* close random device FDs */
> +   RAND_keep_random_devices_open(0);
> +#endif
>
> and requests a backport to 1.8 and 1.9 where we noticed this issue (and
> which
> include the re-exec for reload code, if I followed its history
> thoroughly).

Please do not commit this yet.

We need those random devices open in openssl 1.1.1. We specifically
pushed for this and had very long conversations with openssl folks.

I don't have time to dig up the entire history right now, will do that
later for context, however, please do not commit this yet.


Also CCing Emeric.


Thanks,
Lukas



Re: reg-tests are broken when running osx + openssl

2019-05-03 Thread Frederic Lecaille

On 5/3/19 1:20 PM, Frederic Lecaille wrote:

So on OSX you should try to use/create a temporary working directory 
where you have enough permissions to create a stats UNIX socket with 
0600 as permissions.


I meant you should try to create a temporary working directory for vtest 
using the TMPDIR environment variable, as follows for instance:


  $ mkdir ~/tmp/foo

  $ TMPDIR=~/tmp/foo make reg-tests 
reg-tests/http-capture/multiple_headers.vtc


## Preparing to run tests ##
Testing with haproxy version: 2.0-dev2-a48237-261
Target : linux2628
Options : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER +PCRE -PCRE_JIT 
-PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED -REGPARM 
-STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT 
+CRYPT_H -VSYSCALL -GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 
-MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY -TFO -NS +DL +RT -DEVICEATLAS 
-51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL

## Gathering tests to run ##
  Add test: reg-tests/http-capture/multiple_headers.vtc
## Starting vtest ##
Testing with haproxy version: 2.0-dev2-a48237-261
#top  TEST reg-tests/http-capture/multiple_headers.vtc TIMED OUT 
(kill -9)
#top  TEST reg-tests/http-capture/multiple_headers.vtc FAILED 
(10.010) signal=9

1 tests failed, 0 tests skipped, 0 tests passed
## Gathering results ##
## Test case: reg-tests/http-capture/multiple_headers.vtc ##
## test results in: 
"/home/flecaille/tmp/foo/haregtests-2019-05-03_13-24-59.2P0ECZ/vtc.1327.1fe5daa1"

 c15.0 HTTP rx timeout (fd:9 5000 ms)
Makefile:971: recipe for target 'reg-tests' failed
make: *** [reg-tests] Error 1


As you can see the logs are now in /home/flecaille/tmp/foo (with ~ my 
home directory: /home/flecaille)


The LOG file is here: 
/home/flecaille/tmp/foo/haregtests-2019-05-03_13-24-59.2P0ECZ/vtc.1327.1fe5daa1/LOG


and the UNIX stats socket is here:

$ grep stats 
/home/flecaille/tmp/foo/haregtests-2019-05-03_13-24-59.2P0ECZ/vtc.1327.1fe5daa1/LOG 

 h 0.0 conf|\tstats socket 
"/home/flecaille/tmp/foo/haregtests-2019-05-03_13-24-59.2P0ECZ/vtc.1327.1fe5daa1/h/stats.sock" 
level admin mode 600









Re: reg-tests are broken when running osx + openssl

2019-05-03 Thread Frederic Lecaille

On 5/3/19 11:39 AM, Илья Шипицин wrote:

Hello,

I'm expanding openssl matrix.
here's failing build

https://travis-ci.org/chipitsine/haproxy-1/jobs/527683332


Hello Ilya,

In fact this has nothing to do with openssl. A lot of tests that make no 
use of TLS/SSL also fail.


There are a lot of HTTP rx timeouts.

Only these two tests passed:

   reg-tests/http-capture/multiple_headers.vtc
   reg-tests/spoe/wrong_init.vtc

but in these cases we do not have any log.

About the tests which fail, I would say that such errors are not negligible:

Starting frontend GLOBAL: cannot change UNIX socket ownership 
[/var/folders/nz/vv4_9tw56nv9k3tkvyszvwg8gn/T//regtest.zHu/


I have simulated it with the following patch on Linux:

$ git diff src/proto_uxst.c
diff --git a/src/proto_uxst.c b/src/proto_uxst.c
index 980a22649..b5f945b9f 100644
--- a/src/proto_uxst.c
+++ b/src/proto_uxst.c
@@ -309,6 +309,10 @@ static int uxst_bind_listener(struct listener *listener, char *errmsg, int errle

goto err_unlink_temp;
}

+   err |= ERR_FATAL | ERR_ALERT;
+   msg = "cannot change UNIX socket ownership";
+   goto err_unlink_temp;
+
ready = 0;
ready_len = sizeof(ready);
if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &ready, &ready_len) == -1)



I got the same results as yours: lots of HTTP RX timeouts because 
haproxy exited unexpectedly.


But in such a case on my PC only reg-tests/spoe/wrong_init.vtc succeeds.
I do not understand how reg-tests/http-capture/multiple_headers.vtc can 
succeed on your side.


It would be interesting to run it on OSX with this command:

$ make reg-tests reg-tests/http-capture/multiple_headers.vtc -- --debug


So on OSX you should try to use/create a temporary working directory 
where you have enough permissions to create a stats UNIX socket with 
0600 as permissions.


And let's see if that fixes your issue.


Fred.



Re: leak of handle to /dev/urandom since 1.8?

2019-05-03 Thread Robert Allen1
Hi William,

William Lallemand  wrote on 03/05/2019 11:06:41:


> your mailer seems to mess with the whitespaces and tabs in the patch.

Apologies again for the formatting on my last message.

My mailer -- if it deserves the term -- is Lotus Notes, which is
apparently incapable of doing the right thing when it comes to
wrapping. I'll try to fix this before I embarrass myself further... :)

(Or else only write short paragraphs and sentences.)

Rob


Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU




Re: leak of handle to /dev/urandom since 1.8?

2019-05-03 Thread Robert Allen1
Hi William,

William Lallemand  wrote on 03/05/2019 11:06:41:

> Could you send it to us as an attachment or using git-send-email, because
> your mailer seems to mess with the whitespaces and tabs in the patch.
> Also add a line at the end of the commit message indicating in which
> version this patch should be backported. Thanks!

Apologies! I have attached it now, with a backports line.

> > * My reading of RAND_keep_random_devices_open is that it expects OpenSSL
> >   rand_lib initialisation to have occurred already, and it will do it if not.
> >   So it seems possible that this function call could incur some delays if
> >   rand_lib is not yet initialised and the entropy sources cause delay, etc.
> >   However, I don't know how big a concern that is. Any thoughts?
> 
> In this case you could check the variables global.ssl_used_frontend &&
> global.ssl_used_backend to ensure that SSL was used in the configuration.
> When those variables are not set, the random is not initialized.

I did this in the attached patch.

However, I checked the current implementation in OpenSSL and I overstated
the problem before: the initialisation consists of constructing three locks
and initialising a short array of structs, with no obvious usage of random
devices. Therefore, it should not be very expensive, although it is still
unnecessary.

For the sake of the list, the patch now looks like:

+#if defined(USE_OPENSSL) && (OPENSSL_VERSION_NUMBER >= 0x10101000L)
+   if (global.ssl_used_frontend || global.ssl_used_backend)
+   /* close random device FDs */
+   RAND_keep_random_devices_open(0);
+#endif

and requests a backport to 1.8 and 1.9 where we noticed this issue (and
which include the re-exec for reload code, if I followed its history
thoroughly).

Rob


Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


0001-BUG-MINOR-mworker-close-OpenSSL-FDs-on-reload.patch
Description: Binary data


[PR] BUILD: extend travis-ci matrix

2019-05-03 Thread PR Bot
Dear list!

Author: Ilya Shipitsin 
Number of patches: 1

This is an automated relay of the Github pull request:
   BUILD: extend travis-ci matrix

Patch title(s): 
   BUILD: extend travis-ci matrix

Link:
   https://github.com/haproxy/haproxy/pull/91

Edit locally:
   wget https://github.com/haproxy/haproxy/pull/91.patch && vi 91.patch

Apply locally:
   curl https://github.com/haproxy/haproxy/pull/91.patch | git am -

Description:
   added openssl-1.0.2, 1.1.0, 1.1.1, libressl-2.7.5, 2.8.3, 2.9.1
   added linux-ppc64le image
   
   libressl builds are still broken.
   They will be repaired by a separate patch (already sent to the mailing
   list).

Instructions:
   This github pull request will be closed automatically; patch should be
   reviewed on the haproxy mailing list (haproxy@formilux.org). Everyone is
   invited to comment, even the patch's author. Please keep the author and
   list CCed in replies. Please note that in absence of any response this
   pull request will be lost.



Re: leak of handle to /dev/urandom since 1.8?

2019-05-03 Thread William Lallemand
Hi Robert,

> Hi William,
> 
> Thanks for your input. I've included a patch below against current
> master that I hope conforms to the contribution guidelines well enough. :)
>

Could you send it to us as an attachment or using git-send-email, because
your mailer seems to mess with the whitespaces and tabs in the patch.
Also add a line at the end of the commit message indicating in which version
this patch should be backported. Thanks!

> A couple of thoughts on my work:
> 
> * Having to include a file directly from OpenSSL seems unfortunate, but
>   OK in the context of the preprocessor guard
> * The comment is perhaps redundant, but I don't think the side effect of
>   the OpenSSL function is obvious from its name otherwise

Fine to me.

> * My reading of RAND_keep_random_devices_open is that it expects OpenSSL
>   rand_lib initialisation to have occurred already, and it will do it if not.
>   So it seems possible that this function call could incur some delays if
>   rand_lib is not yet initialised and the entropy sources cause delay, etc.
>   However, I don't know how big a concern that is. Any thoughts?

In this case you could check the variables global.ssl_used_frontend &&
global.ssl_used_backend to ensure that SSL was used in the configuration.
When those variables are not set, the random is not initialized. 

Regards,

-- 
William Lallemand



reg-tests are broken when running osx + openssl

2019-05-03 Thread Илья Шипицин
Hello,

I'm expanding openssl matrix.
here's failing build

https://travis-ci.org/chipitsine/haproxy-1/jobs/527683332


Re: v1.9.6 socket unresponsive with high cpu usage

2019-05-03 Thread William Dauchy
Hi Willy,

> Note that with all the scheduling issues we've fixed over the last
> days, there are multiple candidates which could cause this. Another
> one was the lack of effect of the nice parameter which is normally
> set on the CLI but the lack of which could result in socat timing
> out during the first half second in absence of any response.

we got a similar issue with last v1.9.7+HEAD
(last commit 
http://git.haproxy.org/?p=haproxy-1.9.git;a=commit;h=f3c64c69b1a293ae54db359a2b2a5f9e0c5265dd)

Here are the complete threads backtraces:

(gdb) bt
#0  0x56153d958570 in fwrr_update_server_weight (srv=0x56157f7a8680) at 
src/lb_fwrr.c:198
#1  0x56153d8ae8ac in srv_update_status (s=0x56157f7a8680) at 
src/server.c:4923
#2  0x56153d8adfc2 in server_recalc_eweight (sv=sv@entry=0x56157f7a8680, 
must_update=must_update@entry=1) at src/server.c:1310
#3  0x56153d8b6edd in server_warmup (t=0x5615899c1a20, 
context=0x56157f7a8680, state=) at src/checks.c:1492
#4  0x56153d94d97a in process_runnable_tasks () at src/task.c:390
#5  0x56153d8c5c4f in run_poll_loop () at src/haproxy.c:2661
#6  run_thread_poll_loop (data=data@entry=0x5615893fab00) at src/haproxy.c:2726
#7  0x56153d83b455 in main (argc=, argv=0x7fff630890d8) at 
src/haproxy.c:3388

(gdb) thread apply all bt
Thread 16 (Thread 0x7fe9b6e32700 (LWP 2807)):
#0  0x56153d958459 in fwrr_update_server_weight (srv=0x56157f5b2fc0) at 
src/lb_fwrr.c:198
#1  0x56153d8ae8ac in srv_update_status (s=0x56157f5b2fc0) at 
src/server.c:4923
#2  0x56153d8adfc2 in server_recalc_eweight (sv=sv@entry=0x56157f5b2fc0, 
must_update=must_update@entry=1) at src/server.c:1310
#3  0x56153d8b6edd in server_warmup (t=0x5615899bf2f0, 
context=0x56157f5b2fc0, state=) at src/checks.c:1492
#4  0x56153d94d97a in process_runnable_tasks () at src/task.c:390
#5  0x56153d8c5c4f in run_poll_loop () at src/haproxy.c:2661
#6  run_thread_poll_loop (data=) at src/haproxy.c:2726
#7  0x7fe9bd5e7dd5 in start_thread () from /lib64/libpthread.so.0
#8  0x7fe9bc320ead in clone () from /lib64/libc.so.6
Thread 15 (Thread 0x7fe9b6631700 (LWP 2808)):
#0  0x56153d96d7a0 in __eb_insert_dup (new=0x56157f52f424, 
sub=0x56157f5640a4) at ebtree/ebtree.h:478
#1  eb_insert_dup (sub=, new=0x56157f52f424) at 
ebtree/ebtree.c:31
#2  0x56153d96df10 in __eb32_insert (new=new@entry=0x56157f52f424, 
root=, root@entry=0x56157deb4140) at ebtree/eb32tree.h:337
#3  eb32_insert (root=root@entry=0x56157deb4140, new=new@entry=0x56157f52f424) 
at ebtree/eb32tree.c:27
#4  0x56153d957fcb in fwrr_queue_srv (s=s@entry=0x56157f52f080) at 
src/lb_fwrr.c:371
#5  0x56153d9585e8 in fwrr_update_server_weight (srv=0x56157f52f080) at 
src/lb_fwrr.c:242
#6  0x56153d8ae8ac in srv_update_status (s=0x56157f52f080) at 
src/server.c:4923
#7  0x56153d8adfc2 in server_recalc_eweight (sv=sv@entry=0x56157f52f080, 
must_update=must_update@entry=1) at src/server.c:1310
#8  0x56153d8b6edd in server_warmup (t=0x5615899be8a0, 
context=0x56157f52f080, state=) at src/checks.c:1492
#9  0x56153d94d97a in process_runnable_tasks () at src/task.c:390
#10 0x56153d8c5c4f in run_poll_loop () at src/haproxy.c:2661
#11 run_thread_poll_loop (data=) at src/haproxy.c:2726
#12 0x7fe9bd5e7dd5 in start_thread () from /lib64/libpthread.so.0
#13 0x7fe9bc320ead in clone () from /lib64/libc.so.6
Thread 14 (Thread 0x7fe9b5e30700 (LWP 2809)):
#0  0x56153d958572 in fwrr_update_server_weight (srv=0x56157f625580) at 
src/lb_fwrr.c:198
#1  0x56153d8ae8ac in srv_update_status (s=0x56157f625580) at 
src/server.c:4923
#2  0x56153d8adfc2 in server_recalc_eweight (sv=sv@entry=0x56157f625580, 
must_update=must_update@entry=1) at src/server.c:1310
#3  0x56153d8b6edd in server_warmup (t=0x5615899bfbe0, 
context=0x56157f625580, state=) at src/checks.c:1492
#4  0x56153d94d97a in process_runnable_tasks () at src/task.c:390
#5  0x56153d8c5c4f in run_poll_loop () at src/haproxy.c:2661
#6  run_thread_poll_loop (data=) at src/haproxy.c:2726
#7  0x7fe9bd5e7dd5 in start_thread () from /lib64/libpthread.so.0
#8  0x7fe9bc320ead in clone () from /lib64/libc.so.6
Thread 13 (Thread 0x7fe9b562f700 (LWP 2810)):
#0  fwrr_update_server_weight (srv=0x56157f563d00) at src/lb_fwrr.c:198
#1  0x56153d8ae8ac in srv_update_status (s=0x56157f563d00) at 
src/server.c:4923
#2  0x56153d8adfc2 in server_recalc_eweight (sv=sv@entry=0x56157f563d00, 
must_update=must_update@entry=1) at src/server.c:1310
#3  0x56153d8b6edd in server_warmup (t=0x5615899becc0, 
context=0x56157f563d00, state=) at src/checks.c:1492
#4  0x56153d94d97a in process_runnable_tasks () at src/task.c:390
#5  0x56153d8c5c4f in run_poll_loop () at src/haproxy.c:2661
#6  run_thread_poll_loop (data=) at src/haproxy.c:2726
#7  0x7fe9bd5e7dd5 in start_thread () from /lib64/libpthread.so.0
#8  0x7fe9bc320ead in clone () from /lib64/libc.so.6
Thread 12 (Thread 0x7fe9a7fff700 (LWP 2811)):
#0  

Re: Zero RTT in backend server side

2019-05-03 Thread Igor Pav
Just tested with openssl 1.1.1b and haproxy 1.9.7; it appears there was no
success, you are right :)

On Thu, May 2, 2019 at 8:45 PM Olivier Houchard  wrote:
>
> Hi Igor,
>
> On Thu, May 02, 2019 at 08:39:58PM +0800, Igor Pav wrote:
> > Hello, can we use TLS zero RTT in server-side now? Just want to reduce
> > more latency when using SSL talk to the backend servers(also running
> > haproxy).
> >
> > Thanks in advance. Regards
> >
>
> It should work if you add "allow-0rtt" on your server line. However it hasn't
> been tested for some time, and was written with a development version of
> OpenSSL 1.1.1, so I wouldn't be entirely surprised if it didn't work anymore.
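>
> [Editor's note: for readers wanting to try this, a minimal sketch of such a
> backend follows; the backend name, server name, and address are hypothetical,
> and "verify none" is only for testing:]
>
>     backend app
>         server srv1 192.0.2.10:443 ssl verify none allow-0rtt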
>
> Regards,
>
> Olivier



Re: leak of handle to /dev/urandom since 1.8?

2019-05-03 Thread Robert Allen1
William Lallemand  wrote on 02/05/2019 20:56:32:

> From: William Lallemand 
> To: Robert Allen1 
> Cc: haproxy@formilux.org
> Date: 02/05/2019 20:56
> Subject: Re: leak of handle to /dev/urandom since 1.8?
> 
> On Thu, May 02, 2019 at 03:34:22PM +0100, Robert Allen1 wrote:
> > Hi,
> > 
> > I spent some time digging into the FD leak and I've pinpointed it: it's an
> > interaction between HAProxy re-exec and a change to keep the random
> > devices open by default in OpenSSL 1.1.1 -- specifically
> > https://github.com/openssl/openssl/commit/c7504aeb640a88949dfe3146f7e0f275f517464c
> > 
> > Since a lot of different RAND_* family functions cause initialisation,
> > that explains why hacking out ssl_initialize_random didn't work for us;
> > some other part of reloading just ended up opening the random devices
> > and then leaking the handles on the next exec().
> > 
> > Anyway, from experimentation, we can close the leak with a patch like
> > (but obviously better abstracted than) this one against 1.9.6:
> > 
> > diff --git a/src/haproxy.c b/src/haproxy.c
> > index 1cb10391..d3482f46 100644
> > --- a/src/haproxy.c
> > +++ b/src/haproxy.c
> > @@ -737,6 +737,10 @@ void mworker_reload()
> > ptdf->fct();
> > if (fdtab)
> > deinit_pollers();
> > +   /* Close OpenSSL random devices, if open */
> > +   /* Note: If not already initialised, this may cause OpenSSL 
> > rand_lib initialisation... */
> > +   void RAND_keep_random_devices_open(int keep);
> > +   RAND_keep_random_devices_open(0);
> > /* restore the initial FD limits */
> > limit.rlim_cur = rlim_fd_cur_at_boot;
> > 
> > From reading in the OpenSSL code, I assume that the use of O_RDONLY
> > without O_CLOEXEC or a fcntl is intentional so that it doesn't have to
> > re-open the devices on fork().
> > 
> > So it looks like the best place to solve this is around there ^^, although
> > it needs to be wrapped up in ssl_sock.c with appropriate version guards
> > on the OpenSSL version.
> > 
> > Any thoughts on the above and how to proceed?
> > 
> > Rob
> > 
> 
> Hi Robert,
> 
> I think the right thing to do is probably to just call
> RAND_keep_random_devices_open(0); in the mworker_reload function.
> 
> You just need to check the SSL macros for the build and the minimum
> openssl version (1.1.1), so probably something like this:
> 
> #if defined(USE_OPENSSL) && (OPENSSL_VERSION_NUMBER >= 0x10101000L)
> RAND_keep_random_devices_open(0);
> #endif
> 
> I don't think we need anything in ssl_sock.c though.
> 
> Regards,
> 
> -- 
> William Lallemand
> 

Hi William,

Thanks for your input. I've included a patch below against current master
that I hope conforms to the contribution guidelines well enough. :)

A couple of thoughts on my work:

* Having to include a file directly from OpenSSL seems unfortunate, but
  OK in the context of the preprocessor guard
* The comment is perhaps redundant, but I don't think the side effect of
  the OpenSSL function is obvious from its name otherwise
* My reading of RAND_keep_random_devices_open is that it expects OpenSSL
  rand_lib initialisation to have occurred already, and it will do it if not.
  So it seems possible that this function call could incur some delays if
  rand_lib is not yet initialised and the entropy sources cause delay, etc.
  However, I don't know how big a concern that is. Any thoughts?

Rob



From 7f432956cd7a11837c2657944b5e037c510645c7 Mon Sep 17 00:00:00 2001
From: Rob Allen 
Date: Fri, 3 May 2019 09:11:32 +0100
Subject: [PATCH] BUG/MINOR: mworker: close OpenSSL FDs on reload

From OpenSSL 1.1.1, the default behaviour is to maintain open FDs to any
random devices that get used by the random number library. As a result,
those FDs leak when the master re-execs on reload; since those FDs are
not marked FD_CLOEXEC or O_CLOEXEC, they also get inherited by children.
Eventually both master and children run out of FDs.

OpenSSL 1.1.1 introduces a new function to control whether the random
devices are kept open. When clearing the keep-open flag, it also closes
any currently open FDs, so it can be used to clean-up open FDs too.
Therefore, a call to this function is made in mworker_reload prior to
re-exec.
---
 src/haproxy.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/haproxy.c b/src/haproxy.c
index 603f084c..cc689b62 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -127,6 +127,7 @@
 #include 
 #ifdef USE_OPENSSL
 #include 
+#include 
 #endif
 
 /* array of init calls for older platforms */
@@ -587,6 +588,10 @@ void mworker_reload()
ptdf->fct();
if (fdtab)
deinit_pollers();
+#if defined(USE_OPENSSL) && (OPENSSL_VERSION_NUMBER >=