Re: haproxy 1.9.6 segfault in srv_update_status

2019-05-15 Willy Tarreau
Hi Patrick,

On Wed, May 15, 2019 at 01:22:41AM -0400, Patrick Hemmer wrote:
> We haven't had a chance to update to 1.9.8 yet, so we're still running 1.9.6
> (Linux) in production, and just had 2 segfaults happen a little over an hour
> apart. When I look at the core dumps from them, the stack trace is the same.
> I'm not sure whether this issue has already been fixed, so I'm providing it just in case.

In 1.9.6, some locking was missing in the roundrobin LB algorithm as well
as in the slowstart function. Any server state update there (weight change,
up/down, etc.) is just a matter of luck :-)
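To illustrate the kind of race described above, here is a minimal C sketch under stated assumptions: the `backend` and `server` structures, `set_server_weight()`, `flap()` and `run_flapping()` are all hypothetical names, not HAProxy's, and plain pthread mutexes are used purely for illustration. Each weight change updates both the server and the backend's total weight, which is what a weighted algorithm such as roundrobin would read, so both writes must happen under one lock.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16

/* Hypothetical, heavily simplified stand-ins for a backend and its servers.
 * These are NOT HAProxy's structures; they only illustrate the locking idea. */
struct backend {
    pthread_mutex_t lock;   /* protects tot_weight and every server's weight */
    int tot_weight;         /* sum of server weights, read by a weighted LB */
};

struct server {
    struct backend *px;
    int weight;
};

/* Changes a server's weight and keeps the backend total consistent.
 * Both writes happen under the same lock: without it, two threads (say a
 * health check marking the server down and another one bringing it back)
 * could interleave and leave tot_weight out of sync with the servers. */
static void set_server_weight(struct server *srv, int new_weight)
{
    pthread_mutex_lock(&srv->px->lock);
    srv->px->tot_weight += new_weight - srv->weight;
    srv->weight = new_weight;
    pthread_mutex_unlock(&srv->px->lock);
}

struct flap_arg {
    struct server *srv;
    int iterations;
};

/* Flaps one server down and up, as failing health checks might. */
static void *flap(void *p)
{
    struct flap_arg *a = p;

    for (int i = 0; i < a->iterations; i++) {
        set_server_weight(a->srv, 0);   /* down */
        set_server_weight(a->srv, 1);   /* up again */
    }
    return NULL;
}

/* Runs nthreads flapping threads over two servers of weight 1 and returns
 * the backend's final total weight. Every thread ends with weight 1, so the
 * servers finish at 1 each and the lock keeps the total equal to their sum. */
static int run_flapping(int nthreads, int iterations)
{
    struct backend be = { .tot_weight = 2 };
    struct server srv[2] = { { &be, 1 }, { &be, 1 } };
    pthread_t th[MAX_THREADS];
    struct flap_arg args[MAX_THREADS];

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_mutex_init(&be.lock, NULL);
    for (int i = 0; i < nthreads; i++) {
        args[i].srv = &srv[i % 2];
        args[i].iterations = iterations;
        pthread_create(&th[i], NULL, flap, &args[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(th[i], NULL);
    pthread_mutex_destroy(&be.lock);
    return be.tot_weight;
}
```

With the lock in place, `run_flapping(4, 100000)` returns 2. Removing the lock/unlock pair turns the update of `tot_weight` into a data race, and the total may then drift away from the sum of the server weights.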

> There was one oddity going on at the time these segfaults occurred. We had
> maxed out the Linux kernel's conntrack table. So haproxy would have been
> experiencing timeouts when attempting new connections, with health checks
> failing all over the place.

Yes, good point. That's very likely what happened, and it suggests that
the rest of the time your servers are quite stable, so you don't trigger
these code paths.
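Failing health checks matter here because every up/down flip is a server state update. The hypothetical C model below (not HAProxy's check code; all names are invented for illustration) borrows the idea of HAProxy's `rise` and `fall` server settings, the number of consecutive passed or failed checks needed to change state, and counts how many state changes a sequence of check results produces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of check-driven up/down transitions; not HAProxy code. */
struct check_state {
    bool up;       /* current server state */
    int rise;      /* consecutive passes needed to go from down to up */
    int fall;      /* consecutive failures needed to go from up to down */
    int consec;    /* consecutive results that disagree with the current state */
};

/* Feeds one check result and returns true if the server changed state,
 * i.e. whenever a load balancer would have to run a server state update. */
static bool apply_check_result(struct check_state *cs, bool passed)
{
    assert(cs->rise > 0 && cs->fall > 0);
    if (passed == cs->up) {
        cs->consec = 0;             /* result confirms the current state */
        return false;
    }
    cs->consec++;
    if (cs->consec >= (cs->up ? cs->fall : cs->rise)) {
        cs->up = !cs->up;           /* state flips: an update is triggered */
        cs->consec = 0;
        return true;
    }
    return false;
}

/* Returns how many state changes a sequence of check results causes. */
static int count_transitions(struct check_state *cs, const bool *results, size_t n)
{
    int transitions = 0;

    for (size_t i = 0; i < n; i++)
        if (apply_check_result(cs, results[i]))
            transitions++;
    return transitions;
}
```

With rise=2 and fall=3, five passing checks produce no transition, while the sequence fail, fail, fail, pass, pass, fail, pass produces two (down, then up). A saturated conntrack table that makes checks time out intermittently therefore keeps exercising the state-update paths that a stable setup rarely reaches.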

Thanks!
Willy



haproxy 1.9.6 segfault in srv_update_status

2019-05-14 Patrick Hemmer
We haven't had a chance to update to 1.9.8 yet, so we're still running 
1.9.6 (Linux) in production, and just had 2 segfaults happen a little 
over an hour apart. When I look at the core dumps from them, the stack 
trace is the same. I'm not sure whether this issue has already been fixed,
so I'm providing it just in case.


There was one oddity going on at the time these segfaults occurred. We 
had maxed out the Linux kernel's conntrack table. So haproxy would have 
been experiencing timeouts when attempting new connections, with health 
checks failing all over the place.



(gdb) bt full
#0  task_schedule (when=, task=0x0) at include/proto/task.h:439
No locals.
#1  srv_update_status (s=0x7f6ea12e2a80) at src/server.c:4872
    next_admin = 0
    check = 0x7f6ea12e2f20
    xferred = 
    px = 0x7f6ea12d8300
    prev_srv_count = 6
    srv_was_stopping = 0
    log_level = 
    tmptrash = 0x0
#2  0x7f6ea0a7bd22 in server_recalc_eweight (sv=sv@entry=0x7f6ea12e2a80, must_update=must_update@entry=1) at src/server.c:1310
    px = 
    w = 
#3  0x7f6ea0a81ce8 in srv_update_state (params=0x7ffc51186bf0, version=1, srv=0x7f6ea12e2a80) at src/server.c:3112
    p = 0x7ffc51186ccf ""
    srv_op_state = 
    bk_f_forced_id = 
    port = 8080
    srv_admin_state = 0
    srv_last_time_change = 6
    srv_check_state = 6
    srv_agent_state = 0
    srv_check_result = CHK_RES_PASSED
    fqdn_set_by_cli = 0
    srv_check_status = 15
    port_str = 0x7ffc51186cd2 "8080"
    srvrecord = 0x0
    msg = 0x7f6ea08d7fe0
    srv_uweight = 1
    srv_iweight = 1
    srv_check_health = 
    srv_f_forced_id = 
    fqdn = 0x0
#4  apply_server_state () at src/server.c:3514
    bk_f_forced_id = 
    check_id = 
    check_name = 
    cur = 
    end = 
    mybuf = "36\000backoffice\000\062\000iad1gbow02\000\061\060.3.66.169\000\061\000\060\000\061\000\061\000\066\000\061\065\000\063\000\062\000\066\000\060\000\060\000\060\000-\000\070\060\070\060\000-\000\000\000\000\061\000\060\000\062\000\060\000\060\000\060\000\060\000-\000\070\060\070\060\000-\000\000me_since_last_change srv_check_status srv_check_result srv_check_health srv_check_state srv_agent_st"...
    mybuflen = 
    params = {0x7ffc51186c90 "36", 0x7ffc51186c93 "backoffice", 0x7ffc51186c9e "2", 0x7ffc51186ca0 "iad1gbow02", 0x7ffc51186cab "10.3.66.169",
  0x7ffc51186cb7 "1", 0x7ffc51186cb9 "0", 0x7ffc51186cbb "1", 0x7ffc51186cbd "1", 0x7ffc51186cbf "6", 0x7ffc51186cc1 "15",
  0x7ffc51186cc4 "3", 0x7ffc51186cc6 "2", 0x7ffc51186cc8 "6", 0x7ffc51186cca "0", 0x7ffc51186ccc "0", 0x7ffc51186cce "0",
  0x7ffc51186cd0 "-", 0x7ffc51186cd2 "8080", 0x7ffc51186cd7 "-"}
    srv_params = {0x7ffc51186cab "10.3.66.169", 0x7ffc51186cb7 "1", 0x7ffc51186cb9 "0", 0x7ffc51186cbb "1", 0x7ffc51186cbd "1",
  0x7ffc51186cbf "6", 0x7ffc51186cc1 "15", 0x7ffc51186cc4 "3", 0x7ffc51186cc6 "2", 0x7ffc51186cc8 "6", 0x7ffc51186cca "0",
  0x7ffc51186ccc "0", 0x7ffc51186cce "0", 0x7ffc51186cd0 "-", 0x7ffc51186cd2 "8080", 0x7ffc51186cd7 "-", 0x0, 0x0, 0x0, 0x0}
    arg = 
    srv_arg = 
    version = 
    diff = 0
    f = 0x7f6ea16ad080
    filepath = 
    globalfilepath = "/var/lib/haproxy/state", '\000' times>...
    localfilepath = "d\000\000\000\000\000\000\000f\222Ҡn\177\000\000b\222Ҡn\177\000\000\000\177\030Q\374\177\000\000\020\000\000\000\000\000\000\000\265\221Ҡn\177\000\000\261\221Ҡn\177\000\000\003\000\000\000\000\000\000\000\250\221Ҡn\177\000\000\b\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000\250\221Ҡn\177\000\000\001\222Ҡn\177\000\000\f\000\000\000\000\000\000\000\060\200\030Q\374\177\000\000\t\000\000\000\000\000\000\000\363\221Ҡn\177\000\000\344t\262\240n\177\000\000\265\221Ҡn\177\000\000\240\231m\241n\177\000\000\\\000\000\000\000\000\000\000x\331m\241n\177\000\000\016\222Ҡn\177\000\000\b\000\000\000\000\000\000\000\250\221Ҡn\177\000\000\302\000\000\000\000\000\000\000\060\200\030"...
    len = 
    fileopenerr = 
    globalfilepathlen = 
    localfilepathlen = 
    curproxy = 0x7f6ea12d8300
    bk = 0x7f6ea12d8300
    srv = 0x7f6ea12e2a80
#5  0x7f6ea0a8f48f in init (argc=, argc@entry=13, argv=, argv@entry=0x7ffc51189488) at src/haproxy.c:1843
    arg_mode = 
    tmp = 
    cfg_pidfile = 
    err_code = 9
    err_msg = 0x0
    wl = 
    progname = 0x7ffc5118acf6 "haproxy"
    change_dir = 
    px = 
    pcf = 
#6  0x7f6ea09e41a7 in main (argc=13, argv=0x7ffc51189488) at src/haproxy.c:2774
    err = 
    retry = 
    limit = {rlim_cur = 131072, rlim_max = 131072}
    errmsg = "\000@\000\000\000\000\000\000\002v\037\237n\177\000\000\300t\004\241n\177\000\000`\027S\237n\177\000\000\030\000\000\000\000\000\000\000>\001\000\024\000\000\000\000p\244\005\241n\177\0
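Frame #4 shows apply_server_state() splitting one line of the server-state file (/var/lib/haproxy/state) into params[]; rejoined, that line reads `36 backoffice 2 iad1gbow02 10.3.66.169 1 0 1 1 6 15 3 2 6 0 0 0 - 8080 -`. The following is a hypothetical C sketch, not HAProxy's parser, of how such a line maps to named columns. The column list is an assumption inferred from the params[] values, the locals in frame #3, and the header fragment visible in mybuf; `split_state_line()` and `state_field()` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STATE_NCOLS 20

/* Column order inferred from params[] in frame #4 and the header fragment in
 * mybuf; an assumption for illustration, not HAProxy's authoritative list. */
static const char *const state_cols[STATE_NCOLS] = {
    "be_id", "be_name", "srv_id", "srv_name", "srv_addr",
    "srv_op_state", "srv_admin_state", "srv_uweight", "srv_iweight",
    "srv_time_since_last_change", "srv_check_status", "srv_check_result",
    "srv_check_health", "srv_check_state", "srv_agent_state",
    "bk_f_forced_id", "srv_f_forced_id", "srv_fqdn", "srv_port", "srvrecord",
};

/* Splits a writable, whitespace-separated line in place into at most
 * max_fields NUL-terminated tokens and returns how many were found. */
static size_t split_state_line(char *line, char **fields, size_t max_fields)
{
    size_t n = 0;
    char *p = line;

    assert(line != NULL && fields != NULL);
    while (n < max_fields) {
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;                    /* skip separators */
        if (*p == '\0')
            break;
        fields[n++] = p;            /* start of the next field */
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        if (*p != '\0')
            *p++ = '\0';            /* terminate this field in place */
    }
    return n;
}

/* Returns the value of the named column, or NULL if unknown or missing. */
static const char *state_field(char **fields, size_t nfields, const char *name)
{
    for (size_t i = 0; i < STATE_NCOLS && i < nfields; i++)
        if (strcmp(state_cols[i], name) == 0)
            return fields[i];
    return NULL;
}
```

Fed the line rejoined from params[], the sketch yields 20 fields; `state_field(f, n, "srv_port")` returns "8080" and `state_field(f, n, "srv_check_status")` returns "15", matching the `port` and `srv_check_status` locals in frame #3.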