Re: 1.9.4 Make issue on Cygwin

2019-03-14 Thread Willy Tarreau
Hi Jeffrey,

On Fri, Mar 15, 2019 at 10:22:14AM +0800, 陳喵喵 wrote:
> Hi,
> I'm trying to compile haproxy under cygwin but I'm running into a problem.
> 
> I have tried searching Google to resolve the problem but couldn't find anything.
> 
> Can anyone let me know what's wrong?
> 
> Jeffrey_Chen@jeffrey_chen ~/haproxy-1.9.4
> $  make TARGET=cygwin ARCH=x86_64 CPU=generic USE_THREAD=1
> USE_REGPARM=1 USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1 TRACE=1
> LDFLAGS="-Wl,--allow-multiple-definition"
>   LD  haproxy
> src/haproxy.o:haproxy.c:(.rdata$.refptr.__stop_init_STG_INIT[.refptr.__stop_init_STG_INIT]+0x0):
> undefined reference to `__stop_init_STG_INIT'
(...)
> cygwin gcc version : 7.4.0
> 
> ld version : 2.29.1.20171006
> 
> I have also tried gcc version : 8.3.0
> 
> I have used the same environment to build 1.8.19 without any problem.

I see what the problem is, though I don't know cygwin or windows executables
well enough to be able to propose a fix. We use ELF sections to enumerate init
functions for the different initialisation stages in the early boot. We had
to adjust the section names on OSX because the linker uses different names.
But on Windows I don't even know whether cygwin produces ELF-compatible binaries
or emulates such sections. Someone who knows windows will probably need to take a
deeper look at this, I'm afraid :-/
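To make the mechanism concrete, here is a minimal, generic sketch of the
start/stop-symbol technique. It is only an illustration, not haproxy's actual
registration macros; the section name is the one from the error output above,
everything else is invented for the example. With GNU ld on ELF targets, any
output section whose name is a valid C identifier automatically gets
__start_<name> and __stop_<name> symbols, and those are exactly the symbols
the Cygwin link fails to resolve:

/* Generic sketch of the section start/stop technique, not haproxy's actual
 * macros; only the section name comes from the error output above. On ELF
 * targets, GNU ld synthesizes __start_<sec> and __stop_<sec> for any section
 * whose name is a valid C identifier. */
#include <stdio.h>

typedef void (*initcall_fn)(void);

/* Each caller drops a function pointer into the "init_STG_INIT" section. */
#define REGISTER_INIT(fn) \
    static const initcall_fn init_ptr_##fn \
    __attribute__((__used__, __section__("init_STG_INIT"))) = fn

/* Provided automatically by the ELF linker; left undefined on PE/COFF,
 * which matches the errors reported above. */
extern const initcall_fn __start_init_STG_INIT[];
extern const initcall_fn __stop_init_STG_INIT[];

static void hello_init(void) { puts("hello from STG_INIT"); }
REGISTER_INIT(hello_init);

int main(void)
{
    /* Walk every entry registered in the section, in link order. */
    for (const initcall_fn *fn = __start_init_STG_INIT;
         fn < __stop_init_STG_INIT; fn++)
        (*fn)();
    return 0;
}

On a toolchain producing PE/COFF objects these start/stop markers are not
synthesized automatically, which is consistent with the undefined references
above; the OSX case mentioned earlier needed a similar per-platform adjustment
of the section naming.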

Regards,
Willy



1.9.4 Make issue on Cygwin

2019-03-14 Thread 陳喵喵
Hi,
I'm trying to compile haproxy under cygwin but I'm running into a problem.

I have tried searching Google to resolve the problem but couldn't find anything.

Can anyone let me know what's wrong?

Jeffrey_Chen@jeffrey_chen ~/haproxy-1.9.4
$  make TARGET=cygwin ARCH=x86_64 CPU=generic USE_THREAD=1
USE_REGPARM=1 USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1 TRACE=1
LDFLAGS="-Wl,--allow-multiple-definition"
  LD  haproxy
src/haproxy.o:haproxy.c:(.rdata$.refptr.__stop_init_STG_INIT[.refptr.__stop_init_STG_INIT]+0x0):
undefined reference to `__stop_init_STG_INIT'
src/haproxy.o:haproxy.c:(.rdata$.refptr.__stop_init_STG_ALLOC[.refptr.__stop_init_STG_ALLOC]+0x0):
undefined reference to `__stop_init_STG_ALLOC'
src/haproxy.o:haproxy.c:(.rdata$.refptr.__start_init_STG_INIT[.refptr.__start_init_STG_INIT]+0x0):
undefined reference to `__start_init_STG_INIT'
src/haproxy.o:haproxy.c:(.rdata$.refptr.__start_init_STG_ALLOC[.refptr.__start_init_STG_ALLOC]+0x0):
undefined reference to `__start_init_STG_ALLOC'
collect2: error: ld returned 1 exit status
make: *** [Makefile:976: haproxy] Error 1

cygwin gcc version : 7.4.0

ld version : 2.29.1.20171006

I have also tried gcc version : 8.3.0

I have used the same environment to build 1.8.19 without any problem.


Thank You,

Jeffrey Chen


Re: haproxy reverse proxy to https streaming backend

2019-03-14 Thread PiBa-NL

Hi Thomas,

On 14-3-2019 at 20:28, Thomas Schmiedl wrote:

Hello,

I never got a reply from the original author of xupnpd2 to fix the
hls-handling, so I created a lua-script (thanks to Thierry Fournier),
but it's too slow for the router CPU. Could someone rewrite the script
as a lua-c-module?

I don't think making this exact code a lua-c-module would solve the
issue; Lua is not a 'slow' language. But I do wonder if regex is the
right tool for this data manipulation.

Regards,
Thomas

test.cfg:
global
    lua-load /var/media/ftp/playlist.lua

frontend main
    mode http
    bind *:8080
    acl is_index_m3u8 path -m end /index.m3u8
    http-request use-service lua.playlist if is_index_m3u8
    default_backend forward

backend forward
    mode http
    server gjirafa puma.gjirafa.com:443 ssl verify none

playlist.lua:
core.register_service("playlist", "http", function(applet)
    local tcp = core.tcp()
    tcp:connect_ssl("51.75.52.73", 443)
    tcp:send("GET ".. applet.path .." HTTP/1.1\r\nConnection:
Close\r\nHost: puma.gjirafa.com\r\n\r\n")
    local body = tcp:receive("*a")

    local result = string.match(body,"^.*(#EXTM3U.-)#EXTINF")
    result = result .. string.match(body,"(...%d+.ts%d+.ts%d+.ts)[\r\n|0]*$")



I think an 'easier' regex might already improve performance; can you try
this one, for example?:
    result = result .. string.match(body,"(#EXTINF:%d+[/.]%d+,\n%d+[/.]ts.#EXTINF:%d[/.]%d%d%d,.%d+[/.]ts.#EXTINF:%d+[/.]%d+,\n%d+[/.]ts)[\r\n|0]*$")


With my test using 'https://rextester.com/l/lua_online_compiler' and a 
little sample m3u8 it seemed to work faster anyhow.




    applet:set_status(200)
    applet:add_header("Content-Type", "application/x-mpegURL")
    applet:add_header("content-length", string.len(result))
    applet:add_header("Connection", "close")
    applet:start_response()
    applet:send(result)
end)

On 19.02.2019 at 21:31, Thomas Schmiedl wrote:

On 19.02.2019 at 05:29, Willy Tarreau wrote:

Hello Thomas,

On Sun, Feb 17, 2019 at 05:55:29PM +0100, Thomas Schmiedl wrote:

Hello Bruno,

I think the problem is the parsing of the .m3u8-playlist in xupnpd2. The
first entry to the .ts-file is 4 hours behind the actual time. But I
have no c++ experience to change the code.


For me if it works but not correctly like this, it clearly indicates
there is a (possibly minor) incompatibility between the client and the
server. It just happens that if your client doesn't support https, it
was never tested against this server and very likely needs to be adapted
to work correctly.


Is it possible in haproxy to manipulate the playlist file (server
response) so that only the last .ts-entries are available and returned
to xupnpd2?


No, haproxy doesn't manipulate contents. Not only is it completely out of
the scope of a load balancing proxy, but it would also encourage some
users to try to work around some of their deployment issues in the ugliest
possible way, causing even more trouble (and frankly, on *every*
infrastructure where you find such horrible tricks deployed, the admins
implore you to help them because they're in big trouble and are stuck with
no option left to fix the issues they've created).

If it's only a matter of modifying one file on the fly, you may manage
to do it using Lua : instead of forwarding the request to the server,
you send it to a Lua function, which itself makes the request to the
server, buffers the response, rewrites it, then sends it back to the
client. You must just make sure to only send there the requests for
the playlist file and nothing else.

Could someone send me such a lua-script example and explain how to include
it in haproxy? Thanks


I personally think this is ugly compared to trying to fix the faulty
client. Maybe you can report your issue to the author(s) and share your
config to help them reproduce it ?

Regards,
Willy


Regards,
PiBa-NL (Pieter)




Re: stable-bot: WARNING: 42 bug fixes in queue for next release

2019-03-14 Thread Willy Tarreau
On Thu, Mar 14, 2019 at 02:15:07PM +, stable-...@haproxy.com wrote:
> Last release 1.9.4 was issued on 2019/02/06.  There are currently 42 patches 
> in the queue cut down this way:
> - 1 BUG, first one merged on 2019/02/10
> - 6 MAJOR, first one merged on 2019/02/10
> - 20 MEDIUM, first one merged on 2019/02/10
> - 15 MINOR, first one merged on 2019/02/10
> 
> Thus the computed ideal release date for 1.9.5 would be 2019/02/24, which was
> two weeks ago.

Time really flies. I didn't see the last two weeks pass at all. We're still
fighting with this 100% CPU issue and others that we've discovered in
the middle that only affect 2.0 (easy to say once diagnosed). With Louis'
trace we may not be too far from nailing down the 100% one, so I'd feel
bad issuing 1.9.5 now knowing that several people are hit by something
like this. I have a bunch of fixes from Christopher that I started to
review but won't have time to finish reviewing before the end of the
week, so unfortunately I guess half of them could be postponed.
Let's grant ourselves another day on the 100% CPU issue, otherwise we'll
release without it.

Willy



Re: haproxy reverse proxy to https streaming backend

2019-03-14 Thread Thomas Schmiedl

Hello,

I never got a reply from the original author of xupnpd2 to fix the
hls-handling, so I created a lua-script (thanks to Thierry Fournier),
but it's too slow for the router CPU. Could someone rewrite the script
as a lua-c-module?

Regards,
Thomas

test.cfg:
global
lua-load /var/media/ftp/playlist.lua

frontend main
mode http
bind *:8080
acl is_index_m3u8 path -m end /index.m3u8
http-request use-service lua.playlist if is_index_m3u8
default_backend forward

backend forward
mode http
server gjirafa puma.gjirafa.com:443 ssl verify none

playlist.lua:
core.register_service("playlist", "http", function(applet)
local tcp = core.tcp()
tcp:connect_ssl("51.75.52.73", 443)
tcp:send("GET ".. applet.path .." HTTP/1.1\r\nConnection:
Close\r\nHost: puma.gjirafa.com\r\n\r\n")
local body = tcp:receive("*a")

local result = string.match(body,"^.*(#EXTM3U.-)#EXTINF")
result = result .. string.match(body,"(...%d+.ts%d+.ts%d+.ts)[\r\n|0]*$")

applet:set_status(200)
applet:add_header("Content-Type", "application/x-mpegURL")
applet:add_header("content-length", string.len(result))
applet:add_header("Connection", "close")
applet:start_response()
applet:send(result)
end)

On 19.02.2019 at 21:31, Thomas Schmiedl wrote:

On 19.02.2019 at 05:29, Willy Tarreau wrote:

Hello Thomas,

On Sun, Feb 17, 2019 at 05:55:29PM +0100, Thomas Schmiedl wrote:

Hello Bruno,

I think the problem is the parsing of the .m3u8-playlist in xupnpd2. The
first entry to the .ts-file is 4 hours behind the actual time. But I
have no c++ experience to change the code.


For me if it works but not correctly like this, it clearly indicates
there is a (possibly minor) incompatibility between the client and the
server. It just happens that if your client doesn't support https, it
was never tested against this server and very likely needs to be adapted
to work correctly.


Is it possible in haproxy to manipulate the playlist file (server
response) so that only the last .ts-entries are available and returned
to xupnpd2?


No, haproxy doesn't manipulate contents. Not only is it completely out of
the scope of a load balancing proxy, but it would also encourage some
users to try to work around some of their deployment issues in the ugliest
possible way, causing even more trouble (and frankly, on *every*
infrastructure where you find such horrible tricks deployed, the admins
implore you to help them because they're in big trouble and are stuck with
no option left to fix the issues they've created).

If it's only a matter of modifying one file on the fly, you may manage
to do it using Lua : instead of forwarding the request to the server,
you send it to a Lua function, which itself makes the request to the
server, buffers the response, rewrites it, then sends it back to the
client. You must just make sure to only send there the requests for
the playlist file and nothing else.

Could someone send me such a lua-script example and explain how to include
it in haproxy? Thanks


I personally think this is ugly compared to trying to fix the faulty
client. Maybe you can report your issue to the author(s) and share your
config to help them reproduce it ?

Regards,
Willy



Re: 1.9.2: Crash with 300% CPU and stuck agent-checks

2019-03-14 Thread Louis Chanouha
Hello, 

In fact I have two haproxies fighting for CPU, but the seamless reload date doesn't
correspond to when the CPU usage increased (13/03 at 21:29:20; average normal usage
is around 5%).

└──╼ ps -eo pid,lstart,%cpu,cmd | grep "[h]aproxy"
 8406 Wed Mar 13 18:06:21 2019 97.5 /usr/sbin/haproxy -Ws -f 
/etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 29691 -x 
/run/haproxy/admin.sock
29604 Wed Mar 13 16:59:49 2019  0.0 /usr/sbin/haproxy -Ws -f 
/etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 29691 -x 
/run/haproxy/admin.sock
29691 Wed Mar 13 16:59:49 2019  180 /usr/sbin/haproxy -Ws -f 
/etc/haproxy/haproxy.cfg -p /run/haproxy.pid

I tried to get what you asked for but I really do not have any idea what I'm
doing (see below) :x. Do you have a secure way, or a GPG key, so I can send you the
core dump? I just extracted the core dump of pid 29691 (the old instance). I crashed
8406 with a bad command.

This bug happens about once a week, so next time I can do more.
Production is not impacted; HAProxy is fully functional except, I guess, for
checks (a dead backend is never marked UP).

Louis

└──╼  gdb --pid 29691
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 29691
[New LWP 29692]
[New LWP 29693]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x7f1549f8e303 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
84      ../sysdeps/unix/syscall-template.S: Aucun fichier ou dossier de ce type.
(gdb) p task_per_thread[0].task_list_size
cannot subscript something of type `'
(gdb) up
#1  0x5585811a365d in ?? ()
(gdb) p task_per_thread[0].task_list_size
cannot subscript something of type `'
(gdb) up
#2  0x5585812485c2 in ?? ()
(gdb) p task_per_thread[0].task_list_size
cannot subscript something of type `'
(gdb) up
#3  0x5585811a1102 in main ()
(gdb) p task_per_thread[0].task_list_size
cannot subscript something of type `'
(gdb) up
Initial frame selected; you cannot go up.
(gdb) down
#2  0x5585812485c2 in ?? ()
(gdb) p task_per_thread[0].task_list_size
cannot subscript something of type `'
(gdb) down
#1  0x5585811a365d in ?? ()
(gdb) p task_per_thread[0].task_list_size
cannot subscript something of type `'
(gdb) down
#0  0x7f1549f8e303 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
84      in ../sysdeps/unix/syscall-template.S
(gdb) p task_per_thread[0].task_list_size
cannot subscript something of type `'

March 14, 2019 2:04:52 PM CET, Willy Tarreau wrote:
On Thu, Mar 14, 2019 at 01:22:37PM +0100, Louis Chanouha wrote:
> Hello,
> Thanks !
> 
> Seems OK to me, I don't have a symbols error but your command doesn't work
> (see below).

Oops sorry I know what's wrong :

> 0x7f06e3d0663f in __libc_send (fd=6698, buf=0x56295563fe90, n=4344,
flags=16448) at
> ../sysdeps/unix/sysv/linux/x86_64/send.c:26
> 26    ../sysdeps/unix/sysv/linux/x86_64/send.c: Aucun fichier ou dossier de
ce type.
> (gdb)  p task_per_thread[0].task_list_size
> cannot subscript something of type `'

The process was interrupted at a lower layer in the libc, so you need
to go up the stack using the "up" command. Each time you type it,
it goes up one frame, and if you type it too many times you can go back
down using "down". You can also issue "bt full" which will show the
complete backtrace of each function called and their arguments.

> You told me how to generate a core dump, I could give you the full file if it
> can be useful to you.

Oh yes, that would be awesome. You can do that from gdb using the
command "generate-core-file". It will dump it into your current
directory, you may need to make sure to have enough room. Please
keep in mind that the executable is needed with the core file so
we'll need both.

thanks!
Willy

--

Louis Chanouha | Infrastructures informatiques
Service Numérique de l'Université de Toulouse
Université Fédérale Toulouse Midi-Pyrénées
Maison de la Recherche et de la Valorisation - MRV
118 route de Narbonne - 31062 Toulouse Cedex 09
Tél. : +33 5 61 10 80 45 /poste int. : 12 80 45
louis.chano...@univ-toulouse.fr
Facebook | Twitter | www.univ-toulouse.fr

Re: stable-bot: WARNING: 42 bug fixes in queue for next release

2019-03-14 Thread Tim Düsterhus
Hi

Am 14.03.19 um 15:15 schrieb stable-...@haproxy.com:
> Last release 1.9.4 was issued on 2019/02/06.  There are currently 42 patches 
> in the queue cut down this way:
> - 1 BUG, first one merged on 2019/02/10
> [...]
> 
> The current list of patches in the queue is:
> - BUG : 51d: In Hash Trie, multi header matching was affected by the 
> header names stored globaly.

That one clearly violates the commit message guidelines, but it caused
an interesting result for the bot. I believe such messages are treated as
equivalent to MINOR, no?

Best regards
Tim Düsterhus



stable-bot: WARNING: 42 bug fixes in queue for next release

2019-03-14 Thread stable-bot
Hi,

This is a friendly bot that watches fixes pending for the next haproxy-stable 
release!  One such e-mail is sent periodically once patches are waiting in the 
last maintenance branch, and an ideal release date is computed based on the 
severity of these fixes and their merge date.  Responses to this mail must be 
sent to the mailing list.

Last release 1.9.4 was issued on 2019/02/06.  There are currently 42 patches in 
the queue cut down this way:
- 1 BUG, first one merged on 2019/02/10
- 6 MAJOR, first one merged on 2019/02/10
- 20 MEDIUM, first one merged on 2019/02/10
- 15 MINOR, first one merged on 2019/02/10

Thus the computed ideal release date for 1.9.5 would be 2019/02/24, which was 
two weeks ago.
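The exact rule behind this computation isn't spelled out in this e-mail. As a
rough illustration of the idea only, the sketch below assumes each severity
carries a fixed grace delay and takes the earliest "merge date + delay" across
the queue; the delay values are invented for the example, but an assumed
14-day delay for MAJOR fixes merged on 2019/02/10 does land on the 2019/02/24
date above.

/* Illustration only: the bot's real algorithm is not shown in this thread.
 * Assume each severity has a grace delay (values below are invented) and the
 * ideal release date is the earliest "merge date + delay" among queued fixes. */
#include <stdio.h>
#include <string.h>

struct fix { const char *severity; int merge_day; };     /* day of Feb 2019 */

static int assumed_delay_days(const char *sev)
{
    if (!strcmp(sev, "MAJOR"))  return 14;   /* invented value */
    if (!strcmp(sev, "MEDIUM")) return 28;   /* invented value */
    return 56;                               /* MINOR / unclassified, invented */
}

int main(void)
{
    /* First fix of each severity in the 1.9.5 queue, all merged on 2019/02/10. */
    const struct fix queue[] = { { "MAJOR", 10 }, { "MEDIUM", 10 }, { "MINOR", 10 } };
    int ideal = 9999;

    for (size_t i = 0; i < sizeof(queue) / sizeof(queue[0]); i++) {
        int candidate = queue[i].merge_day + assumed_delay_days(queue[i].severity);
        if (candidate < ideal)
            ideal = candidate;
    }
    /* With the assumed 14-day MAJOR delay this prints day 24, i.e. 2019/02/24. */
    printf("ideal release: 2019/02/%02d\n", ideal);
    return 0;
}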

The current list of patches in the queue is:
- BUG : 51d: In Hash Trie, multi header matching was affected by the 
header names stored globaly.
- MAJOR   : listener: Make sure the listener exist before using it.
- MAJOR   : cache/htx: Set the start-line offset when a cached object is 
served
- MAJOR   : mux-h2: fix race condition between close on both ends
- MAJOR   : spoe: Don't try to get agent config during SPOP healthcheck
- MAJOR   : fd/threads, task/threads: ensure all spin locks are unlocked
- MAJOR   : stream: avoid double free on unique_id
- MEDIUM  : htx: count the amount of copied data towards the final count
- MEDIUM  : logs: Only attempt to free startup_logs once.
- MEDIUM  : server: initialize the idle conns list after parsing the config
- MEDIUM  : server: initialize the orphaned conns lists and tasks at the end
- MEDIUM  : mux-h1: Report the right amount of data xferred in h1_rcv_buf()
- MEDIUM  : proto_htx: Fix functions applying regex filters on HTX messages
- MEDIUM  : h2/htx: verify that :path doesn't contain invalid chars
- MEDIUM  : cache: Get objects from the cache only for GET and HEAD requests
- MEDIUM  : h2/htx: Correctly handle interim responses when HTX is enabled
- MEDIUM  : proto_htx: Fix data size update if end of the cookie is removed
- MEDIUM  : http_fetch: fix the "base" and "base32" fetch methods in HTX 
mode
- MEDIUM  : servers: Add a per-thread counter of idle connections.
- MEDIUM  : listeners: Don't call fd_stop_recv() if fd_updt is NULL.
- MEDIUM  : 51d: fix possible segfault on deinit_51degrees()
- MEDIUM  : servers: Use atomic operations when handling curr_idle_conns.
- MEDIUM  : h2: advertise to servers that we don't support push
- MEDIUM  : mux-h2/htx: send an empty DATA frame on empty HTX trailers
- MEDIUM  : http_fetch: fix "req.body_len" and "req.body_size" fetch 
methods in HTX mode
- MEDIUM  : spoe: initialization depending on nbthread must be done last
- MEDIUM  : mux-h2/htx: Always set CS flags before exiting h2_rcv_buf()
- MINOR   : ssl: fix warning about ssl-min/max-ver support
- MINOR   : channel: Set CF_WROTE_DATA when outgoing data are skipped
- MINOR   : lua: initialize the correct idle conn lists for the SSL sockets
- MINOR   : checks: make external-checks restore the original 
rlim_fd_cur/max
- MINOR   : config: Reinforce validity check when a process number is parsed
- MINOR   : mux-h2: Don't add ":status" pseudo-header on trailers
- MINOR   : spoe: do not assume agent->rt is valid on exit
- MINOR   : mux-h1: Add "transfer-encoding" header on outgoing requests if 
needed
- MINOR   : listener: keep accept rate counters accurate under saturation
- MINOR   : mworker: be careful to restore the original rlim_fd_cur/max on 
reload
- MINOR   : proto-htx: Consider a XFER_LEN message as chunked by default
- MINOR   : mux-h1: verify the request's version before dropping 
connection: keep-alive
- MINOR   : init: never lower rlim_fd_max
- MINOR   : cache/htx: Return only the headers of cached objects to HEAD 
requests
- MINOR   : mux-h1: Always initilize h1m variable in h1_process_input()

---
The haproxy stable-bot is freely provided by HAProxy Technologies to help 
improve the quality of each HAProxy release.  If you have any issue with these 
emails or if you want to suggest some improvements, please post them on the 
list so that the solutions suiting the most users can be found.



Re: 1.9.2: Crash with 300% CPU and stuck agent-checks

2019-03-14 Thread Willy Tarreau
On Thu, Mar 14, 2019 at 11:43:54AM +0100, Louis Chanouha wrote:
> Hello,
> Did I miss something ? Sorry I never used GDB.
> 
> +--? (gdb) p task_per_thread[0].task_list_size
> cannot subscript something of type `'

Ah sorry, I thought from your kind offer that you did :-)

You first need to attach gdb to the current process. For example :

   $  gdb --pid $(pidof haproxy)

Note that if you have multiple haproxy processes, you'll have to check
by hand and pass the pid yourself after "gdb --pid".

Then you can enter these commands. If gdb says it cannot find the
symbols or whatever, it means your executable was not built with
debugging symbols so you'll have to give up.

However, please keep in mind that once gdb is attached to the process,
your process is frozen and the traffic stalls. So be careful not to spend
too much time typing your commands. I recommend putting them in a
terminal and copy-pasting them at once. Then in order to quit, simply
type "quit" and gdb will detach. You can even put this command into
the list that you copy-paste. Some people are good at scripting gdb
from the command line, I'm not. But I know it's possible to do better
using "-ex" or something like this.

Hoping this helps,
Willy



Re: 1.9.2: Crash with 300% CPU and stuck agent-checks

2019-03-14 Thread Louis Chanouha
Hello,
Did I miss something ? Sorry I never used GDB.

└──╼ (gdb) p task_per_thread[0].task_list_size
cannot subscript something of type `'

└──╼ haproxy -vvv
HA-Proxy version 1.9.3-1 2019/01/29 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g -O2 -fdebug-prefix-map=/root/haproxy/haproxy-1.9.3=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv 
-Wno-unused-label -Wno-sign-compare -Wno-unused-parameter 
-Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered 
-Wno-missing-field-initializers -Wtype-limits -Wshift-negative-value 
-Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 
USE_SYSTEMD=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_NS=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.1a  20 Nov 2018
Running on OpenSSL version : OpenSSL 1.1.1a  20 Nov 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT 
IP_FREEBIND
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), 
raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.22 2016-07-29
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as  cannot be specified using 'proto' keyword)
  h2 : mode=HTX    side=FE|BE
  h2 : mode=HTTP   side=FE
    : mode=HTX    side=FE|BE
    : mode=TCP|HTTP   side=FE|BE

Available filters :
    [SPOE] spoe
    [COMP] compression
    [CACHE] cache
    [TRACE] trace

March 14, 2019 11:18:39 AM CET, Willy Tarreau wrote:
Louis,

I'd be interested in checking the values of task_per_thread[X].task_list_size
for each value of X between 0 and your number of threads minus 1. Example for
4 threads :

(gdb) p task_per_thread[0].task_list_size
$2 = 0
(gdb) p task_per_thread[1].task_list_size
$3 = 0
(gdb) p task_per_thread[2].task_list_size
$4 = 0
(gdb) p task_per_thread[3].task_list_size
$5 = 0

It will help ruling out certain areas which could set the negative value.

Thanks,
Willy

--

Louis Chanouha | Infrastructures informatiques
Service Numérique de l'Université de Toulouse
Université Fédérale Toulouse Midi-Pyrénées
Maison de la Recherche et de la Valorisation - MRV
118 route de Narbonne - 31062 Toulouse Cedex 09
Tél. : +33 5 61 10 80 45 /poste int. : 12 80 45
louis.chano...@univ-toulouse.fr
Facebook | Twitter | www.univ-toulouse.fr

Re: 1.9.2: Crash with 300% CPU and stuck agent-checks

2019-03-14 Thread Willy Tarreau
Louis,

I'd be interested in checking the values of task_per_thread[X].task_list_size
for each value of X between 0 and your number of threads minus 1. Example for
4 threads :

(gdb) p task_per_thread[0].task_list_size
$2 = 0
(gdb) p task_per_thread[1].task_list_size
$3 = 0
(gdb) p task_per_thread[2].task_list_size
$4 = 0
(gdb) p task_per_thread[3].task_list_size
$5 = 0

It will help ruling out certain areas which could set the negative value.

Thanks,
Willy



Re: 1.9.2: Crash with 300% CPU and stuck agent-checks

2019-03-14 Thread Willy Tarreau
Hello Louis,

On Thu, Mar 14, 2019 at 10:34:05AM +0100, Louis Chanouha wrote:
> Hello,
> It seems that I have the same problem as Mark Janssen.
> I did not restart, so I can still do gdb debugging.

Quite interesting as well, thank you. Indeed it looks identical, with
not all threads looping. I'm looking to see if I can spot anything to
ask you to check.

Thanks,
Willy



Re: High CPU with Haproxy 1.9.4 (and 1.9.2)

2019-03-14 Thread Willy Tarreau
On Thu, Mar 14, 2019 at 10:34:46AM +0100, Mark Janssen wrote:
> This was the 'show activity' info
> 
> Show activity:
> thread_id: 7
> date_now: 1552497125.537000
> loops: 1876310231 2198499593 29388065 2234235968 2189969792 23322503 11681489 
> 1867345227
> wake_cache: 4699089 4475087 5332367 4386659 4870234 5108383 5693172 4044835
> wake_tasks: 1868654965 2192153566 21290780 2227958941 2182536918 15400586 
> 2497632 1860594634

Thank you, so from here it already appears that some threads are much more
loaded than other ones, which means it's not the global run queue which is
the culprit but a local one. This helps a lot. I'll have a look at the
scheduler code to see if I can spot anything related to this.

> I've since restarted haproxy, so I can't query it currently for new info,
> but when it happens again, I'll try to keep it running on the slave node ;)

OK thank you!

Willy



Re: High CPU with Haproxy 1.9.4 (and 1.9.2)

2019-03-14 Thread Mark Janssen
This was the 'show activity' info

Show activity:
thread_id: 7
date_now: 1552497125.537000
loops: 1876310231 2198499593 29388065 2234235968 2189969792 23322503
11681489 1867345227
wake_cache: 4699089 4475087 5332367 4386659 4870234 5108383 5693172 4044835
wake_tasks: 1868654965 2192153566 21290780 2227958941 2182536918 15400586
2497632 1860594634
wake_signal: 0 0 0 0 0 0 0 0
poll_exp: 1873354056 2196628655 26623147 2232345602 2187407154 20508969
8190804 1864639469
poll_drop: 180791 105999 158230 91354 145087 154604 203869 160330
poll_dead: 0 0 0 0 0 0 0 0
poll_skip: 0 0 0 0 0 0 0 0
fd_skip: 0 0 0 0 0 0 0 0
fd_lock: 294901 536257 529154 502292 466785 487033 417977 185827
fd_del: 0 0 0 0 0 0 0 0
conn_dead: 0 0 0 0 0 0 0 0
stream: 2241864 1305507 1639446 1217533 1806315 1673571 2205727 1922870
empty_rq: 2027320 2602152 3377023 2577090 2438025 3050896 3109183 1715755
long_rq: 1300059 670921 20903418 600657 795707 435466 1735307 1079695
cpust_ms_tot: 76708 30331 75408 21508 31986 78251 78328 85689
cpust_ms_1s: 0 0 0 0 0 0 0 0
cpust_ms_15s: 32 0 117 0 178 46 108 24
avg_loop_us: 1 2 249 1 1 149 277 2


I've since restarted haproxy, so I can't query it currently for new info,
but when it happens again, I'll try to keep it running on the slave node ;)


On Thu, Mar 14, 2019 at 6:00 AM Willy Tarreau  wrote:

> Hi Mark,
>
> On Wed, Mar 13, 2019 at 02:08:15PM +0100, Mark Janssen wrote:
> > Hi,
> >
> > I've recently switched a system over from 1.6.9, which has been running
> fine
> > for years, to 1.9.4.
> > I've updated the configuration to use nbthread instead of nbproc, and
> > cleaned up the config a lot.
> >
> > A few times now, however, i've seen haproxy using all available CPU on
> the
> > system, even when traffic is mostly idle (or when the loadbalancer isn't
> > even active anymore after a failover to the 2nd node).
>
> That's not expected. I'm seeing a problem in your show info dump (thanks
> for providing it) :
>
> > Run_queue: 4294967285
>
> The run queue has a negative size, I have no idea how we ended up in this
> state! So definitely the threads never sleep, believing they always have
> something to do. Could you please also report the output of "show activity"
> on the socket when this happens ? It will dump a number of per-thread
> indicators. It's unlikely we'll spot the issue there but it could help
> narrow it down.
>
> Thanks!
> Willy
>


-- 
Mark Janssen  --  m...@sig-io.nl
Unix / Linux Open-Source and Internet Consultant


Re: 1.9.2: Crash with 300% CPU and stuck agent-checks

2019-03-14 Thread Louis Chanouha
Hello,
It seems that I have the same problem as Mark Janssen.
I did not restart, so I can still do gdb debugging.

Louis

└──╼ haproxy -v
HA-Proxy version 1.9.3-1 2019/01/29 - https://haproxy.org/

──╼ /usr/bin/socat -T 15 -t 5 /run/haproxy/admin.sock - <<< "show info dump" 
|grep Run_queue
Run_queue: 4294967075

└──╼ /usr/bin/socat -T 15 -t 5 /run/haproxy/admin.sock - <<< "show activity" 
thread_id: 0
date_now: 1552556188.687920
loops: 1008276769 3401674410 4164783314
wake_cache: 32084592 30461706 80918252
wake_tasks: 974107386 3368299498 4072703110
wake_signal: 0 0 0
poll_exp: 1006191978 3398761204 4153621362
poll_drop: 290649 290640 408668
poll_dead: 0 0 0
poll_skip: 0 0 0
fd_skip: 0 0 0
fd_lock: 582021 566207 517485
fd_del: 0 0 0
conn_dead: 0 0 0
stream: 2544187 2588658 4177598
empty_rq: 10172386 11302238 74503517
long_rq: 1310868 4117304 2862308
cpust_ms_tot: 6731598 6509804 640930
cpust_ms_1s: 313 219 3
cpust_ms_15s: 3864 3003 142
avg_loop_us: 1 0 43

January 29, 2019 10:45:58 AM CET, Willy Tarreau wrote:
On Tue, Jan 29, 2019 at 10:41:52AM +0100, Louis Chanouha wrote:
> I'm pretty sure this bug is specific to version 1.9. Last week I restarted
> the process because it seemed to be stuck at around 100% CPU, but without
> abnormal behaviour.
> I've never seen that in the 1.7 or 1.8 series. We migrated from 1.8.15 to 1.9.2.
> 
> For 3 years, I've never seen HAProxy use more than 30% CPU on our VM.

OK that's already a good indication. For example you could have been running
fine with 1.9.1 and experienced the bug only once switching to 1.9.2, which
would have indicated a recent regression.

> As I guess there could be private keys in these files, I will send you the core
> dump privately (master/worker) and/or the haproxy conf file. Hope it will help.

There would indeed be private info. I don't need them right now, I
just suggested that you keep them for a while just in case they would
be needed later to validate any hypothesis for example.

I've released 1.9.3 this morning; you should definitely upgrade to this one
and see if the issue happens again. It contains the fixes for the suspicious
related bugs I mentioned.

Thanks,
Willy

--

Louis Chanouha | Infrastructures informatiques
Service Numérique de l'Université de Toulouse
Université Fédérale Toulouse Midi-Pyrénées
Maison de la Recherche et de la Valorisation - MRV
118 route de Narbonne - 31062 Toulouse Cedex 09
Tél. : +33 5 61 10 80 45 /poste int. : 12 80 45
louis.chano...@univ-toulouse.fr
Facebook | Twitter | www.univ-toulouse.fr