Re: DNS Load balancing needs feedback and advice.

2020-11-06 Thread Willy Tarreau
Hi Lukas,

On Thu, Nov 05, 2020 at 06:40:50PM +0100, Lukas Tribus wrote:
> What I don't like are code/subsystems that are not sufficiently
> covered maintenance- and maintainer-wise (whatever the reason may be).
> 
> In my opinion, the resolver code is like that today:
> 
> - issues (including bugs) are open for years
> - it's riddled with traps for the users that will suddenly blow up in
> their faces (lack of TCP support, IPv4 vs IPv6)
> - important discussions have come to a halt

Yes, I agree with you regarding this unpleasant situation, and actually
having someone else work on it is also one way to add resources to this
part.

The main issue that plagued DNS is that the use cases have fundamentally
changed over time and it has constantly been abused to go a bit further.
Initially it was only designed to support an automatic IP address change
of your AWS machine that just rebooted. Then it had to evolve to support
late resolution. Then deduplication to support setting up farms. Then
SRV records, etc. And looking back, I can also say "what a monster we've
done". The problem here is not directly related to the DNS implementation
per se, but its integration within server farms and its actions there,
because, by design, it is supposed to do counter-intuitive things,
such as changing settings away from those that a user has carefully
placed in a configuration, which themselves possibly contradict others
from a state file. Adding to this that partial responses must not cause
the immediate removal of absent entries, and that some people expect their
multiple LBs to be consistent where the protocol does not offer this
consistency, and all this while still trying not to break the initial use
case, we can easily see the total mess this has engendered. And I guess
that such irreconcilable use cases have not really helped propose durable
solutions to a number of issues.

My vision on this is that we should not have ceded to the abuses of,
nor demands to abuse, the initial DNS features, but instead we should have
created a completely independent discovery mechanism. It's obviously easy
to say this in hindsight, with the architectural foundations available in
2020 that did not exist in 2015. But for me, discovery is not DNS; discovery
may use DNS or other services, but it's not the same thing as just
resolving a server name that may change at run time, and that's what
needs to be worked on separately.
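To make that distinction concrete, here is a minimal Python sketch of a discovery interface that may be backed by DNS or by anything else; all names here are hypothetical illustrations for this discussion, not HAProxy code.

```python
from abc import ABC, abstractmethod

class Discovery(ABC):
    """Hypothetical service-discovery interface, independent of DNS."""

    @abstractmethod
    def endpoints(self, service):
        """Return the current set of endpoints for a service."""

class StaticDiscovery(Discovery):
    """Trivial backend: a fixed table. A DNS- or API-backed implementation
    would expose the same interface, which is the point: resolving a single
    server name at run time and discovering a farm are different concerns."""

    def __init__(self, table):
        self.table = table

    def endpoints(self, service):
        # Sorted so that multiple LBs reading the same table agree on order,
        # a consistency DNS itself does not guarantee.
        return sorted(self.table.get(service, []))
```

A resolver then becomes just one possible backend behind this interface instead of the integration point for every use case.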

What I'd like to see is the DNS protocol being updated to support TCP,
the DNS stack being possibly made even more modular (possibly separating
the message processing from the resolving), and known issues addressed,
even if this requires the addition of a few options or keywords to choose
between one behavior or another, instead of just crossing fingers.
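As a concrete illustration of why TCP support matters, the sketch below shows the RFC 1035 truncation (TC) bit that a resolver must honor: when a UDP response no longer fits in the accepted payload size, the server sets TC and the client is expected to retry over TCP. This is a hand-rolled sketch of the header layout, not HAProxy's resolver code.

```python
import struct

def build_dns_header(qid, flags, qdcount=1, ancount=0):
    # A DNS header is 12 bytes: id, flags, qdcount, ancount, nscount, arcount.
    return struct.pack("!HHHHHH", qid, flags, qdcount, ancount, 0, 0)

def is_truncated(response):
    # The TC (truncation) bit is bit 0x0200 of the flags word (bytes 2-3).
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & 0x0200)

def should_retry_over_tcp(response):
    # RFC 1035: a truncated UDP response means the client should retry the
    # same query over TCP, where the UDP payload-size limit does not apply.
    return is_truncated(response)
```

A client that cannot fall back to TCP simply loses the records that did not fit, which is exactly the accepted_payload_size trap mentioned below.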

> I cannot help here (other than explaining why some current behaviours
> are bad and triaging the bugs on GH, which is also lacking: most dns
> issues do not even have the dns subsystem label). All this blunt
> critique without providing suggestions to improve the situation is
> rude, but since we are discussing DNS load-balancing (which sounds
> like adding new fuel to the fire to me), apparently with the same
> amount of resources and enthusiasm, I am concerned that we will end up
> in the same or worse situation, which is why I have to share my
> (negative) opinion about the current situation.

I totally understand. I don't see your point as a blunt critique nor any
form of negative feedback, quite the contrary. You're the one who deals
with the most user reports and knows best what works, what doesn't, and
what traps users fall into. I really do value your feedback on this.
Rest assured that for me it is also a concern, as I don't like to know
that some areas are a bit unstable nor to think "wow, should this work at
all?" when seeing a config. And we know there is another area suffering
from similar traps (though much less), which is the server-state-file,
just because, similarly, it deals with conflicts between a supposed
state and a configured state.

I, too, would like to see these points addressed, if possible for 2.4,
so that we don't have to wonder anymore if a config will work. This will
require breaking changes, but likely for good given that users regularly
fall into traps. For the DNS resolvers in my opinion the technical issues
like protocol limitations should be within reach. For the state file, by
separating the administrative and operational states, and using a new
format, we should also address the concerns and at the same time make
them work better in relation with other dynamic changes (DNS included).
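As a rough model of what separating the administrative and operational states could look like, here is a tiny Python sketch; the state names and precedence rule are assumptions made for the illustration, not an actual HAProxy state-file format.

```python
from dataclasses import dataclass

@dataclass
class ServerState:
    # Hypothetical split: "admin" is what the operator asked for,
    # "oper" is what health checks / DNS currently observe.
    admin: str = "ready"   # ready / drain / maint
    oper: str = "up"       # up / down

    def effective(self):
        # Operator intent always wins over the observed state, so reloading
        # a state file can never silently re-enable a server that was put
        # in maintenance, and a DNS update cannot override "maint".
        if self.admin in ("maint", "drain"):
            return self.admin
        return self.oper
```

Keeping the two dimensions in separate fields is what removes the conflict between "supposed state" and "configured state" described above.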

> > hate the noise that some people regularly make about "UDP support"
> 
> I am *way* more concerned about what to tell people when they report
> redundant production systems meltdowns because of the traps that we
> knew about for a long time and never improved. Like when the DNS
> response size surpasses accepted_payload_size and we don't 

Re: Backport ssl_{c,s}_chain_der to 2.2 ?

2020-11-06 Thread Christopher Faulet

On 06/11/2020 at 11:46, Ionel GARDAIS wrote:

Thanks Willy and the team for releasing 2.3 !

Is ssl_{c,s}_chain_der fetch planned to be backported to 2.2 ?



Not planned but small enough to be done. Now backported :)

--
Christopher Faulet



[ANNOUNCE] haproxy-2.0.19

2020-11-06 Thread Christopher Faulet

Hi,

HAProxy 2.0.19 was released on 2020/11/06. It added 38 new commits
after version 2.0.18.

The changelog is very similar to those of 2.2.5 and 2.1.10, excluding
fixes that were not backported. Please see the 2.2.5 announcement for
the details.

However, thanks to a last minute change, there is a small difference in this
release. During startup, if an errorfile has a payload size that differs
from the announced content-length, a warning is emitted and the
content-length is adapted to reflect the real payload size. On the 2.2 and
2.1, this was fixed too late for the releases and an fatal error is
triggered instead. Note that on the 2.3 and above, HAProxy fails to start in
this situation.
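The check described above is, in essence, a comparison between the declared Content-Length and the real payload size. A minimal Python sketch of that behaviour (an illustration only, not the actual C code):

```python
def check_errorfile(headers, body):
    # Returns True if the declared Content-Length matches the payload.
    # Otherwise rewrites it to the real size, as 2.0.19 does with a warning
    # (newer versions refuse to start instead).
    declared = int(headers.get("Content-Length", "0"))
    actual = len(body)
    if declared != actual:
        headers["Content-Length"] = str(actual)
        return False  # caller should emit a warning (or refuse to start)
    return True
```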

This release fixes some crashes and thread-safety issues. Thus, it is highly
recommended to upgrade.


Please find the usual URLs below :
   Site index   : http://www.haproxy.org/
   Discourse: http://discourse.haproxy.org/
   Slack channel: https://slack.haproxy.org/
   Issue tracker: https://github.com/haproxy/haproxy/issues
   Wiki : https://github.com/haproxy/wiki/wiki
   Sources  : http://www.haproxy.org/download/2.0/src/
   Git repository   : http://git.haproxy.org/git/haproxy-2.0.git/
   Git Web browsing : http://git.haproxy.org/?p=haproxy-2.0.git
   Changelog: http://www.haproxy.org/download/2.0/src/CHANGELOG
   Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/


---
Complete changelog :
Amaury Denoyelle (5):
  MINOR: counters: fix a typo in comment
  BUG/MINOR: stats: fix validity of the json schema
  BUG/MINOR: server: fix srv downtime calcul on starting
  BUG/MINOR: server: fix down_time report for stats
  BUG/MINOR: lua: initialize sample before using it

Brad Smith (1):
  BUILD: makefile: Fix building with closefrom() support enabled

Christopher Faulet (16):
  MINOR: hlua: Display debug messages on stderr only in debug mode
  BUG/MINOR: mux-h1: Always set the session on frontend h1 stream
  BUG/MEDIUM: mux-h2: Don't handle pending read0 too early on streams
  BUG/MINOR: http-htx: Expect no body for 204/304 internal HTTP responses
  BUG/MEDIUM: spoe: Unset variable instead of set it if no data provided
  BUG/MEDIUM: mux-h1: Get the session from the H1S when capturing bad messages
  BUG/MEDIUM: lb: Always lock the server when calling server_{take,drop}_conn
  BUG/MINOR: http-ana: Don't send payload for internal responses to HEAD requests
  BUG/MAJOR: mux-h2: Don't try to send data if we know it is no longer possible
  BUG/MEDIUM: filters: Don't try to init filters for disabled proxies
  BUG/MINOR: server: Set server without addr but with dns in RMAINT on startup
  MINOR: server: Copy configuration file and line for server templates
  BUG/MEDIUM: mux-pt: Release the tasklet during an HTTP upgrade
  BUG/MINOR: filters: Skip disabled proxies during startup only
  MINOR: http-htx: Add understandable errors for the errorfiles parsing
  BUG/MINOR: http-htx: Just warn if payload of an errorfile doesn't match the C-L

Eric Salama (1):
  BUG/MINOR: Fix several leaks of 'log_tag' in init().

Frédéric Lécaille (2):
  BUG/MINOR: peers: Inconsistency when dumping peer status codes.
  BUG/MINOR: peers: Possible unexpected peer seesion reset after collisions.

Olivier Houchard (1):
  BUG/MEDIUM: h1: Always try to receive more in h1_rcv_buf().

Remi Tricot-Le Breton (1):
  BUG/MINOR: cache: Inverted variables in http_calc_maxage function

William Lallemand (1):
  DOC: ssl: crt-list negative filters are only a hint

Willy Tarreau (10):
  BUG/MEDIUM: queue: make pendconn_cond_unlink() really thread-safe
  BUG/MINOR: init: only keep rlim_fd_cur if max is unlimited
  BUG/MINOR: mux-h2: do not stop outgoing connections on stopping
  MINOR: fd: report an error message when failing initial allocations
  BUG/MEDIUM: task: bound the number of tasks picked from the wait queue at once
  BUG/MINOR: queue: properly report redistributed connections
  BUG/MEDIUM: server: support changing the slowstart value from state-file
  BUG/MINOR: extcheck: add missing checks on extchk_setenv()
  BUG/MINOR: log: fix memory leak on logsrv parse error
  BUG/MEDIUM: stick-table: limit the time spent purging old entries

--
Christopher Faulet



Backport ssl_{c,s}_chain_der to 2.2 ?

2020-11-06 Thread Ionel GARDAIS
Thanks Willy and the team for releasing 2.3 ! 

Is ssl_{c,s}_chain_der fetch planned to be backported to 2.2 ? 

Ionel 

--
232 avenue Napoleon BONAPARTE 92500 RUEIL MALMAISON
Capital EUR 219 300,00 - RCS Nanterre B 408 832 301 - TVA FR 09 408 832 301

Re: [*EXT*] Re: Backport ssl_{c,s}_chain_der to 2.2 ?

2020-11-06 Thread Ionel GARDAIS
Thanks Christopher !

-- 
Ionel

- Original Message -
From: "Christopher Faulet" 
To: "Ionel GARDAIS" , "haproxy" 

Sent: Friday, November 6, 2020 12:09:07
Subject: [*EXT*] Re: Backport ssl_{c,s}_chain_der to 2.2 ?

On 06/11/2020 at 11:46, Ionel GARDAIS wrote:
> Thanks Willy and the team for releasing 2.3 !
> 
> Is ssl_{c,s}_chain_der fetch planned to be backported to 2.2 ?
> 

Not planned but small enough to be done. Now backported :)

-- 
Christopher Faulet
--
232 avenue Napoleon BONAPARTE 92500 RUEIL MALMAISON
Capital EUR 219 300,00 - RCS Nanterre B 408 832 301 - TVA FR 09 408 832 301




[ANNOUNCE] haproxy-1.8.27

2020-11-06 Thread Amaury Denoyelle
Hi,

HAProxy 1.8.27 was released on 2020/11/06. It added 44 new commits after
version 1.8.26. All 1.8 users are encouraged to upgrade as it
contains several bug fixes.

This release contains some fixes also present in higher versions. Most
notably a fix in the h2 multiplexer, a thread-safety bug in the
load-balancing algorithms, a design issue in SPOE and the skipping of
disabled proxies for filters. It is also now possible to update the
server-state file without crashing haproxy. You can find the detailed
reports in the 2.2.5 announcement.

In addition, the following changes have been made :

The h2 multiplexer is more robust thanks to Christopher and Willy. First,
it is now able to parse incomplete chunk formatting. An issue caused by a
certain combination of frame type and flags that haproxy wrongly
interpreted as invalid has also been fixed.

Some improvements to the SSL code have been made by William. Notably, a
better algorithm to choose a certificate when using wildcards, with
respect to the supported encryption algorithms.
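For context, the basic label rule for wildcard certificates can be sketched as below; real certificate selection additionally has to weigh the algorithms the client supports, which is the part this change improves. The function is a simplified illustration, not HAProxy's implementation.

```python
def wildcard_match(pattern, hostname):
    # A wildcard covers exactly one DNS label: *.example.com matches
    # www.example.com, but neither a.b.example.com nor example.com itself.
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        suffix = pattern[1:]  # ".example.com"
        prefix = hostname[:-len(suffix)] if hostname.endswith(suffix) else None
        return bool(prefix) and "." not in prefix
    return pattern == hostname
```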

The Lua engine now prevents loading a map at runtime, which should never
have been permitted. There are also additional checks on arguments when
doing IP address manipulation.

A nasty bug, triggered when using multiple processes with expose-fd, has
been fixed by Willy. In short, the "disable frontend" command had the
side-effect of pausing the socket file descriptors of other processes.
This case is now properly handled and the other listeners are no longer
impacted.

There is also a list of smaller fixes. Again, look at the 2.2.5
announcement, which summarizes them, for more info.

Thanks to everyone for this release. Enjoy !

Please find the usual URLs below :
   Site index   : http://www.haproxy.org/
   Discourse: http://discourse.haproxy.org/
   Slack channel: https://slack.haproxy.org/
   Issue tracker: https://github.com/haproxy/haproxy/issues
   Wiki : https://github.com/haproxy/wiki/wiki
   Sources  : http://www.haproxy.org/download/1.8/src/
   Git repository   : http://git.haproxy.org/git/haproxy-1.8.git/
   Git Web browsing : http://git.haproxy.org/?p=haproxy-1.8.git
   Changelog: http://www.haproxy.org/download/1.8/src/CHANGELOG
   Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/

---
Complete changelog :
Amaury Denoyelle (6):
  BUG/MINOR: config: Fix memory leak on config parse listen
  MINOR: counters: fix a typo in comment
  BUG/MINOR: stats: fix validity of the json schema
  BUG/MINOR: server: fix srv downtime calcul on starting
  BUG/MINOR: server: fix down_time report for stats
  BUG/MINOR: lua: initialize sample before using it

Christopher Faulet (13):
  BUG/MEDIUM: mux-h2: Don't fail if nothing is parsed for a legacy chunk response
  BUG/MEDIUM: map/lua: Return an error if a map is loaded during runtime
  BUG/MINOR: lua: Check argument type to convert it to IPv4/IPv6 arg validation
  BUG/MINOR: lua: Check argument type to convert it to IP mask in arg validation
  BUG/MEDIUM: pattern: Renew the pattern expression revision when it is pruned
  MINOR: hlua: Display debug messages on stderr only in debug mode
  BUG/MEDIUM: spoe: Unset variable instead of set it if no data provided
  BUG/MEDIUM: lb: Always lock the server when calling server_{take,drop}_conn
  BUG/MAJOR: mux-h2: Don't try to send data if we know it is no longer possible
  BUG/MEDIUM: filters: Don't try to init filters for disabled proxies
  BUG/MINOR: server: Set server without addr but with dns in RMAINT on startup
  MINOR: server: Copy configuration file and line for server templates
  BUG/MINOR: filters: Skip disabled proxies during startup only

Dragan Dosen (1):
  BUG/MEDIUM: pattern: fix memory leak in regex pattern functions

Lukas Tribus (1):
  BUG/MINOR: dns: ignore trailing dot

Remi Tricot-Le Breton (1):
  BUG/MINOR: cache: Inverted variables in http_calc_maxage function

Tim Duesterhus (2):
  MINOR: Commit .gitattributes
  CLEANUP: Update .gitignore

William Dauchy (1):
  DOC: agent-check: fix typo in "fail" word expected reply

William Lallemand (5):
  BUG/MINOR: startup: haproxy -s cause 100% cpu
  BUG/MEDIUM: ssl: check OCSP calloc in ssl_sock_load_ocsp()
  BUG/MEDIUM: ssl: does not look for all SNIs before chosing a certificate
  BUG/MINOR: ssl: verifyhost is case sensitive
  DOC: ssl: crt-list negative filters are only a hint

Willy Tarreau (14):
  BUG/MINOR: stats: use strncmp() instead of memcmp() on health states
  BUG/MINOR: reload: do not fail when no socket is sent
  BUG/MINOR: threads: work around a libgcc_s issue with chrooting
  BUILD: thread: limit the libgcc_s workaround to glibc only
  BUILD: threads: better workaround for late loading of libgcc_s
  BUG/MEDIUM: h2: report frame bits only for handled types
  BUG/MEDIUM: listeners: do not pause foreign listeners
  REGTESTS: add a few 

[PATCH] switch from HA_OPENSSL_VERSION to well known macros

2020-11-06 Thread Илья Шипицин
Hello,

yet another patch.

Ilya
From 39173d569cfc3559a9293b40be958e683193fd05 Mon Sep 17 00:00:00 2001
From: Ilya Shipitsin 
Date: Fri, 6 Nov 2020 18:46:45 +0500
Subject: [PATCH] BUILD: ssl: more elegant OpenSSL early data support check

Let us change that feature detection to an SSL_READ_EARLY_DATA_SUCCESS
macro check instead of a version comparison.
---
 src/ssl_sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 9072752f2..28c7665b8 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -2294,7 +2294,7 @@ static void ssl_sock_switchctx_set(SSL *ssl, SSL_CTX *ctx)
 	SSL_set_SSL_CTX(ssl, ctx);
 }
 
-#if ((HA_OPENSSL_VERSION_NUMBER >= 0x10101000L) || defined(OPENSSL_IS_BORINGSSL))
+#if (defined(SSL_READ_EARLY_DATA_SUCCESS)) || defined(OPENSSL_IS_BORINGSSL))
 
 static int ssl_sock_switchctx_err_cbk(SSL *ssl, int *al, void *priv)
 {
-- 
2.28.0



Re: [PATCH] switch from HA_OPENSSL_VERSION to well known macros

2020-11-06 Thread Илья Шипицин
Sorry, I sent a patch with a typo.
Please ignore it.

Fri, 6 Nov 2020 at 18:49, Илья Шипицин :

> Hello,
>
> yet another patch.
>
> Ilya
>


Re: [ANNOUNCE] haproxy-2.3.0

2020-11-06 Thread William Lallemand
On Thu, Nov 05, 2020 at 07:20:46PM +0100, Willy Tarreau wrote:
> Hi,
> 
> HAProxy 2.3.0 was released on 2020/11/05. It added 33 new commits after
> version 2.3-dev9. I was right to wait a few more days before releasing,
> we could spot two late regressions and fix them in time!
> 

Hi,

The crt-list support of this version is partly broken: lines are
silently ignored if they use a certificate which was already loaded.

A fix was pushed today in the git:

https://git.haproxy.org/?p=haproxy-2.3.git;a=commit;h=689d981541a4805760acd6a2ba1433dc3d3534b1

Distribution maintainers should consider this one if they haven't
shipped 2.3.0 yet.

We'll probably make a 2.3.1 release at the end of next week.

Sorry for the mess!

-- 
William Lallemand



Re: [2.0.17] crash with coredump

2020-11-06 Thread Kirill A. Korinsky
Hey,

I'm wondering, is it related to this code:

+   /* some tasks may have woken other ones up */
+   if (max_processed && thread_has_tasks())
+   goto not_done_yet;
+

from 
http://git.haproxy.org/?p=haproxy-2.2.git;a=blobdiff;f=src/task.c;h=500223f185bf324c0adb34a42ec0244e638ce63e;hp=1a7f44d9169e0a01d42ba13d8d335102aa43577b;hb=5c8be272c732e4f42ccd6b3d65f25aa7425a2aba;hpb=77015abe0bcfde67bff519b1d48393a513015f77
 


as far as I understand it should be safe to remove (with not_done_yet label).

Can you try it?

--
wbr, Kirill

> On 3. Nov 2020, at 15:15, Maciej Zdeb  wrote:
> 
> I modified h2s struct in 2.2 branch with HEAD set to 
> f96508aae6b49277dcf142caa35042678cf8e2ca "MEDIUM: mux-h2: merge recv_wait and 
> send_wait event notifications" like below (subs is in exact place of removed 
> wait_event):
> 
> struct h2s {
> [...]
> struct tasklet *dummy0;
> struct wait_event *dummy1;
> struct wait_event *subs;  /* recv wait_event the conn_stream 
> associated is waiting on (via h2_subscribe) */
> struct list list; /* To be used when adding in h2c->send_list or 
> h2c->fctl_lsit */
> struct tasklet *shut_tl;  /* deferred shutdown tasklet, to retry to 
> send an RST after we failed to,
>* in case there's no other subscription to 
> do it */
> }
> 
> it crashed like before with subs = 0x:
> 
> (gdb) p *(struct h2s*)(0x7fde7459e9b0)
> $1 = {cs = 0x7fde5c02d260, sess = 0x5628283bc740 , h2c = 
> 0x5628295cbb80, h1m = {state = H1_MSG_RPBEFORE, flags = 12, curr_len = 0,
> body_len = 0, next = 0, err_pos = -1, err_state = 0}, by_id = {node = 
> {branches = {b = {0x0, 0x7fde3c2c6c60}}, node_p = 0x0,
>   leaf_p = 0x5628295cc018, bit = -5624, pfx = 29785}, key = 11}, id = 11, 
> flags = 28673, sws = -4060, errcode = H2_ERR_NO_ERROR, st = H2_SS_HREM,
>   status = 200, body_len = 0, rxbuf = {size = 0, area = 0x0, data = 0, head = 
> 0}, dummy0 = 0x0, dummy1 = 0x0, subs = 0x, list = {
> n = 0x7fde7459ea68, p = 0x7fde7459ea68}, shut_tl = 0x5628297eeaf0}
> 
> it crashes like above until commit: 
> http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=5c8be272c732e4f42ccd6b3d65f25aa7425a2aba
>  
> 
>  which alters tasks processing.
> 
> 
> Mon, 2 Nov 2020 at 15:46, Maciej Zdeb mailto:mac...@zdeb.pl>> 
> wrote:
> I'm wondering, the corrupted address was always at "wait_event" in h2s 
> struct, after its removal in: 
> http://git.haproxy.org/?p=haproxy-2.2.git;a=commitdiff;h=5723f295d85febf5505f8aef6afabb6b23d6fdec;hp=f11be0ea1e8e571234cb41a2fcdde2cf2161df37
>  
> 
>  crashes went away.
> 
> But with the above patch and after altering h2s struct into:
> struct h2s {
> [...]
> struct tasklet *shut_tl;
> struct wait_event *recv_wait; /* recv wait_event the conn_stream 
> associated is waiting on (via h2_subscribe) */
> struct wait_event *send_wait; /* send wait_event the conn_stream 
> associated is waiting on (via h2_subscribe) */
> struct list list; /* To be used when adding in h2c->send_list or 
> h2c->fctl_lsit */
> };
> 
> the crash returned.
> 
> However after recv_wait and send_wait were merged in: 
> http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=f96508aae6b49277dcf142caa35042678cf8e2ca
>  
> 
>  crashes went away again.
> 
> In my opinion shut_tl should be corrupted again, but it is not. Maybe the 
> last patch fixed it?
> 
> Mon, 2 Nov 2020 at 15:37, Kirill A. Korinsky  > wrote:
> Maciej,
> 
> Looks like the memory corruption is still here, but it just corrupts
> some other place.
> 
> Willy do you agree?
> 
> --
> wbr, Kirill
> 
>> On 2. Nov 2020, at 15:34, Maciej Zdeb > > wrote:
>> 
>> So after Kirill's suggestion to modify the h2s struct in a way that the
>> tasklet "shut_tl" is before recv_wait, I verified whether the same crash
>> occurs in 2.2.4, and it did not!
>> 
>> After the patch that merges recv_wait and send_wait: 
>> http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=f96508aae6b49277dcf142caa35042678cf8e2ca
>>  
>> 
>> and with such h2s (tasklet shut_tl before wait_event subs) the crashes are 
>> gone:
>> 
>> struct h2s {
>> [...]
>> struct buffer rxbuf; /* receive buffer, always valid (buf_empty or 

Re: [2.0.17] crash with coredump

2020-11-06 Thread Willy Tarreau
Hi Kirill,

On Fri, Nov 06, 2020 at 06:41:03PM +0100, Kirill A. Korinsky wrote:
> Hey,
> 
> I'm wondering, is it related to this code:
> 
> +   /* some tasks may have woken other ones up */
> +   if (max_processed && thread_has_tasks())
> +   goto not_done_yet;
> +
(...)
> as far as I understand it should be safe to remove (with not_done_yet label).
> 
> Can you try it?

It's indeed absolutely safe to remove, but it will not tell us anything
unfortunately. If the problem disappears or appears with/without it, it
will just further confirm that the problem has likely been there for
even longer and is sensitive to the sequencing.

I really wish I could have a way to reproduce it. I'd instrument the code to
crash as soon as we'd detect the corruption, and try to narrow down the area
where it happens till we find the offending code.

If someone else faces the same issue and figures a reliable way to reproduce
it, please suggest!

Cheers,
Willy



Multiplexing FastCGI Connections

2020-11-06 Thread Harris Kaufmann
Hi everyone,

I wanted to try the FastCGI multiplexing feature, but whatever I do HAProxy
never sends multiple requests simultaneously over the same backend
connection. This is my configuration:

--

defaults
mode http
timeout connect 5000ms
timeout client 5ms
timeout server 5ms


backend fastcgi
server server0 127.0.0.1:9002 proto fcgi maxconn 1
use-fcgi-app fcgi-app

fcgi-app fcgi-app
docroot /
option mpxs-conns
option max-reqs 20
no option get-values

frontend web
bind *:8080
default_backend fastcgi

--

When I send multiple HTTP requests that overlap, Haproxy just executes them
serially with new backend connections for each request (because of maxconn)
and most of them time out. Is my configuration wrong? Did I misunderstand
this feature?

Thanks and best regards,
Harris


Re: [2.0.17] crash with coredump

2020-11-06 Thread Willy Tarreau
Maciej,

I wrote this ugly patch to try to crash as soon as possible when a corrupt
h2s->subs is detected. The patch was written for 2.2. I only instrumented
roughly 30 places in process_stream() which is a fairly likely candidate.
I just hope it happens within the context of the stream itself otherwise
it will become really painful.

You can apply this patch on top of your existing changes. It will try to
detect the presence of a non-zero lowest bit in the subs pointer (which
should never happen). If we're lucky it will crash inside process_stream()
between two points and we'll be able to narrow it down. If we're unlucky
it will crash when entering it and that will not be fun.

If you want to play with it, you can apply TEST_SI() on stream_interface
pointers (often called "si"), TEST_STRM() on stream pointers, and TEST_CS()
on conn_stream pointers (often called "cs").

Please just let me know how it goes. Note, I tested it, it passes all
regtests for me so I'm reasonably confident it should not crash by
accident. But I can't be sure, I'm just using heuristics, so please do
not put it in sensitive production!

Thanks,
Willy
>From b7638769b3ee38a23bf319df5338c0ba46d9f57e Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Fri, 6 Nov 2020 19:54:01 +0100
Subject: EXP: try to spot where h2s->subs changes

---
 include/haproxy/bug.h |  7 +++
 src/mux_h2.c  | 22 +
 src/stream.c  | 54 +++
 3 files changed, 83 insertions(+)

diff --git a/include/haproxy/bug.h b/include/haproxy/bug.h
index a008126..c650f60 100644
--- a/include/haproxy/bug.h
+++ b/include/haproxy/bug.h
@@ -166,6 +166,13 @@ struct mem_stats {
 })
 #endif /* DEBUG_MEM_STATS*/
 
+
+#define TEST_CS(ptr) do { extern void testcorrupt(const void *); testcorrupt(ptr); } while (0)
+
+#define TEST_SI(si) do { if ((si)) TEST_CS((si)->end); } while (0)
+
+#define TEST_STRM(s) do { if ((s)) { TEST_SI(&(s)->si[0]); TEST_SI(&(s)->si[1]);} } while (0)
+
 #endif /* _HAPROXY_BUG_H */
 
 /*
diff --git a/src/mux_h2.c b/src/mux_h2.c
index 5830fdb..6b5a649 100644
--- a/src/mux_h2.c
+++ b/src/mux_h2.c
@@ -6251,3 +6251,25 @@ static int init_h2()
 }
 
 REGISTER_POST_CHECK(init_h2);
+
+void testcorrupt(void *ptr)
+{
+   const struct conn_stream *cs = objt_cs(ptr);
+   const struct h2s *h2s;
+
+   if (!cs)
+   return;
+
+   h2s = cs->ctx;
+   if (!h2s)
+   return;
+
+   if (h2s->cs != cs)
+   return;
+
+   if (!h2s->h2c || !h2s->h2c->conn || h2s->h2c->conn->mux != &h2_ops)
+   return;
+
+   if ((long)h2s->subs & 1)
+   ABORT_NOW();
+}
diff --git a/src/stream.c b/src/stream.c
index 43f1432..6646d1a 100644
--- a/src/stream.c
+++ b/src/stream.c
@@ -531,6 +531,7 @@ struct stream *stream_new(struct session *sess, enum obj_type *origin)
 * the caller must handle the task_wakeup
 */
DBG_TRACE_LEAVE(STRM_EV_STRM_NEW, s);
+   TEST_STRM(s);
return s;
 
/* Error unrolling */
@@ -542,6 +543,7 @@ struct stream *stream_new(struct session *sess, enum obj_type *origin)
 out_fail_alloc_si1:
tasklet_free(s->si[0].wait_event.tasklet);
  out_fail_alloc:
+   TEST_STRM(s);
pool_free(pool_head_stream, s);
DBG_TRACE_DEVEL("leaving on error", STRM_EV_STRM_NEW|STRM_EV_STRM_ERR);
return NULL;
@@ -1497,6 +1499,8 @@ struct task *process_stream(struct task *t, void *context, unsigned short state)
struct stream_interface *si_f, *si_b;
unsigned int rate;
 
+   TEST_STRM(s);
+
DBG_TRACE_ENTER(STRM_EV_STRM_PROC, s);
 
activity[tid].stream_calls++;
@@ -1594,6 +1598,8 @@ struct task *process_stream(struct task *t, void *context, unsigned short state)
}
 
  resync_stream_interface:
+   TEST_STRM(s);
+
/* below we may emit error messages so we have to ensure that we have
 * our buffers properly allocated.
 */
@@ -1658,6 +1664,8 @@ struct task *process_stream(struct task *t, void *context, unsigned short state)
/* note: maybe we should process connection errors here ? */
}
 
+   TEST_STRM(s);
+
if (si_state_in(si_b->state, SI_SB_CON|SI_SB_RDY)) {
/* we were trying to establish a connection on the server side,
 * maybe it succeeded, maybe it failed, maybe we timed out, ...
@@ -1677,6 +1685,8 @@ struct task *process_stream(struct task *t, void *context, unsigned short state)
 * SI_ST_ASS/SI_ST_TAR/SI_ST_REQ for retryable errors.
 */
}
+   TEST_STRM(s);
+
 
rq_prod_last = si_f->state;
rq_cons_last = si_b->state;
@@ -1707,12 +1717,16 @@ struct task *process_stream(struct task *t, void *context, unsigned short state)
}
}
 
+   TEST_STRM(s);
+
/*
 * Note: of the transient states (REQ, CER, DIS), only REQ may remain