Re: [PATCH] MEDIUM: Support TCP keepalive parameters customization

2020-07-08 Thread Willy Tarreau
On Thu, Jul 09, 2020 at 04:25:17AM +, mizuta.take...@fujitsu.com wrote:
> Hi, Willy,
> 
> Thank you for your support.
> Sorry for not considering other platforms.

No worries, that's exactly part of the reasons I preferred to merge this
after the 2.2 release :-)

Willy



RE: [PATCH] MEDIUM: Support TCP keepalive parameters customization

2020-07-08 Thread mizuta.take...@fujitsu.com
Hi, Willy,

Thank you for your support.
Sorry for not considering other platforms.

Best regards,
MIZUTA Takeshi

> -Original Message-
> From: Willy Tarreau 
> Sent: Thursday, July 9, 2020 1:09 PM
> To: Mizuta, Takeshi/水田 健司 
> Cc: 'haproxy@formilux.org' 
> Subject: Re: [PATCH] MEDIUM: Support TCP keepalive parameters
> customization
> 
> On Thu, Jul 09, 2020 at 04:42:29AM +0200, Willy Tarreau wrote:
> > On Thu, Jul 09, 2020 at 02:21:32AM +, mizuta.take...@fujitsu.com
> wrote:
> > > I have attached a patch with dual-sided tcpka-* removed.
> >
> > OK thank you, that's perfect, I'm taking it now!
> 
> Bah, we obviously broke non-linux builds, OSX and Windows immediately
> complained :-)
> 
> I've added the necessary ifdefs so that the keywords and the code are
> only
> available on relevant platforms and updated the doc accordingly.
> 
> Willy



OSX builds in Travis

2020-07-08 Thread Willy Tarreau
Hi Ilya,

is it normal that the OSX build procedure in travis pulls gigabytes of
ruby and python crap, including fonts, libgpg, gnutls, qt, postgresql
and whatnot for many minutes ?

 https://travis-ci.com/github/haproxy/haproxy/jobs/359124175

It's been doing this for more than 12 minutes now without even starting
to build haproxy. I'm suspecting that it's reinstalling a full-featured
desktop operating system, which seems awkward and counterproductive at
best.

I don't know if that's automatically done based on broken dependencies
or if it is caused by a preliminary upgrade of the whole system, but I
think we need to address this as it's quite inefficient in this form.

Thanks!
Willy



Re: [PATCH] MEDIUM: Support TCP keepalive parameters customization

2020-07-08 Thread Willy Tarreau
On Thu, Jul 09, 2020 at 04:42:29AM +0200, Willy Tarreau wrote:
> On Thu, Jul 09, 2020 at 02:21:32AM +, mizuta.take...@fujitsu.com wrote:
> > I have attached a patch with dual-sided tcpka-* removed.
> 
> OK thank you, that's perfect, I'm taking it now!

Bah, we obviously broke non-linux builds, OSX and Windows immediately
complained :-)

I've added the necessary ifdefs so that the keywords and the code are only
available on relevant platforms and updated the doc accordingly.
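
For reference, usage should look something like this (keyword names shown
from memory, please check the updated doc; as said above they are only
parsed on platforms exposing the matching socket options):

defaults
    mode tcp
    option clitcpka
    option srvtcpka

frontend fe_main
    bind :8080
    clitcpka-idle  5m      # idle time before the first client-side probe
    clitcpka-intvl 30s     # interval between probes
    clitcpka-cnt   4       # probes before the connection is declared dead
    default_backend be_app

backend be_app
    srvtcpka-idle  5m
    srvtcpka-intvl 30s
    srvtcpka-cnt   4
    server app1 192.0.2.10:8080 check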

Willy



Re: [PATCH] MEDIUM: Support TCP keepalive parameters customization

2020-07-08 Thread Willy Tarreau
On Thu, Jul 09, 2020 at 02:21:32AM +, mizuta.take...@fujitsu.com wrote:
> Hi, Willy,
> 
> > Given that even in the doc you strongly suggested to use only clika-* and
> > srvka-*, I'd rather kill the dual-sided tcpka-* before they start to be 
> > used.
> 
> Thank you for your suggestion!
> 
> Since the existing clitcpka and srvtcpka keywords have a dual-sided
> counterpart (tcpka), I constructed these dual-sided parameters by imitating it.
> However, as you say, removing tcpka-* will avoid confusing users.
> 
> I have attached a patch with dual-sided tcpka-* removed.

OK thank you, that's perfect, I'm taking it now!

Cheers,
Willy



RE: [PATCH] MEDIUM: Support TCP keepalive parameters customization

2020-07-08 Thread mizuta.take...@fujitsu.com
Hi, Willy,

> Given that even in the doc you strongly suggested to use only clika-* and
> srvka-*, I'd rather kill the dual-sided tcpka-* before they start to be used.

Thank you for your suggestion!

Since the existing clitcpka and srvtcpka keywords have a dual-sided
counterpart (tcpka), I constructed these dual-sided parameters by imitating it.
However, as you say, removing tcpka-* will avoid confusing users.

I have attached a patch with dual-sided tcpka-* removed.

Best regards,
MIZUTA Takeshi

> -Original Message-
> From: Willy Tarreau 
> Sent: Wednesday, July 8, 2020 5:03 PM
> To: Mizuta, Takeshi/水田 健司 
> Cc: 'haproxy@formilux.org' 
> Subject: Re: [PATCH] MEDIUM: Support TCP keepalive parameters
> customization
> 
> Hi,
> 
> On Mon, Jul 06, 2020 at 03:01:26AM +, mizuta.take...@fujitsu.com
> wrote:
> > Hi, Willy,
> >
> > Thank you for your quick reply!
> >
> > > But I mean, that's probably OK and I won't argue on this. I'd be
> > > interested in others' opinions and/or suggestions on this, but
> > > it's not critical.
> >
> > Thank you for your comment.
> > If these keywords are inappropriate for users,
> > please feel free to tell me anytime.
> > I'm not particular about these keywords :-)
> >
> > > Yes please, in the same commit so that any eventual backport that
> may
> > > happen isn't lost!
> >
> > I understand.
> > I've included a fix for doc/configuration.txt in the patch.
> > If there are any problems with the documentation fix, please let me know.
> >
> > > Please just add "tcp:" as a subsystem tag.
> > > I think you can tag it MINOR as the impact is extremely low
> >
> > I changed the Subject to:
> >   
> >   MINOR: tcp: Support TCP keepalive parameters customization
> >   
> >
> > Thank you for your cooperation.
> 
> So I had a look at it, everything looks good but I'm thinking that
> the "tcpka-*" keywords which act on both sides will only result in
> trouble, because if they are in a defaults section for example, there
> will be some confusion regarding whether they are used by default or
> the other side-specific keyword is used. The typical case is this one:
> 
> defaults
>  srvka-cnt 10
> 
> backend foo
>  tcpka-cnt 5
> 
> It's not obvious to me which one will have precedence here. Given that
> even in the doc you strongly suggested to use only clika-* and srvka-*,
> I'd rather kill the dual-sided tcpka-* before they start to be used.
> Would that be OK for you ?
> 
> Other than that I'm OK with merging the patch as-is and it's clean.
> 
> Thank you!
> Willy


0001-MINOR-tcp-Support-TCP-keepalive-parameters-customiza.patch
Description:  0001-MINOR-tcp-Support-TCP-keepalive-parameters-customiza.patch


Re: [ANNOUNCE] haproxy-2.2.0

2020-07-08 Thread Willy Tarreau
On Wed, Jul 08, 2020 at 07:53:51PM +0200, Luke Seelenbinder wrote:
> > We would then pick from the first
> > list and if it's empty, then the next one.
> 
> This slightly concerns me. Hopefully I'm just not quite understanding the 
> behavior.
> 
> Would that imply request A would pick from the primary server group for all
> backend requests (including retires) unless the primary is 100% down / empty?
> An ideal path for us (as odd as it may sound), is to allow the ability for
> request A to go to the primary group first, then optionally redispatch to
> secondary group. This isn't currently possible, and is the source of most of
> our remaining 5xx errors.

Ah I didn't catch this requirement. Well, I guess that would only be a
"configuration detail" :-)  After all we already know if a request is
the first one or a retry (the retry counter is decremented for each
attempt, so we can have that info). We could then configure somewhere
that the first group only takes the initial attempt (implying that retries
only go to the second one). Or maybe we could be even smarter and
say that the first group *prefers* initial requests. I.e. a retry ought
to be processed by the second one if available, and fall back to the
primary one otherwise.

> > We'd just document that the keyword "backup" means "server of the
> > secondary group", and probably figure new actions or decisions to
> > force to use one group over the other one.
> 
> I think if these actions are capable of changing the group picked by retries,
> that addresses my concerns.

Not necessarily, but the approach above would, yes ;-)

> > I'm dumping all that in case it can help you get a better idea of the
> > various mid-term possibilities and what the steps could be (and also what
> > not to do if we don't want to shoot ourselves in the foot).
> 
> That helps my understanding quite a bit, too!
> 
> Regarding queues, LB algorithms, and such, this is of lesser concern for us.
> We want to reasonably fairly pick backends, but beyond that, we don't much
> care (perhaps therein lies the rub). I was a bit surprised to read that
> requests are queued for particular servers vs for a particular group at the
> moment,

Be careful, this only applies if the LB algorithm *mandates* a certain
server. That's why there's a distinction between deterministic and non-
deterministic algorithms. The deterministic ones will not put their requests
into the backend's queue because you certainly don't want the wrong server
to pick them. Think of a hash on the source address, for example. It's the same
with cookie-based persistence. This is the case where the server's queue is
used. But non-deterministic algorithms put their requests into the backend's
queue which is shared between all servers so that the first available one
will pick the requests. This is what guarantees shortest (and fair) queue
time among all requests, even if some servers are faster than others or if
some requests stay for too long.
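
To make the two cases concrete, something like this (purely illustrative
addresses and maxconn values):

# Deterministic: the source-address hash pins each client to one server,
# so excess requests wait in that server's own queue.
backend be_hashed
    balance source
    server s1 192.0.2.11:80 check maxconn 100
    server s2 192.0.2.12:80 check maxconn 100

# Non-deterministic: any server may take the request, so excess requests
# wait in the shared backend queue and the first free server picks them.
backend be_shared
    balance leastconn
    server s1 192.0.2.11:80 check maxconn 100
    server s2 192.0.2.12:80 check maxconn 100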

> which has some interesting implications for L7 retries based on 5xx
> errors which in turn result in the server being marked down. It could explain
> why we're seeing occasional edge cases of errors that don't make complete
> sense. (Request D comes in, is scheduled for a server, the server goes down
> along with the rest of the group due to Requests A, B, and C failing, Request
> D then fails by default, since the group is empty.)

If a request is in a server's queue, it will be handled because it means
we have nowhere else to send it. If, however, it's in the backend's queue,
it means any server will pick it. However, it is totally possible that the
time needed to switch the last server from up to down and refill the farm
with the backup servers leaves enough time for another thread to see an
empty farm and return a 503! Having two server groups could possibly make
this much smoother since we wouldn't have to reconstruct a new group when
switching as there would be no switch at all.

> A first step towards this would be to allow requests to be redispatched to
> the backup group.

This is always the case when option redispatch is set. "redispatch" in haproxy's
terminology means "break persistence and enforce load balancing again". But by
default it applies as a last resort, so it's done on the last retry. You can
change this (I don't remember if it's the redispatch or retries option which
takes an optional number).
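
From memory it's the redispatch option which takes the number, so something
like this would rebalance on every retry instead of only the last one
(please double-check the doc):

defaults
    mode http
    retries 3
    option redispatch 1    # 1 = redispatch on every retry, not just the last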

Usually when you have a shopping cart you want to go to the same server you
were connected to, and work around rare network glitches by supporting a few
retries. However if the server is really dead you prefer to be forwarded to
another server who will say something polite to you. That's the idea. But in
your case if nobody sticks to a given server you shouldn't care at all about
that and should prefer to pick any server for any retry.
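
In configuration terms the sticky "shopping cart" case looks something like
this (names illustrative):

backend be_shop
    balance roundrobin
    cookie SRVID insert indirect nocache
    option redispatch      # break persistence only as a last resort
    retries 3
    server s1 192.0.2.11:80 check cookie s1
    server s2 192.0.2.12:80 check cookie s2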

> That would eliminate many of our issues. We're fine with a
> few slower requests if we know they'll likely succeed the second time around
> (because the slow 

Re: [ANNOUNCE] haproxy-2.2.0

2020-07-08 Thread Luke Seelenbinder
Hi Willy,

Thanks for your tome treatment of my ideas! I forgot how much I enjoyed reading 
them. :)

>> To dig up an old discussion--I took a look at better support for SRV records
>> (using the priority field as backup/non-backup, etc.) a few weeks ago, but
>> determined it didn't make sense in our use case. The issue is 0 weighted
>> servers are considerably less useful to us since they aren't ever used, even
>> in the condition where every other server is down.
> 
> I seem to remember a discussion about making this configurable but I
> don't seem to see any commit matching anything like that, so maybe the
> discussion ended up in "change the behavior again the previous one was
> wrong", I don't remember well.

It was quite a long time ago (March), but I didn't have a chance to test 
behavior and look at the code until a few weeks ago.

> With your approach it would be almost identical except that we would
> always have two load-balancing groups, a primary one and a secondary
> one, the first one made only of the active servers and the second one
> made only of the backup servers.

Great! I'm glad it isn't a huge departure from the present code.

> We would then pick from the first
> list and if it's empty, then the next one.

This slightly concerns me. Hopefully I'm just not quite understanding the 
behavior.

Would that imply request A would pick from the primary server group for all 
backend requests (including retries) unless the primary is 100% down / empty? 
An ideal path for us (as odd as it may sound), is to allow the ability for 
request A to go to the primary group first, then optionally redispatch to 
secondary group. This isn't currently possible, and is the source of most of 
our remaining 5xx errors.

> We'd just document that the keyword "backup" means "server of the
> secondary group", and probably figure new actions or decisions to
> force to use one group over the other one.

I think if these actions are capable of changing the group picked by retries, 
that addresses my concerns.

> I'm dumping all that in case it can help you get a better idea of the
> various mid-term possibilities and what the steps could be (and also what
> not to do if we don't want to shoot ourselves in the foot).

That helps my understanding quite a bit, too!

Regarding queues, LB algorithms, and such, this is of lesser concern for us. We 
want to reasonably fairly pick backends, but beyond that, we don't much care 
(perhaps therein lies the rub). I was a bit surprised to read that requests are 
queued for particular servers vs for a particular group at the moment, which 
has some interesting implications for L7 retries based on 5xx errors which in 
turn result in the server being marked down. It could explain why we're seeing 
occasional edge cases of errors that don't make complete sense. (Request D 
comes in, is scheduled for a server, the server goes down along with the rest 
of the group due to Requests A, B, and C failing, Request D then fails by 
default, since the group is empty.)

A first step towards this would be to allow requests to be redispatched to the 
backup group. That would eliminate many of our issues. We're fine with a few 
slower requests if we know they'll likely succeed the second time around 
(because the slow region is not handling both). It'd likely help our 99p and 
999p times a good bit.

I was hoping 0 weighted servers would allow for this, but I was mistaken, since 
0 weighted servers are even less used than backup servers. :-)

I hope this helps clarify our needs.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On 8 Jul 2020, at 19:34, Willy Tarreau  wrote:
> 
> Hi Luke!
> 
> On Wed, Jul 08, 2020 at 11:57:15AM +0200, Luke Seelenbinder wrote:
>> I've been following along the torturous road, and I'm happy to see all the
>> issues resolved and the excellent results.
> 
> You can imagine how I am as well :-)
> 
>> Personally, I'm excited about the
>> performance gains. I'll deploy this soon on our network.
> 
> OK!
> 
>> To dig up an old discussion--I took a look at better support for SRV records
>> (using the priority field as backup/non-backup, etc.) a few weeks ago, but
>> determined it didn't make sense in our use case. The issue is 0 weighted
>> servers are considerably less useful to us since they aren't ever used, even
>> in the condition where every other server is down.
> 
> I seem to remember a discussion about making this configurable but I
> don't seem to see any commit matching anything like that, so maybe the
> discussion ended up in "change the behavior again the previous one was
> wrong", I don't remember well.
> 
>> That raises the next question: is the idea of server groups (with the ability
>> for a request to try group 1, then group 2, etc. on retries) in the
>> development plans at some point? Would that be something I could tinker as a
>> longer term project?
> 
> That could indeed be an interesting approach because we already almost do

Re: [ANNOUNCE] haproxy-2.2.0

2020-07-08 Thread Willy Tarreau
Hi Luke!

On Wed, Jul 08, 2020 at 11:57:15AM +0200, Luke Seelenbinder wrote:
> I've been following along the torturous road, and I'm happy to see all the
> issues resolved and the excellent results.

You can imagine how I am as well :-)

> Personally, I'm excited about the
> performance gains. I'll deploy this soon on our network.

OK!

> To dig up an old discussion--I took a look at better support for SRV records
> (using the priority field as backup/non-backup, etc.) a few weeks ago, but
> determined it didn't make sense in our use case. The issue is 0 weighted
> servers are considerably less useful to us since they aren't ever used, even
> in the condition where every other server is down.

I seem to remember a discussion about making this configurable but I
don't seem to see any commit matching anything like that, so maybe the
discussion ended up in "change the behavior again the previous one was
wrong", I don't remember well.

> That raises the next question: is the idea of server groups (with the ability
> for a request to try group 1, then group 2, etc. on retries) in the
> development plans at some point? Would that be something I could tinker as a
> longer term project?

That could indeed be an interesting approach because we already almost do
that between active and backup servers, except that there is always one
single group at a time. In fact there are 4 possible states for a server
group:

  - populated only with all active servers which are UP or unchecked,
provided that there is at least one such server ;

  - populated only with all backup servers which are UP or unchecked,
provided there is at least one such server, that no active server
exists in UP or unchecked state, and that option allbackups is
set;

  - populated with the first UP or unchecked backup server, provided that
there is at least one such server, that no active server exists in UP
or unchecked state, and that option allbackups is not set;

  - no server: all are down ;

With your approach it would be almost identical except that we would
always have two load-balancing groups, a primary one and a secondary
one, the first one made only of the active servers and the second one
made only of the backup servers. We would then pick from the first
list and if it's empty, then the next one.
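
In today's terms the "secondary group" simply maps to the backup servers,
something like this (addresses illustrative; "option allbackups" spreads the
load over all backups once no active server remains):

backend be_tiles
    balance roundrobin
    option allbackups
    server eu1 192.0.2.21:80 check
    server eu2 192.0.2.22:80 check
    server us1 198.51.100.21:80 check backup
    server us2 198.51.100.22:80 check backup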

It shouldn't even consume too much memory since the structures used to
attach the servers to the group are carried by the servers themselves.
Only static hash-based algorithms would cause a memory increase on the
backend, but they're rarely used with many servers due to the high risk
of rebalancing, so I guess that could be a pretty reasonable change.

We'd just document that the keyword "backup" means "server of the
secondary group", and probably figure new actions or decisions to
force to use one group over the other one.

Please note that I'd rather avoid adding too many groups into a farm
because we don't want to start to scan many of them. If keeping 2 as
we have today is already sufficient for your use case, I'd rather
stick to this.

We still need to put a bit more thought into this because I vaguely
remember an old discussion where someone wanted to use a different
LB algorithm for the backup servers. Here in terms of implementation
it would not be a big deal, we could have one LB algo per group. But
in terms of configuration (for the user) and configuration storage
(in the code), it would be a real pain. But possibly it would
still be worth the price if it starts to allow assembling a backend
by "merging" several groups (that's a crazy old idea that has been
floating around for 10+ years and which could possibly make sense in
the future to address certain use cases).

If you're interested in working on these ideas, please, oh please, never
forget about the queues (those that are used when you set a maxconn
parameter), because their behavior is tightly coupled with the LB
algorithms, and the difficulty is to make sure a server which frees
a connection slot can immediately pick the oldest pending request
either in its own queue (server already assigned) or the backend's
(don't care about what server handles the request). This may become
more difficult when dealing with several groups, hence possibly several queues.

My secret agenda would ideally be to one day support shared server groups
with their own queues between multiple backends so that we don't even
need to divide the servers' maxconn anymore. But it's still lacking some
reflection.

I'm dumping all that in case it can help you get a better idea of the
various mid-term possibilities and what the steps could be (and also what
not to do if we don't want to shoot ourselves in the foot).

Cheers,
Willy



Re: [ANNOUNCE] haproxy-2.2.0

2020-07-08 Thread Luke Seelenbinder
Congrats on the release, Willy & the rest of the team!

I've been following along the torturous road, and I'm happy to see all the 
issues resolved and the excellent results. Personally, I'm excited about the 
performance gains. I'll deploy this soon on our network.

To dig up an old discussion—I took a look at better support for SRV records 
(using the priority field as backup/non-backup, etc.) a few weeks ago, but 
determined it didn't make sense in our use case. The issue is 0 weighted 
servers are considerably less useful to us since they aren't ever used, even in 
the condition where every other server is down.

That raises the next question: is the idea of server groups (with the ability 
for a request to try group 1, then group 2, etc. on retries) in the development 
plans at some point? Would that be something I could tinker as a longer term 
project?

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On 7 Jul 2020, at 19:20, Willy Tarreau  wrote:
> 
> Hi,
> 
> HAProxy 2.2.0 was released on 2020/07/07. It added 24 new commits
> after version 2.2-dev12.
> 
> There were very few last-minute changes since dev12, just as I hoped,
> that's pretty fine.
> 
> We're late by about 1 month compared to the initial planning, which is
> not terrible and should be seen instead as an investment on the debugging
> cycle since almost only bug fixes were merged during that period. In the
> end you get a better version later.
> 
> While I was initially worried that this version didn't seem to contain
> any outstanding changes, looking back in the mirror tells me it's another
> awesome one instead:
> 
>  - dynamic content emission:
>     - "http-request return" directive to build dynamic responses ;
>     - rewrite of headers (including our own) after the response ;
>     - dynamic error files (errorfiles can be used as templates to
>       deliver personalized pages)
> 
>  - further improvements to TLS runtime certificates management:
>     - insertion of new certificates
>     - split of key and cert
>     - manipulation and creation of crt-lists
>     - even directories can be handled
> 
>    And by the way, TLSv1.2 is now set as the default minimum version.
> 
>  - significant reduction of server-side resources by sharing idle
>    connection pools between all threads ; till 2.1 if you had 64 threads,
>    each of them had its own connections, so the reuse rate was lower, and
>    the idle connection count was very high. This is not the case anymore.
> 
>  - health-checks were rewritten to all rely on tcp-check rules behind the
>    curtains. This allowed us to get rid of all the dirt we had accumulated
>    over 18 years and to write extensible checks. New ones are much easier
>    to add. In addition we now have http-checks which support header and
>    body addition, and which pass through muxes (HTTP/1 and HTTP/2).
> 
>  - ring buffer creation with the ability to forward any event to any log
>    server, including over TCP. This means that it's now possible to log to
>    a TCP syslog server, and that adding new protocols should be fairly easy.
> 
>  - further refined and improved debugging (symbols in panic dumps, malloc
>    debugging, more activity counters)
> 
>  - the default security was improved. For example fork() is forbidden by
>    default, which blocks any potential code execution (and also blocks
>    external checks by default unless explicitly unblocked).
> 
>  - new performance improvements in the scheduler and I/O layers, reducing
>    the cost of I/O processing and overall latency. I've learned from private
>    discussions that some noticed tremendous gains there.
> 
> I'm pretty sure there are many other things but I don't remember, I'm
> looking at my notes. I'm aware that HaproxyTech will soon post an in-depth
> review on the haproxy.com blog so just have a look there for all the details.
> (edit: it's already there: 
> https://www.haproxy.com/blog/announcing-haproxy-2-2/ ).
> 
> There are three things I noted during the development of this version.
> 
> The first one is that with the myriad of new tools we're using to help
> users and improve our code quality (discourse, travis, cirrus, oss-fuzz,
> mailing-list etc), some people really found their role in the project and
> are becoming more autonomous. This definitely scales much better and helps
> me spend less time on things that are not directly connected to my code
> activities, so thank you very much for this (Lukas, Tim, Ilya, Cyril).
> 
> The second one is that this is the first version that has been tortured
> in production long before the release. And when I'm saying "tortured", I
> really mean it, because several of us were suffering as well. But it
> allowed us to address very serious issues that would have been a nightmare
> to debug and fix post-release. For this I really want to publicly thank
> William Dauchy for all his work and involvement on this, and for all the
> very detailed reports he's 

Re: [PATCH] MEDIUM: Support TCP keepalive parameters customization

2020-07-08 Thread Willy Tarreau
Hi,

On Mon, Jul 06, 2020 at 03:01:26AM +, mizuta.take...@fujitsu.com wrote:
> Hi, Willy,
> 
> Thank you for your quick reply!
> 
> > But I mean, that's probably OK and I won't argue on this. I'd be
> > interested in others' opinions and/or suggestions on this, but
> > it's not critical.
> 
> Thank you for your comment.
> If these keywords are inappropriate for users,
> please feel free to tell me anytime.
> I'm not particular about these keywords :-)
> 
> > Yes please, in the same commit so that any eventual backport that may
> > happen isn't lost!
> 
> I understand.
> I've included a fix for doc/configuration.txt in the patch.
> If there are any problems with the documentation fix, please let me know.
> 
> > Please just add "tcp:" as a subsystem tag.
> > I think you can tag it MINOR as the impact is extremely low
> 
> I changed the Subject to:
>   
>   MINOR: tcp: Support TCP keepalive parameters customization
>   
> 
> Thank you for your cooperation.

So I had a look at it, everything looks good but I'm thinking that
the "tcpka-*" keywords which act on both sides will only result in
trouble, because if they are in a defaults section for example, there
will be some confusion regarding whether they are used by default or
the other side-specific keyword is used. The typical case is this one:

defaults
 srvka-cnt 10

backend foo
 tcpka-cnt 5

It's not obvious to me which one will have precedence here. Given that
even in the doc you strongly suggested to use only clika-* and srvka-*,
I'd rather kill the dual-sided tcpka-* before they start to be used.
Would that be OK for you ?
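
With only the side-specific keywords left, the equivalent and unambiguous
form would be something like (names per the patch):

defaults
 srvtcpka-cnt 10

backend foo
 srvtcpka-cnt 5    # the backend setting overrides the defaults one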

Other than that I'm OK with merging the patch as-is and it's clean.

Thank you!
Willy