Re: [PATCH] MEDIUM: Support TCP keepalive parameters customization
On Thu, Jul 09, 2020 at 04:25:17AM +0000, mizuta.take...@fujitsu.com wrote:
> Hi, Willy,
>
> Thank you for your support.
> Sorry for not considering other platforms.

No worries, that's exactly one of the reasons I preferred to merge this
after the 2.2 release :-)

Willy
RE: [PATCH] MEDIUM: Support TCP keepalive parameters customization
Hi, Willy,

Thank you for your support.
Sorry for not considering other platforms.

Best regards,
MIZUTA Takeshi

> -----Original Message-----
> From: Willy Tarreau
> Sent: Thursday, July 9, 2020 1:09 PM
> To: Mizuta, Takeshi/水田 健司
> Cc: 'haproxy@formilux.org'
> Subject: Re: [PATCH] MEDIUM: Support TCP keepalive parameters customization
>
> On Thu, Jul 09, 2020 at 04:42:29AM +0200, Willy Tarreau wrote:
> > On Thu, Jul 09, 2020 at 02:21:32AM +0000, mizuta.take...@fujitsu.com wrote:
> > > I have attached a patch with dual-sided tcpka-* removed.
> >
> > OK thank you, that's perfect, I'm taking it now!
>
> Bah, we obviously broke non-linux builds, OSX and Windows immediately
> complained :-)
>
> I've added the necessary ifdefs so that the keywords and the code are only
> available on relevant platforms and updated the doc accordingly.
>
> Willy
OSX builds in Travis
Hi Ilya,

is it normal that the OSX build procedure in Travis pulls gigabytes of ruby
and python crap, including fonts, libgpg, gnutls, qt, postgresql and whatnot
for many minutes?

https://travis-ci.com/github/haproxy/haproxy/jobs/359124175

It's been doing this for more than 12 minutes now without even starting to
build haproxy. I'm suspecting that it's reinstalling a full-featured desktop
operating system, which seems awkward and counter-productive at best. I don't
know if that's automatically done based on broken dependencies or if it is
caused by a preliminary upgrade of the whole system, but I think we need to
address this as it's quite inefficient in this form.

Thanks!
Willy
Re: [PATCH] MEDIUM: Support TCP keepalive parameters customization
On Thu, Jul 09, 2020 at 04:42:29AM +0200, Willy Tarreau wrote:
> On Thu, Jul 09, 2020 at 02:21:32AM +0000, mizuta.take...@fujitsu.com wrote:
> > I have attached a patch with dual-sided tcpka-* removed.
>
> OK thank you, that's perfect, I'm taking it now!

Bah, we obviously broke non-linux builds, OSX and Windows immediately
complained :-)

I've added the necessary ifdefs so that the keywords and the code are only
available on relevant platforms and updated the doc accordingly.

Willy
Re: [PATCH] MEDIUM: Support TCP keepalive parameters customization
On Thu, Jul 09, 2020 at 02:21:32AM +0000, mizuta.take...@fujitsu.com wrote:
> Hi, Willy,
>
> > Given that even in the doc you strongly suggested to use only clika-* and
> > srvka-*, I'd rather kill the dual-sided tcpka-* before they start to be
> > used.
>
> Thank you for your suggestion!
>
> Since clitcpka and srvtcpka have the dual-sided tcpka,
> I constructed these dual-sided parameters by imitating it.
> However, as you say, removing tcpka-* will avoid confusing users.
>
> I have attached a patch with dual-sided tcpka-* removed.

OK thank you, that's perfect, I'm taking it now!

Cheers,
Willy
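For reference, the side-specific keywords follow the existing clitcpka /
srvtcpka naming (clitcpka-cnt/-idle/-intvl on the client side, srvtcpka-* on
the server side). A minimal configuration sketch, with purely illustrative
values, might look as follows; note that these keywords are only available on
platforms exposing the corresponding socket options:

```haproxy
defaults
    mode tcp
    option clitcpka
    option srvtcpka

listen app
    bind :8000
    clitcpka-idle 5m      # idle time before the first client-side probe
    clitcpka-intvl 30s    # interval between probes
    clitcpka-cnt 4        # unanswered probes before the connection drops
    srvtcpka-idle 5m
    srvtcpka-intvl 30s
    srvtcpka-cnt 4
    server s1 192.0.2.10:80
```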
RE: [PATCH] MEDIUM: Support TCP keepalive parameters customization
Hi, Willy,

> Given that even in the doc you strongly suggested to use only clika-* and
> srvka-*, I'd rather kill the dual-sided tcpka-* before they start to be used.

Thank you for your suggestion!

Since clitcpka and srvtcpka have the dual-sided tcpka,
I constructed these dual-sided parameters by imitating it.
However, as you say, removing tcpka-* will avoid confusing users.

I have attached a patch with dual-sided tcpka-* removed.

Best regards,
MIZUTA Takeshi

> -----Original Message-----
> From: Willy Tarreau
> Sent: Wednesday, July 8, 2020 5:03 PM
> To: Mizuta, Takeshi/水田 健司
> Cc: 'haproxy@formilux.org'
> Subject: Re: [PATCH] MEDIUM: Support TCP keepalive parameters customization
>
> Hi,
>
> On Mon, Jul 06, 2020 at 03:01:26AM +0000, mizuta.take...@fujitsu.com wrote:
> > Hi, Willy,
> >
> > Thank you for your quick reply!
> >
> > > But I mean, that's probably OK and I won't argue on this. I'd be
> > > interested in others' opinions and/or suggestions on this, but
> > > it's not critical.
> >
> > Thank you for your comment.
> > If these keywords are inappropriate for users,
> > please feel free to tell me anytime.
> > I'm not particular about these keywords :-)
> >
> > > Yes please, in the same commit so that any eventual backport that may
> > > happen isn't lost!
> >
> > I understand.
> > I've included a fix for doc/configuration.txt in the patch.
> > If you have any problems fixing the documentation, please let us know.
> >
> > > Please just add "tcp:" as a subsystem tag.
> > > I think you can tag it MINOR as the impact is extremely low
> >
> > I changed the Subject to:
> >
> > MINOR: tcp: Support TCP keepalive parameters customization
> >
> > Thank you for your cooperation.
> So I had a look at it, everything looks good but I'm thinking that
> the "tcpka-*" keywords which act on both sides will only result in
> trouble, because if they are in a defaults section for example, there
> will be some confusion regarding whether they are used by default or
> the other side-specific keyword is used. The typical case is this one:
>
>     defaults
>         srvka-cnt 10
>
>     backend foo
>         tcpka-cnt 5
>
> It's not obvious to me which one will have precedence here. Given that
> even in the doc you strongly suggested to use only clika-* and srvka-*,
> I'd rather kill the dual-sided tcpka-* before they start to be used.
> Would that be OK for you ?
>
> Other than that I'm OK with merging the patch as-is and it's clean.
>
> Thank you!
> Willy

Attachment: 0001-MINOR-tcp-Support-TCP-keepalive-parameters-customiza.patch
Re: [ANNOUNCE] haproxy-2.2.0
On Wed, Jul 08, 2020 at 07:53:51PM +0200, Luke Seelenbinder wrote:
> > We would then pick from the first
> > list and if it's empty, then the next one.
>
> This slightly concerns me. Hopefully I'm just not quite understanding the
> behavior.
>
> Would that imply request A would pick from the primary server group for all
> backend requests (including retries) unless the primary is 100% down /
> empty? An ideal path for us (as odd as it may sound) is to allow the
> ability for request A to go to the primary group first, then optionally
> redispatch to the secondary group. This isn't currently possible, and is
> the source of most of our remaining 5xx errors.

Ah I didn't catch this requirement. Well, I guess that would only be a
"configuration detail" :-)  After all we already know if a request is the
first one or a retry (the retry counter is decremented for each attempt, so
we can have that info). We could then configure somewhere that the first
group only takes the initial attempt (implying that retries only go to the
second one). Or maybe we could do it even smarter and say that the first
group *prefers* initial requests. I.e. a retry ought to be processed by the
second one if available, and fall back to the primary one otherwise.

> > We'd just document that the keyword "backup" means "server of the
> > secondary group", and probably figure new actions or decisions to
> > force to use one group over the other one.
>
> I think if these actions are capable of changing the group picked by
> retries, that addresses my concerns.

Not necessarily but the above yes ;-)

> > I'm dumping all that in case it can help you get a better idea of the
> > various mid-term possibilities and what the steps could be (and also what
> > not to do if we don't want to shoot ourselves in the foot).
>
> That helps my understanding quite a bit, too!
>
> Regarding queues, LB algorithms, and such, this is of lesser concern for us.
> We want to reasonably fairly pick backends, but beyond that, we don't much
> care (perhaps therein lies the rub). I was a bit surprised to read that
> requests are queued for particular servers vs for a particular group at the
> moment,

Be careful, this only applies if the LB algorithm *mandates* a certain
server. That's why there's a distinction between deterministic and
non-deterministic algorithms. The deterministic ones will not put their
requests into the backend's queue because you certainly don't want the wrong
server to pick them. Think of a hash on the source address for example. It's
the same with cookie-based persistence. This is the case where the server's
queue is used. But non-deterministic algorithms put their requests into the
backend's queue, which is shared between all servers so that the first
available one will pick the requests. This is what guarantees the shortest
(and fairest) queue time among all requests, even if some servers are faster
than others or if some requests stay for too long.

> which has some interesting implications for L7 retries based on 5xx
> errors which in turn result in the server being marked down. It could
> explain why we're seeing occasional edge cases of errors that don't make
> complete sense. (Request D comes in, is scheduled for a server, the server
> goes down along with the rest of the group due to Requests A, B, and C
> failing, Request D then fails by default, since the group is empty.)

If a request is in a server's queue, it will be handled because it means we
have nowhere else to send it. If, however, it's in the backend's queue, it
means any server will pick it. However it is totally possible that the time
needed to switch the last server from up to down and refill the farm with
the backup servers leaves enough time for another thread to see an empty
farm and return a 503!
Having two server groups could possibly make this much smoother since we
wouldn't have to reconstruct a new group when switching, as there would be
no switch at all.

> A first step towards this would be to allow requests to be redispatched to
> the backup group.

This is always the case when option redispatch is set. "redispatch" in
haproxy's terminology means "break persistence and enforce load balancing
again". But by default it applies as a last resort, so it's done on the last
retry. You can change this (I don't remember if it's the redispatch or
retries option which takes an optional number). Usually when you have a
shopping cart you want to go to the same server you were connected to, and
work around rare network glitches by supporting a few retries. However if
the server is really dead you prefer to be forwarded to another server who
will say something polite to you. That's the idea. But in your case if
nobody sticks to a given server you shouldn't care at all about that and
should prefer to pick any server for any retry.

> That would eliminate many of our issues. We're fine with a
> few slower requests if we know they'll likely succeed the second time
> around (because the slow region is not handling both).
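For what it's worth, it is "option redispatch" that takes the optional
number: an interval selecting on which retry the redispatch may happen. A
hedged configuration sketch for the "retry anywhere" behavior discussed
above (values illustrative):

```haproxy
defaults
    mode http
    retries 3
    # Allow redispatching to another server on every retry (interval 1)
    # instead of only on the last one, so a retried request is not stuck
    # waiting for the server that just failed.
    option redispatch 1
```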
Re: [ANNOUNCE] haproxy-2.2.0
Hi Willy,

Thanks for your tome treatment of my ideas! I forgot how much I enjoyed
reading them. :)

>> To dig up an old discussion--I took a look at better support for SRV
>> records (using the priority field as backup/non-backup, etc.) a few weeks
>> ago, but determined it didn't make sense in our use case. The issue is 0
>> weighted servers are considerably less useful to us since they aren't
>> ever used, even in the condition where every other server is down.
>
> I seem to remember a discussion about making this configurable but I
> don't seem to see any commit matching anything like that, so maybe the
> discussion ended up in "change the behavior again the previous one was
> wrong", I don't remember well.

It was quite a long time ago (March), but I didn't have a chance to test the
behavior and look at the code until a few weeks ago.

> With your approach it would be almost identical except that we would
> always have two load-balancing groups, a primary one and a secondary
> one, the first one made only of the active servers and the second one
> made only of the backup servers.

Great! I'm glad it isn't a huge departure from the present code.

> We would then pick from the first
> list and if it's empty, then the next one.

This slightly concerns me. Hopefully I'm just not quite understanding the
behavior.

Would that imply request A would pick from the primary server group for all
backend requests (including retries) unless the primary is 100% down /
empty? An ideal path for us (as odd as it may sound) is to allow the ability
for request A to go to the primary group first, then optionally redispatch
to the secondary group. This isn't currently possible, and is the source of
most of our remaining 5xx errors.

> We'd just document that the keyword "backup" means "server of the
> secondary group", and probably figure new actions or decisions to
> force to use one group over the other one.
I think if these actions are capable of changing the group picked by
retries, that addresses my concerns.

> I'm dumping all that in case it can help you get a better idea of the
> various mid-term possibilities and what the steps could be (and also what
> not to do if we don't want to shoot ourselves in the foot).

That helps my understanding quite a bit, too!

Regarding queues, LB algorithms, and such, this is of lesser concern for us.
We want to reasonably fairly pick backends, but beyond that, we don't much
care (perhaps therein lies the rub). I was a bit surprised to read that
requests are queued for particular servers vs for a particular group at the
moment, which has some interesting implications for L7 retries based on 5xx
errors which in turn result in the server being marked down. It could
explain why we're seeing occasional edge cases of errors that don't make
complete sense. (Request D comes in, is scheduled for a server, the server
goes down along with the rest of the group due to Requests A, B, and C
failing, Request D then fails by default, since the group is empty.)

A first step towards this would be to allow requests to be redispatched to
the backup group. That would eliminate many of our issues. We're fine with a
few slower requests if we know they'll likely succeed the second time around
(because the slow region is not handling both). It'd likely help our 99p and
999p times a good bit.

I was hoping 0 weighted servers would allow for this, but I was mistaken,
since 0 weighted servers are even less used than backup servers. :-)

I hope this helps clarify our needs.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On 8 Jul 2020, at 19:34, Willy Tarreau wrote:
>
> Hi Luke!
>
> On Wed, Jul 08, 2020 at 11:57:15AM +0200, Luke Seelenbinder wrote:
>> I've been following along the torturous road, and I'm happy to see all
>> the issues resolved and the excellent results.
> You can imagine how I am as well :-)
>
>> Personally, I'm excited about the
>> performance gains. I'll deploy this soon on our network.
>
> OK!
>
>> To dig up an old discussion--I took a look at better support for SRV
>> records (using the priority field as backup/non-backup, etc.) a few weeks
>> ago, but determined it didn't make sense in our use case. The issue is 0
>> weighted servers are considerably less useful to us since they aren't
>> ever used, even in the condition where every other server is down.
>
> I seem to remember a discussion about making this configurable but I
> don't seem to see any commit matching anything like that, so maybe the
> discussion ended up in "change the behavior again the previous one was
> wrong", I don't remember well.
>
>> That raises the next question: is the idea of server groups (with the
>> ability for a request to try group 1, then group 2, etc. on retries) in
>> the development plans at some point? Would that be something I could
>> tinker as a longer term project?
>
> That could indeed be an interesting approach because we already almost do
Re: [ANNOUNCE] haproxy-2.2.0
Hi Luke!

On Wed, Jul 08, 2020 at 11:57:15AM +0200, Luke Seelenbinder wrote:
> I've been following along the torturous road, and I'm happy to see all the
> issues resolved and the excellent results.

You can imagine how I am as well :-)

> Personally, I'm excited about the
> performance gains. I'll deploy this soon on our network.

OK!

> To dig up an old discussion--I took a look at better support for SRV
> records (using the priority field as backup/non-backup, etc.) a few weeks
> ago, but determined it didn't make sense in our use case. The issue is 0
> weighted servers are considerably less useful to us since they aren't ever
> used, even in the condition where every other server is down.

I seem to remember a discussion about making this configurable but I
don't seem to see any commit matching anything like that, so maybe the
discussion ended up in "change the behavior again the previous one was
wrong", I don't remember well.

> That raises the next question: is the idea of server groups (with the
> ability for a request to try group 1, then group 2, etc. on retries) in
> the development plans at some point? Would that be something I could
> tinker as a longer term project?

That could indeed be an interesting approach because we already almost do
that between active and backup servers, except that there is always one
single group at a time.
In fact there are 4 possible states for a servers group:
  - populated only with all active servers which are UP or unchecked,
    provided that there is at least one such server;
  - populated only with all backup servers which are UP or unchecked,
    provided there is at least one such server, that no active server
    exists in UP or unchecked state, and that option allbackups is set;
  - populated with the first UP or unchecked backup server, provided that
    there is at least one such server, that no active server exists in UP
    or unchecked state, and that option allbackups is not set;
  - no server: all are down.

With your approach it would be almost identical except that we would always
have two load-balancing groups, a primary one and a secondary one, the first
one made only of the active servers and the second one made only of the
backup servers. We would then pick from the first list and if it's empty,
then the next one. It shouldn't even consume too much memory since the
structures used to attach the servers to the group are carried by the
servers themselves. Only static hash-based algorithms would cause a memory
increase on the backend, but they're rarely used with many servers due to
the high risk of rebalancing, so I guess that could be a pretty reasonable
change.

We'd just document that the keyword "backup" means "server of the secondary
group", and probably figure new actions or decisions to force the use of one
group over the other one.

Please note that I'd rather avoid adding too many groups into a farm because
we don't want to start to scan many of them. If keeping 2 as we have today
is already sufficient for your use case, I'd rather stick to this.

We still need to put a bit more thought into this because I vaguely remember
an old discussion where someone wanted to use a different LB algorithm for
the backup servers. Here in terms of implementation it would not be a big
deal, we could have one LB algo per group.
But in terms of configuration (for the user) and configuration storage (in
the code), it would be a real pain. But possibly it would still be worth the
price if it starts to allow assembling a backend by "merging" several groups
(that's a crazy old idea that has been floating around for 10+ years and
which could possibly make sense in the future to address certain use cases).

If you're interested in pursuing these ideas, please, oh please, never
forget about the queues (those that are used when you set a maxconn
parameter), because their behavior is tightly coupled with the LB
algorithms, and the difficulty is to make sure a server which frees a
connection slot can immediately pick the oldest pending request either in
its own queue (server already assigned) or the backend's (don't care about
what server handles the request). This may become more difficult when
dealing with several groups, hence possibly several queues.

My secret agenda would ideally be to one day support shared server groups
with their own queues between multiple backends so that we don't even need
to divide the servers' maxconn anymore. But it's still lacking some
reflection.

I'm dumping all that in case it can help you get a better idea of the
various mid-term possibilities and what the steps could be (and also what
not to do if we don't want to shoot ourselves in the foot).

Cheers,
Willy
Re: [ANNOUNCE] haproxy-2.2.0
Congrats on the release, Willy & the rest of the team!

I've been following along the torturous road, and I'm happy to see all the
issues resolved and the excellent results. Personally, I'm excited about the
performance gains. I'll deploy this soon on our network.

To dig up an old discussion—I took a look at better support for SRV records
(using the priority field as backup/non-backup, etc.) a few weeks ago, but
determined it didn't make sense in our use case. The issue is 0 weighted
servers are considerably less useful to us since they aren't ever used, even
in the condition where every other server is down.

That raises the next question: is the idea of server groups (with the
ability for a request to try group 1, then group 2, etc. on retries) in the
development plans at some point? Would that be something I could tinker as a
longer term project?

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

> On 7 Jul 2020, at 19:20, Willy Tarreau wrote:
>
> Hi,
>
> HAProxy 2.2.0 was released on 2020/07/07. It added 24 new commits
> after version 2.2-dev12.
>
> There were very few last-minute changes since dev12, just as I hoped,
> that's pretty fine.
>
> We're late by about 1 month compared to the initial planning, which is
> not terrible and should be seen instead as an investment on the debugging
> cycle since almost only bug fixes were merged during that period. In the
> end you get a better version later.
> While I was initially worried that this version didn't seem to contain
> any outstanding changes, looking back in the mirror tells me it's another
> awesome one instead:
>
> - dynamic content emission:
>   - "http-request return" directive to build dynamic responses ;
>   - rewrite of headers (including our own) after the response ;
>   - dynamic error files (errorfiles can be used as templates to
>     deliver personalized pages)
>
> - further improvements to TLS runtime certificates management:
>   - insertion of new certificates
>   - split of key and cert
>   - manipulation and creation of crt-lists
>   - even directories can be handled
>
>   And by the way now TLSv1.2 is set as the default minimum version.
>
> - significant reduction of server-side resources by sharing idle
>   connection pools between all threads ; till 2.1 if you had 64 threads,
>   each of them had its own connections, so the reuse rate was lower, and
>   the idle connection count was very high. This is not the case anymore.
>
> - health-checks were rewritten to all rely on tcp-check rules behind the
>   curtains. This allowed us to get rid of all the dirt we had accumulated
>   over 18 years and to write extensible checks. New ones are much easier
>   to add. In addition we now have http-checks which support header and
>   body addition, and which pass through muxes (HTTP/1 and HTTP/2).
>
> - ring buffer creation with the ability to forward any event to any log
>   server including over TCP. This means that it's now possible to log to
>   a TCP syslog server, and that adding new protocols should be fairly
>   easy.
>
> - further refined and improved debugging (symbols in panic dumps, malloc
>   debugging, more activity counters)
>
> - the default security was improved. For example fork() is forbidden by
>   default, which will protect against any potential code execution (and
>   will also block external checks by default unless explicitly unblocked).
> - new performance improvements in the scheduler and I/O layers, reducing
>   the cost of I/O processing and overall latency. I've known from private
>   discussions that some noticed tremendous gains there.
>
> I'm pretty sure there are many other things but I don't remember, I'm
> looking at my notes. I'm aware that HaproxyTech will soon post an in-depth
> review on the haproxy.com blog so just have a look there for all the
> details. (edit: it's already there:
> https://www.haproxy.com/blog/announcing-haproxy-2-2/ ).
>
> There are three things I noted during the development of this version.
>
> The first one is that with the myriad of new tools we're using to help
> users and improve our code quality (discourse, travis, cirrus, oss-fuzz,
> mailing-list etc), some people really found their role in the project and
> are becoming more autonomous. This definitely scales much better and
> helps me spend less time on things that are not directly connected to my
> code activities, so thank you very much for this (Lukas, Tim, Ilya,
> Cyril).
>
> The second one is that this is the first version that has been tortured
> in production long before the release. And when I'm saying "tortured", I
> really mean it, because several of us were suffering as well. But it
> allowed us to address very serious issues that would have been a
> nightmare to debug and fix post-release. For this I really want to
> publicly thank William Dauchy for all his work and involvement on this,
> and for all the very detailed reports he's sent
Re: [PATCH] MEDIUM: Support TCP keepalive parameters customization
Hi,

On Mon, Jul 06, 2020 at 03:01:26AM +0000, mizuta.take...@fujitsu.com wrote:
> Hi, Willy,
>
> Thank you for your quick reply!
>
> > But I mean, that's probably OK and I won't argue on this. I'd be
> > interested in others' opinions and/or suggestions on this, but
> > it's not critical.
>
> Thank you for your comment.
> If these keywords are inappropriate for users,
> please feel free to tell me anytime.
> I'm not particular about these keywords :-)
>
> > Yes please, in the same commit so that any eventual backport that may
> > happen isn't lost!
>
> I understand.
> I've included a fix for doc/configuration.txt in the patch.
> If you have any problems fixing the documentation, please let us know.
>
> > Please just add "tcp:" as a subsystem tag.
> > I think you can tag it MINOR as the impact is extremely low
>
> I changed the Subject to:
>
> MINOR: tcp: Support TCP keepalive parameters customization
>
> Thank you for your cooperation.

So I had a look at it, everything looks good but I'm thinking that the
"tcpka-*" keywords which act on both sides will only result in trouble,
because if they are in a defaults section for example, there will be some
confusion regarding whether they are used by default or the other
side-specific keyword is used. The typical case is this one:

    defaults
        srvka-cnt 10

    backend foo
        tcpka-cnt 5

It's not obvious to me which one will have precedence here. Given that even
in the doc you strongly suggested to use only clika-* and srvka-*, I'd
rather kill the dual-sided tcpka-* before they start to be used. Would that
be OK for you ?

Other than that I'm OK with merging the patch as-is and it's clean.

Thank you!
Willy