Re: [ANNOUNCE] haproxy-2.3.9
On Wed, Mar 31, 2021 at 02:29:40PM +0200, Vincent Bernat wrote:
> ❦ 31 mars 2021 12:46 +02, Willy Tarreau:
>
> > On the kernel Greg solved all this by issuing all versions very
> > frequently: as long as you produce updates faster than users are
> > willing to deploy them, they can choose what to do. It just requires
> > a bandwidth that we don't have :-/ Some weeks several of us work full
> > time on backports and tests! Right now we've reached a point where
> > backports can prevent us from working on mainline, and where this lack
> > of time increases the risk of regressions, and the regressions require
> > more backport time.
>
> Wouldn't this mean there are too many versions in parallel?

It cannot be summed up this easily. Normally, old versions are not
released often so they don't cost much. But not releasing them often
complicates the backports and their testing, so it's still better to try
to feed them along with the other ones. However, releasing them in
parallel with the other ones makes them more susceptible to stupid
issues like the last build failure with musl. But not releasing them
wouldn't change much, given that build failures in certain environments
are only detected once the release sends the signal that it's time to
update :-/

With this said, while the adoption of non-LTS versions has added one to
two versions to the series, it has significantly reduced the pain of
certain backports, precisely because it split the population of users.
So at the cost of roughly one more version in the pipe, we get more
detailed reports from users who are more accustomed to enabling core
dumps, firing up gdb, applying patches, etc., which reduces the time
spent on bugs and increases the confidence in the fixes that get
backported. So I'd say it remains a very good investment. However, I
wanted to make sure we shorten the non-LTS versions' lives to limit
in-field fragmentation.

And this works extremely well (I'm very grateful to our users for this,
and I suspect the status banner in the executable reminding about EOL
helps). We have probably not seen a single 2.1 report in the issues over
the last 3-4 months, and I expect that 6 months after 2.4 is released,
we won't read about 2.3 anymore. Also, if you dig into the issue
tracker, you'll see a noticeable number of users who agree to run some
tests on 2.3 to verify whether it fixes an issue they face in 2.2. We're
usually not asking for an upgrade, just a test on a very close version.
This flexibility is very important as well. So the number of parallel
versions is one aspect of the problem, but it's also an important part
of the solution.

I hope we can keep the non-LTS lives short, but at the same time it must
remain a win-win: if we get useful reports on one version that are valid
for other ones as well, I'm fine with extending it a little bit as we
did for 1.9; there's no reason the ones making the most effort should be
the first ones punished.

Overall the real issue remains the number of bugs we introduce in the
code, and that is unavoidable when working on lower layers where good
test coverage is extremely difficult to achieve. Making smaller and more
detailed patches is mandatory. Continuing to add reg-tests definitely
helps a lot: we've added more than one reg-test per week since 2.3,
which is not bad at all, but this effort must continue! The CI reports
few false positives now and the situation has tremendously improved over
the last 2 years. So with better code we can hope for fewer bugs, fewer
fixes, fewer backports, hence less risk of regressions.

> > I think that the real problem arrives when a version becomes generally
> > available in distros. And distro users are often the ones with the least
> > autonomy when it comes to rolling back. When you build from sources,
> > you're more at ease. Thus probably that a nice solution would be to
> > add an idle period between a stable release and its appearance in
> > distros so that it really gets some initial deployment before becoming
> > generally available. And I know that some users complain when they do
> > not immediately see their binary package, but that's something we can
> > easily explain and document. We could even indicate a level of confidence
> > in the announce messages. It has the merit of respecting the principle
> > of least surprise for everyone in the chain, including those like you
> > and me involved in the release cycle and who did not necessarily plan
> > to stop all activities to work on yet-another-release because the
> > long-awaited fix-of-the-month broke something and its own fix broke
> > something else.
>
> We can do that. In the future, I may even tackle all the problems at
> once: providing easy access to old versions and have two versions of
> each repository: one with new versions immediately available and one
> with a semi-fixed delay.

Ah, I really like this! Your packages are definitely the most exposed
ones, so this could very
Re: [ANNOUNCE] haproxy-2.3.9
❦ 31 mars 2021 12:46 +02, Willy Tarreau:

> On the kernel Greg solved all this by issuing all versions very
> frequently: as long as you produce updates faster than users are
> willing to deploy them, they can choose what to do. It just requires
> a bandwidth that we don't have :-/ Some weeks several of us work full
> time on backports and tests! Right now we've reached a point where
> backports can prevent us from working on mainline, and where this lack
> of time increases the risk of regressions, and the regressions require
> more backport time.

Wouldn't this mean there are too many versions in parallel?

> I think that the real problem arrives when a version becomes generally
> available in distros. And distro users are often the ones with the least
> autonomy when it comes to rolling back. When you build from sources,
> you're more at ease. Thus probably that a nice solution would be to
> add an idle period between a stable release and its appearance in
> distros so that it really gets some initial deployment before becoming
> generally available. And I know that some users complain when they do
> not immediately see their binary package, but that's something we can
> easily explain and document. We could even indicate a level of confidence
> in the announce messages. It has the merit of respecting the principle
> of least surprise for everyone in the chain, including those like you
> and me involved in the release cycle and who did not necessarily plan
> to stop all activities to work on yet-another-release because the
> long-awaited fix-of-the-month broke something and its own fix broke
> something else.

We can do that. In the future, I may even tackle all the problems at
once: providing easy access to old versions and have two versions of
each repository: one with new versions immediately available and one
with a semi-fixed delay.
--
April 1

This is the day upon which we are reminded of what we are on the other
three hundred and sixty-four.
		-- Mark Twain, "Pudd'nhead Wilson's Calendar"
Re: [ANNOUNCE] haproxy-2.3.9
Hello,

Just giving my feedback on part of the story:

On 31 Mar 12:46, Willy Tarreau wrote:
> On the kernel Greg solved all this by issuing all versions very
> frequently: as long as you produce updates faster than users are
> willing to deploy them, they can choose what to do. It just requires
> a bandwidth that we don't have :-/ Some weeks several of us work full
> time on backports and tests! Right now we've reached a point where
> backports can prevent us from working on mainline, and where this lack
> of time increases the risk of regressions, and the regressions require
> more backport time.

I just want to say that I greatly appreciate the backport policy of
HAProxy. I often see really small bugs or even small improvements being
backported, where I personally would have been happy with them just
fixed on devel. This is greatly appreciated!
--
 (o-    Julien Pivotto
 //\    Open-Source Consultant
 V_/_   Inuits - https://www.inuits.eu
Re: [ANNOUNCE] haproxy-2.3.9
Hi Vincent!

On Wed, Mar 31, 2021 at 12:11:32PM +0200, Vincent Bernat wrote:
> It's a bit annoying that fixes reach a LTS version before the non-LTS
> one. The upgrade scenario is one annoyance, but if there is a
> regression, you also impact far more users.

I know, this is also why I'm quite a bit irritated by this.

> You could tag releases in
> git (with -preX if needed) when preparing the releases and then issue
> the release with a few days apart.

In practice the tag serves no purpose, but that leads to the same
principle as leaving some fixes pending in the -next branch.

> Users of older versions will have
> less frequent releases in case regressions are spotted, but I think
> that's the general expectation: if you are running older releases it's
> because you don't have time to upgrade and it's good enough for you.

I definitely agree with this; that's also how I use LTS versions of
various software, and why we try to put more care into LTS versions
here.

> For example:
> - 2.3, monthly release or when there is a big regression
> - 2.2, 3 days after 2.3
> - 2.0, 3 days after 2.2, skip one out of two releases
> - 1.8, 3 days after 2.0, skip one out of four releases
>
> So, you have a 2.3.9. At the same time, you tag 2.2.12-pre1 (to be
> released in 3 working days if everything is fine) and you skip 2.0
> and 1.8 this time because they were released to match 2.3.8. Next time,
> you'll have a 2.0.22-pre1 but no 1.8.30-pre1 yet.

This will not work. I tried this when I was maintaining kernels, and the
reality is that users who stumble on a bug want their fix. And worse,
their stability expectations when running on older releases make them
even more impatient, because 1) older releases *are* expected to be
reliable, 2) they're deployed on sensitive machines, where the business
is, and 3) very few fixes are expected to be pending, so for them there
is no justification for delaying the fix they're waiting for.

> If for some reason, there is an important regression in 2.3.9 you want
> to address, you release a 2.3.10 and a 2.2.12-pre2, still no 2.0.22-pre1
> nor 1.8.30-pre1. Hopefully, no more regressions spotted, you tag 2.2.12
> on top of 2.2.12-pre2 and issue a release.

The thing is, the -pre releases will just be tags of no use at all.
Maintenance branches collect fixes all the time, and either you're on a
release or you're following -git. And quite frankly, most stable users
are on a point release because by definition that's what they need.

What I'd like to do is to maintain a small delay between versions, but
there is no need to maintain particularly long delays past the next LTS.
What needs to be particularly protected are the LTS releases as a whole.
More users are affected by 2.2 breakage than by 2.0 breakage, and the
risk is the same for each of them. So instead we should make sure that
all versions starting from the first LTS past the latest branch are
slightly delayed, but there's no need to further enforce a delay between
them.

What this means is that when issuing a 2.3 release, we can wait a bit
before issuing the 2.2, and once 2.2 is emitted, most of the potential
damage is already done, so there's no reason to keep the older ones on
hold, as it can only force their users to live with known bugs. And when
the latest branch is an LTS (as in a few months, once 2.4 is out), we'd
emit 2.4 and 2.3 together, then wait a bit and emit 2.2 and the other
ones. This maintains the principle that the LTS before the latest branch
should be very stable.

With this said, there remains the problem of the late fixes I mentioned,
those discovered during this grace period. The tricky ones can wait in
the -next branch, but the others should be integrated, otherwise the
nasty effect is that users think "let's not upgrade to this one but wait
for the next one, so that I don't have to schedule another update later
and I collect all fixes at once". But if we integrate sensitive fixes in
2.2 that were not yet in a released 2.3, those upgrading will face some
breakage.

On the kernel Greg solved all this by issuing all versions very
frequently: as long as you produce updates faster than users are willing
to deploy them, they can choose what to do. It just requires a bandwidth
that we don't have :-/ Some weeks several of us work full time on
backports and tests! Right now we've reached a point where backports can
prevent us from working on mainline, and where this lack of time
increases the risk of regressions, and the regressions require more
backport time.

I think that the real problem arrives when a version becomes generally
available in distros. And distro users are often the ones with the least
autonomy when it comes to rolling back. When you build from sources,
you're more at ease. Thus probably a nice solution would be to add an
idle period between a stable release and its appearance in distros so
that it really gets some initial deployment before becoming generally
Re: [ANNOUNCE] haproxy-2.3.9
❦ 31 mars 2021 10:35 +02, Willy Tarreau:

>> Thanks Willy for the quick update. That's a good example to avoid
>> pushing stable versions at the same time, so we have opportunities to
>> find those regressions.
>
> I know and we're trying to separate them but it considerably increases the
> required effort. In addition there is a nasty effect resulting from shifted
> releases, which is that it ultimately results in older releases possibly
> having more recent fixes than recent ones. And it will happen again with
> 2.2.12 which I hope to issue today. It will contain the small fix for the
> silent-drop issue (which is already in 2.3 of course) but was merged after
> 2.3.9. The reporter of the issue is on 2.2, it would not be fair to him to
> release another 2.2 without it (or we'd fall into a bureaucratic process
> that doesn't serve users anymore). So 2.2.12 will contain this fix. But
> if the person finally decides to upgrade to 2.3.9 a week or two later, she
> may face the bug again. It's not a dramatic one so that's acceptable, but
> that shows the difficulties of the process.

It's a bit annoying that fixes reach a LTS version before the non-LTS
one. The upgrade scenario is one annoyance, but if there is a
regression, you also impact far more users. You could tag releases in
git (with -preX if needed) when preparing the releases and then issue
the releases a few days apart. Users of older versions will have less
frequent releases in case regressions are spotted, but I think that's
the general expectation: if you are running older releases it's because
you don't have time to upgrade and it's good enough for you.

For example:
 - 2.3, monthly release or when there is a big regression
 - 2.2, 3 days after 2.3
 - 2.0, 3 days after 2.2, skip one out of two releases
 - 1.8, 3 days after 2.0, skip one out of four releases

So, you have a 2.3.9. At the same time, you tag 2.2.12-pre1 (to be
released in 3 working days if everything is fine) and you skip 2.0
and 1.8 this time because they were released to match 2.3.8. Next time,
you'll have a 2.0.22-pre1 but no 1.8.30-pre1 yet.

If for some reason there is an important regression in 2.3.9 you want
to address, you release a 2.3.10 and a 2.2.12-pre2, still no 2.0.22-pre1
nor 1.8.30-pre1. Hopefully, with no more regressions spotted, you tag
2.2.12 on top of 2.2.12-pre2 and issue a release.
--
He hath eaten me out of house and home.
		-- William Shakespeare, "Henry IV"
Re: [ANNOUNCE] haproxy-2.3.9
On Wed, Mar 31, 2021 at 10:17:35AM +0200, William Dauchy wrote:
> On Tue, Mar 30, 2021 at 6:59 PM Willy Tarreau wrote:
> > HAProxy 2.3.9 was released on 2021/03/30. It added 5 new commits
> > after version 2.3.8.
> >
> > This essentially fixes the rate counters issue that popped up in 2.3.8
> > after the previous fix for the rate counters already.
> >
> > What happened is that the internal time in millisecond wraps every 49.7
> > days and that the new global counter used to make sure rate counters are
> > now stable across threads starts at zero and is initialized when older
> > than the current thread's current date. It just happens that the wrapping
> > happened a few hours ago at "Mon Mar 29 23:59:46 CEST 2021" exactly and
> > that any process started since this date and for the next 24 days doesn't
> > validate this condition anymore, hence doesn't rotate its rate counters
> > anymore.
>
> Thanks Willy for the quick update. That's a good example of why we
> should avoid pushing stable versions at the same time, so we have
> opportunities to find those regressions.

I know, and we're trying to separate them, but it considerably increases
the required effort. In addition there is a nasty effect resulting from
shifted releases, which is that it ultimately results in older releases
possibly having more recent fixes than recent ones. And it will happen
again with 2.2.12, which I hope to issue today. It will contain the
small fix for the silent-drop issue (which is already in 2.3 of course)
but was merged after 2.3.9. The reporter of the issue is on 2.2; it
would not be fair to him to release another 2.2 without it (or we'd fall
into a bureaucratic process that doesn't serve users anymore). So 2.2.12
will contain this fix. But if the person finally decides to upgrade to
2.3.9 a week or two later, she may face the bug again. It's not a
dramatic one so that's acceptable, but it shows the difficulties of the
process.

In an ideal world, there would be lots of tests in production on stable
versions. The reality is that nobody (me included) is interested in
upgrading prod servers running flawlessly just to confirm there's no
nasty surprise with the forthcoming release, because either there's a
bug and you prefer someone else to spot it first, or there's no problem
and you'll upgrade once the final version is ready. With this option off
the table, it's clear that the only option that remains is shifted
versions. But here it would not even have helped, because the code
worked on Monday and broke on Tuesday!

What I think we can try to do (and we discussed this with the other
co-maintainers) is to push the patches but not immediately emit the
releases (so that the backport work is still factored), and to keep the
tricky patches in the -next branch to prevent them from being backported
too far too fast (it will save us from the risk of missing them if not
merged).

Overall the most important thing is that we release often enough so that
in case of a regression that affects some users, they can stay on the
previous version a little longer without having to endure too many bugs.
And if we don't have too many fixes per release, it's easy to emit yet
another small one immediately after to fix a single regression. But over
the last week we've been flooded on multiple channels with many reports,
and then it becomes really hard to focus on a single issue at once for a
release :-/

Cheers,
Willy
Re: [ANNOUNCE] haproxy-2.3.9
On Tue, Mar 30, 2021 at 6:59 PM Willy Tarreau wrote:
> HAProxy 2.3.9 was released on 2021/03/30. It added 5 new commits
> after version 2.3.8.
>
> This essentially fixes the rate counters issue that popped up in 2.3.8
> after the previous fix for the rate counters already.
>
> What happened is that the internal time in millisecond wraps every 49.7
> days and that the new global counter used to make sure rate counters are
> now stable across threads starts at zero and is initialized when older
> than the current thread's current date. It just happens that the wrapping
> happened a few hours ago at "Mon Mar 29 23:59:46 CEST 2021" exactly and
> that any process started since this date and for the next 24 days doesn't
> validate this condition anymore, hence doesn't rotate its rate counters
> anymore.

Thanks Willy for the quick update. That's a good example of why we
should avoid pushing stable versions at the same time, so we have
opportunities to find those regressions.
--
William
Re: [ANNOUNCE] haproxy-2.3.9
On Tue, Mar 30, 2021 at 07:08:25PM +0200, Tim Düsterhus wrote:
> Willy,
>
> On 3/30/21 6:58 PM, Willy Tarreau wrote:
> > Note: I've just discovered that building with -DDEBUG_THREAD fails :-(
> > I've now fixed it, and since this normally only affects haproxy
> > developers who are supposed to use git versions, I don't expect
> > this to be an issue for anyone. However if it's an issue for you,
> > just let me know and I'll emit 2.3.10. I just don't want to spend
> > my time creating releases for no reason.
>
> may I request one for 1.7 if you have some spare cycles:
> https://github.com/haproxy/haproxy/issues/760#issuecomment-805280316?
>
> While the issue for Docker is fixed by manually applying the patch [1]
> the current official release fails the build for musl.

Ah yes, I had already forgotten about this one, thank you :-/ We'll
check tomorrow, or my head will explode and that would be dirty here.

Thanks,
Willy
Re: [ANNOUNCE] haproxy-2.3.9
Willy,

On 3/30/21 6:58 PM, Willy Tarreau wrote:
> Note: I've just discovered that building with -DDEBUG_THREAD fails :-(
> I've now fixed it, and since this normally only affects haproxy
> developers who are supposed to use git versions, I don't expect
> this to be an issue for anyone. However if it's an issue for you,
> just let me know and I'll emit 2.3.10. I just don't want to spend
> my time creating releases for no reason.

May I request one for 1.7 if you have some spare cycles:
https://github.com/haproxy/haproxy/issues/760#issuecomment-805280316?

While the issue for Docker is fixed by manually applying the patch [1],
the current official release fails to build for musl.

Best regards
Tim Düsterhus

[1] https://github.com/docker-library/haproxy/commit/42989bf9ab08e05e7333261ce62fff00d0dcb08a
[ANNOUNCE] haproxy-2.3.9
Hi,

HAProxy 2.3.9 was released on 2021/03/30. It added 5 new commits after
version 2.3.8.

This essentially fixes the rate counters issue that popped up in 2.3.8
after the previous fix for the rate counters.

What happened is that the internal time in milliseconds wraps every 49.7
days, and the new global counter used to make sure rate counters are now
stable across threads starts at zero and is initialized when it is older
than the current thread's date. It just happens that the wrapping
occurred a few hours ago, at "Mon Mar 29 23:59:46 CEST 2021" exactly, so
any process started since that date, and for the next 24 days, no longer
validates this condition, hence no longer rotates its rate counters.

Another issue, possibly not affecting the same users, that I met on an
8-core 16-thread Xeon is that there was still some important contention
on the idle conns when the file-descriptor limit was about to be reached
and threads fought to choose a queue to put a connection into, to the
point of regularly triggering the watchdog. We already fixed a similar
one recently, but it existed in two places.

And the fix for the rare crashes in mux-h1 on double shutdown reported
in issue #1197 was backported. The rest is quite minor (a payload sample
fetch not properly waiting, and restarting servers not using the correct
color on the stats page).
Please find the usual URLs below:
   Site index       : http://www.haproxy.org/
   Discourse        : http://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Wiki             : https://github.com/haproxy/wiki/wiki
   Sources          : http://www.haproxy.org/download/2.3/src/
   Git repository   : http://git.haproxy.org/git/haproxy-2.3.git/
   Git Web browsing : http://git.haproxy.org/?p=haproxy-2.3.git
   Changelog        : http://www.haproxy.org/download/2.3/src/CHANGELOG
   Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/

Note: I've just discovered that building with -DDEBUG_THREAD fails :-(
I've now fixed it, and since this normally only affects haproxy
developers, who are supposed to use git versions, I don't expect this
to be an issue for anyone. However, if it is an issue for you, just let
me know and I'll emit 2.3.10. I just don't want to spend my time
creating releases for no reason.

Willy

---
Complete changelog :

Christopher Faulet (1):
      BUG/MINOR: payload: Wait for more data if buffer is empty in payload/payload_lv

Florian Apolloner (1):
      BUG/MINOR: stats: Apply proper styles in HTML status page.

Willy Tarreau (3):
      BUG/MEDIUM: mux-h1: make h1_shutw_conn() idempotent
      MEDIUM: backend: use a trylock to grab a connection on high FD counts as well
      BUG/MEDIUM: time: make sure to always initialize the global tick

---