Re: [ANNOUNCE] haproxy-2.3.9

2021-03-31 Thread Willy Tarreau
On Wed, Mar 31, 2021 at 02:29:40PM +0200, Vincent Bernat wrote:
>  ❦ 31 March 2021 12:46 +02, Willy Tarreau:
> 
> > On the kernel, Greg solved all this by issuing all versions very
> > frequently: as long as you produce updates faster than users are
> > willing to deploy them, they can choose what to do. It just requires
> > bandwidth that we don't have :-/ Some weeks several of us work full
> > time on backports and tests! Right now we've reached a point where
> > backports can prevent us from working on mainline, and where this lack
> > of time increases the risk of regressions, and the regressions require
> > more backport time.
> 
> Wouldn't this mean there are too many versions in parallel?

It cannot be summed up this easily. Normally, old versions are not
released often so they don't cost much. But not releasing them often
complicates the backports and their testing, so it's still better to
try to feed them along with the other ones. However, releasing them
in parallel with the other ones makes them more susceptible to stupid
issues like the recent build failure with musl. But not releasing them
wouldn't change much, given that build failures in certain environments
are only detected once the release sends the signal that it's time to
update :-/

With this said, while the adoption of non-LTS versions has added one
to two versions to the series, it has significantly reduced the pain
of certain backports, precisely because it resulted in splitting the
population of users. So at the cost of ~1 more version in the pipe,
we get more detailed reports from users who are more accustomed to
enabling core dumps, firing up gdb, applying patches, etc., which
reduces the time spent on bugs and increases the confidence in fixes
that get backported. So I'd say that it remains a very good investment.
However, I wanted to make sure we shorten the non-LTS versions' lives
to limit the in-field fragmentation. And this works extremely well (I'm
very grateful to our users for this, and I suspect that the status
banner in the executable reminding about EOL helps). We have probably
not seen a single 2.1 report in the issues over the last 3-4 months.
And I expect that 6 months after 2.4 is released, we won't read about
2.3 anymore.

Also, if you dig into the issue tracker, you'll see a noticeable number
of users who agree to run some tests on 2.3 to verify whether it fixes
an issue they face in 2.2. We're usually not asking for an upgrade,
just a test on a very close version. This flexibility is very important
as well.

So the number of parallel versions is one aspect of the problem, but
it's also an important part of the solution. I hope we can continue to
keep the lives of non-LTS versions short, but at the same time it must
remain a win-win: if we get useful reports on one version that are
valid for other ones as well, I'm fine with extending its life a little
bit as we did for 1.9; there's no reason the ones making the most
effort should be the first ones punished.

Overall the real issue remains the number of bugs we introduce in the
code, which is unavoidable when working on lower layers, where good
test coverage is extremely difficult to achieve. Making smaller and more
detailed patches is mandatory. Continuing to add reg-tests definitely
helps a lot. We've added more than one reg-test per week since 2.3;
that's definitely not bad at all, but this effort must continue! The
CI reports few false positives now and the situation has tremendously
improved over the last 2 years. So with better code we can hope for
fewer bugs, fewer fixes, and fewer backports, hence less risk of
regressions.

> > I think that the real problem arrives when a version becomes generally
> > available in distros. And distro users are often the ones with the least
> > autonomy when it comes to rolling back. When you build from sources,
> > you're more at ease. So probably a nice solution would be to
> > add an idle period between a stable release and its appearance in
> > distros, so that it really gets some initial deployment before becoming
> > generally available. And I know that some users complain when they do
> > not immediately see their binary package, but that's something we can
> > easily explain and document. We could even indicate a level of confidence
> > in the announce messages. It has the merit of respecting the principle
> > of least surprise for everyone in the chain, including those like you
> > and me involved in the release cycle and who did not necessarily plan
> > to stop all activities to work on yet-another-release because the
> > long-awaited fix-of-the-month broke something and its own fix broke
> > something else.
> 
> We can do that. In the future, I may even tackle all the problems at
> once: providing easy access to old versions and having two versions of
> each repository: one with new versions immediately available and one
> with a semi-fixed delay.

Ah, I really like this! Your packages are definitely the most exposed
ones, so this could very

Re: [ANNOUNCE] haproxy-2.3.9

2021-03-31 Thread Vincent Bernat
 ❦ 31 March 2021 12:46 +02, Willy Tarreau:

> On the kernel, Greg solved all this by issuing all versions very
> frequently: as long as you produce updates faster than users are
> willing to deploy them, they can choose what to do. It just requires
> bandwidth that we don't have :-/ Some weeks several of us work full
> time on backports and tests! Right now we've reached a point where
> backports can prevent us from working on mainline, and where this lack
> of time increases the risk of regressions, and the regressions require
> more backport time.

Wouldn't this mean there are too many versions in parallel?

> I think that the real problem arrives when a version becomes generally
> available in distros. And distro users are often the ones with the least
> autonomy when it comes to rolling back. When you build from sources,
> you're more at ease. So probably a nice solution would be to
> add an idle period between a stable release and its appearance in
> distros, so that it really gets some initial deployment before becoming
> generally available. And I know that some users complain when they do
> not immediately see their binary package, but that's something we can
> easily explain and document. We could even indicate a level of confidence
> in the announce messages. It has the merit of respecting the principle
> of least surprise for everyone in the chain, including those like you
> and me involved in the release cycle and who did not necessarily plan
> to stop all activities to work on yet-another-release because the
> long-awaited fix-of-the-month broke something and its own fix broke
> something else.

We can do that. In the future, I may even tackle all the problems at
once: providing easy access to old versions and having two versions of
each repository: one with new versions immediately available and one
with a semi-fixed delay.
-- 
April 1

This is the day upon which we are reminded of what we are on the other three
hundred and sixty-four.
-- Mark Twain, "Pudd'nhead Wilson's Calendar"



Re: [ANNOUNCE] haproxy-2.3.9

2021-03-31 Thread Julien Pivotto
Hello,

Just giving my feedback on part of the story:

On 31 Mar 12:46, Willy Tarreau wrote:
> On the kernel Greg solved all this by issuing all versions very
> frequently: as long as you produce updates faster than users are
> willing to deploy them, they can choose what to do. It just requires
> a bandwidth that we don't have :-/ Some weeks several of us work full
> time on backports and tests! Right now we've reached a point where
> backports can prevent us from working on mainline, and where this lack
> of time increases the risk of regressions, and the regressions require
> more backport time.

I just want to say that I greatly appreciate HAProxy's backport policy.
I often see really small bugs or even small improvements being
backported, where I personally would have been happy with them just
being fixed on devel. This is much appreciated!

-- 
 (o-    Julien Pivotto
 //\    Open-Source Consultant
 V_/_   Inuits - https://www.inuits.eu




Re: [ANNOUNCE] haproxy-2.3.9

2021-03-31 Thread Willy Tarreau
Hi Vincent!

On Wed, Mar 31, 2021 at 12:11:32PM +0200, Vincent Bernat wrote:
> It's a bit annoying that fixes reach an LTS version before the non-LTS
> one. The upgrade scenario is one annoyance, but if there is a
> regression, you also impact far more users.

I know, this is also why I'm quite a bit irritated by this.

> You could tag releases in
> git (with -preX if needed) when preparing the releases and then issue
> the releases a few days apart.

In practice the tag would serve no purpose; it comes down to the same
principle as leaving some fixes pending in the -next branch.

> Users of older versions will have
> less frequent releases in case regressions are spotted, but I think
> that's the general expectation: if you are running older releases, it's
> because you don't have time to upgrade and it's good enough for you.

I definitely agree with this; that's also how I use LTS versions of
various software, and it's why we try to put more care into LTS
versions here.

> For example:
>  - 2.3, monthly release or when there is a big regression
>  - 2.2, 3 days after 2.3
>  - 2.0, 3 days after 2.2, skip one out of two releases
>  - 1.8, 3 days after 2.0, skip one out of four releases
> 
> So, you have a 2.3.9. At the same time, you tag 2.2.12-pre1 (to be
> released in 3 working days if everything is fine) and you skip 2.0
> and 1.8 this time because they were released to match 2.3.8. Next time,
> you'll have a 2.0.22-pre1 but no 1.8.30-pre1 yet.

This will not work. I tried this when I was maintaining kernels, and the
reality is that users who stumble on a bug want their fix. Worse,
their stability expectations when running older releases make them
even more impatient, because 1) older releases *are* expected to be
reliable, 2) they're deployed on sensitive machines, where the business
is, and 3) very few fixes are expected to be pending, so for them
there's no justification for delaying the fix they're waiting for.

> If for some reason there is an important regression in 2.3.9 you want
> to address, you release a 2.3.10 and a 2.2.12-pre2, still no 2.0.22-pre1
> nor 1.8.30-pre1. Hopefully, with no more regressions spotted, you tag
> 2.2.12 on top of 2.2.12-pre2 and issue the release.

The thing is, the -pre releases would just be tags of no use at all.
Maintenance branches collect fixes all the time, and either you're on a
release or you're following -git. And quite frankly, most stable users
are on a point release because by definition that's what they need. What
I'd like to do is to maintain a small delay between versions, but there
is no need to maintain particularly long delays past the next LTS.

What needs to be particularly protected are the LTS versions as a
whole. More users are affected by 2.2 breakage than by 2.0 breakage,
and the risk is the same for each of them. So instead we should make
sure that all versions starting from the first LTS before the latest
branch are slightly delayed. But there's no need to further enforce a
delay between them.

What this means is that when issuing a 2.3 release, we can wait a bit
before issuing the 2.2, and then once 2.2 is emitted, most of the
potential damage is already done, so there's no reason to keep older
ones on hold, as that only forces their users to live with known bugs.

And when the latest branch is an LTS (like in a few months once 2.4 is
out), we'd emit 2.4 and 2.3 together, then wait a bit and emit 2.2 and
the other ones. This maintains the principle that the LTS before the
latest branch should be very stable.

With this said, there remains the problem of the late fixes I
mentioned, those discovered during this grace period. The tricky ones
can wait in the -next branch, but the other ones should be integrated,
otherwise the nasty effect is that users think "let's not upgrade to
this one but wait for the next one, so that I don't have to schedule
another update later and I collect all fixes at once". But if we
integrate sensitive fixes into 2.2 that were not yet in a released 2.3,
those upgrading will face some breakage.

On the kernel, Greg solved all this by issuing all versions very
frequently: as long as you produce updates faster than users are
willing to deploy them, they can choose what to do. It just requires
bandwidth that we don't have :-/ Some weeks several of us work full
time on backports and tests! Right now we've reached a point where
backports can prevent us from working on mainline, and where this lack
of time increases the risk of regressions, and the regressions require
more backport time.

I think that the real problem arrives when a version becomes generally
available in distros. And distro users are often the ones with the least
autonomy when it comes to rolling back. When you build from sources,
you're more at ease. So probably a nice solution would be to
add an idle period between a stable release and its appearance in
distros, so that it really gets some initial deployment before becoming
generally available. And I know that some users complain when they do
not immediately see their binary package, but that's something we can
easily explain and document. We could even indicate a level of confidence
in the announce messages. It has the merit of respecting the principle
of least surprise for everyone in the chain, including those like you
and me involved in the release cycle and who did not necessarily plan
to stop all activities to work on yet-another-release because the
long-awaited fix-of-the-month broke something and its own fix broke
something else.

Re: [ANNOUNCE] haproxy-2.3.9

2021-03-31 Thread Vincent Bernat
 ❦ 31 March 2021 10:35 +02, Willy Tarreau:

>> Thanks Willy for the quick update. That's a good example of why we
>> should avoid pushing stable versions at the same time, so we have
>> opportunities to find those regressions.
>
> I know, and we're trying to separate them, but it considerably increases
> the required effort. In addition there is a nasty effect resulting from
> shifted releases, which is that older releases may end up containing
> more recent fixes than newer ones. And it will happen again with 2.2.12,
> which I hope to issue today. It will contain the small fix for the
> silent-drop issue (which is already in 2.3 of course) but was merged after
> 2.3.9. The reporter of the issue is on 2.2, and it would not be fair to
> them to release another 2.2 without it (or we'd fall into a bureaucratic
> process that no longer serves users). So 2.2.12 will contain this fix.
> But if that person finally decides to upgrade to 2.3.9 a week or two
> later, they may face the bug again. It's not a dramatic one, so that's
> acceptable, but it shows the difficulties of the process.

It's a bit annoying that fixes reach an LTS version before the non-LTS
one. The upgrade scenario is one annoyance, but if there is a
regression, you also impact far more users. You could tag releases in
git (with -preX if needed) when preparing the releases and then issue
the releases a few days apart. Users of older versions will have
less frequent releases in case regressions are spotted, but I think
that's the general expectation: if you are running older releases, it's
because you don't have time to upgrade and it's good enough for you.

For example:
 - 2.3, monthly release or when there is a big regression
 - 2.2, 3 days after 2.3
 - 2.0, 3 days after 2.2, skip one out of two releases
 - 1.8, 3 days after 2.0, skip one out of four releases

So, you have a 2.3.9. At the same time, you tag 2.2.12-pre1 (to be
released in 3 working days if everything is fine) and you skip 2.0
and 1.8 this time because they were released to match 2.3.8. Next time,
you'll have a 2.0.22-pre1 but no 1.8.30-pre1 yet.

If for some reason there is an important regression in 2.3.9 you want
to address, you release a 2.3.10 and a 2.2.12-pre2, still no 2.0.22-pre1
nor 1.8.30-pre1. Hopefully, with no more regressions spotted, you tag
2.2.12 on top of 2.2.12-pre2 and issue the release.
-- 
He hath eaten me out of house and home.
-- William Shakespeare, "Henry IV"



Re: [ANNOUNCE] haproxy-2.3.9

2021-03-31 Thread Willy Tarreau
On Wed, Mar 31, 2021 at 10:17:35AM +0200, William Dauchy wrote:
> On Tue, Mar 30, 2021 at 6:59 PM Willy Tarreau wrote:
> > HAProxy 2.3.9 was released on 2021/03/30. It added 5 new commits
> > after version 2.3.8.
> >
> > This essentially fixes the rate counters issue that popped up in 2.3.8,
> > right after the previous fix for those same rate counters.
> >
> > What happened is that the internal time in milliseconds wraps every 49.7
> > days, and the new global counter, used to make sure rate counters are
> > stable across threads, starts at zero and is only initialized when it is
> > older than the current thread's date. It just happens that the wrap
> > occurred a few hours ago, at "Mon Mar 29 23:59:46 CEST 2021" exactly, so
> > any process started since that date, and for the next 24 days, no longer
> > satisfies this condition and hence no longer rotates its rate counters.
> 
> Thanks Willy for the quick update. That's a good example of why we
> should avoid pushing stable versions at the same time, so we have
> opportunities to find those regressions.

I know, and we're trying to separate them, but it considerably increases
the required effort. In addition there is a nasty effect resulting from
shifted releases, which is that older releases may end up containing
more recent fixes than newer ones. And it will happen again with 2.2.12,
which I hope to issue today. It will contain the small fix for the
silent-drop issue (which is already in 2.3 of course) but was merged after
2.3.9. The reporter of the issue is on 2.2, and it would not be fair to
them to release another 2.2 without it (or we'd fall into a bureaucratic
process that no longer serves users). So 2.2.12 will contain this fix.
But if that person finally decides to upgrade to 2.3.9 a week or two
later, they may face the bug again. It's not a dramatic one, so that's
acceptable, but it shows the difficulties of the process.

In an ideal world, there would be lots of production testing of stable
versions. The reality is that nobody (me included) is interested in
upgrading prod servers that run flawlessly just to confirm there's no
nasty surprise with the forthcoming release, because either there's a
bug and you prefer someone else to spot it first, or there's no problem
and you'll upgrade once the final version is ready.

With this option off the table, it's clear that the only remaining
option is shifted versions. But here it would not even have helped,
because the code worked on Monday and broke on Tuesday!

What I think we can try to do (and we discussed this with the other
co-maintainers) is to push the patches but not immediately emit the
releases (so that the backport work is still factored), and to keep the
tricky patches in the -next branch to prevent them from being backported
too far too fast (it will save us from the risk of missing them if
they're not merged).

Overall the most important solution is to release often enough so that,
in case of a regression that affects some users, they can stay on
the previous version a little longer without having to endure too many
bugs. And if we don't have too many fixes per release, it's easy to emit
yet another small one immediately after to fix a single regression. But
over the last week we've been flooded on multiple channels with many
reports, and then it becomes really hard to focus on a single issue at
a time for a release :-/

Cheers,
Willy



Re: [ANNOUNCE] haproxy-2.3.9

2021-03-31 Thread William Dauchy
On Tue, Mar 30, 2021 at 6:59 PM Willy Tarreau wrote:
> HAProxy 2.3.9 was released on 2021/03/30. It added 5 new commits
> after version 2.3.8.
>
> This essentially fixes the rate counters issue that popped up in 2.3.8,
> right after the previous fix for those same rate counters.
>
> What happened is that the internal time in milliseconds wraps every 49.7
> days, and the new global counter, used to make sure rate counters are
> stable across threads, starts at zero and is only initialized when it is
> older than the current thread's date. It just happens that the wrap
> occurred a few hours ago, at "Mon Mar 29 23:59:46 CEST 2021" exactly, so
> any process started since that date, and for the next 24 days, no longer
> satisfies this condition and hence no longer rotates its rate counters.

Thanks Willy for the quick update. That's a good example of why we
should avoid pushing stable versions at the same time, so we have
opportunities to find those regressions.

-- 
William



Re: [ANNOUNCE] haproxy-2.3.9

2021-03-30 Thread Willy Tarreau
On Tue, Mar 30, 2021 at 07:08:25PM +0200, Tim Düsterhus wrote:
> Willy,
> 
> On 3/30/21 6:58 PM, Willy Tarreau wrote:
> > Note: I've just discovered that building with -DDEBUG_THREAD fails :-(
> >   I've now fixed it, and since this normally only affects haproxy
> >   developers who are supposed to use git versions, I don't expect
> >   this to be an issue for anyone. However if it's an issue for you,
> >   just let me know and I'll emit 2.3.10. I just don't want to spend
> >   my time creating releases for no reason.
> > 
> 
> May I request one for 1.7 if you have some spare cycles?
> https://github.com/haproxy/haproxy/issues/760#issuecomment-805280316
> 
> While the issue for Docker is fixed by manually applying the patch [1],
> the current official release still fails to build with musl.

Ah yes, I had already forgotten about this one, thank you :-/  We'll
check tomorrow, or my head will explode and that would make a mess here.

thanks,
Willy



Re: [ANNOUNCE] haproxy-2.3.9

2021-03-30 Thread Tim Düsterhus
Willy,

On 3/30/21 6:58 PM, Willy Tarreau wrote:
> Note: I've just discovered that building with -DDEBUG_THREAD fails :-(
>   I've now fixed it, and since this normally only affects haproxy
>   developers who are supposed to use git versions, I don't expect
>   this to be an issue for anyone. However if it's an issue for you,
>   just let me know and I'll emit 2.3.10. I just don't want to spend
>   my time creating releases for no reason.
> 

May I request one for 1.7 if you have some spare cycles?
https://github.com/haproxy/haproxy/issues/760#issuecomment-805280316

While the issue for Docker is fixed by manually applying the patch [1],
the current official release still fails to build with musl.

Best regards
Tim Düsterhus

[1]
https://github.com/docker-library/haproxy/commit/42989bf9ab08e05e7333261ce62fff00d0dcb08a



[ANNOUNCE] haproxy-2.3.9

2021-03-30 Thread Willy Tarreau
Hi,

HAProxy 2.3.9 was released on 2021/03/30. It added 5 new commits
after version 2.3.8.

This essentially fixes the rate counters issue that popped up in 2.3.8,
right after the previous fix for those same rate counters.

What happened is that the internal time in milliseconds wraps every 49.7
days, and the new global counter, used to make sure rate counters are
stable across threads, starts at zero and is only initialized when it is
older than the current thread's date. It just happens that the wrap
occurred a few hours ago, at "Mon Mar 29 23:59:46 CEST 2021" exactly, so
any process started since that date, and for the next 24 days, no longer
satisfies this condition and hence no longer rotates its rate counters.
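
To illustrate the arithmetic, here is a minimal standalone sketch
(illustrative only, not our actual code; tick_is_older is a made-up
name). A wrapping signed comparison on a 32-bit millisecond clock is
only valid when the two dates are less than 2^31 ms (~24.8 days) apart,
so a counter stuck at zero fails an "initialize if older" test during
half of each 49.7-day (2^32 ms) cycle:

    #include <stdint.h>
    #include <stdio.h>

    /* Wrapping "a is before b" test on a 32-bit ms clock: only valid
     * when a and b are less than 2^31 ms (~24.8 days) apart. */
    static int tick_is_older(uint32_t a, uint32_t b)
    {
        return (int32_t)(a - b) < 0;
    }

    int main(void)
    {
        uint32_t global_ms = 0; /* global counter freshly started at zero */

        /* First half of the cycle: zero looks older, init fires. */
        printf("%d\n", tick_is_older(global_ms, 3600u * 1000u));   /* 1 */

        /* Second half: zero looks *newer* than now, so an "initialize
         * if older" check never fires and the counter stays stuck. */
        printf("%d\n", tick_is_older(global_ms, 0x80000001u));     /* 0 */
        return 0;
    }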

Another issue, possibly not affecting the same users, that I met on an
8-core, 16-thread Xeon is that there was still significant contention
on the idle connections when the file descriptor limit was about to be
reached and threads fought over which queue to put a connection into,
to the point of regularly triggering the watchdog. We had already fixed
a similar issue recently, but it existed in two places.
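
The relief pattern is roughly the following (a rough sketch with
made-up names like grab_queue, not the actual backend code): instead of
all threads blocking on the same queue's lock, each one tries the next
queue when the current one is busy, and only blocks as a last resort:

    #include <pthread.h>

    #define NB_QUEUES 16

    struct idle_queue {
        pthread_mutex_t lock;
        /* ... list of idle connections ... */
    };

    static struct idle_queue queues[NB_QUEUES];

    /* Call once at startup to make the mutexes usable. */
    static void init_queues(void)
    {
        for (int i = 0; i < NB_QUEUES; i++)
            pthread_mutex_init(&queues[i].lock, NULL);
    }

    /* Return a locked queue, preferring whichever is uncontended:
     * trylock spreads threads across queues instead of piling them
     * all up on the same mutex. Caller must unlock q->lock. */
    static struct idle_queue *grab_queue(int start)
    {
        for (int i = 0; i < NB_QUEUES; i++) {
            struct idle_queue *q = &queues[(start + i) % NB_QUEUES];
            if (pthread_mutex_trylock(&q->lock) == 0)
                return q;
        }
        /* All queues busy: fall back to a blocking lock. */
        pthread_mutex_lock(&queues[start % NB_QUEUES].lock);
        return &queues[start % NB_QUEUES];
    }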

And the fix for the rare crashes in mux-h1 on double shutdown, reported
in issue #1197, was backported.
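
The fix follows the classic idempotence pattern (a sketch with made-up
names like ctx_shutw_conn, not the actual mux-h1 code): remember that
the shutdown already ran, so that a second call, e.g. when both sides
close at once, becomes a harmless no-op:

    #include <stdbool.h>

    struct conn_ctx {
        bool shut_done;  /* set once the connection shutdown has run */
        /* ... */
    };

    static void ctx_shutw_conn(struct conn_ctx *ctx)
    {
        if (ctx->shut_done)
            return;          /* second call: nothing left to do */
        ctx->shut_done = true;

        /* ... actually close and release the underlying connection ... */
    }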

The rest is quite minor (a payload sample fetch not properly waiting
for more data, and restarting servers not using the correct color on
the stats page).

Please find the usual URLs below :
   Site index   : http://www.haproxy.org/
   Discourse: http://discourse.haproxy.org/
   Slack channel: https://slack.haproxy.org/
   Issue tracker: https://github.com/haproxy/haproxy/issues
   Wiki : https://github.com/haproxy/wiki/wiki
   Sources  : http://www.haproxy.org/download/2.3/src/
   Git repository   : http://git.haproxy.org/git/haproxy-2.3.git/
   Git Web browsing : http://git.haproxy.org/?p=haproxy-2.3.git
   Changelog: http://www.haproxy.org/download/2.3/src/CHANGELOG
   Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/

Note: I've just discovered that building with -DDEBUG_THREAD fails :-(
  I've now fixed it, and since this normally only affects haproxy
  developers who are supposed to use git versions, I don't expect
  this to be an issue for anyone. However if it's an issue for you,
  just let me know and I'll emit 2.3.10. I just don't want to spend
  my time creating releases for no reason.

Willy
---
Complete changelog :
Christopher Faulet (1):
  BUG/MINOR: payload: Wait for more data if buffer is empty in payload/payload_lv

Florian Apolloner (1):
  BUG/MINOR: stats: Apply proper styles in HTML status page.

Willy Tarreau (3):
  BUG/MEDIUM: mux-h1: make h1_shutw_conn() idempotent
  MEDIUM: backend: use a trylock to grab a connection on high FD counts as well
  BUG/MEDIUM: time: make sure to always initialize the global tick

---