Re: [squid-dev] Strategy about build farm nodes

2021-05-17 Thread Alex Rousskov
On 5/17/21 3:32 PM, Francesco Chemolli wrote:
> On Mon, May 17, 2021 at 8:32 PM Alex Rousskov wrote:
> 
> On 5/17/21 2:17 AM, Francesco Chemolli wrote:
> > $ make all push
> 
> Does that "make push" command automatically switch Jenkins CI to using
> the new/pushed containers? Or is that a separate manual step?

> Right now that's automatic.
> I'm pondering the best way to have a staging step; Docker supports
> versioning of images but we're using that feature to implement
> multiarch. In order to rely on that I'll need to change the naming
> conventions we rely on.

I think you should be free to use whatever versioning/naming scheme you
think works best for supporting safe Jenkins CI upgrades.

Ideally, the new docker images (and any other changes) should be tested
(against official branches) _before_ the official tests are switched to
use them. I hope you find a way to implement that without a lot of work.
If you need to tap into Foundation resources, please say so!
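
(A rough illustration only -- the tag name below is a placeholder and the
build steps are just a generic autotools run, not necessarily what the farm
jobs actually do -- but a pre-switch check could be as simple as building
current master inside the candidate image before Jenkins is pointed at it:

  $ docker run --rm -u jenkins squidcache/buildfarm:<candidate-tag> \
      /bin/bash -lc 'cd /tmp && \
        git clone --depth 1 https://github.com/squid-cache/squid.git && \
        cd squid && ./bootstrap.sh && ./configure && \
        make -j$(nproc) && make check'

and promoting the image only if that exits cleanly.)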


> The overarching objectives are:
> - focus on where most users probably are
> - allow iterating quickly giving some signal on tracked branches
> - allow landing quickly with more signal
> - be slow but exhaustive on every other platform we can cover. Do not
> block landing new code on less common or harder-to-test platforms,
> but still rely on devs to own their changes there, too.

What exactly do you mean by "own their changes there"? The rest is
general/vague enough to allow for many interpretations I can agree with,
but that last bit seems rather specific, so I wanted to ask...


> I think that a full round of PR tests (i.e. one initial set plus one
> staging set) should not exceed ~24 hours, _including_ any wait/queuing
> time.

> Is everyone okay with such a slow turnaround?

Well, you should not overload the question by calling a 24-hour delay
for an all-green signal slow :-).

It is all relative, of course: Most Squid PRs take a lot longer to
review and fix. There are open source projects where PRs wait for a week
or more, so 24 hours for a full round (and usually just an hour or two
for the first signal!) would feel very fast for them. Etc.

Yes, 24 hours is not "interactive", but it is acceptable for the vast
majority of PRs AFAICT, and you are providing Docker images for those who
want to test in a more interactive mode.


> I think you are proposing to make some tests optional. Which ones?
> 
> 
> Not the tests, but the platforms.

From my test results consumer point of view, they are
[platform-specific] [build] tests.


> Things like gentoo, fedora rawhide, freebsd, openbsd.
> Testing is slower there, so results will lag and we need to batch,
> running a full test suite every week or so

OK, so you propose to make slow Jenkins platform-specific tests (i.e.
Jenkins tests on platforms where tests run slowly today) optional and
those platforms are gentoo, fedora rawhide, freebsd, openbsd. Got it!

What happens when such a slow test fails and the PR author ignores the
failure of that optional test and/or the PR is already closed? The
answer to that question is the key to evaluating any proposal that
declares any tests optional IMO...


> FWIW, I do not think reducing the number of #ifdefs will solve our
> primary CI problems.


> Each test build is right now 4 full builds and test suites:
> autodetected, minimal, maximum, everything-in-except-for-auth

And all of that takes less than 10 minutes on decent hardware which we
can rent or buy! To me, it feels like we are creating an unsolvable
problem here (i.e. testing a lot of things quickly on a Raspberry Pi)
and then trying extra hard to solve that unsolvable problem. We should
either stop testing a lot of things or we should use appropriate
resources for testing a lot of things quickly.

Yes, we can reduce testing time by removing #ifdefs, but that is a lot
of work and does not really scale. #ifdefs should be removed primarily
for other reasons.


Cheers,

Alex.
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Strategy about build farm nodes

2021-05-17 Thread Francesco Chemolli
On Mon, May 17, 2021 at 8:32 PM Alex Rousskov <
rouss...@measurement-factory.com> wrote:

> On 5/17/21 2:17 AM, Francesco Chemolli wrote:
>
> > Our Linux environments are docker containers on amd64, armv7l and arm64.
> > On a roughly monthly cadence, I pull from our dockerfiles repo
> > (https://github.com/kinkie/dockerfiles) and
> > $ make all push
>
> Does that "make push" command automatically switch Jenkins CI to using
> the new/pushed containers? Or is that a separate manual step?
>

Right now that's automatic.
I'm pondering the best way to have a staging step; Docker supports
versioning of images but we're using that feature to implement multiarch.
In order to rely on that I'll need to change the naming conventions we rely
on.
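
One option I'm toying with (a sketch only; the "-staging" suffix and the
<current-tag> placeholder below are made up, not the farm's actual naming)
is to push candidate images under a separate tag, run a staging Jenkins job
against that tag, and only retag them as the default once that job is green:

$ docker build -t squidcache/buildfarm:<current-tag>-staging .
$ docker push squidcache/buildfarm:<current-tag>-staging
# ... staging Jenkins job builds master against the -staging tag ...
$ docker tag squidcache/buildfarm:<current-tag>-staging \
    squidcache/buildfarm:<current-tag>
$ docker push squidcache/buildfarm:<current-tag>

That would keep the existing tags free for multiarch while still giving us a
promotion step.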


> >>> What I would place on each individual dev is the case where a PR breaks
> >>> something in the trunk-matrix,trunk-arm32-matrix, trunk-arm64-matrix,
> >>> trunk-openbsd-matrix, trunk-freebsd-matrix builds
>
> >> I think you are saying that
> >> some CI tests called trunk-matrix, trunk-arm32-matrix,
> >> trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix should be
> >> classified as _required_.
>
> > That is how I read the statement too.
>
> ... but it sounds like that is _not_ what you (Francesco) are actually
> proposing because you are essentially saying "that ideal would be
> impractical" in the paragraph below. Assuming that you are not attacking
> your own proposal, that means Amos and I do not know what your proposal
> is -- we both guessed incorrectly...
>

The overarching objectives are:
- focus on where most users probably are
- allow iterating quickly giving some signal on tracked branches
- allow landing quickly with more signal
- be slow but exhaustive on every other platform we can cover. Do not block
landing new code on less common or harder-to-test platforms, but still
rely on devs to own their changes there, too.

> > In a world of infinite resources and very efficient testing, sure.
> > But in a space where a single OS/compiler combo takes 2 hours on Linux
> > and 4 hours on FreeBSD or OpenBSD, and a full 5-pr-test takes 6 hours
> > end to end, we need to optimize: making any of these requirements
> > blocking would make these times 4+ times larger (a full trunk-matrix
> > takes just about a day on amd64 and 2 days on arm64), and the complaint
> > would then be that development or release is slowed down by the amount
> > of testing done.
>
> FWIW, I think that a full round of PR tests (i.e. one initial set plus one
> staging set) should not exceed ~24 hours, _including_ any wait/queuing
> time. This kind of delay should still allow for reasonable progress with
> PRs AFAICT. This includes any release PRs, of course. If we exceed these
> limits (or whatever limits we can agree on), then we should add testing
> resources and/or drop tests.


Is everyone okay with such a slow turnaround?


> > My proposal aims to test/land the PR on the systems where we can be
> > efficient and that are relevant, and fix any remaining after-the-fact
> > issues with followup PRs, which remain a soft requirement for the dev
> > introducing the change.
>
> Here, I think you are proposing to make some tests optional. Which ones?
>

Not the tests, but the platforms. Things like gentoo, fedora rawhide,
freebsd, openbsd.
Testing is slower there, so results will lag and we need to batch, running
a full test suite every week or so


> > For instance: we currently have a lot of different build-time
> > configurations meant to save core memory in runtime environments. Is it
> > maybe time to revisit this decision and move these checks to run time?
>
> Sure, quality proposals for removing #ifdefs should be welcomed, one
> (group of) #ifdefs at a time. We can warn squid-users in advance in case
> somebody wants to provide evidence of harm.
>

I'm glad this is getting buy-in. Are there any other opinions (for or
against)?


> > Unfortunately, one of the problems we have is that we're running blind.
> > We don't know what configurations our users deploy; we can only assume
> > and that makes this conversation much more susceptible to opinions and
> > harder to build consensus on
>
> Yes, we are blind, but I doubt we should care much about actual
> configuration in this specific context. If we can remove an #ifdef
> without serious negative consequences, we should remove it. We can avoid
> long discussions by allowing anybody with concrete evidence of problems
> to block any particular removal. I bet that conservative approach would
> still allow for the removal of many #ifdefs.
>

Sounds good to me

> FWIW, I do not think reducing the number of #ifdefs will solve our
> primary CI problems. I believe we should reduce the number of platforms
> we test on and then use Foundation resources to speed up and improve the
> remaining tests.
>

Each test build is right now 4 full builds and test suites:
autodetected, minimal, maximum, everything-in-except-for-auth

Can we reduce them to 2? If 

Re: [squid-dev] Strategy about build farm nodes

2021-05-17 Thread Alex Rousskov
On 5/17/21 2:17 AM, Francesco Chemolli wrote:

> Our Linux environments are docker containers on amd64, armv7l and arm64.
> On a roughly monthly cadence, I pull from our dockerfiles repo
> (https://github.com/kinkie/dockerfiles) and
> $ make all push

Does that "make push" command automatically switch Jenkins CI to using
the new/pushed containers? Or is that a separate manual step?


>>> What I would place on each individual dev is the case where a PR breaks
>>> something in the trunk-matrix,trunk-arm32-matrix, trunk-arm64-matrix,
>>> trunk-openbsd-matrix, trunk-freebsd-matrix builds

>> I think you are saying that
>> some CI tests called trunk-matrix, trunk-arm32-matrix,
>> trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix should be
>> classified as _required_.

> That is how I read the statement too.

... but it sounds like that is _not_ what you (Francesco) are actually
proposing because you are essentially saying "that ideal would be
impractical" in the paragraph below. Assuming that you are not attacking
your own proposal, that means Amos and I do not know what your proposal
is -- we both guessed incorrectly...


> In a world of infinite resources and very efficient testing, sure.
> But in a space where a single OS/compiler combo takes 2 hours on Linux and
> 4 hours on FreeBSD or OpenBSD, and a full 5-pr-test takes 6 hours end to
> end, we need to optimize: making any of these requirements blocking would
> make these times 4+ times larger (a full trunk-matrix takes just about a
> day on amd64 and 2 days on arm64), and the complaint would then be that
> development or release is slowed down by the amount of testing done.

FWIW, I think that a full round of PR tests (i.e. one initial set plus one
staging set) should not exceed ~24 hours, _including_ any wait/queuing
time. This kind of delay should still allow for reasonable progress with
PRs AFAICT. This includes any release PRs, of course. If we exceed these
limits (or whatever limits we can agree on), then we should add testing
resources and/or drop tests.


> My proposal aims to test/land the PR on the systems where we can be
> efficient and that are relevant, and fix any remaining after-the-fact
> issues with followup PRs, which remain a soft requirement for the dev
> introducing the change.

Here, I think you are proposing to make some tests optional. Which ones?


> For instance: we currently have a lot of different build-time
> configurations meant to save core memory in runtime environments. Is it
> maybe time to revisit this decision and move these checks to run time?

Sure, quality proposals for removing #ifdefs should be welcomed, one
(group of) #ifdefs at a time. We can warn squid-users in advance in case
somebody wants to provide evidence of harm.


> Unfortunately, one of the problems we have is that we're running blind.
> We don't know what configurations our users deploy; we can only assume
> and that makes this conversation much more susceptible to opinions and
> harder to build consensus on

Yes, we are blind, but I doubt we should care much about actual
configuration in this specific context. If we can remove an #ifdef
without serious negative consequences, we should remove it. We can avoid
long discussions by allowing anybody with concrete evidence of problems
to block any particular removal. I bet that conservative approach would
still allow for the removal of many #ifdefs.


FWIW, I do not think reducing the number of #ifdefs will solve our
primary CI problems. I believe we should reduce the number of platforms
we test on and then use Foundation resources to speed up and improve the
remaining tests.


HTH,

Alex.
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Strategy about build farm nodes

2021-05-17 Thread Alex Rousskov
On 5/16/21 10:19 PM, squ...@treenet.co.nz wrote:
> On 2021-05-17 11:56, Alex Rousskov wrote:
>> On 5/16/21 3:31 AM, Amos Jeffries wrote:
>>> On 4/05/21 2:29 am, Alex Rousskov wrote:
>>>> The principles I have proposed allow upgrades that do not violate key
>>>> invariants. For example, if a proposed upgrade would break master, then
>>>> master has to be changed _before_ that upgrade actually happens, not
>>>> after. Upgrades must not break master.
>>>
>>> So ... a node is added/upgraded. It runs and builds master fine. Then
>>> added to the matrices some of the PRs start failing.
>>
>> It is easy to misunderstand what is going on because there is no good
>> visualization of complex PR-master-Jenkins_nodes-Jenkins_failures
>> relationships. Several kinds of PR test failures are possible. I will
>> describe the two most relevant to your email:
>>
>> * PR test failures due to problems introduced by PRs should be welcomed
>> at any time.
> 
> Strawman here. This is both a general statement and not relevant to the
> CI changes or design(s) we are discussing.

The first bullet (out of the two bullets that are meant to be
interpreted together, in their context) is not even close to being a
strawman argument by any definition I can find. Neither is the
combination of those two bullets.


>> CI improvements are allowed to find new bugs in open PRs.

> IMO the crux is that word "new". CI improvements very rarely find new
> bugs. What they actually find, and intentionally so, are the *existing
> bugs* that the old CI config wrongly ignored.

Here, I used "new" to mean bugs that had not been found by the previous
CI version (i.e. "newly discovered" or "new to us"). These bugs existed
before and after CI improvements, of course -- neither master nor the PR
has changed -- but we did not know about them.


>> * PR test failures due to the existing master code are not welcomed.

> That is not as black/white as the statement above implies. There are
> some master branch bugs that we don't want blocking PR merging, and
> there are some (rarely) for which we absolutely do not want any PRs to
> change master until they are fixed.

Yes, master bugs should not affect PR merging in the vast majority of
cases, but that is not what this bullet is about at all!

This bullet is about (a certain kind of) PR test failures. Hopefully, we
do not need to revisit the debate whether PRs with failed tests should
be merged. They should not be merged, which is exactly why PR test
failures caused by the combination of CI changes and the existing master
code are not welcomed -- they block progress of an innocent PR.


>> They represent a CI failure.
> 
> IMO this is absolutely false. The whole point of improving CI is to find
> those "existing" bugs which the previous CI config wrong missed.

Your second sentence is correct, but it does not make my statement
false. CI should achieve several goals. Finding bugs (i.e. blocking
buggy PRs) is one of them. Merging (i.e. not blocking) PRs that should
be merged is another. And there are a few more goals, of course. The
problem described in my second bullet represents a CI failure to reach
one of the key CI goals or a failure to maintain a critical CI
invariant. It is a CI failure rather than a PR failure (the latter is
covered by the first bullet).


> e.g. v4+ currently do not build on Windows. We know this, but the
> current CI testing does not show it. Upgrading the CI to include a test
> for Windows is not a "CI failure".

If such an upgrade would result in blocking PRs that do not touch
Windows code, then that upgrade would be a CI failure. Or a failure to
properly upgrade CI.


>> In these cases, if the latest master code
>> is tested with the same test after the problematic CI change, then that
>> master test will fail. Nothing a PR can do in this situation can fix
>> this kind of failure because it is not PR changes that are causing the
>> failure -- CI changes broke the master branch,
> 
> Ah. "broke the master branch" is a bit excessive. master is not broken
> any more or less than it already was.

If you can suggest a better short phrase to describe the problem, please
do so! Until then, I will have to continue to use "breaking master" (in
this context) simply for the lack of a better phrase. I tried to explain
what that phrase means to avoid misunderstanding. I cannot find a better
way to compress that explanation into a single phrase.


> What is *actually* broken is the CI test results.

One can easily argue that CI test results are actually "correct" in this
case -- they correctly discover a bug in the code that wants to become
the next master. The problem is in the assignment of responsibility: The
PR did not introduce that bug, so the PR should not be punished for that
bug. The only way to avoid such punishment (given the natural automated
CI limitations) is to avoid breaking master tests, as previously defined.


> Otherwise, short periods between sysadmin thinking it was a safe change
> and reverting as breakage appeared 

Re: [squid-dev] Strategy about build farm nodes

2021-05-17 Thread Francesco Chemolli
>
>
> Adding new nodes with next distro release versions is a manual process
> not related to keeping existing nodes up to date (which is automated?).
>

Mostly.
Our Linux environments are docker containers on amd64, armv7l and arm64.
On a roughly monthly cadence, I pull from our dockerfiles repo (
https://github.com/kinkie/dockerfiles) and
$ make all push
The resulting docker images are free for everybody to use and test things
on any docker system
(https://hub.docker.com/r/squidcache/buildfarm). Just
$ docker run -ti --rm -u jenkins squidcache/buildfarm:$(uname -m)- /bin/bash -l
(note: the above command will not preserve any artifacts once the shell
exits)
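
If you do want to keep what you build, a bind-mounted work directory is
enough (the host path is just an example, and the host directory may need to
be writable by the container's jenkins user):

$ mkdir -p ~/squid-work
$ docker run -ti --rm -u jenkins -v ~/squid-work:/work -w /work \
    squidcache/buildfarm:<tag as above> /bin/bash -l

Anything written under /work then survives on the host after the shell exits.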

Adding new Linux distros means copying and tweaking a Dockerfile, testing
things, and updating our Jenkins jobs. I do it roughly every 6 months

FreeBSD, OpenBSD, and (hopefully soon) Windows are hand-managed and much
slower-changing VMs.


> >> What I would place on each individual dev is the case where a PR breaks
> >> something in the trunk-matrix,trunk-arm32-matrix, trunk-arm64-matrix,
> >> trunk-openbsd-matrix, trunk-freebsd-matrix builds, even if the 5-pr-test
> > and 5-pr-auto builds fail to detect the breakage because it happens on an
> >> unstable or old platform. >
> > This feels a bit out of topic for me, but I think you are saying that
> > some CI tests called trunk-matrix, trunk-arm32-matrix,
> > trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix should be
> > classified as _required_.
>
> That is how I read the statement too.
>

In a world of infinite resources and very efficient testing, sure.
But in a space where a single OS/compiler combo takes 2 hours on Linux and
4 hours on FreeBSD or OpenBSD, and a full 5-pr-test takes 6 hours end to
end, we need to optimize: making any of these requirements blocking would
make these times 4+ times larger (a full trunk-matrix takes just about a
day on amd64 and 2 days on arm64), and the complaint would then be that
development or release is slowed down by the amount of testing done.

My proposal aims to test/land the PR on the systems where we can be
efficient and that are relevant, and fix any remaining after-the-fact
issues with followup PRs, which remain a soft requirement for the dev
introducing the change. The dev can test any work they're doing with the
anybranch-* jobs, if they don't have access to that OS

Can we do better? Sure.
For the sake of cost-mindedness, a lot of the build farm nodes run in my
home - a couple of Raspberry Pis and an Intel NUC - and I'm in the process
of purchasing a second (and second-hand) NUC that comes with a Windows
license. The setup is meant to be thrifty: I'm mindful of burning
Foundation resources for little gain, and I'm running a balancing act
between always-on VMs, on-demand VMs, and my own gadgets.

The real game changer would be rethinking how we do things to reduce the
amount of testing needed.

For instance: we currently have a lot of different build-time
configurations meant to save core memory in runtime environments. Is it
maybe time to revisit this decision and move these checks to run time?
Unfortunately, one of the problems we have is that we're running blind. We
don't know what configurations our users deploy; we can only assume and
that makes this conversation much more susceptible to opinions and harder
to build consensus on

-- 
Francesco
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Strategy about build farm nodes

2021-05-16 Thread squid3

On 2021-05-17 11:56, Alex Rousskov wrote:

On 5/16/21 3:31 AM, Amos Jeffries wrote:

On 4/05/21 2:29 am, Alex Rousskov wrote:

On 5/3/21 12:41 AM, Francesco Chemolli wrote:

- we want our QA environment to match what users will use. For this
reason, it is not sensible that we just stop upgrading our QA nodes,


I see flaws in reasoning, but I do agree with the conclusion -- yes, 
we
should upgrade QA nodes. Nobody has proposed a ban on upgrades 
AFAICT!


The principles I have proposed allow upgrades that do not violate key
invariants. For example, if a proposed upgrade would break master, 
then

master has to be changed _before_ that upgrade actually happens, not
after. Upgrades must not break master.


So ... a node is added/upgraded. It runs and builds master fine. Then
added to the matrices some of the PRs start failing.


It is easy to misunderstand what is going on because there is no good
visualization of complex PR-master-Jenkins_nodes-Jenkins_failures
relationships. Several kinds of PR test failures are possible. I will
describe the two most relevant to your email:

* PR test failures due to problems introduced by PRs should be welcomed
at any time.


Strawman here. This is both a general statement and not relevant to the
CI changes or design(s) we are discussing.



CI improvements are allowed to find new bugs in open PRs.


IMO the crux is that word "new". CI improvements very rarely find new
bugs. What they actually find, and intentionally so, are the *existing
bugs* that the old CI config wrongly ignored.


Such findings, even when discovered at the "last minute", should be 
seen

as an overall positive event or progress -- our CI was able to identify
a problem before it got officially accepted! I do not recall anybody
complaining about such failures recently.



Conclusion being that, due to the rarity of "new bugs", CI improvements
very rarely get complained about on that account.




* PR test failures due to the existing master code are not welcomed.


That is not as black/white as the statement above implies. There are
some master branch bugs that we don't want blocking PR merging, and there
are some (rarely) for which we absolutely do not want any PRs to change
master until they are fixed.



They represent a CI failure.


IMO this is absolutely false. The whole point of improving CI is to find 
those "existing" bugs which the previous CI config wrong missed.


e.g. v4+ currently do not build on Windows. We know this, but the 
current CI testing does not show it. Upgrading the CI to include a test 
for Windows is not a "CI failure".




In these cases, if the latest master code
is tested with the same test after the problematic CI change, then that
master test will fail. Nothing a PR can do in this situation can fix
this kind of failure because it is not PR changes that are causing the
failure -- CI changes broke the master branch,


Ah. "broke the master branch" is a bit excessive. master is not broken 
any more or less than it already was.


What is *actually* broken is the CI test results.



not just the PR! This
kind of failure is the responsibility of CI administrators, and PR
authors should complain about it, especially when there are no signs
of CI administrators being aware of and working on addressing the problem.



*IF* all the conditions and assumptions contained in that final sentence
are true, I would agree. Such a case points to incompetence or neglect on
the part of the sysadmin who broke *the CI test* and then abandoned fixing
it - complaints are reasonable there.


 [ Is kinkie acting incompetently on a regular basis? I think no. ]

Otherwise, short periods between the sysadmin thinking a change was safe
and reverting it as breakage appeared are to be expected. That is why we
have the sysadmin posting advance notices so we are all aware of planned CI
changes. Complaints still happen, but that is not much reason to redesign
the sysadmin practices and automation (which is yet more CI change, ...).




A good example of a failure of the second kind is a -Wrange-loop-construct
error in a PR that does not touch any range loops (Jenkins conveniently
deleted the actual failed test, but my GitHub comment and PR contents
may be enough to restore what happened):
https://github.com/squid-cache/squid/pull/806#issuecomment-827924821



Thank you.

I see here two "rolling release" distros being updated by the sysadmin
from producing outdated and wrong test results to producing correct test
results. This is a correct change, in line with the goal of our nodes
representing what a user running that OS would see when building Squid
master or PRs.
  One distro changed compiler, and both turned on a new warning by
default, which exposed existing Squid bugs. Exactly as intended.


IMO we can expect this to occur on a regular basis, and it is specific to
"rolling release" distros. We can resolve it by having those OSes only
build in the N-matrix applied before releases, instead of in the matrix
blocking PR tests or merging.


 If we are all agreed, kinkie 

Re: [squid-dev] Strategy about build farm nodes

2021-05-16 Thread Alex Rousskov
On 5/16/21 3:31 AM, Amos Jeffries wrote:
> On 4/05/21 2:29 am, Alex Rousskov wrote:
>> On 5/3/21 12:41 AM, Francesco Chemolli wrote:
>>> - we want our QA environment to match what users will use. For this
>>> reason, it is not sensible that we just stop upgrading our QA nodes,
>>
>> I see flaws in reasoning, but I do agree with the conclusion -- yes, we
>> should upgrade QA nodes. Nobody has proposed a ban on upgrades AFAICT!
>>
>> The principles I have proposed allow upgrades that do not violate key
>> invariants. For example, if a proposed upgrade would break master, then
>> master has to be changed _before_ that upgrade actually happens, not
>> after. Upgrades must not break master.
> 
> So ... a node is added/upgraded. It runs and builds master fine. Then
> added to the matrices some of the PRs start failing.

It is easy to misunderstand what is going on because there is no good
visualization of complex PR-master-Jenkins_nodes-Jenkins_failures
relationships. Several kinds of PR test failures are possible. I will
describe the two most relevant to your email:

* PR test failures due to problems introduced by PRs should be welcomed
at any time. CI improvements are allowed to find new bugs in open PRs.
Such findings, even when discovered at the "last minute", should be seen
as an overall positive event or progress -- our CI was able to identify
a problem before it got officially accepted! I do not recall anybody
complaining about such failures recently.

* PR test failures due to the existing master code are not welcomed.
They represent a CI failure. In these cases, if the latest master code
is tested with the same test after the problematic CI change, then that
master test will fail. Nothing a PR can do in this situation can fix
this kind of failure because it is not PR changes that are causing the
failure -- CI changes broke the master branch, not just the PR! This
kind of failure is the responsibility of CI administrators, and PR
authors should complain about it, especially when there are no signs
of CI administrators being aware of and working on addressing the problem.

A good example of a failure of the second kind is a -Wrange-loop-construct
error in a PR that does not touch any range loops (Jenkins conveniently
deleted the actual failed test, but my GitHub comment and PR contents
may be enough to restore what happened):
https://github.com/squid-cache/squid/pull/806#issuecomment-827924821


[I am skipping the exaggerations (about other people's exaggerations) that
were, AFAICT, driven by mis-classifying the failures of the second kind
as regular PR failures of the first kind.]


>> What this means in terms of sysadmin steps for doing upgrades is up to
>> you. You are doing the hard work here, so you can optimize it the way
>> that works best for _you_. If really necessary, I would not even object
>> to trial upgrades (that may break master for an hour or two) as long as
>> you monitor the results and undo the breaking changes quickly and
>> proactively (without relying on my pleas to fix Jenkins to detect
>> breakages). I do not know what is feasible and what the best options
>> are, but, again, it is up to _you_ how to optimize this (while observing
>> the invariants).


> Uhm. Respectfully, from my perspective the above paragraph conflicts
> directly with actions taken.

My paragraph is not really about any taken actions so there cannot be
any such conflict AFAICT.


> From what I can tell kinkie (as sysadmin) *has* been making a new node
> and testing it first. Not just against master but the main branches and
> most active PRs before adding it for the *post-merge* matrix testing
> snapshot production.

The above summary does not match the (second kind of) PR test failures
that I have observed and asked Francesco to address. Those failures were
triggered by the latest master code, not PR changes. No PR changes would
have fixed those test failures. In fact, an "empty" PR that introduces
no code changes at all would have failed as well. See above for one
recent example.


>   But still threads like this one with complaints appear.

This squid-dev thread did not contain any complaints AFAICT. Francesco
is trying to get consensus on how to handle CI upgrades/changes. He is
doing the right thing and nobody is complaining about it.


> I understand there is some specific pain you have encountered to trigger
> the complaint. Can we get down to documenting as exactly as possible
> what the particular pain was?

Well, I apparently do not know what you call "complaint", so I cannot
answer this exact question, but if you are looking for a recent PR test
failure that triggered this squid-dev thread, then please see the GitHub
link above.



> Test designs that do not fit into our merge and release process sequence
> have proven time and again to be broken and painful to Alex when they
> operate as-designed. For the rest of us it is this constant re-build of
> automation which is the painful part.

While I am not 

Re: [squid-dev] Strategy about build farm nodes

2021-05-16 Thread Amos Jeffries

On 4/05/21 2:29 am, Alex Rousskov wrote:

On 5/3/21 12:41 AM, Francesco Chemolli wrote:

- we want our QA environment to match what users will use. For this
reason, it is not sensible that we just stop upgrading our QA nodes,


I see flaws in reasoning, but I do agree with the conclusion -- yes, we
should upgrade QA nodes. Nobody has proposed a ban on upgrades AFAICT!

The principles I have proposed allow upgrades that do not violate key
invariants. For example, if a proposed upgrade would break master, then
master has to be changed _before_ that upgrade actually happens, not
after. Upgrades must not break master.


So ... a node is added/upgraded. It runs and builds master fine. Then 
added to the matrices some of the PRs start failing.


*THAT* is the situation I see happening recently. Master itself working 
fine and "huge amounts of pain, the sky is falling" complaints from a 
couple of people.


Sky is not falling. Master is no more nor less broken and buggy than it 
was before sysadmin touched Jenkins.


The PR itself is no more, nor less, "broken" than it would be if, for
example, it was only tested on Linux nodes and fails to compile on
Windows - as happens to be the case for master *right now*.





What this means in terms of sysadmin steps for doing upgrades is up to
you. You are doing the hard work here, so you can optimize it the way
that works best for _you_. If really necessary, I would not even object
to trial upgrades (that may break master for an hour or two) as long as
you monitor the results and undo the breaking changes quickly and
proactively (without relying on my pleas to fix Jenkins to detect
breakages). I do not know what is feasible and what the best options
are, but, again, it is up to _you_ how to optimize this (while observing
the invariants).



Uhm. Respectfully, from my perspective the above paragraph conflicts 
directly with actions taken.


From what I can tell kinkie (as sysadmin) *has* been making a new node 
and testing it first. Not just against master but the main branches and 
most active PRs before adding it for the *post-merge* matrix testing 
snapshot production.


  But still threads like this one with complaints appear.



I understand there is some specific pain you have encountered to trigger 
the complaint. Can we get down to documenting as exactly as possible 
what the particular pain was?


 Much of the process we are discussing is scripted automation, not
human processing mistakes. Handling such pain points as bugs in the
bugzilla "Project" section would be best. Re-designing the entire system
policy just moves us all to another set of unknown bugs when the scripts
are re-coded to meet that policy.






- I believe we should define four tiers of runtime environments, and
reflect these in our test setup:



  1. current and stable (e.g. ubuntu-latest-lts).
  2. current (e.g. fedora 34)
  3. bleeding edge
  4. everything else - this includes freebsd and openbsd


I doubt this classification is important to anybody _outside_ this
discussion, so I am OK with whatever classification you propose to
satisfy your internal needs.



IIRC this is the 5th iteration of ground-up redesign for this wheel.

Test designs that do not fit into our merge and release process sequence 
have proven time and again to be broken and painful to Alex when they 
operate as-designed. For the rest of us it is this constant re-build of 
automation which is the painful part.



A. dev pre-PR testing
   - random individual OS.
   - matrix of everything (anybranch-*-matrix)

B. PR submission testing
   - which OS for master (5-pr-test) ?
   - which OS for beta (5-pr-test) ?
   - which OS for stable (5-pr-test) ?

Are all of those sets the same identical OS+compilers? no.
Why are they forced to be the same matrix test?
  IIRC, policy forced on sysadmin with previous pain complaints.

Are we getting painful experiences from this?
  Yes. Lack of branch-specific testing before D on beta and stable 
causes those branches to break a lot more often at the last minute before
releases than master. Adding random days/weeks to each scheduled release.



C. merge testing
   - which OS for master (5-pr-auto) ?
   - which OS for beta (5-pr-auto) ?
   - which OS for stable (5-pr-auto) ?
 NP: maintainer does manual override on beta/stable merges.

Are all of those sets the same identical OS+compilers? no.
  Why are they forced to be the same matrix test? Anubis

Are we getting painful experiences from this? yes. see (B).


D. pre-release testing (snapshots + formal)
   - which OS for master (trunk-matrix) ?
   - which OS for beta (5-matrix) ?
   - which OS for stable (4-matrix) ?

Are all of those sets the same identical OS+compilers? no.
Are we forcing them to use the same matrix test? no.
Are we getting painful experiences from this? maybe.
  The loudest complaints have been about "breaking master", which is the
most volatile branch, tested on the most volatile OSes.




FTR: the reason all those 

Re: [squid-dev] Strategy about build farm nodes

2021-05-03 Thread Alex Rousskov
On 5/3/21 12:41 AM, Francesco Chemolli wrote:
> - we want our QA environment to match what users will use. For this
> reason, it is not sensible that we just stop upgrading our QA nodes, 

I see flaws in reasoning, but I do agree with the conclusion -- yes, we
should upgrade QA nodes. Nobody has proposed a ban on upgrades AFAICT!

The principles I have proposed allow upgrades that do not violate key
invariants. For example, if a proposed upgrade would break master, then
master has to be changed _before_ that upgrade actually happens, not
after. Upgrades must not break master.

What this means in terms of sysadmin steps for doing upgrades is up to
you. You are doing the hard work here, so you can optimize it the way
that works best for _you_. If really necessary, I would not even object
to trial upgrades (that may break master for an hour or two) as long as
you monitor the results and undo the breaking changes quickly and
proactively (without relying on my pleas to fix Jenkins to detect
breakages). I do not know what is feasible and what the best options
are, but, again, it is up to _you_ how to optimize this (while observing
the invariants).


> - I believe we should define four tiers of runtime environments, and
> reflect these in our test setup:

>  1. current and stable (e.g. ubuntu-latest-lts).
>  2. current (e.g. fedora 34)
>  3. bleeding edge
>  4. everything else - this includes freebsd and openbsd

I doubt this classification is important to anybody _outside_ this
discussion, so I am OK with whatever classification you propose to
satisfy your internal needs.


> I believe we should focus on the first two tiers for our merge workflow,
> but then expect devs to fix any breakages in the third and fourth tiers
> if caused by their PR,

FWIW, I do not understand what "focus" implies in this statement, and
why developers should _not_ "fix any breakages" revealed by the tests in
the first two tiers.

The rules I have in mind use two natural tiers:

* If a PR cannot pass a required CI test, that PR has to change before
it can be merged.

* If a PR cannot pass an optional CI test, it is up to PR author and
reviewers to decide what to do next.

These are very simple rules that do not require developer knowledge of
any complex test node tiers that we might define/use internally.

Needless to say, the rules assume that the tests themselves are correct.
If not, the broken tests need to be fixed (by the Squid Project) before
the first bullet/rule above can be meaningfully applied (the second one
is flexible enough to allow PR author and reviewers to ignore optional
test failures).


> Breakages due to changes in nodes (e.g. introducing a new distro
> version) would be on me and would not stop the merge workflow.

What you do internally to _avoid_ breakage is up to you, but the primary
goal is to _prevent_ CI breakage (rather than to keep CI nodes "up to
date"!). If CI is broken, then it is the responsibility of the Squid
Project to fix CI ASAP. Thank you for assuming that responsibility as
far as Jenkins tests are concerned.

There are many ways to break CI and detect those breakages, of course,
but if master cannot pass required tests after a CI change, then the
change broke CI.


> What I would place on each individual dev is the case where a PR breaks
> something in the trunk-matrix,trunk-arm32-matrix, trunk-arm64-matrix,
> trunk-openbsd-matrix, trunk-freebsd-matrix builds, even if the 5-pr-test
> and 5-pr-auto builds fail to detect the breakage because it happens on an
> unstable or old platform. 

This feels a bit out of topic for me, but I think you are saying that
some CI tests called trunk-matrix, trunk-arm32-matrix,
trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix should be
classified as _required_. In other words, a PR must pass those CI tests
before it can be merged. Is that the situation today? Or are you
proposing some changes to the list of required CI tests? What are those
changes?


Thank you,

Alex.
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Strategy about build farm nodes

2021-05-02 Thread Francesco Chemolli
On Wed, Apr 28, 2021 at 11:34 PM Alex Rousskov <
rouss...@measurement-factory.com> wrote:

> On 4/28/21 5:12 PM, Amos Jeffries wrote:
> > I'm not sure why this is so controversial still. We have already been
> > over these and have a policy from last time:
>
> Apparently, the recollections of what was agreed upon, if anything,
> during that "last time" differ. If you can provide a pointer to that
> "last time" agreement, please do so.
>

Merge workflows are agreed, and not in discussion. Recent discussions have
highlighted some issues with what's around them, and I'm trying to clarify
that as well

> * dev PR submissions use the volatile 5-pr-test, after test approval by
> > anyone in QA. Check against unstable OS nodes, as many as possible.
> > Kinkie adds/removes from that set as upgrades fix or break at CI end of
> > things.
>
> I do not know how to interpret the last sentence correctly, but, IMO, we
> should not add or change nodes if doing so breaks master tests. AFAICT
> from PR 806 discussion[1], Francesco thinks that it is not a big deal to
> do so. The current discussion is meant to resolve that disagreement.
>

Let me highlight the underlying principles for my proposal: IMO our
objectives are, in descending order of importance (all points should be
read as "as far as possible given our resources"):
1. ensure we ship healthy code to a maximum number of users
2. have minimal friction in the development workflow

These objectives have a set of consequences:
- we want our QA environment to match what users will use. For this reason,
it is not sensible that we just stop upgrading our QA nodes, or we would
target something that doesn't match our users' experience
- it makes little sense to target unstable distributions (fedora rawhide,
possibly centos stream, gentoo, opensuse tumbleweed, debian unstable) as
first-class citizens of the testing workflow, especially on stages that are
executed often (pr-test)

This means that:
- I periodically weed out distributions that are no longer supported (e.g.
Fedora 31, Ubuntu Xenial) and add current distributions (e.g. Ubuntu
Hirsute, Fedora 34).
I take it upon myself that, when I do that, I need to ensure new compiler
features do not block previously undetected behaviours - I am currently
failing at this, see https://build.squid-cache.org/job/trunk-matrix/121/ .
I will need to develop a process with a proper staging phase.
- I believe we should define four tiers of runtime environments, and
reflect these in our test setup:
 1. current and stable (e.g. ubuntu-latest-lts). These are not expected to
change much over a span of years, and to offer non-breaking updates over
their lifetime
 2. current (e.g. fedora 34)
 3. bleeding edge: they may introduce breaking changes which it makes sense
to follow because they might highlight real issues and because they will
eventually trickle down to current and then lts
 4. everything else - this includes freebsd and openbsd (mostly due to the
different virtualization tech they use)

I believe we should focus on the first two tiers for our merge workflow,
but then expect devs to fix any breakages in the third and fourth tiers if
caused by their PR, while I will care for any breakages caused by
dist-upgrades


> [1] https://github.com/squid-cache/squid/pull/806#issuecomment-827937563
>
>
> > * anubis auto branch tested by curated set of LTS stable nodes only.
>
> FWIW, the above list and the original list by Francesco appear to focus
> on distro stability, popularity, and other factors that are largely
> irrelevant to the disagreement at hand. The disagreement is whether it
> is OK to break master (and, hence, all PR) tests by changing CI. It does
> not matter whether that CI change comes from an upgrade of an "LTS
> stable node", "unstable node", or some other source IMO. Breaking
> changes should not be allowed (in the CI environments under our
> control). If they slip through despite careful monitoring for change
> effects, the breaking CI changes should be reverted.
>

I think it depends.
Breakages due to changes in nodes (e.g. introducing a new distro version)
would be on me and would not stop the merge workflow.
What I would place on each individual dev is the case where a PR breaks
something in the trunk-matrix, trunk-arm32-matrix, trunk-arm64-matrix,
trunk-openbsd-matrix, trunk-freebsd-matrix builds, even if the 5-pr-test
and 5-pr-auto builds fail to detect the breakage because it happens on an
unstable or old platform.

-- 
Francesco
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Strategy about build farm nodes

2021-04-28 Thread Alex Rousskov
On 4/28/21 5:12 PM, Amos Jeffries wrote:
> I'm not sure why this is so controversial still. We have already been
> over these and have a policy from last time:

Apparently, the recollections of what was agreed upon, if anything,
during that "last time" differ. If you can provide a pointer to that
"last time" agreement, please do so.


> * dev PR submissions use the volatile 5-pr-test, after test approval by
> anyone in QA. Check against unstable OS nodes, as many as possible.
> Kinkie adds/removes from that set as upgrades fix or break at CI end of
> things.

I do not know how to interpret the last sentence correctly, but, IMO, we
should not add or change nodes if doing so breaks master tests. AFAICT
from PR 806 discussion[1], Francesco thinks that it is not a big deal to
do so. The current discussion is meant to resolve that disagreement.

[1] https://github.com/squid-cache/squid/pull/806#issuecomment-827937563


> * anubis auto branch tested by curated set of LTS stable nodes only.

FWIW, the above list and the original list by Francesco appear to focus
on distro stability, popularity, and other factors that are largely
irrelevant to the disagreement at hand. The disagreement is whether it
is OK to break master (and, hence, all PR) tests by changing CI. It does
not matter whether that CI change comes from an upgrade of an "LTS
stable node", "unstable node", or some other source IMO. Breaking
changes should not be allowed (in the CI environments under our
control). If they slip through despite careful monitoring for change
effects, the breaking CI changes should be reverted.

Alex.
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Strategy about build farm nodes

2021-04-28 Thread ‪Amos Jeffries‬
I'm not sure why this is so controversial still. We have already been over
these and have a policy from last time:

* dev PR submissions use the volatile 5-pr-test, after test approval by
anyone in QA. Check against unstable OS nodes, as many as possible. Kinkie
adds/removes from that set as upgrades fix or break at CI end of things.

* anubis auto branch tested by curated set of LTS stable nodes only.

* website snapshot builds tested against curated set of all nodes that are
known to work well building the relevant branch in the past. No recently
updated nodes.

Amos
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] Strategy about build farm nodes

2021-04-28 Thread Alex Rousskov
On 4/28/21 1:45 AM, Francesco Chemolli wrote:

>   I'm moving here the discussion from PR #806 about what strategy to
> have for CI tests, looking for an agreement.

> We have 3 classes of tests in our CI farm
> (https://build.squid-cache.org/)

> - PR staging tests, triggered by commit hooks on GitHub (possibly with
> human approval)

>    the job is 5-pr-auto (misnamed, it tests trunk with the PR applied).

FYI: The "auto" branch is "master branch + PR". When PR is merged,
master becomes identical to the auto branch. The name "auto" is fairly
common in CI AFAICT. In Anubis (and relatively widespread/common)
terminology[1], the words "staged" and "staging" mean what you describe
as "trunk with the PR applied" below.

[1]
https://github.com/measurement-factory/anubis/blob/master/README.md#pull-request-merging-algorithm

AFAICT, this class should _follow_ the 5-pr-test class (described below)
because a PR is not staged until it is approved and passes 5-pr-test checks.


>    To be successful, it needs to successfully compile and unit-test all
> combinations of clang and gcc on the latest stable version of the most
> popular linux distros, at this time centos-8, debian-testing,
> fedora-33, opensuse-leap, ubuntu-focal


> - PR build tests run after a PR is approved, also triggered by GitHub
>   the job is 5-pr-test

AFAICT, this job tests the "GitHub-generated merge commit for the PR
branch". That commit may become stale if master advances while there
were no PR branch commits. Unlike the GitHub-generated merge commit, the
"auto" branch (mentioned above) is kept in sync with master by Anubis.

We do not merge the GitHub-generated merge commit into master.

If 5-pr-test requires a PR approval, then something probably went wrong
somewhere. Testing PR branch or the GitHub-generated merge commit should
not require a PR approval. Perhaps you meant the "OK to test" _testing_
approval. That testing approval is indeed required and expected for
testing GitHub-generated merge commits on vulnerable Jenkins.


>   To be successful, it needs to compile and unit-test all combinations
> of clang and gcc on LTS and most recent versions of most popular linux
> distros, at this time: centos-7,
> debian-stable, debian-unstable, fedora-32, fedora-34, ubuntu-bionic, 
> ubuntu-hirsute

> - full-scale build tests (jobs: trunk-matrix, trunk-arm32-matrix,
> trunk-arm64-matrix, trunk-freebsd13-clang-matrix, trunk-openbsd-matrix)
>   these test everything we can test, including bleeding edge distros
> such as gentoo, rawhide, tumbleweed, and the latest compilers.

Please clarify what branches/commits these "full-scale build tests" test.

Ideally, all this should be documented somewhere on Squid wiki!


> Failing
> these will not block a PR from merging, but there is an expectation
> that build regressions will be fixed

If we expect a fix, then the failure has to block something important,
like the responsible PR (ideally), the next numbered release, or all
PRs. What does this failure block today, if anything?


> The change in policy since last week

There was no change in any Project policies AFAIK. You have changed
Jenkins setup, but that is not a change in policy. It is likely that
there is no policy at all right now -- just ad hoc decisions by
sysadmins. It would be a stretch to call that current approach a
"policy", but even if we do call it that, then that "policy" has not
"changed" :-).

Alternatively, Jenkins configuration changes have violated a Project
policy (one cannot _change_ a Project policy unilaterally; one can only
violate it). I prefer the "no policy" version above though :-).


> is that the PR-blocking builds used
> to depend on unstable distros (fedora-rawhide and opensuse-tumbleweed);
> I've removed that today as part of the discussion on PR #806.
> This policy would allow keeping stable distros up to date and matching
> expected user experience, while not introducing instability that would
> come in from the unstable distros.
> One distro that's notably missing is centos-stream; this is due to
> technical issues with getting a good docker image for it. When one is
> available, I'll add it.

> Would this work as a general policy?

I am not sure what exact general policy you are proposing -- please
extract it from the lengthy prose above (removing the ever-changing
low-level specifics to the extent possible, to arrive at something truly
"general"). FWIW, if the fleshed out proposal violates the existing "do
not break master" principle below, then I do not think it will work
well. I suspect that a focus on vague optimization targets like
"expected user experience" will not allow us to formulate a usable
policy we can all agree on, but I would be happy to be proven wrong.


I can propose the following. Perhaps knowing my state of mind will help
you finalize your proposal.

We already have a "do not break master" principle:

* Master always passes all current development-related tests.


I propose to add the following 

[squid-dev] Strategy about build farm nodes

2021-04-27 Thread Francesco Chemolli
Hi all,
  I'm moving here the discussion from PR #806 about what strategy to have
for CI tests, looking for an agreement.

We have 3 classes of tests in our CI farm (https://build.squid-cache.org/)
- PR staging tests, triggered by commit hooks on GitHub (possibly with
human approval)
   the job is 5-pr-auto (misnamed, it tests trunk with the PR applied).
   To be successful, it needs to successfully compile and unit-test all
combinations of clang and gcc on the latest stable version of the most
popular linux distros, at this time centos-8, debian-testing,
fedora-33, opensuse-leap, ubuntu-focal
- PR build tests run after a PR is approved, also triggered by GitHub
  the job is 5-pr-test
  To be successful, it needs to compile and unit-test all combinations of
clang and gcc on LTS and most recent versions of most popular linux
distros, at this time: centos-7,
debian-stable, debian-unstable, fedora-32, fedora-34, ubuntu-bionic,
ubuntu-hirsute
- full-scale build tests (jobs: trunk-matrix, trunk-arm32-matrix,
trunk-arm64-matrix, trunk-freebsd13-clang-matrix, trunk-openbsd-matrix)
  these test everything we can test, including bleeding edge distros such
as gentoo, rawhide, tumbleweed, and the latest compilers. Failing these
will not block a PR from merging, but there is an expectation that build
regressions will be fixed

The change in policy since last week is that the PR-blocking builds used to
depend on unstable distros (fedora-rawhide and opensuse-tumbleweed); I've
removed that today as part of the discussion on PR #806.
This policy would allow keeping stable distros up to date and matching
expected user experience, while not introducing instability that would come
in from the unstable distros.
One distro that's notably missing is centos-stream; this is due to
technical issues with getting a good docker image for it. When one is
available, I'll add it.

Would this work as a general policy? If there is agreement, I'll support
it moving forward

Thanks

-- 
Francesco
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev