On 2021-05-17 11:56, Alex Rousskov wrote:
On 5/16/21 3:31 AM, Amos Jeffries wrote:
On 4/05/21 2:29 am, Alex Rousskov wrote:
On 5/3/21 12:41 AM, Francesco Chemolli wrote:
- we want our QA environment to match what users will use. For this
reason, it is not sensible that we just stop upgrading our QA nodes,
I see flaws in reasoning, but I do agree with the conclusion -- yes, we
should upgrade QA nodes. Nobody has proposed a ban on upgrades AFAICT!
The principles I have proposed allow upgrades that do not violate key
invariants. For example, if a proposed upgrade would break master, then
master has to be changed _before_ that upgrade actually happens, not
after. Upgrades must not break master.
So ... a node is added/upgraded. It runs and builds master fine. Then,
once it is added to the matrices, some of the PRs start failing.
It is easy to misunderstand what is going on because there is no good
visualization of complex PR-master-Jenkins_nodes-Jenkins_failures
relationships. Several kinds of PR test failures are possible. I will
describe the two most relevant to your email:
* PR test failures due to problems introduced by PRs should be welcomed
at any time.
Strawman here. This is both a general statement and not relevant to the
CI changes or design(s) we are discussing.
CI improvements are allowed to find new bugs in open PRs.
IMO the crux is that word "new". CI improvements very rarely find new
bugs. What they actually find, quite intentionally, is *existing bugs*
that the old CI config wrongly ignored.
Such findings, even when discovered at the "last minute", should be seen
as an overall positive event or progress -- our CI was able to identify
a problem before it got officially accepted! I do not recall anybody
complaining about such failures recently.
The conclusion being that, because genuinely new bugs are so rare, CI
improvements very rarely draw complaints on that account.
* PR test failures due to the existing master code are not welcomed.
That is not as black/white as the statement above implies. There are
some master branch bugs we don't want blocking PR merges, and there
are some (rarely) for which we absolutely do not want any PR to change
master until they are fixed.
They represent a CI failure.
IMO this is absolutely false. The whole point of improving CI is to find
those "existing" bugs which the previous CI config wrongly missed.
e.g. v4+ currently do not build on Windows. We know this, but the
current CI testing does not show it. Upgrading the CI to include a test
for Windows is not a "CI failure".
In these cases, if the latest master code
is tested with the same test after the problematic CI change, then that
master test will fail. Nothing a PR can do in this situation can fix
this kind of failure because it is not PR changes that are causing the
failure -- CI changes broke the master branch,
Ah. "broke the master branch" is a bit excessive. master is not broken
any more or less than it already was.
What is *actually* broken is the CI test results.
not just the PR! This kind of failure is the responsibility of CI
administrators, and PR authors should complain about it, especially
when there are no signs of CI administrators being aware of and working
on addressing the problem.
*IF* all the conditions and assumptions contained in that final sentence
are true, I would agree. Such a case points to incompetence or neglect
on the part of the sysadmin who broke *the CI test* and then abandoned
fixing it -- complaints are reasonable there.
[ Is kinkie acting incompetently on a regular basis? I think no. ]
Otherwise, short periods between the sysadmin thinking a change was safe
and reverting it as breakage appeared are to be expected. That is why we
have the sysadmin post advance notices, so we are all aware of planned
CI changes. Complaints still happen, but that is not much reason to
redesign the sysadmin practices and automation (which would itself be
yet more CI change, ...).
A good example of a failure of the second kind is a -Wrange-loop-construct
error in a PR that does not touch any range loops (Jenkins conveniently
deleted the actual failed test, but my GitHub comment and PR contents
may be enough to restore what happened):
https://github.com/squid-cache/squid/pull/806#issuecomment-827924821
Thank you.
I see here two "rolling release" distros being updated by the sysadmin
from producing outdated, wrong test results to producing correct ones.
This is a correct change, in line with the goal of our nodes
representing what a user running that OS would see when building Squid
master or PRs.
One distro changed compiler, and both turned on a new warning by
default, which exposed existing Squid bugs. Exactly as intended.
IMO we can expect this to occur on a regular basis, and it is specific
to "rolling release" distros. We can resolve it by having those OS
build only in the N-matrix applied before releases, instead of in the
matrix blocking PR tests or merging.
If we are all agreed, kinkie or I can implement ASAP.
<skip>
B. PR submission testing
- which OS for master (5-pr-test) ?
- which OS for beta (5-pr-test) ?
- which OS for stable (5-pr-test) ?
Are all of those sets the same identical OS+compilers? no.
Why are they forced to be the same matrix test?
I do not understand the question. Are you asking why Jenkins uses the
same 5-pr-test configuration for all three branches (master, beta, _and_
stable)? I do not know the answer.
So can we agree that they should be different tests?
If we are all agreed, that can be implemented.
After test separation, we have the choice of OS with which to answer
the questions I posed.
My idea is to go through distrowatch (see file attached) and sync the
tests with the OS that provide vN (or lower) of Squid as part of their
release. Of course, following the sysadmin testing process for any
additions wanted.
IIRC, this policy was forced on the sysadmin by previous pain complaints.
Complaints, even legitimate ones, should not shape a policy. Goals and
principles should do that.
I remember one possibly related discussion where we were trying to
reduce Jenkins/PR wait times by changing which tests are run at what PR
merging stages, but that is probably a different issue because your
question appears to be about a single merging stage.
I think it was the discussion re-inventing the policy prior to that
performance one.
C. merge testing
- which OS for master (5-pr-auto) ?
- which OS for beta (5-pr-auto) ?
- which OS for stable (5-pr-auto) ?
NP: maintainer does manual override on beta/stable merges.
Are all of those sets the same identical OS+compilers? no.
Why are they forced to be the same matrix test? Anubis
This is too cryptic for me to understand, but Anubis does not force any
tests on anybody -- it simply checks that the required tests have
passed. I am not aware of any Anubis bugs in this area, but please
correct me if I am wrong.
My understanding was that Anubis only has the ability to check PRs
against its auto branch, which tracks master. The ability to have it
track other non-master branches and merge there is not available for use.
If that ability were available, we would need to implement a different
matrix, as with N-pr-test, to use it without guaranteed pain points.
IMO we should look into this. But it is a technical project for sysadmin
+ Eduard to coordinate. Not a policy thing.
D. pre-release testing (snapshots + formal)
- which OS for master (trunk-matrix) ?
- which OS for beta (5-matrix) ?
- which OS for stable (4-matrix) ?
Are all of those sets the same identical OS+compilers? no.
Are we forcing them to use the same matrix test? no.
Are we getting painful experiences from this? maybe.
Most loud complaints have been about "breaking master", which is the
most volatile branch tested on the most volatile OS.
FWIW, I think you misunderstood what those "complaints" were about. I
do not know how that relates to the above questions/answers though.
Maybe. Our differing views on what comprises "breaking master" certainly
confuse interpretations when the phrase is used as the
problem/complaint/report description.
FTR: the reason all those matrices have a '5-' prefix is that, several
redesigns ago, the system was that master/trunk had a matrix to which
the sysadmin added nodes as OS were upgraded. When branching vN, the
maintainer would clone/freeze that matrix into an N-foo, which would be
used to test the code against the OS+compilers that the code in the vN
branch was designed to build on.
I think the above description implies that some time ago we were (more)
careful about (not) adding new nodes when testing stable branches. We
did not want a CI change to break a stable branch. That sounds like the
right principle to me (and it should apply to beta and master as well).
How that specific principle is accomplished is not important (to me) so
CI admins should propose whatever technique they think is best.
Can we have the people claiming pain specify exactly what the pain is
coming from, and let the sysadmin/developer(s) with specialized
knowledge of the automation in that area decide how best to fix it?
We can, and that is exactly what is going on in this thread AFAICT. This
particular thread was caused by CI changes breaking master, and
Francesco was discussing how to avoid such breakages in the future.
There are other goals/principles to observe, of course, and it is
possible that Francesco is proposing more changes to optimize something
else as well, but that is something only he can clarify (if needed).
AFAICT, Francesco and I are on the same page regarding not breaking
master anymore -- he graciously agreed to prevent such breakages in the
future, and I am very thankful that he did. Based on your comments
discussing several cases where such master breakage is, in your opinion,
OK, you currently disagree with that principle. I do not know why.
I think we differ in our definitions of "breaking master". You seem to
be including breakage of things in the CI system itself, which I
consider outside of "master", or expected results of normal sysadmin
activity. I hope my responses to the two use-cases you present at the
top of this email clarify that.
Amos
Distrowatch report for Squid versions published:
Squid 5:
Fedora (rawhide)
Alpine Linux (3.13.5+)
Squid 4:
Manjaro Linux
Ubuntu
Debian
openSUSE
Arch Linux
Mageia
FreeBSD
PCLinuxOS
CentOS
Devuan GNU+Linux
Gentoo Linux
KNOPPIX
Red Hat Enterprise Linux
DragonFly BSD
OpenBSD
AlmaLinux OS
Oracle Linux
ALT Linux
Clear Linux
Calculate Linux
Univention Corporate Server
IPFire
Debian Edu/Skolelinux
Rocky Linux
SUSE Linux Enterprise
Zentyal Server
NetBSD
Karoshi
Springdale Linux
HardenedBSD
Exherbo
Vine Linux
Untangle NG Firewall
Network Security Toolkit
Condres OS (not active)
Feather Linux (not active)
Frugalware Linux (not active)
Lunar Linux (not active)
Squid 3.5:
EuroLinux
Funtoo Linux
Scientific Linux
SME Server
Endian Firewall
Asianux
Rocks Cluster Distribution
PLD Linux Distribution
Devil-Linux (not active)
Nova (not active)
Windows Cygwin [Diladele]
Squid 3.4: (dead)
Squid 3.3: (dead)
Squid 3.2: (dead)
Squid 3.1:
T2 SDE
Squid 3.0: (dead)
Squid 2.7:
MidnightBSD
Windows Native (Acme inactive)
_______________________________________________
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev