Re: [squid-dev] Strategy about build farm nodes
On 5/17/21 3:32 PM, Francesco Chemolli wrote:
> On Mon, May 17, 2021 at 8:32 PM Alex Rousskov wrote:
> > On 5/17/21 2:17 AM, Francesco Chemolli wrote:
> > $ make all push
> > Does that "make push" command automatically switch Jenkins CI to using the new/pushed containers? Or is that a separate manual step?
> Right now that's automatic.
> I'm pondering the best way to have a staging step; Docker supports versioning of images but we're using that feature to implement multiarch. In order to rely on that I'll need to change the naming conventions we rely on.

I think you should be free to use whatever versioning/naming scheme you think works best for supporting safe Jenkins CI upgrades. Ideally, the new docker images (and any other changes) should be tested (against official branches) _before_ the official tests are switched to use them. I hope you find a way to implement that without a lot of work. If you need to tap into Foundation resources, please say so!

> The overarching objectives are:
> - focus on where most users probably are
> - allow iterating quickly giving some signal on tracked branches
> - allow landing quickly with more signal
> - be slow but exhaustive on every other platform we can cover. Do not block landing new code on less common or harder to test on platforms, but still rely on devs to own their changes there, too.

What exactly do you mean by "own their changes there"? The rest is general/vague enough to allow for many interpretations I can agree with, but that last bit seems rather specific, so I wanted to ask...

> > I think that a full round of PR tests (i.e. one initial set plus one staging set) should not exceed ~24 hours, _including_ any wait/queuing time.
> Is everyone okay with such a slow turnaround?

Well, you should not overload the question by calling a 24-hour delay for an all-green signal slow :-). It is all relative, of course: Most Squid PRs take a lot longer to review and fix. There are open source projects where PRs wait for a week or more, so 24 hours for a full round (and usually just an hour or two for the first signal!) would feel very fast for them. Etc. Yes, 24 hours is not "interactive", but it is acceptable for the vast majority of PRs AFAICT, and you are providing dockers for those who want to test in a more interactive mode.

> > I think you are proposing to make some tests optional. Which ones?
> Not the tests, but the platforms.

From my test results consumer point of view, they are [platform-specific] [build] tests.

> Things like gentoo, fedora rawhide, freebsd, openbsd.
> Testing is slower there, so results will lag and we need to batch, running a full test suite every week or so

OK, so you propose to make slow Jenkins platform-specific tests (i.e. Jenkins tests on platforms where tests run slowly today) optional, and those platforms are gentoo, fedora rawhide, freebsd, openbsd. Got it! What happens when such a slow test fails and the PR author ignores the failure of that optional test and/or the PR is already closed? The answer to that question is the key to evaluating any proposal that declares any tests optional IMO...

> FWIW, I do not think reducing the number of #ifdefs will solve our primary CI problems.
> Each test build is right now 4 full builds and test suites: autodetected, minimal, maximum, everything-in-except-for-auth

And all of that takes less than 10 minutes on decent hardware which we can rent or buy! To me, it feels like we are creating an unsolvable problem here (i.e. testing a lot of things quickly on a Raspberry Pi) and then trying extra hard to solve that unsolvable problem. We should either stop testing a lot of things or we should use appropriate resources for testing a lot of things quickly. Yes, we can reduce testing time by removing #ifdefs, but that is a lot of work and does not really scale. #ifdefs should be removed primarily for other reasons.

Cheers,

Alex.
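[Editor's note: to make the 4x multiplier above concrete, here is a minimal shell sketch of what one such "test build" amounts to, assuming four configure/build/check passes per node. The ./configure option sets are illustrative assumptions, not Squid's actual layer definitions (those live in the project's test-suite build scripts).]

```sh
#!/bin/sh
# Sketch of one CI "test build": four configure/build/check passes.
# Option sets below are illustrative assumptions, not Squid's real layers.
set -e

run_layer() {
    name=$1; shift
    mkdir -p "btlayer-$name"
    (cd "btlayer-$name" && ../configure "$@" && make -j"$(nproc)" && make check)
}

run_layer autodetected                     # whatever ./configure detects
run_layer minimal --disable-auth --disable-loadable-modules
run_layer maximum --enable-auth --enable-loadable-modules
run_layer noauth  --disable-auth           # "everything except auth"
```

Each pass is a full compile plus unit-test run, which is why a single OS/compiler combination can take hours on slow nodes while finishing in minutes on faster hardware.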
Re: [squid-dev] Strategy about build farm nodes
On Mon, May 17, 2021 at 8:32 PM Alex Rousskov <rouss...@measurement-factory.com> wrote:
> On 5/17/21 2:17 AM, Francesco Chemolli wrote:
> > Our Linux environments are docker containers on amd64, armv7l and arm64.
> > On a roughly monthly cadence, I pull from our dockerfiles repo (https://github.com/kinkie/dockerfiles) and
> > $ make all push
> Does that "make push" command automatically switch Jenkins CI to using the new/pushed containers? Or is that a separate manual step?

Right now that's automatic. I'm pondering the best way to have a staging step; Docker supports versioning of images but we're using that feature to implement multiarch. In order to rely on that I'll need to change the naming conventions we rely on.

> >>> What I would place on each individual dev is the case where a PR breaks something in the trunk-matrix, trunk-arm32-matrix, trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix builds
> >> I think you are saying that some CI tests called trunk-matrix, trunk-arm32-matrix, trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix should be classified as _required_.
> > That is how I read the statement too.
> ... but it sounds like that is _not_ what you (Francesco) are actually proposing because you are essentially saying "that ideal would be impractical" in the paragraph below. Assuming that you are not attacking your own proposal, that means Amos and I do not know what your proposal is -- we both guessed incorrectly...

The overarching objectives are:
- focus on where most users probably are
- allow iterating quickly giving some signal on tracked branches
- allow landing quickly with more signal
- be slow but exhaustive on every other platform we can cover. Do not block landing new code on less common or harder to test on platforms, but still rely on devs to own their changes there, too.

> > In a world of infinite resources and very efficient testing, sure.
> > But in a space where a single os/compiler combo takes 2hrs on Linux and 4hrs on FreeBSD or OpenBSD, and a full 5-pr-test takes 6 hours end to end, we need to optimize, or making any of these requirements blocking would make these times get 4+ times larger (a full trunk-matrix takes just about a day on amd64, 2 days on arm64), and the complaint would then be that development or release is slowed down by the amount of testing done.
> FWIW, I think that a full round of PR tests (i.e. one initial set plus one staging set) should not exceed ~24 hours, _including_ any wait/queuing time. This kind of delay should still allow for reasonable progress with PRs AFAICT. This includes any release PRs, of course. If we exceed these limits (or whatever limits we can agree on), then we should add testing resources and/or drop tests.

Is everyone okay with such a slow turnaround?

> > My proposal aims to test/land the PR on the systems where we can be efficient and that are relevant, and fix any remaining after-the-fact issues with follow-up PRs that remain a soft requirement for the dev introducing the change.
> Here, I think you are proposing to make some tests optional. Which ones?

Not the tests, but the platforms. Things like gentoo, fedora rawhide, freebsd, openbsd. Testing is slower there, so results will lag and we need to batch, running a full test suite every week or so.

> > For instance: we currently have a lot of different build-time configurations meant to save core memory in runtime environments. Is it maybe time to revisit this decision and move these checks to run time?
> Sure, quality proposals for removing #ifdefs should be welcomed, one (group of) #ifdefs at a time. We can warn squid-users in advance in case somebody wants to provide evidence of harm.

I'm glad this is having buy-in. Are there any other opinions (for or against)?

> > Unfortunately, one of the problems we have is that we're running blind. We don't know what configurations our users deploy; we can only assume, and that makes this conversation much more susceptible to opinions and harder to build consensus on.
> Yes, we are blind, but I doubt we should care much about actual configuration in this specific context. If we can remove an #ifdef without serious negative consequences, we should remove it. We can avoid long discussions by allowing anybody with concrete evidence of problems to block any particular removal. I bet that conservative approach would still allow for the removal of many #ifdefs.

Sounds good to me.

> FWIW, I do not think reducing the number of #ifdefs will solve our primary CI problems. I believe we should reduce the number of platforms we test on and then use Foundation resources to speed up and improve the remaining tests.

Each test build is right now 4 full builds and test suites: autodetected, minimal, maximum, everything-in-except-for-auth. Can we reduce them to 2? If
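[Editor's note: on the staging question raised earlier in this message, one possible way to add a staging step without giving up the multiarch use of image tags is to push refreshed images under a separate staging tag, test the official branches against them, and only then promote them to the tags the production jobs pull. A rough sketch follows; the tag names are illustrative assumptions, not the farm's current naming convention.]

```sh
#!/bin/sh
# Hypothetical staging flow for build-farm images.
# Tag names are illustrative; adjust to whatever convention is adopted.
set -e
arch=$(uname -m)
img=squidcache/buildfarm

# 1. Build and push the refreshed image under a staging-only tag.
docker build -t "$img:staging-$arch" .
docker push "$img:staging-$arch"

# 2. Point a staging Jenkins job at "$img:staging-$arch" and run the
#    official branches against it.

# 3. Only after those runs are green, promote the image to the tag
#    that the production jobs pull.
docker tag "$img:staging-$arch" "$img:$arch"
docker push "$img:$arch"
```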
Re: [squid-dev] Strategy about build farm nodes
On 5/17/21 2:17 AM, Francesco Chemolli wrote:
> Our Linux environments are docker containers on amd64, armv7l and arm64.
> On a roughly monthly cadence, I pull from our dockerfiles repo (https://github.com/kinkie/dockerfiles) and
> $ make all push

Does that "make push" command automatically switch Jenkins CI to using the new/pushed containers? Or is that a separate manual step?

>>> What I would place on each individual dev is the case where a PR breaks something in the trunk-matrix, trunk-arm32-matrix, trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix builds
>> I think you are saying that some CI tests called trunk-matrix, trunk-arm32-matrix, trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix should be classified as _required_.
> That is how I read the statement too.

... but it sounds like that is _not_ what you (Francesco) are actually proposing because you are essentially saying "that ideal would be impractical" in the paragraph below. Assuming that you are not attacking your own proposal, that means Amos and I do not know what your proposal is -- we both guessed incorrectly...

> In a world of infinite resources and very efficient testing, sure.
> But in a space where a single os/compiler combo takes 2hrs on Linux and 4hrs on FreeBSD or OpenBSD, and a full 5-pr-test takes 6 hours end to end, we need to optimize, or making any of these requirements blocking would make these times get 4+ times larger (a full trunk-matrix takes just about a day on amd64, 2 days on arm64), and the complaint would then be that development or release is slowed down by the amount of testing done.

FWIW, I think that a full round of PR tests (i.e. one initial set plus one staging set) should not exceed ~24 hours, _including_ any wait/queuing time. This kind of delay should still allow for reasonable progress with PRs AFAICT. This includes any release PRs, of course. If we exceed these limits (or whatever limits we can agree on), then we should add testing resources and/or drop tests.

> My proposal aims to test/land the PR on the systems where we can be efficient and that are relevant, and fix any remaining after-the-fact issues with follow-up PRs that remain a soft requirement for the dev introducing the change.

Here, I think you are proposing to make some tests optional. Which ones?

> For instance: we currently have a lot of different build-time configurations meant to save core memory in runtime environments. Is it maybe time to revisit this decision and move these checks to run time?

Sure, quality proposals for removing #ifdefs should be welcomed, one (group of) #ifdefs at a time. We can warn squid-users in advance in case somebody wants to provide evidence of harm.

> Unfortunately, one of the problems we have is that we're running blind. We don't know what configurations our users deploy; we can only assume, and that makes this conversation much more susceptible to opinions and harder to build consensus on.

Yes, we are blind, but I doubt we should care much about actual configuration in this specific context. If we can remove an #ifdef without serious negative consequences, we should remove it. We can avoid long discussions by allowing anybody with concrete evidence of problems to block any particular removal. I bet that conservative approach would still allow for the removal of many #ifdefs.

FWIW, I do not think reducing the number of #ifdefs will solve our primary CI problems. I believe we should reduce the number of platforms we test on and then use Foundation resources to speed up and improve the remaining tests.

HTH,

Alex.
Re: [squid-dev] Strategy about build farm nodes
On 5/16/21 10:19 PM, squ...@treenet.co.nz wrote: > On 2021-05-17 11:56, Alex Rousskov wrote: >> On 5/16/21 3:31 AM, Amos Jeffries wrote: >>> On 4/05/21 2:29 am, Alex Rousskov wrote: The principles I have proposed allow upgrades that do not violate key invariants. For example, if a proposed upgrade would break master, then master has to be changed _before_ that upgrade actually happens, not after. Upgrades must not break master. >>> >>> So ... a node is added/upgraded. It runs and builds master fine. Then >>> added to the matrices some of the PRs start failing. >> >> It is easy to misunderstand what is going on because there is no good >> visualization of complex PR-master-Jenkins_nodes-Jenkins_failures >> relationships. Several kinds of PR test failures are possible. I will >> describe the two most relevant to your email: >> >> * PR test failures due to problems introduced by PRs should be welcomed >> at any time. > > Strawman here. This is both general statement and not relevant to CI > changes or design(s) we are discussing. The first bullet (out of the two bullets that are meant to be interpreted together, in their context) is not even close to being a strawman argument by any definition I can find. Neither is the combination of those two bullets. >> CI improvements are allowed to find new bugs in open PRs. > IMO the crux is that word "new". CI improvements very rarely find new > bugs. What it actually finds and intentionally so is *existing bugs* the > old CI config wrongly ignored. Here, I used "new" to mean bugs that had not been found by the previous CI version (i.e. "newly discovered" or "new to us"). These bugs existed before and after CI improvements, of course -- neither master nor the PR has changed -- but we did not know about them. >> * PR test failures due to the existing master code are not welcomed. > That is not as black/white as the statement above implies. There are > some master branch bugs we don't want to block PRs merging, and there > are some (rarely) we absolutely do not want any PRs to change master > until fixed. Yes, master bugs should not affect PR merging in the vast majority of cases, but that is not what this bullet is about at all! This bullet is about (a certain kind of) PR test failures. Hopefully, we do not need to revisit the debate whether PRs with failed tests should be merged. They should not be merged, which is exactly why PR test failures caused by the combination of CI changes and the existing master code are not welcomed -- they block progress of an innocent PR. >> They represent a CI failure. > > IMO this is absolutely false. The whole point of improving CI is to find > those "existing" bugs which the previous CI config wrong missed. Your second sentence is correct, but it does not make my statement false. CI should achieve several goals. Finding bugs (i.e. blocking buggy PRs) is one of them. Merging (i.e. not blocking) PRs that should be merged is another. And there are a few more goals, of course. The problem described in my second bullet represents a CI failure to reach one of the key CI goals or a failure to maintain a critical CI invariant. It is a CI failure rather than a PR failure (the latter is covered by the first bullet). > e.g. v4+ currently do not build on Windows. We know this, but the > current CI testing does not show it. Upgrading the CI to include a test > for Windows is not a "CI failure". If such an upgrade would result in blocking PRs that do not touch Windows code, then that upgrade would be a CI failure. 
Or a failure to properly upgrade CI. >> In these cases, if the latest master code >> is tested with the same test after the problematic CI change, then that >> master test will fail. Nothing a PR can do in this situation can fix >> this kind of failure because it is not PR changes that are causing the >> failure -- CI changes broke the master branch, > > Ah. "broke the master branch" is a bit excessive. master is not broken > any more or less than it already was. If you can suggest a better short phrase to describe the problem, please do so! Until then, I will have to continue to use "breaking master" (in this context) simply for the lack of a better phrase. I tried to explain what that phrase means to avoid misunderstanding. I cannot find a better way to compress that explanation into a single phrase. > What is *actually* broken is the CI test results. One can easily argue that CI test results are actually "correct" in this case -- they correctly discover a bug in the code that wants to become the next master. The problem is in the assignment of responsibility: The PR did not introduce that bug, so the PR should not be punished for that bug. The only way to avoid such punishment (given the natural automated CI limitations) is to avoid breaking master tests, as previously defined. > Otherwise, short periods between sysadmin thinking it was a safe change > and reverting as breakage appeared
Re: [squid-dev] Strategy about build farm nodes
> Adding new nodes with next distro release versions is a manual process not related to keeping existing nodes up to date (which is automated?).

Mostly. Our Linux environments are docker containers on amd64, armv7l and arm64. On a roughly monthly cadence, I pull from our dockerfiles repo (https://github.com/kinkie/dockerfiles) and

$ make all push

The resulting docker images are free for everybody to use and test things on, on any docker system (https://hub.docker.com/r/squidcache/buildfarm). Just

$ docker run -ti --rm -u jenkins squidcache/buildfarm:$(uname -m)- /bin/bash -l

(note: the above command will not preserve any artifacts once the shell exits)

Adding new Linux distros means copying and tweaking a Dockerfile, testing things, and updating our Jenkins jobs. I do it roughly every 6 months. FreeBSD, OpenBSD and (hopefully soon) Windows are hand-managed and much slower-changing VMs.

> >> What I would place on each individual dev is the case where a PR breaks something in the trunk-matrix, trunk-arm32-matrix, trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix builds, even if the 5-pr-test and 5-pr-auto builds fail to detect the breakage because it happens on an unstable or old platform.
> > This feels a bit off-topic for me, but I think you are saying that some CI tests called trunk-matrix, trunk-arm32-matrix, trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix should be classified as _required_.
> That is how I read the statement too.

In a world of infinite resources and very efficient testing, sure. But in a space where a single os/compiler combo takes 2hrs on Linux and 4hrs on FreeBSD or OpenBSD, and a full 5-pr-test takes 6 hours end to end, we need to optimize, or making any of these requirements blocking would make these times get 4+ times larger (a full trunk-matrix takes just about a day on amd64, 2 days on arm64), and the complaint would then be that development or release is slowed down by the amount of testing done.

My proposal aims to test/land the PR on the systems where we can be efficient and that are relevant, and fix any remaining after-the-fact issues with follow-up PRs that remain a soft requirement for the dev introducing the change. The dev can test any work they're doing with the anybranch-* jobs, if they don't have access to that OS.

Can we do better? Sure. For the sake of cost-mindedness, a lot of the build farm nodes run in my home - a couple of Raspberry Pis, an Intel NUC; I'm in the process of purchasing a second (and second-hand) NUC that comes with a Windows license. The setup is meant to be thrifty: I'm mindful of burning Foundation resources for little gain and I'm running a balancing act between always-on VMs, on-demand VMs and my own gadgets.

The real game changer would be rethinking how we do things to reduce the amount of testing needed. For instance: we currently have a lot of different build-time configurations meant to save core memory in runtime environments. Is it maybe time to revisit this decision and move these checks to run time?

Unfortunately, one of the problems we have is that we're running blind. We don't know what configurations our users deploy; we can only assume, and that makes this conversation much more susceptible to opinions and harder to build consensus on.

-- Francesco
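[Editor's note: if you do want build artifacts to survive the container (the plain invocation above discards them when the shell exits), a bind mount is enough. This is a small sketch; the host and container paths are placeholders, and the image tag is written exactly as given above.]

```sh
# Keep the working tree on the host so build artifacts outlive the container.
# /path/to/squid and /home/jenkins/squid are placeholder paths.
docker run -ti --rm -u jenkins \
    -v /path/to/squid:/home/jenkins/squid \
    -w /home/jenkins/squid \
    squidcache/buildfarm:$(uname -m)- /bin/bash -l
```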
Re: [squid-dev] Strategy about build farm nodes
On 2021-05-17 11:56, Alex Rousskov wrote:
> On 5/16/21 3:31 AM, Amos Jeffries wrote:
>> On 4/05/21 2:29 am, Alex Rousskov wrote:
>>> On 5/3/21 12:41 AM, Francesco Chemolli wrote:
>>>> - we want our QA environment to match what users will use. For this reason, it is not sensible that we just stop upgrading our QA nodes,
>>> I see flaws in reasoning, but I do agree with the conclusion -- yes, we should upgrade QA nodes. Nobody has proposed a ban on upgrades AFAICT!
>>> The principles I have proposed allow upgrades that do not violate key invariants. For example, if a proposed upgrade would break master, then master has to be changed _before_ that upgrade actually happens, not after. Upgrades must not break master.
>> So ... a node is added/upgraded. It runs and builds master fine. Then added to the matrices some of the PRs start failing.
> It is easy to misunderstand what is going on because there is no good visualization of complex PR-master-Jenkins_nodes-Jenkins_failures relationships. Several kinds of PR test failures are possible. I will describe the two most relevant to your email:
> * PR test failures due to problems introduced by PRs should be welcomed at any time.

Strawman here. This is both a general statement and not relevant to the CI changes or design(s) we are discussing.

> CI improvements are allowed to find new bugs in open PRs.

IMO the crux is that word "new". CI improvements very rarely find new bugs. What it actually finds, and intentionally so, is *existing bugs* the old CI config wrongly ignored.

> Such findings, even when discovered at the "last minute", should be seen as an overall positive event or progress -- our CI was able to identify a problem before it got officially accepted! I do not recall anybody complaining about such failures recently.

Conclusion being that, due to the rarity of "new bugs", CI improvements very rarely get complained about because of them.

> * PR test failures due to the existing master code are not welcomed.

That is not as black/white as the statement above implies. There are some master branch bugs we don't want to block PRs merging, and there are some (rarely) we absolutely do not want any PRs to change master until fixed.

> They represent a CI failure.

IMO this is absolutely false. The whole point of improving CI is to find those "existing" bugs which the previous CI config wrongly missed.

e.g. v4+ currently do not build on Windows. We know this, but the current CI testing does not show it. Upgrading the CI to include a test for Windows is not a "CI failure".

> In these cases, if the latest master code is tested with the same test after the problematic CI change, then that master test will fail. Nothing a PR can do in this situation can fix this kind of failure because it is not PR changes that are causing the failure -- CI changes broke the master branch,

Ah. "broke the master branch" is a bit excessive. master is not broken any more or less than it already was. What is *actually* broken is the CI test results.

> not just the PR! These kinds of failures are the responsibility of CI administrators, and PR authors should complain about them, especially when there are no signs of CI administrators being aware of and working on addressing the problem.

*IF* all the conditions and assumptions contained in that final sentence are true, I would agree. Such a case points to incompetence or neglect on the part of the sysadmin who broke *the CI test* then abandoned fixing it - complaints are reasonable there.

[ Is kinkie acting incompetently on a regular basis? I think no. ]

Otherwise, short periods between the sysadmin thinking it was a safe change and reverting as breakage appears are to be expected. That is why we have the sysadmin giving advance notices, for us all to be aware of planned CI changes. Complaints still happen, but that is not much reason to redesign the sysadmin practices and automation (which is yet more CI change, ...).

> A good example of a failure of the second kind is a -Wrange-loop-construct error in a PR that does not touch any range loops (Jenkins conveniently deleted the actual failed test, but my GitHub comment and PR contents may be enough to restore what happened): https://github.com/squid-cache/squid/pull/806#issuecomment-827924821

Thank you. I see here two distros which have "rolling release" being updated by the sysadmin from producing outdated and wrong test results to producing correct test results. This is a correct change, in line with the goal of our nodes representing what a user running that OS would see building Squid master or PRs.

One distro changed compiler and both turned on a new warning by default, which exposed existing Squid bugs. Exactly as intended.

IMO we can expect this to occur on a regular basis and it is specific to "rolling release" distros. We can resolve it by having those OS only build in the N-matrix applied before releases, instead of the matrix blocking PR tests or merging. If we are all agreed, kinkie
Re: [squid-dev] Strategy about build farm nodes
On 5/16/21 3:31 AM, Amos Jeffries wrote:
> On 4/05/21 2:29 am, Alex Rousskov wrote:
>> On 5/3/21 12:41 AM, Francesco Chemolli wrote:
>>> - we want our QA environment to match what users will use. For this reason, it is not sensible that we just stop upgrading our QA nodes,
>> I see flaws in reasoning, but I do agree with the conclusion -- yes, we should upgrade QA nodes. Nobody has proposed a ban on upgrades AFAICT!
>> The principles I have proposed allow upgrades that do not violate key invariants. For example, if a proposed upgrade would break master, then master has to be changed _before_ that upgrade actually happens, not after. Upgrades must not break master.
> So ... a node is added/upgraded. It runs and builds master fine. Then added to the matrices some of the PRs start failing.

It is easy to misunderstand what is going on because there is no good visualization of complex PR-master-Jenkins_nodes-Jenkins_failures relationships. Several kinds of PR test failures are possible. I will describe the two most relevant to your email:

* PR test failures due to problems introduced by PRs should be welcomed at any time. CI improvements are allowed to find new bugs in open PRs. Such findings, even when discovered at the "last minute", should be seen as an overall positive event or progress -- our CI was able to identify a problem before it got officially accepted! I do not recall anybody complaining about such failures recently.

* PR test failures due to the existing master code are not welcomed. They represent a CI failure. In these cases, if the latest master code is tested with the same test after the problematic CI change, then that master test will fail. Nothing a PR can do in this situation can fix this kind of failure because it is not PR changes that are causing the failure -- CI changes broke the master branch, not just the PR! These kinds of failures are the responsibility of CI administrators, and PR authors should complain about them, especially when there are no signs of CI administrators being aware of and working on addressing the problem.

A good example of a failure of the second kind is a -Wrange-loop-construct error in a PR that does not touch any range loops (Jenkins conveniently deleted the actual failed test, but my GitHub comment and PR contents may be enough to restore what happened): https://github.com/squid-cache/squid/pull/806#issuecomment-827924821

[I am skipping the exaggerations (about other people's exaggerations) that were, AFAICT, driven by mis-classifying the failures of the second kind as regular PR failures of the first kind.]

>> What this means in terms of sysadmin steps for doing upgrades is up to you. You are doing the hard work here, so you can optimize it the way that works best for _you_. If really necessary, I would not even object to trial upgrades (that may break master for an hour or two) as long as you monitor the results and undo the breaking changes quickly and proactively (without relying on my pleas to fix Jenkins to detect breakages). I do not know what is feasible and what the best options are, but, again, it is up to _you_ how to optimize this (while observing the invariants).
> Uhm. Respectfully, from my perspective the above paragraph conflicts directly with actions taken.

My paragraph is not really about any taken actions so there cannot be any such conflict AFAICT.

> From what I can tell kinkie (as sysadmin) *has* been making a new node and testing it first. Not just against master but the main branches and most active PRs before adding it for the *post-merge* matrix testing snapshot production.

The above summary does not match the (second kind of) PR test failures that I have observed and asked Francesco to address. Those failures were triggered by the latest master code, not PR changes. No PR changes would have fixed those test failures. In fact, an "empty" PR that introduces no code changes at all would have failed as well. See above for one recent example.

> But still threads like this one with complaints appear.

This squid-dev thread did not contain any complaints AFAICT. Francesco is trying to get consensus on how to handle CI upgrades/changes. He is doing the right thing and nobody is complaining about it.

> I understand there is some specific pain you have encountered to trigger the complaint. Can we get down to documenting as exactly as possible what the particular pain was?

Well, I apparently do not know what you call "complaint", so I cannot answer this exact question, but if you are looking for a recent PR test failure that triggered this squid-dev thread, then please see the GitHub link above.

> Test designs that do not fit into our merge and release process sequence have proven time and again to be broken and painful to Alex when they operate as-designed. For the rest of us it is this constant re-build of automation which is the painful part.

While I am not
Re: [squid-dev] Strategy about build farm nodes
On 4/05/21 2:29 am, Alex Rousskov wrote:
> On 5/3/21 12:41 AM, Francesco Chemolli wrote:
>> - we want our QA environment to match what users will use. For this reason, it is not sensible that we just stop upgrading our QA nodes,
> I see flaws in reasoning, but I do agree with the conclusion -- yes, we should upgrade QA nodes. Nobody has proposed a ban on upgrades AFAICT!
> The principles I have proposed allow upgrades that do not violate key invariants. For example, if a proposed upgrade would break master, then master has to be changed _before_ that upgrade actually happens, not after. Upgrades must not break master.

So ... a node is added/upgraded. It runs and builds master fine. Then added to the matrices some of the PRs start failing.

*THAT* is the situation I see happening recently. Master itself working fine and "huge amounts of pain, the sky is falling" complaints from a couple of people. Sky is not falling. Master is no more nor less broken and buggy than it was before sysadmin touched Jenkins. The PR itself is no more, nor less, "broken" than it would be if, for example, it was only tested on Linux nodes and fails to compile on Windows. As the case for master *right now* happens to be.

> What this means in terms of sysadmin steps for doing upgrades is up to you. You are doing the hard work here, so you can optimize it the way that works best for _you_. If really necessary, I would not even object to trial upgrades (that may break master for an hour or two) as long as you monitor the results and undo the breaking changes quickly and proactively (without relying on my pleas to fix Jenkins to detect breakages). I do not know what is feasible and what the best options are, but, again, it is up to _you_ how to optimize this (while observing the invariants).

Uhm. Respectfully, from my perspective the above paragraph conflicts directly with actions taken.

From what I can tell kinkie (as sysadmin) *has* been making a new node and testing it first. Not just against master but the main branches and most active PRs before adding it for the *post-merge* matrix testing snapshot production. But still threads like this one with complaints appear.

I understand there is some specific pain you have encountered to trigger the complaint. Can we get down to documenting as exactly as possible what the particular pain was? Much of the processes we are discussing are scripted automation, not human processing mistakes. Handling such pain points as bugs with the bugzilla "Project" section would be best. Re-designing the entire system policy just moves us all to another set of unknown bugs when the scripts are re-coded to meet that policy.

>> - I believe we should define four tiers of runtime environments, and reflect these in our test setup:
>> 1. current and stable (e.g. ubuntu-latest-lts).
>> 2. current (e.g. fedora 34)
>> 3. bleeding edge
>> 4. everything else - this includes freebsd and openbsd
> I doubt this classification is important to anybody _outside_ this discussion, so I am OK with whatever classification you propose to satisfy your internal needs.

IIRC this is the 5th iteration of ground-up redesign for this wheel. Test designs that do not fit into our merge and release process sequence have proven time and again to be broken and painful to Alex when they operate as-designed. For the rest of us it is this constant re-build of automation which is the painful part.

A. dev pre-PR testing
 - random individual OS.
 - matrix of everything (anybranch-*-matrix)

B. PR submission testing
 - which OS for master (5-pr-test) ?
 - which OS for beta (5-pr-test) ?
 - which OS for stable (5-pr-test) ?
Are all of those sets the same identical OS+compilers? no.
Why are they forced to be the same matrix test? IIRC, policy forced on sysadmin with previous pain complaints.
Are we getting painful experiences from this? Yes. Lack of branch-specific testing before D on beta and stable causes those branches to break a lot more often at the last minute before releases than master, adding random days/weeks to each scheduled release.

C. merge testing
 - which OS for master (5-pr-auto) ?
 - which OS for beta (5-pr-auto) ?
 - which OS for stable (5-pr-auto) ?
NP: maintainer does manual override on beta/stable merges.
Are all of those sets the same identical OS+compilers? no.
Why are they forced to be the same matrix test? Anubis.
Are we getting painful experiences from this? yes. see (B).

D. pre-release testing (snapshots + formal)
 - which OS for master (trunk-matrix) ?
 - which OS for beta (5-matrix) ?
 - which OS for stable (4-matrix) ?
Are all of those sets the same identical OS+compilers? no.
Are we forcing them to use the same matrix test? no.
Are we getting painful experiences from this? maybe. Most loud complaints have been about "breaking master" which is the most volatile branch testing on the most volatile OS.

FTR: the reason all those
Re: [squid-dev] Strategy about build farm nodes
On 5/3/21 12:41 AM, Francesco Chemolli wrote: > - we want our QA environment to match what users will use. For this > reason, it is not sensible that we just stop upgrading our QA nodes, I see flaws in reasoning, but I do agree with the conclusion -- yes, we should upgrade QA nodes. Nobody has proposed a ban on upgrades AFAICT! The principles I have proposed allow upgrades that do not violate key invariants. For example, if a proposed upgrade would break master, then master has to be changed _before_ that upgrade actually happens, not after. Upgrades must not break master. What this means in terms of sysadmin steps for doing upgrades is up to you. You are doing the hard work here, so you can optimize it the way that works best for _you_. If really necessary, I would not even object to trial upgrades (that may break master for an hour or two) as long as you monitor the results and undo the breaking changes quickly and proactively (without relying on my pleas to fix Jenkins to detect breakages). I do not know what is feasible and what the best options are, but, again, it is up to _you_ how to optimize this (while observing the invariants). > - I believe we should define four tiers of runtime environments, and > reflect these in our test setup: > 1. current and stable (e.g. ubuntu-latest-lts). > 2. current (e.g. fedora 34) > 3. bleeding edge > 4. everything else - this includes freebsd and openbsd I doubt this classification is important to anybody _outside_ this discussion, so I am OK with whatever classification you propose to satisfy your internal needs. > I believe we should focus on the first two tiers for our merge workflow, > but then expect devs to fix any breakages in the third and fourth tiers > if caused by their PR, FWIW, I do not understand what "focus" implies in this statement, and why developers should _not_ "fix any breakages" revealed by the tests in the first two tiers. The rules I have in mind use two natural tiers: * If a PR cannot pass a required CI test, that PR has to change before it can be merged. * If a PR cannot pass an optional CI test, it is up to PR author and reviewers to decide what to do next. These are very simple rules that do not require developer knowledge of any complex test node tiers that we might define/use internally. Needless to say, the rules assume that the tests themselves are correct. If not, the broken tests need to be fixed (by the Squid Project) before the first bullet/rule above can be meaningfully applied (the second one is flexible enough to allow PR author and reviewers to ignore optional test failures). > Breakages due to changes in nodes (e.g. introducing a new distro > version) would be on me and would not stop the merge workflow. What you do internally to _avoid_ breakage is up to you, but the primary goal is to _prevent_ CI breakage (rather than to keep CI nodes "up to date"!). If CI is broken, then it is the responsibility of the Squid Project to fix CI ASAP. Thank you for assuming that responsibility as far as Jenkins tests are concerned. There are many ways to break CI and detect those breakages, of course, but if master cannot pass required tests after a CI change, then the change broke CI. > What I would place on each individual dev is the case where a PR breaks > something in the trunk-matrix,trunk-arm32-matrix, trunk-arm64-matrix, > trunk-openbsd-matrix, trunk-freebsd-matrix builds, even if the 5-pr-test > and 5-pr-auto builds fail to detect the breakage because it happens on a > unstable or old platform. 
This feels a bit off-topic for me, but I think you are saying that some CI tests called trunk-matrix, trunk-arm32-matrix, trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix should be classified as _required_. In other words, a PR must pass those CI tests before it can be merged. Is that the situation today? Or are you proposing some changes to the list of required CI tests? What are those changes?

Thank you,

Alex.
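[Editor's note: mechanically, "required" vs "optional" here maps onto GitHub branch protection: a check only blocks merging if its status context is listed as required on the target branch, so promoting or demoting a Jenkins job does not require touching the job itself. A hedged sketch using the gh CLI follows; the context names are taken from the job names in this thread and may not match the exact status strings GitHub records.]

```sh
# Mark only the fast PR jobs as required for merging into master;
# the slow trunk-*-matrix jobs remain advisory. Context names are
# assumptions based on the job names discussed in this thread.
gh api --method PATCH \
  repos/squid-cache/squid/branches/master/protection/required_status_checks \
  -F strict=true \
  -f 'contexts[]=5-pr-test' \
  -f 'contexts[]=5-pr-auto'
```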
Re: [squid-dev] Strategy about build farm nodes
On Wed, Apr 28, 2021 at 11:34 PM Alex Rousskov <rouss...@measurement-factory.com> wrote:
> On 4/28/21 5:12 PM, Amos Jeffries wrote:
> > I'm not sure why this is so controversial still. We have already been over these and have a policy from last time:
> Apparently, the recollections of what was agreed upon, if anything, during that "last time" differ. If you can provide a pointer to that "last time" agreement, please do so.

Merge workflows are agreed, and not in discussion. Recent discussions have highlighted some issues with what's around them, and I'm trying to clarify that as well.

> > * dev PR submissions use the volatile 5-pr-test, after test approval by anyone in QA. Check against unstable OS nodes, as many as possible. Kinkie adds/removes from that set as upgrades fix or break at CI end of things.
> I do not know how to interpret the last sentence correctly, but, IMO, we should not add or change nodes if doing so breaks master tests. AFAICT from PR 806 discussion[1], Francesco thinks that it is not a big deal to do so. The current discussion is meant to resolve that disagreement.

Let me highlight the underlying principles for my proposal. IMO our objectives are, in descending order of importance (all points should be understood as "as possible given our resources"):
1. ensure we ship healthy code to a maximum number of users
2. have minimal friction in the development workflow

These objectives have a set of consequences:
- we want our QA environment to match what users will use. For this reason, it is not sensible that we just stop upgrading our QA nodes, or we would target something that doesn't match our users' experience
- it makes little sense to target unstable distributions (fedora rawhide, possibly centos stream, gentoo, opensuse tumbleweed, debian unstable) as first-class citizens of the testing workflow, especially on stages that are executed often (pr-test)

This means that:
- I periodically weed out distributions that are no longer supported (e.g. Fedora 31, Ubuntu Xenial) and add current distributions (e.g. Ubuntu Hirsute, Fedora 34). I take it on me that when I do that, I need to ensure new compiler features do not block previously undetected behaviours - I am currently failing this, see https://build.squid-cache.org/job/trunk-matrix/121/ . I will need to develop a process with a proper staging phase.
- I believe we should define four tiers of runtime environments, and reflect these in our test setup:
  1. current and stable (e.g. ubuntu-latest-lts). These are not expected to change much over a span of years, and to offer non-breaking updates over their lifetime
  2. current (e.g. fedora 34)
  3. bleeding edge: they may introduce breaking changes which it makes sense to follow because they might highlight real issues and because they will eventually trickle down to current and then lts
  4. everything else - this includes freebsd and openbsd (mostly due to the different virtualization tech they use)

I believe we should focus on the first two tiers for our merge workflow, but then expect devs to fix any breakages in the third and fourth tiers if caused by their PR, while I will care for any breakages caused by dist-upgrades.

> [1] https://github.com/squid-cache/squid/pull/806#issuecomment-827937563

> > * anubis auto branch tested by curated set of LTS stable nodes only.
> FWIW, the above list and the original list by Francesco appear to focus on distro stability, popularity, and other factors that are largely irrelevant to the disagreement at hand. The disagreement is whether it is OK to break master (and, hence, all PR) tests by changing CI. It does not matter whether that CI change comes from an upgrade of an "LTS stable node", "unstable node", or some other source IMO. Breaking changes should not be allowed (in the CI environments under our control). If they slip through despite careful monitoring for change effects, the breaking CI changes should be reverted.

I think it depends. Breakages due to changes in nodes (e.g. introducing a new distro version) would be on me and would not stop the merge workflow. What I would place on each individual dev is the case where a PR breaks something in the trunk-matrix, trunk-arm32-matrix, trunk-arm64-matrix, trunk-openbsd-matrix, trunk-freebsd-matrix builds, even if the 5-pr-test and 5-pr-auto builds fail to detect the breakage because it happens on an unstable or old platform.

-- Francesco
Re: [squid-dev] Strategy about build farm nodes
On 4/28/21 5:12 PM, Amos Jeffries wrote:
> I'm not sure why this is so controversial still. We have already been over these and have a policy from last time:

Apparently, the recollections of what was agreed upon, if anything, during that "last time" differ. If you can provide a pointer to that "last time" agreement, please do so.

> * dev PR submissions use the volatile 5-pr-test, after test approval by anyone in QA. Check against unstable OS nodes, as many as possible. Kinkie adds/removes from that set as upgrades fix or break at CI end of things.

I do not know how to interpret the last sentence correctly, but, IMO, we should not add or change nodes if doing so breaks master tests. AFAICT from PR 806 discussion[1], Francesco thinks that it is not a big deal to do so. The current discussion is meant to resolve that disagreement.

[1] https://github.com/squid-cache/squid/pull/806#issuecomment-827937563

> * anubis auto branch tested by curated set of LTS stable nodes only.

FWIW, the above list and the original list by Francesco appear to focus on distro stability, popularity, and other factors that are largely irrelevant to the disagreement at hand. The disagreement is whether it is OK to break master (and, hence, all PR) tests by changing CI. It does not matter whether that CI change comes from an upgrade of an "LTS stable node", "unstable node", or some other source IMO. Breaking changes should not be allowed (in the CI environments under our control). If they slip through despite careful monitoring for change effects, the breaking CI changes should be reverted.

Alex.
Re: [squid-dev] Strategy about build farm nodes
I'm not sure why this is so controversial still. We have already been over these and have a policy from last time:

* dev PR submissions use the volatile 5-pr-test, after test approval by anyone in QA. Check against unstable OS nodes, as many as possible. Kinkie adds/removes from that set as upgrades fix or break at CI end of things.
* anubis auto branch tested by curated set of LTS stable nodes only.
* website snapshot builds tested against curated set of all nodes that are known to work well building the relevant branch in the past. No recently updated nodes.

Amos
Re: [squid-dev] Strategy about build farm nodes
On 4/28/21 1:45 AM, Francesco Chemolli wrote: > I'm moving here the discussion from PR #806 about what strategy to > have for CI tests, looking for an agreement. > We have 3 classes of tests ni our CI farm > (https://build.squid-cache.org/) > - PR staging tests, triggered by commit hooks on GitHub (possibly with > human approval) > the job is 5-pr-auto (misnamed, it tests trunk with the PR applied). FYI: The "auto" branch is "master branch + PR". When PR is merged, master becomes identical to the auto branch. The name "auto" is fairly common in CI AFAICT. In Anubis (and relatively widespread/common) terminology[1], the words "staged" and "staging" mean what you describe as "trunk with the PR applied" below. [1] https://github.com/measurement-factory/anubis/blob/master/README.md#pull-request-merging-algorithm AFAICT, this class should _follow_ the 5-pr-test class (described below) because a PR is not staged until it is approved and passes 5-pr-test checks. > To be successful, it needs to successfully compile and unit-test all > combinations of clang and gcc on the latest stable version of the most > popular linux distros, at this time centos-8, debian-testing, > fedora-33, opensuse-leap, ubuntu-focal > - PR build tests run , after a PR is approved, also triggered by GitHub > the job is 5-pr-test AFAICT, this job tests the "GitHub-generated merge commit for the PR branch". That commit may become stale if master advances while there were no PR branch commits. Unlike the GitHub-generated merge commit, the "auto" branch (mentioned above) is kept in sync with master by Anubis. We do not merge the GitHub-generated merge commit into master. If 5-pr-test requires a PR approval, then something probably went wrong somewhere. Testing PR branch or the GitHub-generated merge commit should not require a PR approval. Perhaps you meant the "OK to test" _testing_ approval. That testing approval is indeed required and expected for testing GitHub-generated merge commits on vulnerable Jenkins. > To be successful, it needs to compile and unit-test all combinations > of clang and gcc on LTS and most recent versions of most popular linux > distros, at this time: centos-7, > debian-stable, debian-unstable, fedora-32, fedora-34, ubuntu-bionic, > ubuntu-hirsute > - full-scale build tests (jobs: trunk-matrix, trunk-arm32-matrix, > trunk-arm64-matrix, trunk-freebsd13-clang-matrix, trunk-openbsd-matrix) > these test everything we can test, including bleeding edge distros > such as getntoo, rawhide, tumbleweed, and the latest compilers. Please clarify what branches/commits these "full-scale build tests" test. Ideally, all this should be documented somewhere on Squid wiki! > Failing > these will not block a PR from mergeing, but there is an expectation > that build regressions will be fixed If we expect a fix, then the failure has to block something important, like the responsible PR (ideally), the next numbered release, or all PRs. What does this failure block today, if anything? > The change in policy since last week There was no change in any Project policies AFAIK. You have changed Jenkins setup, but that is not a change in policy. It is likely that there is no policy at all right now -- just ad hoc decisions by sysadmins. It would be a stretch to call that current approach a "policy", but even if we do call it that, then that "policy" has not "changed" :-). Alternatively, Jenkins configuration changes have violated a Project policy (one cannot _change_ a Project policy unilaterally; one can only violate it). 
I prefer the "no policy" version above though :-). > is that the PR-blocking builds used > to depend on unstabledistros (fedora-rawhide and opensuse-tumbleweed), > I've removed that today as part of the discussion on PR #806 > This policy would allow keeping stable distros uptodate and matching > expected user experience, while not introducing instability that would > come in from the unstable distros > One distro that's notably missing is centos-stream, this is due to > technical issues with getting a good docker image for it, when available > I'll add it > Would this work as a general policy? I am not sure what exact general policy you are proposing -- please extract it from the lengthy prose above (removing the ever-changing low-level specifics to the extent possible, to arrive at something truly "general"). FWIW, if the fleshed out proposal violates the existing "do not break master" principle below, then I do not think it will work well. I suspect that a focus on vague optimization targets like "expected user experience" will not allow us to formulate a usable policy we can all agree on, but I would be happy to be proven wrong. I can propose the following. Perhaps knowing my state of mind will help you finalize your proposal. We already have a "do not break master" principle: * Master always passes all current development-related tests. I propose to add the following
[squid-dev] Strategy about build farm nodes
Hi all,

I'm moving here the discussion from PR #806 about what strategy to have for CI tests, looking for an agreement. We have 3 classes of tests in our CI farm (https://build.squid-cache.org/):

- PR staging tests, triggered by commit hooks on GitHub (possibly with human approval). The job is 5-pr-auto (misnamed, it tests trunk with the PR applied). To be successful, it needs to successfully compile and unit-test all combinations of clang and gcc on the latest stable version of the most popular linux distros, at this time centos-8, debian-testing, fedora-33, opensuse-leap, ubuntu-focal.

- PR build tests, run after a PR is approved, also triggered by GitHub. The job is 5-pr-test. To be successful, it needs to compile and unit-test all combinations of clang and gcc on LTS and most recent versions of the most popular linux distros, at this time: centos-7, debian-stable, debian-unstable, fedora-32, fedora-34, ubuntu-bionic, ubuntu-hirsute.

- full-scale build tests (jobs: trunk-matrix, trunk-arm32-matrix, trunk-arm64-matrix, trunk-freebsd13-clang-matrix, trunk-openbsd-matrix). These test everything we can test, including bleeding edge distros such as gentoo, rawhide, tumbleweed, and the latest compilers. Failing these will not block a PR from merging, but there is an expectation that build regressions will be fixed.

The change in policy since last week is that the PR-blocking builds used to depend on unstable distros (fedora-rawhide and opensuse-tumbleweed); I've removed that today as part of the discussion on PR #806. This policy would allow keeping stable distros up to date and matching expected user experience, while not introducing instability that would come in from the unstable distros.

One distro that's notably missing is centos-stream; this is due to technical issues with getting a good docker image for it. When available, I'll add it.

Would this work as a general policy? If there is agreement, I'll support it moving forward.

Thanks

-- Francesco
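[Editor's note: for readers unfamiliar with the terminology in this thread, the "auto" branch tested by the 5-pr-auto job described above is conceptually just the current master with the PR merged on top. A rough local equivalent, assuming illustrative remote and branch names (PR 806 is used only because it is already mentioned in this thread), looks like:]

```sh
# Recreate locally what the "auto" (staging) branch contains:
# current master plus the PR branch merged on top.
git fetch origin
git fetch origin pull/806/head:pr-806   # fetch the PR branch by number
git checkout -B auto origin/master
git merge --no-ff pr-806                # the merge result is what gets tested
```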