On 11/16/23 09:51, Dumitru Ceara wrote:
> On 11/16/23 09:44, Ales Musil wrote:
>> On Thu, Nov 16, 2023 at 9:34 AM Dumitru Ceara <[email protected]> wrote:
>>
>>> On 11/16/23 00:12, Ilya Maximets wrote:
>>>> On 11/15/23 22:20, Mark Michelson wrote:
>>>>> On 11/15/23 12:02, Ilya Maximets wrote:
>>>>>> On 11/15/23 17:00, Dumitru Ceara wrote:
>>>>>>> All of the above were changed to track the latest available
>>>>>>> releases.  Initially that seemed like a good idea but, in
>>>>>>> practice, a new release would potentially (silently) cause CI
>>>>>>> runs that used to pass on given stable versions to unexpectedly
>>>>>>> start failing.
>>>>>>>
>>>>>>> To address that, this series pins all versions we can control,
>>>>>>> allowing us to use different values for different branches.
>>>>>>>
>>>>>>> NOTE: the last commit of this series will look slightly different
>>>>>>> on stable branches, e.g., on branches <= 23.06 we need
>>>>>>> Python <= 3.11 and Fedora 37.
>>>>>>>
>>>>>>> The first commit of the series bumps the OVS submodule to include
>>>>>>> the latest fixes to errors reported by recent versions of flake8.
>>>>>>>
>>>>>>> Changes in v2:
>>>>>>> - added a first patch to bump the OVS submodule
>>>>>>> - addressed Ales' review comments:
>>>>>>>   - moved the logic to determine which image to use out of ci.sh;
>>>>>>>     it's now in the workflow itself.
>>>>>>>   - moved the setting of OVN_IMAGE into the same block as all
>>>>>>>     related env vars
>>>>>>> - added a note for maintainers in the release process
>>>>>>>   documentation to mention that the new workflow will likely have
>>>>>>>   to be triggered manually (at least once) when branching for a
>>>>>>>   new release.  GitHub doesn't allow periodic jobs on non-default
>>>>>>>   branches.
>>>>>>
>>>>>> IMHO, that sounds horrible to maintain.  Can we just build things
>>>>>> per run?
>>>>>
>>>>> What if we set up the "Containers" action to run on branch
>>>>> creation?  Would that adequately substitute for the manual run that
>>>>> creates the image?
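Side note: GitHub Actions does have a `create` trigger that fires when a
branch or tag is created, so this could look roughly like the sketch
below.  This is only a sketch under assumptions: the Dockerfile location
and the `branch-*` filter are illustrative, and the workflow file has to
already exist on the newly created branch for the trigger to fire.

```yaml
# Hypothetical "Containers" workflow: build the image when a release
# branch is created, keeping the manual trigger as a fallback.
name: Containers
on:
  create:             # fires on branch/tag creation
  workflow_dispatch:  # manual fallback

jobs:
  build-image:
    # 'create' also fires for tags; only react to branch-* branches.
    if: github.event_name != 'create' || startsWith(github.ref, 'refs/heads/branch-')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build the test container
        run: docker build -t ovn-tests .   # Dockerfile path is illustrative
```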
>>>>>
>>>>> If we only have to run the action once each time a branch is
>>>>> created, I don't think that qualifies as "horrible" to maintain.
>>>>> This could be lumped into our branch creation scripts.  However, if
>>>>> it can be automated as I suggested above, that would be better.
>>>>
>>>> The job will never be re-run, meaning we'll be using the same
>>>> container for many years.  If we need an update for this container,
>>>> it will have to be done manually for each branch, except for main.
>>>>
>>>> Also, in the patch, Cirrus CI is set to use the container from main,
>>>> so this will need to be adjusted for each branch and will need to
>>>> become part of the release patches (?).
>>>>
>>>> Another issue with the current implementation of containers is that
>>>> it's a pain for an external contributor to change dependencies,
>>>> because they'll have to modify the CI in multiple places in order to
>>>> build and use their own containers.  And with these new changes it
>>>> will become worse.  For example, I never use 'main' or 'branch-*'
>>>> branches in my own forks.  CI depends on the branch name now, and it
>>>> will fall back to an unmaintained old-style-named image.  It might
>>>> not be easy to figure out why all your builds are failing,
>>>> especially if you're new to the project.
>>>>
>>>
>>> We could try to make it build the container image every time on
>>> stable branches and avoid any manual steps.
>>>
>>>>>
>>>>>> How much value do these containers actually have?
>>>
>>> I think you already mentioned the benefits below:
>>> - we can share the artifact (container image) with Cirrus CI; the
>>>   aarch64 container build takes significantly longer
>>> - we can share the artifact with downstream CIs
>>>
>>>>>
>>>>> As patch 2 mentions, it allows us to use different distro bases for
>>>>> each version.
>>>>
>>>> This can be done by simply specifying 'container: name' in the job;
>>>> GHA will fetch and use that container.
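To make that point concrete: pinning a prebuilt image per branch can be
done right in the job definition, roughly as below.  The image name and
tag are made up for the example; only ci.sh comes from the series.

```yaml
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    # Each branch's workflow file pins its own image and tag, so a new
    # image push cannot silently change what a stable branch runs on.
    container: ghcr.io/ovn-org/ovn-tests:fedora-37   # hypothetical name/tag
    steps:
      - uses: actions/checkout@v3
      - run: ./ci.sh
```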
>>>>
>>>>> But I also think that having the container image cached allows for
>>>>> quicker test runs, since we don't have to reconstruct the
>>>>> environment each time we perform test runs.
>>>>
>>>> It takes less than 2 minutes to create a full x86 environment.  We
>>>> spend more time re-building OVS and OVN for every variant of tests,
>>>> even though most of them use the same compiler flags.
>>>>
>>>> Anyways, there is no need to push the container into a registry; we
>>>> can build it and pass it to the next job via artifacts.  I don't
>>>> think spending an extra 1-2 minutes per workflow (not job, workflow)
>>>> is a significant problem, if it allows us to just build whatever the
>>>> current dependencies are and not think about what kind of outdated
>>>> build from which branch is going to be used.
>>>>
>>>> The only downside of this is that we can't share the artifact with
>>>> Cirrus CI or some downstream CIs, but that also doesn't seem to be a
>>>> significant issue.  1-2 minutes per workflow is not that much.
>>>> Building the aarch64 container takes a lot more time, but switching
>>>> it from Fedora to Ubuntu brings the build down to only 10 minutes,
>>>> so it might be reasonable to just rebuild that one on-demand as
>>>> well.
>>>>
>>>
>>> Ales, why do we need the Cirrus CI image to be Fedora based?
>>>
>>
>> We don't, but it's way easier for me to navigate.  I'm not even sure
>> how/if the Ubuntu one works for aarch64.  We can switch that one,
>> however I don't see any clear benefit in doing so.  Also, one thing to
>> note: even if it's 1-2
>
> Well, it takes way less time to build, that might be an argument for
> it.
>
>> minutes (it's probably slightly more), we are losing the ability to
>> have reproducible builds.  It's very easy today to take the image and
>> reproduce the same issue IMO.
>>
>
> That's a good point, reproducible builds are important.
> Storing the image as a GitHub CI artifact would still allow us to
> download it and load it locally, but it's way easier if it's just
> stored in the ghcr.io registry.
>
>>
>>>
>>>> Not a strong opinion, but the branch name thing might be the most
>>>> annoying, if I understand correctly.
>>>>
>>>
>>> On the (very) short term, however, my problem is that I can't merge
>>> any bug fix in OVN because I can't run any CI at the moment.  Mostly
>>> that's because different versions of OVN (and OVS via submodule)
>>> python code trigger different flake8 warnings with different
>>> flake8/python/distro versions.
>>>
>>> A quick fix for now could be to bump the OVS submodule again to
>>> fdbf0bb2aed5 ("flake8: Fix E721 check failures.") (which is what
>>> patch 1/3 is doing), but on _all_ stable branches.  That's not ideal
>>> because that commit (and its backports) are not part of any release
>>> yet.
>>>
>>> I wanted to avoid doing that, but it seems we don't have an easy
>>> alternative, so I'm starting to think we should just go for the OVS
>>> submodule bump and try to find a better solution for the container
>>> workflow on the longer term.
>>>
Just in case I wasn't clear, what I meant here is applying patch 1/3 to
main, and its corresponding stable branch backports, bumping the
submodule to the tips of OVS branch-3.2, branch-3.1 and branch-3.0.

>>> What do you guys think?
>>>
>>
>> I'm afraid that we will constantly chase those ghosts of different
>> dependency problems for various branches.  And I'm not sure what the
>> proper way to handle that would be.  The unfortunate thing is that we
>> don't directly control OVS, so some of those "hacks" might always be
>> necessary to have it working.
>>
>
> Isn't that a different discussion?  Partially, because OVN uses
> "internal" OVS structures directly.

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
