On Sun, Mar 31, 2024 at 12:05:03PM +0200, Thibaut wrote:
>
> > On 31 March 2024 at 01:07, Elliott Mitchell <[email protected]> wrote:
> >
> >> Normally upstream publishes release tarballs that are different than the
> >> automatically generated ones in GitHub. In these modified tarballs, a
> >> malicious version of build-to-host.m4 is included to execute a script
> >> during the build process.
> >
> > So the malicious source code was part of all tarballs, but only the
> > tarballs with the modified `build-to-host.m4` would trigger the malicious
> > payload.
> >
> > So obtaining GitHub's tarballs which came directly from the Git
> > repository *does* avoid the breach.
>
> https://git.tukaani.org/?p=xz.git;a=commitdiff;h=f9cf4c05edd14dedfe63833f8ccbe41b55823b00
>
> Let’s not lure ourselves into thinking that not using upstream-provided
> tarballs but upstream-provided repo instead is inherently safer. With
> adversarial upstream, *nothing* is safe anyway.

Just using git checkouts (or **reproducible** tarballs generated from a repo's git ref, i.e. a tag or commit) of course doesn't help much by itself. But for me, maintaining a medium two-digit number of packages, using git checkouts (or **reproducible** tarballs generated from git checkouts) would mean that I can at least be sure that the git commits I've been looking at, and the diff between version tags, **really correspond to the content of the tarball**, without having to put extra work into verifying just that (which imho nobody does).

I've never claimed that this alone is the solution. But if we are already used to

a) the content of a release tarball not matching the git repo (because of `make dist` autotools nonsense, for example),
b) the hash of such a tarball differing depending on who generates it, due to subtle differences such as the folder name,
c) people "fixing" PKG_MIRROR_HASH all the time without anyone having any way to validate the cause of the "wrong" hash in the first place,

then the added security of PKG_HASH and esp. PKG_MIRROR_HASH is very small. Too small, if you ask me.

And unlike the complex social/economic/political problems which lead to something like the xz backdoor (no question: those are the bigger problems), this is a technical problem we could quite easily improve on, **and doing so would have been sufficient to prevent the attack** in this case. There is a reason the attacker(s) went to great lengths to move the official mirror site of the project, change the PGP key and hide the key piece of the exploit in the tarballs they generated (and signed) instead of in a git commit. This is not by chance.

What we need is "Reproducible Source/Release Tarballs", not as a solution to all our problems, but as a **pre-condition** which currently isn't met for obvious reasons.

Hence I'm still arguing that the lower resource use of downloading GitHub archive/codeload/release tarballs is not worth the loss of the integrity and audit trail of git.

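To make "reproducible tarball" a bit more concrete, here is a minimal sketch in Python (standard library only) of packing a checked-out source tree so that the resulting hash depends only on file names, file contents and the executable bit. The function names `reproducible_tarball` and `sha256_hex` are made up for illustration and are not taken from any OpenWrt tooling. Skipping `.git` and reducing the permission bits to 0755/0644 are assumptions of this sketch, not an agreed-upon convention.

```python
import gzip
import hashlib
import io
import os
import tarfile


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Drop everything that depends on who packed the tree, and when."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    # Assumption of this sketch: only the executable bit is worth keeping.
    info.mode = 0o755 if (info.isdir() or info.mode & 0o100) else 0o644
    return info


def reproducible_tarball(src_dir: str, prefix: str) -> bytes:
    """Pack src_dir into .tar.gz bytes under an explicit top-level folder name."""
    entries = []
    for root, dirs, files in os.walk(src_dir):
        # Assumption: VCS metadata is not part of the source tarball.
        dirs[:] = [d for d in dirs if d != ".git"]
        for name in dirs + files:
            entries.append(os.path.relpath(os.path.join(root, name), src_dir))
    entries.sort()  # fixed member order, independent of the filesystem

    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for rel in entries:
            full = os.path.join(src_dir, rel)
            info = tar.gettarinfo(full, arcname=f"{prefix}/{rel}")
            if info is None:  # unsupported file type, e.g. a socket
                continue
            info = _normalize(info)
            if info.isreg():
                with open(full, "rb") as fh:
                    tar.addfile(info, fh)
            else:
                tar.addfile(info)

    # mtime=0 keeps the current time out of the gzip header.
    return gzip.compress(raw.getvalue(), compresslevel=9, mtime=0)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()
```

Two people packing the same tree with the same `prefix` should then get the same hash, while a different folder name or any changed file gives a different one. Note that the compressed bytes can still vary between compressor versions, so a real scheme would also have to pin that down (or hash the uncompressed tar stream).
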
Yes, I know SHA-1 is outdated, but in the context of git it's not so easy to add the large amount of random padding which would be required to generate a hash collision, something which has yet to be seen even in contexts with much more freedom than the narrow syntax of a git diff (and commit message). So sure, it's not perfect, but it's better than nothing.

And while release tarballs (being *deliberately* different from the content of the source repo at their corresponding tag, for things like an added VERSION or ChangeLog file or similar information the build process could otherwise learn from .git) have some small, arguable value, hard- or impossible-to-reproduce GitHub-generated tarballs really do NOT have any value. They are an obstacle, and lure people into bad practices such as all those "Fix PKG_MIRROR_HASH" commits which have become the norm (and really should not be).

Regarding the first case (deliberately added VERSION or ChangeLog information and such), we should aim for a **standardized** way to add them in a **reproducible** way. But that's a longer story, certainly boring and trivial, but worth debating nevertheless.

On the other hand, what does "maintained" actually mean in the context of an OpenWrt package? It can be anything from

0. I'm not even using this and don't understand the language it is written in. I just somehow ended up maintaining it.
1. I occasionally bump the version to the newest release or merge PRs of other people suggesting that.
2. I actually validate GPG signatures while bumping the release.
3. I follow up on the git history of that project between releases.
4. I have at least a rough understanding of the code and the purpose of each file of that project.
5. I've contributed to that project myself in the past.
6. I at least quickly read the git diff of that project between releases.
7. I study each commit at the time it is made.
[...]
up to
X. I'm the author, I've written that code, I know the reason for every line of code to be there.

Obviously (X) is also kinda problematic; the sweet spot is somewhere around (6) or (7) imho. But I must admit that for most packages I maintain, the level of maintenance is often closer to (4), sometimes just (2).

However, we should probably define that in some kind of "maintainer guideline" somewhere, and give maintainers the option to communicate (as a self-assessment) the level of maintenance they put into a package.

>
> And even when upstream repo isn’t entirely under adversarial control, a bad
> actor can sneak stuff in:
> https://github.com/libarchive/libarchive/commit/6110e9c82d8ba830c3440f36b990483ceaaea52c

I've seen that, and by itself it does not present a security risk in the context libarchive is intended to be used in. libarchive is not thread-safe and has never been intended to be used in a multi-threaded context. Probably the actor just wanted to find out how well suggested changes are being reviewed and how deep the knowledge of the reviewers goes...

_______________________________________________
openwrt-devel mailing list
[email protected]
https://lists.openwrt.org/mailman/listinfo/openwrt-devel
