Re: De-vendoring gnulib in Debian packages
Bruno Haible writes:

> Simon Josefsson wrote:
>> Finally, while this is somewhat gnulib specific, I think the practice
>> goes beyond gnulib
>
> Yes, gnulib-tool for modules written in C is similar to
>
>   * 'npm install' for JavaScript source code packages [1],
>   * 'cargo fetch' for Rust source code packages [2],
>
> except that gnulib-tool is simpler: it fetches from a single source location
> only.
>
> How does Debian handle these kinds of source-code dependencies?

I don't know the details but I believe those commands are turned into local requests for source code, either vendored or previously packaged in Debian. No network access during builds.

Same for Go packages, which I have some experience with, although for Go packages they lose the strict versioning, so if Go package X declares a dependency on package Y version Z then on Debian it may build against version Z+1 or Z+2, which may in theory break and was not upstream's intended or supported configuration. We have a circular dependency situation for some core Go libraries in Debian right now due to this.

I think fundamentally the shift that causes challenges for distributions may be going from package dependencies that are version >= X to package dependencies that are version = X. If there is a desire to support that, some new workflow patterns are needed. Some package maintainers reject this approach and refuse to co-operate with those upstreams, but I'm not sure if this is a long-term winning strategy: it often just leads to useful projects not being available through distributions, and users suffer as a result.

/Simon
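To make the ">= X" versus "= X" distinction concrete, here is a minimal shell sketch. The helper names version_ge and version_eq and the version numbers are made up for illustration; they are not part of any Debian or Go tooling. The comparison leans on GNU sort -V, which only approximates Debian's own version ordering.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.
# Version ordering relies on GNU coreutils `sort -V`, which is an
# approximation of (not a substitute for) Debian's version comparison.

# Succeeds if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds only if version $1 is exactly version $2.
version_eq() {
    [ "$1" = "$2" ]
}

# Assumed scenario: upstream X pinned Y at 1.4.0, the distribution ships 1.6.2.
wanted=1.4.0
shipped=1.6.2

if version_ge "$shipped" "$wanted"; then
    echo "distribution-style dependency (>= $wanted): satisfied by $shipped"
fi
if ! version_eq "$shipped" "$wanted"; then
    echo "upstream-style pin (= $wanted): not satisfied by $shipped"
fi
```

Both messages are printed for the same pair of versions, which is the situation described above: the distribution considers the dependency met, while upstream never tested that combination.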
Re: De-vendoring gnulib in Debian packages
On 2024-05-11 07:09, Simon Josefsson via Gnulib discussion list wrote:
> I would assume that (some stripped down version of) git is a requirement
> to do any useful work on any platform these days, so maybe it isn't a
> problem

Yes, my impression also is that Git has migrated into the realm of cc/gcc in that everybody has it, so it can depend indirectly on a possibly earlier version of itself. Although it is worrisome that our collective trusted computing base keeps growing, let's face it, if there's a security bug in Git we're all in big trouble anyway.
Re: De-vendoring gnulib in Debian packages
Simon Josefsson wrote:
> Finally, while this is somewhat gnulib specific, I think the practice
> goes beyond gnulib

Yes, gnulib-tool for modules written in C is similar to

  * 'npm install' for JavaScript source code packages [1],
  * 'cargo fetch' for Rust source code packages [2],

except that gnulib-tool is simpler: it fetches from a single source location
only.

How does Debian handle these kinds of source-code dependencies?

Bruno

[1] https://nodejs.org/en/learn/getting-started/an-introduction-to-the-npm-package-manager
[2] https://doc.rust-lang.org/cargo/commands/cargo-fetch.html
De-vendoring gnulib in Debian packages
All,

(going out to both debian-devel and bug-gnulib, please be respectful of each community's different perspectives and trim Cc when focus shifts to any Debian or gnulib specific topics)

(please pardon the accidental duplicate post to bug-gnulib...)

The content of upstream source code releases can largely be categorized into 1) the actual native source code from the upstream supplier, 2) pre-generated artifacts from build tools (e.g., the ./configure script) and 3) third-party maintained source code (e.g., config.guess or getopt.c). The files in 3) may be referred to as "vendoring".

The habit of including vendored and pre-generated artifacts is a powerful and successful way to make release tarballs usable for users, going back to the 1980s. This habit poses some challenges for packaging though:

1) Pre-generated files (e.g., ./configure) should be re-generated to make sure the package is built from source code, and to allow patches on the toolchain used to generate the pre-generated files to have any effect. Otherwise we risk using pre-generated files created using non-free or non-public tools, which if I understand correctly is against Debian main policy. Verifying that this happens for all pre-generated files in an upstream tarball is complicated, fragile and tedious work. I think it is simple to find examples of mistakes in this area even for important top-popcon Debian packages. The current approach of running autoreconf -fi is based on a misunderstanding: autoreconf -fi is documented to not replace certain files with newer versions: https://lists.nongnu.org/archive/html/bug-gnulib/2024-04/msg00052.html

2) If a security problem in vendored code is discovered, the security team may have to patch 50+ packages if the vendor origin is popular. Maybe even different versions of the same vendored code have to be patched.

3) Auditing the difference between the tarball and what is stored in upstream version control system (VCS) is challenging.
The xz incident exploited the fact that some pre-generated files aren't included in upstream VCS. Some upstreams react to this by adding all pre-generated artifacts to VCS -- OpenSSH seems to take the route of adding the generated ./configure script to git, which moves that file from 2) to 1), but the problem remains.

4) Auditing for license compliance is challenging, since not only do we have to audit all of upstream's code but we also have to audit the license of pre-generated files and vendored source code.

There are probably more problems involved, and probably better ways to articulate the problems than what I managed to do above.

The Go and Rust ecosystems solve some of these issues, which has other consequences for packaging. We have largely ignored that the same challenges apply to many C packages, and I'm focusing on those that use gnulib -- https://www.gnu.org/software/gnulib/ -- gzip, tar, grep, m4, sed, bison, awk, coreutils, grub, libiconv, libtasn1, libidn2, inetutils, etc: https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=users.txt

Solving all of the problems for all packages will require some work and will take time. I've started to see if we can make progress on the gnulib-related packages. I'm speaking as a contributor to gnulib and maintainer of a couple of Debian packages, but still learning to navigate -- the purpose of this post is to describe what I've done for libntlm and ask for feedback to hopefully make this into a re-usable pattern that can be applied to more packages. It would be great to improve collaboration on these topics between GNU and Debian.

So let's turn this post into a recipe for Debian maintainers of packages that use gnulib to follow for their packages. I'm assuming git from now on, but feel free to mentally s/git/$VCS/.

The first step is to establish an upstream tarball that you want to work with.
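When weighing a candidate tarball, it can help to see what it ships beyond upstream git, which is the audit problem in 3) above. The following is a minimal sketch and not part of the recipe itself: the function names tarball_paths and extra_in_tarball, the file name git-paths.txt and the tag v1.2 are assumptions made up for illustration, and it assumes the tarball unpacks into a single top-level directory.

```shell
#!/bin/sh
# Sketch: list files shipped in a release tarball that are not tracked in
# upstream git. All function and file names are illustrative assumptions.

# Print the file paths inside tarball $1, with the leading top-level
# directory (e.g. "pkg-1.2/") stripped and directory entries dropped.
tarball_paths() {
    tar -tf "$1" | sed -e 's,^[^/]*/,,' -e '/^$/d' -e '/\/$/d' | LC_ALL=C sort -u
}

# Print paths that appear in tarball $1 but not in the path list file $2.
# The list in $2 could be produced with, for example:
#   git -C upstream.git ls-tree -r --name-only v1.2 > git-paths.txt
extra_in_tarball() {
    tmp=$(mktemp) || return 1
    LC_ALL=C sort -u "$2" > "$tmp"
    tarball_paths "$1" | LC_ALL=C comm -23 - "$tmp"
    rm -f "$tmp"
}
```

Running extra_in_tarball on a traditional autotools tarball would typically list files such as ./configure and other pre-generated or vendored material. For a git-archive tarball the list should be empty, which is the property that makes approach 1) below attractive.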
There are too many opinions floating around on which tarball to use to make any single solution a prerequisite, so here are the different approaches I can identify, ordered by my own preference, and the considerations with each.

1) Use upstream's PGP signed git-archive tarball. See my recent blog posts for this new approach. The key property here is that there is no need to audit the difference between the upstream tarball and upstream git.
   https://blog.josefsson.org/2024/04/01/towards-reproducible-minimal-source-code-tarballs-please-welcome-src-tar-gz/
   https://blog.josefsson.org/2024/04/13/reproducible-and-minimal-source-only-tarballs/

2) Use upstream's PGP signed tarball. This is currently the most common and recommended approach, as far as I know.

3) Create a PGP signed git-archive tarball. If upstream doesn't publish PGP signed tarballs, or if there is a preference from upstream or from you as Debian package maintainer to not do 1) or 2), then create a minimal source-only copy of the git archive and sign
it yourself. Could be done something like this: git clone