Re: De-vendoring gnulib in Debian packages

2024-05-11 Thread Simon Josefsson via Gnulib discussion list
Bruno Haible  writes:

> Simon Josefsson wrote:
>> Finally, while this is somewhat gnulib specific, I think the practice
>> goes beyond gnulib
>
> Yes, gnulib-tool for modules written in C is similar to
>
>   * 'npm install' for JavaScript source code packages [1],
>   * 'cargo fetch' for Rust source code packages [2],
>
> except that gnulib-tool is simpler: it fetches from a single source location
> only.
>
> How does Debian handle these kinds of source-code dependencies?

I don't know the details but I believe those commands are turned into
local requests for source code, either vendored or previously packaged
in Debian.  No network access during builds.  Same for Go packages,
which I have some experience with, although for Go packages they lose
the strict versioning so if Go package X declares a dependency on package
Y version Z then on Debian it may build against version Z+1 or Z+2 which
may in theory break and was not upstream's intended or supported
configuration.  We have a circular dependency situation for some core Go
libraries in Debian right now due to this.

I think fundamentally the shift that causes challenges for distributions
may be the move from package dependencies that are version >= X to
package dependencies that are version = X.  If there is a desire to
support that, some new workflow patterns are needed.  Some
package maintainers reject this approach and refuse to co-operate with
those upstreams, but I'm not sure if this is a long-term winning
strategy: it often just leads to useful projects not being available
through distributions, and users suffer as a result.
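
To make the difference between the two dependency styles concrete, here
is a minimal Python sketch.  It assumes plain dotted-integer versions;
the helper names parse_version and satisfies are made up for
illustration, and this is neither Debian's version comparison algorithm
nor Go's module resolution.

```python
def parse_version(text):
    """Split a dotted version such as "1.4.2" into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def satisfies(available, operator, required):
    """Check whether an available version meets a ">=" or "=" constraint."""
    have, want = parse_version(available), parse_version(required)
    if operator == ">=":
        return have >= want
    if operator == "=":
        return have == want
    raise ValueError(f"unsupported operator: {operator}")


# Hypothetical situation: upstream tested against 1.2.0, the archive ships 1.3.0.
archive_version = "1.3.0"
print(satisfies(archive_version, ">=", "1.2.0"))  # distribution-style range
print(satisfies(archive_version, "=", "1.2.0"))   # upstream-style pin
```

Roughly speaking, with a range one packaged copy of Y can serve every
package that depends on it, while with a pin each distinct pinned
version needs its own copy somewhere, either in the archive or vendored.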

/Simon




Re: De-vendoring gnulib in Debian packages

2024-05-11 Thread Paul Eggert

On 2024-05-11 07:09, Simon Josefsson via Gnulib discussion list wrote:

> I would assume that (some stripped down
> version of) git is a requirement to do any useful work on any platform
> these days, so maybe it isn't a problem


Yes, my impression also is that Git has migrated into the realm of 
cc/gcc in that everybody has it, so it can depend indirectly on a 
possibly earlier version of itself.


Although it is worrisome that our collective trusted computing base 
keeps growing, let's face it, if there's a security bug in Git we're all 
in big trouble anyway.




Re: De-vendoring gnulib in Debian packages

2024-05-11 Thread Bruno Haible
Simon Josefsson wrote:
> Finally, while this is somewhat gnulib specific, I think the practice
> goes beyond gnulib

Yes, gnulib-tool for modules written in C is similar to

  * 'npm install' for JavaScript source code packages [1],
  * 'cargo fetch' for Rust source code packages [2],

except that gnulib-tool is simpler: it fetches from a single source location
only.

How does Debian handle these kinds of source-code dependencies?

Bruno

[1] https://nodejs.org/en/learn/getting-started/an-introduction-to-the-npm-package-manager
[2] https://doc.rust-lang.org/cargo/commands/cargo-fetch.html






De-vendoring gnulib in Debian packages

2024-05-11 Thread Simon Josefsson via Gnulib discussion list
All, (going out to both debian-devel and bug-gnulib, please be
  respectful of each community's different perspectives and trim Cc
  when focus shifts to any Debian or gnulib specific topics)
  (please pardon the accidental duplicate post to bug-gnulib...)

The content of upstream source code releases can largely be categorized
into 1) the actual native source-code from the upstream supplier, 2)
pre-generated artifacts from build tools (e.g., ./configure script) and
3) third-party maintained source code (e.g., config.guess or getopt.c).
The files in 3) may be referred to as "vendoring".  The habit of
including vendored and pre-generated artifacts is a powerful and
successful way to make release tarballs usable for users, going back to
the 1980s.  This habit poses some challenges for packaging though:

1) Pre-generated files (e.g., ./configure) should be re-generated to
   make sure the package is built from source code, and to allow patches
   on the toolchain used to generate the pre-generated files to have any
   effect.  Otherwise we risk using pre-generated files created using
   non-free or non-public tools, which, if I understand correctly, is against
   Debian main policy.  Verifying that this happens for all
   pre-generated files in an upstream tarball is complicated, fragile
   and tedious work.  I think it is simple to find examples of mistakes
   in this area even for important top-popcon Debian packages.  The
   current approach of running autoreconf -fi is based on a
   misunderstanding: autoreconf -fi is documented to not replace certain
   files with newer versions:
   https://lists.nongnu.org/archive/html/bug-gnulib/2024-04/msg00052.html

2) If a security problem in vendored code is discovered, the security
   team may have to patch 50+ packages if the vendor origin is popular.
   Maybe even different versions of the same vendored code have to be
   patched.

3) Auditing the difference between the tarball and what is stored in
   upstream version control system (VCS) is challenging.  The xz
   incident exploited the fact that some pre-generated files aren't
   included in upstream VCS.  Some upstreams react to this by adding all
   pre-generated artifacts to VCS -- OpenSSH seems to take the route of
   adding the generated ./configure script to git, which moves that file
   from 2) to 1) but the problem remains.

4) Auditing for license compliance is challenging, since not only do we
   have to audit all upstream's code but we also have to audit the
   license of pre-generated files and vendored source-code.
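
As an illustration of the auditing problem in 3), here is a minimal
Python sketch that compares a release tarball against a tarball made
from the VCS tree (for example with git archive).  The function names
and the sample archives are made up, and it assumes both tarballs have
a single top-level directory; it is not an existing Debian tool.

```python
import hashlib
import io
import tarfile


def file_digests(tar_bytes, strip_components=1):
    """Map each regular file in a tarball to the SHA-256 of its content.

    The leading `strip_components` path elements (typically the
    "pkg-1.0/" top-level directory) are dropped so two archives compare.
    """
    digests = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            parts = member.name.split("/")[strip_components:]
            if not parts:
                continue
            content = tar.extractfile(member).read()
            digests["/".join(parts)] = hashlib.sha256(content).hexdigest()
    return digests


def audit(release_tar, vcs_tar):
    """Return (files only in the release tarball, files whose content differs)."""
    release, vcs = file_digests(release_tar), file_digests(vcs_tar)
    extra = sorted(set(release) - set(vcs))
    changed = sorted(p for p in set(release) & set(vcs) if release[p] != vcs[p])
    return extra, changed


def make_tar(files):
    """Build an in-memory .tar.gz from {path: bytes}; only used for the demo."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Made-up archives: the release tarball adds ./configure and alters one m4 file.
vcs = make_tar({"pkg-1.0/main.c": b"int main(void) { return 0; }\n",
                "pkg-1.0/m4/check.m4": b"original\n"})
release = make_tar({"pkg-1.0/main.c": b"int main(void) { return 0; }\n",
                    "pkg-1.0/m4/check.m4": b"modified\n",
                    "pkg-1.0/configure": b"#!/bin/sh\n"})
extra, changed = audit(release, vcs)
print("only in release tarball:", extra)
print("differs from VCS:", changed)
```

Every path reported in either list is something a reviewer would still
have to inspect by hand; the sketch only narrows down where to look.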

There are probably more problems involved, and probably better ways to
articulate the problems than what I managed to do above.  The Go and
Rust ecosystems solve some of these issues, which has other consequences
for packaging.  We have largely ignored that the same challenges apply
to many C packages, and I'm focusing on those that use gnulib --
https://www.gnu.org/software/gnulib/ -- gzip, tar, grep, m4, sed, bison,
awk, coreutils, grub, libiconv, libtasn1, libidn2, inetutils, etc:
https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=users.txt

Solving all of the problems for all packages will require some work and
will take time.  I've started to see if we can make progress on the
gnulib-related packages.  I'm speaking as contributor to gnulib and
maintainer of a couple of Debian packages, but still learning to
navigate -- the purpose of this post is to describe what I've done for
libntlm and ask for feedback to hopefully make this into a re-usable
pattern that can be applied to more packages.  It would be great to
improve collaboration on these topics between GNU and Debian.

So let's turn this post into a recipe for Debian maintainers of packages
that use gnulib to follow for their packages.  I'm assuming git from now
on, but feel free to mentally s/git/$VCS/.

The first step is to establish an upstream tarball that you want to work
with.  There are too many opinions floating around on this to make any
single solution a pre-requisite so here are the different approaches I
can identify, ordered by my own preference, and the considerations with
each.

1) Use upstream's PGP signed git-archive tarball.

   See my recent blog posts for this new approach.  The key property
   here is that there is no need to audit the difference between the
   upstream tarball and upstream git.

   https://blog.josefsson.org/2024/04/01/towards-reproducible-minimal-source-code-tarballs-please-welcome-src-tar-gz/
   https://blog.josefsson.org/2024/04/13/reproducible-and-minimal-source-only-tarballs/

2) Use upstream's PGP signed tarball.

   This is the current most common and recommended approach, as far as I
   know.

3) Create a PGP signed git-archive tarball.

   If upstream doesn't publish PGP signed tarballs, or if there is a
   preference from upstream or from you as Debian package maintainer to
   not do 1) or 2), then create a minimal source-only copy of the git
   archive and sign it yourself.  Could be done something like this:

   git c


Re: Gnulib in Debian

2024-04-24 Thread Reuben Thomas
On Wed, 24 Apr 2024 at 15:56, Simon Josefsson  wrote:

>
> The last aspect should be solved: the latest gnulib in Debian contains a
> git bundle of gnulib,


That's fantastic, I wish I had thought of that. I still don't fancy doing
the necessary packaging work, but I'll let those who have been helping me
know that there's a viable alternative.

> I should write a post to debian-devel describing this pattern on
> how to use gnulib in Debian packages, but you can infer everything from
> the links given in my blog post [1] and the latest upload of libntlm
> into Debian.
>

Thanks very much for writing this up. If in a few years libpaper2 has still
not made it into Debian I shall probably find the energy to look at it
myself!

-- 
https://rrt.sc3d.org


Re: Gnulib in Debian

2024-04-24 Thread Simon Josefsson via Gnulib discussion list
Reuben Thomas  writes:

> TLDR: FTP Master rejected my libpaper package because it contains gnulib
> source files. I pointed out that other Debian packages for which I am
> upstream do exactly this and have been accepted, and that it is the
> standard way to use gnulib. A few senior Debian Developers said they did
> not consider this use of gnulib to be against Debian policy. But FTP
> Master's stance appears to be that they will not let any new packages into
> the archive that contain gnulib sources (or in general, vendored
> sources—they don't have anything against gnulib in particular!). I also
> argued that building against Debian's version of gnulib would risk
> introducing bugs (I have found that updating gnulib in my projects can make
> previously-working code fail).

The last aspect should be solved: the latest gnulib in Debian contains a
git bundle of gnulib, so you can Build-Depend on gnulib and via
GNULIB_REVISION pick out exactly the gnulib git revision that libpaper
needs.  This avoids including gnulib files in the tarball that is
uploaded to Debian, and there is no risk that you will get gnulib code
from a different git commit.  It requires an added 'Build-Depends: git'
in libpaper, though, which is unfortunate but I don't see how to avoid
it.  I should write a post to debian-devel describing this pattern on
how to use gnulib in Debian packages, but you can infer everything from
the links given in my blog post [1] and the latest upload of libntlm
into Debian [2].
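
A rough Python sketch of the idea follows.  The bundle path, the
assumption that GNULIB_REVISION is set in a bootstrap.conf-style file,
and all function names are mine and unverified here; the actual recipe
is whatever the libntlm packaging does.

```python
import re
import subprocess

# Assumption: where the Debian gnulib package installs its git bundle.
ASSUMED_BUNDLE = "/usr/share/gnulib/gnulib.bundle"


def read_gnulib_revision(bootstrap_conf_text):
    """Extract the commit id from a GNULIB_REVISION=... assignment, if any."""
    match = re.search(r"""^\s*GNULIB_REVISION=["']?([0-9a-fA-F]+)""",
                      bootstrap_conf_text, re.MULTILINE)
    return match.group(1) if match else None


def checkout_commands(revision, dest="gnulib", bundle=ASSUMED_BUNDLE):
    """Git commands that clone the local bundle and check out one commit."""
    return [
        ["git", "clone", "--no-checkout", bundle, dest],
        ["git", "-C", dest, "checkout", "--detach", revision],
    ]


def checkout_gnulib(revision, dest="gnulib", bundle=ASSUMED_BUNDLE):
    """Run the commands; this needs git installed, but no network access."""
    for argv in checkout_commands(revision, dest, bundle):
        subprocess.run(argv, check=True)


# Made-up bootstrap.conf content with a placeholder commit id.
conf = "# bootstrap.conf\nGNULIB_REVISION=0123456789abcdef\n"
revision = read_gnulib_revision(conf)
for argv in checkout_commands(revision):
    print(" ".join(argv))
```

Because the clone source is a local file, the build stays offline, and
checking out one fixed commit is what rules out getting gnulib code from
a different revision.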

/Simon

[1] https://blog.josefsson.org/2024/04/13/reproducible-and-minimal-source-only-tarballs/
[2] https://salsa.debian.org/auth-team/libntlm/-/tree/master/debian




Re: Gnulib in Debian

2024-04-24 Thread Reuben Thomas
On Wed, 24 Apr 2024 at 01:51, Bruno Haible  wrote:

> Reuben Thomas wrote:
> > (not yet in Debian, sadly, as they don't like me "vendoring gnulib", as FTP
> > Master calls it, or "using gnulib as other packages like Enchant do, and as
> > designed", as I call it).
>
> I assume you are alluding to the mail thread that starts at
>  ?
>

Yes.


> I haven't read the thread. But you write:
>   "I am the upstream maintainer of libpaper ..., and also a Debian Maintainer
>    trying to get a new version of libpaper into Debian."
>

TLDR: FTP Master rejected my libpaper package because it contains gnulib
source files. I pointed out that other Debian packages for which I am
upstream do exactly this and have been accepted, and that it is the
standard way to use gnulib. A few senior Debian Developers said they did
not consider this use of gnulib to be against Debian policy. But FTP
Master's stance appears to be that they will not let any new packages into
the archive that contain gnulib sources (or in general, vendored
sources—they don't have anything against gnulib in particular!). I also
argued that building against Debian's version of gnulib would risk
introducing bugs (I have found that updating gnulib in my projects can make
previously-working code fail).

> Is the problem something that affects the package upstream, or only something
> that is specific to Debian?
>

It's Debian-specific, though I imagine other distros might also take a
similar stance.

In this case, the solution is for someone else to repackage libpaper
without the offending files (by generating a new source tarball). I have
said I don't want to do this myself; to be honest it's just a depressing
thought to spend hours doing something that makes no sense to me, and that
will potentially cause me bug reports in future.
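
For illustration only, a repack of that kind could look roughly like the
Python sketch below.  The package name, the excluded paths, and the
function names are hypothetical; as far as I know, Debian's usual
mechanism for this is a Files-Excluded field in debian/copyright that
uscan honours when repacking, not a hand-written script.

```python
import io
import tarfile


def repack_without(src_bytes, excluded):
    """Copy a tarball, dropping members under any excluded path.

    Paths in `excluded` are relative to the top-level directory.
    """
    excluded = [path.rstrip("/") for path in excluded]
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(src_bytes), mode="r:*") as src:
        with tarfile.open(fileobj=out, mode="w:gz") as dst:
            for member in src.getmembers():
                relative = member.name.partition("/")[2]
                if any(relative == p or relative.startswith(p + "/")
                       for p in excluded):
                    continue
                data = src.extractfile(member) if member.isfile() else None
                dst.addfile(member, data)
    return out.getvalue()


def make_tar(files):
    """Build an in-memory .tar.gz from {path: bytes}; only used for the demo."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Made-up source tree with two vendored paths to drop.
original = make_tar({"libfoo-1.0/src/foo.c": b"/* native code */\n",
                     "libfoo-1.0/lib/getopt.c": b"/* vendored */\n",
                     "libfoo-1.0/m4/gnulib-comp.m4": b"# vendored\n"})
repacked = repack_without(original, ["lib", "m4/gnulib-comp.m4"])
with tarfile.open(fileobj=io.BytesIO(repacked), mode="r:*") as tar:
    names = tar.getnames()
print(names)
```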

I do sympathise with Debian's aim here, and the long-mooted "libposix"
project, or rather an extended "libgnu" version—that is, an installable
version of gnulib that one can use like any other library—would solve this
problem for both me and Debian. Maybe I'll summon the energy to tackle some
of the libposix to-do list one day.


> In the latter case, I don't want to interfere with that. Distros package the
> software like they want to. Debian, in particular, has hundreds of pages of
> policy documents. It's not my business as an upstream maintainer to interfere
> with that.
>

Sure, I'm just complaining, not asking for a solution. I should have been
clearer about that, sorry.

-- 
https://rrt.sc3d.org


Re: Gnulib in Debian

2024-04-23 Thread Bruno Haible
Reuben Thomas wrote:
> (not yet in Debian, sadly, as they don't like me "vendoring gnulib", as FTP
> Master calls it, or "using gnulib as other packages like Enchant do, and as
> designed", as I call it).

I assume you are alluding to the mail thread that starts at
 ?

I haven't read the thread. But you write:
  "I am the upstream maintainer of libpaper ..., and also a Debian Maintainer
   trying to get a new version of libpaper into Debian."

Is the problem something that affects the package upstream, or only something
that is specific to Debian?

In the latter case, I don't want to interfere with that. Distros package the
software like they want to. Debian, in particular, has hundreds of pages of
policy documents. It's not my business as an upstream maintainer to interfere
with that. (*)

Bruno

(*) Except when there's a copyright infringement, as happened in the
Oracle Solaris downstream distribution of GNU libsigsegv. I had to write to
them, so they stopped doing that.