Re: Upstream dist tarball transparency (was Re: Validating tarballs against git repositories)

2024-04-06 Thread James Addison
Thanks for the response!

On Fri, 5 Apr 2024 11:12:33 +0200, Guillem wrote:
> On Wed, 2024-04-03 at 23:53:56 +0100, James Addison wrote:
> > On Wed, 3 Apr 2024 19:36:33 +0200, Guillem wrote:
> > > On Fri, 2024-03-29 at 23:29:01 -0700, Russ Allbery wrote:
> > > > On 2024-03-29 22:41, Guillem Jover wrote:
> > > > I think with my upstream hat on I'd rather ship a clear manifest
> > > > (checked into Git) that tells distributions which files in the
> > > > distribution tarball are build artifacts, and guarantee that if you
> > > > delete all of those files, the remaining tree should be byte-for-byte
> > > > identical with the corresponding signed Git tag.  (In other words,
> > > > Guillem's suggestion.)  Then I can continue to ship only one release
> > > > artifact.
> > >
> > > I've been pondering about this and I think I might have come up with a
> > > protocol that to me (!) seems safe, even against a malicious upstream. And
> > > does not require two tarballs which as you say seems cumbersome, and makes
> > > it harder to explain to users. But I'd like to run this through the list
> > > in case I've missed something obvious.

Ok, after a bit more time to process the details, this makes more sense to me
now.  It's a fairly strong assertion about the precise VCS origin and commit
that a _subset_ of the files in a dist tarball originate from.

And the strength of the claim (I think) varies based on how feasible it would
be for an attacker to take control of the origin and write a substitute commit
with the same VCS commit ID and file list - so it rests on fairly
well-understood principles about cryptographic hash strength.

(this seems similar in some ways to the existing .dsc file format, although
in relation to a 'source package source' and not sources of binary packages)

In any case: I'm reasonably convinced that the provenance (claim-of-origin)
that this would provide for a source tarball is fairly strong.  That's not my
only concern though (in particular, goto: regeneration).

> > Does this cater for situations where part of the preparation of a source
> > tarball involves populating a directory with a list of filenames that
> > correspond to hostnames known to the source preparer?
> >
> > If that set of hostnames changes, then regardless of the same source
> > VCS checkout being used, the resulting distribution source tarball could
> > differ.
> >
> > Yes, it's a hypothetical example; but given time and attacker patience,
> > someone is motivated to attempt any workaround.  In practice the
> > difference could be a directory of hostnames or it could be a bitflag
> > that is part of a macro that is only evaluated under various nested
> > conditions.
>
> I'm not sure whether I've perhaps misunderstood your scenario, but if
> the distributed tarball contains things not present in the VCS, then
> with this proposal those can then be easily removed, which means it
> does not matter much if they differ between generations of the same tarball
> (I mean it matters in the sense that it's an alarm sign, but it does
> not matter in the sense that you can get at the same state as with a
> clean VCS checkout).

Yep, you managed to translate my baffling scenario description into a clearer
problem statement :)

> The other part then is whether the remaining contents differ from what
> is in the VCS.
>
> If any of these trigger a difference, then that would require manual
> review. That of course does not exempt one from reviewing the VCS, it
> just potentially removes one avenue for smuggling artifacts.

Why not reject an upload automatically if a difference is detected between the
source package source and the dist tarball?

> > To take a leaf from the Reproducible Builds[1] project: to achieve a
> > one-to-one mapping between a set of inputs and an output, you need to
> > record all of the inputs; not only the source code, but also the build
> > environment.
> >
> > I'm not yet convinced that source-as-was-written to
> > distributed-source-tarball is a problem that is any different to that of
> > distributed-source-tarball to built-package.  Changes to tooling do, in
> > reality, affect the output of build processes -- and that's usually good,
> > because it allows for performance optimizations.  But it also
> > necessitates the inclusion of the toolchain and environment to produce
> > repeatable results.
>
> In this case, the property you'd gain is that you do not need to trust
> the system of the person preparing the distribution tarball, and can
> then regenerate those outputs from (supposedly) good inputs from both
> the distribution tarball, and _your_ (or the distribution) system
> toolchain.

regeneration:

Here is the problem.  Let's say that in future with this transparency code in
place, a security bug is discovered in versions of autotools that were
available in the testing or unstable distributions (and may have been used by
some Debian maintainers/developers).  It could be useful to determine a 

Re: Validating tarballs against git repositories

2024-04-06 Thread Sean Whitton
Hello,

On Fri 05 Apr 2024 at 03:19pm +01, Simon McVittie wrote:

> There are basically three dgit-compatible workflows, with some minor
> adjustments around handling of .gitignore files:
>
> - "patches applied" (git-debrebase, etc.):
>   This is the workflow that proponents of dgit sometimes recommend,
>   and dgit uses it as its canonicalized internal representation of
>   the package.
>   The git tree is the same as `dpkg-source -x`, with upstream source code
>   included, debian/ also included, and any Debian delta to the upstream
>   source pre-applied to those source files.
>   In the case of bubblewrap, if we used this workflow, after you clone
>   the project, bubblewrap.c would already have the Debian-specific error
>   message.
>   (dgit --split-view=never or dgit --quilt=dpm)
>
> - "patches unapplied" (gbp pq, quilt, etc.):
>   This is the workflow that many of the big teams use (at least Perl,
>   Python, GNOME and systemd), and is the one that bubblewrap really uses.
>   The git tree is the same as `dpkg-source -x --skip-patches`, with
>   upstream source code included, and debian/ also included.
>   Any Debian delta to the upstream source is represented in debian/patches
>   but is *not* pre-applied to the source files: for example, in the case
>   of bubblewrap, after you clone
>   https://salsa.debian.org/debian/bubblewrap.git and view bubblewrap.c,
>   it still has the upstream error message, not the Debian-specific one.
>   (dgit --quilt=gbp or dgit --quilt=unapplied; I use the latter)
>
> - debian/ only:
>   This is what you're advocating above.
>   The git tree contains only debian/. If there is Debian delta to the
>   upstream source, it is in debian/patches/ as usual.
>   (dgit --quilt=baredebian* family)

People interested in these differences may also want to look at:
.

>
> Again, checking for that error is something that can be (and is)
> automated: I use this workflow myself (e.g. in bubblewrap), so I know from
> experience that dgit *does* check for that error, and will fail to build
> the source package if the invariant does not hold. Again, dpkg-source
> in 3.0 (quilt) format will also make your source package fail to build
> if that error exists, except in the cases that it intentionally ignores.

Right, both dgit and tag2upload's client-side wrapper of git-tag(1),
git-debpush(1), do this check.

-- 
Sean Whitton




Re: Validating tarballs against git repositories

2024-04-05 Thread Marco d'Itri
On Apr 05, Simon McVittie  wrote:

> I find that having the upstream source code in git (in the same form that
> we use for the .orig.tar.*, so including Autotools noise, etc. if present,
> but excluding any files that we exclude by repacking) is an extremely
> useful tool, because it lets me trace the history of all of the files
> that we are treating as source - whether hand-written or autogenerated -
> if I want to do that. If we are concerned about defending against actively
I agree: it would be unthinkable for me to not have the complete history
immediately available while I am working on a package.

-- 
ciao,
Marco




Re: Validating tarballs against git repositories

2024-04-05 Thread Luca Boccassi
On Fri, 5 Apr 2024 at 16:18, Colin Watson  wrote:
>
> On Fri, Apr 05, 2024 at 03:19:23PM +0100, Simon McVittie wrote:
> > I find that having the upstream source code in git (in the same form that
> > we use for the .orig.tar.*, so including Autotools noise, etc. if present,
> > but excluding any files that we exclude by repacking) is an extremely
> > useful tool, because it lets me trace the history of all of the files
> > that we are treating as source - whether hand-written or autogenerated -
> > if I want to do that. If we are concerned about defending against actively
> > malicious upstreams like the recent xz releases, then that's already a
> > difficult task and one where it's probably unrealistic to expect a high
> > success rate, but I think we are certainly not going to be able to achieve
> > it if we reject tools like git that could make it easier.
>
> Strongly agree.  For many many things I rely heavily on having the
> upstream source code available in the same working tree when doing any
> kind of archaeology across Debian package versions, which is something I
> do a lot.
>
> I would hate to see an attacker who relied on an overloaded maintainer
> push us into significantly less convenient development setups, thereby
> increasing the likelihood of overload.

+1

gbp workflow is great, easy to review and very productive



Re: Validating tarballs against git repositories

2024-04-05 Thread Colin Watson
On Fri, Apr 05, 2024 at 03:19:23PM +0100, Simon McVittie wrote:
> I find that having the upstream source code in git (in the same form that
> we use for the .orig.tar.*, so including Autotools noise, etc. if present,
> but excluding any files that we exclude by repacking) is an extremely
> useful tool, because it lets me trace the history of all of the files
> that we are treating as source - whether hand-written or autogenerated -
> if I want to do that. If we are concerned about defending against actively
> malicious upstreams like the recent xz releases, then that's already a
> difficult task and one where it's probably unrealistic to expect a high
> success rate, but I think we are certainly not going to be able to achieve
> it if we reject tools like git that could make it easier.

Strongly agree.  For many many things I rely heavily on having the
upstream source code available in the same working tree when doing any
kind of archaeology across Debian package versions, which is something I
do a lot.

I would hate to see an attacker who relied on an overloaded maintainer
push us into significantly less convenient development setups, thereby
increasing the likelihood of overload.

> In the "debian/ only" workflow, the Debian delta is exactly the contents
> of debian/. There is no redundancy, so every tree is in some sense a
> valid one (although of course sometimes patches will fail to apply, or
> whatever).

I'd argue that this, and the similar error case in patches-unapplied, is
symmetric with the error case in the patches-applied workflow (although
it's true that there is redundancy in _commits_ in the latter case).

-- 
Colin Watson (he/him)  [cjwat...@debian.org]



Re: Validating tarballs against git repositories

2024-04-05 Thread Simon McVittie
On Sat, 30 Mar 2024 at 14:16:21 +0100, Guillem Jover wrote:
> in my mind this incident reinforces my view that precisely storing
> more upstream stuff in git is the opposite of what we'd want, and
> makes reviewing even harder, given that in our context we are on a
> permanent fork against upstream, and if you include merge commits and
> similar, there's lots of places to hide stuff. In contrast storing
> only the packaging bits (debian/ dir alone) like pretty much every
> other downstream is doing with their packaging bits, makes for an
> obviously more manageable thing to review and not get drowned in,
> more so if we have to consider that next time perhaps the long-game
> gets played within Debian.

I'd like to push back against this, because I'm not convinced by this
reasoning, and I'd like to provide another point of view to consider.

I find that having the upstream source code in git (in the same form that
we use for the .orig.tar.*, so including Autotools noise, etc. if present,
but excluding any files that we exclude by repacking) is an extremely
useful tool, because it lets me trace the history of all of the files
that we are treating as source - whether hand-written or autogenerated -
if I want to do that. If we are concerned about defending against actively
malicious upstreams like the recent xz releases, then that's already a
difficult task and one where it's probably unrealistic to expect a high
success rate, but I think we are certainly not going to be able to achieve
it if we reject tools like git that could make it easier.

Am I correct to say that you are assuming here that we have a way to
verify the upstream source code out-of-band (therefore catching the xz
backdoor is out-of-scope here), and what you are aiming to detect here
is malicious changes that exist inside the Debian delta, more precisely
the dpkg-source 1.0 .diff.gz or 3.0 (quilt) .debian.tar.*? If that's your
threat model, then I don't think any of the modes that dgit can cope with
are actually noticeably more difficult than a debian/-only git repo.

As my example of a project that applies patches, I'm going to use
bubblewrap, which is a small project and has a long-standing patch that
changes an error message in bubblewrap.c to point to Debian-specific
documentation; this makes it convenient to tell at a glance whether
bubblewrap.c is the upstream version or the Debian version.

There are basically three dgit-compatible workflows, with some minor
adjustments around handling of .gitignore files:

- "patches applied" (git-debrebase, etc.):
  This is the workflow that proponents of dgit sometimes recommend,
  and dgit uses it as its canonicalized internal representation of
  the package.
  The git tree is the same as `dpkg-source -x`, with upstream source code
  included, debian/ also included, and any Debian delta to the upstream
  source pre-applied to those source files.
  In the case of bubblewrap, if we used this workflow, after you clone
  the project, bubblewrap.c would already have the Debian-specific error
  message.
  (dgit --split-view=never or dgit --quilt=dpm)

- "patches unapplied" (gbp pq, quilt, etc.):
  This is the workflow that many of the big teams use (at least Perl,
  Python, GNOME and systemd), and is the one that bubblewrap really uses.
  The git tree is the same as `dpkg-source -x --skip-patches`, with
  upstream source code included, and debian/ also included.
  Any Debian delta to the upstream source is represented in debian/patches
  but is *not* pre-applied to the source files: for example, in the case
  of bubblewrap, after you clone
  https://salsa.debian.org/debian/bubblewrap.git and view bubblewrap.c,
  it still has the upstream error message, not the Debian-specific one.
  (dgit --quilt=gbp or dgit --quilt=unapplied; I use the latter)

- debian/ only:
  This is what you're advocating above.
  The git tree contains only debian/. If there is Debian delta to the
  upstream source, it is in debian/patches/ as usual.
  (dgit --quilt=baredebian* family)

In the "patches applied" workflow, the Debian delta is something like
`git diff upstream/VERSION..debian/latest`, where upstream/VERSION must
match the .orig.tar.* and debian/latest is the packaging you are reviewing.
Not every tree is a valid one, because if you are using 3.0 (quilt),
then there is redundancy between the upstream source code and what's in
debian/patches: it is an error if the result of reverting all the patches
does not match the upstream source in the .orig.tar.*, modulo possibly
some accommodation for changes to **/.gitignore being accepted and ignored.
To detect malicious Debian changes in 3.0 (quilt) format, you would want
to either check for that error, or review both the direct diff and the
patches.
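
(As a rough illustration -- not what dgit actually runs -- that invariant
can be checked by hand along these lines, assuming a 3.0 (quilt)
patches-applied tree for a package foo 1.0, as produced by dpkg-source -x,
with the orig tarball alongside:)

  # revert all Debian patches, then compare against the upstream tarball
  cp -a foo-1.0 foo-1.0.check
  (cd foo-1.0.check && QUILT_PATCHES=debian/patches quilt pop -a &&
   rm -rf .pc debian)
  mkdir orig && tar -xf foo_1.0.orig.tar.xz -C orig --strip-components=1
  diff -ru orig foo-1.0.check   # any output means the invariant is broken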

Checking for that error is something that can be (and is) automated:
I don't use this workflow myself, but as far as I'm aware, dgit will
check that invariant, and it will fail to build your source package
if the invariant 

Re: Upstream dist tarball transparency (was Re: Validating tarballs against git repositories)

2024-04-05 Thread Guillem Jover
Hi!

On Wed, 2024-04-03 at 23:53:56 +0100, James Addison wrote:
> On Wed, 3 Apr 2024 19:36:33 +0200, Guillem wrote:
> > On Fri, 2024-03-29 at 23:29:01 -0700, Russ Allbery wrote:
> > > On 2024-03-29 22:41, Guillem Jover wrote:
> > > I think with my upstream hat on I'd rather ship a clear manifest (checked
> > > into Git) that tells distributions which files in the distribution tarball
> > > are build artifacts, and guarantee that if you delete all of those files,
> > > the remaining tree should be byte-for-byte identical with the
> > > corresponding signed Git tag.  (In other words, Guillem's suggestion.)
> > > Then I can continue to ship only one release artifact.
> >
> > I've been pondering about this and I think I might have come up with a
> > protocol that to me (!) seems safe, even against a malicious upstream. And
> > does not require two tarballs which as you say seems cumbersome, and makes
> > it harder to explain to users. But I'd like to run this through the list
> > in case I've missed something obvious.
> 
> Does this cater for situations where part of the preparation of a source
> tarball involves populating a directory with a list of filenames that
> correspond to hostnames known to the source preparer?
> 
> If that set of hostnames changes, then regardless of the same source
> VCS checkout being used, the resulting distribution source tarball could
> differ.

> Yes, it's a hypothetical example; but given time and attacker patience,
> someone is motivated to attempt any workaround.  In practice the
> difference could be a directory of hostnames or it could be a bitflag
> that is part of a macro that is only evaluated under various nested
> conditions.

I'm not sure whether I've perhaps misunderstood your scenario, but if
the distributed tarball contains things not present in the VCS, then
with this proposal those can then be easily removed, which means it
does not matter much if they differ between generations of the same tarball
(I mean it matters in the sense that it's an alarm sign, but it does
not matter in the sense that you can get at the same state as with a
clean VCS checkout).

The other part then is whether the remaining contents differ from what
is in the VCS.

If any of these trigger a difference, then that would require manual
review. That of course does not exempt one from reviewing the VCS, it
just potentially removes one avenue for smuggling artifacts.

> To take a leaf from the Reproducible Builds[1] project: to achieve a
> one-to-one mapping between a set of inputs and an output, you need to
> record all of the inputs; not only the source code, but also the build
> environment.
> 
> I'm not yet convinced that source-as-was-written to distributed-source-tarball
> is a problem that is any different to that of distributed-source-tarball to
> built-package.  Changes to tooling do, in reality, affect the output of
> build processes -- and that's usually good, because it allows for
> performance optimizations.  But it also necessitates the inclusion of the
> toolchain and environment to produce repeatable results.

In this case, the property you'd gain is that you do not need to trust
the system of the person preparing the distribution tarball, and can
then regenerate those outputs from (supposedly) good inputs from both
the distribution tarball, and _your_ (or the distribution) system
toolchain.

The distinction I see from the reproducible builds effort is that in
this case we can just discard some of the inputs and outputs and go
from the original sources.

(Not sure whether that clarifies or I've talked past you now. :)

Thanks,
Guillem



Re: Upstream dist tarball transparency (was Re: Validating tarballs against git repositories)

2024-04-03 Thread James Addison
Hi Guillem,

On Wed, 3 Apr 2024 19:36:33 +0200, Guillem wrote:
> On Fri, 2024-03-29 at 23:29:01 -0700, Russ Allbery wrote:
> > On 2024-03-29 22:41, Guillem Jover wrote:
> > I think with my upstream hat on I'd rather ship a clear manifest (checked
> > into Git) that tells distributions which files in the distribution tarball
> > are build artifacts, and guarantee that if you delete all of those files,
> > the remaining tree should be byte-for-byte identical with the
> > corresponding signed Git tag.  (In other words, Guillem's suggestion.)
> > Then I can continue to ship only one release artifact.
>
> I've been pondering about this and I think I might have come up with a
> protocol that to me (!) seems safe, even against a malicious upstream. And
> does not require two tarballs which as you say seems cumbersome, and makes
> it harder to explain to users. But I'd like to run this through the list
> in case I've missed something obvious.

Does this cater for situations where part of the preparation of a source
tarball involves populating a directory with a list of filenames that
correspond to hostnames known to the source preparer?

If that set of hostnames changes, then regardless of the same source
VCS checkout being used, the resulting distribution source tarball could
differ.

Yes, it's a hypothetical example; but given time and attacker patience,
someone is motivated to attempt any workaround.  In practice the
difference could be a directory of hostnames or it could be a bitflag
that is part of a macro that is only evaluated under various nested
conditions.

To take a leaf from the Reproducible Builds[1] project: to achieve a
one-to-one mapping between a set of inputs and an output, you need to
record all of the inputs; not only the source code, but also the build
environment.

I'm not yet convinced that source-as-was-written to distributed-source-tarball
is a problem that is any different to that of distributed-source-tarball to
built-package.  Changes to tooling do, in reality, affect the output of
build processes -- and that's usually good, because it allows for
performance optimizations.  But it also necessitates the inclusion of the
toolchain and environment to produce repeatable results.

Regards,
James

[1] - https://reproducible-builds.org/



Upstream dist tarball transparency (was Re: Validating tarballs against git repositories)

2024-04-03 Thread Guillem Jover
Hi!

On Fri, 2024-03-29 at 23:29:01 -0700, Russ Allbery wrote:
> On 2024-03-29 22:41, Guillem Jover wrote:
> > (For dpkg at least I'm pondering whether to play with switching to
> > doing something equivalent to «git archive» though, but see above, or
> > maybe generate two tarballs, a plain «git archive» and a portable one.)
> 
> Yeah, with my upstream hat on, I'm considering something similar, but I
> still believe I have users who want to compile from source on systems
> without current autotools, so I still need separate release tarballs.
> Having to generate multiple release artifacts (and document them, and
> explain to people which ones they want, etc.) is certainly doable, but I
> can't say that I'm all that thrilled about it.
> 
> I think with my upstream hat on I'd rather ship a clear manifest (checked
> into Git) that tells distributions which files in the distribution tarball
> are build artifacts, and guarantee that if you delete all of those files,
> the remaining tree should be byte-for-byte identical with the
> corresponding signed Git tag.  (In other words, Guillem's suggestion.)
> Then I can continue to ship only one release artifact.

I've been pondering about this and I think I might have come up with a
protocol that to me (!) seems safe, even against a malicious upstream. And
does not require two tarballs which as you say seems cumbersome, and makes
it harder to explain to users. But I'd like to run this through the list
in case I've missed something obvious.

I've implemented a prototype for dpkg, in the branch:

  https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/log/?h=next/dist-transparency

For context, for a long time dpkg dist tarballs have already shipped a
«.dist-version»; I think some GNU projects started to do something
similar but with a different name.

The relevant commits:

  https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?id=54a6ad9db3da335a40fed9020195864c4a87bdc1
  (Add .dist-vcs-id, in git main already)

At least for dpkg, if «make dist» is run from outside a tag, then
the version will include the commit and whether the working dir
was dirty, but from a tag, only the version is included and there's
no link to what commit that was pointing to at that time. This file
adds that link, regardless of the current commit. And prints it as
part of the configure summary.

  https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?id=1944a90d13c7c63592c438e550a212ab9e3aad76
  (Remove VCS specific files from dist)

Simplifies the comparisons.

  https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?id=39d181e60b3413c58a72056beec0a5a6f584cd92
  (Add .dist-vcs-url)

This adds a new file to track the upstream VCS URL, so that it can
be used from a deterministic place, for verification purposes.

  https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?id=b3d7e0f195bd69b4622121e78fce751ea76dc0bc
  (Add .dist-vcs-files)

This adds a new file with the list of files *in* the VCS, so that
we can get back to that clean state, even from a distributed
tarball, or from an extracted directory with built artifacts.

I also thought about listing the autogenerated files as Russ
mentions, but that seems error-prone and non-exhaustive, because
those might depend on the version of the autotools (or other build
system) used, and it would not cover artifacts produced during the
build phase, which could be used to smuggle things in.

This last commit lists the three operations that all this makes
possible:

  * list difference in file lists (should be none)
  * list difference in file contents (should be none)
  * resetting the directory into a state like the VCS (except
    for the VCS tracking/supporting files)

These operations are fairly generic; the one thing I could see
being "configurable" is the set of VCS files to exclude, maybe via
another file, but I've not thought about the consequences here.
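
(To make those three operations concrete, here is a rough sketch using
standard tools -- assuming .dist-vcs-files is a plain sorted list of
paths relative to the tree root; the actual commands live in the branch
above and may differ:)

  # 1. files in the dist tree but not in the VCS manifest (should be none)
  comm -13 <(sort .dist-vcs-files) \
           <(find . -type f ! -name '.dist-*' -printf '%P\n' | sort)

  # 2. content differences against a checkout of the recorded commit
  git clone "$(cat .dist-vcs-url)" /tmp/vcs &&
    git -C /tmp/vcs checkout "$(cat .dist-vcs-id)" &&
    diff -r --exclude=.git --exclude='.dist-*' /tmp/vcs .

  # 3. reset the dist tree to a VCS-like state by removing the extras
  comm -13 <(sort .dist-vcs-files) \
           <(find . -type f ! -name '.dist-*' -printf '%P\n' | sort) |
    xargs -r rm --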


I think this is safe (in the sense of detecting smuggled artifacts or
modifications in the dist tarball not present in the VCS, but certainly
not against modifications or artifacts smuggled in the VCS), because a
user that wants to verify any of this can make sure the URL is the
expected one, and everything else seems to follow from there; otherwise
you should get differences. (Thinking now, perhaps one of the checks
should be whether the expected tag or branch matches the commit id?)
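
(That extra check could be as simple as something along these lines,
assuming .dist-vcs-id holds the commit a release tag points at:)

  # does a tag at the recorded URL point at the recorded commit?
  git ls-remote --tags "$(cat .dist-vcs-url)" | grep -F "$(cat .dist-vcs-id)"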

This currently caters for a Debian native package or just handling
the upstream part with no packaging, but I don't think it would be much
work to integrate this into packaged upstreams (mostly excluding whatever
is in the debian.tar parts?), or even to use something like this from
an upstream that does not provide these files by adding equivalent files
or metadata in the packaging.

The only things that one would need to trust are the invocations to
perform those actions, which should *not* be part of 

Re: Validating tarballs against git repositories

2024-04-02 Thread Jeremy Stanley
On 2024-04-02 16:44:54 -0700 (-0700), Russ Allbery wrote:
[...]
> I think a shallow clone of depth 1 is sufficient, although that's not
> sufficient to get the correct version number from Git in all cases.
[...]

Some tools (python3-reno, for example) want to inspect the commits
and historical tags on branches, in order to do things like
assembling release notes documents. I don't know if any reno-using
projects packaged in Debian get release notes included, but if they
do then shallow clones would break that process. The python3-pbr
plugin also wants to look at commit messages on the current branch
since the most recent tag if its SemVer-based version-guessing kicks
in (typically if the current commit isn't tagged and the version
string hasn't been overridden with an envvar).
-- 
Jeremy Stanley




Re: Validating tarballs against git repositories

2024-04-02 Thread Jeremy Stanley
On 2024-04-03 00:33:47 +0200 (+0200), Thomas Goirand wrote:
[...]
> Also, sdists are *not* "upstream-created source tarballs". I
> consider them the binary form built for PyPI. Just like we have .debs,
> PyPI has tarballs and wheels, rather than how you describe them.
[...]

Upstream in OpenStack we believe we are distributing source tarballs
in sdist format. We produce and sign them, and serve them from
multiple locations. When you rebuild from a Git tag of an OpenStack
repository using a standard Python packaging ecosystem toolchain,
SetupTools is generating an ephemeral sdist on the fly in order to
set the metadata PBR and other components need.

I think it's fine that you'd rather rebuild the source distributions
from revision control than use the ones published by the OpenStack
community (we sign our tags with the same OpenPGP key as our
tarballs anyway), but it's merely your opinion that sdists are *not*
"upstream-created source tarballs" (an opinion *not* shared by
everyone).
-- 
Jeremy Stanley




Re: Validating tarballs against git repositories

2024-04-02 Thread Russ Allbery
Stefano Rivera  writes:

> Then you haven't come across any that are using this mechanism to
> install data, yet. You're only seeing the version determination.  You
> will, at some point, run into this problem. It's getting more popular.

Yup, we use this mechanism heavily at work, since it avoids having to
separately maintain a MANIFEST.in file.  Anything that's checked in to Git
in the appropriate trees ships with the module.  But this means that you
have to build the module from a Git repository, if you're not using the
artifact uploaded to PyPI (which expands out all the information derived
from Git).

If I correctly remember the failure mode, which I sometimes run into
during local development if I forget to git add new data files, the data
files are just not installed since nothing tells the build system they
should be included with the module.

I think a shallow clone of depth 1 is sufficient, although that's not
sufficient to get the correct version number from Git in all cases.
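
(i.e., something like the following, with a hypothetical URL and tag:)

  git clone --depth 1 --branch v1.2.3 https://example.org/project.git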

-- 
Russ Allbery (r...@debian.org)  



Re: Validating tarballs against git repositories

2024-04-02 Thread Stefano Rivera
Hi Thomas (2024.04.02_22:33:47_+)
> Anyways, on the 400+ packages that I maintain within the OpenStack team, I
> did come across some upstreams using setuptools-scm. In my experience, using
> the:
> 
> git archive --prefix=$(DEBPKGNAME)-$(VERSION)/ $(GIT_TAG) \
>   | xz >../$(DEBPKGNAME)_$(VERSION).orig.tar.xz
> 
> workflow out of an upstream always works, including for those that are using
> setuptools-scm.

Then you haven't come across any that are using this mechanism to
install data, yet. You're only seeing the version determination.
You will, at some point, run into this problem. It's getting more
popular.

Stefano

-- 
Stefano Rivera
  http://tumbleweed.org.za/
  +1 415 683 3272



Re: Validating tarballs against git repositories

2024-04-02 Thread Thomas Goirand

On 3/30/24 08:02, Gioele Barabucci wrote:
> For too many core packages there is an opaque "something happens on the
> Debian maintainer laptop" step that has no place in 2024.

Let's replace this by an opaque "something happens on the Salsa CI".


Cheers,

Thomas Goirand (zigo)



Re: Validating tarballs against git repositories

2024-04-02 Thread Thomas Goirand

On 4/1/24 00:32, Stefano Rivera wrote:
> So... for Python packages using setuptools-scm, we're pushed towards
> depending on upstream-created source tarballs (sdists), rather than
> upstream git archives, because we don't have the ".git" directory in our
> source packages.


Hi Stefano,

Thanks for jumping in this thread, though I'll have to disagree with 
you... with all due respect! :)


As you know, I'm by far the biggest uploader of Python modules in
Debian, due to the fact that I'm maintaining OpenStack. As you may know
as well, the reason I'm not maintaining all of that inside the Debian
Python Team is only that it's forbidden to use an upstream git tag
workflow in the team, and that I have to use pristine-tar. I very much
regret this fact, but live with it when I have to package within the
Debian Python Team.


Anyways, on the 400+ packages that I maintain within the OpenStack team,
I did come across some upstreams using setuptools-scm. In my experience,
using the:

git archive --prefix=$(DEBPKGNAME)-$(VERSION)/ $(GIT_TAG) \
| xz >../$(DEBPKGNAME)_$(VERSION).orig.tar.xz

workflow out of an upstream always works, including for those that are
using setuptools-scm. One simply needs to add the dependency, and
correctly set the pretend version, with something like this:


SETUPTOOLS_SCM_PRETEND_VERSION=$(shell dpkg-parsechangelog -SVersion \
| sed -e 's/^[[:digit:]]*://' \
-e 's/[-].*//' \
-e 's/~/.0/' \
-e 's/+dfsg1//' \
| head -n 1)

Because I'm dealing with the DPT packages as well, I can tell, and I
insist: the workflow of working with upstream Git is way nicer than
the pristine-tar / gbp import-orig one. The upstream tag workflow
*ALWAYS* works, and the tags often (even: always, in the case of
Python modules) contain fewer pre-built artifacts.


Also, sdists are *not* "upstream-created source tarballs". I consider
them the binary form built for PyPI. Just like we have .debs, PyPI has
tarballs and wheels, rather than how you describe them.


By the way, am I the only one who thinks it's so lame to use tarballs in
so many ways... Isn't this such a legacy retro-computing format? :)


Cheers,

Thomas Goirand (zigo)



Re: Validating tarballs against git repositories

2024-04-02 Thread Xiyue Deng
PICCA Frederic-Emmanuel 
writes:

> One missing piece for me in order to migrate to meson is the integration 
> between flymake and the autotools.
>
> https://www.emacswiki.org/emacs/FlyMake#h5o-7
>

There is an unofficial Meson LSP[1].  Maybe it can be configured with
Eglot or lsp-mode.

-- 
Xiyue Deng

[1] https://github.com/JCWasmx86/mesonlsp



Re: autoreconf --force not forcing (was Re: Validating tarballs against git repositories)

2024-04-02 Thread Colin Watson
On Tue, Apr 02, 2024 at 08:20:31PM +0300, Adrian Bunk wrote:
> On Tue, Apr 02, 2024 at 06:05:22PM +0100, Colin Watson wrote:
> > On Tue, Apr 02, 2024 at 06:57:20PM +0300, Adrian Bunk wrote:
> > > Does gnulib upstream support upgrading/downgrading the gnulib m4 files
> > > (like the one used in the xz backdoor) without upgrading/downgrading
> > > the corresponding gnulib C files?
> > 
> > Yes, although it takes a bit of effort.  You can use the --local-dir
> > option of gnulib-tool, which allows overriding individual Gnulib files
> > or modules or applying patches to Gnulib files; or you can define a
> > bootstrap_post_import_hook function in bootstrap.conf and do whatever
> > you want there.
> 
> I had the impression that what Guillem has in mind is more towards 
> adding dependencies on packages like gnulib and autoconf-archive
> to dh-autoreconf, which would then blindly overwrite all m4 files
> where a copy (same or older or newer) exists on the build system.

Oh, I see what you mean now.

IMO it would be a mistake to attempt to do this in such a way that it
upgraded only the m4 files and not the C files.  Changes made to gnulib
modules (which typically consist of some m4, some C, and some metadata)
often touch both m4 and C at once; it seems unwise to try to arbitrarily
split those up.

-- 
Colin Watson (he/him)  [cjwat...@debian.org]



Re: Validating tarballs against git repositories

2024-04-02 Thread Richard Laager

On 2024-04-02 11:05, Russ Allbery wrote:
> Meson honestly sounds great, and I personally love the idea of using a
> build system whose language is a bit more like Python, since I use that
> language professionally anyway.  (It would be nice if it *was* Python
> rather than yet another ad hoc language, but I also get why they may want
> to restrict it.)


If Python is what you want, you could use waf, but there is one big 
downside... You have to repack the upstream tarball: 
https://wiki.debian.org/UnpackWaf


--
Richard





Re: Validating tarballs against git repositories

2024-04-02 Thread PICCA Frederic-Emmanuel
One missing piece for me in order to migrate to meson is the integration 
between flymake and the autotools.

https://www.emacswiki.org/emacs/FlyMake#h5o-7



Re: autoreconf --force not forcing (was Re: Validating tarballs against git repositories)

2024-04-02 Thread Adrian Bunk
On Tue, Apr 02, 2024 at 06:05:22PM +0100, Colin Watson wrote:
> On Tue, Apr 02, 2024 at 06:57:20PM +0300, Adrian Bunk wrote:
> > On Mon, Apr 01, 2024 at 08:07:27PM +0200, Guillem Jover wrote:
> > > On Sat, 2024-03-30 at 14:16:21 +0100, Guillem Jover wrote:
> > > > This seems like a serious bug in autoreconf, but I've not checked if
> > > > this has been brought up upstream, and whether they consider it's
> > > > working as intended. I expect the serial to be used only when not
> > > > in --force mode though. :/
> > >...
> > > We might have to perform a mass rebuild to check if there could be
> > > fallout out of a true --force behavior change I guess.
> > 
> > Does gnulib upstream support upgrading/downgrading the gnulib m4 files
> > (like the one used in the xz backdoor) without upgrading/downgrading
> > the corresponding gnulib C files?
> 
> Yes, although it takes a bit of effort.  You can use the --local-dir
> option of gnulib-tool, which allows overriding individual Gnulib files
> or modules or applying patches to Gnulib files; or you can define a
> bootstrap_post_import_hook function in bootstrap.conf and do whatever
> you want there.

I had the impression that what Guillem has in mind is more towards 
adding dependencies on packages like gnulib and autoconf-archive
to dh-autoreconf, which would then blindly overwrite all m4 files
where a copy (same or older or newer) exists on the build system.

cu
Adrian



Re: autoreconf --force not forcing (was Re: Validating tarballs against git repositories)

2024-04-02 Thread Colin Watson
On Tue, Apr 02, 2024 at 06:57:20PM +0300, Adrian Bunk wrote:
> On Mon, Apr 01, 2024 at 08:07:27PM +0200, Guillem Jover wrote:
> > On Sat, 2024-03-30 at 14:16:21 +0100, Guillem Jover wrote:
> > > This seems like a serious bug in autoreconf, but I've not checked if
> > > this has been brought up upstream, and whether they consider it's
> > > working as intended. I expect the serial to be used only when not
> > > in --force mode though. :/
> >...
> > We might have to perform a mass rebuild to check if there could be
> > fallout out of a true --force behavior change I guess.
> 
> Does gnulib upstream support upgrading/downgrading the gnulib m4 files
> (like the one used in the xz backdoor) without upgrading/downgrading
> the corresponding gnulib C files?

Yes, although it takes a bit of effort.  You can use the --local-dir
option of gnulib-tool, which allows overriding individual Gnulib files
or modules or applying patches to Gnulib files; or you can define a
bootstrap_post_import_hook function in bootstrap.conf and do whatever
you want there.

-- 
Colin Watson (he/him)  [cjwat...@debian.org]



Re: autoreconf --force not forcing (was Re: Validating tarballs against git repositories)

2024-04-02 Thread Adrian Bunk
On Mon, Apr 01, 2024 at 08:07:27PM +0200, Guillem Jover wrote:
>...
> On Sat, 2024-03-30 at 14:16:21 +0100, Guillem Jover wrote:
>...
> > This seems like a serious bug in autoreconf, but I've not checked if
> > this has been brought up upstream, and whether they consider it's
> > working as intended. I expect the serial to be used only when not
> > in --force mode though. :/
>...
> We might have to perform a mass rebuild to check if there could be
> fallout out of a true --force behavior change I guess.

Does gnulib upstream support upgrading/downgrading the gnulib m4 files
(like the one used in the xz backdoor) without upgrading/downgrading
the corresponding gnulib C files?

> Thanks,
> Guillem

cu
Adrian



Re: Validating tarballs against git repositories

2024-04-02 Thread Russ Allbery
Adrian Bunk  writes:
> On Mon, Apr 01, 2024 at 11:17:21AM -0400, Theodore Ts'o wrote:

>> Yeah, that too.  There are still people building e2fsprogs on AIX,
>> Solaris, and other legacy Unix systems, and I'd hate to break them, or
>> require a lot of pain for people who are building on MacPorts, et. al.
>>...

> Everything you mention should already be supported by Meson.

Meson honestly sounds great, and I personally love the idea of using a
build system whose language is a bit more like Python, since I use that
language professionally anyway.  (It would be nice if it *was* Python
rather than yet another ad hoc language, but I also get why they may want
to restrict it.)

The prospect of converting 25 years of portability code from M4 into a new
language is daunting, however.  For folks new to this ecosystem, what
resources are already available?  Are there large libraries of tests
already out there akin to gnulib and the Autoconf Archive?  Is there a
really good "porting from Autotools" guide for Meson that goes beyond the
very cursory guide in the Meson documentation?

The problem with this sort of migration is that it is an immense amount of
work just to get back to where you started.  I look at the amount of
effort and start thinking things like "well, if I'm going to rewrite a
bunch of things anyway, maybe I should just rewrite the software in Rust
instead."

-- 
Russ Allbery (r...@debian.org)  



Re: Validating tarballs against git repositories

2024-04-02 Thread Adrian Bunk
On Mon, Apr 01, 2024 at 11:17:21AM -0400, Theodore Ts'o wrote:
> On Sat, Mar 30, 2024 at 08:44:36AM -0700, Russ Allbery wrote:
>...
> > Yes, perhaps it's time to switch to a different build system, although one
> > of the reasons I've personally been putting this off is that I do a lot of
> > feature probing for library APIs that have changed over time, and I'm not
> > sure how one does that in the non-Autoconf build systems.  Meson's Porting
> > from Autotools [1] page, for example, doesn't seem to address this use
> > case at all.
> 
> The other problem is that many of the other build systems are much
> slower than autoconf/makefile.  (Note: I don't use libtool, because
> it's so d*mn slow.)  Or building the alternate system might require a
> major bootstrapping phase, or requires downloading a JVM, etc.

The main selling point of Meson has been that it is a lot faster
than autotools.

> > Maybe the answer is "you should give up on portability to older systems as
> > the cost of having a cleaner build system," and that's not an entirely
> > unreasonable thing to say, but that's going to be a hard sell for a lot of
> > upstreams that care immensely about this.
> 
> Yeah, that too.  There are still people building e2fsprogs on AIX,
> Solaris, and other legacy Unix systems, and I'd hate to break them, or
> require a lot of pain for people who are building on MacPorts, et. al.
>...

Everything you mention should already be supported by Meson.

>   - Ted

cu
Adrian



Re: Validating tarballs against git repositories

2024-04-01 Thread Gioele Barabucci

On 31/03/24 08:59, Sven Joachim wrote:
> The coreutils bootstrap script fetches files over the network, so it is
> not possible to build the Debian package from upstream git tags.  At the
> very least it would lack any translations, and there is also the
> problem of the gnulib submodule.

Aren't these the same kinds of problem that affect go and rust packages?

--
Gioele Barabucci



autoreconf --force not forcing (was Re: Validating tarballs against git repositories)

2024-04-01 Thread Guillem Jover
Hi!

On Sat, 2024-03-30 at 14:16:21 +0100, Guillem Jover wrote:
> Let's try to go in detail on how this was done on the build system
> side (I'm doing this right now, as previously only had skimmed over
> the process).
> 
> The build system hook was planted in the tarball by adding a modified
> m4/build-to-host.m4 file. This file is originally from gnulib (but
> gettext would usually embed it if it required it). The macros contained
> within are used by m4/gettext.m4 coming from gettext.
> 
> So to start with, this dependency (the AM_GNU_GETTEXT macro uses
> gl_BUILD_TO_HOST) is only present with newer gettext versions. The
> tarball was autoreconf'ed with gettext 0.22.4, Debian has gettext 0.21,
> which does not pull that dependency in. In that case if gettext.m4
> would get modified in this build now, then the hook would be inert,
> but once we update to a newer gettext then it would get activated
> again.
> 
> The m4/build-to-host.m4 file in addition to hooking the payload into
> the build system, also got its serial number bumped from 3 to 30.
> 
> And the bigger issue is that «autoreconf -f -i» does not even refresh
> the files (as you'd expect from the --force), if the .m4 serial is higher.
> So in Debian currently, the gettext.m4 in the tarball does not get
> refreshed (still pulling in the malicious build-to-host.m4, which
> would not happen with the gettext version from Debian), and if we
> updated to a newer gettext then it would not update build-to-host.m4
> anyway due to its bumped serial.
> 
> This seems like a serious bug in autoreconf, but I've not checked if
> this has been brought up upstream, and whether they consider it's
> working as intended. I expect the serial to be used only when not
> in --force mode though. :/

I filed a report to autoconf upstream at:

  https://lists.gnu.org/archive/html/bug-autoconf/2024-03/threads.html

the discussion now continues on the next month archive at:

  https://lists.gnu.org/archive/html/bug-autoconf/2024-04/msg3.html

We might have to perform a mass rebuild to check if there could be
fallout out of a true --force behavior change I guess.
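
(For anyone wanting to check their own trees: the serial is the header
comment of each m4 file, so comparing the copies shipped in a tarball
against the system copies is a quick smoke test -- a sketch:)

  # serials of m4 files shipped in the tarball vs. the system copies;
  # a shipped serial far ahead of the system one (3 vs 30 in the xz case)
  # is exactly the pattern that defeats «autoreconf -f -i»
  grep -m1 serial m4/*.m4
  grep -m1 serial /usr/share/aclocal/*.m4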

Thanks,
Guillem



Re: Validating tarballs against git repositories

2024-04-01 Thread Bastian Blank
On Mon, Apr 01, 2024 at 06:36:30PM +0200, Vincent Bernat wrote:
> On 2024-04-01 12:44, Bastian Blank wrote:
> > So in the end you still need to manually review all the stuff that the
> > tarball contains extra to the git.  And for that I don't see that it
> > actually gives some helping hands and makes it easier.
> > 
> > So I really don't see how this makes the problem in hand any better.
> > Again the workload of review is on the person doing the job.  Aka we do
> > fragile manual work instead of possibly failing automatic work.
> 
> I think that if Debian was using git instead of the generated tarball, this
> part of the backdoor would have just been included in the git repository as
> well. If we were able to magically switch everything to git (and we won't,
> we are not even able to agree on simpler stuff), I don't think it would have
> prevented the attack.

Nothing prevents such an attack.  Prevention would be a 100% fix, which
cannot exist.  However, what we can do is make it harder to pull off.

If they had been forced to commit all the activation code into the repo,
it would have been directly visible for everyone.  But instead, they
choose to only ship it in the tarballs.

That's why I asked if this would make it better, by removing this manual
review task from the maintainer.

Bastian

-- 
I object to intellect without discipline;  I object to power without
constructive purpose.
-- Spock, "The Squire of Gothos", stardate 2124.5



Re: Validating tarballs against git repositories

2024-04-01 Thread Theodore Ts'o
On Mon, Apr 01, 2024 at 06:36:30PM +0200, Vincent Bernat wrote:
>
> I think that if Debian was using git instead of the generated tarball, this
> part of the backdoor would have just been included in the git repository as
> well. If we were able to magically switch everything to git (and we won't,
> we are not even able to agree on simpler stuff), I don't think it would have
> prevented the attack.

I'm not sure how much it would have helped, but I think the theory
behind eliminating the gap between the release tarball and the git
tree is that in 2024, more developers are likely to be building and
testing against the git tree, and so the backdoor might have been
noticed sooner.  After all, Jia Tan decided it was worthwhile to
check in 99% of the exploit in git, but to only enable it when it was
built from the release tarball.  If the exploit was always active when
built from the git tree, perhaps someone might have noticed before
Debian uploaded the trojan'ed binary package to unstable, and then a
week or so later, had it promoted to testing.

I'm not sure how likely that would be for the specific case of
xz-utils, since it appears the number of developers (not just
Maintainers) was extremely small, but presumably Jia Tan decided to do
things in that way in the hopes of making less likely that the malware
would be noticed.

- Ted



Re: Validating tarballs against git repositories

2024-04-01 Thread Vincent Bernat

On 2024-04-01 12:44, Bastian Blank wrote:
> So in the end you still need to manually review all the stuff that the
> tarball contains extra to the git.  And for that I don't see that it
> actually gives some helping hands and makes it easier.
>
> So I really don't see how this makes the problem in hand any better.
> Again the workload of review is on the person doing the job.  Aka we do
> fragile manual work instead of possibly failing automatic work.


I think that if Debian was using git instead of the generated tarball, 
this part of the backdoor would have just been included in the git 
repository as well. If we were able to magically switch everything to 
git (and we won't, we are not even able to agree on simpler stuff), I 
don't think it would have prevented the attack.




Re: Validating tarballs against git repositories

2024-04-01 Thread Colin Watson
On Mon, Apr 01, 2024 at 05:24:45PM +0200, Simon Josefsson wrote:
> Colin Watson  writes:
> > On Mon, Apr 01, 2024 at 11:33:06AM +0200, Simon Josefsson wrote:
> >> Running ./bootstrap in a tarball may lead to different results than the
> >> maintainer running ./bootstrap in pristine git.  It is the same problem
> >> as running 'autoreconf -fvi' in a tarball does not necessarily lead to
> >> the same result as the maintainer running 'autoreconf -fvi' from
> >> pristine git.  The difference is what is pulled in from the system
> >> environment.  Neither tool was designed to be run from within a tarball,
> >> so this is just bad practice that never worked reliably, and without a
> >> lot of complexity it will likely not become reliable either.
> >
> > The practice of running "autoreconf -fi" or similar via dh-autoreconf
> > has worked extremely well at scale in Debian.  I'm sure there are
> > complex edge cases where it's caused problems, but it's far from being a
> > disaster area.
> 
> Agreed.  I'm saying it doesn't fix the problem that I perceive some
> people appear to believe it does, i.e., that running 'autoreconf -fi'
> solves the re-bootstrapping problem.

Indeed - I've been pointing this out to people pretty much since the
xz-utils backdoor was discovered.

> >> I have suggested before that upstream's (myself included) should publish
> >> PGP-signed *-src.tar.gz tarballs that contain the entire pristine git
> >> checkout including submodules,
> >
> > A while back I contributed support to Gnulib's bootstrap script to allow
> > pinning particular commits without using submodules.  I would recommend
> > this mode; submodules have very strange UI.
> 
> I never liked git submodules generally, so I would be happy to work on
> getting that to be supported -- do you have pointers for earlier works
> here?

https://lists.gnu.org/archive/html/bug-gnulib/2018-04/msg00029.html and
thread - it's been in gnulib for some years.  (I think you may have
misread me as saying that I'd tried to contribute this and that it never
made it, or something like that?)

> What is necessary, I think, is having something like this in
> bootstrap.conf:
> 
> gnulib_commit_id = 123abc567...

This is what I implemented, except I spelled it GNULIB_REVISION.  Then
see e.g.
https://gitlab.com/libpipeline/libpipeline/-/blob/main/bootstrap.conf.
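
(so the relevant bootstrap.conf line looks something like this -- the
hash is a placeholder:)

  # pin the gnulib commit that ./bootstrap checks out
  GNULIB_REVISION=123abc567...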

> > As I noted in a comment on your blog, I think there is a case to be made
> > for .po files being committed to upstream git, and I'm not fond of the
> > practice of pulling them in only at bootstrap time (although I can
> > understand why that's come to be popular as a result of limited
> > maintainer time).  I have several reasons to believe this:
> 
> Those are all good arguments, but it still feels backwards to put these
> files into git.  It felt so good to externalize all the translation
> churn outside of my git (or then, CVS...) repositories many years ago.
> 
> I would prefer to maintain a po/SHA256SUMS in git and continue to
> download translations but have some mechanism to refuse to continue if
> the hashes differ.

I wonder if a middle ground would be automated commits of translations.
I don't think that's as robust, but a number of projects do it (e.g.
d-i) and at least it's amenable to having translations go through CI
rather than just being YOLOed straight into release tarballs.

-- 
Colin Watson (he/him)  [cjwat...@debian.org]



Re: Validating tarballs against git repositories

2024-04-01 Thread Simon Josefsson
Colin Watson  writes:

> On Mon, Apr 01, 2024 at 11:33:06AM +0200, Simon Josefsson wrote:
>> Running ./bootstrap in a tarball may lead to different results than the
>> maintainer running ./bootstrap in pristine git.  It is the same problem
>> as running 'autoreconf -fvi' in a tarball does not necessarily lead to
>> the same result as the maintainer running 'autoreconf -fvi' from
>> pristine git.  The difference is what is pulled in from the system
>> environment.  Neither tool was designed to be run from within a tarball,
>> so this is just bad practice that never worked reliably, and without a
>> lot of complexity it will likely not become reliable either.
>
> The practice of running "autoreconf -fi" or similar via dh-autoreconf
> has worked extremely well at scale in Debian.  I'm sure there are
> complex edge cases where it's caused problems, but it's far from being a
> disaster area.

Agreed.  I'm saying it doesn't fix the problem that I perceive some
people appear to believe it does, i.e., that running 'autoreconf -fi'
solves the re-bootstrapping problem.  Only some files get re-generated,
such as the
./configure script, which is good, but not all files.  It wouldn't have
solved the xz case: build-to-host.m4 wouldn't have been re-generated.

With a *-src.tar.gz approach [1], the build-to-host.m4 file shouldn't
even be part of the tarball.  That would be easier to detect during an
audit of the file list compared to the git repository, rather than waiting
for code review of file content (which usually only happens when
debugging some real-world problem).

[1] 
https://blog.josefsson.org/2024/04/01/towards-reproducible-minimal-source-code-tarballs-please-welcome-src-tar-gz/

> I don't think running ./bootstrap can be generalized as easily as
> running autoreconf can, and it's definitely going to be tough to apply
> to all packages that use gnulib; but I think the blanket statement that
> it's bad practice is painting with too broad a brush.  For the packages
> where I've applied it so far (most of which I'm upstream for,
> admittedly), it's fine.

I'm not saying autoreconf -fi is bad practice; I'm saying it is
incomplete and leads to a feeling of having solved the re-bootstrapping
problem that isn't backed by facts.

>> I have suggested before that upstream's (myself included) should publish
>> PGP-signed *-src.tar.gz tarballs that contain the entire pristine git
>> checkout including submodules,
>
> A while back I contributed support to Gnulib's bootstrap script to allow
> pinning particular commits without using submodules.  I would recommend
> this mode; submodules have very strange UI.

I never liked git submodules generally, so I would be happy to work on
getting that to be supported -- do you have pointers for earlier works
here?

What is necessary, I think, is having something like this in
bootstrap.conf:

gnulib_commit_id = 123abc567...

and it would then use the external git repository pointed to by
--gnulib-refdir and locate that commit, and extract the gnulib files
from that gnulib commit.  And refuse to continue if it can't find that
particular commit.
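
A rough sketch of what that could look like inside ./bootstrap (all
names here are hypothetical; $GNULIB_REFDIR stands for the value of
--gnulib-refdir and $gnulib_commit_id for the setting above):

  # Refuse to continue if the pinned commit is not present in the
  # reference clone; otherwise extract exactly that tree.
  if ! git -C "$GNULIB_REFDIR" cat-file -e "$gnulib_commit_id^{commit}"
  then
    echo "error: pinned gnulib commit not found in $GNULIB_REFDIR" >&2
    exit 1
  fi
  mkdir -p gnulib
  git -C "$GNULIB_REFDIR" archive "$gnulib_commit_id" | tar -x -C gnulib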

This is essentially the same as a git submodule -- encoding the gnulib
commit to use in the project's own git history -- but without the bad
git submodule user experience.

I use different approaches to gnulib in different projects.  In OATH Toolkit I
still put all gnulib-generated content in git because running
./bootstrap otherwise used to take several minutes.  In most projects I
have given up and use git submodules.  In some I rely on running
gnulib-tool from git, and the exact gnulib git commit to use is only
whatever I happened to have checked out on my development machine.

>> *.po translations,
>
> As I noted in a comment on your blog, I think there is a case to be made
> for .po files being committed to upstream git, and I'm not fond of the
> practice of pulling them in only at bootstrap time (although I can
> understand why that's come to be popular as a result of limited
> maintainer time).  I have several reasons to believe this:

Those are all good arguments, but it still feels backwards to put these
files into git.  It felt so good to externalize all the translation
churn outside of my git (or then, CVS...) repositories many years ago.

I would prefer to maintain a po/SHA256SUMS in git and continue to
download translations but have some mechanism to refuse to continue if
the hashes differ.
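
Something along these lines, perhaps (a minimal sketch, assuming the
downloaded *.po files land in po/ and po/SHA256SUMS is tracked in git):

  # Refuse to continue if any downloaded translation does not match
  # the hash recorded in git.
  (cd po && sha256sum --check --strict SHA256SUMS) || {
    echo "error: translations do not match po/SHA256SUMS" >&2
    exit 1
  }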

/Simon


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-04-01 Thread Theodore Ts'o
On Sat, Mar 30, 2024 at 08:44:36AM -0700, Russ Allbery wrote:
> Luca Boccassi  writes:
> 
> > In the end, massaged tarballs were needed to avoid rerunning autoconfery
> > on twelve thousands different proprietary and non-proprietary Unix
> > variants, back in the day. In 2024, we do dh_autoreconf by default so
> > it's all moot anyway.
> 
> This is true from Debian's perspective.  This is much less obviously true
> from upstream's perspective, and there are some advantages to aligning
> with upstream about what constitutes the release artifact.

My upstream perspective is that I've been burned repeatedly by
incompatible version changes in autotools programs which cause my
configure.{in,ac} file to no longer create a working configure script,
or which cause subtle breakages.  So my practice is to use autoconf
on my Debian testing development system before checking in the
configure.ac and configure files --- but I ship the generated files
and I don't tell people to run autoreconf before running ./configure.
And if things break after they run autoreconf, I tell them, "you ran
autoreconf; you get to keep both pieces".

And there *have* been times when autoconf has gotten updated in Debian
testing, and the resulting configure script has broken, at which point
I curse at autotools, and fix the configure.ac and/or aclocal.m4
files, etc., and *then* check in the generated configure file and
autotool source files.

> Yes, perhaps it's time to switch to a different build system, although one
> of the reasons I've personally been putting this off is that I do a lot of
> feature probing for library APIs that have changed over time, and I'm not
> sure how one does that in the non-Autoconf build systems.  Meson's Porting
> from Autotools [1] page, for example, doesn't seem to address this use
> case at all.

The other problem is that many of the other build systems are much
slower than autoconf/makefile.  (Note: I don't use libtool, because
it's so d*mn slow.)  Or building the alternate system might require a
major bootstrapping phase, or require downloading a JVM, etc.

> Maybe the answer is "you should give up on portability to older systems as
> the cost of having a cleaner build system," and that's not an entirely
> unreasonable thing to say, but that's going to be a hard sell for a lot of
> upstreams that care immensely about this.

Yeah, that too.  There are still people building e2fsprogs on AIX,
Solaris, and other legacy Unix systems, and I'd hate to break them, or
require a lot of pain for people who are building on MacPorts, et al.
It hasn't been *all* that long since I started requiring C99
compilers.

That being said, if someone is worried about a Jia Tan-style attack on
e2fsprogs: first of all, you can verify that configure corresponds to
the autoconf in Debian testing at the time the archive was generated,
and the officially released tar file is generated via:

git archive --prefix=e2fsprogs-${ver}/ ${commit} | gzip -9n > $fn

... and the release tarballs are also in the pristine-tar branch of
e2fsprogs.  So even if the kernel.org (preferred) and sourceforge.net
(legacy) servers for the e2fsprogs tar files completely implode, and
you only have access to the git repo, you can still get the original
e2fsprogs tar files using pristine-tar.
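
For instance (a sketch; run from a clone that carries the pristine-tar
branch, with ${ver} the release in question):

  pristine-tar checkout e2fsprogs-${ver}.tar.gz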

- Ted



Re: Validating tarballs against git repositories

2024-04-01 Thread Andrey Rakhmatullin
On Mon, Apr 01, 2024 at 04:10:55PM +0200, Alexandre Detiste wrote:
> On Mon, 1 Apr 2024 at 15:49, Colin Watson wrote:
> >
> > The practice of running "autoreconf -fi" or similar via dh-autoreconf
> > has worked extremely well at scale in Debian.  I'm sure there are
> > complex edge cases where it's caused problems, but it's far from being a
> > disaster area.
> 
> It's pretty uncommon, only old stuff.
> 
> That could be monitored (via lintian ?),
> anything with "--without autoreconf"
> or "overides dh_autoreconf".
Or compat < 10.

-- 
WBR, wRAR


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-04-01 Thread Alexandre Detiste
On Mon, 1 Apr 2024 at 15:49, Colin Watson wrote:
>
> The practice of running "autoreconf -fi" or similar via dh-autoreconf
> has worked extremely well at scale in Debian.  I'm sure there are
> complex edge cases where it's caused problems, but it's far from being a
> disaster area.

It's pretty uncommon, only old stuff.

That could be monitored (via lintian ?),
anything with "--without autoreconf"
or "overides dh_autoreconf".

grep autoreconf */debian/rules
enigma/debian/rules:override_dh_autoreconf:
enigma/debian/rules:#dh_autoreconf
geki2/debian/rules: dh $@ --no-parallel --without autoreconf
kanatest/debian/rules:execute_before_dh_autoreconf:
lincity-ng/debian/rules:dh $@ --without autoreconf



Re: Validating tarballs against git repositories

2024-04-01 Thread Colin Watson
On Mon, Apr 01, 2024 at 11:33:06AM +0200, Simon Josefsson wrote:
> Running ./bootstrap in a tarball may lead to different results than the
> maintainer running ./bootstrap in pristine git.  It is the same problem
> as running 'autoreconf -fvi' in a tarball does not necessarily lead to
> the same result as the maintainer running 'autoreconf -fvi' from
> pristine git.  The difference is what is pulled in from the system
> environment.  Neither tool was designed to be run from within a tarball,
> so this is just bad practice that never worked reliably, and without a
> lot of complexity it will likely not become reliable either.

The practice of running "autoreconf -fi" or similar via dh-autoreconf
has worked extremely well at scale in Debian.  I'm sure there are
complex edge cases where it's caused problems, but it's far from being a
disaster area.

I don't think running ./bootstrap can be generalized as easily as
running autoreconf can, and it's definitely going to be tough to apply
to all packages that use gnulib; but I think the blanket statement that
it's bad practice is painting with too broad a brush.  For the packages
where I've applied it so far (most of which I'm upstream for,
admittedly), it's fine.

> I have suggested before that upstreams (myself included) should publish
> PGP-signed *-src.tar.gz tarballs that contain the entire pristine git
> checkout including submodules,

A while back I contributed support to Gnulib's bootstrap script to allow
pinning particular commits without using submodules.  I would recommend
this mode; submodules have very strange UI.

> *.po translations,

As I noted in a comment on your blog, I think there is a case to be made
for .po files being committed to upstream git, and I'm not fond of the
practice of pulling them in only at bootstrap time (although I can
understand why that's come to be popular as a result of limited
maintainer time).  I have several reasons to believe this:

 * There at least used to be some edge cases where format string
   mismatches weren't caught by the gettext toolchain.  I've forgotten
   the details, but I remember running into one case where this turned
   into at least a translation-induced crash if not a security
   vulnerability.

 * Like just about everyone, translators make mistakes.  Since they're
   often working with technical text across a wide variety of domains,
   my experience is that they're more likely to make mistakes when
   dealing with package-specific terms, and these are often left
   untranslated, which means that the maintainer is in a much better
   position to catch those mistakes than you might think.  I don't want
   to cast shade on anyone in particular, but I find that I catch
   mistakes in a significant fraction of man-db translation updates just
   by looking at the diff without having to understand the target
   language; for example, if I add an item to a list and also make some
   other nearby textual changes, it's quite common for translators to
   miss adding the item to the list, and I can spot that sort of thing
   almost regardless of the language.

 * Actively malicious translations are rare, but they do happen.
   
https://discourse.ubuntu.com/t/announcement-ubuntu-desktop-23-10-release-image-translation-incident-now-resolved/39365
   was a recent case of this.  I seem to remember that when I tracked
   down the original files it was fairly obvious that the "translations"
   had nothing to do with the source strings even without understanding
   Ukrainian.

 * If you're faced with a user report containing translated messages,
   then it's much easier to figure out what's going on if you can just
   look for them in git.  I've found this to be a source of frustration
   on several occasions when dealing with packages where ./bootstrap
   pulls in translations.

-- 
Colin Watson (he/him)  [cjwat...@debian.org]



Re: Validating tarballs against git repositories

2024-04-01 Thread Bastian Blank
On Mon, Apr 01, 2024 at 12:03:48PM +0200, Bastian Blank wrote:
> On Mon, Apr 01, 2024 at 02:31:47AM +0200, gregor herrmann wrote:
> > That's not mutually exclusive. When adding an additional git remote
> > and using gbp-import-orig's --upstream-vcs-tag you get the best of
> > both worlds.
> And this will error out if there are unexpected changes in the tarball?
> How will it be able to detect those?

Okay, I looked into what it does.  It just adds another parent to the
commit with the import of the tar.  It does nothing else with this
information.

So in the end you still need to manually review all the stuff that the
tarball contains extra to the git.  And for that I don't see that it
actually gives some helping hands and makes it easier.

So I really don't see how this makes the problem in hand any better.
Again the workload of review is on the person doing the job.  Aka we do
fragile manual work instead of possibly failing automatic work.

Bastian

-- 
Women professionals do tend to over-compensate.
-- Dr. Elizabeth Dehaver, "Where No Man Has Gone Before",
   stardate 1312.9.



Re: Validating tarballs against git repositories

2024-04-01 Thread Bastian Blank
On Mon, Apr 01, 2024 at 02:31:47AM +0200, gregor herrmann wrote:
> That's not mutually exclusive. When adding an additional git remote
> and using gbp-import-orig's --upstream-vcs-tag you get the best of
> both worlds.

And this will error out if there are unexpected changes in the tarball?
How will it be able to detect those?

Bastian

-- 
I've already got a female to worry about.  Her name is the Enterprise.
-- Kirk, "The Corbomite Maneuver", stardate 1514.0



Re: Validating tarballs against git repositories

2024-04-01 Thread Simon Josefsson
"G. Branden Robinson"  writes:

> At 2024-03-31T22:32:49+, Stefano Rivera wrote:
>> Upstreams would probably prefer that we used git repositories
>> *directly* as source artifacts, but that comes with a whole other can
>> of worms...
>
> Speaking from my upstream groff perspective, I wouldn't _prefer_ that.
>
> The distribution archives get build-testing on a much wider variety of
> systems, thanks to people on the groff@ and platform-testers@gnu mailing
> lists that help out when a release candidate is announced.  They have
> access to platforms more exotic than I and a few other bleeding-edge
> HEAD mavens do.  This practice tangibly improved the quality of the
> groff 1.23.0 release, especially on surviving proprietary Unix systems.
>
> Building from the repo, or using the bootstrap script--which Colin
> Watson just today ensured will be in future distribution archives--is
> fine.[1]  I'm glad some people build the project that way.  But I think
> that procedure serves an audience that is distinguishable in some ways.

Running ./bootstrap in a tarball may lead to different results than the
maintainer running ./bootstrap in pristine git.  It is the same problem
as running 'autoreconf -fvi' in a tarball does not necessarily lead to
the same result as the maintainer running 'autoreconf -fvi' from
pristine git.  The difference is what is pulled in from the system
environment.  Neither tool was designed to be run from within a tarball,
so this is just bad practice that never worked reliably, and without a
lot of complexity it will likely not become reliable either.

I have suggested before that upstreams (myself included) should publish
PGP-signed *-src.tar.gz tarballs that contain the entire pristine git
checkout including submodules, *.po translations, and whatever else is
required to actually build the project that is normally pulled in from
external places (autoconf archive macros?).  This *-src.tar.gz tarball
should be possible to ./bootstrap and that would be the intended way to
build it for people who care about vendored files.  Thoughts?  Perhaps I
should formalize this proposal a bit more.
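
As a rough sketch of producing such an artifact (hypothetical URL and
naming; plain 'git archive' is not enough, since it omits submodules,
and details like whether to keep the .git directories would need to be
part of the formalized proposal):

  # Clone the pristine tree including submodules, vendor the
  # translations, then pack the whole thing; pinning --mtime as well
  # would be needed for full reproducibility.
  git clone --recurse-submodules https://example.org/project.git project-1.0
  # ...fetch *.po files and other external inputs into the tree here...
  tar --sort=name --owner=0 --group=0 --numeric-owner \
      -czf project-1.0-src.tar.gz project-1.0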

/Simon

> Regards,
> Branden
>
> [1] 
> https://git.savannah.gnu.org/cgit/groff.git/commit/?id=822fef56e9ab7cbe69337b045f6f20e32e25f566
>


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-31 Thread Marco d'Itri
On Apr 01, gregor herrmann  wrote:

> > I switched long ago all my packages from tar archives to the git 
> > upstream tree. Not only does this make it much easier to understand
> > the changes in a new release,
> That's not mutually exclusive. When adding an additional git remote
> and using gbp-import-orig's --upstream-vcs-tag you get the best of
> both worlds.
No: I get nothing of value by doing that and the repository will be 
cluttered by commits that I do not care about.
Also: upstream VCS snapshots.

-- 
ciao,
Marco


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-31 Thread gregor herrmann
On Sun, 31 Mar 2024 23:59:20 +0200, Marco d'Itri wrote:

> I switched long ago all my packages from tar archives to the git 
> upstream tree. Not only does this make it much easier to understand
> the changes in a new release,

That's not mutually exclusive. When adding an additional git remote
and using gbp-import-orig's --upstream-vcs-tag you get the best of
both worlds.


Cheers,
gregor

-- 
 .''`.  https://info.comodo.priv.at -- Debian Developer https://www.debian.org
 : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D  85FA BB3A 6801 8649 AA06
 `. `'  Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
   `-   


signature.asc
Description: Digital Signature


Re: Validating tarballs against git repositories

2024-03-31 Thread gregor herrmann
On Sun, 31 Mar 2024 10:12:35 -0700, Russ Allbery wrote:

> > My point is that, while there will be for sure exceptions here and
> > there, by and large the need for massaged tarballs comes from projects
> > using autoconf and wanting to ship source archives that do not require
> > to run the autoconf machinery.
> Just as a data point, literally every C project for which I am upstream
> ships additional files in the release tarballs that are not in Git for
> reasons unrelated to Autoconf and friends.

This is also true for every perl distribution on the CPAN made with
the standard build tools (and I write this as a response to a mail of
yours as I know that you know what I'm talking about :))
 
> Just to note, though, this means that we lose the upstream signature in
> the archive.  The only place the upstream signature would then live is in
> Salsa.

This also means that we are, at least in some ecosystems, diverging
from the preferred way of distribution, and maybe more important, that
we are adding a new step 0 to our build process, which is: making a
(fake) upstream release.


Cheers,
gregor 

-- 
 .''`.  https://info.comodo.priv.at -- Debian Developer https://www.debian.org
 : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D  85FA BB3A 6801 8649 AA06
 `. `'  Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
   `-   


signature.asc
Description: Digital Signature


Re: Validating tarballs against git repositories

2024-03-31 Thread G. Branden Robinson
At 2024-03-31T22:32:49+, Stefano Rivera wrote:
> Upstreams would probably prefer that we used git repositories
> *directly* as source artifacts, but that comes with a whole other can
> of worms...

Speaking from my upstream groff perspective, I wouldn't _prefer_ that.

The distribution archives get build-testing on a much wider variety of
systems, thanks to people on the groff@ and platform-testers@gnu mailing
lists that help out when a release candidate is announced.  They have
access to platforms more exotic than I and a few other bleeding-edge
HEAD mavens do.  This practice tangibly improved the quality of the
groff 1.23.0 release, especially on surviving proprietary Unix systems.

Building from the repo, or using the bootstrap script--which Colin
Watson just today ensured will be in future distribution archives--is
fine.[1]  I'm glad some people build the project that way.  But I think
that procedure serves an audience that is distinguishable in some ways.

Regards,
Branden

[1] 
https://git.savannah.gnu.org/cgit/groff.git/commit/?id=822fef56e9ab7cbe69337b045f6f20e32e25f566


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-31 Thread Stefano Rivera
Hi Guillem (2024.03.30_04:41:37_+)
> > 1. Move towards allowing, and then favoring, git-tags over source tarballs
> 
> I assume you mean git archives out of git tags? Otherwise how do you
> go from git-tag to a source package in your mind?

There are some issues with transforming upstream's git-centric world
into tarballs for Debian source packages that are worth bearing in mind.

The upstream git repository has some extra metadata available that
upstream build tools start depending on. Things like: versions, tracked
files, and ignored files.

This came up in the Python world, where setuptools-scm has become more
popular over the years. This is a plugin for setuptools that extracts
some metadata from the git repository:
1. Determine the current version. Historically, specified in setup.py.
2. Determine the data files that should be shipped in the installed
   package. Historically, these were specified in a MANIFEST.in file,
   but developers got lazy and delegated this problem to git.

Currently we set the version for packages that depend on 1 by an
environment variable that setuptools-scm will consume.
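
(For reference, a sketch of that mechanism; the variable below is
setuptools-scm's documented override, though the exact invocation
varies per package, and the version shown is a placeholder:)

  SETUPTOOLS_SCM_PRETEND_VERSION=1.2.3 python3 -m build --sdist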

For packages that get file lists from git, it's a little more complex.
setuptools writes a foo.egg-info/SOURCES.txt into source artifacts that
it produces (sdists). When this file is present, it's used as a list of
files.
https://setuptools.pypa.io/en/latest/deprecated/python_eggs.html#sources-txt-source-files-manifest

So... for Python packages using setuptools-scm, we're pushed towards
depending on upstream-created source tarballs (sdists), rather than
upstream git archives, because we don't have the ".git" directory in our
source packages.

I can imagine that other ecosystems would run into similar problems and
solve them by inventing similar protocols, if they solve them at all.
Upstreams would probably prefer that we used git repositories *directly*
as source artifacts, but that comes with a whole other can of worms...

Stefano
-- 
Stefano Rivera
  http://tumbleweed.org.za/
  +1 415 683 3272



Re: Validating tarballs against git repositories

2024-03-31 Thread Marco d'Itri
On Mar 31, Russ Allbery  wrote:

> Most of this is pregenerated documentation (primarily man pages generated
> from POD), but it also includes generated test data and other things.  The
> reason is similar: regenerating those files requires tools that may not be
> present on an older system (like a mess of random Perl modules) or, in the
> case of the man pages, may be old and thus produce significantly inferior
> output.
But we do not use older systems to build our packages, so this does not 
matter.

Indeed, long ago I started building inn2 from the git tree, no more 
tarballs...
I switched long ago all my packages from tar archives to the git 
upstream tree. Not only does this make it much easier to understand the 
changes in a new release, but it also makes it possible to package 
upstream snapshots.

> Just to note, though, this means that we lose the upstream signature in
> the archive.  The only place the upstream signature would then live is in
> Salsa.
Totally worth it!

-- 
ciao,
Marco


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-31 Thread Adrian Bunk
On Sat, Mar 30, 2024 at 11:55:04AM +, Luca Boccassi wrote:
>...
> In the end, massaged tarballs were needed to avoid rerunning
> autoconfery on twelve thousands different proprietary and
> non-proprietary Unix variants, back in the day. In 2024, we do
> dh_autoreconf by default so it's all moot anyway.
>...

The first step of the xz exploit was in a vendored gnulib m4 file that
is not (and should not be) in git and that does not get updated by 
dh_autoreconf.

cu
Adrian



Re: Validating tarballs against git repositories

2024-03-31 Thread Russ Allbery
Luca Boccassi  writes:
> On Sat, 30 Mar 2024 at 15:44, Russ Allbery  wrote:
>> Luca Boccassi  writes:

>>> In the end, massaged tarballs were needed to avoid rerunning
>>> autoconfery on twelve thousands different proprietary and
>>> non-proprietary Unix variants, back in the day. In 2024, we do
>>> dh_autoreconf by default so it's all moot anyway.

>> This is true from Debian's perspective.  This is much less obviously
>> true from upstream's perspective, and there are some advantages to
>> aligning with upstream about what constitutes the release artifact.

> My point is that, while there will be for sure exceptions here and
> there, by and large the need for massaged tarballs comes from projects
> using autoconf and wanting to ship source archives that do not require
> to run the autoconf machinery.

Just as a data point, literally every C project for which I am upstream
ships additional files in the release tarballs that are not in Git for
reasons unrelated to Autoconf and friends.

Most of this is pregenerated documentation (primarily man pages generated
from POD), but it also includes generated test data and other things.  The
reason is similar: regenerating those files requires tools that may not be
present on an older system (like a mess of random Perl modules) or, in the
case of the man pages, may be old and thus produce significantly inferior
output.

> However, we as in Debian do not have this problem. We can and do re-run
> the autoconf machinery on every build. And at least on the main forges,
> the autogenerated (and thus out of reach from this kind of attacks)
> tarball is always present too - the massaged tarball is an _addition_,
> not a _substitution_. Hence: we should really really think about forcing
> all packages, by policy, to use the autogenerated tarball by default
> instead of the autoconf one, when both are present, unless extenuating
> circumstances (that have to be documented) are present.

I think this is probably right as long as by "autogenerated" you mean
basing the Debian package on a signed upstream Git tag and *locally*
generating a tarball to satisfy Debian's .orig.tar.gz requirement, not
using GitHub's autogenerated tarball that has all sorts of other potential
issues.

Just to note, though, this means that we lose the upstream signature in
the archive.  The only place the upstream signature would then live is in
Salsa.

-- 
Russ Allbery (r...@debian.org)  



Re: Validating tarballs against git repositories

2024-03-31 Thread Iustin Pop
On 2024-03-31 08:03:40, Gioele Barabucci wrote:
> On 30/03/24 20:43, Iustin Pop wrote:
> > On 2024-03-30 11:47:56, Luca Boccassi wrote:
> > > On Sat, 30 Mar 2024 at 09:57, Iustin Pop  wrote:
> > > > Give me good Salsa support for autopkgtest + lintian + piuparts, and
> > > > easy support (so that I just have to toggle one checkbox), and I'm
> > > > happy. Or even better, integrate all that testing with Salsa (I don't
> > > > know if it has "CI tests must pass before merging"), and block tagging
> > > > on the tagged version having been successfully tested.
> > > 
> > > This is all already implemented by Salsa CI? You just need to include
> > > the yml and enable the CI in the settings
> > 
> > I will be the first to admit I'm not up to date on latest Salsa news,
> > but see, what you mention  - "include the yml" - is exactly what I don't
> > want.
> 
> Salsa CI is enabled by default for all projects in the debian/ namespace.
> 
> Adding a yml file or changing the CI settings to reference the Salsa CI
> pipeline is needed only for projects in team- or maintainer-specific
> repositories, or when the dev wants to enable additional tests (or
> configure/block the default tests).

That sounds good, but are you sure that all /debian/ projects get it?

I chose one random package of mine,
https://salsa.debian.org/debian/python-pyxattr, and on the home page I
see "Setup CI/CD" (implying it's disabled), and under build, I see
nothing enabled.

Is there a howto somewhere? Happy to read/follow.

iustin



Re: Validating tarballs against git repositories

2024-03-31 Thread Luca Boccassi
On Sat, 30 Mar 2024 at 15:44, Russ Allbery  wrote:
>
> Luca Boccassi  writes:
>
> > In the end, massaged tarballs were needed to avoid rerunning autoconfery
> > on twelve thousands different proprietary and non-proprietary Unix
> > variants, back in the day. In 2024, we do dh_autoreconf by default so
> > it's all moot anyway.
>
> This is true from Debian's perspective.  This is much less obviously true
> from upstream's perspective, and there are some advantages to aligning
> with upstream about what constitutes the release artifact.

My point is that, while there will be for sure exceptions here and
there, by and large the need for massaged tarballs comes from projects
using autoconf and wanting to ship source archives that do not require
to run the autoconf machinery. And said upstreams might care about
this because they support backward compatibility with ancient Unix
stuff and the like (I mean, I _am_ upstream in one project that does
exactly this for exactly this reason, zeromq, so I understand that
requirement perfectly well).
However, we as in Debian do not have this problem. We can and do
re-run the autoconf machinery on every build. And at least on the main
forges, the autogenerated (and thus out of reach from this kind of
attacks) tarball is always present too - the massaged tarball is an
_addition_, not a _substitution_. Hence: we should really really think
about forcing all packages, by policy, to use the autogenerated
tarball by default instead of the autoconf one, when both are present,
unless extenuating circumstances (that have to be documented) are
present.



Re: Validating tarballs against git repositories

2024-03-31 Thread Stefano Zacchiroli
On Sun, Mar 31, 2024 at 08:16:33AM +0200, Lucas Nussbaum wrote:
> On 29/03/24 at 23:29 -0700, Russ Allbery wrote:
> > This is why I am somewhat skeptical that forcing everything into Git
> > commits is as much of a benefit as people are hoping.  This particular
> > attacker thought it was better to avoid the Git repository, so that is
> > evidence in support of that approach, and it's certainly more helpful,
> > once you know something bad has happened, to be able to use all the Git
> > tools to figure out exactly what happened.  But I'm not sure we're fully
> > accounting for the fact that tags can be moved, branches can be
> > force-pushed, and if the Git repository is somewhere other than GitHub,
> > the malicious possibilities are even broader.
> 
> I wonder if Software Heritage could help with that part?

Yeah (provided that archival happens at the right moment) you can use
Software Heritage APIs to detect, for instance, git history rewrites,
and also commits moving from one branch/tag to another.
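
For example (a sketch, using the public archive API; the placeholder
stands for the commit id to check):

  curl -s https://archive.softwareheritage.org/api/1/revision/<commit-sha1>/

which fails with a 404 if the archive has never seen that revision.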

It occurs to me that in the Guix/Nix packaging model, where they note
down the commit of interest in their packaging recipe, you'll also
automatically discover if a commit disappeared from upstream repo
without needing a lot of extra tooling/integration (although not if it
has moved between branches). However, you need a backup place to
retrieve the commit from in case it disappears or gets rewritten upstream
(Guix uses Software Heritage for this).

Cheers
-- 
Stefano Zacchiroli . z...@upsilon.cc . https://upsilon.cc/zack  _. ^ ._
Full professor of Computer Science  o o   o \/|V|\/
Télécom Paris, Polytechnic Institute of Paris o o o   <\>
Co-founder & CTO Software Heritageo o o o   /\|^|/\
https://twitter.com/zacchiro . https://mastodon.xyz/@zacchiro   '" V "'



Re: Validating tarballs against git repositories

2024-03-31 Thread Sven Joachim
On 2024-03-30 12:19 +0100, Simon Josefsson wrote:

> Gioele Barabucci  writes:
>
>> Just as an example, bootstrapping coreutils currently requires
>> bootstrapping at least 68 other packages, including libx11-6 [1]. If
>> coreutils supported  [2], the transitive closure of its
>> Build-Depends would be reduced to 20 packages, most of which in
>> build-essential.
>>
>> [1]
>> https://buildd.debian.org/status/fetch.php?pkg=coreutils&arch=amd64&ver=9.4-3.1&stamp=1710441056&raw=1
>> [2] https://bugs.debian.org/1057136
>
> Coreutils in Debian uses upstream tarballs and does not do a full
> bootstrap build.  It does autoreconf instead of ./bootstrap.  So the
> dependencies above is not the entire bootstrapping story to build
> coreutils from git compared to building from tarballs.

The coreutils bootstrap script fetches files over the network, so it is
not possible to build the Debian package from upstream git tags.  At the
very least it would lack any translations, and there is also the
problem of the gnulib submodule.

Cheers,
   Sven



Re: Validating tarballs against git repositories

2024-03-31 Thread Lucas Nussbaum
On 29/03/24 at 23:29 -0700, Russ Allbery wrote:
> Antonio Russo  writes:
> > But, I will definitely concede that, had I seen a commit that changed
> > that line in the m4, there's a good chance my eyes would have glazed
> > over it.
> 
> This is why I am somewhat skeptical that forcing everything into Git
> commits is as much of a benefit as people are hoping.  This particular
> attacker thought it was better to avoid the Git repository, so that is
> evidence in support of that approach, and it's certainly more helpful,
> once you know something bad has happened, to be able to use all the Git
> tools to figure out exactly what happened.  But I'm not sure we're fully
> accounting for the fact that tags can be moved, branches can be
> force-pushed, and if the Git repository is somewhere other than GitHub,
> the malicious possibilities are even broader.
> 
> We could narrow those possibilities somewhat by maintaining
> Debian-controlled mirrors of upstream Git repositories so that we could
> detect rewritten history.  (There are a whole lot of reasons why I think
> dgit is a superior model for archive management.  One of them is that it
> captures the full Git history of upstream at the point of the upload on
> Debian-controlled infrastructure if the maintainer of the package bases it
> on upstream's Git tree.)

I wonder if Software Heritage could help with that part?

Lucas



Re: Validating tarballs against git repositories

2024-03-31 Thread Gioele Barabucci

On 30/03/24 20:43, Iustin Pop wrote:
> On 2024-03-30 11:47:56, Luca Boccassi wrote:
>> On Sat, 30 Mar 2024 at 09:57, Iustin Pop  wrote:
>>> Give me good Salsa support for autopkgtest + lintian + piuparts, and
>>> easy support (so that I just have to toggle one checkbox), and I'm
>>> happy. Or even better, integrate all that testing with Salsa (I don't
>>> know if it has "CI tests must pass before merging"), and block tagging
>>> on the tagged version having been successfully tested.
>>
>> This is all already implemented by Salsa CI? You just need to include
>> the yml and enable the CI in the settings
>
> I will be the first to admit I'm not up to date on latest Salsa news,
> but see, what you mention - "include the yml" - is exactly what I don't
> want.

Salsa CI is enabled by default for all projects in the debian/ namespace.

Adding a yml file or changing the CI settings to reference the Salsa CI 
pipeline is needed only for projects in team- or maintainer-specific 
repositories, or when the dev wants to enable additional tests (or 
configure/block the default tests).


Regards,

--
Gioele Barabucci



Git and SHA1 collisions (Was: Re: Validating tarballs against git repositories)

2024-03-30 Thread Gioele Barabucci

On 30/03/24 23:09, Simon Josefsson wrote:
> Russ Allbery  writes:
>
>> Simon Josefsson  writes:
>>> Sean Whitton  writes:
>>
>>>> We did some analysis on the SHA1 vulnerabilities and determined that
>>>> they did not meaningfully affect dgit & tag2upload's design.
>>
>>> Can you share that analysis?  As far as I understand, it is possible for
>>> a malicious actor to create a git repository with the same commit id as
>>> HEAD, with different historic commits and tree content.  I thought a
>>> signed tag is merely a signed reference to a particular commit id.  If
>>> that commit id is a SHA1 reference, that opens up for ambiguity given
>>> recent (well, 2019) results on SHA1.  Of course, I may be wrong in any
>>> of the chain, so would appreciate explanation of how this doesn't work.
>>
>> I believe you're talking about two different things.  I think Sean is
>> talking about preimage resistance, which assumes that the known-good
>> repository is trusted, and I believe Simon is talking about manufactured
>> collisions where the attacker controls both the good and the bad
>> repository.
>
> Right.  I think the latter describes the xz scenario: someone could have
> pushed a maliciously crafted commit with a SHA1 collision commit id, so
> there are two different git repositories with that commit id, and a
> signed git tag on that commit id authenticates both trees, opening up
> for uncertainty about what was intended to be used.  Unless I'm missing
> some detail of how git signed tag verification works that would catch
> this.


Git contains a couple of countermeasures meant to greatly reduce the 
practical feasibility of such manipulations.


The first is the fact that it uses a hardened SHA-1 function that 
produces different hashes when it is fed one of the known collision 
seeds ("disturbance vectors"). This hardened version of SHA-1 is only 
resistant against known attacks, but it substantially raises the bar 
from "use one of these files downloaded from the Web" to "set up your 
own collision generator that will work only once for this specific 
attack and once discovered will no longer work".


From https://lwn.net/Articles/716093/:

    Git can be configured with the USE_SHA1DC build time configuration
    variable to use SHA-1 implementation from shattered.io that detects
    attempted collisions

From https://shattered.io/:

    Is Hardened SHA-1 vulnerable? No, SHA-1 hardened with
    counter-cryptanalysis (see ‘how do I detect the attack’) will detect
    cryptanalytic collision attacks.


The second countermeasure is the fact that if two objects (e.g., 
commits) happen to have the same hash, then Git will use the one it has 
seen first. In the common case in which the original author has pushed a 
commit and the attacker subsequently pushed a malicious version of that 
commit with the same hash, then all people that fetch that repository 
will always see (as in, write to disk during a checkout) the original 
version, not the malicious version. The malicious version will still be 
in the git pack, but git will ignore it.


From https://marc.info/?l=git&m=115678778717621&w=2:

    Nope. If it has the same SHA1, it means that when we receive the
    object from the other end, we will _not_ overwrite the object we
    already have. […] if we ever see a collision, the "earlier" object
    in any particular repository will always end up overriding

With these countermeasures in place, in order to successfully pull off 
a collision attack, the attacker must:

1. Create an unseen collision seed.
2. Have access to the server that hosts the official repo to remove 
   traces of the original commit.
3. Hope that nobody pulled the repo before they tampered with it.
4. Hope that nobody will notice a series of random characters being 
   shown during operations like git log -p.

Sure, SHA1 is broken, should be avoided and not relied upon.  And many 
people can easily see how to work around the countermeasures put in 
place by Git.


But pulling off a successful collision attack is not a trivial task. For 
instance, the xz attacker did not have all that was required to carry it 
out (for example they had no direct access to the git servers... yet).


Regards,

--
Gioele Barabucci



Re: Validating tarballs against git repositories

2024-03-30 Thread Timo Röhling

Hi,

* Simon Josefsson  [2024-03-30 12:19]:
> Relying on signed git tags is not reliable because git is primarily
> SHA1-based which in 2019 cost $45K to do a collision attack for.

FWIW, GitLab is working on support for SHA-256 hashing [1], and as 
of Git 2.42, the SHA-256 repository format has matured enough that 
backwards-incompatible breaks are very unlikely [2].



Cheers
Timo


[1] 
https://about.gitlab.com/blog/2023/08/28/sha256-support-in-gitaly/

[2] https://lore.kernel.org/lkml/xmqqr0nwp8mv.fsf@gitster.g/


--
⢀⣴⠾⠻⢶⣦⠀   ╭╮
⣾⠁⢠⠒⠀⣿⡁   │ Timo Röhling   │
⢿⡄⠘⠷⠚⠋⠀   │ 9B03 EBB9 8300 DF97 C2B1  23BF CC8C 6BDD 1403 F4CA │
⠈⠳⣄   ╰╯


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Russ Allbery
Simon Josefsson  writes:
> Russ Allbery  writes:

>> I believe you're talking about two different things.  I think Sean is
>> talking about preimage resistance, which assumes that the known-good
>> repository is trusted, and I believe Simon is talking about
>> manufactured collisions where the attacker controls both the good and
>> the bad repository.

> Right.  I think the latter describes the xz scenario: someone could have
> pushed a maliciously crafted commit with a SHA1 collision commit id, so
> there are two different git repositories with that commit id, and a
> signed git tag on that commit id authenticates both trees, opening up
> for uncertainty about what was intended to be used.  Unless I'm missing
> some detail of how git signed tag verification works that would catch
> this.

This is also my understanding.

>> The dgit and tag2upload design probably (I'd have to think about it
>> some more, ideally while bouncing the problem off of someone else,
>> because I've recycled those brain cells for other things) only needs
>> preimage resistance, but the general case of a malicious upstream may
>> be vulnerable to manufactured collisions.

> It is not completely clear to me: How about if some malicious person
> pushed a commit to salsa, asked a DD to "please review this repository
> and sign a tag to make the upload"?  The DD would presumably sign a
> commit id that authenticate two different git trees, one with the
> exploit and one without it.

Oh, hm, yes, this is a good point.  I had forgotten that tag2upload was
intended to work by pushing a tag to Salsa.  This means an attacker can
potentially race Salsa CI to move that tag to the malicious tree before
the tree is fetched by tag from Salsa, or reuse the signed tag with a
different repository with the same SHA-1.

The first, most obvious step is that one has to make sure that a signed
tag is restricted to a specific package and version and not portable to a
different package and/or version that has the same SHA-1 hash due to
attacker construction.  There are several obvious ways that could be done;
the one that comes immediately to mind is to require the tag message be
the source package name and version number, which is good practice anyway.
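
For example, something as simple as (a sketch, not an actual tag2upload
format; the package name is made up):

  git tag -s foopkg/1.2.3-1 -m 'foopkg 1.2.3-1'

so that the verifier can check that the signed message names exactly
the package and version being uploaded.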

I think any remaining issues could be addressed with a fairly simple
modification to the protocol: rather than pushing the signed tag to Salsa,
the DD reviewer should push the signed tag to a separate archive server
similar to that used by dgit today.  As long as the first time the signed
tag leaves the DD's system is in conjunction with a push of the
corresponding reviewed tree to secure project systems, this avoids the
substitution problem.  The tag could then be pushed back to Salsa, either
by the DD or by the service.

This unfortunately means that one couldn't use the Salsa CI service to do
the source package construction, and one has to know about this extra
server.  I think that restriction comes from the fact that we're worried
an attacker may be able to manipulate the Salsa Git repository (through
force pushes and tag replacements, for example), whereas the separate
dedicated archive server can be more restrictive and never allow force
pushes or tag moves, and reject any attempts to push a SHA-1 hash that has
already been seen.

Another possible option would be to prevent force pushes and tag moves in
Salsa, since I think one of those operations would be required to pull off
this attack, but maybe I'm missing something.  One of the things I'm murky
on is exactly what Git operations are required to substitute the two trees
with identical SHA-1 hashes.  That property is going to break Git in weird
ways, and I'm not sure what that means for one's ability to manipulate a
Git repository over the protocols that Salsa exposes.

Obviously it would be ideal if Git used stronger hashes than SHA-1 for
tags, so that one need worry less about all of this.

Even if my analysis is wrong, I think there are some fairly obvious and
trivial additions to the tag2upload process that would prevent this
attack, such as building a Merkle tree of the reviewed source tree using a
SHA-256 hash and embedding the top hash of that tree in the body of the
signed tag where it can be verified by the archive infrastructure.  That
might be a good idea *anyway*, although it does have the unfortunate side
effect of requiring a local client to produce a correct tag rather than
using standard Git signed tags.  Uploading to Debian currently already
semi-requires a custom local client, so to me this isn't a big deal,
although I think there was some hope to avoid that.
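
One crude way to compute such a tree-wide digest (a sketch of the idea,
a flat hash list rather than a real Merkle tree):

  # Hash every tracked file, then hash the sorted list of hashes; the
  # resulting digest could be embedded in the signed tag message.
  git ls-files -z | sort -z | xargs -0 sha256sum | sha256sum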

(These variations unfortunately don't help with the upstream problem.)

-- 
Russ Allbery (r...@debian.org)  



Re: Validating tarballs against git repositories

2024-03-30 Thread Adrian Bunk
On Fri, Mar 29, 2024 at 11:29:01PM -0700, Russ Allbery wrote:
>...
> In other words, we should make sure that breaking the specific tactics
> *this* attacker used truly make the attacker's life harder, as opposed to
> making life harder for Debian packagers while only forcing a one-time,
> minor shift in attacker tactics.  I *think* I'm mostly convinced that
> forcing the attacker into Git commits is a useful partial defense, but I'm
> not sure this is obviously true.
>...

There are also other reasons why using tarballs by default is no longer 
a good option.

In many cases our upstream source is the unsigned tarball Github 
automatically provides for every tag, which invites MITM attacks.

The hash of these tarballs is expected to change over time, which makes 
it harder to reliably verify that the upstream sources we have in the 
archive match what is provided upstream.

cu
Adrian



Re: Validating tarballs against git repositories

2024-03-30 Thread Simon Josefsson
Russ Allbery  writes:

> Simon Josefsson  writes:
>> Sean Whitton  writes:
>
>>> We did some analysis on the SHA1 vulnerabilities and determined that
>>> they did not meaningfully affect dgit & tag2upload's design.
>
>> Can you share that analysis?  As far as I understand, it is possible for
>> a malicious actor to create a git repository with the same commit id as
>> HEAD, with different historic commits and tree content.  I thought a
>> signed tag is merely a signed reference to a particular commit id.  If
>> that commit id is a SHA1 reference, that opens up for ambiguity given
>> recent (well, 2019) results on SHA1.  Of course, I may be wrong in any
>> of the chain, so would appreciate explanation of how this doesn't work.
>
> I believe you're talking about two different things.  I think Sean is
> talking about preimage resistance, which assumes that the known-good
> repository is trusted, and I believe Simon is talking about manufactured
> collisions where the attacker controls both the good and the bad
> repository.

Right.  I think the latter describes the xz scenario: someone could have
pushed a maliciously crafted commit with a SHA1 collision commit id, so
there are two different git repositories with that commit id, and a
signed git tag on that commit id authenticates both trees, opening up
for uncertainty about what was intended to be used.  Unless I'm missing
some detail of how git signed tag verification works that would catch
this.

> The dgit and tag2upload design probably (I'd have to think about it some
> more, ideally while bouncing the problem off of someone else, because I've
> recycled those brain cells for other things) only needs preimage
> resistance, but the general case of a malicious upstream may be vulnerable
> to manufactured collisions.

It is not completely clear to me: How about if some malicious person
pushed a commit to salsa, asked a DD to "please review this repository
and sign a tag to make the upload"?  The DD would presumably sign a
commit id that authenticate two different git trees, one with the
exploit and one without it.

/Simon


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Adrian Bunk
On Fri, Mar 29, 2024 at 06:21:27PM -0600, Antonio Russo wrote:
>...
> 1. Move towards allowing, and then favoring, git-tags over source tarballs
>...

git commit IDs, not tags.

Upstream moving git tags does sometimes happen.

Usually for bad-but-not-malicious reasons like "add one more last-minute fix",
but using tags would also invite manipulation similar to what 
happened with xz at any point after the release.

> Best,
> Antonio Russo

cu
Adrian



Re: Validating tarballs against git repositories

2024-03-30 Thread Robert Edmonds
Russ Allbery wrote:
> Yes, perhaps it's time to switch to a different build system, although one
> of the reasons I've personally been putting this off is that I do a lot of
> feature probing for library APIs that have changed over time, and I'm not
> sure how one does that in the non-Autoconf build systems.  Meson's Porting
> from Autotools [1] page, for example, doesn't seem to address this use
> case at all.
> 
> [1] https://mesonbuild.com/Porting-from-autotools.html

Have a look at the documentation for the meson "compiler" object [1]. There is a
lot of functionality in meson that has analogs in autoconf that isn't described
in the "Porting from Autotools" document.

[1] https://mesonbuild.com/Reference-manual_returned_compiler.html
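
For example, the rough moral equivalent of AC_CHECK_HEADERS and
AC_CHECK_FUNCS feature probing looks something like this in a
meson.build (a sketch; the probed symbols are chosen for illustration):

  cc = meson.get_compiler('c')
  conf = configuration_data()
  # Probe for an API that has changed over time, autoconf-style.
  conf.set10('HAVE_SYS_RANDOM_H', cc.has_header('sys/random.h'))
  conf.set10('HAVE_GETRANDOM',
             cc.has_function('getrandom', prefix: '#include <sys/random.h>'))
  configure_file(output: 'config.h', configuration: conf)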

-- 
Robert Edmonds
edmo...@debian.org



Re: Validating tarballs against git repositories

2024-03-30 Thread Iustin Pop
On 2024-03-31 00:58:49, Andrey Rakhmatullin wrote:
> On Sat, Mar 30, 2024 at 10:56:40AM +0100, Iustin Pop wrote:
> > > Now it is time to take a step forward:
> > > 
> > > 1. new upstream release;
> > > 2. the DD/DM merges the upstream release VCS into the Debian VCS;
> > > 3. the buildd is notified of the new release;
> > > 4. the buildd creates and uploads the non-reviewed-in-practice blobs 
> > > "source
> > > deb" and "binary deb" to unstable.
> > > 
> > > This change would have three advantages:
> > 
> > I think everyone fully agrees this is a good thing, no need to list the
> > advantages.
> > 
> > The problem is that this requies functionality testing to be fully
> > automated via autopkgtest, and moved off the "update changelog, build
> > package, test locally, test some more, upload".
> Do you mean this theoretical workflow will not have a step of the
> maintainer actually looking at the package and running it locally, or
> running any building or linting locally before pushing the changes?
> Then yeah, looking at some questions in the past years I understand that
> some people are already doing that, powered by Salsa CI (I can think of
> several possible reasons for that workflow but it still frustrates me).

Not that it necessarily won't have that step; the question is how to
integrate the testing into the tag signing/pushing step.

I.e. before moving archive wide to "sign tag + push", there should be a
standard of how this is all tested for a package.  Maybe there is and
I'm not aware of it; my Debian activities are very low-key (but I try
to keep up with mailing lists).

> > Give me good Salsa support for autopkgtest + lintian + piuparts, and
> > easy support (so that I just have to toggle one checkbox), and I'm
> > happy. Or even better, integrate all that testing with Salsa (I don't
> > know if it has "CI tests must pass before merging"), and block tagging
> > on the tagged version having been successfully tested.
> AFAIK the currently suggested way of enabling that is putting
> "recipes/debian.yml@salsa-ci-team/pipeline" into "CI/CD configuration
> file" in the salsa settings (no idea where is the page that tells that or
> how to find it even knowing it exists).

Aha, see, this I didn't know. On my list to test once archive is
unblocked and I have time for packaging.

regards,
iustin



Re: Validating tarballs against git repositories

2024-03-30 Thread Andrey Rakhmatullin
On Sat, Mar 30, 2024 at 10:56:40AM +0100, Iustin Pop wrote:
> > Now it is time to take a step forward:
> > 
> > 1. new upstream release;
> > 2. the DD/DM merges the upstream release VCS into the Debian VCS;
> > 3. the buildd is notified of the new release;
> > 4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
> > deb" and "binary deb" to unstable.
> > 
> > This change would have three advantages:
> 
> I think everyone fully agrees this is a good thing, no need to list the
> advantages.
> 
> The problem is that this requies functionality testing to be fully
> automated via autopkgtest, and moved off the "update changelog, build
> package, test locally, test some more, upload".
Do you mean this theoretical workflow will not have a step of the
maintainer actually looking at the package and running it locally, or
running any building or linting locally before pushing the changes?
Then yeah, looking at some questions in the past years I understand that
some people are already doing that, powered by Salsa CI (I can think of
several possible reasons for that workflow but it still frustrates me).

> Give me good Salsa support for autopkgtest + lintian + piuparts, and
> easy support (so that I just have to toggle one checkbox), and I'm
> happy. Or even better, integrate all that testing with Salsa (I don't
> know if it has "CI tests must pass before merging"), and block tagging
> on the tagged version having been successfully tested.
AFAIK the currently suggested way of enabling that is putting
"recipes/debian.yml@salsa-ci-team/pipeline" into "CI/CD configuration
file" in the salsa settings (no idea where is the page that tells that or
how to find it even knowing it exists).
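
(For reference, the yml-based alternative is a debian/salsa-ci.yml
along these lines, assuming the salsa-ci-team pipeline layout:)

  ---
  include:
    - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/recipes/debian.yml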

-- 
WBR, wRAR


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Iustin Pop
On 2024-03-30 11:47:56, Luca Boccassi wrote:
> On Sat, 30 Mar 2024 at 09:57, Iustin Pop  wrote:
> >
> > On 2024-03-30 08:02:04, Gioele Barabucci wrote:
> > > Now it is time to take a step forward:
> > >
> > > 1. new upstream release;
> > > 2. the DD/DM merges the upstream release VCS into the Debian VCS;
> > > 3. the buildd is notified of the new release;
> > > 4. the buildd creates and uploads the non-reviewed-in-practice blobs 
> > > "source
> > > deb" and "binary deb" to unstable.
> > >
> > > This change would have three advantages:
> >
> > I think everyone fully agrees this is a good thing, no need to list the
> > advantages.
> >
> > The problem is that this requies functionality testing to be fully
> > automated via autopkgtest, and moved off the "update changelog, build
> > package, test locally, test some more, upload".
> >
> > Give me good Salsa support for autopkgtest + lintian + piuparts, and
> > easy support (so that I just have to toggle one checkbox), and I'm
> > happy. Or even better, integrate all that testing with Salsa (I don't
> > know if it has "CI tests must pass before merging"), and block tagging
> > on the tagged version having been successfully tested.
> 
> This is all already implemented by Salsa CI? You just need to include
> the yml and enable the CI in the settings

I will be the first to admit I'm not up to date on latest Salsa news,
but see, what you mention  - "include the yml" - is exactly what I don't
want.

If maintainers need to include a yaml file, it means it can vary between
projects, which means it can either have bugs or be hijacked. In my
view, there should be no freedom here, just one setting - "enable
tag2upload with automated autopkg testing", and all packages would
behave mostly the same way. But there are 2KiB single-binary packages as
well as 2 GB packages building 25 binaries, so maybe this is too wide a
scope.

I just learned about tag2upload, need to look into that.

(I'm still processing this whole story, and I fear the fallout/impact
in terms of how development is regarded will be extremely high.)

regards,
iustin



Re: Validating tarballs against git repositories

2024-03-30 Thread Russ Allbery
Jeremy Stanley  writes:
> On 2024-03-29 23:29:01 -0700 (-0700), Russ Allbery wrote:
> [...]
>> if the Git repository is somewhere other than GitHub, the
>> malicious possibilities are even broader.
> [...]

> I would not be so quick to make the same leap of faith. GitHub is
> not itself open source, nor is it transparently operated. It's a
> proprietary commercial service, with all the trust challenges that
> represents. Long, long before XZ was a twinkle in anyone's eye,
> malicious actors were already regularly getting their agents hired
> onto development teams to compromise commercial software. Just look
> at the Juniper VPN backdoor debacle for a fairly well-documented
> example (but there's strong evidence this practice dates back well
> before free/libre open source software even, at least to the 1970s).

This is a valid point: let me instead say that the malicious possibilities
are *different*.  All of your points about GitHub are valid, but the
counterexample I had in mind is one where the malicious upstream runs the
entire Git hosting architecture themselves and can make completely
arbitrary changes to the Git repository freely.  I don't think we know
everything that is possible to do in that situation.  I think it would be
difficult (not impossible, but difficult) to get into that position at
GitHub, whereas it is commonplace among self-hosted projects.

-- 
Russ Allbery (r...@debian.org)  



Re: Validating tarballs against git repositories

2024-03-30 Thread Jeremy Stanley
On 2024-03-29 23:29:01 -0700 (-0700), Russ Allbery wrote:
[...]
> if the Git repository is somewhere other than GitHub, the
> malicious possibilities are even broader.
[...]

I would not be so quick to make the same leap of faith. GitHub is
not itself open source, nor is it transparently operated. It's a
proprietary commercial service, with all the trust challenges that
represents. Long, long before XZ was a twinkle in anyone's eye,
malicious actors were already regularly getting their agents hired
onto development teams to compromise commercial software. Just look
at the Juniper VPN backdoor debacle for a fairly well-documented
example (but there's strong evidence this practice dates back well
before free/libre open source software even, at least to the 1970s).

If anything, compromising an open project or transparent service is
probably considerably harder, these sorts of people thrive in the
comfort of shadows that the proprietary software world offers them,
and (thankfully) struggle in the open, like with the rather quick
identification and public response demonstrated in this case. I
would be quite surprised by similarly rapid or open discussion from
a proprietary service who discovered a saboteur in their ranks.
-- 
Jeremy Stanley


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Russ Allbery
Simon Josefsson  writes:
> Sean Whitton  writes:

>> We did some analysis on the SHA1 vulnerabilities and determined that
>> they did not meaningfully affect dgit & tag2upload's design.

> Can you share that analysis?  As far as I understand, it is possible for
> a malicious actor to create a git repository with the same commit id as
> HEAD, with different historic commits and tree content.  I thought a
> signed tag is merely a signed reference to a particular commit id.  If
> that commit id is a SHA1 reference, that opens up for ambiguity given
> recent (well, 2019) results on SHA1.  Of course, I may be wrong in any
> of the chain, so would appreciate explanation of how this doesn't work.

I believe you're talking about two different things.  I think Sean is
talking about preimage resistance, which assumes that the known-good
repository is trusted, and I believe Simon is talking about manufactured
collisions where the attacker controls both the good and the bad
repository.

The dgit and tag2upload design probably (I'd have to think about it some
more, ideally while bouncing the problem off of someone else, because I've
recycled those brain cells for other things) only needs preimage
resistance, but the general case of a malicious upstream may be vulnerable
to manufactured collisions.

(So far as I know, preimage attacks against *MD5* are still infeasible,
let alone against SHA-1.)
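
As a quick illustration of what a tag signature actually covers (a sketch;
the tag name here is hypothetical), the signed payload is the tag object,
which names exactly one commit id:

    git cat-file tag v5.6.0
    # object <commit-id>   <- a SHA-1 today, which is where the
    # type commit             ambiguity concern comes in
    # tag v5.6.0
    # tagger ...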

-- 
Russ Allbery (r...@debian.org)  



Re: Validating tarballs against git repositories

2024-03-30 Thread Russ Allbery
Ingo Jürgensmann  writes:

> This reminds me of https://xkcd.com/2347/ - and I think that’s getting a
> more common threat vector for FLOSS: pick up some random lib that is
> widely used, insert some malicious code and have fun. Then also imagine
> stuff that automates builds in other ways like docker containers, Ruby,
> Rust, pip that pull stuff from the network and install it without
> further checks.

> I hope (and am confident) that Debian as a project will react
> accordingly to prevent this happening again.

Debian has precisely the same problem.  We have more work to do than we
possibly can do with the resources we have, there is some funding but not
a lot of funding so most of the work is hobby work stolen from scarce free
time, and we're under a lot of pressure to encourage and incorporate the
work of new maintainers.

And 99% of the time trusting the people who step up to help works out
great.

The hardest part about defending against social engineering is that it
doesn't attack the weaknesses of a community.  It attacks its
*strengths*: trust, collaboration, and mutual assistance.

-- 
Russ Allbery (r...@debian.org)  



Re: Validating tarballs against git repositories

2024-03-30 Thread Russ Allbery
Luca Boccassi  writes:

> In the end, massaged tarballs were needed to avoid rerunning autoconfery
> on twelve thousand different proprietary and non-proprietary Unix
> variants, back in the day. In 2024, we do dh_autoreconf by default so
> it's all moot anyway.

This is true from Debian's perspective.  This is much less obviously true
from upstream's perspective, and there are some advantages to aligning
with upstream about what constitutes the release artifact.

> When using Meson/CMake/home-grown makefiles there's no meaningful
> difference on average, although I'm sure there are corner cases and
> exceptions here and there.

Yes, perhaps it's time to switch to a different build system, although one
of the reasons I've personally been putting this off is that I do a lot of
feature probing for library APIs that have changed over time, and I'm not
sure how one does that in the non-Autoconf build systems.  Meson's Porting
from Autotools [1] page, for example, doesn't seem to address this use
case at all.

[1] https://mesonbuild.com/Porting-from-autotools.html

Maybe the answer is "you should give up on portability to older systems as
the cost of having a cleaner build system," and that's not an entirely
unreasonable thing to say, but that's going to be a hard sell for a lot of
upstreams that care immensely about this.

-- 
Russ Allbery (r...@debian.org)  



Re: Validating tarballs against git repositories

2024-03-30 Thread Simon Josefsson
Jonathan Carter  writes:

> On 2024/03/30 11:05, Simon Josefsson wrote:
>>> 1. Move towards allowing, and then favoring, git-tags over source tarballs
>>
>> Some people have suggested this before -- and I have considered adopting
>> that approach myself, but one thing that is often overlooked is that
>> building from git usually increases the Build-Depends quite a lot
>> compared to building from tarball
>
> How in the world do you jump to that conclusion?

By comparing the set of tools required to build from git with the tools
installed by Build-Depends* for common projects.  I'm thinking of
projects like coreutils, wget, libidn2, gnutls, gzip, etc.

/Simon


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Gioele Barabucci

On 30/03/24 14:08, Jonathan Carter wrote:
> On 2024/03/30 12:43, Sean Whitton wrote:
>>> On 2024-03-30 08:02:04, Gioele Barabucci wrote:
>>>> Now it is time to take a step forward:
>>>>
>>>> 1. new upstream release;
>>>> 2. the DD/DM merges the upstream release VCS into the Debian VCS;
>>>> 3. the buildd is notified of the new release;
>>>> 4. the buildd creates and uploads the non-reviewed-in-practice blobs
>>>> "source deb" and "binary deb" to unstable.
>>>>
>>>> This change would have three advantages:
>>>
>>> I think everyone fully agrees this is a good thing, no need to list the
>>> advantages.
>>
>> It is also already fully implemented as tag2upload, and is merely as yet
>> undeployed, for social reasons.
>
> My understanding is that DSA aren't quite comfortable with it, since it
> would need to archive a GPG signing key (or a keypair trusted by DAK)?

Don't the buildds already work in a similar way?

The source deb is signed by the DD, the buildd checks the signature of
the source deb, then builds and signs the binary debs.

In the future the tag is signed by the DD, the buildd checks the
signature of the tag, then builds and signs the source deb and the
binary debs.


--
Gioele Barabucci



Re: Validating tarballs against git repositories

2024-03-30 Thread Gioele Barabucci

On 30/03/24 13:38, Jonathan Carter wrote:
> On 2024/03/30 11:05, Simon Josefsson wrote:
>>> 1. Move towards allowing, and then favoring, git-tags over source
>>> tarballs
>>
>> Some people have suggested this before -- and I have considered adopting
>> that approach myself, but one thing that is often overlooked is that
>> building from git usually increases the Build-Depends quite a lot
>> compared to building from tarball
>
> How in the world do you jump to that conclusion?

Usually it's due to things like precompiled documentation:
{man,info}pages, HTML pages.

To generate these files you usually need texinfo, groff, pandoc, sphinx,
etc. All big packages with plenty of runtime and build-time dependencies.

But as I said, for cases like arch rebootstraps, build profiles like
nodoc can remove the need to bootstrap a long (and often circular) chain
of dependencies.


Regards,

--
Gioele Barabucci



Re: Validating tarballs against git repositories

2024-03-30 Thread Antonio Russo
There are many important and useful things here, but I want to address
this one point:

On 2024-03-30 00:29, Russ Allbery wrote:
> Antonio Russo  writes:
> 
>> If that's the case, could we make those files at packaging time, analogous
>> to the DFSG-exclude stripping process?
> 
> If I have followed this all correctly, I believe that in this case the
> exploit is not in a build artifact.  It's in a very opaque source artifact
> that is different in the release tarball from the Git archive.  Assuming
> that I have that right, stripping build artifacts wouldn't have done
> anything about this exploit, but comparing Git and release tarballs would
> have.
> 
> I think you're here anticipating a *different* exploit that would be
> carried in build artifacts that Debian didn't remove and reconstruct, and
> that we want to remove those from our upstream source archives in order to
> ensure that we can't accidentally do that.

In this case, as Guillem walks through in a later email, build-to-host.m4
would be generated by autotools under different circumstances (not Debian
today, because of differences in versions).

I therefore consider that file a build artifact, perhaps incorrectly given
Simon's comment that autoreconf cannot be used to reliably re-bootstrap all
of these files.

I was (before Simon's point) arguing to ALWAYS re-bootstrap it all, or at
least always re-bootstrap on a Debian machine.  A prerequisite to this, more
generally, is that we can always re-bootstrap from auditable source.

I appreciate that, unless that binary process happens reproducibly, this
just shifts the trust to a different person, and doesn't actually address
this kind of carefully-orchestrated attack. I also appreciate the Ken
Thompson "trusting trust" nightmare scenario makes the compiler another
major issue.

What I'm hoping for is more limited: assume our existing infrastructure is
sound, but develop hygiene and tooling that prevents accepting binary and
build-artifact Trojans into Debian.

Best,
Antonio

OpenPGP_0xB01C53D5DED4A4EE.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Lisandro Damián Nicanor Pérez Meyer
On Sat, 30 Mar 2024 at 10:16, Guillem Jover  wrote:
[snip]

This:

> I'm personally not a fan of pristine-tar, and my impression is that it
> is falling out of favor in various corners and big teams within the
> project. And then I'm also not a fan either for mixing packaging with
> upstream git history. The non-native packages I maintain only contain
> debian/ directories, which to me have the superior properties (but not
> tooling), including in a situation like this. I'll expand on this later.

And this:

> I think the biggest issue is that we are pretty much based on a model
> that relies on trusting upstreams, for code, for license and copyright
> compliance, etc. We tend to assume upstreams (and us!) can make
> mistakes, but that in general they are not working against us.
>
> When confronted with a known hostile (and not necessarily malicious)
> upstream the only winning game is not to play. If we do not even know
> the upstream is hostile and/or malicious that seems like a losing
> prospect to me. There are so many ways such upstream can slip stuff
> through in this model that this gets really nasty really quickly.
>
> Don't get me wrong, I think we can/should modify our processes and
> tooling somehow to at least try to deter this path as much as possible,
> but it still seems to go counter to our model, and seems like a losing
> prospect. (You could have an upstream that tries to overwhelm you with
> sheer amount of commits for example. In this case they even included
> the bulk of the backdoor in git, and in the end I guess I don't see
> much difference between smuggling something through git or a tarball.)
>
> And, coming back to the Debian side of things. To me the most
> important part is that we might be able to close a bit this door with
> upstream, but what about this happening within Debian? I think we have
> discussed in the past, what would happen if someone tried this kind of
> long term attack on the project, and my feeling is that we have kind
> of shrugged it off as either "it would take too much effort so it's
> implausible" or "if they want to do it we are lost anyway" but perhaps
> I'm misremembering.
>
> Related to this, dgit has been brought up as the solution to this, but
> in my mind this incident reinforces my view that precisely storing
> more upstream stuff in git is the opposite of what we'd want, and
> makes reviewing even harder, given that in our context we are on a
> permanent fork against upstream, and if you include merge commits and
> similar, there's lots of places to hide stuff. In contrast storing
> only the packaging bits (debian/ dir alone) like pretty much every
> other downstream is doing with their packaging bits, makes for an
> obviously more manageable thing to review and not get drown into,
> more so if we have to consider that next time perhaps the long-game
> gets played within Debian. :(
>
> (An additional bonus of only keeping debian/ directories is that it
> would make it possible to checkout all Debian packaging locally. :)

Are the most insightful things I've read so far, both on the social
and technical side of things.

I will add that comparing autogenerated files in big/huge projects is
something utterly complicated. Could take days of work. On the other
hand, big/huge projects tend to have many more eyes, but things can
always go wrong.

-- 
Lisandro Damián Nicanor Pérez Meyer
https://perezmeyer.com.ar/



Re: Validating tarballs against git repositories

2024-03-30 Thread Simon Josefsson
Sean Whitton  writes:

> Hello,
>
> On Sat 30 Mar 2024 at 12:19pm +01, Simon Josefsson wrote:
>
>> Relying on signed git tags is not reliable because git is primarily
>> SHA1-based which in 2019 cost $45K to do a collision attack for.
>
> We did some analysis on the SHA1 vulnerabilities and determined that
> they did not meaningfully affect dgit & tag2upload's design.

Can you share that analysis?  As far as I understand, it is possible for
a malicious actor to create a git repository with the same commit id as
HEAD, with different historic commits and tree content.  I thought a
signed tag is merely a signed reference to a particular commit id.  If
that commit id is a SHA1 reference, that opens up for ambiguity given
recent (well, 2019) results on SHA1.  Of course, I may be wrong in any
of the chain, so would appreciate explanation of how this doesn't work.

/Simon


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Guillem Jover
Hi!

On Fri, 2024-03-29 at 23:53:20 -0600, Antonio Russo wrote:
> On 2024-03-29 22:41, Guillem Jover wrote:
> > On Fri, 2024-03-29 at 18:21:27 -0600, Antonio Russo wrote:
> >> Had tooling existed in Debian to automatically validate this faithful
> >> reproduction, we might not have been exposed to this issue.
> > 
> > Given that the autogenerated stuff is not present in the git tree,
> > a diff between tarball and git would always generate tons of delta,
> > so this would not have helped.
> 
> I may not have been clear, but I'm suggesting scrubbing all the
> autogenerated stuff, and comparing that against a similarly scrubbed
> git tag contents.  (But you explain that this is problematic.)

Yes, the point here is how we determine what is autogenerated stuff
when confronted with a malicious upstream, so the problem again is
that if you need to verify everything then you might easily get
overwhelmed by sheer amount of autogenerated output. But see below.

> >> Having done this myself, it has been my experience that many partial
> >> build artifacts are captured in source tarballs that are not otherwise
> >> maintained in the git repository.  For instance, in zfs (which I have
> >> contributed to in the past), many automake files are regenerated.
> >> (I do not believe that specific package is vulnerable to an attack
> >> on the autoconf/automake files, since the debian package calls the
> >> upstream tooling to regenerate those files.)
> 
> (Hopefully the above clears up that I at least have some superficial
> awareness of the build artifacts showing up in the release tarball!)

(Sorry, I guess my reply might have sounded patronizing? I noticed later
on that you explicitly mentioned this, but thought that would be clear
when reading the whole mail; I thought about adding a note to the
earlier text, but considered it unnecessary. Should have probably added
it anyway. :)

> >> 1. Move towards allowing, and then favoring, git-tags over source tarballs
> > 
> > I assume you mean git archives out of git tags? Otherwise how do you
> > go from git-tag to a source package in your mind?
> 
> I'm not wed to any specific mechanism, but I'd be content with that.  I'd
> be most happy with DD-signed tags that were certified dfsg, policy compliant
> (i.e., lacking build artifacts), and equivalent to scrubbed upstream source.
> (and more on that later, building on what you say).
> 
> Many repositories today already do things close to this with pristine-tar,
> so this seems to me a direction where the tooling already exists.
> 
> I'll add that, if we drop the desire for a signed archive, and instead
> require a signed git-tag (from which we can generate a source tar on
> demand, as you suggest), we can drop the pristine-tar requirement.  If we
> are less progressive, but move to working exclusively with Debian-regenerated
> .tar files, we can probably avoid many of the frustrating edge cases that
> pristine-tar still struggles with.

I'm personally not a fan of pristine-tar, and my impression is that it
is falling out of favor in various corners and big teams within the
project. And then I'm also not a fan either for mixing packaging with
upstream git history. The non-native packages I maintain only contain
debian/ directories, which to me have the superior properties (but not
tooling), including in a situation like this. I'll expand on this later.

I've been thinking and, perhaps the only thing we'd need, is to include
either a file or a field in some file that refers to the upstream commit
we think the tarball is derived from. We also have fields that contain
the upstream VCS repo. Then we could also have tooling that could perform
such checks, independently from how we transport and pack our sources.
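
A rough sketch of what such tooling could look like (untested; the
Upstream-Commit field name and the paths here are made up for
illustration):

    #!/bin/sh
    # Sketch: every file tracked at the recorded upstream commit must be
    # byte-for-byte identical in the dist tarball; anything left over in
    # the tarball is the (claimed) autogenerated remainder.
    set -e
    commit=$(sed -n 's/^Upstream-Commit: //p' debian/upstream/metadata)
    mkdir -p /tmp/from-vcs /tmp/from-tarball
    git -C ../upstream.git archive --format=tar "$commit" | tar -x -C /tmp/from-vcs
    tar -xf ../foo_1.0.orig.tar.gz --strip-components=1 -C /tmp/from-tarball
    (cd /tmp/from-vcs && find . -type f) | while read -r f; do
        cmp -s "/tmp/from-vcs/$f" "/tmp/from-tarball/$f" || echo "DIFFERS: $f"
    done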

> >> 2. Require upstream-built artifacts be removed (instead, generate these
> >>ab-initio during build)
> > 
> > The problem here is that the .m4 file to hook into the build system was
> > named like one shipped by gnulib (so less suspicious), but xz-utils does
> > not use gnulib, and thus the autotools machinery does not know anything
> > about it, so even the «autoreconf -f -i» done by debhelper via
> > dh-autoreconf, would not regenerate it.
> 
> The way I see it, there are two options in handling a buildable package:
> 
> 1. That file would have been considered a build artifact, consequently
> removed and then regenerated.  No backdoor.
>
> 2. The file would not have been scrubbed, and a difference between the
> git version and the released tar version would have been noticed.
> Backdoor found.
> 
> Either of these is, in my mind, dramatically better than what happened.

Sure, but that relies on knowing for certain what is and what is not
autogenerated for 1), on not drowning in autogenerated output for 2) so
that a difference cannot be easily missed, and on autoreconf doing what
we expect! Also important is when this would be done: only on initial
packaging, or on every build? Because 1) has the bad property
that it 

Re: Validating tarballs against git repositories

2024-03-30 Thread Jonathan Carter

Hi Sean

On 2024/03/30 12:43, Sean Whitton wrote:
>> On 2024-03-30 08:02:04, Gioele Barabucci wrote:
>>> Now it is time to take a step forward:
>>>
>>> 1. new upstream release;
>>> 2. the DD/DM merges the upstream release VCS into the Debian VCS;
>>> 3. the buildd is notified of the new release;
>>> 4. the buildd creates and uploads the non-reviewed-in-practice blobs
>>> "source deb" and "binary deb" to unstable.
>>>
>>> This change would have three advantages:
>>
>> I think everyone fully agrees this is a good thing, no need to list the
>> advantages.
>
> It is also already fully implemented as tag2upload, and is merely as yet
> undeployed, for social reasons.

My understanding is that DSA aren't quite comfortable with it, since it
would need to archive a GPG signing key (or a keypair trusted by DAK)?

I did enjoy the tag2upload talk that was given earlier this year at
miniDebConf Cambridge:

https://peertube.debian.social/w/pav68XBWdurWzfTYvDgWRM

One of the things I like most about it is that it doesn't break any
existing workflow or technical implementation. And it seems like
something most people would reasonably want to see implemented.

So I think it boils down to finding some constructive way to engage with
ftpmasters to find a solution that they are content with, because
without that, nothing is going to happen. I'm not 100% sure that I would
classify that as a social reason; DSA/ftpmaster is careful out of
necessity.

Any chance we can convince both ftpmaster members and the tag2upload
team to join at DebConf24 in Busan so that an attempt can be made to
hash this out in person? I'm not sure everyone involved will be
motivated enough to join a sprint just to work on this, but it tends to
work so much better when people work on problems together in person
rather than emails, where people want to reply thoughtfully and then end
up taking weeks to do so.

I think it's not so much a question of *if* Debian would ever switch to
a git-based workflow, but *when*. And tag2upload's opt-in nature
provides a great bridge to that future; there's clearly been a lot of
good thought put into it, and there's really no alternative that even
comes close in either design or being so close to being ready for
implementation. However, I think it can only happen if you get all the
right people in the same room to address the remaining concerns.


-Jonathan



Re: Validating tarballs against git repositories

2024-03-30 Thread Bastian Blank
On Sat, Mar 30, 2024 at 01:30:07PM +0100, Jan-Benedict Glaw wrote:
> On Sat, 2024-03-30 08:02:04 +0100, Gioele Barabucci  wrote:
> > On 30/03/24 01:21, Antonio Russo wrote:
> > > 3. Have tooling that automatically checks the sanitized sources against
> > > the development RCSs.
> > git-buildpackage and pristine-tar can be used for that.
> Would be nice if pristine-tar's data file would be reproducible,
> too...

Use pristine-lfs.  Or just generate via "git archive".
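
For the «git archive» route, a sketch of a byte-reproducible orig tarball
straight from the signed tag (names hypothetical; gzip -n omits the
timestamp so repeated runs give identical bytes):

    git archive --format=tar --prefix=foo-1.0/ v1.0 | gzip -9n > ../foo_1.0.orig.tar.gz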

Bastian

-- 
It is undignified for a woman to play servant to a man who is not hers.
-- Spock, "Amok Time", stardate 3372.7



Re: Validating tarballs against git repositories

2024-03-30 Thread G. Branden Robinson
At 2024-03-30T14:38:03+0200, Jonathan Carter wrote:
> On 2024/03/30 11:05, Simon Josefsson wrote:
> > > 1. Move towards allowing, and then favoring, git-tags over source tarballs
> >
> > Some people have suggested this before -- and I have considered
> > adopting that approach myself, but one thing that is often
> > overlooked is that building from git usually increase the
> > Build-Depends quite a lot compared to building from tarball
> 
> How in the world do you jump to that conclusion?

I don't know about "usually", but "often" seems fair enough; it's true
of groff.

https://git.savannah.gnu.org/cgit/groff.git/tree/INSTALL.REPO?h=1.23.0#n15

Regards,
Branden


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Jonathan Carter

On 2024/03/30 11:05, Simon Josefsson wrote:
>> 1. Move towards allowing, and then favoring, git-tags over source
>> tarballs
>
> Some people have suggested this before -- and I have considered adopting
> that approach myself, but one thing that is often overlooked is that
> building from git usually increases the Build-Depends quite a lot
> compared to building from tarball

How in the world do you jump to that conclusion?

-Jonathan



Re: Validating tarballs against git repositories

2024-03-30 Thread Jan-Benedict Glaw
On Sat, 2024-03-30 08:02:04 +0100, Gioele Barabucci  wrote:
> On 30/03/24 01:21, Antonio Russo wrote:
> > 3. Have tooling that automatically checks the sanitized sources against
> > the development RCSs.
> 
> git-buildpackage and pristine-tar can be used for that.

Would be nice if pristine-tar's data file would be reproducible,
too...

MfG, JBG

-- 


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Sean Whitton
Hello,

On Sat 30 Mar 2024 at 12:19pm +01, Simon Josefsson wrote:

> Relying on signed git tags is not reliable because git is primarily
> SHA1-based which in 2019 cost $45K to do a collision attack for.

We did some analysis on the SHA1 vulnerabilities and determined that
they did not meaningfully affect dgit & tag2upload's design.

-- 
Sean Whitton


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Luca Boccassi
On Sat, 30 Mar 2024 at 06:29, Russ Allbery  wrote:
>
> Antonio Russo  writes:
>
> > The way I see it, there are two options in handling a buildable package:
>
> > 1. That file would have been considered a build artifact, consequently
> > removed and then regenerated.  No backdoor.
>
> > 2. The file would not have been scrubbed, and a difference between the
> > git version and the released tar version would have been noticed.
> > Backdoor found.
>
> > Either of these is, in my mind, dramatically better than what happened.
>
> I think the point that you're assuming (probably because you quite
> reasonably think it's too obvious to need to be stated, but I'm not sure
> it's that obvious to everyone) is that malicious code injected via a
> commit is significantly easier to detect than malicious code that is only
> in the release tarball.
>
> This is not *always* correct; it really depends on how many eyes are on
> the upstream repository and how complex or unreadable the code upstream
> writes normally is.  (For example, I am far from confident that I can
> eyeball the difference between valid and malicious procmail-style C code
> or random M4 files.)  I think it's clearly at least *sometimes* correct,
> though, so I'm sympathetic, particularly given that it's already Debian
> practice to regenerate the build system files anyway.
>
> In other words, we should make sure that breaking the specific tactics
> *this* attacker used truly make the attacker's life harder, as opposed to
> making life harder for Debian packagers while only forcing a one-time,
> minor shift in attacker tactics.  I *think* I'm mostly convinced that
> forcing the attacker into Git commits is a useful partial defense, but I'm
> not sure this is obviously true.

While it's of course true that avoiding massaged tarballs as orig.tar
is not a panacea, and that obfuscated malicious code can and is
checked in git, I am pretty sure it is undeniable that having
everything tracked in git makes it _easier_ to audit and investigate.
Not perfect, not fool-proof, but easier, compared to manually diffing
tarballs. And given we are talking about malicious actors using
subterfuge to attack us, I think we could use all the help we can get,
even if there's no perfect solution.

In the end, massaged tarballs were needed to avoid rerunning
autoconfery on twelve thousand different proprietary and
non-proprietary Unix variants, back in the day. In 2024, we do
dh_autoreconf by default so it's all moot anyway. When using
Meson/CMake/home-grown makefiles there's no meaningful difference on
average, although I'm sure there are corner cases and exceptions here
and there.



Re: Validating tarballs against git repositories

2024-03-30 Thread Luca Boccassi
On Sat, 30 Mar 2024 at 09:57, Iustin Pop  wrote:
>
> On 2024-03-30 08:02:04, Gioele Barabucci wrote:
> > Now it is time to take a step forward:
> >
> > 1. new upstream release;
> > 2. the DD/DM merges the upstream release VCS into the Debian VCS;
> > 3. the buildd is notified of the new release;
> > 4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
> > deb" and "binary deb" to unstable.
> >
> > This change would have three advantages:
>
> I think everyone fully agrees this is a good thing, no need to list the
> advantages.
>
> The problem is that this requires functionality testing to be fully
> automated via autopkgtest, and moved off the "update changelog, build
> package, test locally, test some more, upload" workflow.
>
> Give me good Salsa support for autopkgtest + lintian + piuparts, and
> easy support (so that I just have to toggle one checkbox), and I'm
> happy. Or even better, integrate all that testing with Salsa (I don't
> know if it has "CI tests must pass before merging"), and block tagging
> on the tagged version having been successfully tested.

This is all already implemented by Salsa CI? You just need to include
the yml and enable the CI in the settings
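
For reference, a sketch of those two steps (the recipe URL below is the
one the Salsa CI team documents; the settings path is from memory, so
verify it):

    # 1. add the CI definition to the packaging repository
    cat > debian/salsa-ci.yml <<'EOF'
    include:
      - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/recipes/debian.yml
    EOF
    # 2. in the Salsa project: Settings -> CI/CD -> General pipelines,
    #    set the CI/CD configuration file to debian/salsa-ci.yml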



Re: Validating tarballs against git repositories

2024-03-30 Thread Simon Josefsson
Gioele Barabucci  writes:

> Just as an example, bootstrapping coreutils currently requires
> bootstrapping at least 68 other packages, including libx11-6 [1]. If
> coreutils supported nodoc [2], the transitive closure of its
> Build-Depends would be reduced to 20 packages, most of which are in
> build-essential.
>
> [1] https://buildd.debian.org/status/fetch.php?pkg=coreutils=amd64=9.4-3.1=1710441056=1
> [2] https://bugs.debian.org/1057136

Coreutils in Debian uses upstream tarballs and does not do a full
bootstrap build.  It does autoreconf instead of ./bootstrap.  So the
dependencies above are not the entire bootstrapping story for building
coreutils from git compared to building from tarballs.

It would help if upstreams would publish PGP-signed 'git-archive'-style
tarballs, including content from git submodules in them.
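
A sketch of what such an upstream release recipe could look like
(untested, names hypothetical; tar --concatenate needs uncompressed
tars, and nested submodules are not handled):

    #!/bin/sh
    set -e
    v=1.0 p=proj out=$PWD/../$p-$v.tar
    git archive --prefix="$p-$v/" -o "$out" HEAD
    git submodule foreach --quiet '
        git archive --prefix="'"$p-$v"'/$sm_path/" -o /tmp/sub.tar HEAD
        tar --concatenate --file="'"$out"'" /tmp/sub.tar'
    gzip -9n "$out"
    gpg --armor --detach-sign "$out.gz"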

Relying on signed git tags is not reliable because git is primarily
SHA1-based, which in 2019 cost $45K to do a collision attack for.

/Simon


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Sean Whitton
Hello,

On Sat 30 Mar 2024 at 10:56am +01, Iustin Pop wrote:

> On 2024-03-30 08:02:04, Gioele Barabucci wrote:
>> Now it is time to take a step forward:
>>
>> 1. new upstream release;
>> 2. the DD/DM merges the upstream release VCS into the Debian VCS;
>> 3. the buildd is notified of the new release;
>> 4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
>> deb" and "binary deb" to unstable.
>>
>> This change would have three advantages:
>
> I think everyone fully agrees this is a good thing, no need to list the
> advantages.

It is also already fully implemented as tag2upload, and is merely as yet
undeployed, for social reasons.

-- 
Sean Whitton



Re: Validating tarballs against git repositories

2024-03-30 Thread Sean Whitton
Hello,

On Fri 29 Mar 2024 at 06:21pm -06, Antonio Russo wrote:

> 1. Move towards allowing, and then favoring, git-tags over source tarballs

Many of us already do this.  dgit maintains an official store of the tags.

-- 
Sean Whitton



Re: Validating tarballs against git repositories

2024-03-30 Thread Gioele Barabucci

On 30/03/24 10:05, Simon Josefsson wrote:
> Antonio Russo  writes:
>
>> 1. Move towards allowing, and then favoring, git-tags over source
>> tarballs
>
> Some people have suggested this before -- and I have considered adopting
> that approach myself, but one thing that is often overlooked is that
> building from git usually increases the Build-Depends quite a lot
> compared to building from tarball, and that will more likely trigger
> cyclic dependencies.  People who do bootstrapping for new platforms or
> cross-building dislike such added dependencies.

Most of the time such added dependencies could be worked around with
build profiles and cross building. More widespread support for nodoc,
nocheck and Multi-Arch annotations can greatly reduce the number of
deps needed to bootstrap an architecture.

Just as an example, bootstrapping coreutils currently requires
bootstrapping at least 68 other packages, including libx11-6 [1]. If
coreutils supported nodoc [2], the transitive closure of its
Build-Depends would be reduced to 20 packages, most of which are in
build-essential.

[1] https://buildd.debian.org/status/fetch.php?pkg=coreutils=amd64=9.4-3.1=1710441056=1
[2] https://bugs.debian.org/1057136
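
As a sketch, activating those profiles at build time looks like this
(dpkg-buildpackage's -P option; it only helps where debian/control
annotates dependencies with <!nodoc> and friends):

    DEB_BUILD_OPTIONS="nodoc nocheck" dpkg-buildpackage -us -uc -Pnodoc,nocheck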

Regards,

--
Gioele Barabucci



Re: Validating tarballs against git repositories

2024-03-30 Thread Iustin Pop
On 2024-03-30 08:02:04, Gioele Barabucci wrote:
> Now it is time to take a step forward:
> 
> 1. new upstream release;
> 2. the DD/DM merges the upstream release VCS into the Debian VCS;
> 3. the buildd is notified of the new release;
> 4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
> deb" and "binary deb" to unstable.
> 
> This change would have three advantages:

I think everyone fully agrees this is a good thing, no need to list the
advantages.

The problem is that this requires functionality testing to be fully
automated via autopkgtest, and moved off the "update changelog, build
package, test locally, test some more, upload" workflow.

Give me good Salsa support for autopkgtest + lintian + piuparts, and
easy support (so that I just have to toggle one checkbox), and I'm
happy. Or even better, integrate all that testing with Salsa (I don't
know if it has "CI tests must pass before merging"), and block tagging
on the tagged version having been successfully tested.

And yes, this should be uniform across all packages stored on Salsa, so
as to not diverge how the testing is done.

iustin



Re: Validating tarballs against git repositories

2024-03-30 Thread Andrey Rakhmatullin
On Sat, Mar 30, 2024 at 09:58:22AM +0100, Ingo Jürgensmann wrote:
> > Yes. In that specific case, the original xz maintainer (Lasse Collin)
> > was socially-pressed by a likely fake person (Jigar Kumar) to do the
> > "right thing" and hand over maintenance.
> > https://www.mail-archive.com/xz-devel@tukaani.org/msg00566.html
> 
> In his reply to that mail Lasse writes in 
> https://www.mail-archive.com/xz-devel@tukaani.org/msg00567.html:
> 
> > It's also good to keep in mind that this is an unpaid hobby project.
> 
> 
> This reminds me of https://xkcd.com/2347/ - and I think that’s getting a more 
> common threat vector for FLOSS: pick up some random lib that is widely used, 
> insert some malicious code and have fun. Then also imagine stuff that 
> automates builds in other ways like docker containers, Ruby, Rust, pip that 
> pull stuff from the network and install it without further checks. 
> 
> I hope (and am confident) that Debian as a project will react accordingly to 
> prevent this happening again. 
How?

-- 
WBR, wRAR


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Simon Josefsson
Antonio Russo  writes:

> 1. Move towards allowing, and then favoring, git-tags over source tarballs

Some people have suggested this before -- and I have considered adopting
that approach myself, but one thing that is often overlooked is that
building from git usually increases the Build-Depends quite a lot
compared to building from tarball, and that will more likely trigger
cyclic dependencies.  People who do bootstrapping for new platforms or
cross-building dislike such added dependencies.

One response to that may be "sorry, our concerns for supply chain
security trump your desire for easier building" but so far I believe
the approach has been to compromise a little on supply chain side (i.e.,
building from tarballs) and compromise a little on the
bootstrap/crossbuild smoothness (e.g., adding nodoc or nocheck targets).

Moving that needle isn't all that trivial, although I think I'm moving
myself to a preference that we really need to build everything from
source code, and preferably not even include non-source code files,
because they may lie dormant and be activated later, à la the xz attack.

An old irk of mine is that people seem to believe that running
'autoreconf -fi' is intended or supposed to combat problems related to
this: autoreconf was never designed for that purpose, nor does it
achieve it reliably.  Many distributions have adopted a preference to
run 'autoreconf' to "re-bootstrap" a project from source code.  This
misses a lot of generated files, and sometimes generates incorrect (and
possibly harmful) newly generated files.  For example:
https://gitlab.com/libidn/libidn2/-/issues/108

/Simon


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Ingo Jürgensmann
On 30.03.2024 at 08:56, Lucas Nussbaum wrote:

> Yes. In that specific case, the original xz maintainer (Lasse Collin)
> was socially-pressed by a likely fake person (Jigar Kumar) to do the
> "right thing" and hand over maintenance.
> https://www.mail-archive.com/xz-devel@tukaani.org/msg00566.html

In his reply to that mail Lasse writes in 
https://www.mail-archive.com/xz-devel@tukaani.org/msg00567.html:

> It's also good to keep in mind that this is an unpaid hobby project.


This reminds me of https://xkcd.com/2347/ - and I think that’s getting a more 
common threat vector for FLOSS: pick up some random lib that is widely used, 
insert some malicious code and have fun. Then also imagine stuff that automates 
builds in other ways like docker containers, Ruby, Rust, pip that pull stuff 
from the network and install it without further checks. 

I hope (and am confident) that Debian as a project will react accordingly to 
prevent this happening again. 

But as a society (that is widely using FLOSS) I would also hope that our 
developers will get proper funding instead of requiring them to maintain such 
software in their spare time. 

-- 
Ciao...  //Web: http://blog.windfluechter.net
  Ingo \X/ XMPP/Jabber:   i...@jhookipa.net

gpg pubkey:  http://www.juergensmann.de/ij_public_key.asc





Re: Validating tarballs against git repositories

2024-03-30 Thread Aníbal Monsalve Salazar
On Fri, 2024-03-29 23:53:20 -0600, Antonio Russo wrote:
> On 2024-03-29 22:41, Guillem Jover wrote:
>> See for example .
> 
> I take a look at these every year or so to keep me terrified of C!
> If it's a single upstream developer, I absolutely agree, but if there's an
> upstream community reviewing the git commits, I really do believe there is
> hope (of them!) identifying bad(tm) things.

Another scary example, "Reflections on Trusting Trust" by Ken Thompson:

https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html


signature.asc
Description: PGP signature


Re: Validating tarballs against git repositories

2024-03-30 Thread Lucas Nussbaum
On 29/03/24 at 23:29 -0700, Russ Allbery wrote:
> The sad irony here is that the xz maintainer tried to do exactly what we
> advise people in this situation to do: try to add a comaintainer to share
> the work, and don't block work because you don't have time to personally
> vet everything in detail.  This is *exactly* why maintainers often don't
> want to do that, and thus force people to fork packages rather than join
> in maintaining the existing package.

Yes. In that specific case, the original xz maintainer (Lasse Collin)
was socially-pressed by a likely fake person (Jigar Kumar) to do the
"right thing" and hand over maintenance.

https://www.mail-archive.com/xz-devel@tukaani.org/msg00566.html

I wonder if "Dennis Ens" is also a fake person. In retrospect, that
email looks suspicious:

On 2022-06-21 Dennis Ens wrote:
> Why not pass on maintainership for XZ for C so you can give XZ for
> Java more attention? Or pass on XZ for Java to someone else to focus
> on XZ for C? Trying to maintain both means that neither are
> maintained well.

Lucas



Re: Validating tarballs against git repositories

2024-03-30 Thread Gioele Barabucci

On 30/03/24 01:21, Antonio Russo wrote:
> 3. Have tooling that automatically checks the sanitized sources against
> the development RCSs.

git-buildpackage and pristine-tar can be used for that.

> 4. Look unfavorably on upstreams without RCS.

And look unfavorably on Debian packages without VCS. And, in addition:

5. Require something like tag2upload to create new releases of Debian
packages.

For too many core packages there is an opaque "something happens on the
Debian maintainer's laptop" step that has no place in 2024. We have no
idea how many Debian DD/DM machines have been compromised because of
this attack. (Hopefully zero.) Any future upload of source debs may, in
principle, contain malicious code.

The workflow for Debian packages has already gone from:

1. new upstream release;
2. something happens on the DD/DM machine;
3. the DD/DM uploads two non-reviewed-in-practice blobs (source deb,
binary deb) to unstable.

to:

1. new upstream release;
2. something happens on the DD/DM machine;
3. the DD/DM uploads a non-reviewed-in-practice blob (source deb) to the
buildd;
4. the buildd compiles the source deb into the binary deb;
5. the buildd uploads a non-reviewed-in-practice blob (binary deb) to
unstable.

This change moved a lot of trust from the hands (and machines) of a
myriad of DDs/DMs into a handful of closely guarded build machines. A
compromised gcc on the DD/DM machine is no longer a problem. But a
compromised tar/dpkg/debhelper still is.

Now it is time to take a step forward:

1. new upstream release;
2. the DD/DM merges the upstream release VCS into the Debian VCS;
3. the buildd is notified of the new release;
4. the buildd creates and uploads the non-reviewed-in-practice blobs
"source deb" and "binary deb" to unstable.

This change would have three advantages:

* Make the whole process happen outside the DD/DM computer, so it
becomes more public and easier to review (commits vs debs), removing
many chances for compromises.

* Close two specific attack vectors (hiding code in upstream release
tarballs and in source debs) that have always existed, and for one of
which we now have proof of exploitation.

* Force attackers to do their work under public scrutiny, raising the
complexity and the cost of carrying out an attack.

Yes, such a workflow will not stop many other attack vectors, but at
least _these_ attack vectors will be stopped.


Regards,

--
Gioele Barabucci



Re: Validating tarballs against git repositories

2024-03-30 Thread Russ Allbery
Antonio Russo  writes:

> The way I see it, there are two options in handling a buildable package:

> 1. That file would have been considered a build artifact, consequently
> removed and then regenerated.  No backdoor.

> 2. The file would not have been scrubbed, and a difference between the
> git version and the released tar version would have been noticed.
> Backdoor found.

> Either of these is, in my mind, dramatically better than what happened.

I think the point that you're assuming (probably because you quite
reasonably think it's too obvious to need to be stated, but I'm not sure
it's that obvious to everyone) is that malicious code injected via a
commit is significantly easier to detect than malicious code that is only
in the release tarball.

This is not *always* correct; it really depends on how many eyes are on
the upstream repository and how complex or unreadable the code upstream
writes normally is.  (For example, I am far from confident that I can
eyeball the difference between valid and malicious procmail-style C code
or random M4 files.)  I think it's clearly at least *sometimes* correct,
though, so I'm sympathetic, particularly given that it's already Debian
practice to regenerate the build system files anyway.

In other words, we should make sure that breaking the specific tactics
*this* attacker used truly make the attacker's life harder, as opposed to
making life harder for Debian packagers while only forcing a one-time,
minor shift in attacker tactics.  I *think* I'm mostly convinced that
forcing the attacker into Git commits is a useful partial defense, but I'm
not sure this is obviously true.

> Ok, so am I understanding you correctly in that you are saying: we do
> actually want *some* build artifacts in the source archives?

> If that's the case, could we make those files at packaging time, analogous
> to the DFSG-exclude stripping process?

If I have followed this all correctly, I believe that in this case the
exploit is not in a build artifact.  It's in a very opaque source artifact
that is different in the release tarball from the Git archive.  Assuming
that I have that right, stripping build artifacts wouldn't have done
anything about this exploit, but comparing Git and release tarballs would
have.

I think you're here anticipating a *different* exploit that would be
carried in build artifacts that Debian didn't remove and reconstruct, and
that we want to remove those from our upstream source archives in order to
ensure that we can't accidentally do that.

> On 2024-03-29 22:41, Guillem Jover wrote:

>> (For dpkg at least I'm pondering whether to play with switching to
>> doing something equivalent to «git archive» though, but see above, or
>> maybe generate two tarballs, a plain «git archive» and a portable one.)

Yeah, with my upstream hat on, I'm considering something similar, but I
still believe I have users who want to compile from source on systems
without current autotools, so I still need separate release tarballs.
Having to generate multiple release artifacts (and document them, and
explain to people which ones they want, etc.) is certainly doable, but I
can't say that I'm all that thrilled about it.

I think with my upstream hat on I'd rather ship a clear manifest (checked
into Git) that tells distributions which files in the distribution tarball
are build artifacts, and guarantee that if you delete all of those files,
the remaining tree should be byte-for-byte identical with the
corresponding signed Git tag.  (In other words, Guillem's suggestion.)
Then I can continue to ship only one release artifact.
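
To make that concrete, a rough sketch of the consuming side (untested;
the .dist-artifacts manifest name is made up, one path per line):

    set -e
    tar -xf foo-1.0.tar.gz
    (cd foo-1.0 && xargs -r rm -f -- < .dist-artifacts)  # drop declared artifacts
    mkdir from-git
    git -C foo.git archive --prefix=foo-1.0/ v1.0 | tar -x -C from-git
    diff -r from-git/foo-1.0 foo-1.0   # must be byte-for-byte identical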

> I take a look at these every year or so to keep me terrified of C!  If
> it's a single upstream developer, I absolutely agree, but if there's an
> upstream community reviewing the git commits, I really do believe there
> is hope (of them!) identifying bad(tm) things.

A single upstream developer is the most common case, though.  Perhaps less
so for core libraries, but, well, there are plenty of examples.  (To pick
another one that comes readily to mind, zlib appears to only have one
active maintainer.)

The reality that we are struggling with is that the free software
infrastructure on which much of computing runs is massively and painfully
underfunded by society as a whole, and is almost entirely dependent on
random people maintaining things in their free time because they find it
fun, many of whom are close to burnout.  This is, in many ways, the true
root cause of this entire event.

The sad irony here is that the xz maintainer tried to do exactly what we
advise people in this situation to do: try to add a comaintainer to share
the work, and don't block work because you don't have time to personally
vet everything in detail.  This is *exactly* why maintainers often don't
want to do that, and thus force people to fork packages rather than join
in maintaining the existing package.

This is an aside, but this is why my personal policy for my own projects
that I no 

Re: Validating tarballs against git repositories

2024-03-29 Thread Antonio Russo
On 2024-03-29 22:41, Guillem Jover wrote:
> Hi!
> 
> On Fri, 2024-03-29 at 18:21:27 -0600, Antonio Russo wrote:
>> This is a vector I've been somewhat paranoid about myself, and I
>> typically check the difference between git archive $TAG and the downloaded
>> tar, whenever I package things.  Obviously a backdoor could have been
>> inserted into the git repository directly, but there is a culture
>> surrounding good hygiene in commits: they ought to be small, focused,
>> and well described.
> 
> But the backdoor was in fact included in a git commit (it's hidden
> inside a test compressed file).
> 
> The part that was only present in the tarball was the code to extract
> and hook the inclusion of the backdoor via the build system.

Yes. The "test compressed file" needs to be massaged via:

  >  tr "\-_" " _\-" | xz -d

That code comes out of the m4 file, which is not present in git source
code.  I'm unaware at this point of any direct evidence that the git
source code alone is in any way dangerous (aside from the fact that we
cannot trust the developer at all!).

>> People are comfortable discussing and challenging
>> a commit that looks fishy, even if that commit is by the main developer
>> of a package.  I have been assuming tooling existed in package
>> maintainers' toolkits to verify the faithful reproduction of the
>> published git tag in the downloaded source tarball, beyond a signature
>> check by the upstream developer.  Apparently, this is not universal.
>>
>> Had tooling existed in Debian to automatically validate this faithful
>> reproduction, we might not have been exposed to this issue.
> 
> Given that the autogenerated stuff is not present in the git tree,
> a diff between tarball and git would always generate tons of delta,
> so this would not have helped.

I may not have been clear, but I'm suggesting scrubbing all the
autogenerated stuff, and comparing that against a similarly scrubbed
git tag contents.  (But you explain that this is problematic.)

>> Having done this myself, it has been my experience that many partial
>> build artifacts are captured in source tarballs that are not otherwise
>> maintained in the git repository.  For instance, in zfs (which I have
>> contributed to in the past), many automake files are regenerated.
>> (I do not believe that specific package is vulnerable to an attack
>> on the autoconf/automake files, since the debian package calls the
>> upstream tooling to regenerate those files.)

(Hopefully the above clears up that I at least have some superficial
awareness of the build artifacts showing up in the release tarball!)

>> We already have a policy of not shipping upstream-built artifacts, so
>> I am making a proposal that I believe simply takes that one step further:
>>
>> 1. Move towards allowing, and then favoring, git-tags over source tarballs
> 
> I assume you mean git archives out of git tags? Otherwise how do you
> go from git-tag to a source package in your mind?

I'm not wed to any specific mechanism, but I'd be content with that.  I'd
be most happy DD-signed tags that were certified dfsg, policy compliant
(i.e., lacking build artifacts), and equivalent to scrubbed upstream source.
(and more on that later, building on what you say).

Many repositories today already do things close to this with pristine-tar,
so this seems to me a direction where the tooling already exists.

I'll add that, if we drop the desire for a signed archive, and instead
require a signed git-tag (from which we can generate a source tar on
demand, as you suggest), we can drop the pristine-tar requirement.  If we
are less progressive, but move to working exclusively with Debian-regenerated
.tar files, we can probably avoid many of the frustrating edge cases that
pristine-tar still struggles with.

>> 2. Require upstream-built artifacts be removed (instead, generate these
>>ab-initio during build)
> 
> The problem here is that the .m4 file to hook into the build system was
> named like one shipped by gnulib (so less suspicious), but xz-utils does
> not use gnulib, and thus the autotools machinery does not know anything
> about it, so even the «autoreconf -f -i» done by debhelper via
> dh-autoreconf, would not regenerate it.

The way I see it, there are two options in handling a buildable package:

1. That file would have been considered a build artifact, consequently
removed and then regenerated.  No backdoor.

2. The file would not have been scrubbed, and a difference between the
git version and the released tar version would have been noticed.
Backdoor found.

Either of these is, in my mind, dramatically better than what happened.

One automatic approach would be to run dh-autoreconf and identify the
changed files.  Remove those files from both the distributed tarball and
the git tag, then check if what remains differs. (You also suggest
something very similar to this, repacking the archive with those
Debian-generated build artifacts.)
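
A rough sketch of that flow (untested; assumes pristine trees in
tarball-src/ and git-src/, and only handles files diff reports as
differing, not files that exist on one side only):

    set -e
    cp -a tarball-src regen && (cd regen && autoreconf -f -i)
    diff -rq tarball-src regen | awk '/^Files /{print $2}' \
        | sed 's|^tarball-src/||' > regenerated.list
    while read -r f; do rm -f "tarball-src/$f" "git-src/$f"; done < regenerated.list
    diff -r git-src tarball-src   # whatever remains needs human eyes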

I may be missing something here, though!

> 

Re: Validating tarballs against git repositories

2024-03-29 Thread Guillem Jover
Hi!

On Fri, 2024-03-29 at 18:21:27 -0600, Antonio Russo wrote:
> This is a vector I've been somewhat paranoid about myself, and I
> typically check the difference between git archive $TAG and the downloaded
> tar, whenever I package things.  Obviously a backdoor could have been
> inserted into the git repository directly, but there is a culture
> surrounding good hygiene in commits: they ought to be small, focused,
> and well described.

But the backdoor was in fact included in a git commit (it's hidden
inside a test compressed file).

The part that was only present in the tarball was the code to extract
and hook the inclusion of the backdoor via the build system.

> People are comfortable discussing and challenging
> a commit that looks fishy, even if that commit is by the main developer
> of a package.  I have been assuming tooling existed in package
> maintainers' toolkits to verify the faithful reproduction of the
> published git tag in the downloaded source tarball, beyond a signature
> check by the upstream developer.  Apparently, this is not universal.
> 
> Had tooling existed in Debian to automatically validate this faithful
> reproduction, we might not have been exposed to this issue.

Given that the autogenerated stuff is not present in the git tree,
a diff between tarball and git would always generate tons of delta,
so this would not have helped.

> Having done this myself, it has been my experience that many partial
> build artifacts are captured in source tarballs that are not otherwise
> maintained in the git repository.  For instance, in zfs (which I have
> contributed to in the past), many automake files are regenerated.
> (I do not believe that specific package is vulnerable to an attack
> on the autoconf/automake files, since the debian package calls the
> upstream tooling to regenerate those files.)
> 
> We already have a policy of not shipping upstream-built artifacts, so
> I am making a proposal that I believe simply takes that one step further:
> 
> 1. Move towards allowing, and then favoring, git-tags over source tarballs

I assume you mean git archives out of git tags? Otherwise how do you
go from git-tag to a source package in your mind?

> 2. Require upstream-built artifacts be removed (instead, generate these
>ab-initio during build)

The problem here is that the .m4 file to hook into the build system was
named like one shipped by gnulib (so less suspicious), but xz-utils does
not use gnulib, and thus the autotools machinery does not know anything
about it, so even the «autoreconf -f -i» done by debhelper via
dh-autoreconf, would not regenerate it.

Removing these might be cumbersome after the fact if upstream includes
for example their own maintained .m4 files. See dpkg's m4 dir for an
example of this (although there it's easy as all are namespaced but…).

Not using an upstream provided tarball, might also mean we stop being
able to use upstream signatures, which seems worse. The alternative
might be promoting for upstreams to just do the equivalent of
«git archive», but that might defeat the portability and dependency
reduction properties that were designed into the autotools build
system, or increase the bootstrap set (see for example the
pkg.dpkg.author-release build profile used by dpkg).

(For dpkg at least I'm pondering whether to play with switching to
doing something equivalent to «git archive» though, but see above, or
maybe generate two tarballs, a plain «git archive» and a portable one.)

> 3. Have tooling that automatically checks the sanitized sources against
>the development RCSs.

Perhaps we could have a declarative way to state all the autogenerated
artifacts included in a tarball that need to be cleaned up
automatically after unpack, similar to how we can already exclude
files automatically when repackaging tarballs via uscan?

(.gitignore, if upstream properly maintains it, might be a good
starting point, but it will tend to include more than necessary.)
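
Sketched in the deb822 style of uscan's Files-Excluded, such a
declaration could perhaps look like this (the Files-Generated field
name is entirely made up):

  Files-Generated:
   configure
   config.h.in
   Makefile.in
   */Makefile.in
   m4/libtool.m4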

> 4. Look unfavorably on upstreams without RCS.

Some upstreams have a VCS, but still do massive code drops, include
autogenerated stuff in the VCS, do not do atomic commits, or write
commit messages of the style "fix stuff", ".", or the like. So while
this is something we should encourage, it's not
sufficient. I think part of this might already be present in our
Upstream Guidelines in the wiki.

> In the present case, the triggering modification was in a modified .m4 file
> that injected a snippet into the configure script.  That modification
> could have been flagged using this kind of process.

I don't think this modification would have been spotted, because the
file it modified is not one that would usually get autogenerated by
the build system.

> While this would be a lot of work, I believe doing so would demand a
> much larger amount of additional complexity from anyone orchestrating
> attacks against Debian in the future.

It would certainly make it a bit harder, but I'm not sure it would be
enough on its own against a sufficiently motivated attacker.

Validating tarballs against git repositories

2024-03-29 Thread Antonio Russo
Hello everyone,

As I'm sure we're all aware at this point, Debian has been the victim
of a relatively sophisticated first-party attack whereby a backdoor
in the XZ package was smuggled into sshd via a systemd dependency.
This backdoor, at a minimum, attacked key verification. As far as I
know, the exact effects of the backdoor are not yet fully understood.
(Two versions, 5.6.0 and 5.6.1, are affected, and investigation is
ongoing.)

There are many things to talk about here, but one that involves the
task of package maintainers, and that I would like to discuss now, is
the way the backdoor was distributed.  The code in the xz git
repository does not build a vulnerable version, while the code in the
5.6.0 and 5.6.1 source tarballs does.

This is a vector I've been somewhat paranoid about myself, and I
typically check the difference between git archive $TAG and the downloaded
tar, whenever I package things.  Obviously a backdoor could have been
inserted into the git repository directly, but there is a culture
surrounding good hygiene in commits: they ought to be small, focused,
and well described.  People are comfortable discussing and challenging
a commit that looks fishy, even if that commit is by the main developer
of a package.  I have been assuming tooling existed in package
maintainers' toolkits to verify the faithful reproduction of the
published git tag in the downloaded source tarball, beyond a signature
check by the upstream developer.  Apparently, this is not universal.

Had tooling existed in Debian to automatically validate this faithful
reproduction, we might not have been exposed to this issue.

Having done this myself, I have found that source tarballs often
capture partial build artifacts that are not otherwise maintained in
the git repository.  For instance, in zfs (which I have
contributed to in the past), many automake files are regenerated.
(I do not believe that specific package is vulnerable to an attack
on the autoconf/automake files, since the debian package calls the
upstream tooling to regenerate those files.)

We already have a policy of not shipping upstream-built artifacts, so
I am making a proposal that I believe simply takes that one step further:

1. Move towards allowing, and then favoring, git-tags over source tarballs
2. Require upstream-built artifacts be removed (instead, generate these
   ab-initio during build)
3. Have tooling that automatically checks the sanitized sources against
   the development RCSs.
4. Look unfavorably on upstreams without RCS.

In the present case, the triggering modification was in a modified .m4 file
that injected a snippet into the configure script.  That modification
could have been flagged using this kind of process.

While this would be a lot of work, I believe doing so would demand a
much larger amount of additional complexity from anyone orchestrating
attacks against Debian in the future.

Best,
Antonio Russo
