Re: Validating tarballs against git repositories

2024-03-29 Thread Russ Allbery
Antonio Russo  writes:

> The way I see it, there are two options in handling a buildable package:

> 1. That file would have been considered a build artifact, consequently
> removed and then regenerated.  No backdoor.

> 2. The file would not have been scrubbed, and a difference between the
> git version and the released tar version would have been noticed.
> Backdoor found.

> Either of these is, in my mind, dramatically better than what happened.

I think the point that you're assuming (probably because you quite
reasonably think it's too obvious to need to be stated, but I'm not sure
it's that obvious to everyone) is that malicious code injected via a
commit is significantly easier to detect than malicious code that is only
in the release tarball.

This is not *always* correct; it really depends on how many eyes are on
the upstream repository and how complex or unreadable the code upstream
writes normally is.  (For example, I am far from confident that I can
eyeball the difference between valid and malicious procmail-style C code
or random M4 files.)  I think it's clearly at least *sometimes* correct,
though, so I'm sympathetic, particularly given that it's already Debian
practice to regenerate the build system files anyway.

In other words, we should make sure that breaking the specific tactics
*this* attacker used truly makes the attacker's life harder, as opposed to
making life harder for Debian packagers while only forcing a one-time,
minor shift in attacker tactics.  I *think* I'm mostly convinced that
forcing the attacker into Git commits is a useful partial defense, but I'm
not sure this is obviously true.

> Ok, so am I understanding you correctly in that you are saying: we do
> actually want *some* build artifacts in the source archives?

> If that's the case, could we make those files at packaging time, analogous
> to the DFSG-exclude stripping process?

If I have followed this all correctly, I believe that in this case the
exploit is not in a build artifact.  It's in a very opaque source artifact
that is different in the release tarball from the Git archive.  Assuming
that I have that right, stripping build artifacts wouldn't have done
anything about this exploit, but comparing Git and release tarballs would
have.

I think you're here anticipating a *different* exploit that would be
carried in build artifacts that Debian didn't remove and reconstruct, and
that we want to remove those from our upstream source archives in order to
ensure that we can't accidentally do that.

> On 2024-03-29 22:41, Guillem Jover wrote:

>> (For dpkg at least I'm pondering whether to play with switching to
>> doing something equivalent to «git archive» though, but see above, or
>> maybe generate two tarballs, a plain «git archive» and a portable one.)

Yeah, with my upstream hat on, I'm considering something similar, but I
still believe I have users who want to compile from source on systems
without current autotools, so I still need separate release tarballs.
Having to generate multiple release artifacts (and document them, and
explain to people which ones they want, etc.) is certainly doable, but I
can't say that I'm all that thrilled about it.

I think with my upstream hat on I'd rather ship a clear manifest (checked
into Git) that tells distributions which files in the distribution tarball
are build artifacts, and guarantee that if you delete all of those files,
the remaining tree should be byte-for-byte identical with the
corresponding signed Git tag.  (In other words, Guillem's suggestion.)
Then I can continue to ship only one release artifact.
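
A minimal sketch of what a distribution-side check of such a manifest
could look like.  It assumes the per-file digests of the unpacked tarball
and of the signed tag have already been computed, and that the manifest
is a plain set of relative paths; the function name and the messages are
made up for illustration, not an existing tool.

```python
def verify_against_manifest(tarball, git_tag, manifest):
    """Check a release tree against a Git tree, ignoring declared artifacts.

    tarball and git_tag map relative paths to content digests; manifest is
    the set of paths upstream declares to be generated build artifacts.
    Returns a sorted list of problems (an empty list means the trees match).
    """
    problems = []
    # Delete everything the manifest declares to be a build artifact.
    remaining = {p: d for p, d in tarball.items() if p not in manifest}
    for path in sorted(set(remaining) | set(git_tag)):
        if path not in git_tag:
            problems.append(f"undeclared extra file: {path}")
        elif path not in remaining:
            if path in manifest:
                problems.append(f"manifest lists a file tracked in Git: {path}")
            else:
                problems.append(f"missing from tarball: {path}")
        elif remaining[path] != git_tag[path]:
            problems.append(f"content differs from Git: {path}")
    return problems
```

Anything in the tarball that is neither in Git nor declared in the
manifest is reported, so an extra file slipped into the release would
have to be listed in a manifest that is itself reviewed in Git.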

> I take a look at these every year or so to keep me terrified of C!  If
> it's a single upstream developer, I absolutely agree, but if there's an
> upstream community reviewing the git commits, I really do believe there
> is hope (of them!) identifying bad(tm) things.

A single upstream developer is the most common case, though.  Perhaps less
so for core libraries, but, well, there are plenty of examples.  (To pick
another one that comes readily to mind, zlib appears to only have one
active maintainer.)

The reality that we are struggling with is that the free software
infrastructure on which much of computing runs is massively and painfully
underfunded by society as a whole, and is almost entirely dependent on
random people maintaining things in their free time because they find it
fun, many of whom are close to burnout.  This is, in many ways, the true
root cause of this entire event.

The sad irony here is that the xz maintainer tried to do exactly what we
advise people in this situation to do: try to add a comaintainer to share
the work, and don't block work because you don't have time to personally
vet everything in detail.  This is *exactly* why maintainers often don't
want to do that, and thus force people to fork packages rather than join
in maintaining the existing package.

This is an aside, but this is why my personal policy for my own projects
that I no lon

Re: Validating tarballs against git repositories

2024-03-29 Thread Antonio Russo
On 2024-03-29 22:41, Guillem Jover wrote:
> Hi!
> 
> On Fri, 2024-03-29 at 18:21:27 -0600, Antonio Russo wrote:
>> This is a vector I've been somewhat paranoid about myself, and I
>> typically check the difference between git archive $TAG and the downloaded
>> tar, whenever I package things.  Obviously a backdoor could have been
>> inserted into the git repository directly, but there is a culture
>> surrounding good hygiene in commits: they ought to be small, focused,
>> and well described.
> 
> But the backdoor was in fact included in a git commit (it's hidden
> inside a test compressed file).
> 
> The part that was only present in the tarball was the code to extract
> and hook the inclusion of the backdoor via the build system.

Yes. The "test compressed file" needs to be massaged via:

  >  tr "\-_" " _\-" | xz -d

That code comes out of the m4 file, which is not present in git source
code.  I'm unaware at this point of any direct evidence that the git
source code alone is in any way dangerous (aside from the fact that we
cannot trust the developer at all!).

>> People are comfortable discussing and challenging
>> a commit that looks fishy, even if that commit is by the main developer
>> of a package.  I have been assuming tooling existed in package
>> maintainers' toolkits to verify the faithful reproduction of the
>> published git tag in the downloaded source tarball, beyond a signature
>> check by the upstream developer.  Apparently, this is not universal.
>>
>> Had tooling existed in Debian to automatically validate this faithful
>> reproduction, we might not have been exposed to this issue.
> 
> Given that the autogenerated stuff is not present in the git tree,
> a diff between tarball and git would always generate tons of delta,
> so this would not have helped.

I may not have been clear, but I'm suggesting scrubbing all the
autogenerated stuff, and comparing that against a similarly scrubbed
git tag contents.  (But you explain that this is problematic.)

>> Having done this myself, it has been my experience that many partial
>> build artifacts are captured in source tarballs that are not otherwise
>> maintained in the git repository.  For instance, in zfs (which I have
>> contributed to in the past), many automake files are regenerated.
>> (I do not believe that specific package is vulnerable to an attack
>> on the autoconf/automake files, since the debian package calls the
>> upstream tooling to regenerate those files.)

(Hopefully the above clears up that I at least have some superficial
awareness of the build artifacts showing up in the release tarball!)

>> We already have a policy of not shipping upstream-built artifacts, so
>> I am making a proposal that I believe simply takes that one step further:
>>
>> 1. Move towards allowing, and then favoring, git-tags over source tarballs
> 
> I assume you mean git archives out of git tags? Otherwise how do you
> go from git-tag to a source package in your mind?

I'm not wed to any specific mechanism, but I'd be content with that.  I'd
be most happy with DD-signed tags that were certified dfsg, policy compliant
(i.e., lacking build artifacts), and equivalent to scrubbed upstream source.
(and more on that later, building on what you say).

Many repositories today already do things close to this with pristine-tar,
so this seems to me a direction where the tooling already exists.

I'll add that, if we drop the desire for a signed archive, and instead
require a signed git-tag (from which we can generate a source tar on
demand, as you suggest), we can drop the pristine-tar requirement.  If we
are less progressive, but move exclusively to Debian-regenerated
.tar files, we can probably avoid many of the frustrating edge cases that
pristine-tar still struggles with.
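
As a rough sketch of the "signed tag, tar on demand" idea: the two git
invocations below (git verify-tag and git archive) exist as written, but
the helper names, the prefix convention and the output path are
assumptions for illustration.  Which keys count as trusted is left to the
caller's keyring and is outside this sketch.

```python
import subprocess


def tag_commands(tag, prefix):
    """Build the argv lists for verifying a signed tag and archiving it."""
    verify = ["git", "verify-tag", tag]
    archive = ["git", "archive", "--format=tar", f"--prefix={prefix}/", tag]
    return verify, archive


def export_signed_tag(repo, tag, prefix, output):
    """Write a tar of `tag` to `output`; raise if the signature check fails."""
    verify, archive = tag_commands(tag, prefix)
    # check=True raises CalledProcessError when git exits non-zero,
    # so no archive is written for a tag that fails verification.
    subprocess.run(verify, cwd=repo, check=True)
    with open(output, "wb") as handle:
        subprocess.run(archive, cwd=repo, check=True, stdout=handle)
```

Whether the resulting tar is reproducible byte for byte across git
versions is a separate question that this does not address.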

>> 2. Require upstream-built artifacts be removed (instead, generate these
>>    ab-initio during build)
> 
> The problem here is that the .m4 file to hook into the build system was
> named like one shipped by gnulib (so less suspicious), but xz-utils does
> not use gnulib, and thus the autotools machinery does not know anything
> about it, so even the «autoreconf -f -i» done by debhelper via
> dh-autoreconf, would not regenerate it.

The way I see it, there are two options in handling a buildable package:

1. That file would have been considered a build artifact, consequently
removed and then regenerated.  No backdoor.

2. The file would not have been scrubbed, and a difference between the
git version and the released tar version would have been noticed.
Backdoor found.

Either of these is, in my mind, dramatically better than what happened.

One automatic approach would be to run dh-autoreconf and identify the
changed files.  Remove those files from both the distributed tarball and
git tag.  Check if those differ. (You also suggest something very similar
to this, and repacking the archive with those debian-generated build
artifacts).
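
A minimal sketch of that approach, assuming the regeneration step
(for example «autoreconf -f -i») is run by the caller between two
snapshots of the unpacked tree.  All function names here are
hypothetical.

```python
import hashlib
import os


def snapshot(root):
    """Map every regular file under root (relative path) to its SHA-256."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            with open(full, "rb") as handle:
                rel = os.path.relpath(full, root)
                digests[rel] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def regenerated(before, after):
    """Paths created or modified between two snapshots."""
    return {p for p, d in after.items() if before.get(p) != d}


def unexplained(tarball, git, artifacts):
    """Paths that still differ once the regenerated artifacts are ignored."""
    paths = (set(tarball) | set(git)) - set(artifacts)
    return sorted(p for p in paths if tarball.get(p) != git.get(p))
```

A file that the autotools machinery does not know about is never
regenerated, so it would not land in the artifact set and would show up
in the unexplained list instead, which is case 2 above.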

I may be missing something here, though!

> Re

Re: Validating tarballs against git repositories

2024-03-29 Thread Guillem Jover
Hi!

On Fri, 2024-03-29 at 18:21:27 -0600, Antonio Russo wrote:
> This is a vector I've been somewhat paranoid about myself, and I
> typically check the difference between git archive $TAG and the downloaded
> tar, whenever I package things.  Obviously a backdoor could have been
> inserted into the git repository directly, but there is a culture
> surrounding good hygiene in commits: they ought to be small, focused,
> and well described.

But the backdoor was in fact included in a git commit (it's hidden
inside a test compressed file).

The part that was only present in the tarball was the code to extract
and hook the inclusion of the backdoor via the build system.

> People are comfortable discussing and challenging
> a commit that looks fishy, even if that commit is by the main developer
> of a package.  I have been assuming tooling existed in package
> maintainers' toolkits to verify the faithful reproduction of the
> published git tag in the downloaded source tarball, beyond a signature
> check by the upstream developer.  Apparently, this is not universal.
> 
> Had tooling existed in Debian to automatically validate this faithful
> reproduction, we might not have been exposed to this issue.

Given that the autogenerated stuff is not present in the git tree,
a diff between tarball and git would always generate tons of delta,
so this would not have helped.

> Having done this myself, it has been my experience that many partial
> build artifacts are captured in source tarballs that are not otherwise
> maintained in the git repository.  For instance, in zfs (which I have
> contributed to in the past), many automake files are regenerated.
> (I do not believe that specific package is vulnerable to an attack
> on the autoconf/automake files, since the debian package calls the
> upstream tooling to regenerate those files.)
> 
> We already have a policy of not shipping upstream-built artifacts, so
> I am making a proposal that I believe simply takes that one step further:
> 
> 1. Move towards allowing, and then favoring, git-tags over source tarballs

I assume you mean git archives out of git tags? Otherwise how do you
go from git-tag to a source package in your mind?

> 2. Require upstream-built artifacts be removed (instead, generate these
>    ab-initio during build)

The problem here is that the .m4 file to hook into the build system was
named like one shipped by gnulib (so less suspicious), but xz-utils does
not use gnulib, and thus the autotools machinery does not know anything
about it, so even the «autoreconf -f -i» done by debhelper via
dh-autoreconf, would not regenerate it.

Removing these might be cumbersome after the fact if upstream includes
for example their own maintained .m4 files. See dpkg's m4 dir for an
example of this (although there it's easy as all are namespaced but…).

Not using an upstream-provided tarball might also mean we stop being
able to use upstream signatures, which seems worse. The alternative
might be encouraging upstreams to just do the equivalent of
«git archive», but that might defeat the portability and dependency
reduction properties that were designed into the autotools build
system, or increase the bootstrap set (see for example the
pkg.dpkg.author-release build profile used by dpkg).

(For dpkg at least I'm pondering whether to play with switching to
doing something equivalent to «git archive» though, but see above, or
maybe generate two tarballs, a plain «git archive» and a portable one.)

> 3. Have tooling that automatically checks the sanitized sources against
>    the development RCSs.

Perhaps we could have a declarative way to state all the autogenerated
artifacts included in a tarball that need to be cleaned up
automatically after unpack, in a similar way as how we have a way to
automatically exclude stuff when repackaging tarballs via uscan?

(.gitignore, if upstream properly maintains those might be a good
starting point, but that will tend to include more than necessary.)
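
To make the declarative idea concrete, here is a small sketch that
splits a file list by shell-style patterns.  The pattern list is an
invented example, not an existing uscan or dpkg feature, and fnmatch is
looser than real .gitignore semantics ('*' also matches '/').

```python
from fnmatch import fnmatchcase


def scrub(paths, patterns):
    """Split paths into (kept, removed) using shell-style glob patterns.

    Note: fnmatch's '*' also matches '/', which is looser than .gitignore
    semantics; this is only meant to illustrate the idea.
    """
    kept, removed = [], []
    for path in sorted(paths):
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            removed.append(path)
        else:
            kept.append(path)
    return kept, removed
```

With patterns such as "configure" and "*Makefile.in", an upstream's own
m4/local.m4 would be kept, which is the distinction a namespaced m4
directory makes easy.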

> 4. Look unfavorably on upstreams without RCS.

Some upstreams have a VCS, but still do massive code drops, or include
autogenerated stuff in the VCS, or do not do atomic commits, or in
addition their commit messages are of the style "fix stuff", "." or
alike. So while this is something we should encourage, it's not
sufficient. I think part of this might already be present in our
Upstream Guidelines in the wiki.

> In the present case, the triggering modification was in a modified .m4 file
> that injected a snippet into the configure script.  That modification
> could have been flagged using this kind of process.

I don't think this modification would have been spotted, because it
was not modifying a file that would usually get autogenerated by its
build system.

> While this would be a lot of work, I believe doing so would require a
> much larger amount of additional complexity in orchestrating attacks
> against Debian in the future.

It would certainly make it a bit harder, but I'm afra

Bug#1068048: ITA: gnu-which -- Utility to show the full path of commands

2024-03-29 Thread Zachary Liebl
Package: wnpp
Severity: normal
Owner: Zachary Liebl 
X-Debbugs-Cc: debian-devel@lists.debian.org, deb...@zachliebl.com

  Package name: gnu-which
  Version : 2.21+dfsg-2
  Upstream Contact: Carlo Wood 
  URL : https://savannah.gnu.org/projects/which
  License : GPL-3+
  Programming Lang: C
  Description : Utility to show the full path of commands

This package provides the classic unix "which" command.

It has recently been orphaned and I intend to adopt it. I have contacted the 
original Debian maintainer and he has approved of this.

This will be my first time maintaining a package, so I intentionally chose a 
simple one.


The package description is:
 This package provides the GNU implementation of the which command.
 This tool provides the functionality to show the full path
 of commands.

From,
Zachary Liebl



Validating tarballs against git repositories

2024-03-29 Thread Antonio Russo
Hello everyone,

As I'm sure we're all aware of at this point, Debian has been a victim
of a relatively sophisticated first-party attack whereby a backdoor
in the XZ package was smuggled into sshd via a systemd dependency.
This backdoor, at a minimum, attacked key verification. As far as I
understand, it is not yet understood what exactly the effects of
these backdoors are. (There are two versions 5.6.0 and 5.6.1 that are
affected, and investigation is ongoing.)

There are many things to talk about here, but one that involves the
task of package maintainers, and that I would like to discuss now, is
the way the backdoor was distributed.  The code in the xz git
repository does not build a vulnerable version, while the code in the
5.6.0 and 5.6.1 source tarballs does.

This is a vector I've been somewhat paranoid about myself, and I
typically check the difference between git archive $TAG and the downloaded
tar, whenever I package things.  Obviously a backdoor could have been
inserted into the git repository directly, but there is a culture
surrounding good hygiene in commits: they ought to be small, focused,
and well described.  People are comfortable discussing and challenging
a commit that looks fishy, even if that commit is by the main developer
of a package.  I have been assuming tooling existed in package
maintainers' toolkits to verify the faithful reproduction of the
published git tag in the downloaded source tarball, beyond a signature
check by the upstream developer.  Apparently, this is not universal.

Had tooling existed in Debian to automatically validate this faithful
reproduction, we might not have been exposed to this issue.
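
For illustration, a minimal sketch of the comparison described above:
read two tar streams (the release tarball and the output of git archive
for the tag), hash every regular file, and report what differs.  The
function names are hypothetical, and stripping exactly one leading path
component is an assumption about how both archives are laid out.

```python
import hashlib
import io
import tarfile


def tar_digests(fileobj, strip_components=1):
    """Map each regular file in a tar stream to the SHA-256 of its contents.

    strip_components drops leading path elements, since release tarballs
    and prefixed git archives usually wrap everything in one directory.
    """
    digests = {}
    # "r:*" lets tarfile detect gzip/bzip2/xz compression transparently.
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            parts = member.name.split("/")[strip_components:]
            if not parts:
                continue
            data = tar.extractfile(member).read()
            digests["/".join(parts)] = hashlib.sha256(data).hexdigest()
    return digests


def compare_trees(release, git):
    """Return (only_in_release, only_in_git, differing) as sorted lists."""
    only_release = sorted(set(release) - set(git))
    only_git = sorted(set(git) - set(release))
    differing = sorted(
        p for p in set(release) & set(git) if release[p] != git[p]
    )
    return only_release, only_git, differing
```

On a typical autotools project the first list will be long (configure,
Makefile.in and so on), which is why the scrubbing of build artifacts
proposed below matters for making the output reviewable.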

Having done this myself, it has been my experience that many partial
build artifacts are captured in source tarballs that are not otherwise
maintained in the git repository.  For instance, in zfs (which I have
contributed to in the past), many automake files are regenerated.
(I do not believe that specific package is vulnerable to an attack
on the autoconf/automake files, since the debian package calls the
upstream tooling to regenerate those files.)

We already have a policy of not shipping upstream-built artifacts, so
I am making a proposal that I believe simply takes that one step further:

1. Move towards allowing, and then favoring, git-tags over source tarballs
2. Require upstream-built artifacts be removed (instead, generate these
   ab-initio during build)
3. Have tooling that automatically checks the sanitized sources against
   the development RCSs.
4. Look unfavorably on upstreams without RCS.

In the present case, the triggering modification was in a modified .m4 file
that injected a snippet into the configure script.  That modification
could have been flagged using this kind of process.

While this would be a lot of work, I believe doing so would require a
much larger amount of additional complexity in orchestrating attacks
against Debian in the future.

Best,
Antonio Russo



Re: xz backdoor

2024-03-29 Thread Russ Allbery
Moritz Mühlenhoff  writes:
> Russ Allbery  wrote:

>> I think this question can only be answered with reverse-engineering of
>> the backdoors, and I personally don't have the skills to do that.

> In the pre-disclosure discussion permission was asked to share the
> payload with a company specialising in such reverse engineering. If that
> went through, I'd expect results to be publicly available in the next
> days.

Excellent, thank you.

For those who didn't read the analysis on oss-security yet, note that the
initial investigation of the injected exploit indicates that it
deactivates itself if argv[0] is not /usr/sbin/sshd, so there are good
reasons to believe that the problem is bounded to testing or unstable
systems running the OpenSSH server.  If true, this is a huge limiting
factor and in many ways quite relieving compared to what could have
happened.  But the stakes are high enough that hopefully we'll get
detailed confirmation from people with expertise in understanding this
sort of thing.

-- 
Russ Allbery (r...@debian.org)  



Re: xz backdoor

2024-03-29 Thread Moritz Mühlenhoff
Russ Allbery  wrote:
> I think this question can only be answered with reverse-engineering of the
> backdoors, and I personally don't have the skills to do that.

In the pre-disclosure discussion permission was asked to share the payload
with a company specialising in such reverse engineering. If that went
through, I'd expect results to be publicly available in the next days.

Cheers,
Moritz



Re: xz backdoor

2024-03-29 Thread Russ Allbery
Russ Allbery  writes:
> Sirius  writes:

>> This is quite actively discussed on Fedora lists.
>> https://www.openwall.com/lists/oss-security/2024/
>> https://www.openwall.com/lists/oss-security/2024/03/29/4

>> Worth taking a look if action needs to be taken on Debian.

> The version of xz-utils was reverted to 5.4.5 in unstable yesterday by
> the security team and migrated to testing today.  Anyone running an
> unstable or testing system should urgently upgrade.

I think the big open question we need to ask now is what exactly the
backdoor (or, rather, backdoors; we know there were at least two versions
over time) did.  If they only target sshd, that's one thing, and we have a
bound on systems possibly affected.  But liblzma is linked directly or
indirectly into all sorts of things such as, to give an obvious example,
apt-get.  A lot of Debian developers use unstable or testing systems.  If
the exploit was also exfiltrating key material, backdooring systems that
didn't use sshd, etc., we have a lot more cleanup to do.
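
As a rough aid for bounding what links liblzma on a given system, the
sketch below parses text in the format printed by ldd (which resolves
indirect dependencies too).  It only handles the common
"name => path (address)" layout, the function names are invented for
illustration, and ldd itself should not be run on untrusted binaries.

```python
def linked_libraries(ldd_output):
    """Extract library file names from the text printed by ldd."""
    names = set()
    for line in ldd_output.splitlines():
        fields = line.split()
        if fields:
            # First field is e.g. "liblzma.so.5" or an absolute loader path.
            names.add(fields[0].rsplit("/", 1)[-1])
    return names


def links_liblzma(ldd_output):
    """True if any listed library looks like liblzma."""
    return any(
        name.startswith("liblzma.so") for name in linked_libraries(ldd_output)
    )
```

This says nothing about whether a given binary was actually targeted; it
only narrows down which programs load the library at all.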

I think this question can only be answered with reverse-engineering of the
backdoors, and I personally don't have the skills to do that.

-- 
Russ Allbery (r...@debian.org)  



Re: xz backdoor

2024-03-29 Thread Geert Stappers
On Fri, Mar 29, 2024 at 09:09:45PM +0100, Sirius wrote:
> Hi there,
> 
> This is quite actively discussed on Fedora lists.
> https://www.openwall.com/lists/oss-security/2024/
> https://www.openwall.com/lists/oss-security/2024/03/29/4
> 
> Worth taking a look if action needs to be taken on Debian.
> 

https://tracker.debian.org/news/1515519/accepted-xz-utils-561really545-1-source-into-unstable/

 
Regards
Geert Stappers
-- 
Silence is hard to parse



Re: xz backdoor

2024-03-29 Thread Russ Allbery
Sirius  writes:

> This is quite actively discussed on Fedora lists.
> https://www.openwall.com/lists/oss-security/2024/
> https://www.openwall.com/lists/oss-security/2024/03/29/4

> Worth taking a look if action needs to be taken on Debian.

The version of xz-utils was reverted to 5.4.5 in unstable yesterday by the
security team and migrated to testing today.  Anyone running an unstable
or testing system should urgently upgrade.

-- 
Russ Allbery (r...@debian.org)  



Re: xz backdoor

2024-03-29 Thread Jérémy Lal
xz-utils (5.6.1+really5.4.5-1) unstable; urgency=critical

  * Non-maintainer upload by the Security Team.
  * Revert back to the 5.4.5-0.2 version

 -- Salvatore Bonaccorso  Thu, 28 Mar 2024 15:59:38 +0100

On Fri, 29 Mar 2024 at 21:17, Sirius wrote:

> Hi there,
>
> This is quite actively discussed on Fedora lists.
> https://www.openwall.com/lists/oss-security/2024/
> https://www.openwall.com/lists/oss-security/2024/03/29/4
>
> Worth taking a look if action needs to be taken on Debian.
>
> --
> Kind regards,
>
> /S
>
>


xz backdoor

2024-03-29 Thread Sirius
Hi there,

This is quite actively discussed on Fedora lists.
https://www.openwall.com/lists/oss-security/2024/
https://www.openwall.com/lists/oss-security/2024/03/29/4

Worth taking a look if action needs to be taken on Debian.

-- 
Kind regards,

/S



Re: Seeking a small group to package Apache Arrow (was: Bug#970021: RFP: apache-arrow -- cross-language development platform for in-memory analytics)

2024-03-29 Thread Rene Engelhard

Hi,

On 25.03.24 at 19:17, Julian Gilbey wrote:

>   * Reading and writing file formats (like CSV, Apache ORC, and Apache
>     Parquet)

liborcus supports this (Apache Parquet) if built with Apache Arrow, and
thus enables LibreOffice to handle it.

I didn't invest any time in Apache Arrow since I am already too low on
time, and I deemed it too "low popularity" a thing anyway.

> So this is a plea for anyone looking for something really helpful to
> do: it would be great to have a group of developers finally package
> this!

Indeed.

> There was some initial work done (see the RFP bug report for
> details: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970021),
> but that is fairly old now.  As Apache Arrow supports numerous
> languages, it may well benefit from having a group of developers with
> different areas of expertise to build it.  (Or perhaps it would make
> more sense to split the upstream source into a collection of different
> Debian source packages for the different supported languages.  I don't
> know.)

Would definitely make transitions easier.

> Unfortunately I don't have the capacity to devote any time to
> it myself.

Ditto.


Regards,


Rene



Re: Seeking a small group to package Apache Arrow (was: Bug#970021: RFP: apache-arrow -- cross-language development platform for in-memory analytics)

2024-03-29 Thread Diane Trout
On Mon, 2024-03-25 at 18:17 +, Julian Gilbey wrote:
> 
> 
> So this is a plea for anyone looking for something really helpful to
> do: it would be great to have a group of developers finally package
> this!  There was some initial work done (see the RFP bug report for
> details: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970021),
> but that is fairly old now.  As Apache Arrow supports numerous
> languages, it may well benefit from having a group of developers with
> different areas of expertise to build it.  (Or perhaps it would make
> more sense to split the upstream source into a collection of different
> Debian source packages for the different supported languages.  I don't
> know.)  Unfortunately I don't have the capacity to devote any time to
> it myself.
> 
> Thanks in advance for anyone who can step forward for this!

I've been maintaining dask and anndata and saw that Apache Arrow was
getting increasingly popular.

I took the current science-team preliminary 7.0.0 packaging
and managed to get it to build through a combination of patches and
turning off features.

I even mostly managed to get pyarrow to build. (Though some tests fail
due to pytest lazy-fixture being abandoned).

I pushed my current work in progress to:

https://salsa.debian.org/diane/arrow.git

Was anyone else planning on working on it or should I push my updates
to the science-team package?

Diane



Bug#1067948: ITP: python-naked -- a command line application framework

2024-03-29 Thread Josenilson Ferreira da Silva
Package: wnpp
Severity: wishlist
Owner: Josenilson Ferreira da Silva 
X-Debbugs-Cc: debian-devel@lists.debian.org, nilsonfsi...@hotmail.com

* Package name: python-naked
  Version : 0.1.32
  Upstream Contact:  Christopher Simpkins 
* URL : https://github.com/chrissimpkins/naked
* License : MIT/X
  Programming Lang: Python
  Description : a command line application framework

 Naked is a Python library that allows you to create executable scripts in
 Python quickly and easily, without the need to use a complex development
 environment. It is designed to simplify the process of turning Python scripts
 into standalone applications that can run directly on any operating system
 without needing to install additional dependencies.
 .
 Offers a simple and straightforward API that allows developers to create
 executable scripts with just a few lines of code, ensuring that scripts
 created with "Naked" can be run in a variety of Python environments, as
 well as different operating systems, including Windows, macOS, and Linux,
 without the need for additional modification.

 Note: This package is a required dependency for the TeraBoxUtility
 ITP package: #1067395