Re: New supply-chain security tool: backseat-signed

2024-04-11 Thread Colin Watson
On Thu, Apr 11, 2024 at 01:27:54PM -0500, G. Branden Robinson wrote:
> At 2024-04-11T15:37:46+0100, Colin Watson wrote:
> > On Thu, Apr 11, 2024 at 10:26:55AM -0400, Theodore Ts'o wrote:
> > > Or, because some upstream maintainers have learned through long,
> > > bitter experience that newer versions of autoconf tools may result
> > > in the generated configure script being busted (sometimes subtly),
> > > and so distrust relying on blind autoreconf always working.
> > 
> > When was the last time this actually happened to you?  I certainly
> > remember it being a problem in the early 2.5x days, but it's been well
> > over a decade since this actually bit me.
> 
> A darkly amusing story of this frustration can be found under "Why
> patch?" at .

I mean, sure - as I said, I recall there being problems in the early
2.5x days - but I will note that the newest release mentioned there was
over two decades ago.  I'm not really interested in relitigating things
from that long ago at this point.

-- 
Colin Watson (he/him)  [cjwat...@debian.org]



Re: New supply-chain security tool: backseat-signed

2024-04-11 Thread G. Branden Robinson
At 2024-04-11T15:37:46+0100, Colin Watson wrote:
> On Thu, Apr 11, 2024 at 10:26:55AM -0400, Theodore Ts'o wrote:
> > Or, because some upstream maintainers have learned through long,
> > bitter experience that newer versions of autoconf tools may result
> > in the generated configure script being busted (sometimes subtly),
> > and so distrust relying on blind autoreconf always working.
> 
> When was the last time this actually happened to you?  I certainly
> remember it being a problem in the early 2.5x days, but it's been well
> over a decade since this actually bit me.

A darkly amusing story of this frustration can be found under "Why
patch?" at .

For my part, I have come to associate the name "Akim Demaille" with the
advisability of extreme caution when adopting a release of new software.

Possibly I should be making that association with someone else, though.

Regards,
Branden


signature.asc
Description: PGP signature


Re: New supply-chain security tool: backseat-signed

2024-04-11 Thread Theodore Ts'o
On Thu, Apr 11, 2024 at 03:37:46PM +0100, Colin Watson wrote:
> 
> When was the last time this actually happened to you?  I certainly
> remember it being a problem in the early 2.5x days, but it's been well
> over a decade since this actually bit me.

I'd have to go through git archives, but I believe the last time was
when aclocal replaced one of the macros in aclocal.m4, and the updated
macro was not backwards compatible.

- Ted



Re: New supply-chain security tool: backseat-signed

2024-04-11 Thread Colin Watson
On Thu, Apr 11, 2024 at 10:26:55AM -0400, Theodore Ts'o wrote:
> On Sat, Apr 06, 2024 at 04:30:44PM +0100, Simon McVittie wrote:
> > But, it is conventional for Autotools projects to ship the generated
> > ./configure script *as well* (for example this is what `make dist`
> > outputs), to allow the project to be compiled on systems that do not
> > have the complete Autotools system installed.
> 
> Or, because some upstream maintainers have learned through long,
> bitter experience that newer versions of autoconf tools may result in
> the generated configure script being busted (sometimes subtly), and
> so distrust relying on blind autoreconf always working.

When was the last time this actually happened to you?  I certainly
remember it being a problem in the early 2.5x days, but it's been well
over a decade since this actually bit me.

-- 
Colin Watson (he/him)  [cjwat...@debian.org]



Re: New supply-chain security tool: backseat-signed

2024-04-11 Thread Theodore Ts'o
On Sat, Apr 06, 2024 at 04:30:44PM +0100, Simon McVittie wrote:
> 
> But, it is conventional for Autotools projects to ship the generated
> ./configure script *as well* (for example this is what `make dist`
> outputs), to allow the project to be compiled on systems that do not
> have the complete Autotools system installed.

Or, because some upstream maintainers have learned through long,
bitter experience that newer versions of autoconf tools may result in
the generated configure script being busted (sometimes subtly), and
so distrust relying on blind autoreconf always working.

(For Debian, I always make sure that the upstream configure script for
autoconf is generated on a Debian testing system, and yes, I have had
to make adjustments to the "preferred form of modification" files so
that the resulting configure script works.  For me, it's not that the
configure file is the preferred form of modification, but rather, the
preferred form of distribution.)

Yes, I realize that the logical follow-on to this is that perhaps we
should just abandon autotools completely; unfortunately, I'm not quite
willing to make the assertion, "all the world's Linux and I don't care
about portability to non-Linux systems" à la the position taken by the
systemd maintainers --- and for all its faults, autoconf still has
decades of portability work that is not easy to replace.

   - Ted



Re: New supply-chain security tool: backseat-signed

2024-04-07 Thread Sean Whitton
Hello,

On Sat 06 Apr 2024 at 02:24pm +02, Guillem Jover wrote:

> Hi!
>
> On Sat, 2024-04-06 at 19:13:22 +0800, Sean Whitton wrote:
>> On Fri 05 Apr 2024 at 01:31am +03, Adrian Bunk wrote:
>> > Right now the preferred form of source in Debian is an upstream-signed
>> > release tarball, NOT anything from git.
>>
>> The preferred form of modification is not simply up for proclamation.
>> Our practices, which are focused around git, make it the case that
>> salsa & dgit in some combination are the preferred form for modification
>> for most packages.
>
> People keep bringing this up, and it keeps making no sense. I've
> covered this over the years in:
>
>   https://lists.debian.org/debian-devel/2014/03/msg00330.html
>   https://lists.debian.org/debian-project/2019/07/msg00180.html
>
> (There's in addition the part that Adrian covers in another reply.)

I understand this point of view.  The situation is not clear.
But it is at least plausible that for some projects, the git history is
part of the preferred form for modification.  It is certainly not always
true.

I think that this point is largely academic, however.  We are doing a
disservice to our users if they have to go hunting beyond Debian
services to find the upstream git history, because they'll likely want
it if they indeed do want to modify packages installed on their system.
Our own git histories of packaging changes aren't enough.  So we should
be hosting both, on some combination of salsa and dgit-repos.

-- 
Sean Whitton


signature.asc
Description: PGP signature


Re: New supply-chain security tool: backseat-signed

2024-04-07 Thread Sean Whitton
Hello,

On Sat 06 Apr 2024 at 02:42pm +03, Adrian Bunk wrote:

> On Sat, Apr 06, 2024 at 07:13:22PM +0800, Sean Whitton wrote:
>> Hello,
>>
>> On Fri 05 Apr 2024 at 01:31am +03, Adrian Bunk wrote:
>>
>> >
>> > Right now the preferred form of source in Debian is an upstream-signed
>> > release tarball, NOT anything from git.
>>
>> The preferred form of modification is not simply up for proclamation.
>> Our practices, which are focused around git, make it the case that
>> salsa & dgit in some combination are the preferred form for modification
>> for most packages.
>
> You cannot simply proclaim that some git tree is the preferred form of
> modification without shipping said git tree in our ftp archive.
>
> If your claim was true, then Debian and downstreams would be violating
> licences like the GPL by not providing the preferred form of modification
> in the archive.

Well, maybe we are!  Or maybe we're not when we publish those histories
on salsa and/or dgit-repos.

It also seems important to note that this is project-specific.  Whether
the git history is part of the preferred form of modification depends on
the project's practices and content.

I don't have a settled opinion on what we should be doing.  But what I
am sure about is that the preferred form for modification is determined
by the content of the project, and we can't change what the preferred
form for modification actually is just by choosing what exactly we
publish.

-- 
Sean Whitton


signature.asc
Description: PGP signature


Re: New supply-chain security tool: backseat-signed

2024-04-06 Thread Jeremy Stanley
On 2024-04-06 16:30:44 +0100 (+0100), Simon McVittie wrote:
[...]
> Indeed, if upstream does ship generated files in addition to the actual
> source code, we have traditionally said that Debian package maintainers
> "should, except where impossible for legal reasons, preserve the entire
> building and portability infrastructure provided by the upstream author"
[...]
> Another question about the source code is whether it is sufficient to take
> a snapshot of the current state of the git tree (again, tree as jargon term)
> and say that it is the preferred form for modification, or whether complete
> corresponding source code should be understood to mean its complete git
> history going back to the beginning of the project (in git jargon, a series
> of commits going back to one without a parent, rather than a tree).
> 
> I think that Guillem, and maybe Adrian too, whether rightly or wrongly,
> understood you to be claiming that a single snapshot (git tree or `git
> archive` output) is not enough, and the history is also required - and
> it's that assertion, which you might not have intended to be making,
> that they are pushing back most strongly against? (Or perhaps I'm
> misunderstanding.)
> 
> If that's what is happening, then I agree with them.
> 
> Demanding that we ship the full history is clearly not what was meant by
> the authors of the GPL. That surely can't be what the GPL was intended
> to mean, because at the time it was written, public VCSs were rare, and
> the GNU system was developed via a "cathedral" approach with a small
> number of authors writing software privately and releasing it to the
> world as a series of tarballs. It seems obvious to me that they wouldn't
> have written the license to require a more comprehensive version of
> "what is source?" than what they themselves were releasing.
>
> Demanding the full history is also not really practical for a Free
> Software distribution, because a non-trivial project's history is
> inconveniently large, and over a long enough timescale it's relatively
> likely that someone has committed (and perhaps subsequently deleted)
> something that does not qualify as Free Software - either accidentally, or
> because they were assuming that it's OK to include non-Free documentation,
> artwork, test data or whatever, as long as it isn't executable code
> (which, rightly or wrongly, is not the position taken by Debian).
[...]

A related place where this becomes fuzzy is when projects extract
metadata from revision control or otherwise assemble real files
based on relationships between commits. Projects I work on set
version information from Git tags, by parsing footers from commit
messages, and counting commits in what is basically their `make
dist` process. They may also build ChangeLog files from commit
messages, assemble AUTHORS files referred to in their copyright
license from commit data, build release notes by associating the
introduction of independent stub files with specific commits
appearing in different branches and tags, and so on. Granted, they're
not GPL-licensed, but you could still make a strong case that the
content of their Git repositories outside of the strict set of files
in the worktree is part of the preferred form of modification for
those parts of the source code (and in the case of an AUTHORS file,
possibly a legally-required part even).
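The following minimal Python sketch shows how such a version depends on history. It turns `git describe --tags --long` output into a version string. Several details are assumptions made for illustration, not what these projects actually do: the function name, the "v" tag prefix, and the ".devN+g<sha>" format for untagged commits. The inputs it needs are the nearest tag and the number of commits since that tag, and both exist only in the repository:

```python
import re

# `git describe --tags --long` prints <tag>-<commits since tag>-g<abbrev id>,
# e.g. "v1.2.3-4-g1a2b3c4". The greedy tag group lets tags contain hyphens.
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+)$")


def version_from_describe(describe: str) -> str:
    """Derive a version string from `git describe` output (hypothetical scheme)."""
    match = DESCRIBE_RE.match(describe.strip())
    if match is None:
        raise ValueError(f"unexpected git describe output: {describe!r}")
    base = match.group("tag").removeprefix("v")  # assumed "v" tag convention
    distance = int(match.group("distance"))
    if distance == 0:
        return base  # building exactly at a release tag
    # Untagged commit: encode the commit count and abbreviated commit ID.
    return f"{base}.dev{distance}+g{match.group('sha')}"


print(version_from_describe("v1.2.3-4-g1a2b3c4"))  # 1.2.3.dev4+g1a2b3c4
```

A plain copy of the files has no repository and no tags, so there is nothing to feed this function. That is why the result has to be baked into the tarball.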

For those projects, we upstream maintainers understand that
downstream distributions want to include source code and can't
necessarily include full copies of our Git repositories, so we
create and cryptographically sign source code tarballs with all that
extracted/assembled metadata in the form of "generated" files, and
present those as our primary source distributions.
-- 
Jeremy Stanley


signature.asc
Description: PGP signature


Re: New supply-chain security tool: backseat-signed

2024-04-06 Thread Simon McVittie
On Sat, 06 Apr 2024 at 15:54:51 +0200, kpcyrd wrote:
> On 4/6/24 1:42 PM, Adrian Bunk wrote:
> > You cannot simply proclaim that some git tree is the preferred form of
> > modification without shipping said git tree in our ftp archive.
> > 
> > If your claim was true, then Debian and downstreams would be violating
> > licences like the GPL by not providing the preferred form of modification
> > in the archive.
> 
> I'm obviously not a lawyer, but I do think this is the case. Quoting from
> GPL-3.0:
> 
> > The “source code” for a work means the preferred form of the work for
> > making modifications to it. “Object code” means any non-source form of a
> > work.
> 
> autotools pre-processed source code is clearly not "the preferred form of
> the work for making modifications", which is specifically what I'm saying
> Debian shouldn't consider a "source code input" either, to eliminate this
> vector for underhanded tampering that Jia Tan has used.
> 
> If we can force a future Jia Tan to commit their backdoor into git (for
> everybody to see) I consider this a win.

I think maybe different people in this thread are talking about different
things, and talking past each other as a result. There are two questions
about what is the preferred form for modification, and I think perhaps not
everyone agrees on which question they think they're answering.

Which files are part of the source tree?
----------------------------------------

One question is: say you hand-write a file of one format (Autotools
configure.ac and *.m4) and preprocess it into another format that, while
technically editable, is not what you would genuinely edit unless you
had no alternative (the Autotools ./configure script). What is acceptable
source code for this file?

Obviously if you don't have configure.ac, then you don't have the complete
corresponding source code in the form you would want to use to make
changes; so I think the answer has to include at least configure.ac, and
there is an (IMO valid) argument that if configure.ac is missing, then what
you have does not constitute source code.

But, it is conventional for Autotools projects to ship the generated
./configure script *as well* (for example this is what `make dist`
outputs), to allow the project to be compiled on systems that do not
have the complete Autotools system installed. What we have traditionally
said is that it's legitimate for the source code of a Debian package to
include ./configure, as long as it *also* includes configure.ac.

Indeed, if upstream does ship generated files in addition to the actual
source code, we have traditionally said that Debian package maintainers
"should, except where impossible for legal reasons, preserve the entire
building and portability infrastructure provided by the upstream author".
It is legitimate to ask whether that rule's value exceeds its cost, or
whether the value of deleting generated files and forcing them to be
regenerated, as a "nothing up my sleeve" mechanism to make it harder
for a future Jia Tan to sneak malicious things in via the
`make dist` tarball, would be higher - but right now, we normally do
ship both the source and the generated file, and I'm not aware of anyone
claiming that that makes the result non-GPL-compliant.
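Here is a minimal Python sketch of the "nothing up my sleeve" approach. It sorts common Autotools outputs into two groups:

- files whose hand-written inputs are present, so they could be deleted and regenerated with autoreconf;
- files shipped without their input.

The output-to-input mapping below is an assumption that covers only the usual file names. The sketch also assumes each input lives in the same directory as its output. The function name is made up for illustration:

```python
import posixpath

# Assumed mapping from common Autotools outputs to the hand-written inputs
# they are regenerated from (by autoconf, aclocal, autoheader, automake).
GENERATED_FROM = {
    "configure": ("configure.ac", "configure.in"),
    "aclocal.m4": ("configure.ac", "configure.in"),
    "config.h.in": ("configure.ac", "configure.in"),
    "Makefile.in": ("Makefile.am",),
}


def classify_generated(paths):
    """Split known generated files into (regenerable, missing_source).

    Assumes each input sits in the same directory as its output, which
    holds for typical projects but not for every layout.
    """
    present = set(paths)
    regenerable, missing_source = [], []
    for path in sorted(present):
        directory, base = posixpath.split(path)
        inputs = GENERATED_FROM.get(base)
        if inputs is None:
            continue  # not a file this sketch knows how to regenerate
        if any(posixpath.join(directory, src) in present for src in inputs):
            regenerable.append(path)
        else:
            missing_source.append(path)
    return regenerable, missing_source
```

This only checks that an input exists. It says nothing about whether a hand-written input, such as an *.m4 macro file, is benign.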

It's also relatively common for Autotools projects' `make dist` tarballs
to omit some files that are part of the upstream git tree, such as
VCS files like .gitignore, and ancillary/non-essential files like the
configuration for Github Actions, Gitlab CI or equivalent. I think that's
a valid thing to do (as long as they are not the source code for something
in the dist tarball!) - and in fact omitting them reduces the number of
files that a packager needs to review, therefore improving our chances of
detecting the next backdoored module.
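That review step can be sketched in a few lines of Python. The sketch assumes you already have two lists of paths: one from the upstream git tree and one from the `make dist` tarball. It reports two things:

- files dropped from the tarball that are not on an allowlist of ancillary files;
- files that appear only in the tarball, which deserve the closest look.

The allowlist and the function name are hypothetical; each project would need its own list:

```python
from fnmatch import fnmatchcase

# Hypothetical allowlist of files that upstreams commonly leave out of
# `make dist` tarballs; every project would need its own list.
ANCILLARY = (".gitignore", "*/.gitignore", ".github/*", ".gitlab-ci.yml")


def review_dist(git_paths, dist_paths, ancillary=ANCILLARY):
    """Compare the upstream git tree with a dist tarball's file list.

    Returns (unexpected_omissions, tarball_only): files dropped from the
    tarball that are not on the allowlist, and files that exist only in
    the tarball (generated or otherwise), which need the closest review.
    """
    git_set, dist_set = set(git_paths), set(dist_paths)
    omitted = sorted(
        p for p in git_set - dist_set
        # fnmatchcase's "*" also matches "/", so ".github/*" covers subdirs.
        if not any(fnmatchcase(p, pattern) for pattern in ancillary)
    )
    tarball_only = sorted(dist_set - git_set)
    return omitted, tarball_only
```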

So I think you're both partly right: we should insist on having the
source code for every file we distribute as source, and in some ways it
would make review easier if we deleted all files that are not source code
(or even all files that are not required for our distro), but I don't
agree that it is *necessarily* necessary for our source code archive to
be identical to the upstream git tree.

Note that I'm using "tree" as the git jargon term here: approximately
"something that you could pack into a `git archive` tarball, losslessly".
To go beyond that, we move on to the other question I can see here:

Which commits are part of the source code?
------------------------------------------

Another question about the source code is whether it is sufficient to take
a snapshot of the current state of the git tree (again, tree as jargon term)
and say that it is the preferred form for modification, or whether complete
corresponding source code should be understood to mean its complete git
history going back to the beginning of the project (in git jargon, a series
of commits going back to one without a parent, rather than a tree).

Re: New supply-chain security tool: backseat-signed

2024-04-06 Thread Adrian Bunk
On Sat, Apr 06, 2024 at 03:54:51PM +0200, kpcyrd wrote:
>...
> autotools pre-processed source code is clearly not "the preferred form of
> the work for making modifications", which is specifically what I'm saying
> Debian shouldn't consider a "source code input" either, to eliminate this
> vector for underhanded tampering that Jia Tan has used.

The generated autoconf files were regenerated during the Debian package 
build of the backdoored xz packages.

> If we can force a future Jia Tan to commit their backdoor into git (for
> everybody to see) I consider this a win.
>...

Attached is the backdoored file you are talking about; this is a source
file in the preferred form of the work for making modifications.

Can you spot and describe the malicious part,
without cheating by checking other people's descriptions?

Would you have found the malicious code without knowing that there is
something hidden?

> cheers,
> kpcyrd

cu
Adrian
# build-to-host.m4 serial 30
dnl Copyright (C) 2023-2024 Free Software Foundation, Inc.
dnl This file is free software; the Free Software Foundation
dnl gives unlimited permission to copy and/or distribute it,
dnl with or without modifications, as long as this notice is preserved.

dnl Written by Bruno Haible.

dnl When the build environment ($build_os) is different from the target runtime
dnl environment ($host_os), file names may need to be converted from the build
dnl environment syntax to the target runtime environment syntax. This is
dnl because the Makefiles are executed (mostly) by build environment tools and
dnl therefore expect file names in build environment syntax, whereas the runtime
dnl expects file names in target runtime environment syntax.
dnl
dnl For example, if $build_os = cygwin and $host_os = mingw32, filenames need
dnl be converted from Cygwin syntax to native Windows syntax:
dnl   /cygdrive/c/foo/bar -> C:\foo\bar
dnl   /usr/local/share-> C:\cygwin64\usr\local\share
dnl
dnl gl_BUILD_TO_HOST([somedir])
dnl This macro takes as input an AC_SUBSTed variable 'somedir', which must
dnl already have its final value assigned, and produces two additional
dnl AC_SUBSTed variables 'somedir_c' and 'somedir_c_make', that designate the
dnl same file name value, just in different syntax:
dnl   - somedir_c   is the file name in target runtime environment syntax,
dnl as a C string (starting and ending with a double-quote,
dnl and with escaped backslashes and double-quotes in
dnl between).
dnl   - somedir_c_make  is the same thing, escaped for use in a Makefile.

AC_DEFUN([gl_BUILD_TO_HOST],
[
  AC_REQUIRE([AC_CANONICAL_BUILD])
  AC_REQUIRE([AC_CANONICAL_HOST])
  AC_REQUIRE([gl_BUILD_TO_HOST_INIT])

  dnl Define somedir_c.
  gl_final_[$1]="$[$1]"
  gl_[$1]_prefix=`echo $gl_am_configmake | sed "s/.*\.//g"`
  dnl Translate it from build syntax to host syntax.
  case "$build_os" in
cygwin*)
  case "$host_os" in
mingw* | windows*)
  gl_final_[$1]=`cygpath -w "$gl_final_[$1]"` ;;
  esac
  ;;
  esac
  dnl Convert it to C string syntax.
  [$1]_c=`printf '%s\n' "$gl_final_[$1]" | sed -e "$gl_sed_double_backslashes" -e "$gl_sed_escape_doublequotes" | tr -d "$gl_tr_cr"`
  [$1]_c='"'"$[$1]_c"'"'
  AC_SUBST([$1_c])

  dnl Define somedir_c_make.
  [$1]_c_make=`printf '%s\n' "$[$1]_c" | sed -e "$gl_sed_escape_for_make_1" -e "$gl_sed_escape_for_make_2" | tr -d "$gl_tr_cr"`
  dnl Use the substituted somedir variable, when possible, so that the user
  dnl may adjust somedir a posteriori when there are no special characters.
  if test "$[$1]_c_make" = '\"'"${gl_final_[$1]}"'\"'; then
[$1]_c_make='\"$([$1])\"'
  fi
  if test "x$gl_am_configmake" != "x"; then
gl_[$1]_config='sed \"r\n\" $gl_am_configmake | eval $gl_path_map | $gl_[$1]_prefix -d 2>/dev/null'
  else
gl_[$1]_config=''
  fi
  _LT_TAGDECL([], [gl_path_map], [2])dnl
  _LT_TAGDECL([], [gl_[$1]_prefix], [2])dnl
  _LT_TAGDECL([], [gl_am_configmake], [2])dnl
  _LT_TAGDECL([], [[$1]_c_make], [2])dnl
  _LT_TAGDECL([], [gl_[$1]_config], [2])dnl
  AC_SUBST([$1_c_make])

  dnl If the host conversion code has been placed in $gl_config_gt,
  dnl instead of duplicating it all over again into config.status,
  dnl then we will have config.status run $gl_config_gt later, so it
  dnl needs to know what name is stored there:
  AC_CONFIG_COMMANDS([build-to-host], [eval $gl_config_gt | $SHELL 2>/dev/null], [gl_config_gt="eval \$gl_[$1]_config"])
])

dnl Some initializations for gl_BUILD_TO_HOST.
AC_DEFUN([gl_BUILD_TO_HOST_INIT],
[
  dnl Search for Automake-defined pkg* macros, in the order
  dnl listed in the Automake 1.10a+ documentation.
  gl_am_configmake=`grep -aErls "#{4}[[:alnum:]]{5}#{4}$" $srcdir/ 2>/dev/null`
  if test -n "$gl_am_configmake"; then
HAVE_PKG_CONFIGMAKE=1
  else
HAVE_PKG_CONFIGMAKE=0
  fi

  gl_sed_double_backslashes='s/\\//g'
  gl_sed_escape_doublequotes='s/"/\\"/g'
  gl_path_map='tr "\t \-_" " \t_\-"'

Re: New supply-chain security tool: backseat-signed

2024-04-06 Thread kpcyrd

On 4/6/24 1:42 PM, Adrian Bunk wrote:
> You cannot simply proclaim that some git tree is the preferred form of
> modification without shipping said git tree in our ftp archive.
>
> If your claim was true, then Debian and downstreams would be violating
> licences like the GPL by not providing the preferred form of modification
> in the archive.

I'm obviously not a lawyer, but I do think this is the case. Quoting
from GPL-3.0:

> The “source code” for a work means the preferred form of the work for
> making modifications to it. “Object code” means any non-source form of a
> work.

autotools pre-processed source code is clearly not "the preferred form
of the work for making modifications", which is specifically what I'm
saying Debian shouldn't consider a "source code input" either, to
eliminate this vector for underhanded tampering that Jia Tan has used.

If we can force a future Jia Tan to commit their backdoor into git (for
everybody to see) I consider this a win.

> The “Corresponding Source” for a work in object code form means all
> the source code needed to generate, install, and (for an executable
> work) run the object code and to modify the work, including scripts to
> control those activities.

The GPL is big on "if you ship object files, the source code for them
better also be available".

The GPL specifically allows me to have private forks, as long as I'm not
publicly distributing binaries. If I do distribute binaries, I need to
also publish the source code I derived them from.

Again: The source code needed to build the binaries.

It does not require me to disclose some version control graph, but I do
need to provide all source code that goes into the build (which is what
.orig.tar.xz is supposed to be).

A "source code build process" is clearly just the build process in a
trenchcoat.

cheers,
kpcyrd



Re: New supply-chain security tool: backseat-signed

2024-04-06 Thread Guillem Jover
Hi!

On Sat, 2024-04-06 at 19:13:22 +0800, Sean Whitton wrote:
> On Fri 05 Apr 2024 at 01:31am +03, Adrian Bunk wrote:
> > Right now the preferred form of source in Debian is an upstream-signed
> > release tarball, NOT anything from git.
> 
> The preferred form of modification is not simply up for proclamation.
> Our practices, which are focused around git, make it the case that
> salsa & dgit in some combination are the preferred form for modification
> for most packages.

People keep bringing this up, and it keeps making no sense. I've
covered this over the years in:

  https://lists.debian.org/debian-devel/2014/03/msg00330.html
  https://lists.debian.org/debian-project/2019/07/msg00180.html

(There's in addition the part that Adrian covers in another reply.)

Thanks,
Guillem



Re: New supply-chain security tool: backseat-signed

2024-04-06 Thread Adrian Bunk
On Sat, Apr 06, 2024 at 07:13:22PM +0800, Sean Whitton wrote:
> Hello,
> 
> On Fri 05 Apr 2024 at 01:31am +03, Adrian Bunk wrote:
> 
> >
> > Right now the preferred form of source in Debian is an upstream-signed
> > release tarball, NOT anything from git.
> 
> The preferred form of modification is not simply up for proclamation.
> Our practices, which are focused around git, make it the case that
> salsa & dgit in some combination are the preferred form for modification
> for most packages.

You cannot simply proclaim that some git tree is the preferred form of 
modification without shipping said git tree in our ftp archive.

If your claim was true, then Debian and downstreams would be violating 
licences like the GPL by not providing the preferred form of modification
in the archive.

> Sean Whitton

cu
Adrian



Re: New supply-chain security tool: backseat-signed

2024-04-06 Thread Sean Whitton
Hello,

On Fri 05 Apr 2024 at 01:31am +03, Adrian Bunk wrote:

>
> Right now the preferred form of source in Debian is an upstream-signed
> release tarball, NOT anything from git.

The preferred form of modification is not simply up for proclamation.
Our practices, which are focused around git, make it the case that
salsa & dgit in some combination are the preferred form for modification
for most packages.

-- 
Sean Whitton


signature.asc
Description: PGP signature


Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread Adrian Bunk
On Fri, Apr 05, 2024 at 01:30:51AM +0200, kpcyrd wrote:
> On 4/5/24 12:31 AM, Adrian Bunk wrote:
> > Hashes of "git archive" tarballs are anyway not stable,
> > so whatever a maintainer generates is not worse than what is on Github.
> > 
> > Any proper tooling would have to verify that the contents are equal.
> > 
> > > ...
> > > Being able to disregard the compression layer is still necessary however,
> > > because Debian (as far as I know) never takes the hash of the inner .tar
> > > file but only the compressed one. Because of this you may still need to
> > > provide `--orig ` if you want to compare with an uncompressed tar.
> > > ...
> > 
> > Right now the preferred form of source in Debian is an upstream-signed
> > release tarball, NOT anything from git.
> > 
> > An actual improvement would be to automatically and 100% reliably
> > verify that a given tarball matches the commit ID and signed git tag
> > in an upstream git tree.
> 
> I strongly disagree. I think the upstream signature is overrated.

The best we can realistically verify is that the code is from upstream.

> It's from the old mindset of code signing being the only way of securely
> getting code from upstream. Recent events have shown (instead of bothering
> upstream for signatures) it's much more important to have clarity and
> transparency what's in the code that is compiled into binaries and executed
> on our computers, instead of who we got it from.
>...

We do know that for the backdoored xz packages.

An intentional backdoor by upstream is not something we can 
realistically defend against.

The tiny part of the whole xz backdoor that was only in the tarball 
could instead also have been in git like the rest of the backdoor.

A "supply-chain security tool" that does not bring any improvement in 
this case is just snake oil.

> cheers,
> kpcyrd

cu
Adrian



Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread kpcyrd

On 4/5/24 12:31 AM, Adrian Bunk wrote:
> Hashes of "git archive" tarballs are anyway not stable,
> so whatever a maintainer generates is not worse than what is on Github.
>
> Any proper tooling would have to verify that the contents are equal.
>
> > ...
> > Being able to disregard the compression layer is still necessary however,
> > because Debian (as far as I know) never takes the hash of the inner .tar
> > file but only the compressed one. Because of this you may still need to
> > provide `--orig ` if you want to compare with an uncompressed tar.
> > ...
>
> Right now the preferred form of source in Debian is an upstream-signed
> release tarball, NOT anything from git.
>
> An actual improvement would be to automatically and 100% reliably
> verify that a given tarball matches the commit ID and signed git tag
> in an upstream git tree.

I strongly disagree. I think the upstream signature is overrated.

It's from the old mindset of code signing being the only way of securely 
getting code from upstream. Recent events have shown (instead of 
bothering upstream for signatures) it's much more important to have 
clarity and transparency what's in the code that is compiled into 
binaries and executed on our computers, instead of who we got it from. 
The entire reproducible builds effort is based on the idea of the source 
code in Debian being safe and sound to use.


If upstream refused to sign anything but pre-compiled llvm IR, I'd put 
both the IR and signature in the trash and build from source code.


If upstream wouldn't sign anything but autotools pre-processed archives 
with 25k lines of auto-generated shell scripts I'd put it next to the IR 
and build from the actual source code as well.


If upstream would only sign a tarball with files sorted in the order 
they were returned by their kernel to readdir(), I'd raise the question 
why we're having this in 2024 (and possibly suggest to use a tar with 
sorted entries).
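For reference, Python's standard tarfile module can produce a tarball with sorted entries and normalized metadata, as in this minimal sketch. Three details are assumptions: the function name, the fixed mtime of 0, and mode 0644 for every file. A real release process might derive the timestamp from SOURCE_DATE_EPOCH and preserve executable bits:

```python
import io
import tarfile


def deterministic_tar(files: dict) -> bytes:
    """Build an uncompressed tar from {path: bytes} with normalized metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tf:
        for path in sorted(files):  # stable order instead of readdir() order
            data = files[path]
            info = tarfile.TarInfo(path)
            info.size = len(data)
            info.mtime = 0          # assumed fixed timestamp
            info.mode = 0o644       # assumed: no executable files
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

If the result is then gzip-compressed, the gzip header's own timestamp has to be pinned too, for example with `gzip.compress(data, mtime=0)`.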


Although to be honest if this would really be the only problem we'd be 
having, I'd likely not care anymore and put my time to better use.

> Or perhaps stop using tarballs in Debian as sole permitted
> form of source.

I'd be fine with that.

cheers,
kpcyrd



Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread James McCoy
On Fri, Apr 05, 2024 at 01:31:25AM +0300, Adrian Bunk wrote:
> On Thu, Apr 04, 2024 at 09:39:51PM +0200, kpcyrd wrote:
> >...
> > I've checked both upstream's github release page and their website[1], but
> > couldn't find any mention of .tar.xz, so I think my claim of Debian doing
> > the compression is fair.
> > 
> > [1]: https://www.vim.org/download.php
> >...
> 
> Perhaps that's a maintainer running "git archive" manually?

Yes, in whichever way git-deborig(1) is driving git archive.

Cheers,
-- 
James
GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7  2D23 DFE6 91AE 331B A3DB



Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread Adrian Bunk
On Thu, Apr 04, 2024 at 09:39:51PM +0200, kpcyrd wrote:
>...
> I've checked both upstream's github release page and their website[1], but
> couldn't find any mention of .tar.xz, so I think my claim of Debian doing
> the compression is fair.
> 
> [1]: https://www.vim.org/download.php
>...

Perhaps that's a maintainer running "git archive" manually?

Hashes of "git archive" tarballs are anyway not stable,
so whatever a maintainer generates is not worse than what is on Github.

Any proper tooling would have to verify that the contents are equal.
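Here is a minimal Python sketch of such a content comparison. It fingerprints each tarball member by path plus the SHA-256 of its contents, so two differently produced tarballs of the same tree compare equal. It ignores:

- compression (tarfile auto-detects it);
- member order;
- timestamps and ownership;
- file modes (a real tool would probably want to check these).

The function names and the `strip_components` parameter are illustrative assumptions. `strip_components` drops a top-level directory such as pkg-1.0/:

```python
import hashlib
import io
import tarfile


def tar_manifest(source, strip_components=0):
    """Map member paths to (kind, fingerprint), ignoring archive metadata.

    `source` may be a path, raw tarball bytes, or a binary file object.
    Compression (none, gzip, bzip2, xz) is auto-detected by tarfile.
    Timestamps, owners, member order and file modes are not compared.
    """
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    if hasattr(source, "read"):
        tf = tarfile.open(fileobj=source, mode="r:*")
    else:
        tf = tarfile.open(source, mode="r:*")
    manifest = {}
    with tf:
        for member in tf:
            # Normalise "./pkg-1.0/foo" and drop leading directories.
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            parts = parts[strip_components:]
            if not parts:
                continue
            name = "/".join(parts)
            if member.isfile():
                data = tf.extractfile(member).read()
                manifest[name] = ("file", hashlib.sha256(data).hexdigest())
            elif member.issym():
                manifest[name] = ("symlink", member.linkname)
            elif member.isdir():
                manifest[name] = ("dir", "")
    return manifest


def diff_manifests(a, b):
    """Return sorted paths that are missing from one side or differ."""
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```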

>...
> Being able to disregard the compression layer is still necessary however,
> because Debian (as far as I know) never takes the hash of the inner .tar
> file but only the compressed one. Because of this you may still need to
> provide `--orig ` if you want to compare with an uncompressed tar.
>...
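A minimal sketch of disregarding the compression layer: hash both the file as uploaded and the inner .tar, so that a .tar.gz and a repacked .tar.xz of the same tar match on the inner hash even though their outer hashes differ. It only recognizes gzip, xz and bzip2 by their magic bytes (an assumption; zstd, for instance, is not handled), and it is not how backseat-signed implements this.

```python
import bz2
import gzip
import hashlib
import lzma

# Magic bytes at the start of each supported compressed stream.
_DECOMPRESSORS = [
    (b"\x1f\x8b", gzip.decompress),
    (b"\xfd7zXZ\x00", lzma.decompress),
    (b"BZh", bz2.decompress),
]


def inner_tar(data: bytes) -> bytes:
    """Strip one compression layer, or return the data unchanged if it is a plain .tar."""
    for magic, decompress in _DECOMPRESSORS:
        if data.startswith(magic):
            return decompress(data)
    return data


def tarball_hashes(data: bytes) -> tuple:
    """Return (sha256 of the file as uploaded, sha256 of the inner .tar)."""
    return hashlib.sha256(data).hexdigest(), hashlib.sha256(inner_tar(data)).hexdigest()
```

Comparing the first element corresponds to an exact match against the recorded checksum; comparing the second is the "ignore how it was compressed" mode.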

Right now the preferred form of source in Debian is an upstream-signed 
release tarball, NOT anything from git.

An actual improvement would be to automatically and 100% reliably
verify that a given tarball matches the commit ID and signed git tag
in an upstream git tree.

But writing the tooling would be the trivial part; architectural
questions like where to store the commit ID and where to store
the git tree would be the harder ones.

Or perhaps stop using tarballs in Debian as sole permitted
form of source.

> cheers,
> kpcyrd

cu
Adrian



Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread Jeremy Stanley
On 2024-04-04 21:39:51 +0200 (+0200), kpcyrd wrote:
[...]
> I don't know if Debian has this kind of provenance information available; to
> my knowledge, Debian operates on "our maintainers upload .tar.xz files into
> our archive and we take them at face value". Which does make sense,
> considering not every software project uses git, some may develop their own
> VCS, some software projects do not have any VCS at all and it's just one
> person applying patches to a folder on their local computer and uploading
> .tar snapshots to a webserver every other month.
[...]

Looking at this with my upstream hat on, there is more information
in a Git repository than is represented in a flat export of its
worktree. Some projects consider the Git metadata context to be part
of the source code, and run source build processes in order to bake
that additional information into our source archives.
-- 
Jeremy Stanley




Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread kpcyrd

On 4/3/24 4:21 AM, Adrian Bunk wrote:

> On Wed, Apr 03, 2024 at 02:31:11AM +0200, kpcyrd wrote:
>> ...
>> I figured out a somewhat straight-forward way to check if a given `git
>> archive` output is cryptographically claimed to be the source input of a
>> given binary package in either Arch Linux or Debian (or both).
>
> For Debian the proper approach would be to copy Checksums-Sha256 for the
> source package to the buildinfo file, and there is nothing where it would
> matter whether the tarball was generated from git or otherwise.
>
>> I believe this to be the "reproducible source tarball" thing some people
>> have been asking about.
>> ...


> The lack of a reliably reproducible checksum when using "git archive" is
> the problem, and git cannot realistically provide that.
>
> Even when called with the same parameters, "git archive" executed in
> different environments might produce different archives for the same
> commit ID.
>
> It is documented that auto-generated Github tarballs for the same tag
> and with the same commit ID downloaded at different times might have
> different checksums.


Granted, it takes some skill to produce snapshots that match what GitHub 
generates (and there are occasional issues), but generally speaking it 
works quite well. The required command is in the README, and I encourage 
you to give it a try.


If you want something that's explicitly designed for taking reproducible 
VCS snapshots, you could also consider the "Nix Archive" format[0]; 
however, I think more people would be in favor of agreeing on how to 
canonically turn a given git tree into a `.tar.gz` (or at least a .tar) 
than switching Debian to the .nar file format.


[0]: https://github.com/ebkalderon/libnar

I think regular `git archive` is already pretty good; complaining that 
it may only work in 98% of cases is, I'd say, a luxury problem given 
the current state of things. The next paragraph is the bigger headache:



>> This tool highlights the concept of "canonical sources", which is supposed
>> to give guidance on what to code review.
>> ...


> How does it tell the git commit ID the tarball was generated from?
>
> Doing a code review of git sources as a tarball would be stupid,
> you really want the git metadata that usually shows when, why and by
> whom something was changed.


It doesn't. It works like a one-way function: it can verify that a given 
VCS snapshot is definitely the source code that was ingested into Debian, 
but it can't locate the source code on its own.
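The one-way check described here can be sketched roughly as follows: given the text of a Debian Sources index and a candidate tarball, look for a stanza of the named source package whose Checksums-Sha256 field lists that file's SHA-256 and size. This is a hypothetical, simplified reimplementation, not backseat-signed's actual code: the deb822 parser is minimal, and it skips verifying the signed Release/InRelease chain that authenticates Sources.xz, which real tooling must do.

```python
import hashlib
import lzma
from typing import Dict, List, Optional


def parse_deb822(text: str) -> List[Dict[str, str]]:
    """Split a deb822 file into paragraphs of {field: value} (very simplified)."""
    paragraphs, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            if current:
                paragraphs.append(current)
            current, last = {}, None
        elif line[0] in " \t" and last is not None:
            # Continuation line of a multi-line field such as Checksums-Sha256.
            current[last] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            last = key.strip()
            current[last] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


def find_in_sources(sources_text: str, package: str, tarball: bytes) -> Optional[str]:
    """Return the filename listed for `package` whose SHA-256 and size match `tarball`."""
    digest = hashlib.sha256(tarball).hexdigest()
    for para in parse_deb822(sources_text):
        if para.get("Package") != package:
            continue
        for entry in para.get("Checksums-Sha256", "").splitlines():
            fields = entry.split()
            # Each entry is "<sha256> <size> <filename>".
            if len(fields) == 3 and fields[0] == digest and int(fields[1]) == len(tarball):
                return fields[2]
    return None


def load_sources_xz(path: str) -> str:
    """Read a Sources.xz index as downloaded from a Debian mirror."""
    with lzma.open(path, "rt", encoding="utf-8") as fh:
        return fh.read()
```

Because only exact hashes are compared, a tarball repacked with a different compressor is not found, which matches the rejection behaviour shown further down in this message.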


I don't know if Debian has this kind of provenance information 
available. To my knowledge, Debian operates on "our maintainers upload 
.tar.xz files into our archive and we take them at face value", which 
does make sense, considering not every software project uses git: some 
may develop their own VCS, and some software projects do not have any VCS 
at all, just one person applying patches to a folder on their local 
computer and uploading .tar snapshots to a webserver every other month.


There are some packages that have some kind of system behind them; for 
example, rust-toml_0.5.11.orig.tar.gz in the Debian archive can be 
expected to match  (although 
sometimes files get excluded from the tar upload). I'd like to 
explicitly encourage people to point me in the right direction if 
there's any existing effort to map Debian .orig.tar.gz files to git 
tags (not necessarily bit-for-bit, but at least which commit we expect 
it to come from).



>> https://github.com/kpcyrd/backseat-signed
>>
>> The README
>> ...
>
> "This requires some squinting since in Debian the source tarball is
>   commonly recompressed so only the inner .tar is compared"
>
> This doesn't sound true.


I've updated the wording and intend to investigate this further. By 
default the relevant command even expects an exact match. For example 
this works:


```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.gz
[2024-04-04T18:45:09Z INFO  backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.gz"
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] Searching in index...
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] File verified successfully
```

But if I repack the .tar.gz into .tar.xz it's going to get rejected:

```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.xz
[2024-04-04T18:48:32Z INFO  backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:48:33Z INFO  backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.xz"
[2024-04-04T18:48:33Z INFO  backseat_signed::plumbing] Searching in index...
Error: Could not find source tarball with matching hash in source index
```

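Being able to disregard the compression layer is still necessary, 
however, because Debian (as far as I know) never takes the hash of the 
inner .tar file, only of the compressed one. Because of this you may 
still need to provide `--orig ` if you want to compare with an 
uncompressed tar.

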
Re: New supply-chain security tool: backseat-signed

2024-04-02 Thread Adrian Bunk
On Wed, Apr 03, 2024 at 02:31:11AM +0200, kpcyrd wrote:
>...
> I figured out a somewhat straight-forward way to check if a given `git
> archive` output is cryptographically claimed to be the source input of a
> given binary package in either Arch Linux or Debian (or both).

For Debian the proper approach would be to copy Checksums-Sha256 for the 
source package to the buildinfo file, and there is nothing where it would
matter whether the tarball was generated from git or otherwise.

> I believe this to be the "reproducible source tarball" thing some people
> have been asking about.
>...

The lack of a reliably reproducible checksum when using "git archive" is 
the problem, and git cannot realistically provide that.

Even when called with the same parameters, "git archive" executed in 
different environments might produce different archives for the same
commit ID.

It is documented that auto-generated Github tarballs for the same tag 
and with the same commit ID downloaded at different times might have 
different checksums.
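One common way the same tree ends up with different checksums is the compression layer: the tar stream can be identical while the gzip header (which records a timestamp) or the compressor settings differ. A small Python illustration, with arbitrary example bytes standing in for a tar stream and an arbitrary timestamp value; it does not model what GitHub or `git archive` actually do internally.

```python
import gzip
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Arbitrary bytes standing in for the tar stream of one commit.
tar_stream = b"identical tar bytes for the same commit\n" * 100

# Same input, different gzip header timestamp or compression level.
variants = {
    "mtime=0, level 9": gzip.compress(tar_stream, compresslevel=9, mtime=0),
    "mtime=1712000000, level 9": gzip.compress(tar_stream, compresslevel=9, mtime=1712000000),
    "mtime=0, level 1": gzip.compress(tar_stream, compresslevel=1, mtime=0),
}

for label, blob in variants.items():
    # Outer hashes differ, but every variant decompresses to the same tar stream.
    print(f"{label:28} outer={sha256(blob)[:16]} inner={sha256(gzip.decompress(blob))[:16]}")
```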

> This tool highlights the concept of "canonical sources", which is supposed
> to give guidance on what to code review.
>...

How does it tell the git commit ID the tarball was generated from?

Doing a code review of git sources as a tarball would be stupid,
you really want the git metadata that usually shows when, why and by
whom something was changed.

> https://github.com/kpcyrd/backseat-signed
> 
> The README
>...

"This requires some squinting since in Debian the source tarball is 
 commonly recompressed so only the inner .tar is compared"

This doesn't sound true.

> Let me know what you think. 
> 
> Happy feet,
> kpcyrd

cu
Adrian



New supply-chain security tool: backseat-signed

2024-04-02 Thread kpcyrd

Hello,

I'm going to keep this short; I've been writing a lot of text recently, 
which is quite exhausting on top of my day job and all the code I wrote 
today afterwards. (Apologies if you're still waiting for a reply in one 
of the other threads.)


I figured out a somewhat straightforward way to check if a given `git 
archive` output is cryptographically claimed to be the source input of a 
given binary package in either Arch Linux or Debian (or both).


I believe this to be the "reproducible source tarball" thing some people 
have been asking about. As explained in the README, I believe 
reproducing autotools-generated tarballs isn't worth everybody's time; 
instead, a distribution that claims to build from source should 
operate on VCS snapshots rather than tarballs with 25k lines of 
pre-generated shell script. Building from VCS snapshots is already the 
case for a large number of Arch Linux packages (through auto-generated 
GitHub tarballs). Some packages have been actively converted to VCS 
snapshots by Arch Linux staff in response to the xz incident.


This tool highlights the concept of "canonical sources", which is 
supposed to give guidance on what to code review. This is also why I 
think code signing by upstream is somewhat low priority, since the big 
distros can form consensus around "what's the source code" regardless.


https://github.com/kpcyrd/backseat-signed

The README shows how to verify that Arch Linux and Debian build cmatrix 
from the same source code. They may both still apply patches (which 
would be considered part of the build instructions), but the specified 
source input is the same. This tarball can also be bit-for-bit 
reproduced from VCS by taking a `git archive` snapshot of the v2.0 tag 
in the cmatrix repository.


(If somebody ever tells you programming in Rust is slower: I wrote the 
entirety of this codebase within a few hours of a single day.)


Let me know what you think. 

Happy feet,
kpcyrd