Bug#1106071: [RFC PATCH dgit v1] tag2upload: add pristine-tar support

2025-07-25 Thread Sean Whitton
Hello,

On Mon 21 Jul 2025 at 07:32pm +02, Andrea Pappacoda wrote:

> diff --git a/git-debpush b/git-debpush
> index e3a4ba39..1782513e 100755
> --- a/git-debpush
> +++ b/git-debpush
> @@ -457,6 +457,16 @@ if $upstream; then
>  to_push+=("$upstream_tag")
>  fi
>
> +# TODO: pristine-tar
> +# I obtain the commit ID at the time of the upload, so that I can be sure 
> that
> +# the tag2upload service generates the tarball with the expected pristine-tar
> +# branch state
> +pristine_tar_info=''
> +if pristine_tar_commit=$(git rev-parse --verify --quiet 
> 'refs/heads/pristine-tar'); then
> +pristine_tar_info=" pristine-tar=$pristine_tar_commit"
> +fi
> +
> +
>  # Useful sanity checks 

There is already some pristine-tar stuff in git-debpush, the "Intent to
use pristine-tar for this upload" check.  I think that your changes
should be integrated with that.  I.e., the conditions under which we
currently fail because we think the user wants to use pristine-tar,
should be precisely the conditions in which pristine_tar= gets added to
the tag.

As for everything else, I'll assume Ian's review probably covered it,
and I'll take a closer look at your v2.  Thank you for helping us with
this.

-- 
Sean Whitton


signature.asc
Description: PGP signature


Bug#1106071: [RFC PATCH dgit v1] tag2upload: add pristine-tar support

2025-07-25 Thread Ian Jackson
Andrea Pappacoda writes ("Bug#1106071: [RFC PATCH dgit v1] tag2upload: add 
pristine-tar support"):
> On Fri Jul 25, 2025 at 12:00 AM CEST, Ian Jackson wrote:
> > This would be the first such, (--quilt ought to have been but is 
> > grandfathered) but I think probably if someone hypothetically somehow 
> > sends this option to a t2u service where this support has not yet been 
> > deployed (or has been withdrawn!) what should happen?  Probably it 
> > should fail.
> 
> Yeah, I'd make sense. If I have git-debpush adding pristine-tar 
> metadata, I'm probably expecting the t2u service to use it. But maybe 
> it'd make sense to make all extensions critical by default? I can't 
> think of a scenario where git-debpush would add something which the t2u 
> service could ignore.

We've already doucmented (and implemented in our three existing
parsers) that unknown extensions with the current syntax should be
ignored.  So I don't think we should change this now.

> > So how about
> >   [dgit ... !pristine-tar=... ...]
> > or
> >   [dgit ... +pristine-tar=... ...]
> > or some such?
> >
> > The new metadata item will need to be documented in tag2upload(5).
> 
> I like "!" more, as "+" to me has more of an "cumulative" feel.

Let's go with !.

> > What happens if another maintainer did this new upstream version, and 
> > our user hasn't got that up-to-date pristine-tar branch yet?  I think 
> > ideally we'd use the orig from the archive, but there is a risk of 
> > lossage if the uploads come too close together - see ##1109130.
> 
> Maybe in the t2u service we should check if pristine-tar data exists in 
> the repo and error out if the pushed tag didn't contain pristine-tar 
> metadata? Because in any case, user's local builds would still be using 
> a different orig than the intended one, so you're effectively testing 
> your package against the wrong upstream code.

I think even if #1109130 happens, we can make the upload *fail* rather
than use the wrong orig.

But, I suggest we could deal with this by warning the user (with a
failed check) if the upstream pristine-tar branch is ahead of their
own.

> > We might want to linewrap this a bit more.  That line is getting quite
> > long.  I guess we could put all the upstream and pristine-tar info in
> > a separate line maybe?
> 
> You mean like:
> 
> [dgit distro=$distro split$quilt_mode_text]
> [dgit please-upload source=$source version=$version]
> [dgit $upstream_info$pristine_tar_info]
> 
> If there's no functional difference, why not.

Yes.  No, there isn't.

> > I think this option should be called "pristine_tar".
> 
> Doesn't the tag2upload-obtain-origs script check that the option name 
> only contains [0-9a-z-] characters? I could add an underscore there.

Huh.  That makes no sense because it tries to make them into shell
varisbles!  This seems like a bug in the script.

> > I don't think this is right, is it ?  We want to rewind the local
> > pristine-tar branch to that commit, not the HEAD.  I'm assuming
> > pristine-tar implicitly uses refs/heads/pristine-tar rather than HEAD.
> 
> Yes, that's what I wanted to do. Didn't know about the existence of `git 
> update-ref` (which is what I'm supposed to use here, right?). The tool 
> also has a `--no-deref` option which makes me kinda nervous (having 
> heard of all the security issues related to symlinks) - maybe it'd make 
> sense to use that too?

We completely control the local ref namespace, because we use a
`git fetch` with refspecs, rather than `git clone`.  So there won't be
unexpected symrefs.

> > We definitely don't want it to be random!  Let's call this an error 
> > for now.
> 
> Got it. For the record, `gbp export-orig` generates the tarball 
> mentioned in the most recent pristine-tar branch commit message. If the 
> messages do not contain the expected tarball version, it proceeds 
> assuming gzip, and if that fails, generates a new tarball with git 
> archive and commits it into the pristine-tar branch. Which is probably 
> less sensible than using a random one :)

Blimey.

> > If this turns out to be a problem in practice, and users don't want to 
> > just git rm the unwanted info from their pristine-tar branch, then 
> > I guess we'll need to add the tarball name to the tag or something.
> 
> Yes, but how? If there are two tarballs of the same version in 
> pristine-tar, the user might not even realise that. Which of the two 
> should be added? Should the user be asked about that by git-debpush? 
> It's a 

Bug#1106071: [RFC PATCH dgit v1] tag2upload: add pristine-tar support

2025-07-25 Thread Andrea Pappacoda

On Fri Jul 25, 2025 at 12:00 AM CEST, Ian Jackson wrote:

Thanks.   This seems to be going in the right direction.  Very
exciting!


Nice to hear!

I have a bunch of comments about various details etc.  Please don't be 
discouraged by my pickiness.  It's no reflection on the quality of 
your work.  Getting feedback like this from me is entirely normal and 
expected :-).


Don't worry at all! Please be as picky as you can be. I'm not afraid of 
negative feedbacks :)


1. Added a new pristine-tar=commit_id metadata field to the signed tag 
   generated by git-debpush


This seems right to me.

Should this be a "critical extension" in X.509 terminology?

This would be the first such, (--quilt ought to have been but is 
grandfathered) but I think probably if someone hypothetically somehow 
sends this option to a t2u service where this support has not yet been 
deployed (or has been withdrawn!) what should happen?  Probably it 
should fail.


Yeah, I'd make sense. If I have git-debpush adding pristine-tar 
metadata, I'm probably expecting the t2u service to use it. But maybe 
it'd make sense to make all extensions critical by default? I can't 
think of a scenario where git-debpush would add something which the t2u 
service could ignore.



So how about
  [dgit ... !pristine-tar=... ...]
or
  [dgit ... +pristine-tar=... ...]
or some such?

The new metadata item will need to be documented in tag2upload(5).


I like "!" more, as "+" to me has more of an "cumulative" feel. And, in 
case we go for critical by default, I'd use "?" for 
a non-critical/optional field.


2. If a pristine-tar commitid is present, the git repo is hard-reset to 
   it, then the tarball name is obtained from the tree, and

   `pristine-tar checkout` is invoked to generate the tarball.


By "the tarball name is obtained from the tree" you mean that the orig
tarball name is obtained from the pristine-tar commit's tree.


I'm not super familiar with Git terminology, but yes.

I believe this is safe security-wise because [...]. Do you see flaws 
in this reasoning?


I think this is correct from my understanding of pristine-tar.
Subject to what I write below about pristine-tar vs HEAD.


Great!


--- a/git-debpush
+++ b/git-debpush

...

+pristine_tar_info=''
+if pristine_tar_commit=$(git rev-parse --verify --quiet 
'refs/heads/pristine-tar'); then
+pristine_tar_info=" pristine-tar=$pristine_tar_commit"
+fi


Don't we want to do this only if the current upstream version actually
has any informaation in pristine-tar ?  If not, then the user didn't
import the tarball, presumably ?


Yes, makes perfect sense. I have a couple of packages where 
a pristine-tar branch is present but outdated, so this would break all 
of them actually :)


What happens if another maintainer did this new upstream version, and 
our user hasn't got that up-to-date pristine-tar branch yet?  I think 
ideally we'd use the orig from the archive, but there is a risk of 
lossage if the uploads come too close together - see ##1109130.


Maybe in the t2u service we should check if pristine-tar data exists in 
the repo and error out if the pushed tag didn't contain pristine-tar 
metadata? Because in any case, user's local builds would still be using 
a different orig than the intended one, so you're effectively testing 
your package against the wrong upstream code.



 # Useful sanity checks 


I don't know what those might be.  I guess we could leave them for now
and see what things go wrong in reality.  Ie, I don't think this is a
blocker.


 [dgit distro=$distro split$quilt_mode_text]
-[dgit please-upload source=$source version=$version$upstream_info]
+[dgit please-upload source=$source 
version=$version$upstream_info$pristine_tar_info]


We might want to linewrap this a bit more.  That line is getting quite
long.  I guess we could put all the upstream and pristine-tar info in
a separate line maybe?


You mean like:

   [dgit distro=$distro split$quilt_mode_text]
   [dgit please-upload source=$source version=$version]
   [dgit $upstream_info$pristine_tar_info]

If there's no functional difference, why not.


 git_tag_main_opts_args=(-m "$tagmessage" "$debian_tag" "$branch_commit")
diff --git a/infra/dgit-repos-server b/infra/dgit-repos-server
index f6a3716c..e797123b 100755
--- a/infra/dgit-repos-server
+++ b/infra/dgit-repos-server


This part looks correct.  Except:


+   "pristinetar=$t2u_pristinetar",


I think this option should be called "pristine_tar".


Doesn't the tag2upload-obtain-origs script check that the option name 
only contains [0-9a-z-] characters? I could add an underscore there.



-x git deborig "$s_u"
+if [ -n "$s_pristinetar" ]; then
+old_head=$(git rev-parse --verify HEAD)
+git reset --hard -- "$s_pristinetar"


I don't think this is right, is it ?  We want to rewind the local
pristine-tar branch to that commit, not the HEAD.  I'm assuming
pristine-tar implicitly uses refs/heads/pristine-tar rather than HEAD.


Yes, that's what I

Bug#1106071: [RFC PATCH dgit v1] tag2upload: add pristine-tar support

2025-07-24 Thread Ian Jackson
Andrea Pappacoda writes ("Bug#1106071: [RFC PATCH dgit v1] tag2upload: add 
pristine-tar support"):
> I did two things:

Thanks.   This seems to be going in the right direction.  Very
exciting!

I have a bunch of comments about various details etc.  Please don't be
discouraged by my pickiness.  It's no reflection on the quality of
your work.  Getting feedback like this from me is entirely normal and
expected :-).

> 1. Added a new pristine-tar=commit_id metadata field to the signed tag 
>generated by git-debpush

This seems right to me.

Should this be a "critical extension" in X.509 terminology?

This would be the first such, (--quilt ought to have been but is
grandfathered) but I think probably if someone hypothetically somehow
sends this option to a t2u service where this support has not yet been
deployed (or has been withdrawn!) what should happen?  Probably it
should fail.

So how about
  [dgit ... !pristine-tar=... ...]
or
  [dgit ... +pristine-tar=... ...]
or some such?

The new metadata item will need to be documented in tag2upload(5).

> 2. If a pristine-tar commitid is present, the git repo is hard-reset to 
>it, then the tarball name is obtained from the tree, and
>`pristine-tar checkout` is invoked to generate the tarball.

By "the tarball name is obtained from the tree" you mean that the orig
tarball name is obtained from the pristine-tar commit's tree.

> I believe this is safe security-wise because the commit id represented 
> the expected status of the pristine-tar branch on the developer's 
> machine is signed at the time of the upload. If for some reason the 
> branch gets in-between the upload, and the expected commit is lost, 
> things will just fail instead of generating a "wrong" tarball. Do you 
> see flaws in this reasoning?

I think this is correct from my understanding of pristine-tar.
Subject to what I write below about pristine-tar vs HEAD.

> Note: I did not test that my implementation is actually correct and 
> working. I did not add any new test either. I believe there are no 
> regressions in old tests but I'm testing this on a plane and didn't look 
> too much into the log output.

Right.

> Let me know that you think! Bye :)

> --- a/git-debpush
> +++ b/git-debpush
...
> +pristine_tar_info=''
> +if pristine_tar_commit=$(git rev-parse --verify --quiet 
> 'refs/heads/pristine-tar'); then
> +pristine_tar_info=" pristine-tar=$pristine_tar_commit"
> +fi

Don't we want to do this only if the current upstream version actually
has any informaation in pristine-tar ?  If not, then the user didn't
import the tarball, presumably ?

What happens if another maintainer did this new upstream version, and
our user hasn't got that up-to-date pristine-tar branch yet?  I think
ideally we'd use the orig from the archive, but there is a risk of
lossage if the uploads come too close together - see ##1109130.

>  # Useful sanity checks 

I don't know what those might be.  I guess we could leave them for now
and see what things go wrong in reality.  Ie, I don't think this is a
blocker.

>  [dgit distro=$distro split$quilt_mode_text]
> -[dgit please-upload source=$source version=$version$upstream_info]
> +[dgit please-upload source=$source 
> version=$version$upstream_info$pristine_tar_info]

We might want to linewrap this a bit more.  That line is getting quite
long.  I guess we could put all the upstream and pristine-tar info in
a separate line maybe?

>  git_tag_main_opts_args=(-m "$tagmessage" "$debian_tag" "$branch_commit")
> diff --git a/infra/dgit-repos-server b/infra/dgit-repos-server
> index f6a3716c..e797123b 100755
> --- a/infra/dgit-repos-server
> +++ b/infra/dgit-repos-server

This part looks correct.  Except:

> + "pristinetar=$t2u_pristinetar",

I think this option should be called "pristine_tar".

> -x git deborig "$s_u"
> +if [ -n "$s_pristinetar" ]; then
> +old_head=$(git rev-parse --verify HEAD)
> +git reset --hard -- "$s_pristinetar"

I don't think this is right, is it ?  We want to rewind the local
pristine-tar branch to that commit, not the HEAD.  I'm assuming
pristine-tar implicitly uses refs/heads/pristine-tar rather than HEAD.

We should definitely have a test case simulating an attack where the
salsa pristine-tar branch has got strange stuff in it.

> +# Only get the first tarball file matching the current package version.
> +# pristine-tar does not disallow having two tarballs for the same package
> +# version using different compression algorithms, so this will just pick 
> a
> +# random one. It might make sense to error out instead in such cases.

We definitely don't

Bug#1106071: [RFC PATCH dgit v1] tag2upload: add pristine-tar support

2025-07-22 Thread Andrea Pappacoda

On Mon Jul 21, 2025 at 7:32 PM CEST, Andrea Pappacoda wrote:

diff --git a/tag2upload-obtain-origs b/tag2upload-obtain-origs
index 016fa655..0453c2de 100755
--- a/tag2upload-obtain-origs
+++ b/tag2upload-obtain-origs
@@ -16,6 +16,7 @@
 # optional settings:
 #
 # bpd  defaults to ../bpd
+# pristinetar=PRISTINE-TAR-COMMITID
 
 set -eu -o pipefail

 shopt -s inherit_errexit # #514862, wtf
@@ -96,7 +97,27 @@ case "$rc" in
;;
 esac
 
-x git deborig "$s_u"

+if [ -n "$s_pristinetar" ]; then
+old_head=$(git rev-parse --verify HEAD)
+git reset --hard -- "$s_pristinetar"
+
+# Only get the first tarball file matching the current package version.
+# pristine-tar does not disallow having two tarballs for the same package
+# version using different compression algorithms, so this will just pick a
+# random one. It might make sense to error out instead in such cases.
+tarball=$(git ls-tree --name-only | while IFS= read -r file; do
+case "$file" in
+"${s_p}_${fversion}.orig.*.id") printf '%s\n' "${file%.id}"
+break ;;
+esac
+done)
+
+x pristine-tar checkout "$tarball"


Here there should probably also be a `mv "$tarball" ..`, as pristine-tar 
checkout generates the tarball in the current working directory.



+git reset --hard -- "$old_head"
+else
+x git deborig "$s_u"
+fi
+
 
 report 'created orig'
 




Bug#1106071: [RFC PATCH dgit v1] tag2upload: add pristine-tar support

2025-07-21 Thread Andrea Pappacoda

I noticed I made some typos in my previous message. Sorry, I'm tired.

On Mon Jul 21, 2025 at 7:32 PM CEST, Andrea Pappacoda wrote:
I believe this is safe security-wise because the commit id represented 
the expected status of the pristine-tar branch on the developer's 
machine is signed at the time of the upload.


I meant "representing" instead of "represented".

If for some reason the branch gets in-between the upload, and the 
expected commit is lost, things will just fail instead of generating 
a "wrong" tarball.


I meant "the branch gets **modified** in-between the upload **and the 
t2u service build**".




Bug#1106071: [RFC PATCH dgit v1] tag2upload: add pristine-tar support

2025-07-21 Thread Andrea Pappacoda
Hi!

Here's an initial draft of the pristine-tar support. More than something 
intended to be merged in, this is just a way to get some feedback, 
trying to understand if I'm on the right track.

I did two things:

1. Added a new pristine-tar=commit_id metadata field to the signed tag 
   generated by git-debpush
2. If a pristine-tar commitid is present, the git repo is hard-reset to 
   it, then the tarball name is obtained from the tree, and
   `pristine-tar checkout` is invoked to generate the tarball.

I believe this is safe security-wise because the commit id represented 
the expected status of the pristine-tar branch on the developer's 
machine is signed at the time of the upload. If for some reason the 
branch gets in-between the upload, and the expected commit is lost, 
things will just fail instead of generating a "wrong" tarball. Do you 
see flaws in this reasoning?

Note: I did not test that my implementation is actually correct and 
working. I did not add any new test either. I believe there are no 
regressions in old tests but I'm testing this on a plane and didn't look 
too much into the log output.

Let me know that you think! Bye :)

---
 git-debpush | 12 +++-
 infra/dgit-repos-server |  5 -
 tag2upload-obtain-origs | 23 ++-
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/git-debpush b/git-debpush
index e3a4ba39..1782513e 100755
--- a/git-debpush
+++ b/git-debpush
@@ -457,6 +457,16 @@ if $upstream; then
 to_push+=("$upstream_tag")
 fi
 
+# TODO: pristine-tar
+# I obtain the commit ID at the time of the upload, so that I can be sure that
+# the tag2upload service generates the tarball with the expected pristine-tar
+# branch state
+pristine_tar_info=''
+if pristine_tar_commit=$(git rev-parse --verify --quiet 
'refs/heads/pristine-tar'); then
+pristine_tar_info=" pristine-tar=$pristine_tar_commit"
+fi
+
+
 # Useful sanity checks 
 
 # UNRELEASED suite
@@ -837,7 +847,7 @@ fi
 tagmessage="$source release $version for $target
 
 [dgit distro=$distro split$quilt_mode_text]
-[dgit please-upload source=$source version=$version$upstream_info]
+[dgit please-upload source=$source 
version=$version$upstream_info$pristine_tar_info]
 "
 
 git_tag_main_opts_args=(-m "$tagmessage" "$debian_tag" "$branch_commit")
diff --git a/infra/dgit-repos-server b/infra/dgit-repos-server
index f6a3716c..e797123b 100755
--- a/infra/dgit-repos-server
+++ b/infra/dgit-repos-server
@@ -1304,7 +1304,7 @@ our ($t2u_email_noreply, $t2u_email_noreply_addr, 
$t2u_email_reply_to,
  @t2u_email_copies, $t2u_jid, $t2u_url, $t2u_putative_package);
 our ($t2u_tagger, $t2u_tagger_addr, $t2u_timeout);
 our ($t2u_signing_keyid);
-our ($t2u_upstreamc, $t2u_upstreamt, $t2u_quilt);
+our ($t2u_upstreamc, $t2u_upstreamt, $t2u_quilt, $t2u_pristinetar);
 
 sub t2u_dgit_cmd () {
 (
@@ -1840,6 +1840,8 @@ sub tag2upload_parsetag ($) {
$package = $1;
} elsif (s/^version=(\S+) //) {
$tagversion = $1;
+   } elsif (s/^pristine-tar=(\w+) //) {
+   $t2u_pristinetar = $1;
} else {
return 0;
}
@@ -2030,6 +2032,7 @@ END
 "v=$version",
 "s=$suite",
 "u=$t2u_upstreamc",
+   "pristinetar=$t2u_pristinetar",
 );
 flush EMAIL_REPORT or confess $!;
 open STDOUT, ">& EMAIL_REPORT" or confess $!;
diff --git a/tag2upload-obtain-origs b/tag2upload-obtain-origs
index 016fa655..0453c2de 100755
--- a/tag2upload-obtain-origs
+++ b/tag2upload-obtain-origs
@@ -16,6 +16,7 @@
 # optional settings:
 #
 # bpd  defaults to ../bpd
+# pristinetar=PRISTINE-TAR-COMMITID
 
 set -eu -o pipefail
 shopt -s inherit_errexit # #514862, wtf
@@ -96,7 +97,27 @@ case "$rc" in
;;
 esac
 
-x git deborig "$s_u"
+if [ -n "$s_pristinetar" ]; then
+old_head=$(git rev-parse --verify HEAD)
+git reset --hard -- "$s_pristinetar"
+
+# Only get the first tarball file matching the current package version.
+# pristine-tar does not disallow having two tarballs for the same package
+# version using different compression algorithms, so this will just pick a
+# random one. It might make sense to error out instead in such cases.
+tarball=$(git ls-tree --name-only | while IFS= read -r file; do
+case "$file" in
+"${s_p}_${fversion}.orig.*.id") printf '%s\n' "${file%.id}"
+break ;;
+esac
+done)
+
+x pristine-tar checkout "$tarball"
+git reset --hard -- "$old_head"
+else
+x git deborig "$s_u"
+fi
+
 
 report 'created orig'
 
-- 
2.47.2