Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-08-03 Thread Sean Whitton
Hello,

On Sun 03 Aug 2025 at 03:59pm +02, Andrea Pappacoda wrote:

> On Sun Aug 3, 2025 at 3:15 PM CEST, Sean Whitton wrote:
>> I thought that the .delta files were mostly to cover, for example, the
>> tarball containing autotools-generated files that aren't in git?
>> Isn't that a key use case?
>
> Mhh, I'd say no, maybe. Let me explain.
>
> The main undeniable advantage of pristine-tar is regenerating a tarball
> which is bit-by-bit identical to the upstream one, without having to
> keep the actual tarball around. This is useful for source
> reproducibility use cases (ignoring that Git is better for this anyway).
>
> I argue that containing autotools-generated files is not the main use
> case because in the usual git-buildpackage workflow you actually import
> the tarballs into git, so the Debian git tree has the autotools stuff as
> well.
>
> When one uses a mixed upstream git + tarballs gbp workflow, the tarball
> contents get applied as a new commit on top of the upstream git tag.
> So, even there, the contents of the tarball match the contents of the
> git tree pointed to by the upstream/latest branch (minus stuff like
> empty dirs).
>
> So, we can just say: if you want to use pristine-tar, make sure to
> commit its contents to the upstream/latest branch (gbp does this by
> default anyway).
>
> Note: here I use "upstream/latest" to refer to the branch containing the
> upstream code to be used for package builds. It could have a different
> name, of course, but that's what DEP14 recommends.

Thanks for confirming, Andrea -- we're on the same page.

-- 
Sean Whitton




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-08-03 Thread Andrea Pappacoda

On Sun Aug 3, 2025 at 3:31 PM CEST, Ian Jackson wrote:
> Sean Whitton writes ("Bug#1106071: [RFC PATCH dgit v2] tag2upload: add
> pristine-tar support"):
>> I thought that the .delta files were mostly to cover, for example, the
>> tarball containing autotools-generated files that aren't in git?
>> Isn't that a key use case?
>
> Not according to Colin in the "want Jia Tan option" bug,
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1109423#15
>
> Empty directories are a corner case but git will consider them
> treesame so if we do the check in git all will be well.

Yes, exactly. This is what I tried to explain in my previous message,
but Colin has done so way better :)




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-08-03 Thread Andrea Pappacoda

On Sun Aug 3, 2025 at 3:15 PM CEST, Sean Whitton wrote:
> I thought that the .delta files were mostly to cover, for example, the
> tarball containing autotools-generated files that aren't in git?
> Isn't that a key use case?

Mhh, I'd say no, maybe. Let me explain.

The main undeniable advantage of pristine-tar is regenerating a tarball
which is bit-by-bit identical to the upstream one, without having to
keep the actual tarball around. This is useful for source
reproducibility use cases (ignoring that Git is better for this anyway).

I argue that containing autotools-generated files is not the main use
case because in the usual git-buildpackage workflow you actually import
the tarballs into git, so the Debian git tree has the autotools stuff as
well.

When one uses a mixed upstream git + tarballs gbp workflow, the tarball
contents get applied as a new commit on top of the upstream git tag.
So, even there, the contents of the tarball match the contents of the
git tree pointed to by the upstream/latest branch (minus stuff like
empty dirs).

So, we can just say: if you want to use pristine-tar, make sure to
commit its contents to the upstream/latest branch (gbp does this by
default anyway).

Note: here I use "upstream/latest" to refer to the branch containing the
upstream code to be used for package builds. It could have a different
name, of course, but that's what DEP14 recommends.




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-08-03 Thread Sean Whitton
Hello,

On Sun 03 Aug 2025 at 02:31pm +01, Ian Jackson wrote:

> Sean Whitton writes ("Bug#1106071: [RFC PATCH dgit v2] tag2upload: add
> pristine-tar support"):
>> I thought that the .delta files were mostly to cover, for example, the
>> tarball containing autotools-generated files that aren't in git?
>> Isn't that a key use case?
>
> Not according to Colin in the "want Jia Tan option" bug,
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1109423#15
>
> Empty directories are a corner case but git will consider them
> treesame so if we do the check in git all will be well.

Ah, right, thanks.  This is confusing, huh?  Then indeed, let's not
support them.

-- 
Sean Whitton




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-08-03 Thread Ian Jackson
Sean Whitton writes ("Bug#1106071: [RFC PATCH dgit v2] tag2upload: add 
pristine-tar support"):
> I thought that the .delta files were mostly to cover, for example, the
> tarball containing autotools-generated files that aren't in git?
> Isn't that a key use case?

Not according to Colin in the "want Jia Tan option" bug,
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1109423#15

Empty directories are a corner case but git will consider them
treesame so if we do the check in git all will be well.

Ian.

-- 
Ian Jackson   These opinions are my own.

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.



Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-08-03 Thread Sean Whitton
Hello,

On Sun 03 Aug 2025 at 12:27pm +02, Andrea Pappacoda wrote:

>> I don't think I fully understand the implications.  My default
>> position is that the answer should be "no" unless one of us *does*
>> understand the implications :-).
>
> One different example which may illustrate the "unexpected" results
> which this could lead to is this one. Here, the tarball is created with
> a file containing "evil" content, while in the upstream/latest branch
> only the "good" content is stored. Upon tarball checkout, the good
> content gets replaced with the evil one:

I thought that the .delta files were mostly to cover, for example, the
tarball containing autotools-generated files that aren't in git?
Isn't that a key use case?

-- 
Sean Whitton




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-08-03 Thread Sean Whitton
Hello,

On Sat 02 Aug 2025 at 07:34pm +02, Andrea Pappacoda wrote:

> I'm not sure I get this part, but if you meant what I understood, then
> it's wrong. The .id file does not contain the hash of the tarball, it
> contains a single line which corresponds to the tree id, as mentioned
> above. I'm honestly not sure where the hash verification happens, but *I
> believe* it's part of the reconstruction when pristine-gz and co are run,
> thanks to information stored in the .delta (VCDIFF) file.

Okay, could you rewrite this part, then?

It might be a good time to open an MR adding the latest version of your
text to tag2upload(5).

> One question remains unanswered. Should we allow .delta files modifying
> the tarball contents (i.e., do we want to allow generating tarballs
> which have different contents than the git tree)?

ISTM that we should allow this as otherwise we would not be supporting
many pristine-tar users.

-- 
Sean Whitton




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-08-03 Thread Andrea Pappacoda

On Sat Aug 2, 2025 at 9:17 PM CEST, Ian Jackson wrote:
> I've come back from a party and am a bit tipsy so I will read this
> properly later, but:
>
> Thanks for engaging with these questions!

I'm replying to your email after a small party too, but at least I have
slept a couple of hours :)

> I think in principle it might be a .sig.

Maybe yes, but regardless of the input signature filename, pristine-tar
always stores the signature in Git with a name of orig.asc. Also,
doesn't dpkg-source look for .asc files only?

> So the .id contains the tree (git tree object) which uniquely
> identifies the *contents* of the tarball.

Yes, but see below.

> But how does the pristine-tar information specify the precise hash of
> the tarball itself?  Does the .delta file say what the output hash is
> supposed to be?

Yes, I've checked now and the .delta contains the expected SHA256 hash.

> I don't think I fully understand the implications.  My default
> position is that the answer should be "no" unless one of us *does*
> understand the implications :-).

One "innocuous" example which I don't see issues allowing is one where
the orig tarball contains empty dirs, which are not representable in
Git. As an example:


   $ tar -xvzf mypackage_1.0.orig.tar.gz
   mypackage/
   mypackage/file.txt
   mypackage/empty_dir/

   $ cd mypackage

   $ git init -b upstream/latest

   $ git add --all

   $ git commit -m init

   $ pristine-tar commit ../mypackage_1.0.orig.tar.gz upstream/latest

   $ git show pristine-tar:mypackage_1.0.orig.tar.gz.id | xargs git show
   tree 385d33e969fefd23b8efaca69c1d2db507ce0daf

   file.txt

   $ rm ../mypackage_1.0.orig.tar.gz

   $ pristine-tar --debug checkout mypackage_1.0.orig.tar.gz
   pristine-tar: set subdir to mypackage
   pristine-tar: subdir is mypackage
   pristine-tar: mypackage/empty_dir/ is listed in the manifest but may not be present in the source directory
   pristine-tar: creating missing mypackage/empty_dir/
   pristine-tar: doing full tree sweep to catch missing files
   pristine-tar: successfully generated mypackage_1.0.orig.tar.gz

   $ tar -tzf mypackage_1.0.orig.tar.gz
   mypackage/
   mypackage/file.txt
   mypackage/empty_dir/

One different example which may illustrate the "unexpected" results
which this could lead to is this one. Here, the tarball is created with 
a file containing "evil" content, while in the upstream/latest branch 
only the "good" content is stored. Upon tarball checkout, the good 
content gets replaced with the evil one:


   $ mkdir repo

   $ echo evil > repo/file.txt

   $ tar -czf repo_1.0.orig.tar.gz repo

   $ echo good > repo/file.txt

   $ cd repo

   $ git init -b upstream/latest

   $ git add --all

   $ git commit -m init

   $ pristine-tar commit ../repo_1.0.orig.tar.gz upstream/latest

   $ git show pristine-tar:repo_1.0.orig.tar.gz.id
   ca1cc63dd18610bc64a150397556d33e850a61e8

   $ git rev-parse --verify --end-of-options 'upstream/latest^{tree}'
   ca1cc63dd18610bc64a150397556d33e850a61e8

   $ git show ca1cc63dd18610bc64a150397556d33e850a61e8:file.txt
   good

   $ rm ../repo_1.0.orig.tar.gz

   $ pristine-tar checkout repo_1.0.orig.tar.gz

   $ tar -xvzf repo_1.0.orig.tar.gz
   repo/
   repo/file.txt

   $ cat repo/file.txt
   evil

Even though both the pristine-tar .id file and the upstream/latest 
branch point to the same tree id, the binary .delta contains 
modifications to file.txt which change the contents from "good" (stored 
in the git tree) to "evil" upon orig checkout.


Even though this example is artificial (the tarball contents are usually 
committed to version control after it has been downloaded, not before), 
it would still theoretically be possible for a malicious maintainer to 
sneak a backdoor in (like in the xz backdoor case, but with the extra 
step of also having a Debian maintainer collaborate). So I'm inclined to 
say "sorry, no, this is too dangerous".


It is also true that this is currently allowed in regular Salsa repos, 
so allowing this would not really make the situation worse.


The thing is: how do we disallow this? I'm not aware of any pristine-tar 
switch which makes it fail when such .delta file performing file content 
modifications exists. Do we have to perform our own checking *after* the 
tarball is checked out, by e.g. extracting it again on top of the 
upstream commit tree and making sure no differences exist? Hacky but may 
work.
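
As a rough sketch of the check suggested above, done in git so that empty
directories do not matter: unpack the regenerated tarball, build a git tree
from it using a throwaway index, and compare that tree id with the upstream
tree id. The helper name tarball_tree_id is hypothetical and is not part of
dgit, tag2upload or pristine-tar. The sketch assumes GNU tar, a tarball with
a single top-level directory, and default git settings (no line-ending
conversion). It is not hardened against hostile input.

```shell
#!/bin/sh
# Sketch only: tarball_tree_id is a hypothetical helper.
# usage: tarball_tree_id TARBALL   (run from inside the git repository)
# Prints the id of the git tree corresponding to the tarball's contents.
tarball_tree_id() {
    git_dir=$(git rev-parse --absolute-git-dir) || return 1
    tmp=$(mktemp -d) || return 1
    mkdir "$tmp/tree" || return 1
    # Assumption: one top-level directory in the tarball, which is stripped.
    tar -xf "$1" -C "$tmp/tree" --strip-components=1 || return 1
    # A throwaway index keeps the real index and working tree untouched.
    # --force stops ignore rules shipped in the tarball from hiding files.
    tree_id=$(
        cd "$tmp/tree" &&
        export GIT_DIR="$git_dir" GIT_WORK_TREE="$tmp/tree" \
               GIT_INDEX_FILE="$tmp/index" &&
        git add --force --all &&
        git write-tree
    )
    status=$?
    rm -rf "$tmp"
    [ "$status" -eq 0 ] && printf '%s\n' "$tree_id"
}
```

The printed id can then be compared with the output of
git rev-parse --verify --end-of-options 'upstream/latest^{tree}'. For the
"evil" example above the two ids would differ, because file.txt in the
tarball does not match the blob stored in the tree. For the empty-directory
example they would be equal, since git does not record empty directories.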


Let me know! Bye :)



Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-08-02 Thread Ian Jackson
I've come back from a party and am a bit tipsy so I will read this
properly later, but:

Thanks for engaging with these questions!

Andrea Pappacoda writes ("Bug#1106071: [RFC PATCH dgit v2] tag2upload: add 
pristine-tar support"):
> In practice, pristine-tar always stores the signature file as
> "orig_name.asc". So I think we could just specify this requirement here.

I think in principle it might be a .sig.

> >>   If an orig tarball needs to be (re)generated, the service will use 
> >>   pristine-tar, using precisely the metadata in the .id file.  The
> >>   service will check that the generated tarball MATCHES THE HASH IN 
> >>   THE .ID FILE and that its contained tree is identical to SOMETHING.
> 
> I'm not sure I get this part, but if you meant what I understood, then 
> it's wrong. The .id file does not contain the hash of the tarball, it 
> contains a single line which corresponds to the tree id, as mentioned 
> above. I'm honestly not sure where the hash verification happens, but *I
> believe* it's part of the reconstruction when pristine-gz and co are run,
> thanks to information stored in the .delta (VCDIFF) file.

So the .id contains the tree (git tree object) which uniquely
identifies the *contents* of the tarball.

But how does the pristine-tar information specify the precise hash of
the tarball itself?  Does the .delta file say what the output hash is
supposed to be ?

> One question remains unanswered. Should we allow .delta files modifying 
> the tarball contents (i.e., do we want to allow generating tarballs 
> which have different contents than the git tree)?

I don't think I fully understand the implications.  My default
position is that the answer should be "no" unless one of us *does*
understand the implications :-).

Regards,
Ian.

-- 
Ian Jackson   These opinions are my own.

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.



Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-08-02 Thread Andrea Pappacoda

Hi Sean!

On Sat Aug 2, 2025 at 12:20 PM CEST, Sean Whitton wrote:
> Have you had a chance to look at the following?

Sorry, I thought I had replied already. Thanks for the reminder :)

> On Mon 28 Jul 2025 at 08:19pm +01, Ian Jackson wrote:
>> Something like
>>
>>   Names a commit containing pristine-tar metadata.
>>
>>   The commit must contain SOMETHING LIKE exactly one .id file with
>>   SOME PROPERTIES OR OTHER.  The .id file MUST SATISFY SOME
>>   CONDITIONS THAT I DON'T UNDERSTAND.


The branch must contain exactly one .id file per upstream release. Its 
name should correspond to the name of the orig tarball, with the ".id" 
suffix. The file must be a regular file.
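
A minimal sketch of how the "regular file named after the tarball" rule could
be checked with git-ls-tree. The helper name has_regular_id_file is
hypothetical, and treating both 100644 and 100755 as "regular" is an
assumption of this sketch. Looking up an exact name in a git tree can match
at most one entry, so no separate uniqueness check is needed here.

```shell
# Sketch only: has_regular_id_file is a hypothetical helper.
# usage: has_regular_id_file TREEISH TARBALL_NAME
# Succeeds if the top level of TREEISH has TARBALL_NAME.id as a regular file.
has_regular_id_file() {
    # Output format: "<mode> <type> <object>\t<name>"; empty if no such entry.
    entry=$(git ls-tree "$1" -- "$2.id") || return 1
    case $entry in
        "100644 blob "*|"100755 blob "*) return 0 ;;
        *) return 1 ;;   # missing, symlink (120000), directory or submodule
    esac
}
```

For example, has_regular_id_file pristine-tar mypackage_1.0.orig.tar.gz
would succeed for the repository built in the empty-directory example.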


>>   The tag must also contain an C item, and the tree named
>>   in the .id file must be identical to that of the C
>>   commit.

Yes.

>>   The pristine-tar commit may contain SOMEHOW IDENTIFIABLE signature
>>   file.  The signature file MUST SATISFY REASONABLE CONDITIONS SUCH
>>   AS ITS FILENAME BEING SANE.

In practice, pristine-tar always stores the signature file as
"orig_name.asc". So I think we could just specify this requirement here.

>>   The signature file will then be published together with the orig
>>   tarball.  The signature file is treated as pure data by the service
>>   (so will not be verified or even format checked).

Yes.

>>   If an orig tarball needs to be (re)generated, the service will use
>>   pristine-tar, using precisely the metadata in the .id file.  The
>>   service will check that the generated tarball MATCHES THE HASH IN
>>   THE .ID FILE and that its contained tree is identical to SOMETHING.

I'm not sure I get this part, but if you meant what I understood, then
it's wrong. The .id file does not contain the hash of the tarball, it
contains a single line which corresponds to the tree id, as mentioned
above. I'm honestly not sure where the hash verification happens, but *I
believe* it's part of the reconstruction when pristine-gz and co are run,
thanks to information stored in the .delta (VCDIFF) file.

>>   The named pristine-tar commit must be reachable from the
>>   C branch in the repository.


Yes.
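
The reachability requirement can be sketched with git-merge-base. The helper
name commit_reachable_from is hypothetical, and the branch is passed as a
parameter because the draft text does not fix its name.

```shell
# Sketch only: commit_reachable_from is a hypothetical helper.
# usage: commit_reachable_from COMMIT BRANCH
# Succeeds if COMMIT is the tip of BRANCH or one of its ancestors.
commit_reachable_from() {
    git merge-base --is-ancestor "$1" "$2"
}
```

git merge-base --is-ancestor exits with status 0 when the first commit is an
ancestor of the second (a commit counts as its own ancestor) and with status
1 when it is not.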

One question remains unanswered. Should we allow .delta files modifying 
the tarball contents (i.e., do we want to allow generating tarballs 
which have different contents than the git tree)?




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-08-02 Thread Sean Whitton
Hi Andrea,

Have you had a chance to look at the following?

On Mon 28 Jul 2025 at 08:19pm +01, Ian Jackson wrote:

> Something like
>
>   Names a commit containing pristine-tar metadata.
>
>   The commit must contain SOMETHING LIKE exactly one .id file with
>   SOME PROPERTIES OR OTHER.  The .id file MUST SATISFY SOME
>   CONDITIONS THAT I DON'T UNDERSTAND.
>
>   The tag must also contain an C item, and the tree named in
>   the .id file must be identical to that of the C commit.
>
>   The pristine-tar commit may contain SOMEHOW IDENTIFIABLE signature
>   file.  The signature file MUST SATISFY REASONABLE CONDITIONS SUCH AS
>   ITS FILENAME BEING SANE.  The signature file will then be published
>   together with the orig tarball.  The signature file is treated as
>   pure data by the service (so will not be verified or even format
>   checked).
>
>   If an orig tarball needs to be (re)generated, the service will use
>   pristine-tar, using precisely the metadata in the .id file.  The
>   service will check that the generated tarball MATCHES THE HASH IN
>   THE .ID FILE and that its contained tree is identical to SOMETHING.
>
>   The named pristine-tar commit must be reachable from the
>   C branch in the repository.
>
> Ian.

-- 
Sean Whitton




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-07-29 Thread Sean Whitton
Hello,

On Mon 28 Jul 2025 at 09:21pm +02, Andrea Pappacoda wrote:

> It's likely that I didn't explain myself correctly. I meant the existing
> upstream= and upstream-tag= metadata fields which git-debpush already
> uses. The pristine-tar tool does not need those to generate a tarball,
> but I believe it's still useful to include them alongside the
> pristine-tar= metadata field to compare the pristine-tar tree to the
> tree of the git commit contained in the upstream= metadata field.

Thanks.  We don't want to depend on the pristine-tar field for anything
other than obtaining the orig.tar, so we would definitely want to keep
the upstream= and upstream-tag= fields no matter what.

>> Trying to read your patch, I think the fact I don't use pristine-tar is
>> really showing.  Is the .id file defined somewhere?  Is your knowledge
>> of the pristine-tar branch contents from reading a spec, or empirical?
>
> Kind of both. The pristine-tar(1) manpage says, under the `pristine-tar
> commit _tarball_ _upstream_` section:
>
>> The upstream parameter specifies the tag or branch [or commit, Ed.]
>> that contains the same content that is present in the tarball. The
>> name of the tree it points to will be recorded for later use by
>> pristine-tar checkout.
>
> So yes, pristine-tar specifies that it stores the tree id somewhere. It
> does not explicitly say where (well, not in the manpages), but it does
> store that tree id inside a file named as the input tarball with ".id"
> appended (as shown in its source code). This is not configurable, and
> pristine-tar also looks for such file when running `pristine-tar
> checkout`, so it cannot change really, otherwise new pristine-tar
> versions would be unable to extract old tarballs, which defeats the
> purpose of the tool.
>
> The ".delta" file is explicitly mentioned in the manpage, just below the
> paragraph I quoted before.

Thanks, I understand now.

-- 
Sean Whitton




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-07-28 Thread Andrea Pappacoda

On Mon Jul 28, 2025 at 8:44 PM CEST, Sean Whitton wrote:
> On Sun 27 Jul 2025 at 04:11pm +02, Andrea Pappacoda wrote:
>> I tried to add to the tag2upload.5 manpage the pristine-tar handling
>> design outlined in our discussions, which is inline below. Still,
>> I have a few questions:
>>
>> What should we do with that upstream commit metadata?
>
> Sorry, but which metadata is that?


It's likely that I didn't explain myself correctly. I meant the existing 
upstream= and upstream-tag= metadata fields which git-debpush already 
uses. The pristine-tar tool does not need those to generate a tarball, 
but I believe it's still useful to include them alongside the 
pristine-tar= metadata field to compare the pristine-tar tree to the 
tree of the git commit contained in the upstream= metadata field.


Hope it's clearer now! If not, here's some code which should express my 
intent less ambiguously than in English.


   pristine_tar_tree_id=$(git cat-file -- blob "${s_pristine_tar}:${tarball}.id")
   upstream_commit_tree_id=$(git rev-parse --verify --end-of-options "${s_u}^{tree}")
   if [ "$pristine_tar_tree_id" != "$upstream_commit_tree_id" ]; then
      fail 'pristine-tar tree id differs from the upstream commit one'
   fi

> Trying to read your patch, I think the fact I don't use pristine-tar is
> really showing.  Is the .id file defined somewhere?  Is your knowledge
> of the pristine-tar branch contents from reading a spec, or empirical?

Kind of both. The pristine-tar(1) manpage says, under the `pristine-tar
commit _tarball_ _upstream_` section:

> The upstream parameter specifies the tag or branch [or commit, Ed.]
> that contains the same content that is present in the tarball. The
> name of the tree it points to will be recorded for later use by
> pristine-tar checkout.


So yes, pristine-tar specifies that it stores the tree id somewhere. It 
does not explicitly say where (well, not in the manpages), but it does 
store that tree id inside a file named as the input tarball with ".id" 
appended (as shown in its source code). This is not configurable, and 
pristine-tar also looks for such file when running `pristine-tar 
checkout`, so it cannot change really, otherwise new pristine-tar 
versions would be unable to extract old tarballs, which defeats the 
purpose of the tool.


The ".delta" file is explicitly mentioned in the manpage, just below the 
paragraph I quoted before.



> Glad we have someone who knows it better working on it.


Thanks!



Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-07-28 Thread Ian Jackson
Andrea Pappacoda writes ("Re: Bug#1106071: [RFC PATCH dgit v2] tag2upload: add 
pristine-tar support"):
> I tried to add to the tag2upload.5 manpage the pristine-tar handling 
> design outlined in our discussions, which is inline below. Still, I have 
> a few questions:
> 
> What should we do with that upstream commit metadata? pristine-tar does 
> not need that, since it'll generate the tarball from the git tree id 
> stored in source_version.orig.tar.id. Still, we might want to make sure 
> that the pristine-tar tree corresponds to the one of the upstream commit 
> id. I don't know how useful this would be though, since the delta may 
> contain additional file additions and removals. Also, what should we do 
> with such tarballs whose contents are not identical to the git tree?
> 
> In the text below, I assume that:
> 
> - We want to verify equality of upstreamc's tree and the one used by 
>   pristine-tar.
> - We allow binary deltas (i.e., the .delta file) to contain 
>   modifications to files stored in the referenced tree, such as the 
>   addition of configure scripts.
> 
> Here it is:
> 
> =item C=COMMITID

I think we decided it should start with !.

I will add something to the spec about critical extensions starting
with !.

> Identifies the state of the pristine-tar branch at the time of push, if 
> present and containing data related to the current upstream version.

This is kind of in the wrong mood.  It reads like a description of
when git-debpush should include it.  It ought instead to be a
specification of what meaning of the item is.

Something like

  Names a commit containing pristine-tar metadata.

  The commit must contain SOMETHING LIKE exactly one .id file with
  SOME PROPERTIES OR OTHER.  The .id file MUST SATISFY SOME
  CONDITIONS THAT I DON'T UNDERSTAND.

  The tag must also contain an C item, and the tree named in
  the .id file must be identical to that of the C commit.

  The pristine-tar commit may contain SOMEHOW IDENTIFIABLE signature
  file.  The signature file MUST SATISFY REASONABLE CONDITIONS SUCH AS
  ITS FILENAME BEING SANE.  The signature file will then be published
  together with the orig tarball.  The signature file is treated as
  pure data by the service (so will not be verified or even format
  checked).

  If an orig tarball needs to be (re)generated, the service will use
  pristine-tar, using precisely the metadata in the .id file.  The
  service will check that the generated tarball MATCHES THE HASH IN
  THE .ID FILE and that its contained tree is identical to SOMETHING.

  The named pristine-tar commit must be reachable from the
  C branch in the repository.

Ian.

-- 
Ian Jackson   These opinions are my own.

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.



Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-07-28 Thread Sean Whitton
Hello,

On Mon 28 Jul 2025 at 08:58pm +02, Andrea Pappacoda wrote:

> Great :)
>
> I think that for the time being, publishing the signature without extra
> processing is the most appropriate solution. We always have time to
> revise it if needed.

ACK.

-- 
Sean Whitton




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-07-28 Thread Andrea Pappacoda

Hi,

On Mon Jul 28, 2025 at 8:41 PM CEST, Sean Whitton wrote:
> On Sat 26 Jul 2025 at 03:56pm +02, Andrea Pappacoda wrote:
>> There's really no reason why really, I just tried to put everything
>> pristine-tar related in the same place. Thinking about it, these
>> checks can only go before obtaining the pristine_tar_info, because
>> I cannot reasonably get the pristine-tar info before first making
>> sure there's just one orig.
>
> I would suggest it should go in the section marked "Gather git history
> information".

Will do!

> What I was thinking is that changing to use git(1) instead of
> pristine-tar(1) is a logically distinct change from changing from
> a check to embedding pristine-tar info in the tag.  So they should be
> separate commits anyway, and we'd want to run the full test suite
> against both of them.  While we are still discussing design you could
> get the first change out of the way with a MR now.

Put this way, it makes sense. Will send another patch soon.

> Do you mean whether dak does any verification?  I don't know.

Great :)

I think that for the time being, publishing the signature without extra
processing is the most appropriate solution. We always have time to
revise it if needed.




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-07-28 Thread Sean Whitton
Hello,

On Sun 27 Jul 2025 at 04:11pm +02, Andrea Pappacoda wrote:

> Hi again!
>
> I tried to add to the tag2upload.5 manpage the pristine-tar handling
> design outlined in our discussions, which is inline below. Still, I have
> a few questions:
>
> What should we do with that upstream commit metadata?

Sorry, but which metadata is that?

Trying to read your patch, I think the fact I don't use pristine-tar is
really showing.  Is the .id file defined somewhere?  Is your knowledge
of the pristine-tar branch contents from reading a spec, or empirical?

Glad we have someone who knows it better working on it.

-- 
Sean Whitton




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-07-28 Thread Sean Whitton
Hello,

On Sat 26 Jul 2025 at 03:56pm +02, Andrea Pappacoda wrote:

> This is meant as a way to handle the potential issue described by Ian in
> <[email protected]>. Unless I have
> misunderstood the issue, of course!
>
> If one does a -2 upload and the archive does not have the orig yet, and
> t2u has a reference to the pristine-tar branch, it can (safely?)
> re-create the tarball as it would be bit-by-bit identical to the already
> uploaded one. Does it make sense?

Right, I see, thank you.

> There's really no reason why really, I just tried to put everything
> pristine-tar related in the same place. Thinking about it, these checks
> can only go before obtaining the pristine_tar_info, because I cannot
> reasonably get the pristine-tar info before first making sure there's
> just one orig.

I would suggest it should go in the section marked "Gather git history
information".

>> I take it you switched from invoking pristine-tar itself to calling
>> git-ls-tree in order to use NUL termination?  If so, maybe we should
>> make that change first to the existing check.  Perhaps you could
>> prepare an MR to that effect.
>
> Kind of. I first wrote the checks in tag2upload-obtain-origs using plain
> shell and git, and then simply copied them back here. This was before
> looking at the existing pristine-tar check. But yes, pristine-tar does
> not use nul termination.
>
> You mean I should write a separate patch for the check and submit it
> independently from this patch? I'd like to finish this patch in
> a reasonable time, so maybe it doesn't make sense to fixup a local check
> which is going to be removed soon anyway?

What I was thinking is that changing to use git(1) instead of
pristine-tar(1) is a logically distinct change from changing from a
check to embedding pristine-tar info in the tag.  So they should be
separate commits anyway, and we'd want to run the full test suite
against both of them.  While we are still discussing design you could
get the first change out of the way with a MR now.

> Thanks! Also, the code mixes tabs and spaces for indentation; looking at
> the diff here made me remember that. Not my fault!

Yeah, this is our inconsistent use of Emacs, sorry about that.
Just don't worry about it.

> Yes, that'd be the correct thing to do. What I wasn't sure about is
> whether t2u should just checkout the signature and upload it to the
> archive, verify it as well, or not do anything at all. In other words:
> do we have to do verification here, or does it happen after sending
> everything to the archive with dput? In that case, would it make sense
> to duplicate the verification?

Do you mean whether dak does any verification?  I don't know.

-- 
Sean Whitton




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-07-27 Thread Andrea Pappacoda

Hi again!

I tried to add to the tag2upload.5 manpage the pristine-tar handling 
design outlined in our discussions, which is inline below. Still, I have 
a few questions:


What should we do with that upstream commit metadata? pristine-tar does 
not need that, since it'll generate the tarball from the git tree id 
stored in source_version.orig.tar.id. Still, we might want to make sure 
that the pristine-tar tree corresponds to the one of the upstream commit 
id. I don't know how useful this would be though, since the delta may 
contain additional file additions and removals. Also, what should we do 
with such tarballs whose contents are not identical to the git tree?


In the text below, I assume that:

- We want to verify equality of upstreamc's tree and the one used by 
 pristine-tar.
- We allow binary deltas (i.e., the .delta file) to contain 
 modifications to files stored in the referenced tree, such as the 
 addition of configure scripts.


Here it is:

=item C<pristine-tar>=COMMITID

Identifies the state of the pristine-tar branch at the time of push, if 
present and containing data related to the current upstream version.


If this metadata item is present, the C<upstream> and C<upstream-tag>
items must be present too. The tag2upload service will ensure that the
tree contained in the .id file of the pristine-tar branch corresponds
to the tree referenced by the commit id contained in the
C<upstream> metadata item.


If the pristine-tar branch contains a signature file, this will be 
published together with the orig tarball, and no signature verification 
will be performed.
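
A minimal sketch of the tree check described in the draft text, assuming the
.id file holds the id of the tree pristine-tar builds from, as stated earlier
in this message. The function name check_pristine_tar_tree and its argument
order are hypothetical and not part of the patch; only git cat-file and
git rev-parse are real commands.

```shell
# Hypothetical helper, not taken from the patch.
#   $1: commit id of the pristine-tar branch (from the tag metadata)
#   $2: name of the .id file on the pristine-tar branch
#   $3: upstream commit id (from the tag metadata)
# Returns 0 if both refer to the same tree, 1 if they differ,
# 2 if either side cannot be resolved.
check_pristine_tar_tree() {
    local recorded upstream_tree
    # The .id file content is the id pristine-tar builds the tarball from.
    recorded=$(git cat-file blob "$1:$2") || return 2
    # Peel both ids to trees so that a commit id and a tree id compare equal.
    recorded=$(git rev-parse --verify --quiet "$recorded^{tree}") || return 2
    upstream_tree=$(git rev-parse --verify --quiet "$3^{tree}") || return 2
    [ "$recorded" = "$upstream_tree" ]
}
```

This only compares the base trees. Under the second assumption listed in this
message, the .delta may still add files such as configure scripts, so the
generated tarball need not match the tree exactly.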




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-07-26 Thread Ian Jackson
Sean Whitton writes ("Bug#1106071: [RFC PATCH dgit v2] tag2upload: add 
pristine-tar support"):
> Thanks.  I've included some inline comments below.
> 
> I think it would be helpful to work on the spec in tag2upload(5) before
> continuing too much with code.  It'll make it easier to keep the three
> of us on the same page.

Yes, I very much agree.  Spec work on the protocol ought to precede
the code.

> On Sat 26 Jul 2025 at 02:12pm +02, Andrea Pappacoda wrote:
> > - Differently from the old pristine-tar check, the code is not run just
> >   for the first (i.e., -1 or -0.1) revision, but for any upload. This
> >   way, the t2u service can potentially handle the case where
> >   a pristine-tar upload was intended, but no orig is available in the
> >   archive yet. Please let me know if this makes sense or not!
> 
> It might make sense, I'm not sure yet.  Can you describe a concrete
> example that would lead to this being helpful?

I'm pretty sure Andrea has this right.

If we have pristine-tar support we should engage it whenever we are
doing upstream handling (ie for non-native source formats) and this
ought not to depend on the version number.

The -1 etc. thing is just a guess really.  That's fine for a check,
but it's not fine for functional code.

Of course the pristine-tar codepath is entitled to decide that
pristine-tar ought not to be applied to this upload, using its
pristine-tar-specific knowledge.
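
To make the difference concrete, here is a small sketch contrasting the
revision-based guess from the old check with plain upstream-version handling.
The function names are hypothetical; the case patterns and the
"${version%-*}" expansion are the ones that appear in git-debpush.

```shell
# The old check's guess: a "-1" or "-0.1" revision means this is the
# first upload of the upstream version.
looks_like_first_revision() {
    case "$1" in
        *-1|*-0.1) return 0 ;;
        *) return 1 ;;
    esac
}

# Strip the Debian revision, i.e. everything from the last hyphen on.
upstream_version() {
    printf '%s\n' "${1%-*}"
}
```

With these, a 1.2-2 upload whose orig never reached the archive is skipped by
looks_like_first_revision, while upstream_version yields 1.2 for both 1.2-1
and 1.2-2, so version-independent code can still find the matching
pristine-tar data.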

> > +# TODO: what about signature files?
> 
> Do you think we could extract them and include them in the upload?

Does pristine-tar convey signature files?  If so we should definitely
support them.

> I think we can verify them by using the upstream key embedded in the
> source package, right?  And if that verification fails we should
> probably abort the upload -- maintainers who choose to use tarball
> signatures had better make sure they verify.

I don't think I agree that we ought to be doing signature verification
that isn't related to our functioning.

In particular, this means that if we were to accept an upload with no
signature file, we should accept an upload with a signature file we
can't verify for whatever reason.   Since we are obviously not relying
on the signature if we don't mind if it's totally absent.

I know that this is not standard in Debian tooling but I find the
approach of Debian's tooling contrary to reasonable cryptographic
protocol design.

I guess a signature you can't verify might be a failed check, but,
really, is a maintainer *actually* going to get as far as git-debpush
without having discovered the signature doesn't verify in their local
environment?  They've probably *run* the upstream code by then.

Or to put it another way, a signature that won't verify probably means
that the upload was previously done by another maintainer or on
another system where the right public key *was* available, but it's
not available here and now.  It doesn't seem to me that it is likely
to mean "this is an attack and we should stop" or anything like that.


Questions like the ones we discuss above are examples of reasons why
it is a good idea to nail down the spec before writing code.  Deciding
on correct behaviour in advance saves rework (and rework is extra
effort and often leads to additional confusion and additional bugs).

Ian.

-- 
Ian Jackson   These opinions are my own.

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.



Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-07-26 Thread Andrea Pappacoda

Hi Sean,

On Sat Jul 26, 2025 at 3:31 PM CEST, Sean Whitton wrote:
> I think it would be helpful to work on the spec in tag2upload(5)
> before continuing too much with code.  It'll make it easier to keep
> the three of us on the same page.


Ok :)
Documentation work has arrived earlier than I had anticipated...


>> - The pristine-tar checking code is now only run if this is a non-native
>>   package (i.e., "if $upstream").
>> - The upstream version is used instead of the Debian-revised one.
>
> ITYM your new code, right?
> The old check already had both these properties.


Yes. I copied them from the old check :)


>> - [..] the t2u service can potentially handle the case where
>>   a pristine-tar upload was intended, but no orig is available in the
>>   archive yet. Please let me know if this makes sense or not!
>
> It might make sense, I'm not sure yet.  Can you describe a concrete
> example that would lead to this being helpful?


This is meant as a way to handle the potential issue described by Ian in 
<[email protected]>. Unless I have 
misunderstood the issue, of course!


If one does a -2 upload and the archive does not have the orig yet, and 
t2u has a reference to the pristine-tar branch, it can (safely?) 
re-create the tarball as it would be bit-by-bit identical to the already 
uploaded one. Does it make sense?



>> diff --git a/git-debpush b/git-debpush
>> index e3a4ba39..78e42fb9 100755
>> --- a/git-debpush
>> +++ b/git-debpush
>> @@ -457,6 +457,30 @@ if $upstream; then
>>  to_push+=("$upstream_tag")
>>  fi
>>
>> +# I obtain the commit ID at the time of the upload, so that I can be sure that
>> +# the tag2upload service generates the tarball with the expected pristine-tar
>> +# branch state
>> +pristine_tar_info=''
>> +if $upstream; then
>> +    uversion="${version%-*}"
>> +
>> +    if pristine_tar_commit=$(git rev-parse --verify --quiet 'refs/heads/pristine-tar'); then
>> +        pristine_tar_tarballs=$(git ls-tree -z --name-only -- 'refs/heads/pristine-tar' \
>> +            | grep -zF -- "${source}_${uversion}.orig.tar." \
>> +            | grep -zc -- "\.id$")
>> +
>> +        if [ "$pristine_tar_tarballs" -gt 1 ]; then
>> +            fail 'more than one pristine-tar orig'
>> +        fi
>> +
>> +        # If there's no tarball, the user probably stopped using pristine-tar a
>> +        # while ago, but didn't delete the branch. Just ignore it.
>> +        if [ "$pristine_tar_tarballs" -eq 1 ]; then
>> +            pristine_tar_info=" pristine-tar=$pristine_tar_commit"
>> +        fi
>> +    fi
>> +fi
>> +
>>  # Useful sanity checks
>
> Can you explain why you've put this in at this point in the script?  I
> think that maybe it should go later, after all the sanity checks.


No, I cannot explain that :)

There's no real reason; I just tried to put everything 
pristine-tar related in the same place. Thinking about it, these checks 
can only go before obtaining the pristine_tar_info, because I cannot 
reasonably get the pristine-tar info before first making sure there's 
just one orig.


> I take it you switched from invoking pristine-tar itself to calling
> git-ls-tree in order to use NUL termination?  If so, maybe we should
> make that change first to the existing check.  Perhaps you could
> prepare an MR to that effect.


Kind of. I first wrote the checks in tag2upload-obtain-origs using plain 
shell and git, and then simply copied them back here. This was before 
looking at the existing pristine-tar check. But yes, pristine-tar does 
not use nul termination.
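
For illustration, the NUL-terminated counting used in the patch can be tried
without a repository. This sketch assumes GNU grep (for -z);
count_orig_ids is a hypothetical name, and printf stands in for
"git ls-tree -z --name-only".

```shell
# Count the pristine-tar .id files for one upstream version.
#   stdin: NUL-terminated file names (as from "git ls-tree -z --name-only")
#   $1:    "<source>_<upstream version>"
count_orig_ids() {
    grep -zF -- "$1.orig.tar." | grep -zc -- '\.id$'
}

# Example input: an .id/.delta pair for 1.0 and another .id for 1.1.
printf '%s\0' foo_1.0.orig.tar.gz.id foo_1.0.orig.tar.gz.delta \
    foo_1.1.orig.tar.gz.id | count_orig_ids foo_1.0    # prints 1
```

One thing to keep in mind with this shape: grep -c still exits non-zero when
the count is 0, which may matter if the calling script runs under "set -e".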


You mean I should write a separate patch for the check and submit it 
independently from this patch? I'd like to finish this patch in 
a reasonable time, so maybe it doesn't make sense to fixup a local check 
which is going to be removed soon anyway?



>> @@ -2031,6 +2033,9 @@ END
>>  "s=$suite",
>>  "u=$t2u_upstreamc",
>>  );
>> +if (length $t2u_pristine_tar) {
>> +   push(@obtain_origs, "pristine_tar=$t2u_pristine_tar")
>> +}
>
> Generally we avoid parentheses on builtin operators and use poetry
> style, so
>
> push @obtain_origs, "pristine_tar=$t2u_pristine_tar"
>   if $t2u_pristine_tar;


Thanks! Also, the code mixes tabs and spaces for indentation; looking at 
the diff here made me remember that. Not my fault!



>> +# TODO: what about signature files?
>
> Do you think we could extract them and include them in the upload?
> I think we can verify them by using the upstream key embedded in the
> source package, right?  And if that verification fails we should
> probably abort the upload -- maintainers who choose to use tarball
> signatures had better make sure they verify.


Yes, that'd be the correct thing to do. What I wasn't sure about is 
whether t2u should just checkout the signature and upload it to the 
archive, verify it as well, or not do anything at all. In other words: 
do we have to do verification here, or does it happen after sending 
everything to the archive with dput? In that case, would it make sense 
to duplicate the verification?


Thanks for the review, Sean!



Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-07-26 Thread Sean Whitton
Hello,

Thanks.  I've included some inline comments below.

I think it would be helpful to work on the spec in tag2upload(5) before
continuing too much with code.  It'll make it easier to keep the three
of us on the same page.

On Sat 26 Jul 2025 at 02:12pm +02, Andrea Pappacoda wrote:

> ---
> Ok! Round two.
>
> Here's a summary of the changes from v1. For git-debpush:
>
> - The pristine-tar checking code is now only run if this is a non-native
>   package (i.e., "if $upstream").
> - The upstream version is used instead of the Debian-revised one.

ITYM your new code, right?
The old check already had both these properties.

> - Differently from the old pristine-tar check, the code is not run just
>   for the first (i.e., -1 or -0.1) revision, but for any upload. This
>   way, the t2u service can potentially handle the case where
>   a pristine-tar upload was intended, but no orig is available in the
>   archive yet. Please let me know if this makes sense or not!

It might make sense, I'm not sure yet.  Can you describe a concrete
example that would lead to this being helpful?

> diff --git a/git-debpush b/git-debpush
> index e3a4ba39..78e42fb9 100755
> --- a/git-debpush
> +++ b/git-debpush
> @@ -457,6 +457,30 @@ if $upstream; then
>  to_push+=("$upstream_tag")
>  fi
>
> +# I obtain the commit ID at the time of the upload, so that I can be sure that
> +# the tag2upload service generates the tarball with the expected pristine-tar
> +# branch state
> +pristine_tar_info=''
> +if $upstream; then
> +    uversion="${version%-*}"
> +
> +    if pristine_tar_commit=$(git rev-parse --verify --quiet 'refs/heads/pristine-tar'); then
> +        pristine_tar_tarballs=$(git ls-tree -z --name-only -- 'refs/heads/pristine-tar' \
> +            | grep -zF -- "${source}_${uversion}.orig.tar." \
> +            | grep -zc -- "\.id$")
> +
> +        if [ "$pristine_tar_tarballs" -gt 1 ]; then
> +            fail 'more than one pristine-tar orig'
> +        fi
> +
> +        # If there's no tarball, the user probably stopped using pristine-tar a
> +        # while ago, but didn't delete the branch. Just ignore it.
> +        if [ "$pristine_tar_tarballs" -eq 1 ]; then
> +            pristine_tar_info=" pristine-tar=$pristine_tar_commit"
> +        fi
> +    fi
> +fi
> +
>  # Useful sanity checks 

Can you explain why you've put this in at this point in the script?  I
think that maybe it should go later, after all the sanity checks.

I take it you switched from invoking pristine-tar itself to calling
git-ls-tree in order to use NUL termination?  If so, maybe we should
make that change first to the existing check.  Perhaps you could prepare
an MR to that effect.

> @@ -2031,6 +2033,9 @@ END
>  "s=$suite",
>  "u=$t2u_upstreamc",
>  );
> +if (length $t2u_pristine_tar) {
> + push(@obtain_origs, "pristine_tar=$t2u_pristine_tar")
> +}

Generally we avoid parentheses on builtin operators and use poetry
style, so

push @obtain_origs, "pristine_tar=$t2u_pristine_tar"
  if $t2u_pristine_tar;

> +# TODO: what about signature files?

Do you think we could extract them and include them in the upload?
I think we can verify them by using the upstream key embedded in the
source package, right?  And if that verification fails we should
probably abort the upload -- maintainers who choose to use tarball
signatures had better make sure they verify.

-- 
Sean Whitton




Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support

2025-07-26 Thread Andrea Pappacoda
---
Ok! Round two.

Here's a summary of the changes from v1. For git-debpush:

- The pristine-tar checking code is now only run if this is a non-native 
  package (i.e., "if $upstream").
- The upstream version is used instead of the Debian-revised one.
- Differently from the old pristine-tar check, the code is not run just 
  for the first (i.e., -1 or -0.1) revision, but for any upload. This 
  way, the t2u service can potentially handle the case where 
  a pristine-tar upload was intended, but no orig is available in the 
  archive yet. Please let me know if this makes sense or not!

For tag2upload-obtain-origs:

- Code is a bit more carefully written (using NUL-terminated command 
  output when possible). This also applies to the git-debpush script.
- The pristinetar option has been renamed to pristine_tar, and the 
  option keys glob has been changed to accept underscores instead of 
  dashes.
- The process will fail if there is more than one pristine-tar orig.
- It is now checked that pristine-tar metadata is a regular file 
  (according to git ls-tree)
- git update-ref is now used to rewind the pristine-tar branch, instead 
  of git reset --hard.

I did not add the "critical extension" stuff yet. Also, what should we 
do about the signature files which pristine-tar can optionally store and 
retrieve?

 git-debpush | 41 ++---
 infra/dgit-repos-server |  7 ++-
 tag2upload-obtain-origs | 38 --
 3 files changed, 68 insertions(+), 18 deletions(-)

diff --git a/git-debpush b/git-debpush
index e3a4ba39..78e42fb9 100755
--- a/git-debpush
+++ b/git-debpush
@@ -457,6 +457,30 @@ if $upstream; then
 to_push+=("$upstream_tag")
 fi
 
+# I obtain the commit ID at the time of the upload, so that I can be sure that
+# the tag2upload service generates the tarball with the expected pristine-tar
+# branch state
+pristine_tar_info=''
+if $upstream; then
+    uversion="${version%-*}"
+
+    if pristine_tar_commit=$(git rev-parse --verify --quiet 'refs/heads/pristine-tar'); then
+        pristine_tar_tarballs=$(git ls-tree -z --name-only -- 'refs/heads/pristine-tar' \
+            | grep -zF -- "${source}_${uversion}.orig.tar." \
+            | grep -zc -- "\.id$")
+
+        if [ "$pristine_tar_tarballs" -gt 1 ]; then
+            fail 'more than one pristine-tar orig'
+        fi
+
+        # If there's no tarball, the user probably stopped using pristine-tar a
+        # while ago, but didn't delete the branch. Just ignore it.
+        if [ "$pristine_tar_tarballs" -eq 1 ]; then
+            pristine_tar_info=" pristine-tar=$pristine_tar_commit"
+        fi
+    fi
+fi
+
+
 # Useful sanity checks 
 
 # UNRELEASED suite
@@ -522,20 +546,6 @@ case "$branch" in
 fi
 esac
 
-# Intent to use pristine-tar for this upload
-
-case "$version" in
-*"-1"|*"-0.1")
-   uversion="${version%-*}"
-   if $upstream && type pristine-tar >/dev/null 2>/dev/null \
-   && pristine-tar list \
-   | grep -q "^${source}_${uversion}"'\.orig\.tar\.'
-   then
-   fail_check pristine-tar \
- "pristine-tar data present for $uversion, but this will be ignored (#1106071)"
-   fi
-esac
-
 # Submodules
 
 # Per gitmodules(7) "FORMS", .gitmodules is always present at the
@@ -837,7 +847,8 @@ fi
 tagmessage="$source release $version for $target
 
 [dgit distro=$distro split$quilt_mode_text]
-[dgit please-upload source=$source version=$version$upstream_info]
+[dgit please-upload source=$source version=$version]
+${upstream_info:+[dgit $upstream_info$pristine_tar_info]}
 "
 
 git_tag_main_opts_args=(-m "$tagmessage" "$debian_tag" "$branch_commit")
diff --git a/infra/dgit-repos-server b/infra/dgit-repos-server
index f6a3716c..96058d56 100755
--- a/infra/dgit-repos-server
+++ b/infra/dgit-repos-server
@@ -1304,7 +1304,7 @@ our ($t2u_email_noreply, $t2u_email_noreply_addr, $t2u_email_reply_to,
  @t2u_email_copies, $t2u_jid, $t2u_url, $t2u_putative_package);
 our ($t2u_tagger, $t2u_tagger_addr, $t2u_timeout);
 our ($t2u_signing_keyid);
-our ($t2u_upstreamc, $t2u_upstreamt, $t2u_quilt);
+our ($t2u_upstreamc, $t2u_upstreamt, $t2u_quilt, $t2u_pristine_tar);
 
 sub t2u_dgit_cmd () {
 (
@@ -1840,6 +1840,8 @@ sub tag2upload_parsetag ($) {
$package = $1;
} elsif (s/^version=(\S+) //) {
$tagversion = $1;
+   } elsif (s/^pristine-tar=(\w+) //) {
+   $t2u_pristine_tar = $1;
} else {
return 0;
}
@@ -2031,6 +2033,9 @@ END
 "s=$suite",
 "u=$t2u_upstreamc",
 );
+if (length $t2u_pristine_tar) {
+   push(@obtain_origs, "pristine_tar=$t2u_pristine_tar")
+}
 flush EMAIL_REPORT or confess $!;
 open STDOUT, ">& EMAIL_REPORT" or confess $!;
 t2u_b_run_fetch_cmd_errok 'work', @obtain_origs;
diff --git a/tag2upload-obtain-origs b/tag2upload-obtain-origs
index 016fa655..73a23bea 100755
--- a/