Re: [RFC] Proposal for new source format

2019-11-08 Thread Ian Jackson
gregor herrmann writes ("Re: [RFC] Proposal for new source format"):
> On Thu, 31 Oct 2019 11:59:07 +, Ian Jackson wrote:
> >  * tag2upload service, or some related service:
> >     - determines that the maintainer is using a dgit-compatible git
> >       workflow, by looking at the tags, and looks at some in-dsc
> >       metadata to find the maintainer's repo
> >     - determines that the maintainer is using salsa or launchpad,
> >       converts the NMU to the maintainer branch's format, and
> >       submits an MR
> 
> … after checking that the current state of the git repo corresponds
> to the current version in the archive (which is often not the case in
> my experience with NMUs) …

Right.  This is one of the difficulties with the ad-hoc maintainer
branches you find on salsa.  And it is one of the ways that dgit
helps.

An NMU done with `dgit push-source', and starting from `dgit clone',
always gets this right.  But if the maintainer didn't use dgit to
upload, `dgit clone' produces useless history, so the NMUer has to
cope with lack of history; or if the NMUer wants history they can dig
around in salsa.  But if the NMUer fetches a branch from salsa it is
not so easy to make sure that they start from the right git commit
(and not possible to make a tool which gets it right every time).

My enhanced scheme as I propose above could also get all this right.
The NMUer would use `dgit clone' and they would see proper history
because the maintainer had used tag2upload.  So the NMUer's
canonical-view git branch starts at the last upload, as it should.

The tags specify where all the relevant versions are.  So the
converter can make a maintainer view branch starting at the
maintainer's last upload.  That's what would become the maintainer
view MR.  (It could be rebased if desired.)

Ian.



Re: [RFC] Proposal for new source format

2019-10-31 Thread gregor herrmann
On Thu, 31 Oct 2019 11:59:07 +, Ian Jackson wrote:

>  * tag2upload service, or some related service:
>     - determines that the maintainer is using a dgit-compatible git
>       workflow, by looking at the tags, and looks at some in-dsc
>       metadata to find the maintainer's repo
>     - determines that the maintainer is using salsa or launchpad,
>       converts the NMU to the maintainer branch's format, and
>       submits an MR

… after checking that the current state of the git repo corresponds
to the current version in the archive (which is often not the case in
my experience with NMUs) …


Cheers,
gregor

-- 
 .''`.  https://info.comodo.priv.at -- Debian Developer https://www.debian.org
 : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D  85FA BB3A 6801 8649 AA06
 `. `'  Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
   `-   NP: Leonard Cohen: You Know Who I Am



Re: [RFC] Proposal for new source format

2019-10-31 Thread Sean Whitton
Hello,

On Thu 31 Oct 2019 at 11:59AM +00, Ian Jackson wrote:

> Well, that's fair enough as far as it goes.  But I think we could do
> better.
>
> It would be possible to imagine some service that works like this:
> [...]

This would be cool!

-- 
Sean Whitton



Re: [RFC] Proposal for new source format

2019-10-31 Thread Ian Jackson
Sean Whitton writes ("Re: [RFC] Proposal for new source format"):
> Sorry, I didn't phrase my suggestion carefully.  I was assuming that we
> will continue to expect maintainers to accept patches in the BTS, but
> that if they *prefer* something else, they could document that in
> README.source.
> 
> Someone making a large number of changes could just choose to submit
> them all as patches to the BTS, due to the high cost of checking
> README.source -- I'm sure maintainers would understand this.

Well, that's fair enough as far as it goes.  But I think we could do
better.

It would be possible to imagine some service that works like this:

 * NMUer does dgit clone, makes changes, does tag2upload with some
   option (ie a parseable note in the git tag) to say "automatically
   do the NMU things"

 * tag2upload service, or some related service:
    - determines that the maintainer is using a dgit-compatible git
      workflow, by looking at the tags, and looks at some in-dsc
      metadata to find the maintainer's repo
    - determines that the maintainer is using salsa or launchpad,
      converts the NMU to the maintainer branch's format, and
      submits an MR
    - files a Debian bug referencing the MR
    - if the preconditions are not satisfied, sends a traditional
      debdiff by email to the BTS instead

 * Maintainer looks at the MR.  (If discussion is needed they do it in
   the bug [1].)  Maintainer merges the branch to master.

 * git hosting service autocloses the MR.  Metadata gateway service
   marks the Debian bug pending.

I think this would give us most of the best of the various possible
worlds.  (You could also do most of this without the actual upload of
course.  Such a canonical-view-to-maintainer-MR gateway would already
be possible, and would work when the maintainer used dgit.)
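
A minimal sketch of the routing decision in the service step above: if the
preconditions hold, convert and submit an MR, otherwise fall back to a
debdiff sent to the BTS.  The names `NmuContext` and `plan_nmu_followup`
are hypothetical and only illustrate the control flow; no such API exists
in dgit or tag2upload.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hosting services named in the proposal; anything else falls back to the BTS.
SUPPORTED_HOSTS = ("salsa", "launchpad")


@dataclass
class NmuContext:
    """Facts a (hypothetical) gateway has gathered about one NMU."""
    dgit_compatible_tags: bool       # maintainer tags indicate a dgit-compatible workflow
    maintainer_repo: Optional[str]   # repo found via in-dsc metadata, if any
    hosting: Optional[str]           # "salsa", "launchpad", or None if unknown


def plan_nmu_followup(ctx: NmuContext) -> List[str]:
    """Return the ordered follow-up actions for an NMU."""
    preconditions_ok = (
        ctx.dgit_compatible_tags
        and ctx.maintainer_repo is not None
        and ctx.hosting in SUPPORTED_HOSTS
    )
    if not preconditions_ok:
        # Fallback from the proposal: a traditional debdiff by email.
        return ["send debdiff to BTS"]
    return [
        "convert NMU to maintainer branch format",
        "submit MR",
        "file bug referencing MR",
    ]
```

The fallback branch is deliberately the default, so a package with unknown
hosting or missing tags still gets the traditional treatment.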

Ian.

[1] IMO bugs are better for this because they provide a less bitty
conversational structure and are archived and published more usefully.
But it would be possible to handle this via the hosting system MR
conversation instead, or maybe mirror the conversation.

-- 
Ian Jackson   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-30 Thread Sean Whitton
Hello,

On Tue 29 Oct 2019 at 08:32AM +01, Tobias Frost wrote:

>> For example, you would not be able to do this:
>>    git clone salsa:something
>>    cd something
>>    make some straightforward change
>>    git tag    # } [1]
>>    git push   # }
>> Instead you would have to download the .origs and so on, and wait
>> while your machine crunched about unpacking and repacking tarballs,
>> applying patches, etc.
>
> 
> I'm missing a "and then I test my package to ensure it still works before
> upload" step…
>
> I wonder how someone should test their packages when they do
> not build them locally.
> And if they do (as they should), the advantages you outline
> are simply not there.
> 

If you use `dpkg-buildpackage -b` to do your local tests, then the
advantage of not having to go near any source packages remains.

-- 
Sean Whitton



Re: [RFC] Proposal for new source format

2019-10-30 Thread Sean Whitton
Hello Helmut,

On Mon 28 Oct 2019 at 09:35PM +01, Helmut Grohne wrote:

> On Sun, Oct 27, 2019 at 10:11:22AM -0700, Sean Whitton wrote:
>> On Sat 26 Oct 2019 at 04:24PM -07, Russ Allbery wrote:
>> > Hm, that's an interesting thought.  I do generally include that sort of
>> > information in the documentation of all packages for which I'm upstream,
>> > but for Debian I've assumed the preferred way to propose changes is the
>> > BTS.  Now that's potentially changing with Salsa.  I don't really mind
>> > monitoring multiple input formats, but some people will.
>>
>> I think that README.source is a fine place for this sort of information.
>
> Hell, no!
>
> Having to read some arbitrary README.source slows down patch submission
> excessively. You may consider this cost low, but if you try to file
> thousands of patches across the whole archive, this adds up. Documenting
> the preferred way of change submission in a machine-readable format
> absolutely is a requirement for performing archive-wide changes. Our
> present implementation of this requirement is "maintainers must consume
> bugs filed via the BTS". I think this is less than ideal, but works
> reasonably well from a submitter-pov.  Changing this to "look up in
> README.source" would make me stop contributing to Debian.

Sorry, I didn't phrase my suggestion carefully.  I was assuming that we
will continue to expect maintainers to accept patches in the BTS, but
that if they *prefer* something else, they could document that in
README.source.

Someone making a large number of changes could just choose to submit
them all as patches to the BTS, due to the high cost of checking
README.source -- I'm sure maintainers would understand this.

-- 
Sean Whitton



Re: [RFC] Proposal for new source format

2019-10-30 Thread Ian Jackson
Russ Allbery writes ("Re: [RFC] Proposal for new source format"):
> Ian Jackson writes:
> > Of course this means that the resulting source packages are not the "3.0
> > (quilt)" patch queue source packages that many people (even some people
> > who like git) say is important to them.
> 
> > A key design goal for dgit and my tag2upload proposal, is that (when
> > used in the most usual way) it produces nice source packages like
> > everyone is used to.
> 
> My recollection is that you found 3.0 (quilt) packages had a lot of edge
> cases and strange interactions with Git that you've had to work around.

Oh certainly.  I don't like them very much.  However, lots of people
have, over a long period, told me that they like them and that their
features are valuable to them.  This comes up over and over again in
threads like this one.

> I think there may be some deep conflicts here between a source package
> that is inherently a useful basis for work and modification (one of the
> design goals of 3.0 (quilt), and also one of the things those of us who
> like Git source packages have always wanted) and a source package that is
> easy to reproducibly generate and contains as little complexity as
> possible so that the archive software doesn't need to use any complex
> tools.

My response to this situation has been to solve it with superior
technology.  dgit is a reliable bidirectional (mostly [1]) converter
between .dscs including `3.0 (quilt)' and useful[1] git branches.
That is its core purpose.

I have certainly encountered a large number of anomalies and
difficulties but I have overcome them and the result is a system where
everyone gets to keep what they value.

I took this approach because I wanted to make new stuff that people
would *enjoy more than the old stuff* and *want to use*.  Software
whose output everyone would like.

[1] If to you `useful' means patches-unapplied or bare Debian, then
the dgit ecosystem does not yet have a converter from dsc to your git
branch.

Ian.

-- 
Ian Jackson   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: [RFC] Proposal for new source format

2019-10-29 Thread Russ Allbery
Ian Jackson writes:

> Of course this means that the resulting source packages are not the "3.0
> (quilt)" patch queue source packages that many people (even some people
> who like git) say is important to them.

> A key design goal for dgit and my tag2upload proposal, is that (when
> used in the most usual way) it produces nice source packages like
> everyone is used to.

My recollection is that you found 3.0 (quilt) packages had a lot of edge
cases and strange interactions with Git that you've had to work around.

I think there may be some deep conflicts here between a source package
that is inherently a useful basis for work and modification (one of the
design goals of 3.0 (quilt), and also one of the things those of us who
like Git source packages have always wanted) and a source package that is
easy to reproducibly generate and contains as little complexity as
possible so that the archive software doesn't need to use any complex
tools.

As long as I can get at the richer representation of the source, I think I
don't really care what the archive distributes, but I can only speak for
myself.

My impression of Bastian's proposal is that his 4.0 looks a lot like 3.0
(native) with overlays, which at least reduces the necessary tools to the
compressor and tar, although tar is still richer than one would ideally
want and therefore kind of a mess from a reproducibility standpoint.
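
To illustrate why tar is awkward for reproducibility: a tar member records
an mtime, owner, group and mode, and the archive depends on member order.
The sketch below pins all of those so that the same file contents always
give the same bytes.  It uses Python's standard `tarfile` module;
`deterministic_tar` is a hypothetical helper name, and this is only one
way to normalise the metadata, not what dpkg-source does.

```python
import io
import tarfile
from typing import Dict


def deterministic_tar(files: Dict[str, bytes]) -> bytes:
    """Build an uncompressed tar whose bytes depend only on names and contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):          # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0                  # no build-time timestamps
            info.mode = 0o644               # fixed permissions
            info.uid = info.gid = 0         # no local user or group ids
            info.uname = info.gname = ""    # no local user or group names
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Compression adds its own variables (tool version, level, embedded
timestamps), which this sketch leaves out entirely.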

-- 
Russ Allbery (r...@debian.org)  



Re: [RFC] Proposal for new source format

2019-10-29 Thread Ian Jackson
Russ Allbery writes ("Re: [RFC] Proposal for new source format"):
> And therefore the goal of this proposal is to define a source package
> format that allows this to be done more easily than our current source
> package format allows?

Of course this means that the resulting source packages are not the
"3.0 (quilt)" patch queue source packages that many people (even some
people who like git) say is important to them.

A key design goal for dgit and my tag2upload proposal, is that (when
used in the most usual way) it produces nice source packages like
everyone is used to.

Ian.

-- 
Ian Jackson   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: [RFC] Proposal for new source format

2019-10-29 Thread Russ Allbery
Bastian Blank writes:
> On Tue, Oct 29, 2019 at 12:19:03PM -0700, Russ Allbery wrote:

>> Could you help me understand what this would look like?  Is it something
>> like this workflow?
>> 
>> 1. tag2upload determines the local Git tree that should be uploaded as a
>>    new source package.
>> 
>> 2. tag2upload locally constructs a source package from that Git tree.
>> 
>> 3. The uploading user signs the source package that tag2upload constructs.

> The uploading user signs the .dsc file that was constructed.

Ah, okay, so the idea is to either embed the full signed .dsc file in the
tag or to embed at least the signature (and have the server reconstruct
the corresponding .dsc file)?

>> 4. tag2upload pushes a rich tag to its upload server that contains enough
>>    information to identify the Git tree that should be uploaded and that
>>    includes the signature over the source package constructed from that
>>    tree.
>> 
>> 5. The tag2upload server reconstructs the source package from Git,
>>    attaches the signature, and then forwards both to dak.

> The server reconstructs the source, attaches the signed (by the user)
> .dsc file and signs the .changes file covering the whole upload itself.

Aha, I think this is new: dak would then be willing to accept a .changes
file signed by the tag2upload server as long as (I'm assuming the rules
here, please check) (a) this is a source-only upload, and (b) the .dsc
file is signed by a DD or DM.  And then the upload permission check would
be done against the identity of the .dsc signer rather than the .changes
signer?
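
The assumed rules can be written down as a small predicate.  This is only
a sketch of the guess above, not dak's actual behaviour; `Upload`,
`accept_service_upload` and the key strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Upload:
    source_only: bool     # True when the upload carries no binary packages
    changes_signer: str   # key that signed the .changes file
    dsc_signer: str       # key that signed the .dsc file


def accept_service_upload(upload: Upload, service_key: str,
                          uploader_keys: Set[str]) -> bool:
    """Apply the two assumed rules to an upload relayed by the service."""
    if upload.changes_signer != service_key:
        return False  # not relayed by the service; the normal path is not modelled here
    if not upload.source_only:
        return False  # rule (a): source-only uploads only
    # Rule (b): permission is checked against the .dsc signer,
    # not against the .changes signer.
    return upload.dsc_signer in uploader_keys
```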

-- 
Russ Allbery (r...@debian.org)  



Re: [RFC] Proposal for new source format

2019-10-29 Thread Bastian Blank
Hi Russ

On Tue, Oct 29, 2019 at 12:19:03PM -0700, Russ Allbery wrote:
> Could you help me understand what this would look like?  Is it something
> like this workflow?
> 
> 1. tag2upload determines the local Git tree that should be uploaded as a
>    new source package.
> 
> 2. tag2upload locally constructs a source package from that Git tree.
> 
> 3. The uploading user signs the source package that tag2upload constructs.

The uploading user signs the .dsc file that was constructed.

> 4. tag2upload pushes a rich tag to its upload server that contains enough
>    information to identify the Git tree that should be uploaded and that
>    includes the signature over the source package constructed from that
>    tree.
> 
> 5. The tag2upload server reconstructs the source package from Git,
>    attaches the signature, and then forwards both to dak.

The server reconstructs the source, attaches the signed (by the user)
.dsc file and signs the .changes file covering the whole upload itself.

> 6. dak validates the signature on the source package and accepts the
>    package.
> 
> And therefore the goal of this proposal is to define a source package
> format that allows this to be done more easily than our current source
> package format allows?

Yes.

Regards,
Bastian

-- 
Captain's Log, star date 21:34.5...



Re: [RFC] Proposal for new source format

2019-10-29 Thread Sam Hartman
> "Bastian" == Bastian Blank writes:

>> I don't think this proposal is sufficiently well developed where
>> you're going to get much good feedback on debian-devel.

Bastian> What would be the correct location for it?

I'm fairly frustrated that you snipped the key part of my mail trying to
give you constructive feedback and ignored the point I'm trying to make.

The problem is not that you're bringing this up on debian-devel.

The problem is that your proposal IS NOT SUFFICIENTLY DEVELOPED.
Several people including me have given you feedback that we didn't get
it.  Based on other responses, I may now understand what you were trying
to do better than when I read your original message.  But as it stands, I
think the point is not to present the proposal somewhere else.  I think
that those wanting this proposal need to do a bit more work before
presenting it to the community.

In particular, quoting my original message:
>I don't think this proposal is sufficiently well developed where you're
>going to get much good feedback on debian-devel.

>My recommendation:

>1) Consider Russ's comment.  Consider whether you still want to go
>forward.

[It sounds like this may have happened by now.]

>2) If so, find a small group of people who think your idea sounds good.
>Focus on fleshing things out.  Include use cases and how this would make
>the world better.


>3) Solicit review on such a fleshed out proposal.

I think there is a bunch of stuff surrounding the assumptions and goals
of your proposal that is currently in your head and not in the proposal.
I'm reasonably up to date on these discussions.  If my initial reaction
is confusion, others are going to have that reaction too.



Re: [RFC] Proposal for new source format

2019-10-29 Thread Russ Allbery
Bastian Blank writes:

> We had that discussion already, it is about the possibility of
> reproducing the content of the upload.  The tag2upload proposal said
> they can't do it and everyone needs to trust this service to do the right
> thing.  I would like to solve this problem and allow such a tool/service to
> forward the trust information by reproducing the output.

Could you help me understand what this would look like?  Is it something
like this workflow?

1. tag2upload determines the local Git tree that should be uploaded as a
   new source package.

2. tag2upload locally constructs a source package from that Git tree.

3. The uploading user signs the source package that tag2upload constructs.

4. tag2upload pushes a rich tag to its upload server that contains enough
   information to identify the Git tree that should be uploaded and that
   includes the signature over the source package constructed from that
   tree.

5. The tag2upload server reconstructs the source package from Git,
   attaches the signature, and then forwards both to dak.

6. dak validates the signature on the source package and accepts the
   package.

And therefore the goal of this proposal is to define a source package
format that allows this to be done more easily than our current source
package format allows?
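
The six steps only work if building the source package from the Git tree
is deterministic, so that a signature made over the locally built package
still matches what the server rebuilds.  A toy sketch of that property
follows.  Everything here is an assumption for illustration: the tree is a
dict of path to bytes, `build_source_package` is a stand-in for the real
conversion, and an HMAC stands in for the uploader's OpenPGP signature
(a real signature would be verified with a public key, not a shared
secret).

```python
import hashlib
import hmac
from typing import Dict


def build_source_package(tree: Dict[str, bytes]) -> bytes:
    """Stand-in for the git-to-source conversion: output depends only on content."""
    out = bytearray()
    for path in sorted(tree):                      # order must not depend on the machine
        data = tree[path]
        out += f"{path}\0{len(data)}\0".encode() + data
    return bytes(out)


def sign_package(package: bytes, key: bytes) -> str:
    """Steps 2-3: the uploader signs the locally built package."""
    digest = hashlib.sha256(package).digest()
    return hmac.new(key, digest, hashlib.sha256).hexdigest()


def server_accepts(tree: Dict[str, bytes], signature: str, key: bytes) -> bool:
    """Steps 5-6: rebuild from the tree and check the uploader's signature."""
    rebuilt = build_source_package(tree)
    return hmac.compare_digest(sign_package(rebuilt, key), signature)
```

If the rebuild differs from the local build by even one byte, the check
fails, which is why the source format needs to be easy to generate
reproducibly.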

-- 
Russ Allbery (r...@debian.org)  



Re: [RFC] Proposal for new source format

2019-10-29 Thread Bastian Blank
On Tue, Oct 22, 2019 at 07:33:47AM -0400, Sam Hartman wrote:
> My initial reaction is that this is additional complexity in a direction
> that we don't need.

It is not a question of complexity.  It is a question of trust and who
we want and need to trust.

If we abolish the principle that we want to need as little trust as
possible and be able to verify all the steps within the archive, then we
don't actually need the complexity.  But someone needs to stand up and
proclaim exactly that.  No one has done so.

If we don't want to sacrifice that, we have to stick to a chain of
trust.

> Like Russ, I generally assume that VCS-like things are the future.
> I understand there is complexity there.

What is "VCS-like"?  Please define it.  A source package is no VCS, it
does not need to be.

E.g. dgit is not a VCS-like source package, as it serves a different
purpose from the source package we ship in the archive to all our users.

Because we have been circling this concept for some time now, please
help me understand what you actually mean by it.

> But I don't understand why this proposed format would be a step forward
> in a world where we care more about VCSes.  As an example, I don't
> understand how this would make things better for tag2upload.

We had that discussion already, it is about the possibility of
reproducing the content of the upload.  The tag2upload proposal said
they can't do it and everyone needs to trust this service to do the right
thing.  I would like to solve this problem and allow such a tool/service to
forward the trust information by reproducing the output.

> I don't think this proposal is sufficiently well developed where you're
> going to get much good feedback on debian-devel.

What would be the correct location for it?

Regards,
Bastian

-- 
Those who hate and fight must stop themselves -- otherwise it is not stopped.
-- Spock, "Day of the Dove", stardate unknown



Re: Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-29 Thread Ian Jackson
Helmut Grohne writes ("Re: Building Debian source packages reproducibly (was: 
Re: [RFC] Proposal for new source format)"):
> I think I'd trust the tag2upload service given the documentation you
> presented about it. I have less faith in all dgit installations being
> sane, sorry. We've run into too many builds in dirty chroots already.

That does make sense.  This is one of the ways that tag2upload is
better than dgit push.  (It is a shame that "integrity" concerns are
blocking integrity improvements.)

It would be possible to write a QA service which would verify Dgit
fields and automatically file RC bugs.  So far that hasn't seemed
necessary.

It would also be possible for dgit clone to verify the correspondence
itself, at the point where it honours the Dgit field.  Would that be a
useful feature for you ?  Of course it does mean downloading the
elements of the source package, which it currently doesn't need to do
if it finds a Dgit field, but there's no real difficulty.  (I wouldn't
make this the default!)
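
A sketch of what such a correspondence check amounts to, whether run by a
QA service or by the client: unpack the source package, read the tree of
the commit named in the Dgit field, and compare them file by file.  Both
trees are modelled here as plain dicts of path to content, and
`correspondence_problems` is a hypothetical name, not an existing dgit
function.

```python
from typing import Dict, List


def correspondence_problems(unpacked_dsc: Dict[str, bytes],
                            git_tree: Dict[str, bytes]) -> List[str]:
    """List every path on which the unpacked .dsc and the git tree disagree."""
    problems = []
    for path in sorted(set(unpacked_dsc) | set(git_tree)):
        if path not in git_tree:
            problems.append(f"only in source package: {path}")
        elif path not in unpacked_dsc:
            problems.append(f"only in git: {path}")
        elif unpacked_dsc[path] != git_tree[path]:
            problems.append(f"content differs: {path}")
    return problems
```

An empty list means the two views correspond; anything else is what a QA
service would report.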

> > You do not need to talk to any random git servers.  The git tree is
> > available on a single official Debian server, the dgit git server.
> > The Dgit: field in the .dsc identifies the commitid.  The .dsc is of
> > course available via the signed apt repositories, as well as being
> > available from the ftpmaster data API.
> 
> I was not trying to imply dgit to be a random git server. Given that
> dgit (currently) only contains history for a fraction of packages, we
> still need to compare with Vcs-Git. Given enough time, dgit will have
> useful histories eventually.

Yes.  If tag2upload is deployed, I expect it to be very popular.

Until then Vcs-Git has all the problems you mention and many others
too: it is hard to reliably find the right tag (even the tag name is
not formally standardised!) and certainly nothing checks that the tag
corresponds in any particular way.  How it might correspond is
generally not even documented anywhere - at least, not anywhere
machine-readable.

> Hmm. I'm not sure whether I actually need the tag object. The commit id
> is what I really need. dak might need the tag object. I'll defer to
> others.

I think ftpmaster's concerns mean that dak would want the tag object
to redo the uploader identity verification, even though from my point
of view that would be a redundant check.  But it's simple to provide
the tag and there is some integrity improvement from doing so, so that
is what I am proposing.

Ian.

-- 
Ian Jackson   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-29 Thread Bastian Blank
Hi Didier

On Mon, Oct 28, 2019 at 10:05:11AM +0100, Didier 'OdyX' Raboud wrote:
> Of course, all of this can only work if we can have, or make the ".git to 
> .dsc" conversion reproducible; hence my query.

Now, please read the first mail of this thread again.  Yes, maybe parts
of it are unclear, but we are way past the "we need this conversion"
stage.

Maybe we can stop running in circles around this concept and design
solutions.

Bastian

-- 
Prepare for tomorrow -- get ready.
-- Edith Keeler, "The City On the Edge of Forever",
   stardate unknown



Re: Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-29 Thread Helmut Grohne
Hi Ian,

On Tue, Oct 29, 2019 at 12:54:57PM +, Ian Jackson wrote:
> I wonder if I have misunderstood you, because:
> 
> The tag2upload proposal is based on dgit, which already provides this.
> dgit indeed defines an isomorphism between source packages and git
> trees, and dgit clone gives a git branch that is thus-isomorphic to
> the .dsc.  This is fundamental to dgit's design.

I get that this is the intention, but I don't see that this property can
be safely assumed. I see the Dgit field as a hint. It says "this source
package should be equivalent to this commit" without any guarantees of
this actually being the case. I guess that for all uploads performed
thus far, this is indeed the case, but it is not a requirement validated
by dak or any other trusted (by me) entity. We could easily end up with
an upload where the commit id is accidentally different. Everything that
can be gotten wrong, we will eventually get wrong.

> With `dgit push', the isomorphism is checked on the maintainer's
> machine during `dgit push'.  With tag2upload it is ensured by the
> tag2upload service.  (When the uploader didn't use dgit, dgit clone
> does a .dsc import, thus ensuring the isomorphism.)

I think I'd trust the tag2upload service given the documentation you
presented about it. I have less faith in all dgit installations being
sane, sorry. We've run into too many builds in dirty chroots already.

> > This property allows me to start from a git tree that is
> > authenticated by dak rather than a random git tree on a random git
> > server of questionable origin.
> 
> You do not need to talk to any random git servers.  The git tree is
> available on a single official Debian server, the dgit git server.
> The Dgit: field in the .dsc identifies the commitid.  The .dsc is of
> course available via the signed apt repositories, as well as being
> available from the ftpmaster data API.

I was not trying to imply dgit to be a random git server. Given that
dgit (currently) only contains history for a fraction of packages, we
still need to compare with Vcs-Git. Given enough time, dgit will have
useful histories eventually.

> It is true that this doesn't give you precisely the *tag* object -
> just the commit.  Adding the objectid of the tag object to the .dsc
> Dgit: field would be easy, if that would be helpful to you.  (Please
> file a wishlist bug against dgit if so.)  Alternatively, dak could
> publish the tag object (in a similar way to how it publishes binary
> buildinfos).

Hmm. I'm not sure whether I actually need the tag object. The commit id
is what I really need. dak might need the tag object. I'll defer to
others.

Helmut



Re: Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-29 Thread Ian Jackson
Helmut Grohne writes ("Re: Building Debian source packages reproducibly (was: 
Re: [RFC] Proposal for new source format)"):
> In other words, I want these formats (source package and tagged git
> tree) to be isomorphic (minus history). This requirement is too strong
> since not every source package will have a corresponding tag, but when
> there is a tag, I want to safely go from source package to tag and back
> again and arrive where I started from.

I wonder if I have misunderstood you, because:

The tag2upload proposal is based on dgit, which already provides this.
dgit indeed defines an isomorphism between source packages and git
trees, and dgit clone gives a git branch that is thus-isomorphic to
the .dsc.  This is fundamental to dgit's design.

With `dgit push', the isomorphism is checked on the maintainer's
machine during `dgit push'.  With tag2upload it is ensured by the
tag2upload service.  (When the uploader didn't use dgit, dgit clone
does a .dsc import, thus ensuring the isomorphism.)

> This property allows me to start from a git tree that is
> authenticated by dak rather than a random git tree on a random git
> server of questionable origin.

You do not need to talk to any random git servers.  The git tree is
available on a single official Debian server, the dgit git server.
The Dgit: field in the .dsc identifies the commitid.  The .dsc is of
course available via the signed apt repositories, as well as being
available from the ftpmaster data API.
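
For illustration, a minimal sketch of pulling the commit id out of a .dsc.
It assumes the commit id is the first whitespace-separated token of the
Dgit field's value and ignores signature checking altogether;
`dgit_commit_id` is a hypothetical helper, not part of dgit.

```python
from typing import Optional


def dgit_commit_id(dsc_text: str) -> Optional[str]:
    """Return the first token of the Dgit field, or None if there is no such field."""
    for line in dsc_text.splitlines():
        if line[:1].isspace():
            continue                      # continuation line of a multi-line field
        name, sep, value = line.partition(":")
        if sep and name.lower() == "dgit":   # control field names are case-insensitive
            tokens = value.split()
            return tokens[0] if tokens else None
    return None
```

The result could then be fetched from the dgit git server and compared
against the unpacked source package.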

It is true that this doesn't give you precisely the *tag* object -
just the commit.  Adding the objectid of the tag object to the .dsc
Dgit: field would be easy, if that would be helpful to you.  (Please
file a wishlist bug against dgit if so.)  Alternatively, dak could
publish the tag object (in a similar way to how it publishes binary
buildinfos).

Note that there are *two* tag objects: 1. the canonical view:
the dgit view tag, which is simply-isomorphic to the source package.
2. the maintainer tag, which is on the maintainer's branch and refers
to a commit in maintainer branch format.

With dgit push these are both made during dgit push with the
maintainer's key.  With tag2upload the canonical view tag is made by
the tag2upload service, because it is that service which performs the
maintainer->canonical conversion.

Each maintainer workflow defines a different mapping between
maintainer views and canonical views.  The (currently supported[1])
workflows are all isomorphisms.  So it is possible in principle to
reverse the maintainer->canonical transformation (if you know the
workflow, which can be found in the tags) but there is not currently
code to do that.  I don't get the impression, however, that this is a
thing you feel you need ?  (Some form of reverse transformation would
be needed to automatically and workflow-agnostically handle MRs whose
submitter is using the canonical view.)

> This backwards-connection seems to be missing thus far, but I do find it
> important for the reasons above. Adding it would easily allow dak to
> validate the signature on the tag.

So, I'm not sure I understand what you think is missing.

Ian.

[1] I think with monorepo workflows the maintainer->canonical
conversion is generally irreversible, because it discards information
about source packages other than the one in question.  This wouldn't
block MR processing because MRs are deltas and by definition the other
parts of the monorepo aren't edited in the MR.  It does mean you
couldn't reconstruct the whole monorepo given just the canonical view.

(Arguably this means that the .dsc representation of a source package
from a git monorepo is not a PFM.  See arguments on -legal and
-project, passim.  But the canonical view dgit branch does contain the
whole of the monorepo in its history, in a discoverable way, so
doesn't have this issue.)

-- 
Ian Jackson   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-29 Thread Tobias Frost
Hi Ian,

On Mon, Oct 28, 2019 at 05:53:00PM +, Ian Jackson wrote:

(...)
 
> For example, you would not be able to do this:
>    git clone salsa:something
>    cd something
>    make some straightforward change
>    git tag    # } [1]
>    git push   # }
> Instead you would have to download the .origs and so on, and wait
> while your machine crunched about unpacking and repacking tarballs,
> applying patches, etc.


I'm missing a "and then I test my package to ensure it still works before
upload" step…

I wonder how someone should test their packages when they do
not build it locally.
And if they do (as they should), the advantages you outline
are simply not there.


-- 
 tobi
 




Re: [RFC] Proposal for new source format

2019-10-28 Thread Helmut Grohne
Hi Sean,

On Sun, Oct 27, 2019 at 10:11:22AM -0700, Sean Whitton wrote:
> On Sat 26 Oct 2019 at 04:24PM -07, Russ Allbery wrote:
> > Hm, that's an interesting thought.  I do generally include that sort of
> > > information in the documentation of all packages for which I'm upstream,
> > but for Debian I've assumed the preferred way to propose changes is the
> > BTS.  Now that's potentially changing with Salsa.  I don't really mind
> > monitoring multiple input formats, but some people will.
> 
> I think that README.source is a fine place for this sort of information.

Hell, no!

Having to read some arbitrary README.source slows down patch submission
excessively. You may consider this cost low, but if you try to file
thousands of patches across the whole archive, this adds up. Documenting
the preferred way of change submission in a machine-readable format
absolutely is a requirement for performing archive-wide changes. Our
present implementation of this requirement is "maintainers must consume
bugs filed via the BTS". I think this is less than ideal, but works
reasonably well from a submitter-pov.  Changing this to "look up in
README.source" would make me stop contributing to Debian.

I think a good way to look at this is from a user-interface pov. I'm
attaching a script "reportpatch" to this mail. It takes the path to a
.debdiff and submits the change to the package maintainer. The current
implementation parses the package and version from the patched
debian/changelog and fires up reportbug. Filing patches becomes
reasonably quick this way.

Now if a different submission is preferred (e.g. salsa pr), a new
reportpatch should detect that and create a suitable pr at a suitable
project or fall back to the bts without me having to even think about
all of this.

I caution that it is not as easy. There are two more bits that are
easily overlooked. As a patch submitter, I don't want to work from an
arbitrary VCS snapshot that may or may not work. I want a tree that
precisely reproduces the failure seen in QA. In other words, I don't
want to deal with maintainer git trees. This is sad, but required to be
effective.

After submitting the change, we all know that Debian maintainers quickly
apply it. In Debian, quickly can mean years. Consequently, I need some
identifier that allows me to mechanically check whether the requested
change was applied to unstable (or declined). Yes, this is very
different from a salsa issue being closed. The version tracking in the
BTS quite reliably answers this question.

Enough people have complained about the BTS now that I'm convinced that
we need to somehow change the process for change submission. I'm equally
convinced, that documenting it in README.source is not the solution.

Helmut
#!/usr/bin/python3

import os
import re
import sys
import unidiff

def die(message):
    sys.stderr.write("%s\n" % message)
    sys.exit(1)

# Pick out the debian/changelog file from the given debdiff.
c = [f for f in unidiff.PatchSet.from_filename(sys.argv[1])
     if re.match("^[^/]*/debian/changelog$", f.target_file)]
if len(c) < 1:
    die("debian/changelog not patched")
elif len(c) > 1:
    die("multiple debian/changelog??")
pkg, version = None, None
# Look for a changelog header among the unchanged (context) lines.
for h in c[0]:
    for l in h:
        if l.is_context:
            m = re.match(r"^([a-z0-9.+-]+) \(([^()_ ]+)\) \S+;", l.value)
            if not m:
                continue
            pkg, version = m.groups()
if pkg is None:
    die("no changelog header found in context lines")
# Hand over to reportbug, pre-filled with package and version.
os.execlp("reportbug", "reportbug", "--from-buildd=%s_%s" % (pkg, version),
          "--tag=patch")
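For anyone wanting to try the changelog-header match on its own, here is a minimal sketch. The function name `parse_changelog_header` and the sample lines are made up for illustration; only the regular expression is taken from the script above.

```python
import re

# Same pattern as in reportpatch: "<package> (<version>) <distribution>;"
HEADER_RE = re.compile(r"^([a-z0-9.+-]+) \(([^()_ ]+)\) \S+;")

def parse_changelog_header(line):
    """Return (package, version) for a changelog header line, else None."""
    m = HEADER_RE.match(line)
    return m.groups() if m else None

if __name__ == "__main__":
    # Hypothetical header line, not taken from a real package.
    print(parse_changelog_header("hello (2.10-2) unstable; urgency=medium"))
    # An entry body line is not a header and does not match.
    print(parse_changelog_header("  * Fix cross build."))
```

The pattern deliberately rejects versions containing "_" or spaces, so the resulting "package_version" string stays unambiguous.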


Re: Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-28 Thread Helmut Grohne
Hi Ian,

On Mon, Oct 28, 2019 at 05:53:00PM +, Ian Jackson wrote:
> The sticking point, as I understand it, is that this still does not
> allow dak to verify that the *contents* of the .dsc were as intended
> by the uploading human. [0]
> 
> In the tag2upload proposal, the conversion from git tag to dsc is
> `merely' done by an official Debian service on an official Debian
> machine, and is `merely' fully reproducible and auditable.
> 
> But this is not good enough for some ftpmasters, who want that
> verification to be done *by dak*.  Various people attempted in the
> previous thread on this topic to find out *why* this is thought
> important, without apparent success.

I fear I'll have to side with "some ftpmasters" here. As a user, I also
want this verification work in both ways. Going from tag to upload is
insufficient in my view. What I want is "apt source" with history. This
is not debcheckout. I want the exact tree (tag) that matches unstable
including its git history in a way that exactly reproduces the build
failure seen on the source package.

In other words, I want these formats (source package and tagged git
tree) to be isomorphic (minus history). This requirement is too strong
since not every source package will have a corresponding tag, but when
there is a tag, I want to safely go from source package to tag and back
again and arrive where I started from. This property allows me to start
from a git tree that is authenticated by dak rather than a random git
tree on a random git server of questionable origin.

This backwards-connection seems to be missing thus far, but I do find it
important for the reasons above. Adding it would easily allow dak to
validate the signature on the tag.

Helmut



Re: Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-28 Thread Scott Kitterman



On October 28, 2019 5:53:00 PM UTC, Ian Jackson wrote:
>Scott Kitterman writes ("Re: Building Debian source packages
>reproducibly (was: Re: [RFC] Proposal for new source format)"):
>> Effectively tag2upload would replace DAK as the entry point for
>> packages into the archive (the equivalent to the current source
>> package signature verification being the git tag signature
>> verification).  I don't think the discussion got to a point where a
>> path forward that was considered reasonable by both the tag2upload
>> developers and the FTP Masters was reached.
>
>The current tag2upload proposal includes providing dak with the signed
>git tag object so that it can re-verify the identity of the human DD
>who authorised the upload.
>
>The sticking point, as I understand it, is that this still does not
>allow dak to verify that the *contents* of the .dsc were as intended
>by the uploading human. [0]
>
>In the tag2upload proposal, the conversion from git tag to dsc is
>`merely' done by an official Debian service on an official Debian
>machine, and is `merely' fully reproducible and auditable.
>
>But this is not good enough for some ftpmasters, who want that
>verification to be done *by dak*.  Various people attempted in the
>previous thread on this topic to find out *why* this is thought
>important, without apparent success.
>
>It would be nice to be able to work around this objection somehow by
>writing more code.  Unfortunately, any alternative - such as that
>described by Didier earlier in this thread - has undesirable
>properties.  In particular, it does not seem that it would be possible
>to do anything along these lines without continuing to burden the
>developer's working system with a whole pile of messing about with
>tarballs and quilt and so on.
>
>For example, you would not be able to do this:
>   git clone salsa:something
>   cd something
>   make some straightforward change
>   git tag# } [1]
>   git push   # }
>Instead you would have to download the .origs and so on, and wait
>while your machine crunched about unpacking and repacking tarballs,
>applying patches, etc.
>
>With tag2upload all that work is done for you on the tag2upload
>service, which of course has a fast network connection - and you don't
>need to wait for it.
>
>Ian.
>
>[0] Currently, of course, this requirement is not achieved for
>existing git based uploads.  In practice, DDs use ad-hoc
>git-buildpackage runes on their local machine to convert git data into
>.dscs.  That conversion is not controlled, not reproducible, and not
>practically auditable.  I guess maybe those blocking tag2upload think
>it is sufficient that we can blame the DD if it is messed up or
>suborned ?
>
>[1] In practice with tag2upload you would use `git-debpush' instead of
>`git tag' + `git push' but this is a thin wrapper around `git tag' and
>does not involve downloads or indeed any network activity, etc.

And the talking past each other surely continues because I don't think that in 
any way answers the objections.  Repeating the same thing you've said before 
isn't going to close the communication gap.  I don't think it's possible to do 
so right now.  Also, I'm a mere FTP Assistant, so I'm not one of the ones you 
have to convince.

Scott K



Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-28 Thread Ian Jackson
Didier 'OdyX' Raboud writes ("Building Debian source packages reproducibly 
(was: Re: [RFC] Proposal for new source format)"):
> Where I'm coming from is that we were discussing the tag2upload
> problem at miniDebConf Vaumarcus.  [...]

I appreciate your efforts to try to unstick all this.

> The hard part is not the packing and unpacking of the special tag; that's 
> mostly just strings massaging. But building the exact same source package in 
> different environments is harder than I expected.

Yes.  I have code to do it for tag2upload, though.  It's not released
yet because I stopped putting effort into this whole area after
getting discouraged.

> Of course, all of this can only work if we can have, or make the ".git to 
> .dsc" conversion reproducible; hence my query.
> 
> All-in-all; would this be a welcome mechanism?

I think by requiring the user to always have the tarballs on hand and
wait for them to be manipulated and maybe transferred, you are losing
a fair amount of the benefit of tag2upload.

But if you did want to do something along these lines, maybe you
should do it by adding code to git-debpush and the tag2upload service
rather than by reinventing the rest of the machinery ?

Regards,
Ian.

-- 
Ian Jackson   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-28 Thread Ian Jackson
Scott Kitterman writes ("Re: Building Debian source packages reproducibly (was: 
Re: [RFC] Proposal for new source format)"):
> Effectively tag2upload would replace DAK as the entry point for
> packages into the archive (the equivalent to the current source
> package signature verification being the git tag signature
> verification).  I don't think the discussion got to a point where a
> path forward that was considered reasonable by both the tag2upload
> developers and the FTP Masters was reached.

The current tag2upload proposal includes providing dak with the signed
git tag object so that it can re-verify the identity of the human DD
who authorised the upload.

The sticking point, as I understand it, is that this still does not
allow dak to verify that the *contents* of the .dsc were as intended
by the uploading human. [0]

In the tag2upload proposal, the conversion from git tag to dsc is
`merely' done by an official Debian service on an official Debian
machine, and is `merely' fully reproducible and auditable.

But this is not good enough for some ftpmasters, who want that
verification to be done *by dak*.  Various people attempted in the
previous thread on this topic to find out *why* this is thought
important, without apparent success.

It would be nice to be able to work around this objection somehow by
writing more code.  Unfortunately, any alternative - such as that
described by Didier earlier in this thread - has undesirable
properties.  In particular, it does not seem that it would be possible
to do anything along these lines without continuing to burden the
developer's working system with a whole pile of messing about with
tarballs and quilt and so on.

For example, you would not be able to do this:
   git clone salsa:something
   cd something
   make some straightforward change
   git tag# } [1]
   git push   # }
Instead you would have to download the .origs and so on, and wait
while your machine crunched about unpacking and repacking tarballs,
applying patches, etc.

With tag2upload all that work is done for you on the tag2upload
service, which of course has a fast network connection - and you don't
need to wait for it.

Ian.

[0] Currently, of course, this requirement is not achieved for
existing git based uploads.  In practice, DDs use ad-hoc
git-buildpackage runes on their local machine to convert git data into
.dscs.  That conversion is not controlled, not reproducible, and not
practically auditable.  I guess maybe those blocking tag2upload think
it is sufficient that we can blame the DD if it is messed up or
suborned ?

[1] In practice with tag2upload you would use `git-debpush' instead of
`git tag' + `git push' but this is a thin wrapper around `git tag' and
does not involve downloads or indeed any network activity, etc.

-- 
Ian Jackson   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-28 Thread Scott Kitterman
On Monday, October 28, 2019 9:45:36 AM EDT Theodore Y. Ts'o wrote:
> On Mon, Oct 28, 2019 at 10:05:11AM +0100, Didier 'OdyX' Raboud wrote:
> > Where I'm coming from is that we were discussing the tag2upload problem at
> > miniDebConf Vaumarcus. The heart of the problem is that FTP-Master are
> > (currently) not going to accept .dscs built reproducibly by a (even
> > trusted) service. tag2upload is built on the idea that a signed git tag
> > is the only needed thing (`git tag -s`) to trigger an upload, and that is
> > not going to fly currently.
> 
> Ah, now I understand the problem you're trying to solve; thanks for
> the context.
> 
> What are FTP Master's objections?  Given that they *do* accept a
> source-only upload, which is just a signed dsc plus the source/debian
> tarballs, I would presume all that would be necessary is (a)
> demonstrate that we have tools which can reliably translate between a
> git commit and the dsc plus source tarball, and (b) that the git tree
> is stored in Debian project infrastructure so we can be assured that
> it can be stored with the same level of assurance as where we store
> the source tar files.
> 
> Do they have other concerns?  If so, what are they?  I would be
> surprised that it has anything at all to do with reliable builds,
> given the acceptance of source-only uploads today.

My recollection of the discussion is that the key (pun intended) factor is
signed by who.  Currently all uploads are signed by an individual authorized 
to upload the package to the archive.  The tag2upload proposal is premised on 
such keys being replaced by a single service based signing key.

Effectively tag2upload would replace DAK as the entry point for packages into 
the archive (the equivalent to the current source package signature 
verification being the git tag signature verification).  I don't think the 
discussion got to a point where a path forward that was considered reasonable 
by both the tag2upload developers and the FTP Masters was reached.

There was a fair amount of discussion on this point in the tag2upload threads.

Scott K




Re: Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-28 Thread Theodore Y. Ts'o
On Mon, Oct 28, 2019 at 10:05:11AM +0100, Didier 'OdyX' Raboud wrote:
> Where I'm coming from is that we were discussing the tag2upload problem at 
> miniDebConf Vaumarcus. The heart of the problem is that FTP-Master are 
> (currently) not going to accept .dscs built reproducibly by a (even trusted) 
> service. tag2upload is built on the idea that a signed git tag is the only 
> needed thing (`git tag -s`) to trigger an upload, and that is not going to 
> fly 
> currently.

Ah, now I understand the problem you're trying to solve; thanks for
the context.

What are FTP Master's objections?  Given that they *do* accept a
source-only upload, which is just a signed dsc plus the source/debian
tarballs, I would presume all that would be necessary is (a)
demonstrate that we have tools which can reliably translate between a
git commit and the dsc plus source tarball, and (b) that the git tree
is stored in Debian project infrastructure so we can be assured that
it can be stored with the same level of assurance as where we store
the source tar files.

Do they have other concerns?  If so, what are they?  I would be
surprised that it has anything at all to do with reliable builds,
given the acceptance of source-only uploads today.

> The hard part is not the packing and unpacking of the special tag; that's 
> mostly just strings massaging. But building the exact same source package in 
> different environments is harder than I expected.

Is there more than just (a) making sure the package can be built
reproducibly in the first place, and (b) the information in the
buildinfo file?

Of course, the big problem is that not all packages are currently set
up to be reproducibly built; for example if you try to compile using
Link-Time Optimization (LTO), you're completely out of luck.  (I've since
dropped use of LTO to deal with this issue.)

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=932098

But if it *is* reproducibly buildable, are there cases where setting up
a build environment using the information in buildinfo is not enough?

Cheers.

- Ted



Re: Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-28 Thread Marek Mosiewicz
Hello,

In fact, what may be important is the problem of artifacts being
downloaded during the build. At least in the Java world, an application
can be small but depend on many libraries which are downloaded during
the build. The program works and the build is reproducible, but we may
have no idea what it consists of.

Best regards,
Marek Mosiewicz

On Mon, 28.10.2019 at 10:05 +0100, Didier 'OdyX' Raboud wrote:
> On Wednesday, 23 October 2019, 15.49:11 h CET, Theodore Y. Ts'o wrote:
> > On Wed, Oct 23, 2019 at 11:18:24AM +1000, Russell Stuart wrote:
> > > On Tue, 2019-10-22 at 16:52 -0700, Russ Allbery wrote:
> > > > That seems excessively pessimistic.  What about Git makes you
> > > > think
> > > > it's impossible to create a reproducible source package?
> > > 
> > > Has it been done?  Given this point has been raised several times
> > > before if it hasn't been done by now I think it's reasonable to
> > > assume
> > > it's difficult, and thinking that it's so is not excessively
> > > pessimistic.
> > 
> > Generating a reproducible source package given a particular git
> > commit
> > is trivial.  All you have to do is use "git archive".  For example:
> 
> When talking about upstream projects, sure.
> 
> But generating Debian source packages (.dsc and friends) from a
> `debian/master` (+ `pristine-tar`) reproducibly is not really trivial, right?
> 
> As far as I understand, `gbp buildpackage -S` is the closest we have,
> but so 
> far, I fail to get it to give me the bit-by-bit identical unsigned
> .dsc that 
> I'd like to get. What am I missing?
> 
> (A little digression…)
> 
> Where I'm coming from is that we were discussing the tag2upload
> problem at 
> miniDebConf Vaumarcus. The heart of the problem is that FTP-Master
> are 
> (currently) not going to accept .dscs built reproducibly by a (even
> trusted) 
> service. tag2upload is built on the idea that a signed git tag is the
> only 
> needed thing (`git tag -s`) to trigger an upload, and that is not
> going to fly 
> currently.
> 
> The solution that seemed obvious during the discussion [0] is to
> instead rely 
> on a local tool to produce a git tag with significantly more metadata
> (such as 
> .dsc signature, _source.changes signature); and reconstruct a
> signed set 
> of .dsc and _source.changes automatically (as last pipeline step in
> Gitlab 
> CI), which are then acceptable by the archive.
> 
> In other words, it's "tag2upload", but with a reproducible way to:
> - build a source package on developer machine;
> - sign it locally;
> - create and push a special git tag
> Then, in a different environment (such as a GitLab CI pipeline step),
> given a 
> special git tag and a repository;
> - build the exact unsigned same source package
> - unpack the special git tag;
> - apply the signatures to get the exact same signed source packages;
> - dput to the archive.
> 
> The hard part is not the packing and unpacking of the special tag;
> that's 
> mostly just strings massaging. But building the exact same source
> package in 
> different environments is harder than I expected.
> 
> Some caveats:
> - Q: if you built and signed the source package locally, why not
> upload it?
>   A: Because you might want to only upload _after_ automated tests,
> and in an 
>  unsupervised manner.
> - Q: If one can fit pgp signatures in a git tag; why not inlining the
> complete 
>  .dsc and _source.changes?
>   A: Indeed. You still need the debian.tar though.
> - Q: What about Dgit: in the .dsc, or buildinfo files?
>   A: Currently optional; could just be left out for a prototype.
> 
> Of course, all of this can only work if we can have, or make the
> ".git to 
> .dsc" conversion reproducible; hence my query.
> 
> All-in-all; would this be a welcome mechanism?
> 
> 
> OdyX
> 
> [0] It probably was already considered.



Re: [RFC] Proposal for new source format

2019-10-28 Thread Andrey Rahmatullin
On Wed, Oct 23, 2019 at 09:49:11AM -0400, Theodore Y. Ts'o wrote:
> > > That seems excessively pessimistic.  What about Git makes you think
> > > it's impossible to create a reproducible source package?
> > 
> > Has it been done?  Given this point has been raised several times
> > before if it hasn't been done by now I think it's reasonable to assume
> > it's difficult, and thinking that it's so is not excessively
> > pessimistic.
> 
> Generating a reproducible source package given a particular git commit
> is trivial.  All you have to do is use "git archive".  For example:
If you want bit-by-bit reproducible .tar.gz this assumes that git archive
is reproducible (and that gzip -9n is reproducible).
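To illustrate the gzip half of that assumption: the gzip header carries a timestamp, which is what `-n` leaves out. A small Python sketch (not what any Debian tool actually runs; the function names are made up for illustration) shows the effect with the standard `gzip` module, where `mtime=0` plays the role of `-n`.

```python
import gzip

def gzip_with_timestamp(data, mtime):
    """Compress with an explicit timestamp stored in the gzip header."""
    return gzip.compress(data, compresslevel=9, mtime=mtime)

def gzip_no_timestamp(data):
    """Compress with the header timestamp zeroed, analogous to gzip -9n."""
    return gzip.compress(data, compresslevel=9, mtime=0)

if __name__ == "__main__":
    payload = b"example tarball bytes\n" * 100
    # Same input, different header timestamps.
    a = gzip_with_timestamp(payload, 1572220800)
    b = gzip_with_timestamp(payload, 1572307200)
    print("timestamped outputs equal:", a == b)
    # Same input, timestamp zeroed both times.
    print("zeroed outputs equal:",
          gzip_no_timestamp(payload) == gzip_no_timestamp(payload))
```

This only covers the header; whether the compressed stream itself stays identical across different gzip or zlib versions is a separate question.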

-- 
WBR, wRAR




Building Debian source packages reproducibly (was: Re: [RFC] Proposal for new source format)

2019-10-28 Thread Didier 'OdyX' Raboud
On Wednesday, 23 October 2019, 15.49:11 h CET, Theodore Y. Ts'o wrote:
> On Wed, Oct 23, 2019 at 11:18:24AM +1000, Russell Stuart wrote:
> > On Tue, 2019-10-22 at 16:52 -0700, Russ Allbery wrote:
> > > That seems excessively pessimistic.  What about Git makes you think
> > > it's impossible to create a reproducible source package?
> > 
> > Has it been done?  Given this point has been raised several times
> > before if it hasn't been done by now I think it's reasonable to assume
> > it's difficult, and thinking that it's so is not excessively
> > pessimistic.
> 
> Generating a reproducible source package given a particular git commit
> is trivial.  All you have to do is use "git archive".  For example:

When talking about upstream projects, sure.

But generating Debian source packages (.dsc and friends) from a
`debian/master` (+ `pristine-tar`) reproducibly is not really trivial, right?

As far as I understand, `gbp buildpackage -S` is the closest we have, but so 
far, I fail to get it to give me the bit-by-bit identical unsigned .dsc that 
I'd like to get. What am I missing?

(A little digression…)

Where I'm coming from is that we were discussing the tag2upload problem at 
miniDebConf Vaumarcus. The heart of the problem is that FTP-Master are 
(currently) not going to accept .dscs built reproducibly by a (even trusted) 
service. tag2upload is built on the idea that a signed git tag is the only 
needed thing (`git tag -s`) to trigger an upload, and that is not going to fly 
currently.

The solution that seemed obvious during the discussion [0] is to instead rely 
on a local tool to produce a git tag with significantly more metadata (such as 
.dsc signature, _source.changes signature); and reconstruct a signed set
of .dsc and _source.changes automatically (as last pipeline step in Gitlab 
CI), which are then acceptable by the archive.

In other words, it's "tag2upload", but with a reproducible way to:
- build a source package on developer machine;
- sign it locally;
- create and push a special git tag
Then, in a different environment (such as a GitLab CI pipeline step), given a 
special git tag and a repository;
- build the exact unsigned same source package
- unpack the special git tag;
- apply the signatures to get the exact same signed source packages;
- dput to the archive.

The hard part is not the packing and unpacking of the special tag; that's 
mostly just strings massaging. But building the exact same source package in 
different environments is harder than I expected.
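As a rough sketch of the "strings massaging" part only, assuming a made-up tag layout: the field names (`Dsc-Sha256`, `Dsc-Signature`) and the helper names below are hypothetical and are not part of tag2upload, dgit or git-debpush. The developer side records the digest of the unsigned .dsc and a detached signature in the tag message; the other side rebuilds, compares digests, and only then hands back the signature to reattach.

```python
import base64
import hashlib

def pack_tag_message(unsigned_dsc, detached_sig):
    """Build a tag message carrying the .dsc digest and its detached signature."""
    digest = hashlib.sha256(unsigned_dsc).hexdigest()
    sig = base64.b64encode(detached_sig).decode("ascii")
    return "Dsc-Sha256: %s\nDsc-Signature: %s\n" % (digest, sig)

def unpack_tag_message(message):
    """Parse the hypothetical 'Key: value' lines back into a dict."""
    fields = {}
    for line in message.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields

def signature_for_rebuild(message, rebuilt_dsc):
    """Return the signature only if the rebuilt .dsc is byte-identical."""
    fields = unpack_tag_message(message)
    if hashlib.sha256(rebuilt_dsc).hexdigest() != fields.get("Dsc-Sha256"):
        return None  # rebuild differs, so the signature would not verify
    return base64.b64decode(fields["Dsc-Signature"])

if __name__ == "__main__":
    dsc = b"Format: 3.0 (quilt)\nSource: example\n"  # placeholder content
    msg = pack_tag_message(dsc, b"placeholder-signature-bytes")
    print(signature_for_rebuild(msg, dsc) is not None)
    print(signature_for_rebuild(msg, dsc + b"\n") is not None)
```

The digest comparison is where the real difficulty shows up: a single differing byte in the rebuilt .dsc makes the stored signature useless.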

Some caveats:
- Q: if you built and signed the source package locally, why not upload it?
  A: Because you might want to only upload _after_ automated tests, and in an 
 unsupervised manner.
- Q: If one can fit pgp signatures in a git tag; why not inlining the complete 
 .dsc and _source.changes?
  A: Indeed. You still need the debian.tar though.
- Q: What about Dgit: in the .dsc, or buildinfo files?
  A: Currently optional; could just be left out for a prototype.

Of course, all of this can only work if we can have, or make the ".git to 
.dsc" conversion reproducible; hence my query.

All-in-all; would this be a welcome mechanism?


OdyX

[0] It probably was already considered.



Re: [RFC] Proposal for new source format

2019-10-27 Thread Russ Allbery
Russell Stuart writes:

> I don't believe that.  I guess we are talking past each other.  Out of
> curiosity do you maintain the changesets manually in git, or use
> something like gquilt?

I've tried a whole bunch of different things over the years, ranging from
manually-maintained feature branches through TopGit.  Currently, I'm using
gbp pq, which makes for nice separated changes but has the disadvantage of
making the history of each change irritating to view and understand (since
it involves diffs of diffs), and prevents use of native Git tools on the
history of the changes to upstream.

I'm a pretty strong believer in the merits of a rebase workflow for
exactly the reasons you stated, so I think my ideal solution is one of the
tools that maintains both a rebased branch and a merged branch with
history.

It's on my list to look at git-debrebase, which I'm hoping gives me the
pieces I want.

Note that my argument for Git is only partly about the history of upstream
changes.  A lot of it for me is the history of changes to the Debian
packaging files, about pull-request workflows, and about the ease of
someone who isn't intimately familiar with Debian's tools proposing a new
change to the package.  In my ideal world, someone would be able to make a
normal Git commit to the packaging repository and then push a tag and all
the right things would happen, although that's hard to achieve with a
rebase workflow.

-- 
Russ Allbery (r...@debian.org)  



Re: [RFC] Proposal for new source format

2019-10-27 Thread Russell Stuart
On Sun, 2019-10-27 at 20:29 -0700, Russ Allbery wrote:
> If you modify the upstream source, then by definition you do not have
> reproducibility of the upstream source, and you're now talking about
> something else (review of the changes, which I called audit in my
> previous message).

I think I'm guilty of a poor choice of words.

> I have no idea how you got that from my previous messages, but you
> have misunderstood.

Excellent.

> This is exactly my objection to reducing everything to patches rather
> than using the power of Git to represent the history and structure of
> the changes made for Debian.

Personally I don't see the "power of git" adds much apart from history,
but really it doesn't matter for this discussion.

> am completely baffled by your belief that this is inherently easier
> to do with quilt than with Git.

I don't believe that.  I guess we are talking past each other.  Out of
curiosity do you maintain the changesets manually in git, or use
something like gquilt?





Re: [RFC] Proposal for new source format

2019-10-27 Thread Russ Allbery
Russell Stuart writes:

> Harking back to the time we removed the randomness generator from
> openssl, it's very nice to have a single patch say "it was removed
> because it wasn't exercised in the tests.  upstream didn't respond to
> requests for comment" rather than having it interspersed with the 650
> odd other lines of other changes we carry with no explanation.

Git neither hurts nor helps that.  quilt neither hurts nor helps that.
This is a request for package maintainers to record changes as clear,
separable, single-topic changes with clear documentation, something that's
possible with all of these tools and possible to fail to do with all of
these tools.

I completely agree with you (well, for most packages; in some cases, we've
effectively forked the package and attempting to trace changes back to
some ancient upstream release is not useful, but that's a separate
problem), but it's orthogonal to this thread, and it's confusing to raise
this point here as if anyone in this thread is arguing against this.

-- 
Russ Allbery (r...@debian.org)  



Re: [RFC] Proposal for new source format

2019-10-27 Thread Russ Allbery
Russell Stuart writes:

> That is a great definition of reproducibility if all you are interested
> in is the Debian version of the package.  It is not so great if what you want
> is the upstream version of the package - ie, it is important to you that
> it behaves identically or at least diverges in accountable ways.  In
> that case you want a clear audit trail from the upstream source to the
> Debian binary.

If you modify the upstream source, then by definition you do not have
reproducibility of the upstream source, and you're now talking about
something else (review of the changes, which I called audit in my previous
message).  Surely this is obvious?  My definition does extend
reproducibility all the way back to upstream in the case that upstream is
unmodified.

In retrospect, for the modification case, I missed explicitly stating that
there should be some verifiable way to identify the point of divergence
from upstream and verify that this point matches the signed upstream
artifact, but for what it's worth that was part of my mental model.

I agree that audit is also interesting, but I object to confusing it with
reproducibility.

> What is important to me is that the source contain an audit trail of how
> Debian got from the upstream source to the Debian package.  If I
> understand your position correctly, your proposal boils down to adding a
> (single) branch to the upstream .git for the debian changes.

I have no idea how you got that from my previous messages, but you have
misunderstood.  I haven't talked at all about what representation in Git
should be used, since I think one of the features of standardizing on Git
is that you have all the substantial tools of Git available to you to find
a meaningful representation for your package.  That includes rebasing
workflows if they make sense, feature branches, and so forth.

> My problem isn't with using git - it's with the word "single".  It isn't
> even with you using a single branch, as perhaps that's appropriate for
> the packages you maintain (which it would be if the only change is to
> add a debian directory).  My problem is the implication that since it's
> good enough for you, it's good enough for every package.  It's not.
> When you are carrying a lot of changes it's bloody horrible.

This is exactly my objection to reducing everything to patches rather than
using the power of Git to represent the history and structure of the
changes made for Debian.

> I can see how you might think that.  The reality is different.  At no
> stage have I suggested you should be prevented from using git, or indeed
> any other mechanism you desire.  I have said if you adopt a new system
> like dgit please figure out a way of implementing one feature of the one
> you are replacing (quilt) - a way to audit changes.

I agree with the importance of retaining the ability to audit changes, and
am completely baffled by your belief that this is inherently easier to do
with quilt than with Git.  I think maybe you're confusing the tool with
the implementation and are actually arguing in favor of a rebase workflow
for managing changes to upstream source, in which case we agree and this
is exactly why I use rebase workflows with my packages.  Using Git.

> But it has been proposed that everybody be forced to drop whatever
> workflow they might like in favour of dgit, and you look to be arguing
> in favour of that idea.

This sentence makes no sense to me.  dgit is not a workflow.

I'm fairly certain that I'm not in favor of whatever you think I'm in
favor of.  I don't recognize my position in your summaries.

> With the current split between development format and source
> distribution format you get to develop in any way you please.  If it
> wasn't split now there would have been no dgit.  The source format can
> be optimised for distribution and in fact is so optimised.  It is pretty
> much as efficient as it can be - it contains all the things you need to
> work on the source, and little else.

Yes, it's becoming clear that I'm retreading an argument that the dgit
developers already went through and have arrived at agreement with their
conclusions: we should have a lightweight, history-free, minimal source
package representation (which I believe is also the problem that Bastian
is trying to solve in the message that started this thread), and we should
separately have a proper representation of the package history for the
people who find that data valuable and important.

dgit largely achieves the second part (I think it doesn't for all possible
Git representations, but that's a solvable problem without long
debian-devel threads and can be driven by someone using a different
workflow who wants to use dgit); the remaining piece that I want is a
tag-based push workflow so that we can provide a native upload process for
developers who are used to a Git-centric way of working, rather than the
(from their perspective) arcane and complex mucking about with unfamiliar,
Debian-specific 

Re: [RFC] Proposal for new source format

2019-10-27 Thread Russell Stuart
On Wed, 2019-10-23 at 09:49 -0400, Theodore Y. Ts'o wrote:
> Generating a reproducible source package given a particular git commit
> is trivial.  All you have to do is use "git archive".  For example:

It is indeed.  Almost a tautology.  But it's not what I'm interested in
doing.  The focus is on showing the connection between upstream's
source and Debian, not on reproducing Debian's source.

Repeating my earlier example, I want to show whether openssl (insert
name of fully audited package here) in Debian is a bit for bit
reproduction of upstream's openssl.  It won't be, of course, so I want
the next best thing: an audit trail explaining exactly why it's
different.

Harking back to the time we removed the randomness generator from
openssl, it's very nice to have a single patch say "it was removed
because it wasn't exercised in the tests.  upstream didn't respond to
requests for comment" rather than having it interspersed with the 650
odd other lines of other changes we carry with no explanation.




Re: [RFC] Proposal for new source format

2019-10-27 Thread Russell Stuart
On Tue, 2019-10-22 at 20:21 -0700, Russ Allbery wrote:
> I define reproducibility as generating the same Debian source package
> from a signed Git tag of my packaging repository plus, for non-native 
> packages, whatever release artifacts upstream considers canonical
> (which may be a signed tarball or may be a Git tag or may be
> something else entirely).

That is a great definition of reproducibility if all you are interested
in is the Debian version of the package.  It is not so great if what you
want is the upstream version of the package - ie, it is important to
you that it behaves identically or at least diverges in accountable
ways.  In that case you want a clear audit trail from the upstream
source to the Debian binary.

On Tue, 2019-10-22 at 20:21 -0700, Russ Allbery wrote:
> All of this business with patches and whatnot is an implementation
> detail.

If you are thinking of patches in terms of .dpatch files in
debian/patches then we both agree, as I don't consider the
representation to be particularly important.  It could be branches
stored in git for all I care, perhaps managed by a tool like gquilt. 

What is important to me is that the source contain an audit trail of how
Debian got from the upstream source to the Debian package.  If I
understand your position correctly, your proposal boils down to adding a
(single) branch to the upstream .git for the debian changes.  My
problem isn't with using git - it's with the word "single".  It isn't
even with you using a single branch, as perhaps that's appropriate for
the packages you maintain (which it would be if the only change is to
add a debian directory).  My problem is the implication that since it's
good enough for you, it's good enough for every package.  It's not. 
When you are carrying a lot of changes it's bloody horrible.

Perhaps an illustration may help.  I used to be a consumer of RedHat
kernels. Back in the 2.6 days they carried hundreds if not thousands of
individual patches for stuff they backported from Linux 3.0.  (I gather
they still do carry a lot of patches for their LTS releases.) When you
wanted to add your own modification there were invariably conflicts, and
without knowing what patches it conflicted with and why it was just
impossible.  Then Oracle released their "own" Linux distribution.  It
was a copy of RedHat, something Oracle didn't go out of its way to
acknowledge.  Effectively Oracle was garnishing for themselves part of
RedHat's revenue stream (support fees) using a rebadged RedHat product.
RedHat responded by doing effectively what you are suggesting.  They
replaced source rpm's audit trail of every change they made and why
with one humongous, uncommented patch.  Technically they were operating
in accordance with the lawyer's reading of the GPL I guess - they were
distributing the source.  But it sure as hell wasn't in accordance with
a programmer's definition of "source" (which is along the lines of
something you can edit), as porting a patch from the .orig kernel to
RedHat's became damned near impossible.

A second illustration is the kernel development process itself.  One
huge patch is not considered acceptable.  They must be smaller, easily
understood, digestible patches.  The quilt source format encouraged
that format - to the point of having lintian checks for it.  Nowhere do
you propose a similar mechanism - or even acknowledge it's important.

On Tue, 2019-10-22 at 23:20 -0700, Russ Allbery wrote:
> Checking reproducibility only back to a set of patches does *not*
> provide a real guarantee of reproducibility, since a supply-chain
> attack could still have introduced malicious code in the patch 
> generation process.

You are damning the good because it's not perfect.  It's true there are
still ways of attacking the code, it merely renders those attacks
visible and attributable.  In fact rendering all changes visible and
attributable by insisting they are signed is *precisely* the mechanism
the kernel uses to defend itself both from malware attacks of the type
you envisage and when someone attempts to add copyrighted code that
opens the kernel to legal attack later.  Turns out a bit of sunlight is
a great disinfectant.

On Tue, 2019-10-22 at 23:20 -0700, Russ Allbery wrote:
> like an argument for dropping all of the features that I want and 
> retaining only the feature that you want, when you can derive the 
> feature that you want (at some additional complexity cost, to be 
> sure) from the format that I'm arguing for.

I can see how you might think that.  The reality is different.  At no
stage have I suggested you should be prevented from using git, or
indeed any other mechanism you desire.  I have said if you adopt a new
system like dgit please figure out a way of implementing one feature of
the one you are replacing (quilt) - a way to audit changes.  But it has
been proposed that everybody be forced to drop whatever workflow they
might like in favour of dgit, and you look to be arguing in favour of
that idea.  If we 

Re: [RFC] Proposal for new source format

2019-10-27 Thread Sean Whitton
Hello,

On Sat 26 Oct 2019 at 04:24PM -07, Russ Allbery wrote:

>> It probably would also be useful if the metadata had some standardized
>> way to indicate the preferred way to propose changes to either upstream
>> or the debian packaging maintainer --- whether it's e-mail to a
>> particular e-mail address, or a pull request, etc.
>
> Hm, that's an interesting thought.  I do generally include that sort of
> information in the documentation of all packages for which I'm upstream,
> but for Debian I've assumed the preferred way to propose changes is the
> BTS.  Now that's potentially changing with Salsa.  I don't really mind
> monitoring multiple input formats, but some people will.

I think that README.source is a fine place for this sort of information.

-- 
Sean Whitton




Re: [RFC] Proposal for new source format

2019-10-27 Thread Simon Richter
Hi,

On 27.10.19 01:20, Theodore Y. Ts'o wrote:

> I think we will need to support the source tar.gz for the foreseeable
> future.  At the very least, *deprecating* the tar.gz/tar.gz.asc format
> should be independent of the question of whether we also support a format
> that involves a URL to a git repository plus a signed git commit ID or a
> signed git tag.

We shouldn't forget that users may want to run their own mirrors for
whatever reason they might have, and anything that cannot be expressed
as "files in a directory tree referencing each other" is going to be tricky.

If I wanted to make a git-based archive format, I'd probably ship
bundles containing the upstream version that is being packaged plus all
commits on top of that and relevant tags.

That would also work as a base for modifications, because you can create
a shallow clone from a bundle and work inside that.
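
A minimal sketch of the bundle idea, assuming a toy repository: the
package name, tag names and bundle file name are made up for
illustration, and the bundle here simply carries every ref rather than
only "upstream plus the commits on top".  It uses `git bundle create`,
`git bundle verify` and a plain `git clone` from the resulting file; it
does not attempt the shallow clone mentioned above.

```shell
#!/bin/sh
# Toy repository; names and versions are illustrative assumptions.
set -e
work=$(mktemp -d)
git init -q "$work/pkg"
cd "$work/pkg"
git config user.name "Example Maintainer"
git config user.email "maintainer@example.org"

# The upstream version that is being packaged.
echo "upstream source" > hello.c
git add hello.c
git commit -q -m "Import upstream 1.0"
git tag upstream/1.0

# Debian commits on top of it, plus the relevant tag.
mkdir debian
echo "pkg (1.0-1) unstable; urgency=medium" > debian/changelog
git add debian
git commit -q -m "Add Debian packaging"
git tag debian/1.0-1

# Pack all refs and the objects they reach into one ordinary file,
# which a mirror can carry like any other file in a directory tree.
bundle="$work/pkg_1.0-1.bundle"
git bundle create "$bundle" --all

# Check that the bundle is well-formed and complete, then use it as
# the source of a working clone.
git bundle verify "$bundle"
git clone -q "$bundle" "$work/mirror-copy"
git -C "$work/mirror-copy" log --oneline
```

The clone behaves like any other repository, so modifications can be
committed on top of it with the usual tools.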

   Simon





Re: [RFC] Proposal for new source format

2019-10-26 Thread Russ Allbery
"Theodore Y. Ts'o"  writes:

> I do think, though that we should allow the specification of *multiple*
> git repositories, with some kind of type specifier so it can be clear
> whether a particular repository is just a read-only clone versus a
> read/write "master" repository, and whether a repository+branch is the
> upstream repository, and/or used by the Debian maintainers to maintain
> its packaging.

Oh, sure, absolutely.  Like you, I maintain all of my packages in multiple
Git repositories (once I add Salsa, there will generally be four of them)
for each package.

> It probably would also be useful if the metadata had some standardized
> way to indicate the preferred way to propose changes to either upstream
> or the debian packaging maintainer --- whether it's e-mail to a
> particular e-mail address, or a pull request, etc.

Hm, that's an interesting thought.  I do generally include that sort of
information in the documentation of all packages for which I'm upstream,
but for Debian I've assumed the preferred way to propose changes is the
BTS.  Now that's potentially changing with Salsa.  I don't really mind
monitoring multiple input formats, but some people will.

-- 
Russ Allbery (r...@debian.org)  



Re: [RFC] Proposal for new source format

2019-10-26 Thread Theodore Y. Ts'o
On Wed, Oct 23, 2019 at 11:40:06AM -0700, Russ Allbery wrote:
> Marvin Renich  writes:
> 
> > The source package has historically (prior to the widespread use of VCS)
> > also provided the basis for future development.  Since most development
> > these days is done using VCS, it's natural to try to adapt the source
> > package to contain the VCS.  I believe this is a mistake.  I think the
> > source package should remain a succinct encapsulation of the source
> > required to build a specific version of the binary packages.  It should
> > also identify the canonical VCS location where new development occurs
> > (and from which this snapshot was taken), but it does not need, and
> > should not have, the complete VCS history.
> 
> I think I'm coming around to this position.  I don't think it's the best
> or most elegant design in the abstract, but given where we're starting
> from and the various concerns involved, it does seem like the most
> practical design.

I think we will need to support the source tar.gz for the foreseeable
future.  At the very least, *deprecating* the tar.gz/tar.gz.asc format
should be independent of the question of whether we also support a format
that involves a URL to a git repository plus a signed git commit ID or a
signed git tag.

> That said, I don't like accepting the idea that we're always going to
> point to random different VCSs per package, which may be down,
> inaccessible, deleted by the maintainer, and so forth.  I don't want to
> force anyone to do anything, but I think there is immense value in the Git
> repositories created by dgit from archive uploads, and that value gets
> even stronger if those repositories are enhanced by including the
> maintainer and upstream history where available.

I believe that the hypothetical git source format which involves a git
URL must involve a git server under the Debian project's control.
That is, Debian must keep a permanent archive of the git repository,
regardless of whether or not it is the primary repository for the
purposes of making changes.  Certainly the dgit repositories would
qualify, but potentially other git hosting solutions might qualify.

I do think, though that we should allow the specification of
*multiple* git repositories, with some kind of type specifier so it
can be clear whether a particular repository is just a read-only clone
versus a read/write "master" repository, and whether a
repository+branch is the upstream repository, and/or used by the
Debian maintainers to maintain its packaging.

It probably would also be useful if the metadata had some standardized
way to indicate the preferred way to propose changes to either
upstream or the debian packaging maintainer --- whether it's e-mail to
a particular e-mail address, or a pull request, etc.

   - Ted



Re: [RFC] Proposal for new source format

2019-10-23 Thread Bastian Blank
Hi Ansgar

Thanks for filling in the gaps I left in my explanation.

On Wed, Oct 23, 2019 at 10:15:16AM +0200, Ansgar wrote:
> kernel.org uses a similar scheme: there are signatures for the
> uncompressed tarballs by the maintainer (linux-*.tar.sign).  In addition
> there is a sha256sums.asc which has strong hashes of the compressed
> files (linux-*.tar.{gz,xz}) and is signed by their archive management
> system.

That's what I was thinking about.  My expectation was:

The source .dsc file contains checksums for uncompressed files.  Those
can be built easily in a reproducible way, just as you mentioned that
the output of "git archive" is already pretty stable over time.  Those
are also the values dak uses to find identical files.

The upload .changes file contains checksums of the actually uploaded
compressed files.

The archive Sources file will be filled with checksums for either only
the compressed files or both.
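
To illustrate the split between the two kinds of checksum, here is a
rough sketch on a toy repository.  The file names are hypothetical and
nothing here models the real .dsc, .changes or Sources fields; it only
shows that the uncompressed `git archive` output of a commit hashes the
same each time, while each compressed upload gets its own checksum and
still decompresses to the same stream.

```shell
#!/bin/sh
# Toy repository; file names are illustrative assumptions.
set -e
work=$(mktemp -d)
git init -q "$work/pkg"
cd "$work/pkg"
git config user.name "Example Maintainer"
git config user.email "maintainer@example.org"
echo "int main(void) { return 0; }" > main.c
git add main.c
git commit -q -m "Initial import"

# Uncompressed tar stream: the form the .dsc would checksum.
git archive --format=tar --prefix=pkg-1.0/ HEAD > "$work/pkg_1.0.tar"
plain_sum=$(sha256sum < "$work/pkg_1.0.tar" | cut -d' ' -f1)

# Regenerating the archive from the same commit gives the same checksum.
again_sum=$(git archive --format=tar --prefix=pkg-1.0/ HEAD \
    | sha256sum | cut -d' ' -f1)

# Two uploads of the same stream with different compressor settings:
# these are the files whose checksums the .changes would record.
gzip -9n -c "$work/pkg_1.0.tar" > "$work/upload-a.tar.gz"
gzip -1n -c "$work/pkg_1.0.tar" > "$work/upload-b.tar.gz"
sum_a=$(sha256sum < "$work/upload-a.tar.gz" | cut -d' ' -f1)
sum_b=$(sha256sum < "$work/upload-b.tar.gz" | cut -d' ' -f1)

# Decompressing either upload leads back to the uncompressed checksum.
unpacked_a=$(gzip -dc "$work/upload-a.tar.gz" | sha256sum | cut -d' ' -f1)
unpacked_b=$(gzip -dc "$work/upload-b.tar.gz" | sha256sum | cut -d' ' -f1)

echo "uncompressed:     $plain_sum"
echo "regenerated:      $again_sum"
echo "upload a (gz -9): $sum_a"
echo "upload b (gz -1): $sum_b"
```

Note that the last step is exactly where the decompressor has to run on
data before the uncompressed checksum can be compared.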

The only pieces of software that would be susceptible to attacks on the
decompressor are tools that download a .dsc and the listed source files
directly without going through a Sources file.

Regards,
Bastian

-- 
Schshschshchsch.
-- The Gorn, "Arena", stardate 3046.2



Re: [RFC] Proposal for new source format

2019-10-23 Thread Russ Allbery
Marvin Renich  writes:

> The source package has historically (prior to the widespread use of VCS)
> also provided the basis for future development.  Since most development
> these days is done using VCS, it's natural to try to adapt the source
> package to contain the VCS.  I believe this is a mistake.  I think the
> source package should remain a succinct encapsulation of the source
> required to build a specific version of the binary packages.  It should
> also identify the canonical VCS location where new development occurs
> (and from which this snapshot was taken), but it does not need, and
> should not have, the complete VCS history.

I think I'm coming around to this position.  I don't think it's the best
or most elegant design in the abstract, but given where we're starting
from and the various concerns involved, it does seem like the most
practical design.

That said, I don't like accepting the idea that we're always going to
point to random different VCSs per package, which may be down,
inaccessible, deleted by the maintainer, and so forth.  I don't want to
force anyone to do anything, but I think there is immense value in the Git
repositories created by dgit from archive uploads, and that value gets
even stronger if those repositories are enhanced by including the
maintainer and upstream history where available.  As much as possible, we
should try to centralize this information and these repositories on
project assets.  That doesn't mean the maintainer needs to *only* use
project hosting or even *primarily* use project hosting, but mirroring
this information into project hosting seems like a clear improvement for
everyone.

> One important aspect of this separation is that the VCS can include the
> original, unmodified upstream source, as long as it is redistributable
> in that fashion.  It has always bothered me that the modifications
> needed to convert the upstream source to a DFSG-compatible source are
> lost in the Debian source package.  Keeping the VCS separate allows it
> to contain the original, non-DFSG source and show what was done to make
> it DFSG and why.

Agreed, although that "as long as it is redistributable" caveat is
important.  But you're right, this is a major advantage of separating the
Git repository from the source archive that I wasn't considering in my
message.

> I think the VCS-agnostic aspect of this has not been brought up in the
> related "Git Packaging" thread, but I think this is important.  While
> git is overwhelmingly the most popular VCS, it is not the only one (it's
> also not my preferred VCS for usability reasons).  I think it is
> short-sighted for Debian policy to mandate or even to strongly encourage
> a specific VCS.

I don't really agree with this.  The advantages of encouraging people to
standardize on a VCS are huge.  I'm not saying that we need to require
everyone use Git, but we should strongly encourage it, and a lot of things
in Debian will be much easier if you use Git (and should be).

That said, it may not matter that we don't agree here, since it sounds
like we're aligned on the high-level goal discussed in this thread and you
are arguing for adding an additional layer of abstraction over what I'd
add.  That's fine with me; I'm happy to make compromises like that if we
can get tag2upload support for Git.

> This effectively shuts down the use of any other VCS, and extremely
> hinders attempts to get a new or better VCS to be used for Debian
> development.

I don't believe this is the case because, practically, I believe
interoperability with Git is going to be a mandatory feature for any
future VCS that has any hope of achieving Git's level of support and
usage, regardless of anything Debian does or doesn't do.

> The tools for interacting with the archive, e.g. making it trivial to
> upload a signed commit to be built on official Debian hardware, should
> define a very small set of standard features necessary to accomplish
> that, such as clone, commit, checkout, push, pull, and tag (with
> signature).  This API should have a minimum of options, and it should be
> easy for someone to implement a shim from that API to a specific VCS.

This sounds like a fine (if more complicated) way to support tag2upload.

-- 
Russ Allbery (r...@debian.org)  



Re: [RFC] Proposal for new source format

2019-10-23 Thread Marvin Renich
I think this discussion has conflated two separate needs that should be
kept distinct.  The current source package provides a record of how the
binary packages were built from source.  This includes signatures and
verifiability of source, and, more recently, reproducibility.  It
provides the ability to build the binary packages on a user's own
machine, and can be a starting point for building them in an environment
not supported by Debian (in simple cases this might be exactly the same
as building on a current Debian architecture).

The source package has historically (prior to the widespread use of VCS)
also provided the basis for future development.  Since most development
these days is done using VCS, it's natural to try to adapt the source
package to contain the VCS.  I believe this is a mistake.  I think the
source package should remain a succinct encapsulation of the source
required to build a specific version of the binary packages.  It should
also identify the canonical VCS location where new development occurs
(and from which this snapshot was taken), but it does not need, and
should not have, the complete VCS history.

I do believe that Debian should strongly encourage use of a publicly
accessible VCS for packaging, and should define some VCS-agnostic
standards that the repositories SHOULD (RFC implications) follow, e.g.
basic branch structures, tag naming conventions, etc.

Separating the source package from the VCS repository, but having both,
allows both Russ' and Russell's goals to be met easily.

One important aspect of this separation is that the VCS can include the
original, unmodified upstream source, as long as it is redistributable
in that fashion.  It has always bothered me that the modifications
needed to convert the upstream source to a DFSG-compatible source are
lost in the Debian source package.  Keeping the VCS separate allows it
to contain the original, non-DFSG source and show what was done to make
it DFSG and why.  It will also help to facilitate using a single
repository to build both a source package for main and a corresponding
source package for non-free, when that is appropriate.

[Perhaps the following should be moved to the Git Packaging thread, but
I think much of this sub-thread could also be said to belong there.]

I think the VCS-agnostic aspect of this has not been brought up in the
related "Git Packaging" thread, but I think this is important.  While
git is overwhelmingly the most popular VCS, it is not the only one (it's
also not my preferred VCS for usability reasons).  I think it is
short-sighted for Debian policy to mandate or even to strongly encourage
a specific VCS.  This effectively shuts down the use of any other VCS,
and extremely hinders attempts to get a new or better VCS to be used for
Debian development.

Debian should separate everyday development with the VCS from
interaction between the VCS and the Debian archive.  The tools provided
for the former can be very VCS-specific, and can make full use of that
VCS' features.

The tools for interacting with the archive, e.g. making it trivial to
upload a signed commit to be built on official Debian hardware, should
define a very small set of standard features necessary to accomplish
that, such as clone, commit, checkout, push, pull, and tag (with
signature).  This API should have a minimum of options, and it should be
easy for someone to implement a shim from that API to a specific VCS.

This separation would allow much greater freedom of choice, but still
allow the developer to take full advantage of the chosen VCS.

...Marvin



Re: [RFC] Proposal for new source format

2019-10-23 Thread Sean Whitton
Hello,

On Wed 23 Oct 2019 at 09:49AM -04, Theodore Y. Ts'o wrote:

> Generating a reproducible source package given a particular git commit
> is trivial.  All you have to do is use "git archive".  For example:
>
> #!/bin/bash
> #
> # Generate the e2fsprogs release tar ball
> #
>
> commit=HEAD
>
> if test -n "$1" ; then
>     commit="$1"
> fi
>
> ver=`git show ${commit}:version.h | grep E2FSPROGS_VERSION  \
>   | awk '{print $3}' | tr \" " " | awk '{print $1}'`
> fn=e2fsprogs-${ver}.tar.gz
>
> git archive --prefix=e2fsprogs-${ver}/ ${commit} | gzip -9n > $fn
> echo "Generated $fn"
>
> Note that most of the hair is in deciding what to *name* the source tar
> file.

Just for the benefit of those that might not know, this is implemented
in git-deborig in devscripts (though perhaps git-deborig doesn't quite
fit e2fsprogs's needs).

-- 
Sean Whitton




Re: [RFC] Proposal for new source format

2019-10-23 Thread Theodore Y. Ts'o
On Wed, Oct 23, 2019 at 11:18:24AM +1000, Russell Stuart wrote:
> On Tue, 2019-10-22 at 16:52 -0700, Russ Allbery wrote:
> > That seems excessively pessimistic.  What about Git makes you think
> > it's impossible to create a reproducible source package?
> 
> Has it been done?  Given this point has been raised several times
> before if it hasn't been done by now I think it's reasonable to assume
> it's difficult, and thinking that it's so is not excessively
> pessimistic.

Generating a reproducible source package given a particular git commit
is trivial.  All you have to do is use "git archive".  For example:

#!/bin/bash
#
# Generate the e2fsprogs release tar ball
#

commit=HEAD

if test -n "$1" ; then
    commit="$1"
fi

ver=`git show ${commit}:version.h | grep E2FSPROGS_VERSION  \
    | awk '{print $3}' | tr \" " " | awk '{print $1}'`
fn=e2fsprogs-${ver}.tar.gz

git archive --prefix=e2fsprogs-${ver}/ ${commit} | gzip -9n > $fn
echo "Generated $fn"

Note that most of the hair is in deciding what to *name* the source tar
file.

- Ted



Re: [RFC] Proposal for new source format

2019-10-23 Thread Ian Jackson
Russ Allbery writes ("Re: [RFC] Proposal for new source format"):
> That said, Bastian's point about what we should do if we find that the Git
> repository contains something that isn't distributable is valid and needs
> to be dealt with regardless.  I think one of our points of disagreement is
> that I don't see how this is a concern specific to the archive; we already
> have this problem because Salsa is an official project service, so we need
> to solve this problem for arbitrary Git repositories already.

It is also a problem that the dgit git service could face.  That is
also an official project service.  I have anticipated the potential
need to deal with this issue.

With the assistance of the dgit server admin, a maintainer can rewrite
the history; and the server admin can blocklist the troublesome git
objects (which will prevent them from ever being pushed again).  The
server admin can also of course simply delete a package entirely.

So far this has not been necessary.  I don't know how often a similar
situation has arisen with alioth and now salsa.

(I agree with everything else you wrote, too.)

Thanks,
Ian.

-- 
Ian Jackson   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Re: [RFC] Proposal for new source format

2019-10-23 Thread Ansgar
Simon McVittie writes:
> On Tue, 22 Oct 2019 at 05:22:57 +0200, Bastian Blank wrote:
>> - Files need to be compressed and are recorded as such, which is a hard
>>   problem and give rise to tools like pristine-tar and such.
>
> My understanding is that this is deliberate: it means the only layer
> with the hard requirement to be able to cope with malicious/crafted files
> without introducing security vulnerabilities (whether that means arbitrary
> code execution via parser bugs, or denial of service via "zip bombs")
> is the PGP signature verification on the (uncompressed) .dsc. Everything
> else is authenticated before being decompressed, either directly via
> the PGP signature or via the authenticated hashes in the .dsc.

I think there are two separate uses:

 - if you want to validate that the upload is as intended by the
   maintainer, then a signature of the uncompressed source is
   sufficient. (A signature over the compressed source works too if you
   do not want to switch to a new compression format later.)

 - for all other purposes (regular downloads, ...), one would like a
   signature over the data that is used, i.e. usually for downloads of
   the compressed variant.

kernel.org uses a similar scheme: there are signatures for the
uncompressed tarballs by the maintainer (linux-*.tar.sign).  In addition
there is a sha256sums.asc which has strong hashes of the compressed
files (linux-*.tar.{gz,xz}) and is signed by their archive management
system.

As far as I understand git-archive is fairly good at reproducing
identical uncompressed tarballs at a later time from the git repository.

Ansgar



Re: [RFC] Proposal for new source format

2019-10-23 Thread Russ Allbery
Russell Stuart  writes:
> On Tue, 2019-10-22 at 20:21 -0700, Russ Allbery wrote:

>> This history has at least one commit per upload, although ideally has
>> the package maintainer's full revision history and upstream's full
>> revision history.

> I understand you like the history.  A lot of people do.  But not
> everyone values it, and I don't.

The nice thing about having the Git repository is that you can choose.  If
you don't care about the history, make a shallow clone, and your clone can
still interact with the repository with all of the standard tools.
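
For anyone who hasn't tried it, a small sketch of what that choice looks
like, using a throwaway local repository in place of a real packaging
repository (the paths and commit messages are made up).  The file:// URL
is there because Git ignores --depth when cloning from a plain local
path.

```shell
#!/bin/sh
# Throwaway repositories under a temporary directory; illustrative only.
set -e
work=$(mktemp -d)
git init -q "$work/full"
cd "$work/full"
git config user.name "Example Maintainer"
git config user.email "maintainer@example.org"

# Three commits standing in for three uploads.
for n in 1 2 3; do
    echo "revision $n" > file.txt
    git add file.txt
    git commit -q -m "Upload $n"
done

# Fetch only the most recent commit.
git clone -q --depth 1 "file://$work/full" "$work/shallow"

full_count=$(git -C "$work/full" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full history: $full_count commits; shallow clone: $shallow_count"

# The shallow clone is still an ordinary working repository.
git -C "$work/shallow" log --oneline
```

If the history turns out to be wanted later, `git fetch --unshallow` in
the shallow clone retrieves the rest.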

There's a good point in there, though, that a shallow clone assumes that
you have a Git client interacting with a repository, which may be a good
argument for not trying to provide the features that I want directly in
the archive.  I agree that there's a size cost to having the source format
be a tarball of a Git repository with full history.  I'm not sure the size
cost is enough to matter, but I get why you view that with concern.

For the record, I use history extensively with packages where I'm
literally the only developer (Debian and upstream).  I'm not going to try
to convince you that you should do the same, but I will try to convince
you that Debian should not hamper my use cases (which I feel is currently
the case).

> That's a perfectly understandable perspective from a Debian Developer. 
> But let's take a different perspective, that of a Debian user installing
> audited-crypto-program-x. What you are dismissing as "artefacts" is
> exactly the information the person installing this needs to assure
> themselves the Debian version of audited-crypto-program-x is a
> reasonably faithful reproduction of the original.  If the packaging is
> done well it will be broken down into small changes, each with a
> documented purpose.

I don't agree with this statement.  I think you're muddling auditing and
reproducibility.

The question reproducibility answers is "is this source package an
accurate and untampered copy of the combination of upstream source and
Debian packaging that the Debian package maintainer intended to put
together."  In other words, it's a question of supply-chain security.
Checking reproducibility only back to a set of patches does *not* provide
a real guarantee of reproducibility, since a supply-chain attack could
still have introduced malicious code in the patch generation process.  You
have to trace the provenance all the way to the maintainer's working tree,
which is what a signed Git tag will do.  Your proposed transformation
leaves a supply-chain security gap.

You're talking about an audit, which is when you open the hood and
determine whether you trust the Debian package maintainer or their work.
I agree that's *also* important.  I disagree with the assertion that a set
of patches is the best format for the information required to do an audit;
I'd much rather have a Git repository with full history, from which those
patches and much more can be easily derived.  But, either way, this is a
somewhat rarer use case than reproducibility (which is ideally checked for
every package continuously, since it's a security control preventing a
type of supply-chain attack).

> The point of defining the process of constructing the Debian source
> representation as a "pure function" is to guarantee it faithfully
> reflects the original source and documented changes _only_ - not
> some random crap living in stale state carried across from years ago.

I understand why you want this specific artifact.  I'm objecting to what
feels, from my perspective, like an argument for dropping all of the
features that I want and retaining only the feature that you want, when
you can derive the feature that you want (at some additional complexity
cost, to be sure) from the format that I'm arguing for.

-- 
Russ Allbery (r...@debian.org)  



Re: [RFC] Proposal for new source format

2019-10-22 Thread Russell Stuart
On Tue, 2019-10-22 at 20:21 -0700, Russ Allbery wrote:
> This history has at least one commit per upload, although ideally 
> has the package maintainer's full revision history and upstream's 
> full revision history.

I understand you like the history.  A lot of people do.  But not
everyone values it, and I don't.  The only uses I've found for it are
git-bisect, reversing hasty deletes, and auditing who contributed what
which is a handy weapon in a court room copyright battle.  I can count
the number of times I've done all of those things in my life on one
hand.  Regular backups do those jobs almost as well, and I have to do
them anyway.

Source code control becomes a real time saver when you have a lot of
people working on the same source - I'd go so far as to say
indispensable for that case.  Such merging of histories needs a small
amount of history to work with of course, and that makes that small amount
of history equally indispensable.  However the typical Debian Developer
scenario of the "lot of people" being you and upstream is a fairly
degenerate case, so there is understandably some argument about
whether a heavyweight tool like git adds much.  If you like it that's
great - but others thinking it's not worth the bother is also great.  I
doubt anybody who just wants a one off copy of the source is going to
see much in the way of greatness.

On Tue, 2019-10-22 at 20:21 -0700, Russ Allbery wrote:
> I don't agree with this definition of reproducibility.  You're
> defining reproducibility from inputs that I consider build artifacts,
> which to me is rather weird.

That's a perfectly understandable perspective from a Debian Developer. 
But let's take a different perspective, that of a Debian user installing
audited-crypto-program-x. What you are dismissing as "artefacts" is
exactly the information the person installing this needs to assure
themselves the Debian version of audited-crypto-program-x is a
reasonably faithful reproduction of the original.  If the packaging is
done well it will be broken down into small changes, each with a
documented purpose.

(None of this is rocket science or new; we are fairly close to it now.
One of the reasons I am writing this is that it would be nice if we got
better at it - and definitely did not get worse at it.)

The point of defining the process of constructing the Debian source
representation as a "pure function" is to guarantee it faithfully
reflects the original source and documented changes _only_ - not
some random crap living in stale state carried across from years ago.

From my perspective there are lots of ways a Debian developer could
store this stuff.  Quilt patches with their headers are one and they
work well enough from this perspective - but a Git repository with
branches representing those same changes works equally well, although
it would be nice if a git branch (as opposed to a commit) could have a
rant associated with it about why it is there akin to the quilt patch
header.  (I guess this would be trivial to add by insisting each branch
adds a description file to the debian/branches directory.)  The Debian
developer is free to use whatever representation works best for them,
so long as when I download it, I can easily see the Debian version of
openssl contained a patch that changed its random number generator to
getpid(), along with the reason why.

AFAICT, dgit does not address this, at all.  It's written purely from
the perspective of the Debian developer.



Re: [RFC] Proposal for new source format

2019-10-22 Thread Sean Whitton
Hello,

On Tue 22 Oct 2019 at 08:21PM -07, Russ Allbery wrote:

> In looking this over, none of this precludes the source format 4.0 that
> Bastian proposed, provided that there was some way to export that source
> format easily and simply from point 4.  Maybe it doesn't matter what's
> published in the source repository if everyone who wants this workflow
> uses some other service to interact with the Git repositories instead.  If
> this were available, I personally would stop using Debian source packages
> entirely and forget that they even exist, and would only use the above
> workflow.  Source packages then become an internal implementation detail
> of the archive that no one needs to care about unless they want to, or
> unless they're maintaining the dgit import service.
>
> It feels inelegant to me to have multiple publication mechanisms and
> multiple canonical formats and the ongoing cost of conversion from one to
> the other, but maybe that's already a sunk cost and it's worth paying it
> to avoid having tedious arguments?

Those of us working on dgit used to talk about eliminating source
packages.  More recently, we've come to the conclusion that it is not
helpful to think in those terms at all.  Source packages are just too
deeply embedded in all sorts of places, and there are edge cases, such
as packages with large binaries which we wouldn't want to check into
git.

Instead, we're now thinking of the work in the terms that you used --
trying to make it possible to interact with the archive using only gittish
workflows and gittish ways of thinking.

-- 
Sean Whitton



Re: [RFC] Proposal for new source format

2019-10-22 Thread Russ Allbery
Russell Stuart  writes:

> Has it been done?  Given this point has been raised several times before
> if it hasn't been done by now I think it's reasonable to assume it's
> difficult, and thinking that it's so is not excessively pessimistic.

Oh, it's news to me that anyone has raised this before.  I was assuming no
one had bothered to try yet because it wasn't relevant.

Intuitively it feels like a much easier problem than reproducible binaries
given the nature of a Git repository.  The hardest part is probably the
same as with tar: how to keep the output reproducible over time as Git
changes.  I haven't tried it, though.

> I personally wonder how the mirrors are expected to handle .git
> repositories.  That would increase the number of files they have to
> handle by a couple of orders of magnitude.  What are the plans for that?
> Maybe you think they can handle it?  Maybe you plan to abandon the
> mirror network in favour of something else like the CDN?  Maybe you plan
> to remove the source from the mirrors?

I was implicitly assuming the actual source format would be some archive
of the Git repository rather than the raw Git repository.  I agree that
distributing raw Git repositories and thus tons of separate files per
source package doesn't sound like a good idea.  Although that does mean
that one can't just point a Git client at the archive, which would have
been neat.  More on that below.

Agreed that it's worth saying that explicitly, and it might be worth some
thought on what the best archive format would be, since tar has proven
troublesome for reproducibility.
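
Most of the trouble with tar comes from metadata that varies between runs:
member order, timestamps, and ownership.  A minimal sketch, assuming the
source tree is already in memory as a mapping of path to bytes and using
only Python's standard tarfile module; the helper names deterministic_tar
and digest are hypothetical, and this does not address how output drifts
across tool versions over time:

```python
import hashlib
import io
import tarfile


def deterministic_tar(files):
    """Pack {path: bytes} into an uncompressed tar with normalized metadata."""
    buf = io.BytesIO()
    # Pin the tar format so the result does not depend on the library default.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for path in sorted(files):          # stable member order
            data = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = 0                  # no timestamps
            info.mode = 0o644               # fixed permissions
            info.uid = info.gid = 0         # no builder identity
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def digest(files):
    """SHA-256 of the normalized tarball, for comparing two builds."""
    return hashlib.sha256(deterministic_tar(files)).hexdigest()
```

Two calls with the same mapping, in any insertion order, give the same
digest.  Compression is deliberately left out, since compressors add their
own sources of variation.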

I gave this some more thought over dinner and realized that my previous
message wasn't very constructive.  Let me try to make up for that by
describing what my goals are.  In writing this up, I realized that these
goals may not need to be met by the archive.  It feels awkward and less
than ideal to me to have multiple distribution points for source packages
in different formats, but it could be less awkward than the alternatives,
I suppose.

My goals (some of which are already met by dgit) are:

1. Every package in Debian has a canonical representation of its source
   history in Git, with a branch structure that reflects the divergence
   between different archive suites.  This history has at least one commit
   per upload, although ideally has the package maintainer's full revision
   history and upstream's full revision history.

2. This representation is readily available in some straightforward way
   (git clone would be ideal, some equivalently simple tool would be
   fine).

3. Every uploaded package clearly and unambiguously maps to a signed tag
   in the Git repository in the appropriate place in the revision
   history.

4. It's possible to upload a new version of a package to Debian (if one
   has the relevant permissions) by adding a signed tag and pushing to
   some Git remote.  If that upload is successful (which at least involves
   permission and sanity checks and ideally involves a test suite), that
   new upload appears in the canonical Git repository.  This should not
   require rewriting the branch or tag relative to the maintainer's local
   repository; in other words, it should match the Git tree that the
   maintainer tagged.

All of these together allow us to interact with the archive the way that
is now common to interact with other large Git projects, following any of
the standard Git workflows and using Git as the native tool for expressing
changes and tagging releases.  (At this point, I think it's safe to say
that Git has sufficiently won the VCS wars that any future wildly popular
VCS will have some mechanism to bidirectionally interact with Git
repositories.)

I believe dgit already does 1-3.  tag2upload would achieve 4.
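
Goal 3 depends on a one-to-one mapping from a package version to a Git tag
name, and Debian versions can contain characters that Git refuses in ref
names.  A minimal sketch, assuming the version mangling rules described in
DEP-14 (an assumption: nothing in this thread specifies the tag naming);
the function names and the "debian/" prefix default are illustrative only:

```python
import re


def mangle_version(version):
    """Make a Debian version safe for use in a Git ref name (DEP-14 rules)."""
    # ":" and "~" are not allowed in ref names.
    mangled = version.replace(":", "%").replace("~", "_")
    # Refs may not contain "..", end in ".", or end in ".lock".
    return re.sub(r"\.(?=\.|$|lock$)", ".#", mangled)


def upload_tag(version, prefix="debian/"):
    """Tag name that an upload of `version` would be expected to carry."""
    return prefix + mangle_version(version)
```

With these rules, upload_tag("1:2.0~rc1-1") gives "debian/1%2.0_rc1-1".
The mapping is reversible, which is what makes the tag unambiguous.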

In looking this over, none of this precludes the source format 4.0 that
Bastian proposed, provided that there was some way to export that source
format easily and simply from point 4.  Maybe it doesn't matter what's
published in the source repository if everyone who wants this workflow
uses some other service to interact with the Git repositories instead.  If
this were available, I personally would stop using Debian source packages
entirely and forget that they even exist, and would only use the above
workflow.  Source packages then become an internal implementation detail
of the archive that no one needs to care about unless they want to, or
unless they're maintaining the dgit import service.

It feels inelegant to me to have multiple publication mechanisms and
multiple canonical formats and the ongoing cost of conversion from one to
the other, but maybe that's already a sunk cost and it's worth paying it
to avoid having tedious arguments?

That said, Bastian's point about what we should do if we find that the Git
repository contains something that isn't distributable is valid and needs
to be dealt with regardless.  I think one of our points of disagreement is
that I don't see how this 

Re: [RFC] Proposal for new source format

2019-10-22 Thread Russell Stuart
On Tue, 2019-10-22 at 16:52 -0700, Russ Allbery wrote:
> That seems excessively pessimistic.  What about Git makes you think
> it's impossible to create a reproducible source package?

Has it been done?  Given this point has been raised several times
before if it hasn't been done by now I think it's reasonable to assume
it's difficult, and thinking that it's so is not excessively
pessimistic.

I personally wonder how the mirrors are expected to handle .git
repositories.  That would increase the number of files they have to
handle by a couple of orders of magnitude.  What are the plans for
that?  Maybe you think they can handle it?  Maybe you plan to abandon
the mirror network in favour of something else like the CDN?  Maybe
you plan to remove the source from the mirrors?

Finally, there are more consumers of the source format than the Debian
packagers.  For example, I regularly download Debian source packages
just to figure why the hell something isn't working as I expect.  When
I do that, there are two things that are important to me:

1.  The download is as small as possible, and doesn't require a
specialised tool.  (Github and gitlab go to the trouble of 
providing just such a thing, which I think is evidence it's
needed.)  The current format is pretty good in this area.  At
a pinch you can get away without using dpkg-source to unpack it.

2.  The point that has been raised here - reproducible builds of the
source package.  By that I mean a reproducible build should be a
pure function that is given the upstream source package and some
data in the form of patches or whatever, and ends up with the
source and build instructions.  Being a pure function it always
produces the same outputs given the same inputs.

Unfortunately Debian doesn't always do a good job of this 
currently, albeit for good reasons - we can't distribute the 
upstream source package so DDs rebuild it, but they are allowed
to do so in any way they please.
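
The "pure function" in point 2 can be sketched as follows.  This is an
illustration under simplifying assumptions, not a description of any
existing tool: trees are mappings of path to bytes, and each change is a
whole-file replacement (or None for a deletion) standing in for a real
patch.  The names build_source and tree_digest are hypothetical.

```python
import hashlib


def build_source(upstream, changes):
    """Pure function: (upstream tree, ordered changes) -> packaged tree.

    `upstream` maps path -> bytes.  Each change maps path -> bytes, or
    path -> None to delete.  No clock, environment or disk state is read.
    """
    tree = dict(upstream)                   # never mutate the input
    for change in changes:
        for path, content in change.items():
            if content is None:
                tree.pop(path, None)
            else:
                tree[path] = content
    return tree


def tree_digest(tree):
    """Order-independent digest of a tree, for comparing two builds."""
    h = hashlib.sha256()
    for path in sorted(tree):
        name = path.encode()
        data = tree[path]
        # Length prefixes keep (path, data) boundaries unambiguous.
        h.update(len(name).to_bytes(8, "big"))
        h.update(name)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()
```

Because the result depends only on the arguments, anyone holding the same
upstream tree and the same list of changes can recompute the digest and
compare it with the published one.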

Any source format that handled the issues above would get the thumbs up
from me.  (Interestingly, despite the hairs it has in other areas, the
rpm source format has always done well on those issues.)
Unfortunately Bastian's proposal doesn't address them directly.



Re: [RFC] Proposal for new source format

2019-10-22 Thread Russ Allbery
Bastian Blank  writes:
> On Mon, Oct 21, 2019 at 09:29:05PM -0700, Russ Allbery wrote:

>> If we're going to go to the trouble of defining a new source format,
>> I'd prefer we embrace a VCS-based one rather than once again rolling
>> our own idiosyncratic representation of a tree of files

> I'm not completely sure what you mean with "VCS-based".  You want to add
> a complete repository (dump) to the source?  Do we need to define a
> subformat for each VCS then?  CVS, SVN, GIT, just to name some used
> ones.  In any case we would be defining our own representation anyway,
> because each VCS behaves differently.

I think it's safe at this point to just use Git.  It's the dominant VCS by
far and seems likely to remain so for the foreseeable future.  We'll need to
have some mechanism to generate a simple Git tree from source packages in
some other format, but that's not really a problem.  Right now, we convert
100% of non-native packages to a different format than their original.  If
we used Git as a VCS format, we would be closer to most of our upstreams
and the difference for non-Git upstreams doesn't seem too significant.

If at some point something else takes over from Git, we could always
switch to a 5.0 format.

> Also this would negate all the things we've accomplished on
> reproducibility of source packages.

That seems excessively pessimistic.  What about Git makes you think it's
impossible to create a reproducible source package?

> We never shipped history as part of our source.  Was this asked for?

We've always shipped one version of history as part of our source.  That's
a large part of the point of separate upstream and Debian tarballs.  With
the addition of quilt, we ship even more history in the form of the patch
sequence.

And yes, this has been repeatedly requested and wanted by the project
going all the way back to Joey Hess's original proposal for the 3.0 (git)
source package format.  I think that was at least ten years ago?

Those of us who wanted it then haven't stopped wanting it.

> dpkg currently supports "3.0 (git)" as format, however it was never
> accepted by the archive.

To be clear, many of us would be happily using it right now.  It wasn't
accepted by the archive because the archive team vetoed it.

> There is a reason for that, as this would force license reviews not only
> on the current state, but on the history as well.  We would also just
> distribute arbitrary information we don't actually need to ship to
> hundreds of unrelated mirror operators and would put them in jeopardy.

> If something really problematic slips in, we also would be forced to
> remove all intermediate versions, because they ship the history.

I understand the concerns with shipping *all* of the history, and I think
we'll need to get somewhat creative about what history to include and what
history to elide if we still have concerns about non-free elements
sneaking into the archive via history (which I'm dubious about, but see
below).  But Git has mechanisms to handle this (shallow clones, for
instance) that still preserve some of the utility of having a native Git
package.

> I think we are talking about different things.  I'm talking about the
> source we _must_ provide to fulfil several licenses and our own policy.
> If we save them in the form of snapshots.d.o for example, we have a
> complete history of the releases.

> You are talking about the detailed history, a history that might not
> even be accurate, as it can be changed retrospectively.

To be clear, I think including the history is just one of many advantages
to basing the source format on Git.  The overall advantage is that for
many packages the Debian source package becomes a familiar construct,
rather than some idiosyncratic invention of Debian, that can be
manipulated with standard tools and that is far closer to something that
one can immediately start hacking on.  This has huge benefits even if we
ship only a shallow clone with only one revision of history.

It has more benefits if we can include history, of course.

It also more clearly unblocks releasing via pushing signed tags, which is
the way that many, if not most by total number (if not by significance),
free software packages do releases these days, thus lowering the barrier
to entry for people packaging for Debian and again standardizing on common
tools.

> Just think about what would happen if a contributor adds code he must
> not distribute for whatever reason.  Another contributor finds this and
> removes it before any release happens.  This shows up some time later
> and we get angry mails or letters stating we ship stuff we must not.  So
> we now need to purge this information everywhere, even if it was never
> inside a release.

How do you plan to deal with this problem with Salsa right now?  Can't the
archive use the same mechanism that Salsa would?

There are also plenty of packages where the risk of this happening seems
low and where the Debian package maintainer 

Re: [RFC] Proposal for new source format

2019-10-22 Thread Thomas Goirand
Hi Bastian,

thanks for this proposal,

On 10/22/19 5:22 AM, Bastian Blank wrote:
> During the tag2upload discussion, I think it became clear that it does not
> fit anywhere.  And my position is that we can't implement such a
> service properly without some core changes to what our sources look like.

A few years back, I was building OpenStack packages on each push to the
repository. I was using Jenkins, I still do, but I've stopped doing
this, because Jenkins is too insecure to leave it open to the world.
Though, I didn't need a new source package to implement this. I just
needed Git, and a bit of scripting.

All that to say: I really fail to understand why you think we need a new
source package format to implement tag2upload. Could you explain further?

I also fail to understand the current limitations that you're trying to
lift, and what issue you're trying to solve.

Cheers,

Thomas Goirand (zigo)



Re: [RFC] Proposal for new source format

2019-10-22 Thread Bastian Blank
Hi Russ

On Mon, Oct 21, 2019 at 09:29:05PM -0700, Russ Allbery wrote:
> If we're going to go to the trouble of defining a new source format, I'd
> prefer we embrace a VCS-based one rather than once again rolling our own
> idiosyncratic representation of a tree of files

I'm not completely sure what you mean with "VCS-based".  You want to add
a complete repository (dump) to the source?  Do we need to define a
subformat for each VCS then?  CVS, SVN, GIT, just to name some used
ones.  In any case we would be defining our own representation anyway,
because each VCS behaves differently.

Also this would negate all the things we've accomplished on
reproducibility of source packages.

> their history.

We never shipped history as part of our source.  Was this asked for?

dpkg currently supports "3.0 (git)" as format, however it was never
accepted by the archive.  There is a reason for that, as this would
force license reviews not only on the current state, but on the history
as well.  We would also just distribute arbitrary information we don't
actually need to ship to hundreds of unrelated mirror operators and would
put them in jeopardy.

If something really problematic slips in, we also would be forced to
remove all intermediate versions, because they ship the history.

> Even
> if one is not convinced of the merits of uploading a true Git history of
> the package, I think it's clear that a lot of us *do* want to do this and

I think we are talking about different things.  I'm talking about the
source we _must_ provide to fulfil several licenses and our own policy.
If we save them in the form of snapshots.d.o for example, we have a
complete history of the releases.

You are talking about the detailed history, a history that might not
even be accurate, as it can be changed retrospectively.

Just think about what would happen if a contributor adds code he must
not distribute for whatever reason.  Another contributor finds this and
removes it before any release happens.  This shows up some time later
and we get angry mails or letters stating we ship stuff we must not.  So
we now need to purge this information everywhere, even if it was never
inside a release.

You would still need to prune it from the one (few?) central VCS, if you
use such a thing.  But the project doesn't need to handle the hundreds of
public copies out there, owned by hundreds of operators providing us
with mirror space.

> see this as a very valuable step forward in providing a more complete
> history of the package and of decisions made in maintaining that package.

Even then you need additional information like the BTS to know the
decisions.

I think I understand what you mean and we have different goals.

I want to modify how we ship the source we _must_ ship, where we don't
have the option not to.  Just make the handling of it less painful,
without sacrificing too many of the things we currently have.

You want to ship more info in immutable form.  Info that has the
ability to bite us, the whole project, and many other people, just by
distributing it.

I hope this makes clear why I proposed what I did and nothing
further.

Regards,
Bastian

-- 
Punishment becomes ineffective after a certain point.  Men become insensitive.
-- Eneg, "Patterns of Force", stardate 2534.7



Re: [RFC] Proposal for new source format

2019-10-22 Thread Simon McVittie
On Tue, 22 Oct 2019 at 05:22:57 +0200, Bastian Blank wrote:
> - Files need to be compressed and are recorded as such, which is a hard
>   problem and gives rise to tools like pristine-tar and such.

My understanding is that this is deliberate: it means the only layer
with the hard requirement to be able to cope with malicious/crafted files
without introducing security vulnerabilities (whether that means arbitrary
code execution via parser bugs, or denial of service via "zip bombs")
is the PGP signature verification on the (uncompressed) .dsc. Everything
else is authenticated before being decompressed, either directly via
the PGP signature or via the authenticated hashes in the .dsc.
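
The ordering described above, authenticate first and decompress second, can
be sketched like this.  It assumes the expected hash was taken from a .dsc
whose PGP signature has already been verified, a step that is outside this
sketch, and that the file is xz-compressed; verified_decompress is a
hypothetical name, not part of dpkg or dak:

```python
import hashlib
import hmac
import lzma


def verified_decompress(blob, expected_sha256):
    """Decompress `blob` only if it matches a hash from a verified .dsc."""
    actual = hashlib.sha256(blob).hexdigest()
    # The decompressor never sees bytes that were not authenticated.
    if not hmac.compare_digest(actual, expected_sha256):
        raise ValueError("checksum mismatch: refusing to decompress")
    return lzma.decompress(blob)
```

A crafted file aimed at a decompressor bug or a "zip bomb" is rejected at
the hash comparison, so only the hashing and signature code ever handles
untrusted input.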

smcv



Re: [RFC] Proposal for new source format

2019-10-22 Thread Sam Hartman


Hi.
My initial reaction is that this is additional complexity in a direction
that we don't need.

Like Russ, I generally assume that VCS-like things are the future.
I understand there is complexity there.

But I don't understand why this proposed format would be a step forward
in a world where we care more about VCSes.  As an example, I don't
understand how this would make things better for tag2upload.

I don't think this proposal is sufficiently well developed that you're
going to get much good feedback on debian-devel.

My recommendation:

1) Consider Russ's comment.  Consider whether you still want to go
forward.

2) If so, find a small group of people who think your idea sounds good.
Focus on fleshing things out.  Include use cases and how this would make
the world better.

3) Solicit review on such a fleshed out proposal.

--Sam



Re: [RFC] Proposal for new source format

2019-10-22 Thread Bernd Zeimetz



On 10/22/19 6:29 AM, Russ Allbery wrote:
> Bastian Blank  writes:
> 
>> I would like to have some comments on a large revision on the source
>> format.  It also needs modifications to dak to handle some parts of it.
> 
>> - Source format version would be "4.0".
>> - Each source includes an arbitrary number of "tar" layers, which are
>>   applied sequentially and override files from previous steps.  I'm not
>>   sure if we need tombstones to be able to remove files.
> 
> If we're going to go to the trouble of defining a new source format, I'd
> prefer we embrace a VCS-based one rather than once again rolling our own
> idiosyncratic representation of a tree of files and their history.  Even
> if one is not convinced of the merits of uploading a true Git history of
> the package, I think it's clear that a lot of us *do* want to do this and
> see this as a very valuable step forward in providing a more complete
> history of the package and of decisions made in maintaining that package.

Ack!

Although a layered format might come in handy if docker and friends would
be able to handle it. But I do NOT think we should go that way...



-- 
 Bernd Zeimetz                            Debian GNU/Linux Developer
 http://bzed.de                                http://www.debian.org
 GPG Fingerprint: ECA1 E3F2 8E11 2432 D485  DD95 EB36 171A 6FF9 435F



Re: [RFC] Proposal for new source format

2019-10-21 Thread Russ Allbery
Bastian Blank  writes:

> I would like to have some comments on a large revision on the source
> format.  It also needs modifications to dak to handle some parts of it.

> - Source format version would be "4.0".
> - Each source includes an arbitrary number of "tar" layers, which are
>   applied sequentially and override files from previous steps.  I'm not
>   sure if we need tombstones to be able to remove files.

If we're going to go to the trouble of defining a new source format, I'd
prefer we embrace a VCS-based one rather than once again rolling our own
idiosyncratic representation of a tree of files and their history.  Even
if one is not convinced of the merits of uploading a true Git history of
the package, I think it's clear that a lot of us *do* want to do this and
see this as a very valuable step forward in providing a more complete
history of the package and of decisions made in maintaining that package.

-- 
Russ Allbery (r...@debian.org)  



[RFC] Proposal for new source format

2019-10-21 Thread Bastian Blank
Hi

Debian, in the form of dpkg, has a rather strict view of what our source
packages should look like.

- Files need to be compressed and are recorded as such, which is a hard
  problem and gives rise to tools like pristine-tar and such.
- Different formats require different version formats.  The native
  sub-formats only allow native versions for example, leading to version
  mangling for packages like "linux-signed-*".
- Restrictions on which files need to be in which tar for the quilt
  sub-format.
- The quilt sub-format can't be built without a correct orig tar.
- We have a large number of git repositories that consist of
  source+patch, instead of readily usable source, because using quilt is
  the only option we have.

During the tag2upload discussion, I think it became clear that it does not
fit anywhere.  And my position is that we can't implement such a
service properly without some core changes to what our sources look like.

I would like to have some comments on a large revision on the source
format.  It also needs modifications to dak to handle some parts of it.

- Source format version would be "4.0".
- Each source includes an arbitrary number of "tar" layers, which are
  applied sequentially and override files from previous steps.  I'm not
  sure if we need tombstones to be able to remove files.
- The tar files can be named the following ways:
  - $package_$version.tar
  - $package_$version.*.tar
  - $package_$upstreamversion.tar
  - $package_$upstreamversion.*.tar
- All files are recorded with their uncompressed checksum and the
  transport compression used is recorded in the dsc file.
- Either the existence of debian/patches/series or an explicit
  sub-format would do backward compatible setup with quilt.
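
The layering rule in the list above, where tars are applied in order and
later files override earlier ones, can be sketched as follows.  This is an
illustration of the idea only, since there is no proof of concept: layers
are in-memory uncompressed tars, only regular files are handled, and the
names make_layer and apply_layers are hypothetical.

```python
import io
import tarfile


def make_layer(files):
    """Build an in-memory tar layer from {path: bytes} (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def apply_layers(layers):
    """Apply tar layers in order; later layers override earlier files."""
    tree = {}
    for layer in layers:
        with tarfile.open(fileobj=io.BytesIO(layer), mode="r:") as tar:
            for member in tar:
                if member.isfile():
                    tree[member.name] = tar.extractfile(member).read()
    return tree
```

Note that a later layer can replace a file but has no way to remove one,
which is exactly the open tombstone question.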

dpkg-buildpackage would build a subset of this spec:
- .orig.tar
- .orig-*.tar
- .debian.tar

Other tools like the proposed tag2upload service can produce a different
subset.  They could ignore .orig if they don't have the info for example.
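
Going back to the checksum item in the format list: recording the checksum
of the uncompressed tar means the same payload verifies no matter how it
was compressed for transport.  A minimal sketch, assuming the compression
name is read from the dsc file; the "gz", "xz" and "none" labels and the
function name are made up for illustration:

```python
import gzip
import hashlib
import lzma

# Map of hypothetical compression labels to standard-library decompressors.
DECOMPRESSORS = {"gz": gzip.decompress, "xz": lzma.decompress, "none": bytes}


def uncompressed_sha256(blob, compression):
    """Hash the tar payload itself, independent of transport compression."""
    return hashlib.sha256(DECOMPRESSORS[compression](blob)).hexdigest()
```

The cost is that the file has to be decompressed before its checksum can be
compared, unless a checksum of the compressed form is recorded as well.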

I'm missing things here, but I really would like to hear thoughts of
other people about it.  I also have no proof of concept yet.

Regards,
Bastian

-- 
A Vulcan can no sooner be disloyal than he can exist without breathing.
-- Kirk, "The Menagerie", stardate 3012.4