Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-28 Thread Jeremy Stanley
On 2021-06-27 23:49:18 -0300 (-0300), Emmanuel Arias wrote:
[...]
> if we package from PyPI sdists that don't contain the test suite, we
> end up with packages without any tests, and that isn't good.
> 
> Also, I'm not sure, but the docs aren't on PyPI either, are they?
[...]

This depends entirely on how upstream is creating their sdists. They
might certainly choose to omit tests or even documentation, but I
think that's becoming less popular now that wheels exist. It is
expected for a wheel to omit basically everything except the
application, licensing information and some metadata. This has
reduced the pressure on upstreams with massive suites of tests or
volumes of documentation to strip them out of sdists, making it more
likely they'll ship full source distributions that way.
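
Whether a given sdist actually ships tests is easy to check before
choosing it as the upstream tarball. A minimal sketch, assuming a
.tar.gz sdist and a simple naming heuristic (a "tests" or "test"
directory, or test_*.py files); the function names and the heuristic
are illustrative, not part of any existing tool:

```python
import io
import tarfile


def sdist_test_files(sdist_bytes: bytes) -> list[str]:
    """Return member paths that look like tests inside a .tar.gz sdist."""
    found = []
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            # Heuristic (an assumption): a tests/ or test/ directory,
            # or a file named test_*.py anywhere in the tree.
            in_test_dir = any(p in ("tests", "test") for p in parts[:-1])
            is_test_file = parts[-1].startswith("test_") and parts[-1].endswith(".py")
            if member.isfile() and (in_test_dir or is_test_file):
                found.append(member.name)
    return sorted(found)


def make_example_sdist(paths: list[str]) -> bytes:
    """Build a tiny in-memory .tar.gz so the sketch runs without a download."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in paths:
            data = b"# placeholder\n"
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    example = make_example_sdist(
        ["demo-1.0/demo/__init__.py", "demo-1.0/tests/test_demo.py"]
    )
    print(sdist_test_files(example))
```

An empty result would suggest looking at upstream's VCS instead, or
asking upstream to include the tests in future sdists.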
-- 
Jeremy Stanley




Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-27 Thread Emmanuel Arias
Hola everybody,

On 6/26/21 7:51 PM, Louis-Philippe Véronneau wrote:
> To me, the most important thing is that all packages must at least run
> the upstream testsuite when it exists (I'm planning on writing a policy
> proposal saying this after the freeze). If PyPi releases include them, I
> think it's fine (but they often don't).

I was a little surprised that nobody in the discussion had taken the
tests into account (sorry if I missed it); thanks, pollo, for bringing
it up. I don't have exact numbers, but I have seen many Python packages
without autopkgtest, and if we package from PyPI sdists that don't
contain the test suite, we end up with packages without any tests, and
that isn't good.

Also, I'm not sure, but the docs aren't on PyPI either, are they?

On the other hand, files like .git* or CI files can easily be removed,
so I don't see any problem with that.

cheers

-- 
Emmanuel Arias
@eamanu
yaerobi.com





Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-26 Thread Jeremy Stanley
On 2021-06-26 18:51:26 -0400 (-0400), Louis-Philippe Véronneau wrote:
[...]
> To me, the most important thing is that all packages must at least
> run the upstream testsuite when it exists (I'm planning on writing
> a policy proposal saying this after the freeze). If PyPi releases
> include them, I think it's fine (but they often don't).

When you do write that, you'll of course want to clarify what "the
upstream testsuite" really means too. Lots of projects have vast
testing which is simply not feasible to replicate within Debian for
a number of reasons. Running some battery of upstream tests makes
sense, but testsuites which require root access outside a chroot,
integration tests orchestrated across multiple machines, access to
unusual sorts of accelerator or network hardware, and so on can
easily comprise part of "the upstream testsuite."
-- 
Jeremy Stanley




Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-26 Thread Louis-Philippe Véronneau
On 2021-06-25 16 h 42, Nicholas D Steeves wrote:
> Hi Team!
> 
> I feel like there is probably consensus against the use of PyPi-provided
> upstream source tarballs in preference for what will usually be a GitHub
> release tarball, so I made an MR to this effect (moderate recommendation
> rather than a "must" directive):
> 
>   https://salsa.debian.org/python-team/tools/python-modules/-/merge_requests/16

I don't often use PyPi releases because of the issues mentioned in the
MR, but I think Jeremy's point is valid. IMO, rewording the text so that
it clearly says "should" and not "must" would fix the issues at hand, as
long as people justify their usage of PyPi when it's "The Right Thing"
in a file somewhere.

To me, the most important thing is that all packages must at least run
the upstream testsuite when it exists (I'm planning on writing a policy
proposal saying this after the freeze). If PyPi releases include them, I
think it's fine (but they often don't).

-- 
  ⢀⣴⠾⠻⢶⣦⠀
  ⣾⠁⢠⠒⠀⣿⡁  Louis-Philippe Véronneau
  ⢿⡄⠘⠷⠚⠋   po...@debian.org / veronneau.org
  ⠈⠳⣄





Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-26 Thread Simon McVittie
On Fri, 25 Jun 2021 at 18:29:19 -0400, Nicholas D Steeves wrote:
> Take for example the
> case where upstream exclusively supports a Flatpak and/or Snap
> package...

Flatpak and Snap aren't source package formats (like Autotools "make dist"
or Meson "meson dist" or Python sdist), they're binary package formats
(like .deb or Python wheels).

I don't know Snap infrastructure well, but Flatpak apps are built from
a manifest that lists one or more source projects, referenced as either
a VCS commit with a known-good commit identifier (usually git) or an
archive with a known-good hash (usually tar and sha256). The manifest
format and the upstream-recommended Flathub "app store" infrastructure
try to push authors towards building from source, although as with
.deb, technically it's possible to release an archive containing binary
blobs and use it as the "source" (which is how proprietary apps like
com.valvesoftware.SteamLink work, similar to many packages in the non-free
archive area).

If the upstream only provides source via their VCS, then obviously we
have to use `git archive` or equivalent because we have no other way to
get a flat-file version, and the experimental dpkg-source format
"3.0 (git)" isn't currently allowed in the Debian archive. If the upstream
releases tarball artifacts and builds their Flatpak app from those, we can
use those too.

I think the problem case here is when the upstream releases something that
has the name and format we would associate with a source release, but
has contents that are somewhere between a pure source release and a binary
release. Autotools "make dist" has always been a bit like this (it contains
a pre-generated build system so that people can build on platforms where
m4 and perl aren't available, and it's common to include pre-generated
convenience copies of things like gtk-doc documentation); Python sdist
archives are sometimes similar. In both Autotools and setuptools, it's
also far too easy to have files in the VCS but accidentally omit them from
the source distribution, by not listing them in Autotools EXTRA_DIST or in
setuptools MANIFEST.in.

What I have generally done to resolve this problem is to use the upstream's
official source releases ("make dist" or sdist), and if they are missing
files that we want, send merge requests to add them to the next release
(for example https://gitlab.gnome.org/GNOME/gi-docgen/-/commit/5fcaba6f
and https://github.com/containers/bubblewrap/commit/1c775f43),
and if necessary work around missing files by shipping them in debian/
(for example https://salsa.debian.org/gnome-team/gi-docgen/-/commit/f16845d9).

Several upstreams of projects I work on, notably GNOME, have been
switching from Autotools to Meson, and one of the reasons I'm in favour
of this tendency is that the Meson "meson dist" archive is a lightly
filtered version of `git archive` (it excludes `.gitignore` and other
highly git-specific files, but includes everything else), making it
harder for upstreams to accidentally omit necessary source code from
their source releases.

smcv



Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-26 Thread Martin
On 2021-06-26 02:04, Paul Wise wrote:
> I would like to see #2 split into two separate tarballs, one for the
> exact copy of the git tree and one containing the data about the other
> tarball. Then use dpkg-source v3 secondary tarballs to add the data
> about the git repo to the Debian source package.

IIRC, the last time I tried multiple tarballs, I got stuck with
pristine-tar. I'm not sure whether I couldn't work out how to commit
them or whether the problem was with checkout, though.

Do you happen to know whether this is a known issue?

PS: Just for the record: I'm always(?) using upstream sources
from git, not PyPi, because the latter typically are missing
unit tests, which we want to run in Debian.



Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Donald Stufft
File names on PyPI are write once. Once a specific file name has been used it 
can never be used again (even if the entire project was deleted and recreated). 

Projects can delete uploaded files (and as mentioned they can be yanked, but 
yanking is just extra metadata beside the file), but file content can never 
change, only be removed. 
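
Because file content behind a given file name never changes, a digest
recorded once stays valid for as long as the file exists. A minimal
sketch of that kind of pinning, using only the standard library; the
file name, the pin table and the helper names are illustrative
assumptions:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a downloaded file's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(filename: str, data: bytes, pins: dict[str, str]) -> bool:
    """True only if `filename` has a recorded digest and `data` matches it."""
    expected = pins.get(filename)
    return expected is not None and sha256_hex(data) == expected


if __name__ == "__main__":
    # Stand-in bytes; in practice these would be the downloaded sdist.
    original = b"contents of demo-1.0.tar.gz"
    pins = {"demo-1.0.tar.gz": sha256_hex(original)}
    print(verify_pinned("demo-1.0.tar.gz", original, pins))
    print(verify_pinned("demo-1.0.tar.gz", b"tampered", pins))
```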


> On Jun 25, 2021, at 11:47 PM, Brian Thompson wrote:
> 
> On Fri, Jun 25, 2021 at 07:01:39PM -0400, Nicholas D Steeves wrote:
>> Does PyPi provide immutable releases?
> 
> From experience, I can tell you that yes, releases cannot be overwritten,
> but they can be "yanked".  PyPI states that a yanked release is:
> 
>  "A release that is always ignored by an installer, unless it is the
>  only release that matches a version specifier (using either '==' or
>  '===')."
> 
> -- 
> Best regards,
> 
> Brian T



Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Jeremy Stanley
On 2021-06-26 02:04:40 +0000 (+0000), Paul Wise wrote:
> On Fri, Jun 25, 2021 at 11:42 PM Jeremy Stanley wrote:
[..]
> > 2. Cryptographically signed tarballs of the file tree corresponding
> >    to a tag in the Git repository, with versioning, revision
> >    history, release notes and authorship extracted into files
> >    included directly within the tarball.
> 
> I would like to see #2 split into two separate tarballs, one for the
> exact copy of the git tree and one containing the data about the other
> tarball. Then use dpkg-source v3 secondary tarballs to add the data
> about the git repo to the Debian source package.
[...]

You might like to see them split, but why is the exact copy of the
work tree the only legitimate way to export data from a Git
repository? Adding egg-info to the tarball creates a *Python Source
Distribution* which is a long-standing standard method for
distributing source code of Python software. Those files could even
be checked directly into the repository, so that the work tree was
itself also a valid sdist. The only reason the projects I work on
don't do that is because some of it would be redundant with the
metadata from the revision control system.

You could of course create your own split tarballs of the work tree
and the additional metadata files, but to what end? If upstream is
already delivering them together in a release tarball, how is making
your own beneficial when it still has to be done by the package
maintainer before assembling the source package? Users of Debian
don't benefit, because they still can't recreate your split tarball
if they wanted without also having a copy of the upstream Git
repository anyway. It just seems like make-work.

> Probably we should start systematically comparing upstream VCS repos
> with upstream sdists and reacting to the differences. So far, I've
> reacted by ignoring the sdists completely.

I highly recommend it. We explicitly test that our sdists don't omit
files from the Git worktree (sans .git* files like .gitignore and
.gitreview which make no sense outside the context of a Git
repository). On the other hand, I've found at least one case where a
copyright statement in a Debian package refers to an AUTHORS file
shipped as part of the sdist, but since the maintainer chose to
package it from Git instead and did not generate that file when
doing so, it's not included in the packaged version distributed in
Debian. (Not linking the bug report here as I don't want it to seem
like I'm picking on the maintainer.)
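
A comparison along these lines can be sketched in a few lines of
Python. This is an illustration only, not the test we actually run:
`tracked` is assumed to hold the paths printed by `git ls-files` at
the release tag, the sdist is assumed to be a .tar.gz with a single
top-level "name-version/" directory, and the function names are made
up for the example:

```python
import fnmatch
import io
import tarfile

# Git-only files that are expected to be absent from the sdist.
GIT_ONLY_PATTERNS = (".git*",)


def sdist_members(sdist_bytes: bytes) -> set[str]:
    """File paths in a .tar.gz sdist, with the leading 'name-version/' stripped."""
    names = set()
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and "/" in member.name:
                names.add(member.name.split("/", 1)[1])
    return names


def missing_from_sdist(tracked: list[str], sdist_bytes: bytes) -> list[str]:
    """Tracked files that the sdist omits, ignoring Git-only files."""
    shipped = sdist_members(sdist_bytes)
    missing = []
    for path in tracked:
        basename = path.rsplit("/", 1)[-1]
        if any(fnmatch.fnmatch(basename, pat) for pat in GIT_ONLY_PATTERNS):
            continue
        if path not in shipped:
            missing.append(path)
    return sorted(missing)
```

Files present only in the sdist (generated AUTHORS or ChangeLog files,
for example) are deliberately not reported, since those are additions
rather than omissions.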

Just to reiterate, as an upstream we don't consider the work trees
of our Git repos to be complete source distributions. They can be
used along with the versioning and history tracked as part of the
repository to generate a complete source distribution, and that's
what we officially release. Downstream distributions are encouraged
to either use our release tarballs or clones of our Git repositories
to recreate the same files we would release, but if you choose to do
neither of those you're likely to miss something.
-- 
Jeremy Stanley




Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Brian Thompson
On Fri, Jun 25, 2021 at 07:01:39PM -0400, Nicholas D Steeves wrote:
> Does PyPi provide immutable releases?

From experience, I can tell you that yes, releases cannot be overwritten,
but they can be "yanked".  PyPI states that a yanked release is:

  "A release that is always ignored by an installer, unless it is the
  only release that matches a version specifier (using either '==' or
  '===')."
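
The quoted rule can be modelled roughly as follows. This is a
simplified sketch, not the code of any real installer: versions are
compared as plain strings, `pinned` stands in for an exact '==' or
'===' specifier, and the `Release` class and `installable` function
are invented for the illustration:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Release:
    version: str
    yanked: bool = False


def installable(releases: List[Release], pinned: Optional[str] = None) -> List[Release]:
    """Releases an installer would consider under the quoted rule.

    `pinned` models an exact '==' or '===' specifier; None means no pin.
    Version matching is plain string equality, a deliberate simplification.
    """
    matching = [r for r in releases if pinned is None or r.version == pinned]
    usable = [r for r in matching if not r.yanked]
    if usable or pinned is None:
        return usable
    # Only yanked releases satisfy the exact pin, so they are still offered.
    return matching


if __name__ == "__main__":
    releases = [Release("1.0"), Release("1.1", yanked=True)]
    print(installable(releases))                # yanked 1.1 is ignored
    print(installable(releases, pinned="1.1"))  # exact pin still finds 1.1
```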

-- 
Best regards,

Brian T




Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Paul Wise
On Fri, Jun 25, 2021 at 11:42 PM Jeremy Stanley wrote:

> 1. Cryptographically signed tags in a Git repository, with
>    versioning, revision history, release notes and authorship either
>    embedded within or tied to the Git metadata.
>
> 2. Cryptographically signed tarballs of the file tree corresponding
>    to a tag in the Git repository, with versioning, revision
>    history, release notes and authorship extracted into files
>    included directly within the tarball.

I would like to see #2 split into two separate tarballs, one for the
exact copy of the git tree and one containing the data about the other
tarball. Then use dpkg-source v3 secondary tarballs to add the data
about the git repo to the Debian source package.

> Saying that a raw dump of the file content from a revision control
> system is recommended over using upstream's sdists presumes all
> upstreams are the same. They're not, and which is preferable (or
> doable, or even legal) differs from one to another. Just because
> some sdists, or even many, are not suitable as a basis for packaging
> doesn't mean that sdists are a bad idea to base packages on. Yes,
> basing packages on bad sdists is bad, it's hard to disagree with
> that.

Probably we should start systematically comparing upstream VCS repos
with upstream sdists and reacting to the differences. So far, I've
reacted by ignoring the sdists completely.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise



Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Paul Wise
On Fri, Jun 25, 2021 at 9:17 PM Jeremy Stanley wrote:

> The proposal is somewhat akin to saying that a
> tarball created via `make dist` is unsuitable for packaging.

This is definitely true; they generally contain generated files
(configure, Makefile.in) and embedded code copies (missing, install-sh,
depcomp, config.sub, config.guess, etc.), neither of which should be
part of the "source".

-- 
bye,
pabs

https://wiki.debian.org/PaulWise



Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Paul Wise
On Fri, Jun 25, 2021 at 8:49 PM Nicholas D Steeves wrote:

> I feel like there is probably consensus against the use of PyPi-provided
> upstream source tarballs in preference for what will usually be a GitHub
> release tarball

I think this should be a Debian-wide default and documented in Debian
Policy. I plan to bring this up more widely after the bullseye
release.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise



Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Nicholas D Steeves
Hi Jeremy!

Wow, you've given me a lot to think about.  Thank you :-)

Yes, I agree with you that my MR doesn't adequately address the much
more heterogeneous reality. (and is also indelicate, lacks nuance, etc)
I'll take a day or two to think about this, and also to take into
account what everyone else has written before making revisions for v2.
If I do it right now my work won't be rigorous enough, nor fair to the
contributions others have made to this thread.  If nothing else it will
require careful outlining...

Maybe the heading should be:

How to choose a source for the tarball (and why this can be difficult)
======================================================================

;-)

Take care,
Nicholas




Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Jeremy Stanley
On 2021-06-25 19:01:39 -0400 (-0400), Nicholas D Steeves wrote:
[...]
> And yes, I agree moderate is better, but I must sadly confess
> ignorance to the technical reasons why PyPI is sometimes more
> appropriate. Without technical reasons it seems like a case of
> ideological compromise (based on the standards I've been mentored
> to and the feedback I've received over the years).

Hopefully my other replies here and in Salsa have provided some
fairly large counterexamples for you. If those still aren't entirely
clear, I'm happy to go into deeper detail or broaden to related
examples elsewhere in the ecosystem.
-- 
Jeremy Stanley




Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Jeremy Stanley
On 2021-06-25 18:29:19 -0400 (-0400), Nicholas D Steeves wrote:
> A recommendation is non-binding, and the intent of this proposal is to
> say that the most "sourceful" form of source is the *most* suitable for
> Debian packages.  The inverse of this is that `make dist` is less
> suitable for Debian packages.  Neither formulation of this premise
> applies to a scope outside of Debian.  In other words, just because a
> particular form of source packaging and distribution is not considered
> ideal in Debian does not in any way comment on its suitability for other
> purposes.  Would you prefer to see a note like "PyPi is a good thing for
> the Python ecosystem, but sdists are not the preferred form of Debian
> source tarballs"?

To reset this discussion, take the case of an upstream like the one
I'm involved with. For each project, two forms of source release are
made available:

1. Cryptographically signed tags in a Git repository, with
   versioning, revision history, release notes and authorship either
   embedded within or tied to the Git metadata.

2. Cryptographically signed tarballs of the file tree corresponding
   to a tag in the Git repository, with versioning, revision
   history, release notes and authorship extracted into files
   included directly within the tarball.

If some alternative mechanism is used to grab only the work tree
from a checkout of the Git repository, critical information about
the software is lost, making it uninstallable in some cases (can't
figure out its own version), or even illegal to redistribute
(missing authors list referenced from the copyright license).

So in this case you have a few options: package from upstream's Git
repository, package from upstream's "release tarball" (which happens
to be in Python sdist format because the egg-info is used to hold
information extracted from their Git metadata), or use something
which is neither of those and then have to rely on one of them
anyway to supply the missing bits.
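
To make the "can't figure out its own version" point concrete: an
sdist carries a PKG-INFO file whose RFC 822-style headers include the
name and version, so that information survives without the Git
repository. A minimal sketch of reading it with the standard library;
the sample metadata and the helper name are made up for the example:

```python
from email.parser import Parser


def read_pkg_info(text: str) -> dict:
    """Extract the name and version from the text of a PKG-INFO file."""
    msg = Parser().parsestr(text)
    # Missing headers come back as None rather than raising.
    return {"name": msg["Name"], "version": msg["Version"]}


if __name__ == "__main__":
    sample = (
        "Metadata-Version: 2.1\n"
        "Name: demo\n"
        "Version: 1.2.3\n"
        "Summary: Example project\n"
    )
    print(read_pkg_info(sample))
```

A bare export of the work tree has no such file, which is why a build
that derives its version from Git metadata may fail there.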

> It's also worth mentioning that upstream's "official release"
> preference is not necessarily relevant to a Debian context.  Take
> for example the case where upstream exclusively supports a Flatpak
> and/or Snap package...
[...]

The problem is that you seem to want to talk in absolutes. Sure some
(I'll wager many) Python projects can be reasonably packaged from a
flat dump of the file content in their revision control. There are
many which can't. Sure some upstreams may only want to release
Flatpaks or Snaps, or may even be openly hostile to getting packaged
in distributions at all. There are also quite a few which don't host
their revision control in platforms which provide raw tarball
exports generated on the fly. Some sdist tarballs leave out files, I
agree, but they don't have to (ours don't, we only add more in order
to supply the exported revision control metadata).

Saying that a raw dump of the file content from a revision control
system is recommended over using upstream's sdists presumes all
upstreams are the same. They're not, and which is preferable (or
doable, or even legal) differs from one to another. Just because
some sdists, or even many, are not suitable as a basis for packaging
doesn't mean that sdists are a bad idea to base packages on. Yes,
basing packages on bad sdists is bad, it's hard to disagree with
that.

> Thinking about an ideal solution, and the interesting PBR case, I
> remember that gbp is supposed to be able to associate gbp tags with
> upstream commits (or possibly tags), so maybe it's also possible to do
> this:
> 
> 1. When gbp import-orig finds a new release
> 2. Fetch upstream remote as well
> 3. Run PBR against the upstream release tag
> 4. Stage this[ese] file[s]
> 5. Either append them to the upstream tarball before committing to the
>    pristine-tar branch, or generate the upstream tarball from the
>    upstream branch (intent being that the upstream branch's HEAD should
>    be identical to the contents of that tarball)
> 6. Gbp creates upstream/x.y tag
> 7. Gbp merges to Debian packaging branch.

You'll either need a copy of the upstream Git repository or at least
some of the files generated from that repository's metadata which
has been embedded in the release tarball. I understand the desire to
not put files into Debian source packages which can be generated at
package build time from other files in Debian, but when those files
can't be generated without the presence of the Git repository itself
which *isn't* files in Debian, using the generated copies supplied
(and signed!) by upstream seems no different than many other sorts
of data which get shipped in Debian source packages.
-- 
Jeremy Stanley




Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Nicholas D Steeves
Hi Simon,

Simon McVittie writes:

> On Fri, 25 Jun 2021 at 16:42:42 -0400, Nicholas D Steeves wrote:
>> I feel like there is probably consensus against the use of PyPi-provided
>> upstream source tarballs in preference for what will usually be a GitHub
>> release tarball
>
> This is not really consistent with what devref says:
>
> The defining characteristic of a pristine source tarball is that the
> .orig.tar.{gz,bz2,xz} file is byte-for-byte identical to a tarball
> officially distributed by the upstream author
>
> — 
> https://www.debian.org/doc/manuals/developers-reference/best-pkging-practices.en.html#best-practices-for-orig-tar-gz-bz2-xz-files
>
> Sites like Github and Gitlab that generate tarballs from git contents
> don't (can't?) guarantee that the exported tarball will never change -

I agree 100%

> I'm fairly sure `git archive` doesn't try to make that guarantee - so it
> seems hard to say that the official source code release artifact is always
> the one that appears as a side-effect of the upstream project's git hosting
> platform.
>

Also agreed 100%.  This line of inquiry is actually why I think using
upstream tags is best, but even then it's possible upstream will delete
the tag and push a new one.  Does PyPi provide immutable releases?  If
so, yes, I agree there's a strong argument to be made for using PyPi vis
à vis DevRef within a DPT context where upstream git tags (and history)
are not merged :-)

> That doesn't *necessarily* mean that the equivalent of a `git archive`
> is always the wrong thing (and indeed there are a lot of packages where
> it's the only reasonably easily-obtained thing that is suitable for our
> requirements), but I don't think it's as simple or clear-cut as you
> are implying.
>

Also agreed 100%, but I've learned people often look at comprehensive
proposals as tldr, so I wanted to try a discussion-based approach ;-)

> devref also says:
>
> A repackaged .orig.tar.{gz,bz2,xz} ... should, except where impossible
> for legal reasons, preserve the entire building and portability
> infrastructure provided by the upstream author. For example, it is
> not a sufficient reason for omitting a file that it is used only
> when building on MS-DOS. Similarly, a Makefile provided by upstream
> should not be omitted even if the first thing your debian/rules does
> is to overwrite it by running a configure script.
>
> I think devref goes too far on this - for projects where the official
> upstream release artifact contains a significant amount of content we
> don't want (convenience copies, portability glue, generated files, etc.),
> checking the legal status of everything can end up being more work than
> the actual packaging, and that's work that isn't improving the quality of
> our operating system (which is, after all, the point).
>

I agree, and will support a proposal to modify DevRef to this end,
because as far as I know the source tarballs in our archive aren't part
of a secondary project to archive upstream tarballs as-released (eg: a
kind of "ark" or source-bank, like a seed-bank, for DFSG software)...but
maybe that is a secondary objective?

> However, PyPI sdist archives are (at least in some cases) upstream's
> official source code release artifact, so I think a blanket recommendation
> that we ignore them probably goes too far in the other direction.
>
> I'd prefer to mention both options and have "use your best judgement,
> like you have to do for every other aspect of the packaging" as a
> recommendation :-)
>

So far the text I've been able to come up with to address this is
something like:

In some cases PyPI sdist archives may be the most appropriate
upstream source tarball (then your "use your best judgement..."
as a conclusion) :-)

It would be really nice to include technical reasons that describe cases
where PyPI is more appropriate, but I don't know any.  My experience in
Debian thus far has been that "what most closely fulfils Debian ideals"
is always preferable to upstream preference.  Yes, that's arguably
insular, but I thought there was consensus on this.

And yes, I agree moderate is better, but I must sadly confess ignorance
to the technical reasons why PyPI is sometimes more appropriate.
Without technical reasons it seems like a case of ideological compromise
(based on the standards I've been mentored to and the feedback I've
received over the years).

Thanks!
Nicholas




Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Nicholas D Steeves
Hi Scott,

Scott Talbert writes:

> On Fri, 25 Jun 2021, Jeremy Stanley wrote:
>
[snip]
> I tend to agree about PyPI being the official releases for a lot of 
> projects.  "GitHub tarballs" also tend to include other undesirable stuff 
> for distribution like upstream CI/CD configuration files, etc.
>

Would you please expand on "etc"?  It seems like it would be reasonable
to exclude CI/CD files via the watch file, for reasons similar to those
for excluding an upstream-provided debian/ subdir.

Thanks!
Nicholas




Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Nicholas D Steeves
Hi Jeremy,

Thank you for your comments, reply follows inline:

Jeremy Stanley writes:

> On 2021-06-25 16:42:42 -0400 (-0400), Nicholas D Steeves wrote:
>> I feel like there is probably consensus against the use of PyPi-provided
>> upstream source tarballs in preference for what will usually be a GitHub
>> release tarball, so I made an MR to this effect (moderate recommendation
>> rather than a "must" directive):
>> 
>>   https://salsa.debian.org/python-team/tools/python-modules/-/merge_requests/16
>> 
>> Comments, corrections, requests for additional information, and
>> objections welcome :-)  I'm also curious if there isn't consensus by
>> this point and if it requires further discussion
>
> I work on a vast ecosystem of Python-based projects which consider
> the sdist tarballs they upload to PyPI to be their official release
> tarballs, because they encode information otherwise only available
> in revision control metadata (version information, change history,
> copyright holders). The proposal is somewhat akin to saying that a
> tarball created via `make dist` is unsuitable for packaging.
>

A recommendation is non-binding, and the intent of this proposal is to
say that the most "sourceful" form of source is the *most* suitable for
Debian packages.  The inverse of this is that `make dist` is less
suitable for Debian packages.  Neither formulation of this premise
applies to a scope outside of Debian.  In other words, just because a
particular form of source packaging and distribution is not considered
ideal in Debian does not in any way comment on its suitability for other
purposes.  Would you prefer to see a note like "PyPi is a good thing for
the Python ecosystem, but sdists are not the preferred form of Debian
source tarballs"?

It's also worth mentioning that upstream's "official release" preference
is not necessarily relevant to a Debian context.  Take for example the
case where upstream exclusively supports a Flatpak and/or Snap
package...

> "GitHub tarballs" (aside from striking me as a blatant endorsement
> of a wholly non-free software platform) lack this metadata, being
> only a copy of the file contents from source control while missing
> other relevant context Git would normally provide.

"GitHub [and Gitlab!] tarballs" are fairly well understood, and it takes
fewer words to talk about them than to write about integrating a merging
or rebasing tag-based workflow (possibly with excluded files with a
merge driver) in a team that has standardised on git-buildpackage.  I
might have out-of-date info, btw.  Would it still upset the DSA if DPT
packages' watch files polled using the lightweight git driver?  I also
prefer to have upstream git history :-)

Thinking about an ideal solution, and the interesting PBR case, I
remember that gbp is supposed to be able to associate gbp tags with
upstream commits (or possibly tags), so maybe it's also possible to do
this:

1. When gbp import-orig finds a new release
2. Fetch upstream remote as well
3. Run PBR against the upstream release tag
4. Stage this[ese] file[s]
5. Either append them to the upstream tarball before committing to the
   pristine-tar branch, or generate the upstream tarball from the
   upstream branch (intent being that the upstream branch's HEAD should
   be identical to the contents of that tarball)
6. Gbp creates upstream/x.y tag
7. Gbp merges to Debian packaging branch.
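
The "append them to the upstream tarball" half of step 5 could look
roughly like the sketch below. It is only an illustration of the idea
under stated assumptions: the helper name is hypothetical, the
generated files (for example an AUTHORS file) are assumed to be
available already as bytes, and nothing here is an existing gbp
feature:

```python
import io
import tarfile


def add_generated_files(tarball: bytes, prefix: str, extras: dict[str, bytes]) -> bytes:
    """Return a new .tar.gz with `extras` added under `prefix`/."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as src:
        with tarfile.open(fileobj=out, mode="w:gz") as dst:
            for member in src.getmembers():
                # Copy each original member; only regular files carry data.
                payload = src.extractfile(member) if member.isfile() else None
                dst.addfile(member, payload)
            for name, data in sorted(extras.items()):
                info = tarfile.TarInfo(name=f"{prefix}/{name}")
                info.size = len(data)
                dst.addfile(info, io.BytesIO(data))
    return out.getvalue()
```

Note that the result is a repacked tarball, so it is no longer
byte-for-byte identical to anything upstream published.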

Cheers,
Nicholas




Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Simon McVittie
On Fri, 25 Jun 2021 at 16:42:42 -0400, Nicholas D Steeves wrote:
> I feel like there is probably consensus against the use of PyPi-provided
> upstream source tarballs in preference for what will usually be a GitHub
> release tarball

This is not really consistent with what devref says:

The defining characteristic of a pristine source tarball is that the
.orig.tar.{gz,bz2,xz} file is byte-for-byte identical to a tarball
officially distributed by the upstream author

— 
https://www.debian.org/doc/manuals/developers-reference/best-pkging-practices.en.html#best-practices-for-orig-tar-gz-bz2-xz-files

Sites like Github and Gitlab that generate tarballs from git contents
don't (can't?) guarantee that the exported tarball will never change -
I'm fairly sure `git archive` doesn't try to make that guarantee - so it
seems hard to say that the official source code release artifact is always
the one that appears as a side-effect of the upstream project's git hosting
platform.
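
To make the "byte-for-byte identical" criterion concrete, here is a
minimal Python sketch that compares two tarballs by SHA-256 digest, for
example a previously imported .orig.tar.gz and a fresh download of the
same release.  The function names are hypothetical helpers, not part of
any Debian tooling.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_byte_identical(path_a, path_b):
    """True if both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

If a hosting platform regenerates an archive with different compression
settings or metadata, a check like this fails even though the unpacked
file contents are the same.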

That doesn't *necessarily* mean that the equivalent of a `git archive`
is always the wrong thing (and indeed there are a lot of packages where
it's the only reasonably easily-obtained thing that is suitable for our
requirements), but I don't think it's as simple or clear-cut as you
are implying.

devref also says:

    A repackaged .orig.tar.{gz,bz2,xz} ... should, except where impossible
    for legal reasons, preserve the entire building and portability
    infrastructure provided by the upstream author. For example, it is
    not a sufficient reason for omitting a file that it is used only
    when building on MS-DOS. Similarly, a Makefile provided by upstream
    should not be omitted even if the first thing your debian/rules does
    is to overwrite it by running a configure script.

I think devref goes too far on this - for projects where the official
upstream release artifact contains a significant amount of content we
don't want (convenience copies, portability glue, generated files, etc.),
checking the legal status of everything can end up being more work than
the actual packaging, and that's work that isn't improving the quality of
our operating system (which is, after all, the point).

However, PyPI sdist archives are (at least in some cases) upstream's
official source code release artifact, so I think a blanket recommendation
that we ignore them probably goes too far in the other direction.

I'd prefer to mention both options and have "use your best judgement,
like you have to do for every other aspect of the packaging" as a
recommendation :-)

smcv



Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Scott Talbert

On Fri, 25 Jun 2021, Jeremy Stanley wrote:

> > I feel like there is probably consensus against the use of PyPi-provided
> > upstream source tarballs in preference for what will usually be a GitHub
> > release tarball, so I made an MR to this effect (moderate recommendation
> > rather than a "must" directive):
> >
> >   https://salsa.debian.org/python-team/tools/python-modules/-/merge_requests/16
> >
> > Comments, corrections, requests for additional information, and
> > objections welcome :-)  I'm also curious if there isn't consensus by
> > this point and if it requires further discussion
>
> I work on a vast ecosystem of Python-based projects which consider
> the sdist tarballs they upload to PyPI to be their official release
> tarballs, because they encode information otherwise only available
> in revision control metadata (version information, change history,
> copyright holders). The proposal is somewhat akin to saying that a
> tarball created via `make dist` is unsuitable for packaging.
>
> "GitHub tarballs" (aside from striking me as a blatant endorsement
> of a wholly non-free software platform) lack this metadata, being
> only a copy of the file contents from source control while missing
> other relevant context Git would normally provide.


I tend to agree about PyPI being the official releases for a lot of 
projects.  "GitHub tarballs" also tend to include other undesirable stuff 
for distribution like upstream CI/CD configuration files, etc.


Scott



Re: [RFC] DPT Policy: Canonise recommendation against PyPi-provided upstream source tarballs

2021-06-25 Thread Jeremy Stanley
On 2021-06-25 16:42:42 -0400 (-0400), Nicholas D Steeves wrote:
> I feel like there is probably consensus against the use of PyPi-provided
> upstream source tarballs in preference for what will usually be a GitHub
> release tarball, so I made an MR to this effect (moderate recommendation
> rather than a "must" directive):
> 
>   https://salsa.debian.org/python-team/tools/python-modules/-/merge_requests/16
> 
> Comments, corrections, requests for additional information, and
> objections welcome :-)  I'm also curious if there isn't consensus by
> this point and if it requires further discussion

I work on a vast ecosystem of Python-based projects which consider
the sdist tarballs they upload to PyPI to be their official release
tarballs, because they encode information otherwise only available
in revision control metadata (version information, change history,
copyright holders). The proposal is somewhat akin to saying that a
tarball created via `make dist` is unsuitable for packaging.

"GitHub tarballs" (aside from striking me as a blatant endorsement
of a wholly non-free software platform) lack this metadata, being
only a copy of the file contents from source control while missing
other relevant context Git would normally provide.
-- 
Jeremy Stanley

