Re: Package dependency versions and consistency

2020-12-30 Thread Adrian Bunk
On Wed, Dec 30, 2020 at 12:47:04PM +0100, Philipp Kern wrote:
>...
> I would have liked to make the ability to binNMU more accessible
> (similar to the give-back self-service), however I'm now somewhat
> convinced that we need no-change source-only uploads, preferably
> performed centrally by dak.
>...

Are you talking about stable or unstable?

In unstable I do see the point of changing binNMUs to be source-only 
uploads, similar to what Ubuntu does.

Security support for static-only ecosystems would require rebuilding
the whole ecosystem in stable on a regular basis. Could you imagine
a Haskell transition in stable?

> Kind regards
> Philipp Kern
>...

cu
Adrian



Re: Package dependency versions and consistency

2020-12-30 Thread Philipp Kern
On 29.12.20 23:39, Josh Triplett wrote:
> API is not ABI, and in many ecosystems (including but not limited to
> Rust), a library is more than just a set of symbol names pointing to
> compiled machine code. For instance, libraries can include generics,
> compile-time code generation, constant functions evaluated at
> compile-time, code run by build scripts, macros, and a variety of other
> things that get processed at build time and can't simply get
> pre-compiled to a symbol in a shared library. It may be possible, in the
> future, to carefully construct something like a shared library using a
> subset ABI, but doing so would have substantial limitations, and would
> not be a general-purpose solution for every library. It *might* be a
> viable solution for specific libraries pre-identified as being
> particularly likely to require updates (e.g. crypto libraries).

Interestingly enough, there is also growing discontent in the C++
community about ABI stability holding the language back (e.g. [1], but
that's by far not the only such opinion).

I would have liked to make the ability to binNMU more accessible
(similar to the give-back self-service); however, I'm now somewhat
convinced that we need no-change source-only uploads, preferably
performed centrally by dak. And you need to be able to supply build
ordering constraints.

I do wonder if delta downloads of .debs would really help, though. Even
though individual builds might be reproducible, the same is not
necessarily true across independent uploads. So at least for final
applications, with optimizations like LTO, it seems not that useful. For
the many small helper packages in between it might help, but there the
delay is often just grabbing the file off the mirror's disk, and a delta
scheme might only make that worse.

Kind regards
Philipp Kern

[1] https://cor3ntin.github.io/posts/abi/



Re: Package dependency versions and consistency

2020-12-29 Thread Adrian Bunk
On Tue, Dec 29, 2020 at 02:39:04PM -0800, Josh Triplett wrote:
>...
> I've seen and experienced multiple times, in Debian, that it's dangerous
> to start implementing solutions before first ensuring that they will be
> accepted by whoever actually makes the call for what to adopt. Once
> there are multiple acceptable solutions on the table, stepping up and
> implementing one can help settle debate, but first there needs to be
> some indication of which solutions will get accepted.

The number of acceptable solutions on the table so far feels
closer to zero.

> Hence trying to find a solution for one specific problem with the start
> of this thread: it'd be substantially easier to address vendoring, if it
> were at least possible to reliably upload small packages of individual
> libraries, with the expectation that such packages would be accepted
> even if small, un-bundled, and an additional semver-major version.
>...

To me this sounds like repeating old mistakes yet another time.

In the C/C++ ecosystem, the biggest problem with both vendoring and 
shipping several semvers of libraries is usually that it creates extra
work for our security team.

The xyz_helper problem you described also sounds like the problems that 
arise when shipping more than one semver of a C/C++ library.

Decades of experience (relearned a gazillion times the hard way)
tell us that what works best for a distribution is to ship one
version of a library, and to push upstream libraries toward the
maturity where API and ABI are long-term stable.

Accumulating more and more ancient semvers of a library just sounds like 
a bad idea, and that is what you get when software can use old semvers 
forever and semver makes it easy for libraries to break API all the time.

The last part is important to understand; the real problem is that
1. there is no incentive for library developers to spend the extra work 
   for providing a stable API, and
2. there is no incentive for library users to spend the extra work for
   following upstream semver changes, and
3. there is no incentive for Debian maintainers to spend the extra work
   for pushing upstreams to improve if additional semver-major versions
   are permitted in Debian.

To untangle this mess, there has to be a strict policy that there must 
not be more than one semver of a library in Debian.

cu
Adrian



Re: Package dependency versions and consistency

2020-12-29 Thread Josh Triplett
On Tue, Dec 29, 2020 at 03:19:30PM +0200, Adrian Bunk wrote:
> [...] Rust [...]

I did not bring up Rust, nor was I referring to Rust specifically, nor
am I speaking for either Rust upstream or the work of the Rust team in
Debian. There are *multiple* ecosystems in which the equivalent of
shared-library dynamic linking either doesn't exist or has substantial
drawbacks, including WebAssembly, JavaScript "compiled" bundles (not
related to vendoring), Go (-dynlink is still experimental), Haskell, C++
"header-only" libraries, and various others.

It's possible to get a sample of such cases by looking through the
output of

grep-dctrl -FBinary -sPackage,Binary --pattern=-dev /var/lib/apt/lists/*Sources

for packages that ship a -dev package and don't ship a shared library
package. There are *many* such packages already. We need an actual
solution for rebuilding reverse-dependencies when needed.
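
The same filtering can be sketched in code. The field parsing and the
shared-library name heuristic below are crude illustrations, and the
sample stanzas are made up rather than real archive data:

```rust
// Sketch: scan Sources-style stanzas and report source packages that
// build a -dev binary package but no libfoo1-style shared-library
// package. Heuristic and sample data are illustrative only.
fn looks_like_shlib(b: &str) -> bool {
    // crude: "lib" prefix and the name ends in digits (ignoring a
    // trailing run of lowercase letters, as in "zlib1g"-style names)
    let trimmed = b.trim_end_matches(|c: char| c.is_ascii_lowercase());
    b.starts_with("lib")
        && trimmed.chars().last().map_or(false, |c| c.is_ascii_digit())
}

fn dev_only_sources(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    for stanza in text.split("\n\n").filter(|s| !s.trim().is_empty()) {
        // fetch a field's value from the stanza, or "" if absent
        let field = |name: &str| {
            stanza
                .lines()
                .find_map(|l| l.strip_prefix(name).and_then(|r| r.strip_prefix(": ")))
                .unwrap_or("")
        };
        let binaries: Vec<&str> = field("Binary").split(',').map(str::trim).collect();
        let has_dev = binaries.iter().any(|b| b.ends_with("-dev"));
        let has_shlib = binaries.iter().any(|b| looks_like_shlib(b));
        if has_dev && !has_shlib {
            out.push(field("Package").to_string());
        }
    }
    out
}

fn main() {
    let sample = "Package: rust-xyz\nBinary: librust-xyz-dev\n\n\
                  Package: libfoo\nBinary: libfoo1, libfoo-dev\n";
    // only rust-xyz ships a -dev package without a shared library
    assert_eq!(dev_only_sources(sample), vec!["rust-xyz"]);
}
```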

And again, static linking is not the problem I was hoping to address
with this thread.

> would first have to implement themselves whatever is needed to provide
> security support.

I've seen and experienced multiple times, in Debian, that it's dangerous
to start implementing solutions before first ensuring that they will be
accepted by whoever actually makes the call for what to adopt. Once
there are multiple acceptable solutions on the table, stepping up and
implementing one can help settle debate, but first there needs to be
some indication of which solutions will get accepted.

Hence trying to find a solution for one specific problem with the start
of this thread: it'd be substantially easier to address vendoring, if it
were at least possible to reliably upload small packages of individual
libraries, with the expectation that such packages would be accepted
even if small, un-bundled, and an additional semver-major version. As
this subthread has entirely diverged from that issue, I don't plan on
responding further here, and I'd like to drop this subthread if at all
possible.

> With semver the API is already guaranteed to be 100% backwards 
> compatible with all previous releases implementing the same semver.
> Why don't you use this guarantee to build a libxyz.so.2 for semver xyz-2?

API is not ABI, and in many ecosystems (including but not limited to
Rust), a library is more than just a set of symbol names pointing to
compiled machine code. For instance, libraries can include generics,
compile-time code generation, constant functions evaluated at
compile-time, code run by build scripts, macros, and a variety of other
things that get processed at build time and can't simply get
pre-compiled to a symbol in a shared library. It may be possible, in the
future, to carefully construct something like a shared library using a
subset ABI, but doing so would have substantial limitations, and would
not be a general-purpose solution for every library. It *might* be a
viable solution for specific libraries pre-identified as being
particularly likely to require updates (e.g. crypto libraries).



Re: Package dependency versions and consistency

2020-12-29 Thread Adrian Bunk
On Mon, Dec 28, 2020 at 03:51:12PM -0800, Josh Triplett wrote:
>...
> 3) Such a patch would require further analysis to determine if other
>changes need to happen in concert to avoid breakage. If abc exposes
>any types from xyz, it may need a major version bump as well; this
>isn't common, but anyone writing such a patch would need to check.
>Furthermore, much more commonly, moving to xyz 3.0.1 requires
>checking if some other dependency uses a different version of xyz and
>expects to interoperate (e.g. passing an xyz::Foo to
>xyz_helper::takes_a_foo won't work if you upgrade your xyz but not
>xyz_helper's xyz, which would typically involve upgrading xyz_helper
>as well).
>...

Note that this is a problem that is created by your proposal.

The simple solution is never having more than one version of xyz
in unstable.

> - Josh Triplett

cu
Adrian



Re: Package dependency versions and consistency

2020-12-29 Thread Adrian Bunk
On Mon, Dec 28, 2020 at 03:51:12PM -0800, Josh Triplett wrote:
> On Mon, Dec 28, 2020 at 03:20:35PM +0200, Adrian Bunk wrote:
> > On Sat, Dec 26, 2020 at 02:55:17PM -0800, Josh Triplett wrote:
>...
> 2) There's not enough benefit to the patch to carry it downstream. This
>is part of the point of this thread: allow transitions to happen in
>the archive and in concert with upstream, rather than before first
>upload or via Debian-specific changes.
>...
> If you need a targeted bugfix in
> the old version, it's possible to publish a new micro-version of the
> package containing just that bugfix.
>...

Don't assume upstream.

For the vast majority of packages in a distribution, upstream is (or 
was) a single person. This is a single point of failure, and it fails 
frequently.

There is software in Debian whose upstream died in the last millennium; 
more commonly, upstream just found another hobby, got a new job, or 
had children.

Sometimes upstream becomes active again after a decade or two when the 
children are older or pension age is reached.

In the more scientific areas of the archive it is common that software 
was written for a Bachelor's/Master's/PhD thesis or a funded science 
project and abandoned by the author afterwards.

Just a few days ago I pinged upstream regarding a bug in a game in Debian.
The game was written in 1982, and the version in Debian is from 1992.
Luckily upstream is still alive, and is even an active Debian developer 
today, so that was easy to resolve; the far more common outcome would
have been resolving the problem with a Debian-specific change.

>...
> > The way a distribution like Debian works, I do not see how security 
> > support would be possible for a static-only ecosystem with a non-trivial 
> > number of packages.
> 
> Cheaper binNMU-style rebuilds, and better incremental downloads. This is
> a solvable problem, once you start from the perspective that it requires
> a solution.

There is a huge amount of arrogance in the Rust community when it comes 
to taking any responsibility for maintainability or security.

This includes the Debian Rust Maintainers, who do not feel responsible 
for maintaining whatever they dumped into Debian stable releases.

The root problem is that we are letting people get away with such 
antisocial behaviour; suggestions for solutions would be less 
lunatic if the people who want the Rust ecosystem in Debian first had 
to implement themselves whatever is needed to provide security 
support.

I do not have the time and energy to attempt getting the whole Rust 
ecosystem removed from bullseye, but short-term this would be the
only option to protect our users from the insecure Rust ecosystem.

If anyone cared about the security of software written in Rust and 
shipped in distributions, and wanted to implement a solution, my first 
question would be:

With semver the API is already guaranteed to be 100% backwards 
compatible with all previous releases implementing the same semver.
Why don't you use this guarantee to build a libxyz.so.2 for semver xyz-2?
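
For reference, the compatibility rule invoked here (Cargo-style caret
semantics, where the leftmost nonzero component acts as the "major") can
be sketched as follows; this is an illustration of the rule, not any
real resolver code:

```rust
// Sketch of caret ("^") semver compatibility: a candidate version
// satisfies a requirement if it is in the same compatibility series
// (leftmost nonzero component matches) and is not older.
fn caret_compatible(req: (u64, u64, u64), cand: (u64, u64, u64)) -> bool {
    let same_series = match req {
        (0, 0, p) => cand.0 == 0 && cand.1 == 0 && cand.2 == p, // ^0.0.p is exact
        (0, m, _) => cand.0 == 0 && cand.1 == m, // 0.m: minor acts as major
        (maj, _, _) => cand.0 == maj,            // ordinary major match
    };
    same_series && cand >= req
}

fn main() {
    assert!(caret_compatible((1, 2, 3), (1, 9, 0)));  // same major, newer
    assert!(!caret_compatible((1, 2, 3), (2, 0, 0))); // major bump: incompatible
    assert!(!caret_compatible((0, 2, 3), (0, 3, 0))); // 0.x: minor acts as major
}
```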

> - Josh Triplett

cu
Adrian



Re: Package dependency versions and consistency

2020-12-28 Thread Josh Triplett
Simon McVittie wrote:
> On Sat, 26 Dec 2020 at 14:55:17 -0800, Josh Triplett wrote:
> > I'm talking about packaging xyz 1.3.1 and 2.0.1, as separate xyz-1 and
> > xyz-2 packages, and allowing the use of both in build dependencies.
>
> This is not all that rare even for C/C++ code, as exemplified by
> GTK and other libraries that follow the principles described in
>  (written by a then-maintainer of
> GTK during the GTK 1.2 to GTK 2 transition). A few examples: GTK, SDL,
> libpcre, Allegro, Boost, LLVM, libftdi, libfuse, mozjs; and historically
> we also had this for GStreamer, Qt, OpenSSL and WebKitGTK among others.
>
> We try to minimize the number of parallel-installable versions of a
> particular library because maintainers and the security team can only
> parallelize so far, but the extent to which we can do this depends on
> balancing several factors, including the effort required to maintain those
> versions, the extent to which they are exposed at security boundaries,
> the extent of the incompatibilities, the effort needed and available
> upstream or downstream to port all dependent libraries and programs
> to the currently-recommended version, and the extent to which we could
> accept losing packages still dependent on the old version from Debian.
> For example, the Qt/KDE team were able to drop Qt 4 from Debian between
> Debian 10 and 11, but GTK 2 certainly can't get the same treatment until
> there is a stable release of GIMP that uses GTK 3 or later.
> 
> I can see that in language ecosystems that encourage more/smaller packages
> than C/C++, or that break API more frequently, we have a scaling problem:
> GTK has historically had a new major version about once a decade, which
> doesn't represent too many trips through NEW even if it speeds up by
> an order of magnitude, but the library ecosystems we're discussing on
> this thread are presumably expected to break API/ABI rather more often
> than that. (To an extent, typical C/C++ tooling accidentally protects
> us from this, by making it so painful to break API/ABI that maintainers
> go to great lengths to avoid it.)

Exactly; complete agreement with everything you've written here. Even
with as painful as it is with C libraries, we've still done it. But with
some other languages, which make it substantially less painful, current
practice (NEW, and package rejections) nonetheless pushes back much more
strongly on co-installable packages.

> However, if a language has a way to ask for a library by its (name,
> major version) pair, it would seem sensible for any Debian-specific
> machinery around it to usually map (name, major version) pairs (as
> opposed to bare names) to installation filenames/directories and
> Debian package names.

This is exactly what I'm proposing (along with dropping the aversion to
small, separately-packaged libraries, and the pressure to bundle them
together in fewer source or binary packages). If the packages would
otherwise be co-installable, it should be possible to upload multiple
major versions as separate packages, and we should leave it up to the
judgment of the package maintainer whether it's appropriate for multiple
versions to coexist or not. Policy should provide guidance, in the form
of the many factors you mentioned above, and we can still *encourage*
developers to reduce proliferation by porting software to newer versions
(not to older versions).

We also need to reduce the overhead of NEW for both package maintainers
and package reviewers, at least for the case where the primary reason
the package hits NEW is that the major version (and thus the package
name) has changed. That includes the case where the new major version
has a corresponding new source package, but the source package is
derived from the previous source package. A package in the archive can
always have an RC bug filed on it later, if an issue arises.

- Josh Triplett



Re: Package dependency versions and consistency

2020-12-28 Thread Josh Triplett
On Mon, Dec 28, 2020 at 03:20:35PM +0200, Adrian Bunk wrote:
> On Sat, Dec 26, 2020 at 02:55:17PM -0800, Josh Triplett wrote:
> >...
> > If you want to package abc version 1.2.3, and among many other things,
> > abc depends on xyz version 2.1.4, and xyz has a new version 3.0.1 now,
> > it makes sense to work with the upstream of abc, sending them a patch to
> > migrate to the new version, and waiting for abc 1.2.4 to come out with
> > that update. It *doesn't* make sense to maintain a downstream Debian
> > patch to make abc work with the newer xyz.
> 
> Maintaining a backported patch is usually very cheap,

1) Writing, testing, and maintaining 200 backported patches in order to
   upload a package is not. And on average, doing 200 instances of
   something that is "usually" easy means encountering several
   exceptions that are not easy. Doing 200 instances of a task that's
   *usually* easy also makes mistakes more likely.
2) There's not enough benefit to the patch to carry it downstream. This
   is part of the point of this thread: allow transitions to happen in
   the archive and in concert with upstream, rather than before first
   upload or via Debian-specific changes.
3) Such a patch would require further analysis to determine if other
   changes need to happen in concert to avoid breakage. If abc exposes
   any types from xyz, it may need a major version bump as well; this
   isn't common, but anyone writing such a patch would need to check.
   Furthermore, much more commonly, moving to xyz 3.0.1 requires
   checking if some other dependency uses a different version of xyz and
   expects to interoperate (e.g. passing an xyz::Foo to
   xyz_helper::takes_a_foo won't work if you upgrade your xyz but not
   xyz_helper's xyz, which would typically involve upgrading xyz_helper
   as well).
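
The interoperability pitfall in point 3 can be sketched like this (the
two modules below stand in for two semver-major versions of the same
hypothetical crate; all names are made up for illustration):

```rust
// Sketch: xyz 2.x and xyz 3.x behave as entirely distinct crates, so
// their types are distinct even when structurally identical.
mod xyz_v2 {
    pub struct Foo(pub u32);
}
mod xyz_v3 {
    pub struct Foo(pub u32);
}
mod xyz_helper {
    // xyz_helper was built against xyz 2.x, so it only accepts 2.x types.
    use super::xyz_v2;
    pub fn takes_a_foo(f: &xyz_v2::Foo) -> u32 {
        f.0
    }
}

fn main() {
    let old = xyz_v2::Foo(1);
    assert_eq!(xyz_helper::takes_a_foo(&old), 1);

    let _new = xyz_v3::Foo(2);
    // xyz_helper::takes_a_foo(&_new); // type error: expected xyz_v2::Foo
}
```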

Note that all of this only applies when talking about a major version
change. For a minor version change, it suffices to simply rebuild abc
once a new version of xyz is uploaded, and abc will pick up the new
version of xyz.

It might also make sense to work with the Semantic Versioning standard
and ecosystems using semver to create a mechanism for specifying a
potentially unbounded number of "downstream revision" numbers, which
would make it substantially safer to make changes downstream when
absolutely necessary. That still doesn't mean we should do so at every
possible opportunity, only when there's substantial benefit to doing so
that outweighs the drawbacks of doing so. The right place for fixes is
*always* upstream; Debian-specific patches are always technical debt.

> > abc can just build-depend on
> > xyz-2, and a later version of abc can build-depend on xyz-3. That isn't
> > a reflection of complexity in xyz, or in abc.
> 
> It is usually a reflection of very poor API design in xyz.

No, it isn't. It's a reflection that APIs do not become utterly
immutable when published, and that it's acceptable to fix and evolve
designs rather than working around them at all costs.  If you want the
old API, the old version still exists. If you need a targeted bugfix in
the old version, it's possible to publish a new micro-version of the
package containing just that bugfix. This is not a slippery slope, where
major versions either never change, or change daily, or cause the amount
of pain Python 3 did when they change. There are far more points in the
spectrum than that.

More to the point, Debian does not control upstream, making this entire
line of discussion moot. Upstream packages in ecosystems that use semver
will continue to do so, and some of them will *occasionally* bump major
versions. It doesn't help to have an argument over whether Debian is
always right or other ecosystems are always right or (as is typically
the case) there exists nuance here. That argument won't turn out any
differently than all the previous iterations of that argument, and at
the end of the day, the problem of how to handle packaging will be no
closer to a solution.

You don't have to work on that solution. But the point of this thread is
to seek solutions, not to complain about how it'd be easier if new
software acted more like existing software so we didn't have to develop
new ways to handle the new software.

> >...
> > By contrast with that, security support may not be nearly as much of an
> > issue. The *majority* of libraries in Debian don't require any security
> > updates at all.
> 
> My basic assumption would be that any code that might handle untrusted 
> input is only one security audit away from a CVE.

My statement stands: the vast majority of libraries in Debian don't have
any security updates.

Also, let's make it easier to package code written in languages where
that's less of a problem, where every single piece of code handling a
string or a memory allocation isn't a security bug waiting to happen.

> >...
> > I'm not talking about packaging xyz 1.2.3, 1.2.4, 1.3.1, and 2.0.1. When
> > xyz 1.3.1 is uploaded, it can safely 

Re: Package dependency versions and consistency

2020-12-28 Thread Adrian Bunk
On Sat, Dec 26, 2020 at 02:55:17PM -0800, Josh Triplett wrote:
>...
> If you want to package abc version 1.2.3, and among many other things,
> abc depends on xyz version 2.1.4, and xyz has a new version 3.0.1 now,
> it makes sense to work with the upstream of abc, sending them a patch to
> migrate to the new version, and waiting for abc 1.2.4 to come out with
> that update. It *doesn't* make sense to maintain a downstream Debian
> patch to make abc work with the newer xyz.

Maintaining a backported patch is usually very cheap,
it can just be dropped when updating to abc >= 1.2.4.

> abc can just build-depend on
> xyz-2, and a later version of abc can build-depend on xyz-3. That isn't
> a reflection of complexity in xyz, or in abc.

It is usually a reflection of very poor API design in xyz.

The whole semver approach seems to discourage proper API design
and to encourage making breaking changes all the time.

Python 2 and Python 3 effectively have a semver mechanism, and while 
that mechanism permitted a transition spanning more than a decade, it 
did not reduce the enormous amount of work of eventually migrating most 
code from the old version to the new one.

The Python ecosystem has learned the hard way how horrible breaking 
changes can be even when there is a semver mechanism, and no one would
dare to suggest a similar amount of breakage for Python 4.

>...
> By contrast with that, security support may not be nearly as much of an
> issue. The *majority* of libraries in Debian don't require any security
> updates at all.

My basic assumption would be that any code that might handle untrusted 
input is only one security audit away from a CVE.

>...
> I'm not talking about packaging xyz 1.2.3, 1.2.4, 1.3.1, and 2.0.1. When
> xyz 1.3.1 is uploaded, it can safely replace 1.2.4,

In stable this is not safe.

How can you ensure that upgrading from 1.2.4 to 1.3.1 would not cause
any regressions?

Security updates get automatically installed to production environments
and deployed devices of users.

If xyz 1.2.4 in stable has a CVE that is fixed in 1.3.1, to minimize
the risk of regressions the usual approach is to do the minimal fix
of applying the CVE fix only to 1.2.4.

> and packages using xyz 1.2.4 can get rebuilt via binNMU if needed.

This is also a huge problem.

The way a distribution like Debian works, I do not see how security 
support would be possible for a static-only ecosystem with a non-trivial 
number of packages.

Imagine the C/C++ ecosystem were also an ecosystem without shared
libraries.

Currently there are over 30k source packages in bullseye that build 
architecture-specific binary packages for amd64.
Every security fix for glibc would require rebuilding more than 30k 
packages for all release architectures in stable.

A normal user has over 1k architecture-specific binary packages installed.
Every security fix for glibc would require every user to download 
and upgrade more than 1k packages.

>...
> - Josh Triplett

cu
Adrian



Re: Package dependency versions and consistency

2020-12-26 Thread Simon McVittie
On Sat, 26 Dec 2020 at 14:55:17 -0800, Josh Triplett wrote:
> I'm talking about packaging xyz 1.3.1 and 2.0.1, as separate xyz-1 and
> xyz-2 packages, and allowing the use of both in build dependencies.

This is not all that rare even for C/C++ code, as exemplified by
GTK and other libraries that follow the principles described in
 (written by a then-maintainer of
GTK during the GTK 1.2 to GTK 2 transition). A few examples: GTK, SDL,
libpcre, Allegro, Boost, LLVM, libftdi, libfuse, mozjs; and historically
we also had this for GStreamer, Qt, OpenSSL and WebKitGTK among others.

We try to minimize the number of parallel-installable versions of a
particular library because maintainers and the security team can only
parallelize so far, but the extent to which we can do this depends on
balancing several factors, including the effort required to maintain those
versions, the extent to which they are exposed at security boundaries,
the extent of the incompatibilities, the effort needed and available
upstream or downstream to port all dependent libraries and programs
to the currently-recommended version, and the extent to which we could
accept losing packages still dependent on the old version from Debian.
For example, the Qt/KDE team were able to drop Qt 4 from Debian between
Debian 10 and 11, but GTK 2 certainly can't get the same treatment until
there is a stable release of GIMP that uses GTK 3 or later.

I can see that in language ecosystems that encourage more/smaller packages
than C/C++, or that break API more frequently, we have a scaling problem:
GTK has historically had a new major version about once a decade, which
doesn't represent too many trips through NEW even if it speeds up by
an order of magnitude, but the library ecosystems we're discussing on
this thread are presumably expected to break API/ABI rather more often
than that. (To an extent, typical C/C++ tooling accidentally protects
us from this, by making it so painful to break API/ABI that maintainers
go to great lengths to avoid it.)

Some non-C languages (notably Perl and Python) have a single system-wide
search path and cannot easily accommodate more than one major version
unless they have different names (libdancer-perl, libdancer2-perl;
python3-boto, python3-boto3). However, if a language has a way to ask
for a library by its (name, major version) pair, it would seem sensible
for any Debian-specific machinery around it to usually map
(name, major version) pairs (as opposed to bare names) to installation
filenames/directories and Debian package names.
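
As a purely hypothetical illustration of such a mapping (the scheme and
names are made up, loosely echoing existing patterns like librust-*-dev
and libdancer2-perl; this is not actual policy):

```rust
// Sketch: map an (ecosystem, library name, major version) triple to a
// Debian-style package name. Hypothetical scheme for illustration.
fn dev_package_name(ecosystem: &str, name: &str, major: u64) -> String {
    // Debian package names cannot contain underscores.
    let name = name.replace('_', "-");
    format!("lib{ecosystem}-{name}-{major}-dev")
}

fn main() {
    // e.g. (rust, serde_json, 1) and (rust, serde_json, 2) map to two
    // distinct, co-installable package names
    assert_eq!(dev_package_name("rust", "serde_json", 1),
               "librust-serde-json-1-dev");
    assert_eq!(dev_package_name("rust", "serde_json", 2),
               "librust-serde-json-2-dev");
}
```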

The GObject-Introspection bindings for GLib-based libraries like GTK
are another example of this, if you look at them from an appropriate angle:
for example, the gir1.2-gtk-3.0 package contains Gtk-3.0.typelib, which
is installable in parallel with Gtk-2.0.typelib and Gtk-4.0.typelib.

smcv



Re: Package dependency versions and consistency

2020-12-26 Thread Josh Triplett
Adrian Bunk wrote:
> On Fri, Dec 18, 2020 at 04:25:19PM -0800, Josh Triplett wrote:
> >...
> > I'm not suggesting there should be 50 versions of a given
> > library in the archive, but allowing 2-4 versions would greatly simplify
> > packaging, and would allow such unification efforts to take place
> > incrementally, via transitions *in the archive* and *in collaboration
> > with upstream*, rather than *all at once before a new package can be
> > uploaded*.
> > 
> > (I also *completely* understand pushing back on having 2-4 versions of
> > something like OpenSSL; that'd be a huge maintenance and security
> > burden. That doesn't mean we couldn't have 2-4 semver-major versions of
> > a library to emit ANSI color codes, and handle reducing that number via
> > incremental porting in the archive rather than via prohibition in
> > advance.)
> 
> It is important to always remember that the main product we are 
> delivering to our users is our stable releases.

(This is somewhat off-topic, but: I think that Debian stable is *one* of
the main products of Debian, but not by any means the only one. Debian
testing and unstable/experimental are also incredibly valuable. We need
solutions that work for all of those. Those solutions *do* need to work
for stable as well, though, and I'll address the rest of your mail in
that regard.)

> We do have 4 different versions of autoconf in the archive.
> This works because autoconf does not have CVEs.

There's a great deal of software out there with similar properties, most
notably that it doesn't sit at a security boundary. That doesn't just
include build-time code. Also, some types of security vulnerabilities
are rare-to-nonexistent in other ecosystems. A library, written in a
safe language, whose job is to generate ANSI terminal color codes, is
not likely to have security vulnerabilities. It's not critical to force
all packages to move to the latest version of that library immediately,
before they can upload at all.

Bundling *can* make it much more difficult to handle security support,
for a variety of reasons (updating distinct embedded copies, dealing
with more version skew, etc). But in the absence of bundling, if the
*only* issue is that there may be 2-4 semver-major versions in the
archive, I'd expect the process to be roughly "upload new versions of
those packages, trigger rebuilds of dependencies". On balance, I
wouldn't expect substantial scaling issues with the former. The *latter*
would be where we may need some tooling improvements, for ecosystems
that do the equivalent of static linking or library bundling at build
time and ship a compiled artifact in their binary package.

> If a library is so complex that your "unification efforts in 
> collaboration with upstream" would apply, chances are there
> will be CVEs if anyone does a security audit of the code.

I'm not talking about complexity of an individual library; that's not
the primary issue here. I'm talking about quantity. If your package has
300 dependencies, most of which are relatively small, focused,
self-contained libraries, the "collaboration with upstream" part is
about collaboration with the upstream of your package, not the upstreams
of the dependencies.

If you want to package abc version 1.2.3, and among many other things,
abc depends on xyz version 2.1.4, and xyz has a new version 3.0.1 now,
it makes sense to work with the upstream of abc, sending them a patch to
migrate to the new version, and waiting for abc 1.2.4 to come out with
that update. It *doesn't* make sense to maintain a downstream Debian
patch to make abc work with the newer xyz. abc can just build-depend on
xyz-2, and a later version of abc can build-depend on xyz-3. That isn't
a reflection of complexity in xyz, or in abc.

Also, sometimes those dependencies are indirect through other
dependencies, and to transition forward, you may want to move multiple
dependencies forward in concert, for compatibility reasons or just to
minimize duplication within one application.

> > I think much of our resistance to allowing 2-4 distinct semver-major
> > versions of a given library comes down to ELF shared libraries making it
> > painful to have two versions of a library with distinct SONAMEs loaded
> > at once, and while that can be worked around with symbol versioning,
> > we've collectively experienced enough pain in such cases that we're
> > hesitant to encourage it. Our policies have done a fair bit to mitigate
> > that pain. But much of that pain is specific to ELF shared libraries and
> > similar.
> 
> No, the only real pain is providing security support.

Debian has gone through many library transitions that have incurred
substantial pain, including those where a lack of symbol versioning
resulted in serious issues if two versions of the same library ended up
in the same address space. That's in addition to the normal pain of
library transitions, and in addition to all the *infrastructure* that
Debian has built up around library transitions.

Re: Package dependency versions and consistency

2020-12-24 Thread Paul Wise
On Tue, Dec 22, 2020 at 10:24 PM Adrian Bunk wrote:

> To me it always feels as if these ecosystems are not interested in
> providing any support for that.

NPM at least provides security advisories. I used to try syncing those
to the Debian sectracker but don't bother now as it is too much work
to do manually and I don't think the Debian secteam want automatic
importing of non-CVE security issue databases.

https://www.npmjs.com/advisories

-- 
bye,
pabs

https://wiki.debian.org/PaulWise



Re: Package dependency versions and consistency

2020-12-22 Thread Adrian Bunk
On Fri, Dec 18, 2020 at 04:25:19PM -0800, Josh Triplett wrote:
>...
> I'm not suggesting there should be 50 versions of a given
> library in the archive, but allowing 2-4 versions would greatly simplify
> packaging, and would allow such unification efforts to take place
> incrementally, via transitions *in the archive* and *in collaboration
> with upstream*, rather than *all at once before a new package can be
> uploaded*.
> 
> (I also *completely* understand pushing back on having 2-4 versions of
> something like OpenSSL; that'd be a huge maintenance and security
> burden. That doesn't mean we couldn't have 2-4 semver-major versions of
> a library to emit ANSI color codes, and handle reducing that number via
> incremental porting in the archive rather than via prohibition in
> advance.)

It is important to always remember that the main product we are 
delivering to our users is our stable releases.

Right now we are close to the freeze of bullseye.
We will security-support bullseye until mid-2024.

We do have 4 different versions of autoconf in the archive.
This works because autoconf does not have CVEs.

If a library is so complex that your "unification efforts in 
collaboration with upstream" would apply, chances are there
will be CVEs if anyone does a security audit of the code.

> I think much of our resistance to allowing 2-4 distinct semver-major
> versions of a given library comes down to ELF shared libraries making it
> painful to have two versions of a library with distinct SONAMEs loaded
> at once, and while that can be worked around with symbol versioning,
> we've collectively experienced enough pain in such cases that we're
> hesitant to encourage it. Our policies have done a fair bit to mitigate
> that pain. But much of that pain is specific to ELF shared libraries and
> similar.

No, the only real pain is providing security support.

>...
> The
> dependency and library mechanisms of some other ecosystems, are designed
> to support having multiple distinct versions of libraries in the same
> address space, with fully automatic equivalents of symbol versioning.
>...

How can Debian security support packages from such ecosystems?

To me it always feels as if these ecosystems are not interested in 
providing any support for that.

The basic idea behind a distribution like Debian stable or Ubuntu LTS
is to provide one set of packages, which will then stay unchanged for
3 or 5 years except for security fixes.

There are use cases for rolling release distributions,
and there are use cases for stable distributions like Debian.

If there is a CVE in a library that is used by 20 different packages
in 20 different versions, how does the ecosystem help Debian with
applying this CVE fix to all 20 versions with reasonable effort?
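A sketch of what such help would have to look like, using a purely hypothetical advisory format (not any real tracker's schema): given the first fixed release on each affected semver-major branch, work out which vendored copies still need patching.

```python
# Hypothetical advisory data: for each affected semver-major branch,
# the first release on that branch containing the fix. A vendored copy
# needs patching if it is older than the fix on its branch, or if its
# branch has no fixed release at all (a backport would be needed).

def parse(v):
    return tuple(int(x) for x in v.split("."))

def needs_fix(installed, fixed_by_major):
    fixed = fixed_by_major.get(installed.split(".")[0])
    if fixed is None:
        return True  # no fixed release on this branch: backport needed
    return parse(installed) < parse(fixed)

fixed_by_major = {"1": "1.4.9", "2": "2.2.3"}
vendored = {"pkg-a": "1.2.0", "pkg-b": "2.2.3", "pkg-c": "3.0.1"}
to_patch = {p: v for p, v in vendored.items() if needs_fix(v, fixed_by_major)}
# pkg-a is below 1.4.9, pkg-c's 3.x branch has no fix; pkg-b is already fixed
```

Even with machine-readable advisories, each entry in `to_patch` is still a separate patching and testing effort, which is exactly the cost being questioned above.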

> - Josh Triplett

cu
Adrian



Re: Package dependency versions and consistency

2020-12-19 Thread Paul Gevers
Hi,

On 19-12-2020 01:25, Josh Triplett wrote:
> Given all of the above improvements, it'd be much more feasible for
> tooling to help systematically unbundle and package dependencies, and to
> help manage and transition those dependencies in the archive.

Especially in the JavaScript arena, I think there is something to be
gained without much overhead already in the current way of working. What
if, for libraries where multiple versions are in use, packages shipped
those versions *in the same binary package*? In my experience, the
maintainer needs to link to the right directory anyway; if those are
semver-versioned directories, it would be clear (with a tiny bit of
tooling, maybe) which package needs which version. And if the maintainer
of the shipping package wants to drop some, they could communicate about
that. Maybe they could even ship a latest or recommended version for
those packages where it's not absolutely important which version they get.
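As a rough sketch of the tooling this would need (paths and range semantics are my assumptions, modelled on npm's ^ operator): the binary package ships semver-versioned directories, and a depending package is pointed at the newest shipped version satisfying its declared range.

```python
# Hypothetical layout: one binary package ships e.g.
#   /usr/share/nodejs/foo/3.1.0/  and  /usr/share/nodejs/foo/4.2.1/
# and a depender declaring "^4.0.0" gets linked to the 4.2.1 directory.

def parse(v):
    return tuple(int(x) for x in v.split("."))

def resolve(shipped, requirement):
    """Pick the highest shipped version with the same major as the
    requirement and not older than it (npm-style ^requirement)."""
    req = parse(requirement)
    candidates = [v for v in shipped
                  if parse(v)[0] == req[0] and parse(v) >= req]
    return max(candidates, key=parse) if candidates else None

print(resolve(["3.1.0", "4.0.2", "4.2.1"], "4.0.0"))  # 4.2.1
```

If `resolve` returns None, the maintainer knows the shipping package needs to grow (or keep) another versioned directory.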

No idea whether this works for other areas.

Paul





Re: Package dependency versions and consistency

2020-12-19 Thread Tomas Pospisek

On 19.12.20 01:25, Josh Triplett wrote:

>...

Package dependency versions and consistency

2020-12-18 Thread Josh Triplett
Jonas Smedegaard wrote:
> Quoting Raphael Hertzog (2020-12-17 13:16:14)
> > Even if you package everything, you will never ever have the right
> > combination of version of the various packages.
>
> What is possible to auto-compute is a coarse view of the work needed.
>
> In reality, most Nodejs modules declare too tight versioning for their
> dependencies, and in many cases it is adequate that a module is packaged
> even if not at the version declared as required.  A concrete example is
> "ansi-styles" which is most likely working just fine in version 4.x.

This is not at all as simple as it sounds, even on a small scale, let
alone when multiplied by a few hundred dependencies.

(Let's please not go on the standard tangent into complaints about
the number of dependencies, because at the end of that tangent, people
will still use fine-grained packages and dependencies per the standard
best-practices of those communities, no matter the number or content of
mails in this thread suggesting otherwise. The extremes of "package for
a one-line function" are not the primary issue here; not every
fine-grained dependency is that small, and the issues raised in this
mail still apply whether you have 200 dependencies or 600. So let's take
it as a given that packages *will* have hundreds of library
dependencies, and try to make that more feasible.)

Figuring out whether those dependencies are actually too specific or if
they're required is a substantial amount of work by itself; the
packaging metadata and dependency versions recorded upstream exist to
declare the required version of dependencies, and there isn't typically
a *second* way that upstream records "no, really, there's a reason for
this dependency version requirement". This is hard enough in a
statically typed language, where you can at least have the verification
of seeing if it compiles with the older version (though the package
might be relying on new semantics); with a dynamically typed language,
you might not know that the older version of the dependency has caused a
problem until runtime. As an upstream developer, the safest assumption
when preparing your own dependencies is "well, it works with the version
of the dependency I tested with, and assuming that component correctly
follows semver, it should work with newer semver-compatible versions".
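That "semver-compatible" assumption can be stated precisely; here's a minimal sketch (the 0.x special case follows npm/Cargo caret semantics, which is my assumption about what upstream tooling enforces):

```python
# A dependency tested against version `tested` is presumed to work with
# any later `candidate` on the same compatibility branch: same major,
# and for 0.x releases the minor effectively acts as the major.

def parse(v):
    return tuple(int(x) for x in v.split("."))

def compatible(tested: str, candidate: str) -> bool:
    t, c = parse(tested), parse(candidate)
    if t[0] != c[0]:
        return False  # different major: breaking changes allowed
    if t[0] == 0 and t[1] != c[1]:
        return False  # 0.x: minor bumps may break
    return c >= t     # any newer release on the same branch

assert compatible("1.4.0", "1.9.2")
assert not compatible("1.4.0", "2.0.0")
assert not compatible("0.2.3", "0.3.0")
```

Note that this rule is exactly what an upstream can promise mechanically; whether an *older* version also works is the question that has no machine-checkable answer.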

To clarify something: I *don't* believe Debian should compromise on
network access at build time. Debian package dependencies should be
completely self-contained within the Debian archive. The aspect I'm
concerned about here is that Debian pushes hard to force every single
package to use *the same version* of a given dependency, even if the
dependency has multiple incompatible versions (properly declared with
different semver major numbers, equivalent to libraries with different
SONAMEs). I'm not suggesting there should be 50 versions of a given
library in the archive, but allowing 2-4 versions would greatly simplify
packaging, and would allow such unification efforts to take place
incrementally, via transitions *in the archive* and *in collaboration
with upstream*, rather than *all at once before a new package can be
uploaded*.

(I also *completely* understand pushing back on having 2-4 versions of
something like OpenSSL; that'd be a huge maintenance and security
burden. That doesn't mean we couldn't have 2-4 semver-major versions of
a library to emit ANSI color codes, and handle reducing that number via
incremental porting in the archive rather than via prohibition in
advance.)

I think much of our resistance to allowing 2-4 distinct semver-major
versions of a given library comes down to ELF shared libraries making it
painful to have two versions of a library with distinct SONAMEs loaded
at once, and while that can be worked around with symbol versioning,
we've collectively experienced enough pain in such cases that we're
hesitant to encourage it. Our policies have done a fair bit to mitigate
that pain. But much of that pain is specific to ELF shared libraries and
similar. And some of our packaging limitations are built around this
(e.g. "one version of a given package at a time"), which in turn forces
some of those same limitations onto ecosystems that don't share the
problems that motivated those limitations in the first place. The
dependency and library mechanisms of some other ecosystems, are designed
to support having multiple distinct versions of libraries in the same
address space, with fully automatic equivalents of symbol versioning.
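As a rough analogy in Python (just an illustration; the ecosystems in question do this resolution automatically at build time): two copies of "the same" library, loaded from different versioned directories under distinct names, can coexist in one address space without colliding.

```python
# Load a module from an explicit path under an alias, so two versions
# of the same library can be imported side by side.
import importlib.util

def load_as(alias, path):
    spec = importlib.util.spec_from_file_location(alias, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

# Paths are hypothetical, in the style of semver-versioned directories:
# xyz_v2 = load_as("xyz_v2", "/usr/share/foo/xyz/2.1.4/xyz.py")
# xyz_v3 = load_as("xyz_v3", "/usr/share/foo/xyz/3.0.1/xyz.py")
```

The contrast with ELF is the point: here nothing resembling a SONAME conflict or unversioned symbol clash can occur, because each copy lives under its own name.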

In Debian packaging, this issue typically results in one of three
scenarios for every dependency (recursively):

- Trying to port the package to work with older versions of
  dependencies. This incurs all of the burden mentioned above for
  determining if the older dependency actually suffices. On top of that,
  this may involve actual porting of code to not rely on the
  functionality of newer versions, which is very much wasted effort
  (that functionality was added