Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2021-02-05 Thread Marek Marczykowski-Górecki
On Tue, Jan 05, 2021 at 07:01:56PM +, Matthew Almond via devel wrote:
> Signature *verification* partially works. Everything to do with
> signatures on just the header works (and the header describes the
> payload digest). There is one specific area which needs to be fixed:
> regular RPMs are read, digested, and signature-verified before
> decompression, whereas transcoding decompresses the payload first. We
> need to guard against malicious compressed payloads that either perform
> a DoS on space/time, or worse (but more difficult) could exploit a bug
> in a decompression library. I am actively working on this.

I just want to say that fixing this is, IMHO, critical before such a
proposal can even be considered. Signature verification should come
before parsing whatever is under that signature; otherwise you risk
exposing to attack various processing code that previously assumed it
was fed trusted data only. This applies to the decompression library,
the actual transcoding code, and possibly much more. Even if there are
_currently_ no known vulnerabilities in a particular part, that doesn't
mean none will be discovered later. Defence in depth is especially
important for an update system: you don't want to end up in a situation
like "oh, we've found a bug in the update system, so you need to execute
the very part that is vulnerable to get it fixed".
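
To make the ordering concrete, here is a minimal Python sketch (not RPM's
actual code). It checks a payload digest against a trusted value before any
byte reaches the decompressor, and caps the decompressed size as a second
layer. The function name, the use of zlib and SHA-256, and the size limit are
all assumptions chosen for illustration.

```python
import hashlib
import hmac
import zlib


def verify_then_decompress(payload: bytes, expected_sha256: str,
                           max_output: int = 64 * 1024 * 1024) -> bytes:
    """Decompress `payload` only after its digest matches a trusted value.

    Hypothetical helper: `expected_sha256` stands in for a digest taken
    from an already signature-verified header.
    """
    actual = hashlib.sha256(payload).hexdigest()
    # Reject before the decompressor sees any untrusted input.
    if not hmac.compare_digest(actual, expected_sha256):
        raise ValueError("payload digest mismatch; refusing to decompress")

    # Second layer: never produce more than max_output bytes.
    decomp = zlib.decompressobj()
    out = decomp.decompress(payload, max_output)
    if not decomp.eof:
        # The stream did not finish within the limit (or was truncated).
        raise ValueError("decompressed size exceeds limit")
    return out
```

The point of the ordering is that a tampered payload fails at the digest
check, so a bug in the decompressor is never reachable with attacker-chosen
input.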

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2021-02-03 Thread Matthew Almond via devel
On Mon, 2020-12-21 at 11:28 -0500, Ben Cotton wrote:
> https://fedoraproject.org/wiki/Changes/RPMCoW 
> 
> 
> == Summary ==
> 
> RPM Copy on Write provides a better experience for Fedora Users as it
> reduces the amount of I/O and offsets CPU cost of package
> decompression. RPM Copy on Write uses reflinking capabilities in
> btrfs, which is the default filesystem in Fedora 33.
> 

I've been communicating with the maintainer of RPM on the pull
request[1], and it's become clear that this likely depends on the
creation of a public, supportable API for RPM. This is not achievable
within the window for Fedora 34, so I'm withdrawing the change for
Fedora 34 at this time. I will continue to work on this, and expect to
re-submit for Fedora 35.

Just a reminder for those interested: I'm giving a talk at CentOS Dojo
on this topic on Friday at 17:00 CET[2].

Regards, Matthew.

[1] 
https://github.com/rpm-software-management/rpm/pull/1470#issuecomment-772410935
[2] https://hopin.com/events/centos-dojo-fosdem


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2021-01-05 Thread Matthew Almond via devel
On Tue, 2021-01-05 at 18:18 +0100, Daniel Mach wrote:
> On 24. 12. 20 at 22:54, Matthew Almond via devel wrote:
> > Depends on how it got there, and what you asked for. Here are some
> > examples:
> > 
> > 1. cp foo.rpm /var/cache/dnf//Packages/ && dnf install foo
> > ...will fail the librepo full file check, and it'll be
> > re-downloaded.
> > 2. dnf install /root/foo.rpm || rpm -i /root/foo.rpm
> > (not actually tested) will likely fail with CPIO/payload error
> > 
> > Note that tools like rpm2cpio and rpm2archive will also fail on
> > transcoded rpms. I have an open task to make the dnf plugin not
> > transcode with 'yumdownloader' or 'dnf download' (plugin) as those
> > are reasonable commands to run. I will look at making error messages
> > better and/or making some of these use cases work.
> > 
> 
> This concerns us (speaking for RPM and DNF people) a little bit.
> If the transcoded RPM cannot be used as a regular RPM, it probably
> should have a different identity, for example a different suffix
> than .rpm. RPM and DNF are designed for generic use cases. I see these
> transcoded packages as a "cache" tailored for btrfs based systems
> only. It would probably be good to draw a line between them.
> 
> If I understand it correctly, the transcoding happens on each host.
> Have you considered transcoding all RPMs in a repo on the server
> instead? Or would that be inefficient and increase network traffic too
> much?
> 
> 
The transcoded RPM is a valid RPM for the rpm program and most use
cases. It's got all the original headers, so querying it (-qp) works
perfectly, and it's still signed, and it's only produced in concert
with dnf/librepo which validates that the file downloaded is the one
described in the repo.

Notably it doesn't work with rpm2cpio and rpm2archive (and yes, I'd
like to have a better story there). The rpms you'd feed to those tools
typically come from 'dnf download'. When I implemented the plugin for
yum, I was able to avoid transcoding there, but for dnf that's not
implemented yet.

Signature *verification* partially works. Everything to do with
signatures on just the header works (and the header describes the
payload digest). There is one specific area which needs to be fixed:
regular RPMs are read, digested, and signature-verified before
decompression, whereas transcoding decompresses the payload first. We
need to guard against malicious compressed payloads that either perform
a DoS on space/time, or worse (but more difficult) could exploit a bug
in a decompression library. I am actively working on this.

The bottom line is that in every place you'd expect to see an rpm, and
use it later, you have exactly the same number of things in the same
place. If you want to reflink clone the dnf cache into a container,
it'll work (really well). If you want to clean up the cache in some
selective way, you can: the interface for the cache remains the same.

On the server side: no, this doesn't make sense. You'd save on
decompressing, but you'd use 2x the bandwidth and space on the server
side. You'd also need to keep the original rpms for clients that don't
or can't use reflinking, so it's more like 2.5x the space, which I
think is unreasonable.

I do think there's some room in the future to think and talk about how
repos could be changed to take better advantage of this. I got some
feedback on RPM PR (
https://github.com/rpm-software-management/rpm/pull/1470#issuecomment-754025728
) which is somewhere along the lines of what I was already thinking.
That said, I'm aware that I'm being ambitious with this change request,
and I'm focused on trying to integrate things that have been
written/exist and can be demonstrated first.

Hope these explanations help! Thanks for the feedback :)

Matthew.


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2021-01-05 Thread Daniel Mach

On 24. 12. 20 at 22:54, Matthew Almond via devel wrote:
> Depends on how it got there, and what you asked for. Here are some
> examples:
> 
> 1. cp foo.rpm /var/cache/dnf//Packages/ && dnf install foo
> ...will fail the librepo full file check, and it'll be
> re-downloaded.
> 2. dnf install /root/foo.rpm || rpm -i /root/foo.rpm
> (not actually tested) will likely fail with CPIO/payload error
> 
> Note that tools like rpm2cpio and rpm2archive will also fail on
> transcoded rpms. I have an open task to make the dnf plugin not
> transcode with 'yumdownloader' or 'dnf download' (plugin) as those are
> reasonable commands to run. I will look at making error messages better
> and/or making some of these use cases work.

This concerns us (speaking for RPM and DNF people) a little bit.
If the transcoded RPM cannot be used as a regular RPM, it probably
should have a different identity, for example a different suffix than
.rpm. RPM and DNF are designed for generic use cases. I see these
transcoded packages as a "cache" tailored for btrfs based systems only.
It would probably be good to draw a line between them.

If I understand it correctly, the transcoding happens on each host. Have
you considered transcoding all RPMs in a repo on the server instead? Or
would that be inefficient and increase network traffic too much?



Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2021-01-04 Thread Matthew Almond via devel
On Sun, 2021-01-03 at 16:16 -0500, Colin Walters wrote:
> 
> On Sat, Jan 2, 2021, at 10:03 AM, Zbigniew Jędrzejewski-Szmek wrote:
> 
> > I fail to see why this would be significantly better...
> 
> I don't claim that the "separate temporary directory of unpacked
> content" is *better* - just that it's as easy to implement *and*
> doesn't require an RPM format change (with all the consequent pain)
> or support for reflinks from the underlying filesystem.
> 
> > The logic to handle the split rpm contents would seem to be more
> > complicated than the rewrite with /usr/bin/rpm2extents. Other
> > comments?
> 
> Hard to really say for sure I guess without trying to write
> both.  Probably the biggest impediment is that changes like that
> would end up needing to be split across the librpm +
> zypper/rpm-ostree/dnf tools.  It wasn't an accident really that for
> rpm-ostree /usr/bin/rpm is read-only - we effectively squash those
> layers together and can thus make deep changes as a single unit.
> 
> Anyways, none of this really *requires* reflinks in any way and so
> calling the Change "RPMCoW" is misleading from that
> perspective.  "DnfParallelUnpack" would probably be a better title,
> with a dependency on "RPMFormatCowReady" or something.  And then my
> point is that one could do "DnfParallelUnpack" without changing the
> RPM format without much more complexity, if any.

Early on in this project I looked at creating all the files during
download in a temporary directory. It would work. It is more filesystem
type agnostic. If moving the decompression to an earlier step were the
sole goal, it's reasonable.

The goal of RPMCoW is to write once, and re-use data multiple times.
This comes up in a number of circumstances for this proposal:

1. Reflinking allows for de-duplication of file content. Today this is 
   only within a single RPM. I am looking at changing rpm2extents to
   reuse data across (cached) rpms to achieve something kind of like
   delta rpm. That is: if you already have file X, you don't write it,
   you clone it from any other rpm.
2. Reflinking allows sharing of file contents, without side effects
   from the installed copy. Each copy is a real, distinct file that can
   be deleted and/or modified. Only the differences cost something, and
   99% of the files in rpms don't get modified. The net result is that
   the rpm cache costs very little.
3. If you can keep an rpm cache, you can reuse the data very quickly,
   either to build a new rootfs in a subdir/subvolume with the same or
   different packages, or to provide files for containers.
   This sounds similar to using snapshots, but with snapshots you're
   operating on a whole filesystem at a time, and you can only go
   backwards. Here you can decide what you want, and you get maximum
   reuse automatically.

By contrast "DnfParallelUnpack" by itself, without CoW, is less useful
because you will need to re-fetch and re-decompress data.
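
For readers unfamiliar with reflinks, the following is a minimal Python
sketch of the underlying mechanism, not code from rpm or the dnf plugin. It
assumes Linux and uses the FICLONE ioctl, which btrfs supports. The helper
name clone_or_copy is hypothetical, and the fallback to a plain copy is added
only so that the sketch also runs on filesystems without reflink support.

```python
import fcntl
import os
import shutil

# FICLONE ioctl request number from <linux/fs.h>: _IOW(0x94, 9, int).
FICLONE = 0x40049409


def clone_or_copy(src: str, dst: str) -> bool:
    """Reflink `src` to `dst`; fall back to a byte copy if unsupported.

    Returns True when the data was shared via reflink, False when copied.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to share src's extents with dst.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            # Raised on filesystems (or across mounts) without reflinks.
            pass
    shutil.copyfile(src, dst)
    return False
```

Either way the destination is an independent file: writing to it afterwards
does not change the source, which is the "no side effects" property described
in point 2.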

Lastly, I'd like to emphasize that I'm not trying to change the "normal
rpm format". Doing so would orphan every previously built and signed
rpm, and would present a serious backward compatibility problem. I aim
to only change how they're downloaded and stored in the cache, locally,
and consumed in rpm itself within the confines of hosts that (can)
enable this.

- Matthew


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2021-01-04 Thread Matthew Miller
On Sun, Jan 03, 2021 at 03:25:29PM -0800, Kevin Fenzi wrote:
> I remember when drpms landed I heard people say they choose Fedora
> because of them. That may have changed over the years I guess. :) 
> And there have been only 2 or 3 reports in the last few years about how
> few drpms exist (i.e., most people didn't really notice). 

Hmmm, here's an idea: what if instead of nightly drpms, we made them only
every two weeks, but always exactly two weeks, so that people updating on a
specific cadence would get them?


-- 
Matthew Miller

Fedora Project Leader


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2021-01-03 Thread Kevin Fenzi
On Sat, Jan 02, 2021 at 06:17:03PM +, Jonathan Dieter wrote:
> On Sat, 2021-01-02 at 18:12 +, Jonathan Dieter wrote:
> > FWIW, I also think it's time for drpms to go.  Aside from any potential
> > issues with the proposed change, they haven't been useful in Fedora for
> > three years, (see https://pagure.io/releng/issue/7215), and nobody's
> > been able to put in the time to fix it yet.  If that changed and
> > someone was willing to step up and commit to fixing this, I'd feel very
> > differently.

It's not been something that's a priority. ;( 

I think the way to do it would be to drop making drpms from the bodhi
pungi and set up a script to manage them: create them, make the repos,
keep N days of old ones from the last repos, etc. I'd be happy to help
interested folks with requirements and such, but I don't think I can
commit to fixing it.

I remember when drpms landed I heard people say they choose Fedora
because of them. That may have changed over the years I guess. :) 
And there have been only 2 or 3 reports in the last few years about how
few drpms exist (i.e., most people didn't really notice). 

> > In addition, drpms aren't even working at the moment.  Something has
> > changed during the last week or so that's broken them (see
> > https://bugzilla.redhat.com/show_bug.cgi?id=1911828).  I'll take a
> > look, but, being honest, there's not much motivation to investigate
> > this when drpms are of such marginal use in Fedora at the moment.

Yeah, understand...

kevin




Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2021-01-03 Thread Colin Walters


On Sat, Jan 2, 2021, at 10:03 AM, Zbigniew Jędrzejewski-Szmek wrote:

> I fail to see why this would be significantly better...

I don't claim that the "separate temporary directory of unpacked content" is 
*better* - just that it's as easy to implement *and* doesn't require an RPM 
format change (with all the consequent pain) or support for reflinks from the 
underlying filesystem.

>  The logic to
> handle the split rpm contents would seem to be more complicated than the
> rewrite with /usr/bin/rpm2extents. Other comments?

Hard to really say for sure I guess without trying to write both.  Probably the 
biggest impediment is that changes like that would end up needing to be split 
across the librpm + zypper/rpm-ostree/dnf tools.  It wasn't an accident really 
that for rpm-ostree /usr/bin/rpm is read-only - we effectively squash those 
layers together and can thus make deep changes as a single unit.

Anyways, none of this really *requires* reflinks in any way and so calling the 
Change "RPMCoW" is misleading from that perspective.  "DnfParallelUnpack" would 
probably be a better title, with a dependency on "RPMFormatCowReady" or 
something.  And then my point is that one could do "DnfParallelUnpack" without 
changing the RPM format without much more complexity, if any.


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2021-01-02 Thread Jonathan Dieter
On Sat, 2021-01-02 at 18:12 +, Jonathan Dieter wrote:
> FWIW, I also think it's time for drpms to go.  Aside from any potential
> issues with the proposed change, they haven't been useful in Fedora for
> three years, (see https://pagure.io/releng/issue/7215), and nobody's
> been able to put in the time to fix it yet.  If that changed and
> someone was willing to step up and commit to fixing this, I'd feel very
> differently.
> 
> In addition, drpms aren't even working at the moment.  Something has
> changed during the last week or so that's broken them (see
> https://bugzilla.redhat.com/show_bug.cgi?id=1911828).  I'll take a
> look, but, being honest, there's not much motivation to investigate
> this when drpms are of such marginal use in Fedora at the moment.
> 
> Jonathan

Apologies for the odd quoting in the previous email; Evolution decided
that what you see isn't what you get. :)  I've trimmed out everything
but my response here.

Jonathan


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2021-01-02 Thread Jonathan Dieter
On Sat, 2021-01-02 at 13:42 +, Zbigniew Jędrzejewski-Szmek wrote:
> On Wed, Dec 30, 2020 at 10:10:27AM -0800, Kevin Fenzi wrote:
> > This is most likely because we are only making drpms against the most
> > recent updates. So, we are making very few drpms and only against
> > things that recently updated.
> 
> So... people who actually care about the total download are likely not
> to update all the time, which also means that drpms will not work for
> them.
> 
> > For example:
> > https://dl.fedoraproject.org/pub/fedora/linux/updates/33/Everything/x86_64/drpms/
> > (126 drpms for all of f33 updates).
> 
> So... that means that drpms wouldn't even make a difference for people
> who update often.
> 
> ...and the proposed Change would require additional contortions to
> allow drpms to work. It sounds like drpms are not worth the trouble
> anymore. The effort to make them work properly would be large. I think
> the crucial bit is that we have more packages and updates than ever,
> and at the same time more people update at custom schedules, so any
> reasonable subset of drpms will cover a shrinking subset of upgrades.
> 
> Zbyszek
> 
> > > Maybe the time has come to just disable DRPM entirely for F34.
> > 
> > We could. Or try and make them more useful again.

FWIW, I also think it's time for drpms to go.  Aside from any potential
issues with the proposed change, they haven't been useful in Fedora for
three years, (see https://pagure.io/releng/issue/7215), and nobody's
been able to put in the time to fix it yet.  If that changed and
someone was willing to step up and commit to fixing this, I'd feel very
differently.

In addition, drpms aren't even working at the moment.  Something has
changed during the last week or so that's broken them (see
https://bugzilla.redhat.com/show_bug.cgi?id=1911828).  I'll take a
look, but, being honest, there's not much motivation to investigate
this when drpms are of such marginal use in Fedora at the moment.

Jonathan


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2021-01-02 Thread Zbigniew Jędrzejewski-Szmek
On Tue, Dec 22, 2020 at 09:41:35PM -, Matthew Almond via devel wrote:
> > On Mon, Dec 21, 2020, at 1:07 PM, Neal Gompa wrote:
> > 
> > Yes it does.  It avoids writing the compressed data and then copying it
> > back out uncompressed, which is the same amount of savings as the
> > reflink approach.
> > 
> > (It's also equally incompatible with deltarpm)

This part doesn't seem to have been answered...

I'll restate what Colin said (please chime in if I misunderstand the proposal):

  During the download, packages are unpacked into a temporary root
  (/usr/.rpmtemp...), and the rpm headers are stored to disk in the
  normal download location.
  During the installation, files are rename()d from this temporary
  location to the final destination.

I fail to see why this would be significantly better... The logic to
handle the split rpm contents would seem to be more complicated than the
rewrite with /usr/bin/rpm2extents. Other comments?
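
A rough Python sketch of the scheme as restated above, to make the two phases
concrete. The function names and the staging layout are hypothetical; this is
not how rpm, dnf, or rpm-ostree implement anything.

```python
import os
from typing import Dict


def stage_files(staging_root: str, files: Dict[str, bytes]) -> Dict[str, str]:
    """Download-time step: write unpacked contents under a staging root.

    `files` maps final absolute paths to contents; the result maps each
    final path to the staged path that holds its data.
    """
    staged = {}
    for final_path, data in files.items():
        staged_path = os.path.join(staging_root, final_path.lstrip("/"))
        os.makedirs(os.path.dirname(staged_path), exist_ok=True)
        with open(staged_path, "wb") as f:
            f.write(data)
        staged[final_path] = staged_path
    return staged


def install_staged(staged: Dict[str, str], install_root: str) -> None:
    """Install-time step: rename() each staged file into place.

    rename() avoids copying data only within a single filesystem, so the
    staging root has to live on the same filesystem as the install root.
    """
    for final_path, staged_path in staged.items():
        target = os.path.join(install_root, final_path.lstrip("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.rename(staged_path, target)
```

Note that even this toy version has to carry a mapping from staged files to
final paths between the two phases, which is the kind of split-content
bookkeeping questioned above.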

Zbyszek


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2021-01-02 Thread Zbigniew Jędrzejewski-Szmek
On Wed, Dec 30, 2020 at 10:10:27AM -0800, Kevin Fenzi wrote:
> On Wed, Dec 30, 2020 at 01:18:38PM +, Zbigniew Jędrzejewski-Szmek wrote:
> > On Tue, Dec 22, 2020 at 05:09:08PM -0500, Matthew Miller wrote:
> > > On Tue, Dec 22, 2020 at 02:02:13PM -0800, Kevin Fenzi wrote:
> > > > > delta rpms save so much time in the form of bandwidth on the client side.
> > > > Well, it's tradeoffs. They save bandwidth and download time on one side,
> > > > but use lots of cpu cycles and disk space on the other. It just depends
> > > > on what each person wants based on their situation and hardware. 
> > > 
> > > They actually use a lot of cpu cycles on _both_ sides, really.
> > 
> > I thought that zchunk would obsolete drpm. What's the story here?
> 
> Nope, they are different things. 
> 
> zchunk = a way to only download changed chunks of repodata. 
> 
> drpms = a way to only download changed chunks of rpms.

Right, it did feel a bit like I was missing some important chunk of the
picture ;)

> > Also, in recent times, any dnf upgrade I did reported "savings" from
> > drpm on the level <1% [*]. Am I doing something wrong or is this expected?
> > Is there some usage pattern where drpm provides real gain with
> > current Fedora?
> 
> This is most likely because we are only making drpms against the most
> recent updates. So, we are making very few drpms and only against things
> that recently updated. 

So... people who actually care about the total download are likely not
to update all the time, which also means that drpms will not work for them.

> For example: 
> https://dl.fedoraproject.org/pub/fedora/linux/updates/33/Everything/x86_64/drpms/
> (126 drpms for all of f33 updates). 

So... that means that drpms wouldn't even make a difference for people
who update often.

...and the proposed Change would require additional contortions to allow
drpms to work. It sounds like drpms are not worth the trouble anymore.
The effort to make them work properly would be large. I think the
crucial bit is that we have more packages and updates than ever, and
at the same time more people update at custom schedules, so any reasonable
subset of drpms will cover a shrinking subset of upgrades.

Zbyszek

> > Maybe the time has come to just disable DRPM entirely for F34.
> 
> We could. Or try and make them more useful again. 


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-30 Thread Kevin Fenzi
On Wed, Dec 30, 2020 at 01:18:38PM +, Zbigniew Jędrzejewski-Szmek wrote:
> On Tue, Dec 22, 2020 at 05:09:08PM -0500, Matthew Miller wrote:
> > On Tue, Dec 22, 2020 at 02:02:13PM -0800, Kevin Fenzi wrote:
> > > > delta rpms save so much time in the form of bandwidth on the client side.
> > > Well, it's tradeoffs. They save bandwidth and download time on one side,
> > > but use lots of cpu cycles and disk space on the other. It just depends
> > > on what each person wants based on their situation and hardware. 
> > 
> > They actually use a lot of cpu cycles on _both_ sides, really.
> 
> I thought that zchunk would obsolete drpm. What's the story here?

Nope, they are different things. 

zchunk = a way to only download changed chunks of repodata. 

drpms = a way to only download changed chunks of rpms.

> Also, in recent times, any dnf upgrade I did reported "savings" from
> drpm on the level <1% [*]. Am I doing something wrong or is this expected?
> Is there some usage pattern where drpm provides real gain with
> current Fedora?

This is most likely because we are only making drpms against the most
recent updates. So, we are making very few drpms and only against things
that recently updated. 

For example: 
https://dl.fedoraproject.org/pub/fedora/linux/updates/33/Everything/x86_64/drpms/
(126 drpms for all of f33 updates). 

> Maybe the time has come to just disable DRPM entirely for F34.

We could. Or try and make them more useful again. 

kevin




Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-30 Thread Zbigniew Jędrzejewski-Szmek
On Tue, Dec 22, 2020 at 05:09:08PM -0500, Matthew Miller wrote:
> On Tue, Dec 22, 2020 at 02:02:13PM -0800, Kevin Fenzi wrote:
> > > delta rpms save so much time in the form of bandwidth on the client side.
> > Well, it's tradeoffs. They save bandwidth and download time on one side,
> > but use lots of cpu cycles and disk space on the other. It just depends
> > on what each person wants based on their situation and hardware. 
> 
> They actually use a lot of cpu cycles on _both_ sides, really.

I thought that zchunk would obsolete drpm. What's the story here?

Also, in recent times, any dnf upgrade I did reported "savings" from
drpm on the level <1% [*]. Am I doing something wrong or is this expected?
Is there some usage pattern where drpm provides real gain with
current Fedora?

Maybe the time has come to just disable DRPM entirely for F34.

Zbyszek

[*] Today on F33:
> Delta RPMs reduced 836.8 MB of updates to 836.7 MB (0.1% saved)


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-24 Thread Matthew Almond via devel
On Wed, 2020-12-23 at 19:23 -0500, James Cassell wrote:
> > # Resolve packaging request into a list of packages and operations
> > # Download and '''decompress''' packages into a '''locally
> > optimized''' rpm file
> 
> Please verify the signature on the downloaded RPM before
> decompressing it.  (Do we do this already?)
> 

We have an opportunity to do the verification during download, but I'm
not keen on it for two major reasons:

1. the transcoder would need to open the rpmdb to perform the 
   verification, adding a fair amount of complexity.
2. (more crucially) I observe that dnf downloads packages and
   signatures before asking whether to trust them. The order of events
   means we can't be confident that all signatures are in the rpmdb
   yet.

My proposal is for enabling CoW with dnf. The code change to do
transcoding is in librepo as part of the generic file download
mechanism. librepo does verify downloads relative to the repo's
recorded digest.

The transcoder produces a different series of bits to be written to
disk, so how could that verification work? It turns out the answer is
easy: we see the original bits on the input to the transcoder, so we
calculate the digest of the bits received from the yum server and
record it in the footer of the transcoded rpm. I've modified
lr_checksum_fd() in librepo to look for this digest before using the
xattr cache or reading the whole file again. The whole-file digest can
only be located if the footer itself is complete.
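
As an illustration only, the idea can be sketched in Python as follows. The
footer layout, the magic marker, and the stand-in "transcoding" step are all
invented for this sketch; the real rpm2extents format and librepo's
lr_checksum_fd() are not reproduced here.

```python
import hashlib
import io
import struct

# Hypothetical footer layout, for illustration only: the 32-byte SHA-256
# digest of the ORIGINAL download, followed by an 8-byte magic marker.
MAGIC = b"XFOOTER1"
FOOTER = struct.Struct("32s8s")


def transcode(src, dst) -> None:
    """Write a 'transcoded' copy of src to dst, hashing the original bytes.

    Upper-casing is a stand-in for real transcoding, used only so that the
    bytes written differ from the bytes received.
    """
    h = hashlib.sha256()
    while True:
        chunk = src.read(65536)
        if not chunk:
            break
        h.update(chunk)            # digest of the bits as received
        dst.write(chunk.upper())   # different bits go to disk
    dst.write(FOOTER.pack(h.digest(), MAGIC))


def recorded_digest(f):
    """Return the original-file digest from the footer, or None.

    None means the footer is missing or incomplete, so the caller has to
    fall back to another way of verifying the file.
    """
    f.seek(0, io.SEEK_END)
    if f.tell() < FOOTER.size:
        return None
    f.seek(-FOOTER.size, io.SEEK_END)
    digest, magic = FOOTER.unpack(f.read(FOOTER.size))
    return digest if magic == MAGIC else None
```

A checker can then compare recorded_digest() with the digest listed in the
repo metadata without re-reading the original bytes, which no longer exist on
disk.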

The digest is actually a list of digests. The default value in
createrepo_c is SHA256 (irrespective of what digest algorithm is used
to identify/verify files in each rpm) and for now the dnf plugin passes
"SHA256" to the transcoder statically. This is ultimately repo
specific. I hope to eliminate the hard coding later if there's a signal
within librepo to choose the right digest algo for the specific repo.

The job of actually verifying the signature falls through to rpm as it
did before. As stated in the proposal: the headers (lead, signature,
main header) are completely untouched, so the gpg based signature is
still verified as before, and at the same point in time.


> > # Install and/or upgrade packages sequentially using RPM files,
> > using
> > '''reference linking''' (reflinking) to reuse data already on disk.
> 
> Sounds like a great improvement!  Any real-world data on how much
> time it saves, how much it changes disk usage, or how much SSD writes
> it saves?
> 

Forthcoming! I've got some numbers I've used internally at Facebook to
talk about this. To do this I had to write another rpm plugin to
measure how much time was spent on decompressing and writing data. I'm
planning on improving this and open sourcing that too. The goal here is
to produce some publicly reproducible numbers.


> > The outcome is intended to be the same, but the order of operations
> > is
> > different.
> > 
> > # Decompression happens inline with download. This has a positive
> > effect on resource usage: downloads are typically limited by
> > bandwidth. Decompression and writing the full data into a single
> > file
> > per rpm is essentially free. Additionally: if there is more than
> > one
> > download at a time, a multi-CPU system can be better utilized. All
> > compression types supported in RPM work because this uses the rpm
> > I/O
> > functions.
> 
> I referenced above, I think each chunk should also be verified before
> decompressing.
> 

This is certainly possible, but not implemented. My thinking here is
that the full rpm file digest enforced for files downloaded with
dnf/librepo also covers this. The only optimization possible here is
for a damaged rpm to fail faster during transcode. I consider this a
pretty minor optimization.


> > # RPMs are cached on local storage between downloading and
> > installation time as normal. This allows DNF to defer actual RPM
> > installation to when all the RPM are available. This is unchanged.
> > # The file format for RPMs is different with Copy on Write. The
> > headers are identical, but the payload is different. There is also
> > a
> > footer.
> > ## Files are converted (“transcoded”) locally during download using
> > /usr/bin/rpm2extents (part of rpm codebase). The
> > format
> > is not intended to be “portable” - i.e. copying the files from the
> > cache is not supported.
> 
> I think these should be made to be portable.  How many variants of
> these are there?  Would it be difficult to make the transcoder also
> understand RPMs transcoded for a different
> platform/setup?  Eventually, I'd like to see additional signatures
> added to the RPM for each of the variants so RPM itself can do the
> verification at install time, avoiding a transcode to the "canonical"
> format.  (I suppose this might require a build-time or sign-time
> transcode to each of the other variants.)  Until then, I'd like to
> ensure that the package signatures are being verified in a secure
> manner, which would be necessary for the 

Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-23 Thread James Cassell
On Mon, Dec 21, 2020, at 11:28 AM, Ben Cotton wrote:
> https://fedoraproject.org/wiki/Changes/RPMCoW
> 
> 
> == Summary ==
> 
> RPM Copy on Write provides a better experience for Fedora Users as it
> reduces the amount of I/O and offsets CPU cost of package
> decompression. RPM Copy on Write uses reflinking capabilities in
> btrfs, which is the default filesystem in Fedora 33.
> 
> == Owners ==
> 
> * Name: [[User:malmond|Matthew Almond]], [[User:dcavalca|Davide Cavalca]]
> * Email: malm...@fb.com, dcava...@fb.com
> 
> 
> == Detailed description ==
> 
> Installing and upgrading software packages is a standard part of
> managing the lifecycle of any operating system. For the entire
> lifecycle of Fedora, all software is packaged and distributed using
> the RPM file format. This proposal changes how software is downloaded
> and installed, leaving the distribution process unmodified.
> 
> === Current process ===
> 
> # Resolve packaging request into a list of packages and operations
> # Download and verify new packages
> # Install and/or upgrade packages sequentially using RPM files,
> decompressing, and writing a copy of the new files to storage.
> 
> === New process ===
> 
> # Resolve packaging request into a list of packages and operations
> # Download and '''decompress''' packages into a '''locally optimized''' rpm 
> file

Please verify the signature on the downloaded RPM before decompressing it.  (Do 
we do this already?)

> # Install and/or upgrade packages sequentially using RPM files, using
> '''reference linking''' (reflinking) to reuse data already on disk.

Sounds like a great improvement!  Any real-world data on how much time it 
saves, how much it changes disk usage, or how much SSD writes it saves?

> 
> The outcome is intended to be the same, but the order of operations is
> different.
> 
> # Decompression happens inline with download. This has a positive
> effect on resource usage: downloads are typically limited by
> bandwidth. Decompression and writing the full data into a single file
> per rpm is essentially free. Additionally: if there is more than one
> download at a time, a multi-CPU system can be better utilized. All
> compression types supported in RPM work because this uses the rpm I/O
> functions.

I referenced above, I think each chunk should also be verified before 
decompressing.

> # RPMs are cached on local storage between downloading and
> installation time as normal. This allows DNF to defer actual RPM
> installation to when all the RPM are available. This is unchanged.
> # The file format for RPMs is different with Copy on Write. The
> headers are identical, but the payload is different. There is also a
> footer.
> ## Files are converted (“transcoded”) locally during download using
> /usr/bin/rpm2extents (part of rpm codebase). The format
> is not intended to be “portable” - i.e. copying the files from the
> cache is not supported.

I think these should be made to be portable.  How many variants of these are 
there?  Would it be difficult to make the transcoder also understand RPMs 
transcoded for a different platform/setup?  Eventually, I'd like to see 
additional signatures added to the RPM for each of the variants so RPM itself 
can do the verification at install time, avoiding a transcode to the 
"canonical" format.  (I suppose this might require a build-time or sign-time 
transcode to each of the other variants.)  Until then, I'd like to ensure that 
the package signatures are being verified in a secure manner, which would be 
necessary for the plugin to be able to install packages not built with multiple 
signatures/digests.

Would it be practical to just have a single format aligned to the largest page 
size known, leaving fs holes as necessary on systems with smaller page sizes?


> ## Regular RPMs use a compressed .cpio based payload. In contrast,
> extent based RPMs contain uncompressed data aligned to the fundamental
> page size of the architecture, e.g. 4KiB on x86_64. This alignment is
> required for FICLONERANGE to work. Only files are
> represented in the payload, other directory entries like symlinks,
> device nodes etc are constructed entirely from rpm header information.
> Files are referenced by their digest, so identical files are
> de-duplicated.

How are hardlinks in an RPM handled?  Do they stay as hardlinks or become 
reflinks only, losing the hardlink status?  They should stay hardlinks, in my 
opinion.

> ## The footer currently has three sections
> ### Table of original (rpm) file digests, used to validate the
> integrity of the download in dnf.
> ### Table of digest → offset used when actually installing files.
> ### Signature 8 bytes at the end of the file, used to differentiate
> between traditional RPMs and extent based.

I think this magic number "signature" should vary based on the items that cause 
the format to change.

What happens if you try to use a transcoded RPM on a non-compatible system?
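
For what it's worth, a check on the trailing 8 bytes could look roughly like the sketch below. The magic value is made up for illustration; the real value used by rpm2extents is not given in this thread, and the function name is hypothetical too.

```python
import io

# Hypothetical 8-byte marker, NOT the real one from the rpm code.
HYPOTHETICAL_MAGIC = b"COWRPM\x00\x01"


def looks_extent_based(f, magic=HYPOTHETICAL_MAGIC):
    """Return True if the seekable binary file f ends with the magic bytes."""
    f.seek(0, io.SEEK_END)
    size = f.tell()
    if size < len(magic):
        return False  # too short to carry a footer at all
    f.seek(size - len(magic))
    return f.read(len(magic)) == magic


transcoded = io.BytesIO(b"headers...payload...footer tables..." + HYPOTHETICAL_MAGIC)
regular = io.BytesIO(b"headers...compressed cpio payload")
```

If the marker also encoded things like page size and endianness, the same check would reject a file transcoded for an incompatible system instead of misreading it.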

> 
> === Notes ===
> 
> # The headers are 

Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-23 Thread Chris Murphy
On Mon, Dec 21, 2020 at 10:49 AM Colin Walters  wrote:
>
>
>
> On Mon, Dec 21, 2020, at 11:28 AM, Ben Cotton wrote:
> > ## Regular RPMs use a compressed .cpio based payload. In contrast,
> > extent based RPMs contain uncompressed data aligned to the fundamental
> > page size of the architecture, e.g. 4KiB on x86_64. This alignment is
> > required for FICLONERANGE to work. Only files are
> > represented in the payload, other directory entries like symlinks,
> > device nodes etc are constructed entirely from rpm header information.
>
> This is the core change; some interesting tradeoffs here.  Python projects in 
> particular ship a lot of files smaller than 4k (classic example is 
> `__init__.py` which is zero sized).  And ppc64le is 64KiB pages right?  So 
> there will be "zero space" to align, right?  Would need some math to see how 
> much this would add up to, although I guess the implementation could instead 
> use holes?

I'm not sure about XFS or ext4 zero length file handling.

On Btrfs, it's a few hundred bytes. The file has no EXTENT_DATA item,
therefore it's the same whether you write a new zero length file or
reflink copy it.

Files bigger than 0 bytes but less than 2KiB will tend to result in
inline extents, i.e. EXTENT_DATA item contains the data in the same
metadata leaf as the inode rather than referencing some 4KiB data
block elsewhere.

Hardlinks take around 100 bytes, so they are slightly more space
efficient. But they can't have separate SELinux labels, ACLs, or
permissions, can't be located in different subvolumes, and are limited
to 65536 hardlinks per file. Reflinks don't have those limitations.

>
> > Files are referenced by their digest, so identical files are
> > de-duplicated.
>
> But just inside a single RPM, right?  It's interesting to compare with ostree 
> which does this by default; conceptually this is using reflinks inside a 
> single RPM to do what ostree does system wide with hardlinks.
>
> BTW we learned a few things, notably zero sized files are tricky because 
> there can be a *lot* of them - see e.g. 
> https://github.com/ostreedev/ostree/pull/2197
> That one was too many hardlinks, but how well do filesystems like btrfs/xfs 
> handle thousands of reflinks instead?  The Python __init__.py thing is such a 
> pathological case...

Thousands aren't a problem, nor are tens of thousands. A reflink is a
normal file that just so happens to have extents shared with another
file. It's the shared extent part that makes them sorta special, but
there's nothing in the structure of the file that says it's a reflink.
Whereas for a symlink or hard link, there is.

Shared extents are also produced by snapshots and dedup. It's the same
on-disk manifestation in all three cases. And at least on Btrfs there
are examples of millions of shared extents. But the workload will
dictate the extent layout, to what degree extents are shared, become
unshared, result in COW for modifications, and how much file and free
space fragmentation ensues. Those can be much bigger issues than the
number of reflinks.


--
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-22 Thread Garry T. Williams
On Tuesday, December 22, 2020 4:54:34 PM EST Matthew Almond via devel wrote:
> > I currently download once and upgrade three different systems by
> > rsync-ing the cache.
> >
> > Do I understand that this will no longer be supported or work?
>
> That's an interesting question. Is sharing the cache directory from
> a single host intended to be shared like this? I am guessing no, but
> it may still be common.
>
> It should still work, with three caveats:
> 1. The files in the cache will be bigger, so a simple rsync will
> involve more I/O, and the destination filesystem will also need more
> space and I/O time.
> 2. The systems must be the same endianness (The transcoded format
> doesn't bother with network order, because it's not intended to be
> shared)
> 3. The page size must be the same for reflinking to work: This is
> actually worked out when the filesystem is created, and defaults to
> the system page size, and if not the same as the current page size,
> the filesystem isn't even guaranteed to mount (see --sectorsize
> option in mkfs.btrfs man page).
>
> In reality you're quite unlikely to share packages unless the
> architecture were the same, which would steer both endianness and
> page size to the same value. That said, I'm aware that aarch64 can
> be flexible in both ways. I'm covering my bases with my statement: I
> have thought about it, and don't think I'm in any position to make
> promises.
>
> For this proposal: we're talking about shipping the code that would
> allow this to be turned on. We're not talking about enabling it by
> default. We can't until we have good answers to questions like this.

Understood.

To be clear, all three systems are x86_64, identical endianness,
architecture, and page size (as far as I know).

Also, this isn't a big deal, really.  I just wanted to reduce network
bandwidth without operating a local mirror.

___
sudo rsync -a --password-file=/etc/rsync.password --delete 
rsync://rsync@vfr/dnf /var/cache/dnf;sudo dnf --enablerepo=updates-testing 
upgrade

-- 
Garry T. Williams




Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-22 Thread Matthew Miller
On Tue, Dec 22, 2020 at 02:02:13PM -0800, Kevin Fenzi wrote:
> > delta rpms save so much time in form of bandwidth on the client side.
> Well, it's tradeoffs. They save bandwidth and download time on one side,
> but use lots of cpu cycles and disk space on the other. It just depends
> on what each person wants based on their situation and hardware. 

They actually use a lot of cpu cycles on _both_ sides, really.



-- 
Matthew Miller

Fedora Project Leader


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-22 Thread Kevin Fenzi
On Mon, Dec 21, 2020 at 07:14:08PM +0100, Marius Schwarz wrote:
> Am 21.12.20 um 18:53 schrieb Kevin Fenzi:
> > But in general perhaps we should decide how much value drpms provide
> > these days and either make sure we are making more of them, or drop
> > them.
> delta rpms save so much time in form of bandwidth on the client side.

Well, it's tradeoffs. They save bandwidth and download time on one side,
but use lots of cpu cycles and disk space on the other. It just depends
on what each person wants based on their situation and hardware. 

Right now we are not making very many drpms at all, due to the way the
compose process has changed over the years. If we keep drpms around we
should really look into making more of them... as they are now, they
seldom matter. 

kevin




Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-22 Thread Matthew Almond via devel

> I currently download once and upgrade three different systems by
> rsync-ing the cache.
> 
> Do I understand that this will no longer be supported or work?

That's an interesting question. Is the cache directory of a single host 
intended to be shared like this? I am guessing no, but it may still be 
common.

It should still work, with three caveats:
1. The files in the cache will be bigger, so a simple rsync will involve more 
I/O, and the destination filesystem will also need more space and I/O time.
2. The systems must be the same endianness (The transcoded format doesn't 
bother with network order, because it's not intended to be shared)
3. The page size must be the same for reflinking to work: This is actually 
worked out when the filesystem is created, and defaults to the system page 
size, and if not the same as the current page size, the filesystem isn't even 
guaranteed to mount (see --sectorsize option in mkfs.btrfs man page).
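
To make caveats 2 and 3 concrete, here is a minimal sketch of the kind of check a cache-sharing script could run before reusing transcoded files. The function names are hypothetical, and it assumes the filesystem sector size equals the system page size (the mkfs.btrfs default mentioned above), which it does not verify.

```python
import mmap
import sys


def cache_compat_key():
    """Describe this host by (byte order, system page size)."""
    return (sys.byteorder, mmap.PAGESIZE)


def can_share_cache(producer_key, consumer_key=None):
    """True if files transcoded on the producer should be usable here.

    Assumption: endianness and page size are the only things that matter.
    """
    if consumer_key is None:
        consumer_key = cache_compat_key()
    return tuple(producer_key) == tuple(consumer_key)
```

Two x86_64 hosts would both report ("little", 4096) and compare equal; a ppc64le host with 64KiB pages would not match them even though the byte order is the same.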

In reality you're quite unlikely to share packages unless the architecture were 
the same, which would steer both endianness and page size to the same value. 
That said, I'm aware that aarch64 can be flexible in both ways. I'm covering my 
bases with my statement: I have thought about it, and don't think I'm in any 
position to make promises.

For this proposal: we're talking about shipping the code that would allow this 
to be turned on. We're not talking about enabling it by default. We can't until 
we have good answers to questions like this.

Thanks, Matthew.


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-22 Thread Matthew Almond via devel
> On Mon, Dec 21, 2020, at 1:07 PM, Neal Gompa wrote:
> 
> Yes it does.  It avoids writing the compressed data and then copying it back 
> out
> uncompressed, which is the same amount of savings as the reflink approach.
> 
> (It's also equally incompatible with deltarpm)
> 
> 
> No - static deltas exist, plus layered RPMs work on the wire the same.  But 
> this isn't
> really relevant here.
> 
> 
> Adding a hardlink indeed requires updating inodes proportional to the number 
> of files, but
> that's more an implementation of the transactional update approach, not of the
> "download and unpack in parallel" part which is more what we're discussing
> here.  (Though they are entangled a bit)
> 
> Anyways, I'd still stand by my summary that the much lower tech "files in
> temporary directory that get rename()d" approach would be all of *more* 
> efficient on
> disk, simpler to implement and much less disruptive than an RPM format 
> change.  (The main
> cost would be a new temporary directory path that would need cleanup as part 
> of e.g. `yum
> clean` etc.)

I'm replying to a bunch of topics in the same thread (via the web ui because I 
wasn't subscribed to the mailing list until today, yikes)

On editions: I wrote fedora-workstation because that's the same one that has 
btrfs as root by default

Zero byte files: I think reflinking is specifically fine here because 
reflinking is about contents, not inodes. A zero byte reflink should be a no-op 
(on the filesystem level, but I should check, if it's not, I can special case 
it easily enough). The process of installing files based on reflinks involves 
actually opening new files, then reflinking content.

On small files and alignment/waste: I believe most mutable filesystems do 
"waste some space". I call it out here because it's explicitly in the file 
format, the same as in .tar (without compression) and it's because FICLONERANGE 
and the filesystems demand it. I account for it as (number of files) x (native 
block size) / 2 - i.e. assume 50% usage of the tail of every file. The block 
size of ppc64 is unfortunate, but I expect the same level of waste happens 
whether you're using reflinking or not.
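
A back-of-the-envelope version of that accounting, as a sketch only: the function names are made up, and the 50% tail usage is the assumption stated above, not a measurement.

```python
def padded_size(size, block_size=4096):
    """Round a file size up to the next multiple of block_size."""
    return -(-size // block_size) * block_size


def estimated_waste(num_files, block_size=4096):
    """(number of files) x (block size) / 2, i.e. half a block per file."""
    return num_files * block_size // 2


# Example: a package set with 10,000 files.
waste_4k = estimated_waste(10_000, 4096)    # x86_64 page size
waste_64k = estimated_waste(10_000, 65536)  # ppc64le page size
```

With these inputs the estimate is about 20 MB of padding at 4KiB and about 328 MB at 64KiB. Note that a zero byte file pads to zero, so the many empty `__init__.py` files cost nothing in the payload under this model.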

Talking about the topic more broadly:

The hardlinking approach in rpm-ostree depends on either a completely read-only 
system, or the use of a layered filesystem like overlayfs. I think it's a 
completely valid approach, and to my understanding, is the technology that 
underpins Fedora CoreOS and Project Atomic. These are different distro builds 
and have specific use cases in mind. As I understand it, they also have very 
different management policies: they are intended to be managed in a specific 
way, and updates seem to require a reboot.

My hope for CoW for RPM is to bring a similar set of capabilities and benefits 
to Fedora, and eventually CentOS, RHEL without requiring any changes to how the 
system works or is managed. The new requirements are fairly simple: one 
filesystem for the rootfs and dnf cache, and that filesystem must support 
reflinking.

Today data deduplication is within a given rpm. Looking forwards, I would like 
to extend the rpm2extents processor to read and re-use other blocks from the 
dnf/rpm cache and then we get full system level de-duplication.

I am really grateful for all this feedback, hopefully what I write makes sense 
- Matthew.


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-22 Thread Chris Murphy
On Tue, Dec 22, 2020 at 12:58 PM Matthew Almond via devel
 wrote:
>
> There is also some confusion between compressed data in the rpm and the 
> transcoded one, and filesystem level compression. This proposal affects the 
> former, but not the latter. I'd caution against using btrfs specific 
> attributes to disable compression on the dnf/rpm cache directory tree, because 
> then the extents written/shared to the installed file locations will also not 
> be compressed. (this is my interpretation of what I expect to see with 
> FICLONERANGE ioctl etc: it'd be slower if it honored filesystem level 
> compression because it'd need to re-write the data.)

It shouldn't need to rewrite the data. ficlonerange offset and length
is based on the Btrfs logical address space, and this is uncompressed.
That behind the scene it happens to be compressed is a sort of "last
mile" detail, similar to where the file is actually located. Btrfs
logical address for a file suggests there is exactly one copy of the
file and one copy of its metadata, but via chunk tree lookup it may be
that this file has two copies (raid1) or it may be located on any one
of a number of devices. Yet ficlonerange still works as expected
regardless of those details.
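
For anyone who wants to poke at this, a minimal sketch of calling the ioctl from Python follows. The helper names are made up. The request number is the value of _IOW(0x94, 13, struct file_clone_range) on x86_64 and aarch64; it is an assumption that your architecture uses the same ioctl encoding (powerpc, for one, does not). Offsets and length are in the file's logical, uncompressed byte space and must be block aligned, except that a range may end at end of file.

```python
import fcntl
import struct

# FICLONERANGE from <linux/fs.h>, as encoded on x86_64/aarch64 (assumption).
FICLONERANGE = 0x4020940D


def pack_clone_range(src_fd, src_offset, src_length, dest_offset):
    """Pack struct file_clone_range: __s64 src_fd, then three __u64 fields."""
    return struct.pack("=qQQQ", src_fd, src_offset, src_length, dest_offset)


def clone_range(src_fd, dest_fd, src_offset, src_length, dest_offset):
    """Share src's extents into dest; raises OSError if the fs refuses."""
    arg = pack_clone_range(src_fd, src_offset, src_length, dest_offset)
    fcntl.ioctl(dest_fd, FICLONERANGE, arg)
```

Both file descriptors need to be on the same reflink-capable filesystem, otherwise the call fails with an OSError rather than falling back to a copy.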


-- 
Chris Murphy


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-22 Thread Matthew Almond via devel
>> === New process ===
>> # Resolve packaging request into a list of packages and operations
>> # Download and '''decompress''' packages into a '''locally optimized''' rpm 
>> file
>> # Install and/or upgrade packages sequentially using RPM files, using 
>> '''reference linking''' (reflinking) to reuse data already on disk.
> This sound great because free space requirements can be reduced, 
> specially when installing new packages.

I need to re-word this: the "reuse" of data is between the locally downloaded 
rpm and the installed destination. I do have a plan to investigate making 
rpm2extents enumerate the dnf/rpm cache (if you enable it) and reflink any 
shared data between rpms, saving writes.

Today this proposal explains that disk space requirements during updates are 
expected to be higher. See https://fedoraproject.org/wiki/Changes/RPMCoW#Notes 
item 3.

> I have experimented building very small appliances using btrfs 
> compression on things like /usr/share. So I think this could disrupt 
> this because if I am correct the extents will be first downloaded to a 
> temporary directory without compression enabled.

There is also some confusion between compressed data in the rpm and the 
transcoded one, and filesystem level compression. This proposal affects the 
former, but not the latter. I'd caution against using btrfs specific attributes 
to disable compression on the dnf/rpm cache directory tree, because then the 
extents written/shared to the installed file locations will also not be 
compressed. (this is my interpretation of what I expect to see with 
FICLONERANGE ioctl etc: it'd be slower if it honored filesystem level 
compression because it'd need to re-write the data.)

> I am happy with an option to disable this behavior.

I'm unclear on which behavior you're referring to. This proposal is to add 
support for Copy on Write in Fedora, but not to make it the default at this time.

Thanks, Matthew.


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-22 Thread Matthew Almond via devel
> I cannot find it anywhere in rpm codebase.
 
The current status section of the proposal describes this as pending two PRs, 
and in the dependencies list, they're enumerated. Most of the code is in 
https://github.com/malmond77/rpm/tree/cow and enabled through work in 
https://github.com/malmond77/librepo/tree/transcode_cow

> Hmm, I, personally, see much better performance (and storage) improvements in 
> enabling %_minimize_writes
> however there is still https://bugzilla.redhat.com/show_bug.cgi?id=1872141 to 
> be resolved before this got enabled by default.

I'm curious about this so I'll look at it, but at first glance it seems 
tangential to this proposal.

Thanks, Matthew.


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-22 Thread Miroslav Suchý
Dne 21. 12. 20 v 17:39 Neal Gompa napsal(a):
> On Mon, Dec 21, 2020 at 11:29 AM Ben Cotton  wrote:
>> # Decompression happens inline with download. This has a positive
>> effect on resource usage: downloads are typically limited by
>> bandwidth. Decompression and writing the full data into a single file
>> per rpm is essentially free. Additionally: if there is more than one
>> download at a time, a multi-CPU system can be better utilized. All
>> compression types supported in RPM work because this uses the rpm I/O
>> functions.
>> # RPMs are cached on local storage between downloading and
>> installation time as normal. This allows DNF to defer actual RPM
>> installation to when all the RPM are available. This is unchanged.
>> # The file format for RPMs is different with Copy on Write. The
>> headers are identical, but the payload is different. There is also a
>> footer.

>> ## Files are converted (“transcoded”) locally during download using
>> /usr/bin/rpm2extents (part of rpm codebase). The format

I cannot find it anywhere in rpm codebase.

>> # Disk space requirements are expected to be marginally higher than
>> before: all new packages or updates will consume their installed size
>> before installation instead of about half their size (regular rpms
>> with payloads still cost space).

The size is already an issue (for me) on small cloud images. But I do not use 
BTRFS there. So at the end I do not care :)

>>
>> Ballpark performance difference is about half the duration for file
>> download+install time. A lot of rpms are very small, so it’s difficult
>> to see/measure. Larger RPMs give much clearer signal.

Hmm, I, personally, see much better performance (and storage) improvements in 
enabling
 %_minimize_writes
however there is still
  https://bugzilla.redhat.com/show_bug.cgi?id=1872141
to be resolved before this got enabled by default.


-- 
Miroslav Suchy, RHCA
Red Hat, Associate Manager ABRT/Copr, #brno, #fedora-buildsys


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-22 Thread Robert Marcano via devel
On Mon, Dec 21, 2020, 8:19 PM Davide Cavalca via devel <
devel@lists.fedoraproject.org> wrote:

> On Mon, 2020-12-21 at 12:54 -0400, Robert Marcano via devel wrote:
> > On 12/21/20 12:28 PM, Ben Cotton wrote:
> > > ...
> > >
> > > === New process ===
> > >
> > > # Resolve packaging request into a list of packages and operations
> > > # Download and '''decompress''' packages into a '''locally
> > > optimized''' rpm file
> > > # Install and/or upgrade packages sequentially using RPM files,
> > > using
> > > '''reference linking''' (reflinking) to reuse data already on disk.
> >
> > This sound great because free space requirements can be reduced,
> > specially when installing new packages.
> >
> > I have experimented building very small appliances using btrfs
> > compression on things like /usr/share. So I think this could disrupt
> > this because if I am correct the extents will be first downloaded to
> > a
> > temporary directory without compression enabled.
>
> For CoW to be beneficial, the package cache should be on the same
> filesystem used for the bulk of the system. In this scenario,
> compression should work just fine, as long as it's enabled on the
> appropriate subvolumes.
>

On btrfs there is a compression file flag so you can set compression on a
directory without having compression on the DNF cache directory on the same
volume

>
> > I am happy with an option to disable this behavior.
>
> To be clear, for this Change we do not plan to enable CoW by default.
> It would be a user opt-in via the dnf-plugin-cow package.
>

Good, thanks


> Cheers
> Davide


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Sam Varshavchik

Neal Gompa writes:

> On Mon, Dec 21, 2020 at 1:14 PM Marius Schwarz wrote:
> >
> > If something really needs to change, it is the 50+ MB repo database that
> > gets downloaded. It takes ages on slow connections to download
> > and then you want to increase the size of the rpms too.. Doesn't sound
> > like a good idea.
> >
>
> You should be getting delta fetching of repository metadata with
> zchunk metadata, which we've had enabled since Fedora 30:
> https://fedoraproject.org/wiki/Changes/Zchunk_Metadata
>
> Is this not working for you or something?


Well, I don't know what's working for me, or not working for me. All I know  
is that:


1) I'm rsyncing the updates repo to my LAN, and other machines on my LAN
have the default updates repo disabled and a replacement repo pointing at my
local copy.


2) Even on the LAN, an update downloads something from the local repo,
giving me a zippy progress indication of the download. After the download it
sits for a noticeable amount of time before it decides exactly what it's
going to update and then gives me the list. This is especially noticeable for
a Fedora guest that I'm running in a VM that's emulating an aarch64
platform. In the emulated aarch64 VM downloading of RPMs goes a bit slow,
but the subsequent pause after download is quite noticeable.


Except for rsyncing a mirror of the updates repo locally and then pointing  
everyone to my local mirror, I am not doing any other customization and  
that's the behavior I've seen.


Having said all that, I don't find the update process to be that much of a  
pain point right now, and in any dire need of improvement. It works. It is  
fairly reliable. A bit slow, but who cares. The important thing is that  
except for a burst of segfaults downloading rpms earlier this year (haven't  
had any in a while) it's been rock stable and hiccups are very, very rare. I  
don't exactly see what's the big value-added from the described feature  
enhancement, I'd only want to make sure it's just as stable.






Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Davide Cavalca via devel
On Mon, 2020-12-21 at 12:48 -0500, Colin Walters wrote:
> 
> On Mon, Dec 21, 2020, at 11:28 AM, Ben Cotton wrote:
> > 
> > 
> > == Summary ==
> > 
> > RPM Copy on Write provides a better experience for Fedora Users as
> > it
> > reduces the amount of I/O and offsets CPU cost of package
> > decompression. RPM Copy on Write uses reflinking capabilities in
> > btrfs, which is the default filesystem in Fedora 33.
> 
> A bunch of points here:
> 
> - No, it's the default for one Edition.  Others don't default to
> it.  And even for Workstation we can't *require* it because it's
> definitely supported to use other filesystems and storage layouts. 
> 
> - Orthogonal to this, I'd also note that xfs supports reflinks too.
> 
> Combining those I'd say instead e.g.: "Most Fedora Editions default
> to a filesystem that supports reflinks, e.g. btrfs or xfs" (actually I
> think IoT defaults to ext4 for...probably they didn't consider it?)

Thanks for surfacing this, we'll make the language clearer. About XFS:
it should work, but we haven't tested it extensively, and this work has
been developed primarily with btrfs in mind.

> - When talking about RPMs we need to think about container images,
> which use overlayfs by default, which defers to the underlying
> filesystem for reflinks - so should be fine, but should be explicitly
> written down (and tested)

If reflinking isn't possible (which can also happen if e.g. the package
cache and the system are on different filesystems) things work as
normal, albeit with a performance penalty (because more I/O is required
to install the package).
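
For illustration only, the fallback behaviour described above can be
sketched roughly like this in Python. This is not the code from the rpm
patches: clone_or_copy is a made-up helper name, and the FICLONE request
number is the value from <linux/fs.h> as encoded on x86_64/aarch64 (treat
it as an assumption on other architectures).

```python
import fcntl
import os
import shutil
import tempfile

# FICLONE from <linux/fs.h>: _IOW(0x94, 9, int).
# Assumption: this encoding is valid for x86_64 and aarch64.
FICLONE = 0x40049409


def clone_or_copy(src: str, dst: str) -> str:
    """Try to reflink src to dst; fall back to a plain byte copy.

    Returns "reflink" or "copy" so the caller can see which path was taken.
    (Hypothetical helper, not part of rpm or dnf.)
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to share src's extents with dst.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # Raised when the filesystem has no reflink support, or when
            # src and dst live on different filesystems.
            shutil.copyfileobj(fsrc, fdst)
            return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "payload")
        dst = os.path.join(d, "installed")
        with open(src, "wb") as f:
            f.write(b"example data\n")
        # Prints which path was taken on this filesystem.
        print(clone_or_copy(src, dst))
```

Either way the destination ends up with the same content; the only
difference is how much I/O it took to get there.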

I'll let Matthew weigh in on the other points you raised. Thanks for
the feedback!

Cheers
Davide


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Davide Cavalca via devel
On Mon, 2020-12-21 at 12:54 -0400, Robert Marcano via devel wrote:
> On 12/21/20 12:28 PM, Ben Cotton wrote:
> > ...
> > 
> > === New process ===
> > 
> > # Resolve packaging request into a list of packages and operations
> > # Download and '''decompress''' packages into a '''locally
> > optimized''' rpm file
> > # Install and/or upgrade packages sequentially using RPM files,
> > using
> > '''reference linking''' (reflinking) to reuse data already on disk.
> 
> This sounds great because free space requirements can be reduced,
> especially when installing new packages.
>
> I have experimented building very small appliances using btrfs
> compression on things like /usr/share. So I think this could disrupt
> this because if I am correct the extents will be first downloaded to
> a temporary directory without compression enabled.

For CoW to be beneficial, the package cache should be on the same
filesystem used for the bulk of the system. In this scenario,
compression should work just fine, as long as it's enabled on the
appropriate subvolumes.

> I am happy with an option to disable this behavior.

To be clear, for this Change we do not plan to enable CoW by default.
It would be a user opt-in via the dnf-plugin-cow package.

Cheers
Davide


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Michel Alexandre Salim
On Mon, 2020-12-21 at 09:53 -0800, Kevin Fenzi wrote:
> Cool. A few questions inline... 
> 
> On Mon, Dec 21, 2020 at 11:28:51AM -0500, Ben Cotton wrote:
> > https://fedoraproject.org/wiki/Changes/RPMCoW
> > 
> > 
> > == Summary ==
> > 
> > RPM Copy on Write provides a better experience for Fedora Users as
> > it
> > reduces the amount of I/O and offsets CPU cost of package
> > decompression. RPM Copy on Write uses reflinking capabilities in
> > btrfs, which is the default filesystem in Fedora 33.
> 
> What happens if you enable this on non btrfs installs? 
> Does it just not work gracefully? Does it fail somehow? 
It would be slower but would still work; see note #5:

https://fedoraproject.org/wiki/Changes/RPMCoW#Notes

Of note, even on systems with Btrfs/XFS that support reflinks, falling
back to copying is still needed for e.g. files in /boot or /boot/EFI,
which usually live on separate filesystems.


-- 
Michel Alexandre Salim
profile: https://keyoxide.org/mic...@michel-slm.name
chat via email: https://delta.chat/
GPG key: 5DCE 2E7E 9C3B 1CFF D335 C1D7 8B22 9D2F 7CCC 04F2




Re: full file paths in the dependency metadata [was Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)]

2020-12-21 Thread Neal Gompa
On Mon, Dec 21, 2020 at 2:14 PM Matthew Miller  wrote:
>
> On Mon, Dec 21, 2020 at 01:47:19PM -0500, Neal Gompa wrote:
> > As someone who has to package for multiple distributions, I would
> > oppose any attempt to cripple DNF to stop supporting file dependencies
> > properly. I *aggressively* use file dependencies to avoid having to
> > litter my spec files with package name dependencies across RH/Fedora,
> > SUSE, Mandriva/Mageia, and others.
>
> Do you have examples outside of /etc, /usr/bin, /usr/sbin?
>

Mostly stuff in /usr/libexec and /usr/lib(64).

> Also, if you _are_ using arbitrary file dependencies, that renders the other
> part about opportunistic download of these deps kind of moot, since they'll
> have to be fetched frequently, right?
>

For packages I maintain in Fedora *itself*, I don't need to do this,
but for packages I maintain *outside* of Fedora, I *must*.

> Again, I'm not kidding about 95% of the dep points being filenames. It's
> huge! I don't think that's a good price at all to make everyone pay
> constantly for packaging convenience. Better to convince packagers to put in
> cross-distro "Provides" or something.
>

Yes, I know. I've looked at the metadata myself before...

The fact that I can't get openSUSE to properly fully enable the Python
module dependency generator (that I maintain upstream in rpm!) after
almost two years of trying should be indication enough of how
difficult what you're asking really is.




--
真実はいつも一つ!/ Always, there's only one truth!


Re: full file paths in the dependency metadata [was Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)]

2020-12-21 Thread Matthew Miller
On Mon, Dec 21, 2020 at 01:47:19PM -0500, Neal Gompa wrote:
> As someone who has to package for multiple distributions, I would
> oppose any attempt to cripple DNF to stop supporting file dependencies
> properly. I *aggressively* use file dependencies to avoid having to
> litter my spec files with package name dependencies across RH/Fedora,
> SUSE, Mandriva/Mageia, and others.

Do you have examples outside of /etc, /usr/bin, /usr/sbin?

Also, if you _are_ using arbitrary file dependencies, that renders the other
part about opportunistic download of these deps kind of moot, since they'll
have to be fetched frequently, right?

Again, I'm not kidding about 95% of the dep points being filenames. It's
huge! I don't think that's a good price at all to make everyone pay
constantly for packaging convenience. Better to convince packagers to put in
cross-distro "Provides" or something.

-- 
Matthew Miller

Fedora Project Leader


Re: full file paths in the dependency metadata [was Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)]

2020-12-21 Thread Neal Gompa
On Mon, Dec 21, 2020 at 1:42 PM Matthew Miller  wrote:
>
> On Mon, Dec 21, 2020 at 07:14:08PM +0100, Marius Schwarz wrote:
> > delta rpms save so much time in form of bandwidth on the client side.
> > If something really needs to change, it is the 50+ MB repo database
> > that gets downloaded. It takes ages on slow connections to download
>
> This needs a followup. I didn't push on it because the DNF team was
> super-busy with modularity, but if someone wants to pick this up, it'd be a
> significant improvement:
>
> https://pagure.io/packaging-committee/issue/714
>
> In short, 95% of the dependency data is full filename paths. That's not
> hyperbole. It's literally 95% by count. Actually probably even more by
> _space_ since they tend to be long.
>
> Only a tiny fraction of packages use these at all, and almost all of the
> packages using file deps outside of /usr/bin, /usr/sbin, or /etc could use
> something else — and of the few using something else, many are actually
> doing so only in error.
>
> It remains convenient to be able to do
>
>dnf install /usr/share/fonts/jetbrains-mono-fonts/JetBrainsMono-Regular.ttf
>
> or whatever, but that seems like it could be covered by a DNF plugin.
>
> Previously, there was a chicken-and-egg scenario where the DNF folks didn't
> want to touch this while people were still making packages relying on this
> feature, but since 2018 that's a "SHOULD NOT" in the guidelines. So, I
> think there's room to move forward, should anyone like to take this on.
>
> https://docs.fedoraproject.org/en-US/packaging-guidelines/#_file_and_directory_dependencies
>

The main problem is that wiring libsolv to callback to
opportunistically fetch and repopulate the solver cache has not been
figured out for libdnf. Once we do that, we don't need to do any more
work. Most cases will automatically only fetch primary.xml and
filelists.xml will only be fetched as requested. This is the behavior
that YUM v3 had, and it wasn't ported to DNF because we lacked a
mechanism to do this. In *theory*, such a mechanism exists now in
libsolv, though the API is sufficiently confusing that I'm not sure
how to do it exactly.
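
To make the primary.xml vs. filelists.xml split a bit more concrete, here
is a rough Python sketch of the decision a depsolver would have to make.
The rule for which paths primary.xml already carries (anything under
/etc/, anything containing "bin/", plus /usr/lib/sendmail) is my
understanding of what createrepo_c does and should be treated as an
assumption; needs_filelists is a made-up name and not a libsolv or libdnf
API.

```python
def needs_filelists(dep: str) -> bool:
    """Return True if resolving `dep` would likely need filelists.xml.

    Hypothetical helper for illustration, not part of libsolv or libdnf.
    """
    if not dep.startswith("/"):
        # Ordinary package name or capability: primary.xml is enough.
        return False
    # Assumed subset of file paths that primary.xml already lists.
    if dep.startswith("/etc/") or "bin/" in dep or dep == "/usr/lib/sendmail":
        return False
    # Any other file path is only listed in filelists.xml.
    return True


if __name__ == "__main__":
    for dep in ("python3", "/usr/bin/python3", "/usr/libexec/foo"):
        print(dep, needs_filelists(dep))
```

With lazy fetching, only a request that returns True here would trigger
the extra download.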

As someone who has to package for multiple distributions, I would
oppose any attempt to cripple DNF to stop supporting file dependencies
properly. I *aggressively* use file dependencies to avoid having to
litter my spec files with package name dependencies across RH/Fedora,
SUSE, Mandriva/Mageia, and others.



-- 
真実はいつも一つ!/ Always, there's only one truth!


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Garry T. Williams
On Monday, December 21, 2020 11:28:51 AM EST Ben Cotton wrote:
> https://fedoraproject.org/wiki/Changes/RPMCoW

[snip]

> # The file format for RPMs is different with Copy on Write. The
> headers are identical, but the payload is different. There is also a
> footer.
> ## Files are converted (“transcoded”) locally during download using
> /usr/bin/rpm2extents (part of rpm codebase). The format
> is not intended to be “portable” - i.e. copying the files from the
> cache is not supported.

I currently download once and upgrade three different systems by
rsync-ing the cache.

Do I understand that this will no longer be supported or work?


-- 
Garry T. Williams




Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Colin Walters


On Mon, Dec 21, 2020, at 1:07 PM, Neal Gompa wrote:
> 
> Sure, this makes some degree of sense, but it doesn't reduce the IOPS
> for actually *doing* the installation.

Yes it does.  It avoids writing the compressed data and then copying it back 
out uncompressed, which is the same amount of savings as the reflink approach.

(It's also equally incompatible with deltarpm)

> This is also a flaw with RPM-OSTree, since you have to fetch
> everything individually

No - static deltas exist, plus layered RPMs work on the wire the same.  But 
this isn't really relevant here.

> and construct the root by shifting hardlinks
> or reflinks around.

Adding a hardlink indeed requires updating inodes proportional to the number of 
files, but that's more an implementation of the transactional update approach, 
not of the "download and unpack in parallel" part which is more what we're 
discussing here.  (Though they are entangled a bit)

Anyways, I'd still stand by my summary that the much lower tech "files in 
temporary directory that get rename()d" approach would be all of *more* 
efficient on disk, simpler to implement and much less disruptive than an RPM 
format change.  (The main cost would be a new temporary directory path that 
would need cleanup as part of e.g. `yum clean` etc.)
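
A minimal sketch of that idea, assuming a staging directory that lives on
the same filesystem as the target root (otherwise rename() fails with
EXDEV). The function name install_via_rename and the ".rpmtemp-" prefix
are made up for the example; nothing here is existing rpm behaviour.

```python
import os
import shutil
import tempfile
from typing import Mapping


def install_via_rename(files: Mapping[str, bytes], root: str) -> None:
    """Unpack into a staging dir under `root`, then rename() into place.

    `files` maps paths relative to `root` to their contents.
    (Hypothetical helper for illustration only.)
    """
    # Staging must be on the same filesystem as the final destination.
    staging = tempfile.mkdtemp(prefix=".rpmtemp-", dir=root)
    try:
        # Phase 1 (can overlap with the download): write the payload.
        for relpath, data in files.items():
            tmp = os.path.join(staging, relpath)
            os.makedirs(os.path.dirname(tmp), exist_ok=True)
            with open(tmp, "wb") as f:
                f.write(data)
        # Phase 2 (the transaction): move files into place, no data copied.
        for relpath in files:
            dest = os.path.join(root, relpath)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            os.replace(os.path.join(staging, relpath), dest)
    finally:
        # This is the cleanup that something like `yum clean` would own.
        shutil.rmtree(staging, ignore_errors=True)
```

The data is written exactly once, and the install step only touches
directory entries.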


full file paths in the dependency metadata [was Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)]

2020-12-21 Thread Matthew Miller
On Mon, Dec 21, 2020 at 07:14:08PM +0100, Marius Schwarz wrote:
> delta rpms save so much time in form of bandwidth on the client side.
> If something really needs to change, it is the 50+ MB repo database
> that gets downloaded. It takes ages on slow connections to download

This needs a followup. I didn't push on it because the DNF team was
super-busy with modularity, but if someone wants to pick this up, it'd be a
significant improvement:

https://pagure.io/packaging-committee/issue/714

In short, 95% of the dependency data is full filename paths. That's not
hyperbole. It's literally 95% by count. Actually probably even more by
_space_ since they tend to be long.

Only a tiny fraction of packages use these at all, and almost all of the
packages using file deps outside of /usr/bin, /usr/sbin, or /etc could use
something else — and of the few using something else, many are actually
doing so only in error.

It remains convenient to be able to do

   dnf install /usr/share/fonts/jetbrains-mono-fonts/JetBrainsMono-Regular.ttf

or whatever, but that seems like it could be covered by a DNF plugin.

Previously, there was a chicken-and-egg scenario where the DNF folks didn't
want to touch this while people were still making packages relying on this
feature, but since 2018 that's a "SHOULD NOT" in the guidelines. So, I
think there's room to move forward, should anyone like to take this on.

https://docs.fedoraproject.org/en-US/packaging-guidelines/#_file_and_directory_dependencies
 

-- 
Matthew Miller

Fedora Project Leader


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Neal Gompa
On Mon, Dec 21, 2020 at 1:14 PM Marius Schwarz  wrote:
>
> On 21.12.20 at 18:53, Kevin Fenzi wrote:
> > But in general perhaps we should decide how much value drpms provide
> > these days and either make sure we are making more of them, or drop
> > them.
> delta rpms save so much time in form of bandwidth on the client side.
>
> If something really needs to change, it is the 50+ MB repo database that
> gets downloaded. It takes ages on slow connections to download
> and then you want to increase the size of the rpms too.. Doesn't sound
> like a good idea.
>

You should be getting delta fetching of repository metadata with
zchunk metadata, which we've had enabled since Fedora 30:
https://fedoraproject.org/wiki/Changes/Zchunk_Metadata

Is this not working for you or something?
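
For anyone unfamiliar with the idea, delta fetching of chunked metadata
works roughly like the Python sketch below. This is a deliberately
simplified model and not the real zchunk format or API: the chunking, the
digest choice and the function names plan_fetch and rebuild are all
assumptions made for the example.

```python
import hashlib
from typing import Dict, List


def digest(chunk: bytes) -> str:
    """Content digest used to identify a chunk (SHA-256 assumed here)."""
    return hashlib.sha256(chunk).hexdigest()


def plan_fetch(local_chunks: List[bytes], remote_index: List[str]) -> List[int]:
    """Return positions in remote_index whose chunks are not cached locally."""
    have = {digest(c) for c in local_chunks}
    return [i for i, d in enumerate(remote_index) if d not in have]


def rebuild(local_chunks: List[bytes], remote_index: List[str],
            fetched: Dict[int, bytes]) -> bytes:
    """Reassemble the new file from cached chunks plus the fetched ones."""
    by_digest = {digest(c): c for c in local_chunks}
    return b"".join(
        fetched[i] if i in fetched else by_digest[d]
        for i, d in enumerate(remote_index)
    )
```

The client downloads the small index first, then only the chunks it is
missing, so an update that touches a few packages should not cost the
full metadata download.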


-- 
真実はいつも一つ!/ Always, there's only one truth!


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Matthew Miller
On Mon, Dec 21, 2020 at 01:07:42PM -0500, Neal Gompa wrote:
> As an aside, I *really* hate this split of terminology we have among
> Editions, Spins, and Labs. It's confusing to everyone. :(

The website hasn't been changed, but officially all of these are Fedora
Solutions, with only Editions being a special case. Other outputs can call
themselves Spin, Lab, Image, or whatever, as they like for their own
marketing.

https://docs.fedoraproject.org/en-US/council/policy/guiding-policy/#_what_does_this_mean


-- 
Matthew Miller

Fedora Project Leader


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Marius Schwarz

On 21.12.20 at 18:53, Kevin Fenzi wrote:

> But in general perhaps we should decide how much value drpms provide
> these days and either make sure we are making more of them, or drop
> them.

Delta rpms save so much time in the form of bandwidth on the client side.

If something really needs to change, it is the 50+ MB repo database that
gets downloaded. It takes ages on slow connections to download,
and then you want to increase the size of the rpms too. That doesn't sound
like a good idea.


best regards,
Marius Schwarz


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Neal Gompa
On Mon, Dec 21, 2020 at 12:49 PM Colin Walters  wrote:
>
>
>
> On Mon, Dec 21, 2020, at 11:28 AM, Ben Cotton wrote:
> >
> >
> >
> > == Summary ==
> >
> > RPM Copy on Write provides a better experience for Fedora Users as it
> > reduces the amount of I/O and offsets CPU cost of package
> > decompression. RPM Copy on Write uses reflinking capabilities in
> > btrfs, which is the default filesystem in Fedora 33.
>
> A bunch of points here:
>
> - No, it's the default for one Edition.  Others don't default to it.  And 
> even for Workstation we can't *require* it because it's definitely supported 
> to use other filesystems and storage layouts.
>
> - Orthogonal to this, I'd also note that xfs supports reflinks too.
>
> Combining those I'd say instead e.g.: "Most Fedora Editions default to a 
> filesystem that supports reflinks, e.g. btrfs or xfs" (actually I think IoT 
> defaults to ext4 for...probably they didn't consider it?)
>

It'd be more accurate to say most Fedora variants default to Btrfs.
The only exceptions right now are Cloud, Server, and CoreOS. But yes,
Fedora Server's current default of XFS on LVM means it also supports
reflinks.

As an aside, I *really* hate this split of terminology we have among
Editions, Spins, and Labs. It's confusing to everyone. :(

> - When talking about RPMs we need to think about container images, which use 
> overlayfs by default, which defers to the underlying filesystem for reflinks 
> - so should be fine, but should be explicitly written down (and tested)
>
> - Generally incompatible RPM payload changes cause pain proportional to how 
> far they're "not backported", e.g. if support for this isn't in Fedora N-1 
> (e.g. Fedora 32) it will be harder for current Koji/mock model.  Nowadays 
> many more people use podman than mock, which e.g. if using a RHEL8 host will 
> naturally avoid the dependency on an updated RPM.  But
>

Incomplete statement here?

That said, we don't have a problem in the Koji/Mock model anymore, as
bootstrap mode is now activated. Additionally, Mock uses
systemd-nspawn by default for all cases except for with Koji (which
overrides this because it can't handle nspawn mode at the moment).

> > # Decompression happens inline with download.
>
> rpm-ostree does this by default today BTW (rpms are unpacked into local 
> ostree commits in parallel even).
>
> > ## Regular RPMs use a compressed .cpio based payload. In contrast,
> > extent based RPMs contain uncompressed data aligned to the fundamental
> > page size of the architecture, e.g. 4KiB on x86_64. This alignment is
> > required for FICLONERANGE to work. Only files are
> > represented in the payload, other directory entries like symlinks,
> > device nodes etc are constructed entirely from rpm header information.
>
> This is the core change; some interesting tradeoffs here.  Python projects in 
> particular ship a lot of files smaller than 4k (classic example is 
> `__init__.py` which is zero sized).  And ppc64le is 64KiB pages right?  So 
> there will be "zero space" to align, right?  Would need some math to see how 
> much this would add up to, although I guess the implementation could instead 
> use holes?
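
The "some math" could look something like the Python sketch below: round
each file up to the alignment and sum the padding. The file sizes are
invented for the example, and treating zero-sized files as needing no
padding is an assumption. It counts padding bytes in the transcoded file;
if the implementation used holes, the actual disk usage could be lower.

```python
from typing import Iterable


def padded_size(size: int, align: int) -> int:
    """Round size up to the next multiple of align (0 stays 0)."""
    return -(-size // align) * align


def padding_overhead(sizes: Iterable[int], align: int) -> int:
    """Total padding bytes added when every file is aligned to `align`."""
    return sum(padded_size(s, align) - s for s in sizes)


if __name__ == "__main__":
    # Hypothetical file sizes in bytes, including an empty __init__.py.
    sizes = [0, 120, 3000, 5000, 70000]
    for align in (4096, 65536):  # 4KiB pages vs. 64KiB pages
        print(align, padding_overhead(sizes, align))
```

Running it over the real file size list of a package set would give the
actual number for each page size.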
>
> > Files are referenced by their digest, so identical files are
> > de-duplicated.
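
As a rough illustration of "referenced by their digest", here is a small
Python sketch that stores each distinct content once and lets paths point
at it. The function name dedupe_by_digest and the in-memory layout are
made up; the real format presumably reuses the per-file digests already
in the rpm header rather than recomputing them.

```python
import hashlib
from typing import Dict, Tuple


def dedupe_by_digest(
    files: Dict[str, bytes],
) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Store each distinct content once, keyed by its SHA-256 digest.

    Returns (blobs, index): blobs maps digest -> content, index maps
    path -> digest. (Hypothetical helper for illustration only.)
    """
    blobs: Dict[str, bytes] = {}
    index: Dict[str, str] = {}
    for path, data in files.items():
        key = hashlib.sha256(data).hexdigest()
        blobs.setdefault(key, data)  # keep only the first copy
        index[path] = key
    return blobs, index
```

Two empty __init__.py files would end up sharing a single blob, which is
the case discussed just below.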
>
> But just inside a single RPM, right?  It's interesting to compare with ostree 
> which does this by default; conceptually this is using reflinks inside a 
> single RPM to do what ostree does system wide with hardlinks.
>
> BTW we learned a few things, notably zero sized files are tricky because 
> there can be a *lot* of them - see e.g. 
> https://github.com/ostreedev/ostree/pull/2197
> That one was too many hardlinks, but how well do filesystems like btrfs/xfs 
> handle thousands of reflinks instead?  The Python __init__.py thing is such a 
> pathological case...
>
> > # Disk space requirements are expected to be marginally higher than
> > before: all new packages or updates will consume their installed size
> > before installation instead of about half their size (regular rpms
> > with payloads still cost space).
>
> This won't matter much for small updates but could be quite noticeable for 
> larger system upgrades.
>
> This all said the more I think about this, wouldn't it be way simpler to 
> change rpm to support a "temporary root directory", e.g. `/usr/.rpmtemp` or 
> whatever.  Then dnf/zypper/etc can do the unpack-and-download model without 
> any format changes to RPM - instead of reflinking it'd just be rename() into 
> place. This is effectively what rpm-ostree is doing today except with ostree 
> commits instead of a temporary directory.

Sure, this makes some degree of sense, but it doesn't reduce the IOPS
for actually *doing* the installation. My understanding is that this
Change is intended to reduce the thrashing when doing package
transactions.

This is also a flaw with RPM-OSTree, since you have to fetch
everything individually and construct the root by shifting hardlinks
or reflinks 

Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Davide Cavalca via devel
On Mon, 2020-12-21 at 18:00 +0100, Tomasz Torcz wrote:
> On Mon, Dec 21, 2020 at 11:28:51AM -0500, Ben Cotton wrote:
> > https://fedoraproject.org/wiki/Changes/RPMCoW 
> > # dnf-plugin-reflink (a new package):
> > https://github.com/facebookincubator/dnf-plugin-cow/
> 
>   Does not exist, but I've just noticed it mentioned in Current Status
> on the Wiki:
>  3.2 Github repo needs to be published

Yeah, apologies for that, we wanted to get the Change proposal out asap
to start the discussion and gather feedback, but a few of the pieces
are still in the works. Specifically, the repo is currently pending
internal review and should be out soon.

Cheers
Davide


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Kevin Fenzi
Cool. A few questions inline... 

On Mon, Dec 21, 2020 at 11:28:51AM -0500, Ben Cotton wrote:
> https://fedoraproject.org/wiki/Changes/RPMCoW
> 
> 
> == Summary ==
> 
> RPM Copy on Write provides a better experience for Fedora Users as it
> reduces the amount of I/O and offsets CPU cost of package
> decompression. RPM Copy on Write uses reflinking capabilities in
> btrfs, which is the default filesystem in Fedora 33.

What happens if you enable this on non btrfs installs? 
Does it just not work gracefully? Does it fail somehow? 
I think we need to be sure it doesn't actually do anything bad for other
non CoW filesystems. 

...snip...
> ### Signature 8 bytes at the end of the file, used to differentiate
> between traditional RPMs and extent based.

So, there's no change to rpm building or signing, since all of that is done
when transcoding them on download?

> === Notes ===
> 
> # The headers are preserved bit for bit during transcoding. This
> preserves signatures. The signatures cover the main header blob, and
> the main header blob ensures the integrity of data in two ways:
> ## Each file with content has a digest. Originally this was md5, but
> today it’s usually sha256. In normal RPM this is only used to verify
> the integrity of files, e.g. rpm -V. With CoW we use this
> as a content key.
> ## There is/are one or two digests (PAYLOADDIGEST and
> PAYLOADDIGESTALT) covering the payload archive
> (compressed cpio). The header value is preserved, but transcoded RPMs
> do not preserve the original structure so RPM’s pre-installation
> verification (controlled by %_pkgverify_level) will fail.
> dnf-plugin-cow disables this check in dnf because it
> verifies the whole file digest which is captured during

Could rpm learn about this and still do its verification in this case?

> download/transcoding. The second one is likely used for delta rpm.
> # This is untested, and possibly incompatible with delta RPM (drpm).
> The process for reconstructing an rpm to install from a delta is
> expensive from both a CPU and I/O perspective, while only providing
> marginal benefits on download size. It is expected that having delta
> rpm enabled (which is the default) will be handled gracefully.

I imagine drpms could still be used, just once you have constructed the
final rpm, you transcode it as if you just downloaded it?

But in general perhaps we should decide how much value drpms provide
these days and either make sure we are making more of them, or drop
them. 

...snip...
> === Performance Metrics ===
> 
> Ballpark performance difference is about half the duration for file
> download+install time. A lot of rpms are very small, so it’s difficult
> to see/measure. Larger RPMs give much clearer signal.
> 
> (Actual numbers/charts will be supplied in Jan 2021)

Nice!

kevin




Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Colin Walters


On Mon, Dec 21, 2020, at 11:28 AM, Ben Cotton wrote:
> 
> 
> 
> == Summary ==
> 
> RPM Copy on Write provides a better experience for Fedora Users as it
> reduces the amount of I/O and offsets CPU cost of package
> decompression. RPM Copy on Write uses reflinking capabilities in
> btrfs, which is the default filesystem in Fedora 33.

A bunch of points here:

- No, it's the default for one Edition.  Others don't default to it.  And even 
for Workstation we can't *require* it because it's definitely supported to use 
other filesystems and storage layouts. 

- Orthogonal to this, I'd also note that xfs supports reflinks too.

Combining those I'd say instead e.g.: "Most Fedora Editions default to a 
filesystem that supports reflinks, e.g. btrfs or xfs" (actually I think IoT 
defaults to ext4 for...probably they didn't consider it?)

- When talking about RPMs we need to think about container images, which use 
overlayfs by default, which defers to the underlying filesystem for reflinks - 
so should be fine, but should be explicitly written down (and tested)

- Generally incompatible RPM payload changes cause pain proportional to how far 
they're "not backported", e.g. if support for this isn't in Fedora N-1 (e.g. 
Fedora 32) it will be harder for current Koji/mock model.  Nowadays many more 
people use podman than mock, which e.g. if using a RHEL8 host will naturally 
avoid the dependency on an updated RPM.  But 

> # Decompression happens inline with download. 

rpm-ostree does this by default today BTW (rpms are unpacked into local ostree 
commits in parallel even).

> ## Regular RPMs use a compressed .cpio based payload. In contrast,
> extent based RPMs contain uncompressed data aligned to the fundamental
> page size of the architecture, e.g. 4KiB on x86_64. This alignment is
> required for FICLONERANGE to work. Only files are
> represented in the payload, other directory entries like symlinks,
> device nodes etc are constructed entirely from rpm header information.

This is the core change; some interesting tradeoffs here.  Python projects in 
particular ship a lot of files smaller than 4k (classic example is 
`__init__.py` which is zero sized).  And ppc64le is 64KiB pages right?  So 
there will be "zero space" to align, right?  Would need some math to see how 
much this would add up to, although I guess the implementation could instead 
use holes?
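
The math could be sketched roughly like this. It assumes every non-empty file is padded up to the next multiple of the alignment and that zero-sized files occupy nothing in the payload; both are assumptions, since the proposal does not say how padding is stored. The example sizes are made up.

```python
def padded_size(size, align):
    """Round size up to the next multiple of align (0 stays 0)."""
    return -(-size // align) * align


def padding_overhead(sizes, align):
    """Total bytes of padding needed to align every file in sizes."""
    return sum(padded_size(s, align) - s for s in sizes)


# Hypothetical file sizes for a small Python package, in bytes.
example_sizes = [0, 120, 900, 5000]
overhead_4k = padding_overhead(example_sizes, 4096)    # x86_64 page size
overhead_64k = padding_overhead(example_sizes, 65536)  # 64KiB pages
```

Running the same function over the file sizes listed in real package headers would give the actual per-package overhead for each page size.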

> Files are referenced by their digest, so identical files are
> de-duplicated.

But just inside a single RPM, right?  It's interesting to compare with ostree 
which does this by default; conceptually this is using reflinks inside a single 
RPM to do what ostree does system wide with hardlinks.
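
For reference, de-duplication by digest within one package can be sketched as below. This is only an illustration of the idea (content keyed by sha256, one aligned copy per distinct digest); build_payload and its return layout are hypothetical and do not describe the real rpm2extents format.

```python
import hashlib


def build_payload(files, align=4096):
    """Pack file contents into one aligned blob, storing each digest once.

    files maps path -> bytes. Returns (payload, offsets, file_digests),
    where offsets maps digest -> byte offset in payload and file_digests
    maps path -> digest.
    """
    payload = bytearray()
    offsets = {}
    file_digests = {}
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        file_digests[path] = digest
        if digest in offsets:
            continue  # identical content is already stored once
        # Zero-pad so the next file starts on an alignment boundary.
        payload.extend(b"\0" * (-len(payload) % align))
        offsets[digest] = len(payload)
        payload.extend(data)
    return bytes(payload), offsets, file_digests
```

Two paths with the same content end up pointing at the same offset, which is the per-package equivalent of what ostree does system wide with hardlinks.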

BTW we learned a few things, notably zero sized files are tricky because there 
can be a *lot* of them - see e.g. https://github.com/ostreedev/ostree/pull/2197
That one was too many hardlinks, but how well do filesystems like btrfs/xfs 
handle thousands of reflinks instead?  The Python __init__.py thing is such a 
pathological case...

> # Disk space requirements are expected to be marginally higher than
> before: all new packages or updates will consume their installed size
> before installation instead of about half their size (regular rpms
> with payloads still cost space).

This won't matter much for small updates but could be quite noticeable for 
larger system upgrades.

This all said the more I think about this, wouldn't it be way simpler to change 
rpm to support a "temporary root directory", e.g. `/usr/.rpmtemp` or whatever.  
Then dnf/zypper/etc can do the unpack-and-download model without any format 
changes to RPM - instead of reflinking it'd just be rename() into place. This 
is effectively what rpm-ostree is doing today except with ostree commits 
instead of a temporary directory.
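
A rough sketch of that alternative, assuming the staging directory lives on the same filesystem as the final destination so that the rename is atomic. The function names are hypothetical, and ".rpmtemp" is just the example name used above.

```python
import os
import tempfile


def stage_file(root, data, staging=".rpmtemp"):
    """Download-time step: write unpacked content into a staging dir."""
    stage_dir = os.path.join(root, staging)
    os.makedirs(stage_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=stage_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before rename
    return tmp_path


def commit_file(root, tmp_path, relpath):
    """Install-time step: rename the staged file into its final place."""
    dest = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    os.replace(tmp_path, dest)  # rename(2); atomic within one filesystem
    return dest
```

The install step then costs one rename per file instead of a decompress-and-write, without any change to the RPM file format.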


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Neal Gompa
On Mon, Dec 21, 2020 at 11:58 AM Michael Catanzaro  wrote:
>
>
> On Mon, Dec 21, 2020 at 11:39 am, Neal Gompa  wrote:
> > This is very exciting! There is one thing, though: we need a libdnf
> > plugin for PackageKit to use too. "DNF plugins" are at the Python
> > layer, and libdnf has its own plugin system that C/C++ consumers can
> > use. So if both a libdnf and a dnf plugin exist, then the experience
> > is consistent between PK and DNF.
> >
> > But that leads to my other question: why not just integrate this into
> > libdnf and turn it into an option that can be activated in
> > /etc/dnf/dnf.conf? That seems to be the most straightforward way to do
> > this.
>
>  From the change proposal:
>
> # Current implementation of dnf-plugin-cow is in Python,
> but it looks possible to implement this in libdnf instead
> which would make it work in packagekit
>

Gah! I missed that. :)



-- 
真実はいつも一つ!/ Always, there's only one truth!


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Tomasz Torcz
On Mon, Dec 21, 2020 at 11:28:51AM -0500, Ben Cotton wrote:
> https://fedoraproject.org/wiki/Changes/RPMCoW

> # dnf-plugin-reflink (a new package):
> https://github.com/facebookincubator/dnf-plugin-cow/

  Does not exist, but I've just noticed it mentioned in Current Status
on Wiki:
 3.2 Github repo needs to be published

-- 
Tomasz Torcz  “If you try to upissue this patchset I shall be seeking
to...@pipebreaker.pl   an IP-routable hand grenade.”  — Andrew Morton (LKML)


Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Michael Catanzaro


On Mon, Dec 21, 2020 at 11:39 am, Neal Gompa  wrote:

> This is very exciting! There is one thing, though: we need a libdnf
> plugin for PackageKit to use too. "DNF plugins" are at the Python
> layer, and libdnf has its own plugin system that C/C++ consumers can
> use. So if both a libdnf and a dnf plugin exist, then the experience
> is consistent between PK and DNF.
>
> But that leads to my other question: why not just integrate this into
> libdnf and turn it into an option that can be activated in
> /etc/dnf/dnf.conf? That seems to be the most straightforward way to do
> this.


From the change proposal:

# Current implementation of dnf-plugin-cow is in Python,
but it looks possible to implement this in libdnf instead
which would make it work in packagekit



Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Robert Marcano via devel

On 12/21/20 12:28 PM, Ben Cotton wrote:

...

> === New process ===
>
> # Resolve packaging request into a list of packages and operations
> # Download and '''decompress''' packages into a '''locally optimized''' rpm file
> # Install and/or upgrade packages sequentially using RPM files, using
> '''reference linking''' (reflinking) to reuse data already on disk.


This sounds great because free space requirements can be reduced, 
especially when installing new packages.


I have experimented with building very small appliances using btrfs 
compression on things like /usr/share. I think this change could disrupt 
that because, if I am correct, the extents will first be downloaded to a 
temporary directory without compression enabled.


I would be happy with an option to disable this behavior.



> The outcome is intended to be the same, but the order of operations is
> different.
>
> # Decompression happens inline with download. This has a positive
> effect on resource usage: downloads are typically limited by
> bandwidth. Decompression and writing the full data into a single file
> per rpm is essentially free. Additionally: if there is more than one
> download at a time, a multi-CPU system can be better utilized. All
> compression types supported in RPM work because this uses the rpm I/O
> functions.
> # RPMs are cached on local storage between downloading and
> installation time as normal. This allows DNF to defer actual RPM
> installation to when all the RPMs are available. This is unchanged.
> # The file format for RPMs is different with Copy on Write. The
> headers are identical, but the payload is different. There is also a
> footer.
> ## Files are converted (“transcoded”) locally during download using
> /usr/bin/rpm2extents (part of rpm codebase). The format
> is not intended to be “portable” - i.e. copying the files from the
> cache is not supported.
> ## Regular RPMs use a compressed .cpio based payload. In contrast,
> extent based RPMs contain uncompressed data aligned to the fundamental
> page size of the architecture, e.g. 4KiB on x86_64. This alignment is
> required for FICLONERANGE to work. Only files are
> represented in the payload, other directory entries like symlinks,
> device nodes etc are constructed entirely from rpm header information.
> Files are referenced by their digest, so identical files are
> de-duplicated.
> ## The footer currently has three sections
> ### Table of original (rpm) file digests, used to validate the
> integrity of the download in dnf.
> ### Table of digest → offset used when actually installing files.
> ### Signature 8 bytes at the end of the file, used to differentiate
> between traditional RPMs and extent based.
>
> === Notes ===
>
> # The headers are preserved bit for bit during transcoding. This
> preserves signatures. The signatures cover the main header blob, and
> the main header blob ensures the integrity of data in two ways:
> ## Each file with content has a digest. Originally this was md5, but
> today it’s usually sha256. In normal RPM this is only used to verify
> the integrity of files, e.g. rpm -V. With CoW we use this
> as a content key.
> ## There is/are one or two digests (PAYLOADDIGEST and
> PAYLOADDIGESTALT) covering the payload archive
> (compressed cpio). The header value is preserved, but transcoded RPMs
> do not preserve the original structure so RPM’s pre-installation
> verification (controlled by %_pkgverify_level) will fail.
> dnf-plugin-cow disables this check in dnf because it
> verifies the whole file digest which is captured during
> download/transcoding. The second one is likely used for delta rpm.
> # This is untested, and possibly incompatible with delta RPM (drpm).
> The process for reconstructing an rpm to install from a delta is
> expensive from both a CPU and I/O perspective, while only providing
> marginal benefits on download size. It is expected that having delta
> rpm enabled (which is the default) will be handled gracefully.
> # Disk space requirements are expected to be marginally higher than
> before: all new packages or updates will consume their installed size
> before installation instead of about half their size (regular rpms
> with payloads still cost space).
> # rpm-plugin-reflink will fall back to simple file
> copying when the destination path is not on the same
> filesystem/subvolume. A common example is /boot and/or
> /boot/efi.
> # The system will still work on other filesystem types, but will
> ''always'' fall back to simple copying. This is expected to be
> slightly slower than not enabling CoW because the source for copying
> will be the decompressed data.
> # For systems that enable transparent filesystem compression: every
> file will continue to be decompressed from the original rpm, and then
> transparently re-compressed by the filesystem. There is no effective
> change here. There is a future project to investigate alternate
> distribution mechanics to provide parallel versions of file content
> pre-compressed in a filesystem specific format, reducing both CPU
> costs and I/O. It is expected that this will result in slightly higher
> network utilization because filesystem 

Re: Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Neal Gompa
On Mon, Dec 21, 2020 at 11:29 AM Ben Cotton  wrote:
>
> https://fedoraproject.org/wiki/Changes/RPMCoW
>
>
> == Summary ==
>
> RPM Copy on Write provides a better experience for Fedora Users as it
> reduces the amount of I/O and offsets CPU cost of package
> decompression. RPM Copy on Write uses reflinking capabilities in
> btrfs, which is the default filesystem in Fedora 33.
>
> == Owners ==
>
> * Name: [[User:malmond|Matthew Almond]], [[User:dcavalca|Davide Cavalca]]
> * Email: malm...@fb.com, dcava...@fb.com
>
>
> == Detailed description ==
>
> Installing and upgrading software packages is a standard part of
> managing the lifecycle of any operating system. For the entire
> lifecycle of Fedora, all software is packaged and distributed using
> the RPM file format. This proposal changes how software is downloaded
> and installed, leaving the distribution process unmodified.
>
> === Current process ===
>
> # Resolve packaging request into a list of packages and operations
> # Download and verify new packages
> # Install and/or upgrade packages sequentially using RPM files,
> decompressing, and writing a copy of the new files to storage.
>
> === New process ===
>
> # Resolve packaging request into a list of packages and operations
> # Download and '''decompress''' packages into a '''locally optimized''' rpm file
> # Install and/or upgrade packages sequentially using RPM files, using
> '''reference linking''' (reflinking) to reuse data already on disk.
>
> The outcome is intended to be the same, but the order of operations is
> different.
>
> # Decompression happens inline with download. This has a positive
> effect on resource usage: downloads are typically limited by
> bandwidth. Decompression and writing the full data into a single file
> per rpm is essentially free. Additionally: if there is more than one
> download at a time, a multi-CPU system can be better utilized. All
> compression types supported in RPM work because this uses the rpm I/O
> functions.
> # RPMs are cached on local storage between downloading and
> installation time as normal. This allows DNF to defer actual RPM
> installation to when all the RPMs are available. This is unchanged.
> # The file format for RPMs is different with Copy on Write. The
> headers are identical, but the payload is different. There is also a
> footer.
> ## Files are converted (“transcoded”) locally during download using
> /usr/bin/rpm2extents (part of rpm codebase). The format
> is not intended to be “portable” - i.e. copying the files from the
> cache is not supported.
> ## Regular RPMs use a compressed .cpio based payload. In contrast,
> extent based RPMs contain uncompressed data aligned to the fundamental
> page size of the architecture, e.g. 4KiB on x86_64. This alignment is
> required for FICLONERANGE to work. Only files are
> represented in the payload, other directory entries like symlinks,
> device nodes etc are constructed entirely from rpm header information.
> Files are referenced by their digest, so identical files are
> de-duplicated.
> ## The footer currently has three sections
> ### Table of original (rpm) file digests, used to validate the
> integrity of the download in dnf.
> ### Table of digest → offset used when actually installing files.
> ### Signature 8 bytes at the end of the file, used to differentiate
> between traditional RPMs and extent based.
>
> === Notes ===
>
> # The headers are preserved bit for bit during transcoding. This
> preserves signatures. The signatures cover the main header blob, and
> the main header blob ensures the integrity of data in two ways:
> ## Each file with content has a digest. Originally this was md5, but
> today it’s usually sha256. In normal RPM this is only used to verify
> the integrity of files, e.g. rpm -V. With CoW we use this
> as a content key.
> ## There is/are one or two digests (PAYLOADDIGEST and
> PAYLOADDIGESTALT) covering the payload archive
> (compressed cpio). The header value is preserved, but transcoded RPMs
> do not preserve the original structure so RPM’s pre-installation
> verification (controlled by %_pkgverify_level) will fail.
> dnf-plugin-cow disables this check in dnf because it
> verifies the whole file digest which is captured during
> download/transcoding. The second one is likely used for delta rpm.
> # This is untested, and possibly incompatible with delta RPM (drpm).
> The process for reconstructing an rpm to install from a delta is
> expensive from both a CPU and I/O perspective, while only providing
> marginal benefits on download size. It is expected that having delta
> rpm enabled (which is the default) will be handled gracefully.
> # Disk space requirements are expected to be marginally higher than
> before: all new packages or updates will consume their installed size
> before installation instead of about half their size (regular rpms
> with payloads still cost space).
> # rpm-plugin-reflink will fall back to simple file
> copying when the destination path is not on 

Fedora 34 Change: DNF/RPM Copy on Write enablement for all variants (System-Wide Change)

2020-12-21 Thread Ben Cotton
https://fedoraproject.org/wiki/Changes/RPMCoW


== Summary ==

RPM Copy on Write provides a better experience for Fedora Users as it
reduces the amount of I/O and offsets CPU cost of package
decompression. RPM Copy on Write uses reflinking capabilities in
btrfs, which is the default filesystem in Fedora 33.

== Owners ==

* Name: [[User:malmond|Matthew Almond]], [[User:dcavalca|Davide Cavalca]]
* Email: malm...@fb.com, dcava...@fb.com


== Detailed description ==

Installing and upgrading software packages is a standard part of
managing the lifecycle of any operating system. For the entire
lifecycle of Fedora, all software is packaged and distributed using
the RPM file format. This proposal changes how software is downloaded
and installed, leaving the distribution process unmodified.

=== Current process ===

# Resolve packaging request into a list of packages and operations
# Download and verify new packages
# Install and/or upgrade packages sequentially using RPM files,
decompressing, and writing a copy of the new files to storage.

=== New process ===

# Resolve packaging request into a list of packages and operations
# Download and '''decompress''' packages into a '''locally optimized''' rpm file
# Install and/or upgrade packages sequentially using RPM files, using
'''reference linking''' (reflinking) to reuse data already on disk.

The outcome is intended to be the same, but the order of operations is
different.

# Decompression happens inline with download. This has a positive
effect on resource usage: downloads are typically limited by
bandwidth. Decompression and writing the full data into a single file
per rpm is essentially free. Additionally: if there is more than one
download at a time, a multi-CPU system can be better utilized. All
compression types supported in RPM work because this uses the rpm I/O
functions.
# RPMs are cached on local storage between downloading and
installation time as normal. This allows DNF to defer actual RPM
installation to when all the RPMs are available. This is unchanged.
# The file format for RPMs is different with Copy on Write. The
headers are identical, but the payload is different. There is also a
footer.
## Files are converted (“transcoded”) locally during download using
/usr/bin/rpm2extents (part of rpm codebase). The format
is not intended to be “portable” - i.e. copying the files from the
cache is not supported.
## Regular RPMs use a compressed .cpio based payload. In contrast,
extent based RPMs contain uncompressed data aligned to the fundamental
page size of the architecture, e.g. 4KiB on x86_64. This alignment is
required for FICLONERANGE to work. Only files are
represented in the payload, other directory entries like symlinks,
device nodes etc are constructed entirely from rpm header information.
Files are referenced by their digest, so identical files are
de-duplicated.
## The footer currently has three sections
### Table of original (rpm) file digests, used to validate the
integrity of the download in dnf.
### Table of digest → offset used when actually installing files.
### Signature 8 bytes at the end of the file, used to differentiate
between traditional RPMs and extent based.
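
The "decompression happens inline with download" step above can be sketched as follows. This is a simplified illustration, not rpm2extents: it uses gzip via Python's zlib as a stand-in for the rpm I/O functions, keeps the output in memory instead of writing an extent-based file, and transcode_stream is a hypothetical name.

```python
import hashlib
import zlib


def transcode_stream(chunks):
    """Consume compressed chunks as they arrive from the network.

    Returns (sha256 hex digest of the bytes as downloaded,
             the decompressed data).
    """
    digest = hashlib.sha256()
    # wbits=31 selects gzip framing; a real tool would support every
    # compression type that rpm does.
    decomp = zlib.decompressobj(wbits=31)
    out = bytearray()
    for chunk in chunks:
        digest.update(chunk)                  # digest of the original download
        out.extend(decomp.decompress(chunk))  # decompress while downloading
    out.extend(decomp.flush())
    return digest.hexdigest(), bytes(out)
```

Because each chunk is hashed and decompressed as soon as it arrives, the CPU work overlaps with the network wait, and the digest of the original download is available for the integrity check at the end.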

=== Notes ===

# The headers are preserved bit for bit during transcoding. This
preserves signatures. The signatures cover the main header blob, and
the main header blob ensures the integrity of data in two ways:
## Each file with content has a digest. Originally this was md5, but
today it’s usually sha256. In normal RPM this is only used to verify
the integrity of files, e.g. rpm -V. With CoW we use this
as a content key.
## There is/are one or two digests (PAYLOADDIGEST and
PAYLOADDIGESTALT) covering the payload archive
(compressed cpio). The header value is preserved, but transcoded RPMs
do not preserve the original structure so RPM’s pre-installation
verification (controlled by %_pkgverify_level) will fail.
dnf-plugin-cow disables this check in dnf because it
verifies the whole file digest which is captured during
download/transcoding. The second one is likely used for delta rpm.
# This is untested, and possibly incompatible with delta RPM (drpm).
The process for reconstructing an rpm to install from a delta is
expensive from both a CPU and I/O perspective, while only providing
marginal benefits on download size. It is expected that having delta
rpm enabled (which is the default) will be handled gracefully.
# Disk space requirements are expected to be marginally higher than
before: all new packages or updates will consume their installed size
before installation instead of about half their size (regular rpms
with payloads still cost space).
# rpm-plugin-reflink will fall back to simple file
copying when the destination path is not on the same
filesystem/subvolume. A common example is /boot and/or
/boot/efi.
# The system will still work on other filesystem types, but will
''always'' fall back to simple copying. This is expected to be
slightly slower than not enabling CoW