Re: [Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf

2018-08-09 Thread Zbigniew Jędrzejewski-Szmek
> On Wed, Aug 8, 2018 at 7:09 PM Pascal Terjan wrote:
> > $ GET http://ftp.free.fr/mirrors/mageia.org/distrib/cauldron/x86_64/media/media_info/file-deps
> > /bin/csh
> > /bin/grep
> > /bin/perl
> > /usr/bin/ln
> > /usr/bin/rm
> > /sbin/service
> > /usr/bin/chattr
> > /usr/bin/guile
> > /usr/bin/openssl
> > /usr/bin/pear
> > /usr/bin/texhash
> > /usr/bin/tr
> > /usr/bin/which
> > /usr/sbin/groupadd
> > /usr/sbin/groupdel
> > /usr/sbin/useradd
> > /usr/sbin/userdel
This gives us the Mandriva/Mageia/Mandrake behaviour.

For Fedora, we need to look at createrepo_c. There was some
uncertainty whether e.g. /usr/libexec paths are in primary.xml. It
turns out they are *not*, and the whitelist is anything that matches
/etc|/usr/lib/sendmail|bin/ [1]. So we have paths like
/usr/share/awstats/wwwroot/cgi-bin/awredir.pl and 
/var/www/moodle/web/admin/tool/recyclebin/classes/base_bin.php (sic!)
in primary.xml.

It seems that this behaviour is accidental and arbitrary. Adding the
list of patterns to primary.xml seems like a good first step. I hope
we can later clean up the patterns to only match '^/usr/s?bin/'...
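
To make the difference concrete, here is a minimal Python sketch of the
two rules. It assumes the current whitelist in [1] amounts to a prefix
match on /etc/, an exact match on /usr/lib/sendmail, and a substring
match on "bin/" anywhere in the path; that reading is inferred from the
examples above. The function names and the /usr/libexec/foo path are
made up for illustration, and the strict variant is only the pattern
hoped for above, not something createrepo_c implements.

```python
import re


def in_primary_current(path):
    # Assumed reading of the existing rule: "/etc/" prefix, the literal
    # "/usr/lib/sendmail", or the substring "bin/" anywhere in the path.
    return (
        path.startswith("/etc/")
        or path == "/usr/lib/sendmail"
        or "bin/" in path
    )


# The stricter pattern suggested in the mail (a wish, not current behaviour).
STRICT = re.compile(r"^/usr/s?bin/")


def in_primary_strict(path):
    return STRICT.match(path) is not None


PATHS = [
    "/usr/bin/openssl",
    "/usr/sbin/useradd",
    "/usr/libexec/foo",  # hypothetical path, for illustration only
    "/usr/share/awstats/wwwroot/cgi-bin/awredir.pl",
    "/var/www/moodle/web/admin/tool/recyclebin/classes/base_bin.php",
]

if __name__ == "__main__":
    for p in PATHS:
        print(f"{p}: current={in_primary_current(p)} strict={in_primary_strict(p)}")
```

The moodle path matches the current rule only because "recyclebin/"
happens to contain "bin/", which is the accidental part. Note that the
strict pattern would also drop /etc and the unprefixed /bin and /sbin
paths, so it is a narrowing, not just a cleanup.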

[1] https://github.com/rpm-software-management/createrepo_c/blob/master/src/misc.h#L110-L118

> So the primary.xml already includes all that. If you actually look in
> the primary.xml.gz files in the Mageia rpm-md data, those are already
> there. The problem is that there are people who actually request files
> outside of the base whitelist as a means to be able to request
> "things" without knowing how they are packaged, because the file path
> is the consistent thing across distros. This is supported in YUM and
> DNF, just slightly differently.
> 
> In this case, the wish is to restore the YUM behavior. The idea is
> that stacking this on top of the Zchunk deltarepo extension will yield
> incredible boosts for everything.
Yes!

Zbyszek
___
Rpm-ecosystem mailing list
Rpm-ecosystem@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-ecosystem


Re: [Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf

2018-08-09 Thread Jeff Johnson


> On Aug 9, 2018, at 4:12 AM, Vít Ondruch wrote:
> 
> 
> 
> Dne 9.8.2018 v 07:34 Neal Gompa napsal(a):
>>> On Wed, Aug 8, 2018 at 7:09 PM Pascal Terjan wrote:
>>>> On 7 August 2018 at 09:50, Michael Schroeder wrote:
>>>>> On Mon, Aug 06, 2018 at 04:36:07PM +, Zbigniew Jędrzejewski-Szmek
>>>>> wrote:
>>>>> this mail is a continuation of an FPC [1] and a FESCo [2] ticket.
>>>>>
>>>>> A proposal was made to disallow packages in Fedora from using file
>>>>> deps, and to optimize dnf to not load filelists.xml. File deps would
>>>>> still be supported, because external packages and users want to use
>>>>> them, but they would not be allowed for distro packages.
>>>>>
>>>>> Not downloading or loading filelists.xml which are required for file
>>>>> deps would provide significant bandwidth savings (~47 MB compressed)
>>>>> and noticeable runtime savings (~10s at dnf startup) in many common
>>>>> cases.
>>>>>
>>>>> So this is something that is worth exploring, but it's not clear if it
>>>>> is at all feasible.
>>>> There's also something that can easily be done and would make
>>>> loading the filelist unneeded in most of the cases: extend the
>>>> primary filelist to include some whitelist of files. The whitelist
>>>> must also be stored in the primary data, so that the solver knows
>>>> what to expect.
>>> That's what Mandrake/Mandriva/Mageia/... has been doing for many
>>> years, there is a small file-deps file containing the ones we end up
>>> generating, mostly from scriptlets IIRC, and we end up with provides
>>> added for those in the main metadata when generating it. Then file
>>> lists are lazily loaded when people want to query them but not used
>>> for dependency resolution.
>>> 
>>> $ GET http://ftp.free.fr/mirrors/mageia.org/distrib/cauldron/x86_64/media/media_info/file-deps
>>> /bin/csh
>>> /bin/grep
>>> /bin/perl
>>> /usr/bin/ln
>>> /usr/bin/rm
>>> /sbin/service
>>> /usr/bin/chattr
>>> /usr/bin/guile
>>> /usr/bin/openssl
>>> /usr/bin/pear
>>> /usr/bin/texhash
>>> /usr/bin/tr
>>> /usr/bin/which
>>> /usr/sbin/groupadd
>>> /usr/sbin/groupdel
>>> /usr/sbin/useradd
>>> /usr/sbin/userdel
>> So the primary.xml already includes all that. If you actually look in
>> the primary.xml.gz files in the Mageia rpm-md data, those are already
>> there. The problem is that there are people who actually request files
>> outside of the base whitelist as a means to be able to request
>> "things" without knowing how they are packaged, because the file path
>> is the consistent thing across distros.
> 
> 
> So couldn't createrepo actually be extended in such a way that, if it
> identifies a package which has "Requires: /some/random/path" and, at the
> same time, "/some/random/path" is actually included in the
> repository, such a file/package would be included in primary.xml.gz? This
> would help with huge repositories, since that is where the cost of
> downloading filelists.xml is highest.
> 

Creating a tool to automate generating the list of file dependencies in a 
whitelist is a sound idea.

A separate tool instead of bundling into createrepo may be simpler for two 
reasons:

1) the whitelist is not just existence, but also policy control: some file 
dependencies may not be permitted because of policy.

2) the whitelist must be complete before the markup is generated: this forces 
two passes on the packages, first to find the whitelist, then to generate 
primary.xml with permitted file paths.

(aside)
Note that file paths can appear in all dependencies, not just Requires:, even 
though Requires: is by far the most common use case for rpm depsolvers, which 
typically do not attempt backtracking (i.e. removing installed packages to 
avoid Conflicts:).
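
A rough Python sketch of the two passes, under stated assumptions: the
Package class, the sample packages and the policy_allows() rule are all
invented for illustration (a real tool would read rpm headers and apply
the distro's own policy). Pass 1 scans every dependency kind, not only
Requires:, and pass 2 emits, per package, the owned files that made it
onto the whitelist.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    # Hypothetical in-memory model; real data would come from rpm headers.
    name: str
    files: list
    # Dependency kind -> entries, e.g. {"requires": [...], "conflicts": [...]}
    deps: dict = field(default_factory=dict)


def policy_allows(path):
    # Placeholder policy, made up for this sketch: refuse file deps on
    # anything under /usr/share/.
    return not path.startswith("/usr/share/")


def collect_whitelist(packages):
    """Pass 1: file paths used as a dependency of any kind, if policy permits."""
    wanted = set()
    for pkg in packages:
        for entries in pkg.deps.values():
            for dep in entries:
                if dep.startswith("/") and policy_allows(dep):
                    wanted.add(dep)
    return wanted


def primary_files(packages, whitelist):
    """Pass 2: per package, the owned files that would be listed in primary.xml."""
    return {
        pkg.name: sorted(f for f in pkg.files if f in whitelist)
        for pkg in packages
    }


# Made-up sample repository.
PACKAGES = [
    Package("coreutils", ["/usr/bin/ln", "/usr/bin/rm",
                          "/usr/share/doc/coreutils/README"]),
    Package("shadow-utils", ["/usr/sbin/useradd", "/usr/sbin/groupadd"]),
    Package("foo", ["/usr/bin/foo"],
            {"requires": ["/usr/sbin/useradd", "libc.so.6"],
             "conflicts": ["/usr/bin/rm"]}),
    Package("bar", [], {"requires": ["/usr/share/doc/coreutils/README"]}),
]

if __name__ == "__main__":
    whitelist = collect_whitelist(PACKAGES)
    print(sorted(whitelist))
    print(primary_files(PACKAGES, whitelist))
```

The second pass cannot start until the first has seen every package,
which is why this fits awkwardly into a single streaming run.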

73 de Jeff


Re: [Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf

2018-08-09 Thread Vít Ondruch



Dne 9.8.2018 v 07:34 Neal Gompa napsal(a):
> On Wed, Aug 8, 2018 at 7:09 PM Pascal Terjan wrote:
>> On 7 August 2018 at 09:50, Michael Schroeder wrote:
>>> On Mon, Aug 06, 2018 at 04:36:07PM +, Zbigniew Jędrzejewski-Szmek
>>> wrote:
>>>> this mail is a continuation of an FPC [1] and a FESCo [2] ticket.
>>>>
>>>> A proposal was made to disallow packages in Fedora from using file
>>>> deps, and to optimize dnf to not load filelists.xml. File deps would
>>>> still be supported, because external packages and users want to use
>>>> them, but they would not be allowed for distro packages.
>>>>
>>>> Not downloading or loading filelists.xml which are required for file
>>>> deps would provide significant bandwidth savings (~47 MB compressed)
>>>> and noticeable runtime savings (~10s at dnf startup) in many common
>>>> cases.
>>>>
>>>> So this is something that is worth exploring, but it's not clear if it
>>>> is at all feasible.
>>> There's also something that can easily be done and would make
>>> loading the filelist unneeded in most of the cases: extend the
>>> primary filelist to include some whitelist of files. The whitelist
>>> must also be stored in the primary data, so that the solver knows
>>> what to expect.
>> That's what Mandrake/Mandriva/Mageia/... has been doing for many
>> years, there is a small file-deps file containing the ones we end up
>> generating, mostly from scriptlets IIRC, and we end up with provides
>> added for those in the main metadata when generating it. Then file
>> lists are lazily loaded when people want to query them but not used
>> for dependency resolution.
>>
>> $ GET http://ftp.free.fr/mirrors/mageia.org/distrib/cauldron/x86_64/media/media_info/file-deps
>> /bin/csh
>> /bin/grep
>> /bin/perl
>> /usr/bin/ln
>> /usr/bin/rm
>> /sbin/service
>> /usr/bin/chattr
>> /usr/bin/guile
>> /usr/bin/openssl
>> /usr/bin/pear
>> /usr/bin/texhash
>> /usr/bin/tr
>> /usr/bin/which
>> /usr/sbin/groupadd
>> /usr/sbin/groupdel
>> /usr/sbin/useradd
>> /usr/sbin/userdel
>>
> So the primary.xml already includes all that. If you actually look in
> the primary.xml.gz files in the Mageia rpm-md data, those are already
> there. The problem is that there are people who actually request files
> outside of the base whitelist as a means to be able to request
> "things" without knowing how they are packaged, because the file path
> is the consistent thing across distros.


So couldn't createrepo actually be extended in such a way that, if it
identifies a package which has "Requires: /some/random/path" and, at the
same time, "/some/random/path" is actually included in the
repository, such a file/package would be included in primary.xml.gz? This
would help with huge repositories, since that is where the cost of
downloading filelists.xml is highest.


V.



Re: [Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf

2018-08-08 Thread Neal Gompa
On Wed, Aug 8, 2018 at 7:09 PM Pascal Terjan wrote:
>
> On 7 August 2018 at 09:50, Michael Schroeder wrote:
> > On Mon, Aug 06, 2018 at 04:36:07PM +, Zbigniew Jędrzejewski-Szmek
> > wrote:
> >> this mail is a continuation of an FPC [1] and a FESCo [2] ticket.
> >>
> >> A proposal was made to disallow packages in Fedora from using file
> >> deps, and to optimize dnf to not load filelists.xml. File deps would
> >> still be supported, because external packages and users want to use
> >> them, but they would not be allowed for distro packages.
> >>
> >> Not downloading or loading filelists.xml which are required for file
> >> deps would provide significant bandwidth savings (~47 MB compressed)
> >> and noticeable runtime savings (~10s at dnf startup) in many common
> >> cases.
> >>
> >> So this is something that is worth exploring, but it's not clear if it
> >> is at all feasible.
> >
> > There's also something that can easily be done and would make
> > loading the filelist unneeded in most of the cases: extend the
> > primary filelist to include some whitelist of files. The whitelist
> > must also be stored in the primary data, so that the solver knows
> > what to expect.
>
> That's what Mandrake/Mandriva/Mageia/... has been doing for many
> years, there is a small file-deps file containing the ones we end up
> generating, mostly from scriptlets IIRC, and we end up with provides
> added for those in the main metadata when generating it. Then file
> lists are lazily loaded when people want to query them but not used
> for dependency resolution.
>
> $ GET http://ftp.free.fr/mirrors/mageia.org/distrib/cauldron/x86_64/media/media_info/file-deps
> /bin/csh
> /bin/grep
> /bin/perl
> /usr/bin/ln
> /usr/bin/rm
> /sbin/service
> /usr/bin/chattr
> /usr/bin/guile
> /usr/bin/openssl
> /usr/bin/pear
> /usr/bin/texhash
> /usr/bin/tr
> /usr/bin/which
> /usr/sbin/groupadd
> /usr/sbin/groupdel
> /usr/sbin/useradd
> /usr/sbin/userdel
>

So the primary.xml already includes all that. If you actually look in
the primary.xml.gz files in the Mageia rpm-md data, those are already
there. The problem is that there are people who actually request files
outside of the base whitelist as a means to be able to request
"things" without knowing how they are packaged, because the file path
is the consistent thing across distros. This is supported in YUM and
DNF, just slightly differently.

In this case, the wish is to restore the YUM behavior. The idea is
that stacking this on top of the Zchunk deltarepo extension will yield
incredible boosts for everything.


--
真実はいつも一つ!/ Always, there's only one truth!


[Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf

2018-08-06 Thread Zbigniew Jędrzejewski-Szmek
Hi dnf and libsolv developers,

this mail is a continuation of an FPC [1] and a FESCo [2] ticket.

A proposal was made to disallow packages in Fedora from using file
deps, and to optimize dnf to not load filelists.xml. File deps would
still be supported, because external packages and users want to use
them, but they would not be allowed for distro packages.

Not downloading or loading filelists.xml which are required for file
deps would provide significant bandwidth savings (~47 MB compressed)
and noticeable runtime savings (~10s at dnf startup) in many common
cases.

So this is something that is worth exploring, but it's not clear if it
is at all feasible. It seems that dnf would need to support loading
filelists.xml lazily. In the mailing list discussions, some people
said that this would be hard, some people said that it would be
possible… What is the situation here? IIUC, dnf would need to restart
the resolution of a transaction mid-flight once it encounters a file dep,
which would require support across the different layers.
If Fedora commits to making use of this, would it be possible to
implement this in dnf? What kind of changes would be required?
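
As a toy illustration of the restart idea only (this is not how dnf or
libsolv are structured, and every name below is hypothetical): the
resolver works from primary data alone, signals when it hits a file
dep it cannot answer, and the caller then loads the file lists once
and starts the resolution over.

```python
class NeedFilelists(Exception):
    """Raised when a file dep cannot be answered from primary data alone."""


def find_provider(dep, primary, filelists):
    # primary / filelists: dicts mapping a capability or path to a package name.
    if dep in primary:
        return primary[dep]
    if dep.startswith("/"):
        if filelists is None:
            raise NeedFilelists(dep)
        return filelists.get(dep)
    return None  # unknown non-file capability


def resolve(deps, primary, load_filelists):
    """Resolve deps, loading the (expensive) file lists only when required."""
    filelists = None
    while True:
        try:
            return {dep: find_provider(dep, primary, filelists) for dep in deps}
        except NeedFilelists:
            # Done at most once: afterwards filelists is never None, so the
            # exception cannot be raised again and the loop terminates.
            filelists = load_filelists() or {}


if __name__ == "__main__":
    primary = {"/usr/bin/rm": "coreutils", "libc.so.6": "glibc"}

    def load_filelists():
        print("loading filelists (slow path)")
        return {"/usr/libexec/foo": "foo"}  # made-up content

    print(resolve(["/usr/bin/rm", "libc.so.6"], primary, load_filelists))
    print(resolve(["/usr/bin/rm", "/usr/libexec/foo"], primary, load_filelists))
```

In this model the common case never touches the file lists; the cost is
that a transaction with even one non-whitelisted file dep pays for a
full second resolution pass.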

[1] https://pagure.io/packaging-committee/issue/714
[2] https://pagure.io/fesco/issue/1955

Zbyszek, on behalf of FESCo (but note that this writeup is based
on my understanding, so all errors are mine.)