Re: [Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf
> On Wed, Aug 8, 2018 at 7:09 PM Pascal Terjan wrote:
> > $ GET http://ftp.free.fr/mirrors/mageia.org/distrib/cauldron/x86_64/media/media_info/file-deps
> > /bin/csh
> > /bin/grep
> > /bin/perl
> > /usr/bin/ln
> > /usr/bin/rm
> > /sbin/service
> > /usr/bin/chattr
> > /usr/bin/guile
> > /usr/bin/openssl
> > /usr/bin/pear
> > /usr/bin/texhash
> > /usr/bin/tr
> > /usr/bin/which
> > /usr/sbin/groupadd
> > /usr/sbin/groupdel
> > /usr/sbin/useradd
> > /usr/sbin/userdel

This gives us the Mandriva/Mageia/Mandrake behaviour. For Fedora, we need
to look at createrepo_c. There was some uncertainty whether e.g.
/usr/libexec paths are in primary.xml. It turns out they are *not*: the
whitelist is anything that matches /etc|/usr/lib/sendmail|bin/ [1]. So we
have paths like /usr/share/awstats/wwwroot/cgi-bin/awredir.pl and
/var/www/moodle/web/admin/tool/recyclebin/classes/base_bin.php (sic!) in
primary.xml. This behaviour seems to be accidental and arbitrary.

Adding the list of patterns to primary.xml seems like a good first step.
I hope we can later clean up the patterns to only match '^/usr/s?bin/'...

[1] https://github.com/rpm-software-management/createrepo_c/blob/master/src/misc.h#L110-L118

> So the primary.xml already includes all that. If you actually look in
> the primary.xml.gz files in the Mageia rpm-md data, those are already
> there. The problem is that there are people who actually request files
> outside of the base whitelist as a means to be able to request
> "things" without knowing how they are packaged, because the file path
> is the consistent thing across distros. This is supported in YUM and
> DNF, just slightly differently.
>
> In this case, the wish is to restore the YUM behavior. The idea is
> that stacking this on top of the Zchunk deltarepo extension will yield
> incredible boosts for everything.

Yes!

Zbyszek
___
Rpm-ecosystem mailing list
Rpm-ecosystem@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-ecosystem
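To see why the current whitelist pulls in the two odd paths mentioned above, here is a minimal Python sketch of the rule as it is summarised in the message. It is not the createrepo_c code itself: the function names are made up for illustration, and the exact matching semantics (prefix test for /etc/, exact test for /usr/lib/sendmail, plain substring test for "bin/") are an assumption based on that summary. The /usr/libexec/foo path is a hypothetical example.

```python
import re

def in_primary_current(path: str) -> bool:
    # Assumed semantics of the /etc|/usr/lib/sendmail|bin/ rule:
    # prefix match, exact match, and substring match respectively.
    return (
        path.startswith("/etc/")
        or path == "/usr/lib/sendmail"
        or "bin/" in path
    )

# The stricter pattern hoped for at the end of the message.
STRICT = re.compile(r"^/usr/s?bin/")

def in_primary_strict(path: str) -> bool:
    return STRICT.match(path) is not None

paths = [
    "/usr/bin/ln",
    "/usr/sbin/useradd",
    "/usr/libexec/foo",  # hypothetical path, not from the thread
    "/usr/share/awstats/wwwroot/cgi-bin/awredir.pl",
    "/var/www/moodle/web/admin/tool/recyclebin/classes/base_bin.php",
]

# Columns: current rule, strict rule, path.
for p in paths:
    print(f"{in_primary_current(p)!s:5} {in_primary_strict(p)!s:5} {p}")
```

Under these assumptions the awstats path matches through "cgi-bin/" and the moodle path through "recyclebin/", while the anchored pattern rejects both. Note that the anchored pattern would also reject /bin/csh and /sbin/service from the Mageia list unless those are handled separately.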
Re: [Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf
> On Aug 9, 2018, at 4:12 AM, Vít Ondruch wrote:
>
> On 9.8.2018 at 07:34, Neal Gompa wrote:
>> On Wed, Aug 8, 2018 at 7:09 PM Pascal Terjan wrote:
>>> On 7 August 2018 at 09:50, Michael Schroeder wrote:
>>>> On Mon, Aug 06, 2018 at 04:36:07PM +, Zbigniew Jędrzejewski-Szmek wrote:
>>>>> this mail is a continuation of FPC [1] and FESCo [2] tickets.
>>>>>
>>>>> A proposal was made to disallow packages in Fedora from using file
>>>>> deps, and to optimize dnf to not load filelists.xml. File deps would
>>>>> still be supported, because external packages and users want to use
>>>>> them, but they would not be allowed for distro packages.
>>>>>
>>>>> Not downloading or loading filelists.xml, which is required for file
>>>>> deps, would provide significant bandwidth savings (~47 MB compressed)
>>>>> and noticeable runtime savings (~10s at dnf startup) in many common
>>>>> cases.
>>>>>
>>>>> So this is something that is worth exploring, but it's not clear if it
>>>>> is at all feasible.
>>>>
>>>> There's also something that can easily be done and would make
>>>> loading the filelist unneeded in most of the cases: extend the
>>>> primary filelist to include some whitelist of files. The whitelist
>>>> must also be stored in the primary data, so that the solver knows
>>>> what to expect.
>>>
>>> That's what Mandrake/Mandriva/Mageia/... has been doing for many
>>> years, there is a small file-deps file containing the ones we end up
>>> generating, mostly from scriptlets IIRC, and we end up with provides
>>> added for those in the main metadata when generating it. Then file
>>> lists are lazily loaded when people want to query them but not used
>>> for dependency resolution.
>>>
>>> $ GET http://ftp.free.fr/mirrors/mageia.org/distrib/cauldron/x86_64/media/media_info/file-deps
>>> /bin/csh
>>> /bin/grep
>>> /bin/perl
>>> /usr/bin/ln
>>> /usr/bin/rm
>>> /sbin/service
>>> /usr/bin/chattr
>>> /usr/bin/guile
>>> /usr/bin/openssl
>>> /usr/bin/pear
>>> /usr/bin/texhash
>>> /usr/bin/tr
>>> /usr/bin/which
>>> /usr/sbin/groupadd
>>> /usr/sbin/groupdel
>>> /usr/sbin/useradd
>>> /usr/sbin/userdel
>>
>> So the primary.xml already includes all that. If you actually look in
>> the primary.xml.gz files in the Mageia rpm-md data, those are already
>> there. The problem is that there are people who actually request files
>> outside of the base whitelist as a means to be able to request
>> "things" without knowing how they are packaged, because the file path
>> is the consistent thing across distros.
>
> So couldn't createrepo actually be extended so that, if it identifies
> a package which has "Requires: /some/random/path" and, at the same
> time, "/some/random/path" is actually included in the repository, that
> file/package would be included in primary.xml.gz? This would help with
> huge repositories, since that is where downloading filelists.xml costs
> the most.

Creating a tool to automate generating the list of file dependencies in
a whitelist is a sound idea.

A separate tool, rather than bundling this into createrepo, may be
simpler for two reasons:

1) the whitelist is not just about existence, but also policy control:
   some file dependencies may not be permitted because of policy.

2) the whitelist must be complete before the markup is generated: this
   forces two passes over the packages, first to find the whitelist,
   then to generate primary.xml with the permitted file paths.

(aside) Note that file paths can appear in all dependencies, not just
Requires:, even though Requires: is by far the most common use case for
rpm depsolvers, which typically do not attempt backtracking (i.e.
removing installed packages to avoid Conflicts:).

73 de Jeff
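The two passes described in this message can be sketched in Python as follows. This is only an illustration under stated assumptions, not createrepo or any existing tool: the package records are hypothetical in-memory dictionaries (a real tool would read RPM headers), and `policy_allows`, `collect_whitelist` and `primary_files` are invented names. The `dep_fields` parameter reflects the aside that file paths can appear in dependency types other than Requires:.

```python
# Hypothetical package records standing in for RPM header data.
packages = [
    {"name": "bash", "files": ["/usr/bin/bash", "/etc/profile"], "requires": []},
    {"name": "foo", "files": ["/usr/libexec/foo/helper"], "requires": ["/usr/bin/bash"]},
    {"name": "bar", "files": ["/usr/bin/bar"],
     "requires": ["/usr/libexec/foo/helper", "/opt/vendor/tool"]},
    {"name": "vendor", "files": ["/opt/vendor/tool"], "requires": []},
]

def policy_allows(path):
    # Hypothetical policy: file deps under /opt are not permitted.
    return not path.startswith("/opt/")

def collect_whitelist(pkgs, dep_fields=("requires",)):
    """Pass 1: file deps that are used, shipped in the repo, and permitted."""
    shipped = {f for p in pkgs for f in p["files"]}
    wanted = {
        dep
        for p in pkgs
        for field in dep_fields
        for dep in p.get(field, [])
        if dep.startswith("/")  # only file dependencies
    }
    return {dep for dep in wanted if dep in shipped and policy_allows(dep)}

def primary_files(pkgs, whitelist):
    """Pass 2: per package, the file paths that would go into primary.xml."""
    return {p["name"]: sorted(f for f in p["files"] if f in whitelist) for p in pkgs}

whitelist = collect_whitelist(packages)
print(sorted(whitelist))
print(primary_files(packages, whitelist))
```

In this toy data /usr/libexec/foo/helper ends up whitelisted because another package requires it, while /opt/vendor/tool is shipped and required but dropped by the policy check. Pass 2 cannot start until pass 1 has seen every package, which is the ordering constraint from point 2).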
Re: [Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf
On 9.8.2018 at 07:34, Neal Gompa wrote:
> On Wed, Aug 8, 2018 at 7:09 PM Pascal Terjan wrote:
>> On 7 August 2018 at 09:50, Michael Schroeder wrote:
>>> On Mon, Aug 06, 2018 at 04:36:07PM +, Zbigniew Jędrzejewski-Szmek wrote:
>>>> this mail is a continuation of FPC [1] and FESCo [2] tickets.
>>>>
>>>> A proposal was made to disallow packages in Fedora from using file
>>>> deps, and to optimize dnf to not load filelists.xml. File deps would
>>>> still be supported, because external packages and users want to use
>>>> them, but they would not be allowed for distro packages.
>>>>
>>>> Not downloading or loading filelists.xml, which is required for file
>>>> deps, would provide significant bandwidth savings (~47 MB compressed)
>>>> and noticeable runtime savings (~10s at dnf startup) in many common
>>>> cases.
>>>>
>>>> So this is something that is worth exploring, but it's not clear if it
>>>> is at all feasible.
>>>
>>> There's also something that can easily be done and would make
>>> loading the filelist unneeded in most of the cases: extend the
>>> primary filelist to include some whitelist of files. The whitelist
>>> must also be stored in the primary data, so that the solver knows
>>> what to expect.
>>
>> That's what Mandrake/Mandriva/Mageia/... has been doing for many
>> years, there is a small file-deps file containing the ones we end up
>> generating, mostly from scriptlets IIRC, and we end up with provides
>> added for those in the main metadata when generating it. Then file
>> lists are lazily loaded when people want to query them but not used
>> for dependency resolution.
>>
>> $ GET http://ftp.free.fr/mirrors/mageia.org/distrib/cauldron/x86_64/media/media_info/file-deps
>> /bin/csh
>> /bin/grep
>> /bin/perl
>> /usr/bin/ln
>> /usr/bin/rm
>> /sbin/service
>> /usr/bin/chattr
>> /usr/bin/guile
>> /usr/bin/openssl
>> /usr/bin/pear
>> /usr/bin/texhash
>> /usr/bin/tr
>> /usr/bin/which
>> /usr/sbin/groupadd
>> /usr/sbin/groupdel
>> /usr/sbin/useradd
>> /usr/sbin/userdel
>
> So the primary.xml already includes all that.
> If you actually look in the primary.xml.gz files in the Mageia rpm-md
> data, those are already there. The problem is that there are people
> who actually request files outside of the base whitelist as a means to
> be able to request "things" without knowing how they are packaged,
> because the file path is the consistent thing across distros.

So couldn't createrepo actually be extended so that, if it identifies a
package which has "Requires: /some/random/path" and, at the same time,
"/some/random/path" is actually included in the repository, that
file/package would be included in primary.xml.gz? This would help with
huge repositories, since that is where downloading filelists.xml costs
the most.

V.
Re: [Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf
On Wed, Aug 8, 2018 at 7:09 PM Pascal Terjan wrote:
>
> On 7 August 2018 at 09:50, Michael Schroeder wrote:
> > On Mon, Aug 06, 2018 at 04:36:07PM +, Zbigniew Jędrzejewski-Szmek wrote:
> >> this mail is a continuation of FPC [1] and FESCo [2] tickets.
> >>
> >> A proposal was made to disallow packages in Fedora from using file
> >> deps, and to optimize dnf to not load filelists.xml. File deps would
> >> still be supported, because external packages and users want to use
> >> them, but they would not be allowed for distro packages.
> >>
> >> Not downloading or loading filelists.xml, which is required for file
> >> deps, would provide significant bandwidth savings (~47 MB compressed)
> >> and noticeable runtime savings (~10s at dnf startup) in many common
> >> cases.
> >>
> >> So this is something that is worth exploring, but it's not clear if it
> >> is at all feasible.
> >
> > There's also something that can easily be done and would make
> > loading the filelist unneeded in most of the cases: extend the
> > primary filelist to include some whitelist of files. The whitelist
> > must also be stored in the primary data, so that the solver knows
> > what to expect.
>
> That's what Mandrake/Mandriva/Mageia/... has been doing for many
> years, there is a small file-deps file containing the ones we end up
> generating, mostly from scriptlets IIRC, and we end up with provides
> added for those in the main metadata when generating it. Then file
> lists are lazily loaded when people want to query them but not used
> for dependency resolution.
>
> $ GET http://ftp.free.fr/mirrors/mageia.org/distrib/cauldron/x86_64/media/media_info/file-deps
> /bin/csh
> /bin/grep
> /bin/perl
> /usr/bin/ln
> /usr/bin/rm
> /sbin/service
> /usr/bin/chattr
> /usr/bin/guile
> /usr/bin/openssl
> /usr/bin/pear
> /usr/bin/texhash
> /usr/bin/tr
> /usr/bin/which
> /usr/sbin/groupadd
> /usr/sbin/groupdel
> /usr/sbin/useradd
> /usr/sbin/userdel
>

So the primary.xml already includes all that. If you actually look in
the primary.xml.gz files in the Mageia rpm-md data, those are already
there. The problem is that there are people who actually request files
outside of the base whitelist as a means to be able to request "things"
without knowing how they are packaged, because the file path is the
consistent thing across distros.

This is supported in YUM and DNF, just slightly differently.

In this case, the wish is to restore the YUM behavior. The idea is that
stacking this on top of the Zchunk deltarepo extension will yield
incredible boosts for everything.

--
真実はいつも一つ!/ Always, there's only one truth!
[Rpm-ecosystem] lazy loading of filelists.xml to speed up dnf
Hi dnf and libsolv developers,

this mail is a continuation of FPC [1] and FESCo [2] tickets.

A proposal was made to disallow packages in Fedora from using file
deps, and to optimize dnf to not load filelists.xml. File deps would
still be supported, because external packages and users want to use
them, but they would not be allowed for distro packages.

Not downloading or loading filelists.xml, which is required for file
deps, would provide significant bandwidth savings (~47 MB compressed)
and noticeable runtime savings (~10s at dnf startup) in many common
cases.

So this is something that is worth exploring, but it's not clear if it
is at all feasible. It seems that dnf would need to support loading
filelists.xml lazily. In the mailing list discussions, some people said
that this would be hard, some people said that it would be possible…
What is the situation here? IIUC, dnf would need to restart the
resolution of a transaction mid-flight once it encounters a file dep,
which would require support across the different layers.

If Fedora commits to making use of this, would it be possible to
implement this in dnf? What kind of changes would be required?

[1] https://pagure.io/packaging-committee/issue/714
[2] https://pagure.io/fesco/issue/1955

Zbyszek, on behalf of FESCo
(but note that this writeup is based on my understanding, so all errors
are mine.)
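The "restart mid-flight" idea from this message can be pictured with a toy Python sketch. It does not use dnf or libsolv and says nothing about how hard the real change would be: the metadata dictionaries, the `NeedFilelists` exception and all function names are hypothetical. It also assumes a whitelist of paths that primary data is known to cover, as proposed in the replies, so the resolver can tell when primary data alone is not enough.

```python
class NeedFilelists(Exception):
    """A file dep cannot be answered from primary data alone."""

# Hypothetical metadata: PRIMARY holds only whitelisted files,
# FILELISTS holds every file and stands in for filelists.xml.
PRIMARY = {"bash": {"/usr/bin/bash"}, "foo": set()}
FILELISTS = {"bash": {"/usr/bin/bash"}, "foo": {"/usr/libexec/foo/helper"}}

def in_whitelist(path):
    # Assumption: the whitelist published alongside primary data.
    return path.startswith(("/usr/bin/", "/usr/sbin/"))

def resolve(deps, files_by_pkg, have_filelists):
    chosen = []
    for dep in deps:
        if not have_filelists and not in_whitelist(dep):
            # Primary data cannot say who ships this path: abort the pass.
            raise NeedFilelists(dep)
        providers = sorted(p for p, files in files_by_pkg.items() if dep in files)
        if not providers:
            raise LookupError(f"nothing provides {dep}")
        chosen.append(providers[0])
    return chosen

def resolve_lazily(deps):
    loads = 0
    try:
        return resolve(deps, PRIMARY, have_filelists=False), loads
    except NeedFilelists:
        loads += 1  # stands in for downloading and parsing filelists.xml
        return resolve(deps, FILELISTS, have_filelists=True), loads

print(resolve_lazily(["/usr/bin/bash"]))
print(resolve_lazily(["/usr/bin/bash", "/usr/libexec/foo/helper"]))
```

The first call is answered from primary data with no extra load. The second hits a path outside the whitelist, loads the full file lists once and starts the resolution again from the beginning, which is the part that would need support across the different layers.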