Re: [gentoo-dev] Change layout of distfiles
On Mon, 6 Mar 2006 13:45:01 -0500, Daniel Ostrow [EMAIL PROTECTED] wrote:
> portage will need to know that the location, on the distfiles mirrors, of cronolog, is now the equivalent of mirror://gentoo/${firstchar}

And what about the local mirror type that one can define in /etc/portage/mirrors: will it be assumed that files are stored with a first-char prefix or not? I would say no, because I think the most common usage is to share the $DISTDIR of one machine (hence without prefix) over the LAN, but I'm not really sure. Maybe some people also use it for full mirrors based on the official ones (hence with prefix).

--
TGL.
--
gentoo-dev@gentoo.org mailing list
[gentoo-dev] Change layout of distfiles
Hi,

as suggested by Mike in http://bugs.gentoo.org/show_bug.cgi?id=123335, here's my proposal for changing the layout of the distfiles tree.

This is the current state:

    mirror:/storage/gentoo/data/source/distfiles# ls | wc -l
    22543
    mirror:/storage/gentoo/data/source/distfiles# ls -l ../ | grep distfiles
    drwxr-xr-x 3 gentoo gentoo 950272 Mar 6 06:08 distfiles
    mirror:/storage/gentoo/data/source/distfiles#

People who want to browse the files by hand are usually stumped by the large output (the directory listing lighttpd creates is currently 4.2MB in size), and generating those listings puts an excessive strain on the server. In addition, creating and deleting files doesn't scale well on filesystems that store directory entries in linked lists (ext2/3, probably the common BSD filesystems), since the list has to be traversed for each file created or deleted.

Introducing an additional directory hierarchy should fix this, and is the common solution for this problem in various projects, be it debian [1], cpan [2], slackware [3], etc.

One migration scenario for a better future:

- Create subdirectories named after the first letter of each file and move the files into their respective directories.
- Either sym- or hardlink the files from the current distfiles root directory to the specific directory where they reside. (Depending on the chosen link type, check with the mirror admins first whether rsync hardlink support is enabled or whether their web/ftp servers allow/follow symlinks.)
- Adapt the build scripts so that they look for the files in their new location.
- Change the scripts which fetch the files for distfiles so that they save them under the new location.
- Wait a few weeks... (months? years? decades?) until the last user has updated and/or a clean upgrade path exists which doesn't rely on the old file locations.
- Drop the sym/hardlinks.

After the change we'd have (with the current set of files) 63 subdirectories, the largest one containing 1775 files (letter 'g'), which is a definite improvement over the current situation. The full list can be seen at http://mirror.inode.at/gentoo-listing.txt .

best regards,
Michael Renner - admin of gentoo.inode.at/rsync1.at.gentoo.org

[1] http://debian.inode.at/debian/pool/main/
[2] http://cpan.inode.at/modules/by-authors/id/
[3] http://www.slackware.at/data/slackware/slackware/
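
The first two steps of the migration scenario above (create first-character subdirectories, keep both paths alive through hardlinks) could be sketched roughly as follows. This is only an illustration, not a script used by Gentoo infrastructure: the function names `bucket_path` and `add_bucket_links` are made up here, and the sketch assumes a filesystem with hardlink support and that no distfile has a one-character name (which would clash with a bucket directory).

```python
import os


def bucket_path(filename: str) -> str:
    """Return the bucketed relative path: '<first character>/<filename>'."""
    if not filename or "/" in filename:
        raise ValueError(f"not a plain file name: {filename!r}")
    return f"{filename[0]}/{filename}"


def add_bucket_links(root: str) -> int:
    """Hard-link every regular file in root into its first-character bucket.

    The flat entries stay in place, so the old and the new path both work
    until the flat ones are dropped in the last migration step.
    Returns the number of links created.
    """
    created = 0
    for name in sorted(os.listdir(root)):
        src = os.path.join(root, name)
        if not os.path.isfile(src):
            continue  # skip bucket directories left by an earlier run
        dst = os.path.join(root, bucket_path(name))
        if os.path.exists(dst):
            continue  # already linked
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        os.link(src, dst)  # same inode, no extra disk space
        created += 1
    return created
```

Because the bucket is taken from the file name as-is (no case folding), upper-case letters, lower-case letters and digits each get their own directory, which fits a count like the 63 subdirectories mentioned above.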
Re: [gentoo-dev] Change layout of distfiles
Michael Renner wrote:
> Hi,
>
> as suggested by Mike in http://bugs.gentoo.org/show_bug.cgi?id=123335, here's my proposal for changing the layout of the distfiles tree:
>
> Introducing an additional directory hierarchy should fix this, and is the common solution for this problem for various projects, be it debian [1], cpan [2], slackware [3], etc.
>
> One migration scenario for a better future:
>
> - Create subdirectories named after the first letter of each file and move the files into their respective directories.
> - Either sym- or hardlink the files from the current distfiles root directory to the specific directory where they reside. (Depending on the chosen link type, check with the mirror admins first whether rsync hardlink support is enabled or whether their web/ftp servers allow/follow symlinks.)
> - Adapt the build scripts so that they look for the files in their new location.
> - Change the scripts which fetch the files for distfiles so that they save them under the new location.
> - Wait a few weeks... (months? years? decades?) until the last user has updated and/or a clean upgrade path exists which doesn't rely on the old file locations.
> - Drop the sym/hardlinks.

Is this plan for server-side distfiles only, or do you want /usr/portage/distfiles/{a-z}/ on the local system as well? If the latter, the answer is probably no. We've been asked in the past to implement a DISTFILES_PREFIX type system which would work in a similar manner, and it really only complicates things.

Is there any needed performance benefit out of the current scheme? Can you give some numbers as to how much this will help the average user?

I believe the Infrastructure team also doesn't want to change the layout, but I'll leave it up to them to comment on their own policy ;)

> best regards,
> Michael Renner - admin of gentoo.inode.at/rsync1.at.gentoo.org
>
> [1] http://debian.inode.at/debian/pool/main/
> [2] http://cpan.inode.at/modules/by-authors/id/
> [3] http://www.slackware.at/data/slackware/slackware/
Re: [gentoo-dev] Change layout of distfiles
Alec Warner wrote:
> Is this plan for server side only distfiles, or do you want /usr/portage/distfiles/{a-z}/ on the local system as well.

Changing the layout on the server suffices; no need to fiddle around with more scripts than necessary ;).

> Is there any needed performance benefit out of the current scheme? Can you give some numbers as to how much this will help the average user?

Listing the directory via proftpd takes the better part of 10 minutes on cold caches and consumes around 1 minute of CPU time on an Athlon XP 2800+. With those figures in mind, one could easily DoS a mirror server if one wanted to.

best regards,
Michael Renner
Re: [gentoo-dev] Change layout of distfiles
Kurt Lieber wrote:
> If we can come up with a seamless, painless transition process, great, let's make it happen.

From the _MIRROR_ side, using hardlinks should be fine; we'd just have to ensure that every mirror uses -H (preserve hardlinks). For the mirrors not using -H this will just result in increased traffic and disk usage (42GB at the moment, might hurt a bit ;) ). Ensuring that every mirror uses -H shouldn't be a problem, though (and I think they already do, since we already did hardlink magic when moving old releases to historical).

I guess the more complicated part will be adapting the ebuild system to look for/store the files in the new location.

best regards,
Michael
Re: [gentoo-dev] Change layout of distfiles
Michael Renner wrote:
> Kurt Lieber wrote:
> > If we can come up with a seamless, painless transition process, great, let's make it happen.
>
> From the _MIRROR_-side using hardlinks should be fine enough, we'd just have to ensure that every mirror uses -H (preserve hardlinks). And for the mirrors not using -H this will just result in increased traffic and diskusage (42GB at the moment, might hurt a bit ;) ). Shouldn't be a problem though ensuring that every mirror uses -H (and I think they already do, since we already did hardlink magic when moving old releases to historical)
>
> I guess the more complicated part will be adapting the ebuild system to look for/store the files in the new location.

Taking the earlier comment (changing files only on the mirrors), there are no portage changes that are technically required. However, you'd need to change about 1 (random number I pulled out of my ass, but there are many affected) SRC_URIs to point to the new format, or produce some sort of hack that translates between the two, and I wouldn't be too fond of the latter effort, mostly because it would probably rot in the tree for way too long ;)

And you need to modify the policy for placing files on the mirrors, but that's not a portage problem either; from the portage POV the change is relatively seamless.

> best regards,
> Michael
Re: [gentoo-dev] Change layout of distfiles
Alec Warner wrote:
> Taking the earlier comment ( changing files only on the mirrors ) there are no portage changes that are technically required. However, you'd need to change about 1 ( random number I pulled out of my ass, but there are many affected ) SRC_URI's to point to the new format, or produce some sort of hack that translates between the two, and I wouldn't be too fond of the latter effort, mostly because it would probably rot in the tree for way too long ;)

I don't see how making portage translate mirror://gentoo/${P}.patch.bz2 to http://distfiles.gentoo.org/distfiles/${firstchar}/${P}.patch.bz2 is worse than changing 1 SRC_URIs.

> And you need to modify policy for placing files on the mirrors, but that's not a portage problem either; from the portage POV the change is relatively seamless.

That should be a one-time effort for one person anyway. I guess it's not too hard to make a script that puts the stuff in toucan:/space/distfiles-local into the right dir.

--
Kind Regards,
Simon Stelling
Gentoo/AMD64 Member
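
For illustration, the translation Simon describes could look something like the sketch below. This is not Portage code: `translate`, `MIRROR_PREFIX` and `DEFAULT_BASE` are names invented for this example, and distfiles.gentoo.org is used only because it is the mirror named in the thread (a real client would choose from its mirror list).

```python
MIRROR_PREFIX = "mirror://gentoo/"
# Example mirror base taken from the thread; purely illustrative.
DEFAULT_BASE = "http://distfiles.gentoo.org/distfiles"


def translate(uri: str, base: str = DEFAULT_BASE) -> str:
    """Map mirror://gentoo/<file> to <base>/<firstchar>/<file>.

    Any URI that does not use the mirror://gentoo/ prefix is returned
    unchanged, so upstream SRC_URIs are not affected.
    """
    if not uri.startswith(MIRROR_PREFIX):
        return uri
    filename = uri[len(MIRROR_PREFIX):]
    if not filename or "/" in filename:
        raise ValueError(f"unexpected mirror URI: {uri!r}")
    return f"{base.rstrip('/')}/{filename[0]}/{filename}"
```

With the mapping kept in one place like this, the ebuilds that say mirror://gentoo/... would not need to be touched; only the expansion of the prefix changes.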
Re: [gentoo-dev] Change layout of distfiles
On Monday 06 March 2006 12:36, Alec Warner wrote:
> Michael Renner wrote:
> > Kurt Lieber wrote:
> > > If we can come up with a seamless, painless transition process, great, let's make it happen.
> >
> > From the _MIRROR_-side using hardlinks should be fine enough, we'd just have to ensure that every mirror uses -H (preserve hardlinks). And for the mirrors not using -H this will just result in increased traffic and diskusage (42GB at the moment, might hurt a bit ;) ). Shouldn't be a problem though ensuring that every mirror uses -H (and I think they already do, since we already did hardlink magic when moving old releases to historical)
> >
> > I guess the more complicated part will be adapting the ebuild system to look for/store the files in the new location.
>
> Taking the earlier comment ( changing files only on the mirrors ) there are no portage changes that are technically required. However, you'd need to change about 1 ( random number I pulled out of my ass, but there are many affected ) SRC_URI's to point to the new format, or produce some sort of hack that translates between the two, and I wouldn't be too fond of the latter effort, mostly because it would probably rot in the tree for way too long ;)
>
> And you need to modify policy for placing files on the mirrors, but that's not a portage problem either; from the portage POV the change is relatively seamless.
>
> > best regards,
> > Michael

Hrm, /me thinks you are missing something there: almost the entire tree doesn't explicitly state the mirror://gentoo SRC_URI; portage handles that automatically. That being the case, portage would have to change so that the automatic lookup was mirror://gentoo/${firstchar}/. So that is at least one portage change I can think of being required.

Sure, I can still see your point about needing to manually change the packages that do explicitly state mirror://gentoo in their SRC_URI, but given that you would have to do the above anyway

--
Daniel Ostrow
Gentoo Foundation Board of Trustees
Gentoo/{PPC,PPC64,DevRel}
[EMAIL PROTECTED]
Re: [gentoo-dev] Change layout of distfiles
Daniel Ostrow wrote:
> Hrm, /me thinks you are missing something there, almost the entire tree doesn't explicitly state the mirror://gentoo SRC_URI, portage handles that automatically. That being the case portage would have to change so that the automatic lookup was mirror://gentoo/${firstchar}/. So that is at least one portage change I can think of being required

Huh? What does it state then? AFAIK ebuilds should ALWAYS use the mirror:// URI when possible, and since this change only affects our own mirrors, it is always possible.

> Sure I can still see your point about needing to manually change the packages that do explicitly state mirror://gentoo in their SRC_URI, but given that you would have to do the above anyway

Huh?? My point was that we shouldn't have to change all those ebuilds, but instead just change the mirror://gentoo mapping.

--
Kind Regards,
Simon Stelling
Gentoo/AMD64 Member
Re: [gentoo-dev] Change layout of distfiles
Simon Stelling wrote:
> Alec Warner wrote:
> > Taking the earlier comment ( changing files only on the mirrors ) there are no portage changes that are technically required. However, you'd need to change about 1 ( random number I pulled out of my ass, but there are many affected ) SRC_URI's to point to the new format, or produce some sort of hack that translates between the two, and I wouldn't be too fond of the latter effort, mostly because it would probably rot in the tree for way too long ;)
>
> I don't see how making portage translate mirror://gentoo/${P}.patch.bz2 to http://distfiles.gentoo.org/distfiles/${firstchar}/${P}.patch.bz2 is worse than changing 1 SRC_URIs.

Better yet, the new portage could download files by trying both kinds of URLs (of course, only during the transition period). After the portage team marks the new portage version stable on all arches and gives folks a chance to update their systems (6 months perhaps), the infra team could make the transition to the new URLs the same way they handle the releases-to-historical transitions (namely, using hardlinks).
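
A rough sketch of the "try both kinds of URLs" idea during a transition period, under stated assumptions: the helper names `candidate_urls` and `fetch_first` are invented here, the `fetch` callable is a placeholder supplied by the caller (it returns the file contents, or None on failure), and trying the new layout before the old one is an arbitrary choice that the message above does not specify.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def candidate_urls(filename: str, mirrors: Iterable[str]) -> List[str]:
    """List both layouts for each mirror base: bucketed first, then flat."""
    if not filename or "/" in filename:
        raise ValueError(f"not a plain file name: {filename!r}")
    urls = []
    for base in mirrors:
        base = base.rstrip("/")
        urls.append(f"{base}/{filename[0]}/{filename}")  # new layout
        urls.append(f"{base}/{filename}")  # old flat layout
    return urls


def fetch_first(
    filename: str,
    mirrors: Iterable[str],
    fetch: Callable[[str], Optional[bytes]],
) -> Optional[Tuple[str, bytes]]:
    """Return (url, data) for the first candidate that can be fetched."""
    for url in candidate_urls(filename, mirrors):
        data = fetch(url)
        if data is not None:
            return url, data
    return None
```

A client written this way keeps working against mirrors in either state, at the cost of one extra failed request per file on mirrors that only have the second layout tried.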
Re: [gentoo-dev] Change layout of distfiles
On Monday 06 March 2006 13:18, Simon Stelling wrote:
> Daniel Ostrow wrote:
> > Hrm, /me thinks you are missing something there, almost the entire tree doesn't explicitly state the mirror://gentoo SRC_URI, portage handles that automatically. That being the case portage would have to change so that the automatic lookup was mirror://gentoo/${firstchar}/. So that is at least one portage change I can think of being required
>
> Huh? What does it state then? AFAIK ebuilds should ALWAYS use the mirror:// URI when possible, and since this change is only affecting our own mirrors, it is always possible.

You seem to be missing my point. Let's pick an ebuild at random, say app-admin/cronolog, whose SRC_URI="http://cronolog.org/download/${P}.tar.gz". Now, the automirror script will need to know to mirror it in /c/ on the distfiles mirrors; that's outside of portage. However, when I emerge cronolog, portage will need to know that the location of cronolog on the distfiles mirrors is now the equivalent of mirror://gentoo/${firstchar}. Taking distfiles.gentoo.org as an example, that would mean http://distfiles.gentoo.org/distfiles/c/${P}.tar.gz. That means a portage modification in my book.

> > Sure I can still see your point about needing to manually change the packages that do explicitly state mirror://gentoo in their SRC_URI, but given that you would have to do the above anyway
>
> Huh?? My point was that we shouldn't have to change all those ebuilds but instead just changing the mirror://gentoo-mapping.

And I was saying I agree, since the same work has to be done to handle all the automirrored stuff anyway.

--
Daniel Ostrow
Gentoo Foundation Board of Trustees
Gentoo/{PPC,PPC64,DevRel}
[EMAIL PROTECTED]
Re: [gentoo-dev] Change layout of distfiles
Simon Stelling wrote:
> Daniel Ostrow wrote:
> > Hrm, /me thinks you are missing something there, almost the entire tree doesn't explicitly state the mirror://gentoo SRC_URI, portage handles that automatically. That being the case portage would have to change so that the automatic lookup was mirror://gentoo/${firstchar}/. So that is at least one portage change I can think of being required

1925 ebuilds (found with a hacked-up SRC_URI checking script [1]):

    URI_check.py mirror://gentoo

> Huh? What does it state then? AFAIK ebuilds should ALWAYS use the mirror:// URI when possible, and since this change is only affecting our own mirrors, it is always possible.
>
> > Sure I can still see your point about needing to manually change the packages that do explicitly state mirror://gentoo in their SRC_URI, but given that you would have to do the above anyway
>
> Huh?? My point was that we shouldn't have to change all those ebuilds but instead just changing the mirror://gentoo-mapping.

See, if we do it the ebuild way, we can filter via EAPI. If the ebuild has an EAPI=2 SRC_URI but portage only supports EAPI=0, then the ebuild is automagically filtered, as opposed to failing miserably. It's getting close to the point where we can finally leverage EAPI to push features out faster, because backwards compatibility is maintained (for portage). Infra is still screwed, essentially doing 2 implementations until such time as the old one can die.

I'd prefer the mirrors not be special-cased in a mapping, since URIs are URIs are URIs...

-Alec Warner

[1] dev.gentoo.org/~antarus/URI_check.py
Re: [gentoo-dev] Change layout of distfiles
Hi, On 3/6/06, Michael Renner [EMAIL PROTECTED] wrote: Hi, as suggested by Mike in http://bugs.gentoo.org/show_bug.cgi?id=123335, here's my proposal for changing the layout of the distfiles tree: Introducing an additional directory hierarchy should fix this, and is the common solution for this problem for various projects, be it debian [1], cpan [2], slackware [3], etc. Why not have the directory structure follow the package category structure? E.g. the distfiles for package foo/bar goes into the directory ${MIRROR_ROOT}/foo/bar? This should be easy enough to support in Portage, and if applied to the /usr/portage/distfiles directory too, would solve a few other problems. It also has the advantage of grouping the distfiles in a way that users would find natural to browse. There is the problem of what happens when a package moves, but I think that's easily solved too. Best regards, Stu -- gentoo-dev@gentoo.org mailing list
Re: [gentoo-dev] Change layout of distfiles
Stuart Herbert wrote:
> Why not have the directory structure follow the package category structure? E.g. the distfiles for package foo/bar goes into the directory ${MIRROR_ROOT}/foo/bar? This should be easy enough to support in Portage, and if applied to the /usr/portage/distfiles directory too, would solve a few other problems. It also has the advantage of grouping the distfiles in a way that users would find natural to browse. There is the problem of what happens when a package moves, but I think that's easily solved too.

This has been discussed before. Summary: tarballs could be used by more than one package, so this way you'll manage to increase the disk space demands for our mirrors.
Re: [gentoo-dev] Change layout of distfiles
Alin Nastac wrote:
> this has been discussed before. summary: tarballs could be used by more than one package. this way you'll manage to increase the disk space demands for our mirrors.

This one is about sorting by the first letter of the filename. It won't solve multiple different files with the same filename, though.

Cheers,
-jkt

--
cd /local/pub more beer /dev/mouth
Re: [gentoo-dev] Change layout of distfiles
On 3/6/06, Alin Nastac [EMAIL PROTECTED] wrote: this has been discussed before. summary: tarballs could be used by more than one package. this way you'll manage to increase the disk space demands for our mirrors. And you can't hard-link the files into multiple directories because ...? Best regards, Stu -- gentoo-dev@gentoo.org mailing list
Re: [gentoo-dev] Change layout of distfiles
Michael Renner wrote:
> Introducing an additional directory hierarchy should fix this, and is the common solution for this problem for various projects, be it debian [1], cpan [2], slackware [3], etc.
>
> One migration scenario for a better future: Create subdirectories named after the first letter of each file and move the files in their respective directories.

Splitting the files by only one letter still leaves some directories with too many files, IMHO:

    g  2879
    l  2394
    p  2049
    s  2018

versus splitting by two letters:

    l  li  1652
    k  kd   888
    x  xf   670
    g  gn   559

li* (lib) is still a lot, but more manageable. The total number of files in my mirror directory is 32000, but I don't delete old files, and I started some months ago.
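
Counts like the ones above can be reproduced for any file listing with a few lines of Python. This is a minimal sketch, not the script the poster used; `prefix_counts` and `largest_buckets` are names made up for the example, and the input is assumed to be a plain list of distfile names.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def prefix_counts(filenames: Iterable[str], length: int) -> Counter:
    """Count how many file names share each prefix of the given length."""
    return Counter(name[:length] for name in filenames)


def largest_buckets(
    filenames: Iterable[str], length: int, top: int = 4
) -> List[Tuple[str, int]]:
    """Return the `top` most populated buckets as (prefix, count) pairs."""
    return prefix_counts(filenames, length).most_common(top)
```

Calling `largest_buckets(names, 1)` and `largest_buckets(names, 2)` on the same listing shows how much a second letter flattens the largest buckets before committing to either layout.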
Re: [gentoo-dev] Change layout of distfiles
On Mon, 6 Mar 2006 19:54:28 + Stuart Herbert [EMAIL PROTECTED] wrote: | On 3/6/06, Alin Nastac [EMAIL PROTECTED] wrote: | this has been discussed before. | summary: tarballs could be used by more than one package. this way | you'll manage to increase the disk space demands for our mirrors. | | And you can't hard-link the files into multiple directories | because ...? ...you have to find them first, and because there's a hard link limit on some filesystems, and because some filesystems don't do hardlinks. -- Ciaran McCreesh : Gentoo Developer (Wearer of the shiny hat) Mail: ciaranm at gentoo.org Web : http://dev.gentoo.org/~ciaranm signature.asc Description: PGP signature
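
To make the last two objections concrete: a tool that places one tarball into several per-package directories would need a fallback for when a hardlink cannot be created, and that fallback costs exactly the disk space mentioned earlier in the thread. A hedged sketch follows; `link_or_copy` is a hypothetical helper, and treating every OSError other than "destination exists" as "hardlink not possible here" is a simplification.

```python
import os
import shutil


def link_or_copy(src: str, dst: str) -> str:
    """Hard-link src to dst; fall back to a full copy if linking fails.

    Returns "linked" or "copied" so the caller can account for the
    extra disk space that copies consume.
    """
    try:
        os.link(src, dst)
        return "linked"
    except FileExistsError:
        raise  # never overwrite an existing distfile silently
    except OSError:
        # e.g. link count limit reached, or no hardlink support
        shutil.copy2(src, dst)
        return "copied"
```

This does nothing for the first objection, which is that something still has to find every package that uses a given tarball.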
Re: [gentoo-dev] Change layout of distfiles
Jan Kundrát wrote:
> Alin Nastac wrote:
> > this has been discussed before. summary: tarballs could be used by more than one package. this way you'll manage to increase the disk space demands for our mirrors.
>
> This one is about sorting by first letter of filename. It won't solve multiple different files with same filename, though.

I know what this is about, but Stuart was trying to reopen that old thread. You can't solve the name conflict in a generic fashion without increasing the resources required from our mirrors (either disk space or CPU + RAM). Since the probability of such a conflict is very low, I say we'd better solve one conflict at a time, by hosting a renamed version of those files on mirror://gentoo.