Re: [gentoo-dev] Change layout of distfiles

2006-03-07 Thread Thomas de Grenier de Latour
On Mon, 6 Mar 2006 13:45:01 -0500,
Daniel Ostrow [EMAIL PROTECTED] wrote:

 portage will need to know that the location, on the distfiles
 mirrors, of cronolog, is now the equivalent of
 mirror://gentoo/${firstchar}

And what about the local mirror type, that one can define in
/etc/portage/mirrors: will it be assumed that files are stored with a
first-char prefix or not? 
I would say no, because I think the most common usage is to share the
$DISTDIR of one machine (hence without prefix) over the LAN, but I'm not
really sure. Maybe some people also use this one for full mirrors
based on the official ones (hence with prefix).

--
TGL.
-- 
gentoo-dev@gentoo.org mailing list



[gentoo-dev] Change layout of distfiles

2006-03-06 Thread Michael Renner

Hi,

as suggested by Mike in http://bugs.gentoo.org/show_bug.cgi?id=123335, 
here's my proposal for changing the layout of the distfiles tree:



This is the current state:

mirror:/storage/gentoo/data/source/distfiles# ls | wc -l
22543
mirror:/storage/gentoo/data/source/distfiles# ls -l ../ | grep distfiles
drwxr-xr-x   3 gentoo gentoo 950272 Mar  6 06:08 distfiles
mirror:/storage/gentoo/data/source/distfiles#


People who want to browse the files by hand are usually stumped by the 
large output (the directory listing lighttpd creates is currently 4.2MB 
in size) and generation of those listings causes an excessive strain on 
the server.


Plus, the creation/deletion of files doesn't scale too well on
filesystems that store directory entries in linked lists (ext2/3,
probably the common BSD filesystems), since the list has to be traversed
for each file deleted/created.


Introducing an additional directory hierarchy should fix this, and is 
the common solution for this problem for various projects, be it debian 
[1], cpan [2], slackware [3], etc.



One migration scenario for a better future:

Create subdirectories named after the first letter of each file and move 
the files in their respective directories.


Either sym- or hardlink the files from the current distfiles
root directory to the directory in which they now reside. (Depending on
the chosen link type, check with the mirror admins first whether rsync
hardlink support is enabled or whether their web/ftp servers allow/follow
symlinks.)


Adapt the build scripts so that they look for the files in their new 
location.


Change the scripts which fetch the files for distfiles so that they save 
them under the new location.


Wait a few weeks... (months? years? decades?) until the last user has 
updated and/or a clean upgrade-path exists, which doesn't rely on the 
old file locations.


Drop the sym/hardlinks.
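
A minimal sketch of the mirror-side steps above (move into first-character
subdirectories, leave a hardlink at the old path, drop the links later). The
function names and sample filenames are hypothetical, and it assumes a
filesystem with hardlink support and no file whose name collides with a
bucket directory:

```python
import os
import tempfile


def migrate_to_first_char_layout(distfiles_root):
    """Move each top-level file into a subdirectory named after its first
    character, leaving a hardlink at the old path so old URLs keep working."""
    moved = []
    # The listing is taken once, before any bucket directory is created.
    for entry in sorted(os.listdir(distfiles_root)):
        old_path = os.path.join(distfiles_root, entry)
        if not os.path.isfile(old_path):
            continue  # skip anything that is not a regular file
        subdir = os.path.join(distfiles_root, entry[0])
        os.makedirs(subdir, exist_ok=True)
        new_path = os.path.join(subdir, entry)
        os.rename(old_path, new_path)  # the file now lives in its bucket
        os.link(new_path, old_path)    # compatibility hardlink at the old path
        moved.append(entry)
    return moved


def drop_compat_links(distfiles_root):
    """Final step: remove the top-level hardlinks once nobody relies on them."""
    for entry in os.listdir(distfiles_root):
        path = os.path.join(distfiles_root, entry)
        if os.path.isfile(path):
            os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical sample files, created empty for the demonstration.
        for name in ("cronolog-1.6.2.tar.gz", "gcc-4.1.0.tar.bz2"):
            open(os.path.join(root, name), "w").close()
        print(migrate_to_first_char_layout(root))
```

Using hardlinks rather than symlinks here is only one of the two options
named above; a symlink variant would swap os.link for os.symlink.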


After the change we'd have (with the current set of files) 63
subdirectories, the largest one containing 1775 files (letter 'g'),
which is a definite improvement over the current situation.


Full list can be seen at http://mirror.inode.at/gentoo-listing.txt .
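
For anyone who wants to reproduce such numbers on their own mirror, here is a
rough sketch of the counting. The helper name and the sample filenames are
made up, and it assumes case is preserved (so 'G' and 'g' are separate
subdirectories):

```python
from collections import Counter


def bucket_sizes(filenames):
    """Count how many files land in each first-character subdirectory.
    Case is preserved, so 'G' and 'g' count as different buckets."""
    return Counter(name[0] for name in filenames if name)


if __name__ == "__main__":
    # Hypothetical sample; on a real mirror this would be os.listdir(distfiles).
    sample = [
        "gcc-4.1.0.tar.bz2",
        "glibc-2.3.6.tar.bz2",
        "GConf-2.12.1.tar.bz2",
        "cronolog-1.6.2.tar.gz",
    ]
    sizes = bucket_sizes(sample)
    bucket, count = sizes.most_common(1)[0]
    print(len(sizes), "subdirectories; largest is", repr(bucket), "with", count, "files")
```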


best regards,
Michael Renner - admin of gentoo.inode.at/rsync1.at.gentoo.org

[1] http://debian.inode.at/debian/pool/main/
[2] http://cpan.inode.at/modules/by-authors/id/
[3] http://www.slackware.at/data/slackware/slackware/



Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Alec Warner
Michael Renner wrote:
 Hi,
 
 as suggested by Mike in http://bugs.gentoo.org/show_bug.cgi?id=123335,
 here's my proposal for changing the layout of the distfiles tree:

 Introducing an additional directory hierarchy should fix this, and is
 the common solution for this problem for various projects, be it debian
 [1], cpan [2], slackware [3], etc.
 
 
 One migration scenario for a better future:
 
 Create subdirectories named after the first letter of each file and move
 the files in their respective directories.
 
 Either sym- or hardlink the files from the current distfiles
 root-directory to the specific directory where they reside in. (Check
 with the mirror admins first (depending on the chosen linktype) if rsync
 hardlink support is enabled or their web/ftp servers allow/follow symlinks)
 
 Adapt the build scripts so that they look for the files in their new
 location.
 
 Change the scripts which fetch the files for distfiles so that they save
 them under the new location.
 
 Wait a few weeks... (months? years? decades?) until the last user has
 updated and/or a clean upgrade-path exists, which doesn't rely on the
 old file locations.
 
 Drop the sym/hardlinks.
 

Is this plan for server-side distfiles only, or do you want
/usr/portage/distfiles/{a-z}/ on the local system as well?  If that is
the case the answer is probably no.  We've been asked in the past to
implement a DISTFILES_PREFIX type system which would work in a similar
manner, and it really only complicates things.  Is there any needed
performance benefit out of the current scheme?  Can you give some
numbers as to how much this will help the average user?

I believe the Infrastructure team also doesn't want to change the
layout, but I'll leave it up to them to comment on their own policy ;)

 best regards,
 Michael Renner - admin of gentoo.inode.at/rsync1.at.gentoo.org
 
 [1] http://debian.inode.at/debian/pool/main/
 [2] http://cpan.inode.at/modules/by-authors/id/
 [3] http://www.slackware.at/data/slackware/slackware/




Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Michael Renner

Alec Warner wrote:


 Is this plan for server side only distfiles, or do you want
 /usr/portage/distfiles/{a-z}/ on the local system as well.


Changing the layout on the server suffices, no need to fiddle around 
with more scripts than necessary ;).


 Is there any needed performance benefit out of the current
 scheme?  Can you give some numbers as to how much this will
 help the average user?

Listing the directory via proftpd takes the better part of 10 minutes on
cold caches and consumes around 1 minute of CPU time on an Athlon XP
2800+. With those figures in mind, one could easily DoS a mirror server if
one wanted to.


best regards,
Michael Renner



Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Michael Renner

Kurt Lieber wrote:


If we can come up with a seamless, painless transition process, great,
let's make it happen.


From the _MIRROR_ side, using hardlinks should be fine; we'd just
have to ensure that every mirror runs rsync with -H (preserve hardlinks).
For mirrors not using -H, this will just result in increased traffic and
disk usage (42GB at the moment, might hurt a bit ;) ). Ensuring that every
mirror uses -H shouldn't be a problem, though (and I think they
already do, since we already did hardlink magic when moving old releases
to historical).
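
A mirror admin could check whether the hardlinks survived a sync with
something like the sketch below. The helper names are hypothetical; it
assumes the transition layout described earlier in this thread (file at the
top level plus the same file under a first-character subdirectory):

```python
import os
import tempfile


def is_hardlinked_pair(legacy_path, bucket_path):
    """True if both paths resolve to the same inode, i.e. no second copy of
    the data is stored. (os.path.samefile is also true for a symlink.)"""
    return os.path.samefile(legacy_path, bucket_path)


def find_broken_links(distfiles_root):
    """Return top-level files whose bucketed counterpart is missing or is a
    separate copy, which is what a sync without -H would leave behind."""
    broken = []
    for entry in sorted(os.listdir(distfiles_root)):
        legacy = os.path.join(distfiles_root, entry)
        if not os.path.isfile(legacy):
            continue  # bucket directories themselves are skipped
        bucketed = os.path.join(distfiles_root, entry[0], entry)
        if not os.path.isfile(bucketed) or not is_hardlinked_pair(legacy, bucketed):
            broken.append(entry)
    return broken


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a"))
        linked = os.path.join(root, "a", "apache.tar.gz")  # hypothetical name
        open(linked, "w").close()
        os.link(linked, os.path.join(root, "apache.tar.gz"))
        print(find_broken_links(root))
```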


I guess the more complicated part will be adapting the ebuild system to 
look for/store the files in the new location.


best regards,
Michael



Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Alec Warner



Michael Renner wrote:

Kurt Lieber wrote:


If we can come up with a seamless, painless transition process, great,
let's make it happen.



 From the _MIRROR_-side using hardlinks should be fine enough, we'd just 
have to ensure that every mirror uses -H (preserve hardlinks). And for 
the mirrors not using -H this will just result in increased traffic and 
diskusage (42GB at the moment, might hurt a bit ;) ). Shouldn't be a 
problem though ensuring that every mirror uses -H (and I think they 
already do, since we already did hardlink magic when moving old releases 
to historical)


I guess the more complicated part will be adapting the ebuild system to 
look for/store the files in the new location.


Taking the earlier comment ( changing files only on the mirrors ) there 
are no portage changes that are technically required.  However, you'd 
need to change about 1 ( random number I pulled out of my ass, but 
there are many affected ) SRC_URI's to point to the new format, or 
produce some sort of hack that translates between the two, and I 
wouldn't be too fond of the latter effort, mostly because it would
probably rot in the tree for way too long ;)


And you need to modify policy for placing files on the mirrors, but 
that's not a portage problem either; from the portage POV the change is
relatively seamless.




best regards,
Michael




Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Simon Stelling

Alec Warner wrote:
Taking the earlier comment ( changing files only on the mirrors ) there 
are no portage changes that are technically required.  However, you'd 
need to change about 1 ( random number I pulled out of my ass, but 
there are many affected ) SRC_URI's to point to the new format, or 
produce some sort of hack that translates between the two, and I 
wouldn't be too fond of the latter effort, mostly because it would
probably rot in the tree for way too long ;)


I don't see how making portage translate mirror://gentoo/${P}.patch.bz2 
to http://distfiles.gentoo.org/distfiles/${firstchar}/${P}.patch.bz2 is 
worse than changing 1 SRC_URIs.
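
To make the translation concrete, a rough sketch follows. This is not
portage code; the function name is made up, and the base URL is simply the
one from the example above, assumed to stand for a single mirror:

```python
MIRROR_PREFIX = "mirror://gentoo/"
# Assumption: one concrete mirror base, taken from the example URL above.
MIRROR_BASE = "http://distfiles.gentoo.org/distfiles"


def translate_mirror_uri(uri, base=MIRROR_BASE):
    """Map mirror://gentoo/<file> to <base>/<first char>/<file>.
    URIs that do not use the mirror://gentoo/ prefix are returned unchanged."""
    if not uri.startswith(MIRROR_PREFIX):
        return uri
    filename = uri[len(MIRROR_PREFIX):]
    if not filename or "/" in filename:
        raise ValueError(f"expected a bare filename after {MIRROR_PREFIX!r}: {uri!r}")
    return f"{base}/{filename[0]}/{filename}"


if __name__ == "__main__":
    # "foo-1.0" is a hypothetical expansion of ${P}.
    print(translate_mirror_uri("mirror://gentoo/foo-1.0.patch.bz2"))
```

With this in one place, the ebuilds themselves would stay untouched.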


 And you need to modify policy for placing files on the mirrors, but
 that's not a portage problem either; from the portage POV the change is
 relatively seamless.

That should be a one-time effort for one person anyway. I guess it's not 
too hard to make a script that puts the stuff in 
toucan:/space/distfiles-local into the right dir.


--
Kind Regards,

Simon Stelling
Gentoo/AMD64 Member



Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Daniel Ostrow
On Monday 06 March 2006 12:36, Alec Warner wrote:
 Michael Renner wrote:
  Kurt Lieber wrote:
  If we can come up with a seamless, painless transition process, great,
  let's make it happen.
 
   From the _MIRROR_-side using hardlinks should be fine enough, we'd just
  have to ensure that every mirror uses -H (preserve hardlinks). And for
  the mirrors not using -H this will just result in increased traffic and
  diskusage (42GB at the moment, might hurt a bit ;) ). Shouldn't be a
  problem though ensuring that every mirror uses -H (and I think they
  already do, since we already did hardlink magic when moving old releases
  to historical)
 
  I guess the more complicated part will be adapting the ebuild system to
  look for/store the files in the new location.

 Taking the earlier comment ( changing files only on the mirrors ) there
 are no portage changes that are technically required.  However, you'd
 need to change about 1 ( random number I pulled out of my ass, but
 there are many affected ) SRC_URI's to point to the new format, or
 produce some sort of hack that translates between the two, and I
 wouldn't be too fond of the latter effort, mostly because it would
 probably rot in the tree for way too long ;)

 And you need to modify policy for placing files on the mirrors, but
 that's not a portage problem either; from the portage POV the change is
 relatively seamless.

  best regards,
  Michael

Hrm, /me thinks you are missing something there: almost the entire tree
doesn't explicitly state the mirror://gentoo SRC_URI; portage handles that
automatically. That being the case, portage would have to change so that the
automatic lookup was mirror://gentoo/${firstchar}/. So that is at least one
portage change I can think of being required.

Sure I can still see your point about needing to manually change the packages 
that do explicitly state mirror://gentoo in their SRC_URI, but given that you 
would have to do the above anyway

-- 
Daniel Ostrow
Gentoo Foundation Board of Trustees
Gentoo/{PPC,PPC64,DevRel}
[EMAIL PROTECTED]




Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Simon Stelling

Daniel Ostrow wrote:
Hrm, /me thinks you are missing something there, almost the entire tree 
doesn't explicitly state the mirror://gentoo SRC_URI, portage handles that 
automatically. That being the case portage would have to change so that the
automatic lookup was mirror://gentoo/${firstchar}/. So that is at least one 
portage change I can think of being required


Huh? What does it state then? AFAIK ebuilds should ALWAYS use the 
mirror:// URI when possible, and since this change is only affecting our 
own mirrors, it is always possible.


Sure I can still see your point about needing to manually change the packages 
that do explicitly state mirror://gentoo in their SRC_URI, but given that you 
would have to do the above anyway


Huh?? My point was that we shouldn't have to change all those ebuilds,
but instead just change the mirror://gentoo mapping.


--
Kind Regards,

Simon Stelling
Gentoo/AMD64 Member
--
gentoo-dev@gentoo.org mailing list



Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Alin Nastac
Simon Stelling wrote:

 Alec Warner wrote:

 Taking the earlier comment ( changing files only on the mirrors )
 there are no portage changes that are technically required.  However,
 you'd need to change about 1 ( random number I pulled out of my
 ass, but there are many affected ) SRC_URI's to point to the new
 format, or produce some sort of hack that translates between the two,
 and I wouldn't be too fond of the latter effort, mostly because it
 would probably rot in the tree for way too long ;)


 I don't see how making portage translate
 mirror://gentoo/${P}.patch.bz2 to
 http://distfiles.gentoo.org/distfiles/${firstchar}/${P}.patch.bz2 is
 worse than changing 1 SRC_URIs.

Better yet, the new portage could download files by trying both kinds of
URLs (of course, only during the transition period).
After the portage team marks the new portage version stable on all arches and
gives folks a chance to update their systems (6 months perhaps), the
infra team could make the transition to the new URLs the same way
they handle the releases-to-historical transitions (namely, using hardlinks).
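
As a sketch of the "try both kinds of URLs" idea: the names below are
hypothetical, the downloader is passed in as a plain callable rather than
being portage's real fetch code, and trying the new layout first is my own
assumption about the order:

```python
def candidate_urls(mirror_base, filename):
    """Return the URLs to try, new first-character layout first, then the
    old flat layout as a fallback during the transition period."""
    base = mirror_base.rstrip("/")
    return [f"{base}/{filename[0]}/{filename}", f"{base}/{filename}"]


def fetch_with_fallback(mirror_base, filename, fetch):
    """Try each candidate URL in turn. `fetch` is any callable that takes a
    URL and returns the data, raising OSError when the file is not there."""
    last_error = None
    for url in candidate_urls(mirror_base, filename):
        try:
            return url, fetch(url)
        except OSError as exc:
            last_error = exc
    raise OSError(f"{filename} not found in either layout") from last_error


if __name__ == "__main__":
    # Stand-in downloader: pretend the mirror still has only the flat layout.
    available = {"http://distfiles.gentoo.org/distfiles/cronolog.tar.gz": b"data"}

    def fake_fetch(url):
        if url not in available:
            raise OSError(f"404: {url}")
        return available[url]

    url, _data = fetch_with_fallback(
        "http://distfiles.gentoo.org/distfiles", "cronolog.tar.gz", fake_fetch
    )
    print(url)
```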




Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Daniel Ostrow
On Monday 06 March 2006 13:18, Simon Stelling wrote:
 Daniel Ostrow wrote:
  Hrm, /me thinks you are missing something there, almost the entire tree
  doesn't explicitly state the mirror://gentoo SRC_URI, portage handles
  that automatically. That being the case portage would have to change so that
  the automatic lookup was mirror://gentoo/${firstchar}/. So that is at
  least one portage change I can think of being required

 Huh? What does it state then? AFAIK ebuilds should ALWAYS use the
 mirror:// URI when possible, and since this change is only affecting our
 own mirrors, it is always possible.

You seem to be missing my point. Let's pick an ebuild at random, say
app-admin/cronolog, whose SRC_URI="http://cronolog.org/download/${P}.tar.gz".
Now the automirror script will need to know to mirror it in /c/ on the
distfiles mirrors; that's outside of portage. However, when I emerge cronolog,
portage will need to know that the location of cronolog on the distfiles
mirrors is now the equivalent of mirror://gentoo/${firstchar}. Taking
distfiles.gentoo.org as an example, that would mean
http://distfiles.gentoo.org/distfiles/c/${P}.tar.gz. That means a portage
modification in my book.
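
Roughly, both sides (the automirror script and portage) would need to agree
on something like the following. This is only a sketch: the function name is
made up, and the expanded ${P} in the example is an illustrative value:

```python
import posixpath
from urllib.parse import urlsplit

# Assumption: distfiles.gentoo.org stands in for any distfiles mirror.
MIRROR_BASE = "http://distfiles.gentoo.org/distfiles"


def mirror_location(upstream_uri, mirror_base=MIRROR_BASE):
    """Derive where an automatically mirrored file would live: take the
    filename from the upstream SRC_URI and file it under its first character."""
    filename = posixpath.basename(urlsplit(upstream_uri).path)
    if not filename:
        raise ValueError(f"no filename in {upstream_uri!r}")
    return f"{mirror_base}/{filename[0]}/{filename}"


if __name__ == "__main__":
    # ${P} expanded to an illustrative value for the demonstration.
    print(mirror_location("http://cronolog.org/download/cronolog-1.6.2.tar.gz"))
```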

  Sure I can still see your point about needing to manually change the
  packages that do explicitly state mirror://gentoo in their SRC_URI, but
  given that you would have to do the above anyway

 Huh?? My point was that we shouldn't have to change all those ebuilds
 but instead just changing the mirror://gentoo-mapping.

And I was saying I agree since the same work has to be done to handle all the 
automirrored stuff anyway.

-- 
Daniel Ostrow
Gentoo Foundation Board of Trustees
Gentoo/{PPC,PPC64,DevRel}
[EMAIL PROTECTED]




Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Alec Warner



Simon Stelling wrote:

Daniel Ostrow wrote:

Hrm, /me thinks you are missing something there, almost the entire 
tree doesn't explicitly state the mirror://gentoo SRC_URI, portage 
handles that automatically. That being the case portage would have to
change so that the automatic lookup was mirror://gentoo/${firstchar}/.
So that is at least one portage change I can think of being required


1925 ebuilds match (per a hacked-up SRC_URI checking script [1]):
URI_check.py mirror://gentoo
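
For reference, a count like this can be approximated with a plain substring
search over the tree. The sketch below is NOT the URI_check.py script from
[1], whose contents are not shown here; the function name is hypothetical and
it does no real SRC_URI parsing:

```python
import os
import tempfile


def ebuilds_mentioning(tree_root, needle="mirror://gentoo"):
    """Return the sorted paths of all *.ebuild files under tree_root whose
    text contains `needle`. This is a plain substring match, nothing more."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(tree_root):
        for name in filenames:
            if not name.endswith(".ebuild"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                if needle in handle.read():
                    hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tree:
        # Hypothetical one-package tree, just to show the call.
        pkg = os.path.join(tree, "app-misc", "foo")
        os.makedirs(pkg)
        with open(os.path.join(pkg, "foo-1.0.ebuild"), "w") as handle:
            handle.write('SRC_URI="mirror://gentoo/foo-1.0.tar.bz2"\n')
        print(len(ebuilds_mentioning(tree)), "ebuilds")
```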





Huh? What does it state then? AFAIK ebuilds should ALWAYS use the 
mirror:// URI when possible, and since this change is only affecting our 
own mirrors, it is always possible.


Sure I can still see your point about needing to manually change the 
packages that do explicitly state mirror://gentoo in their SRC_URI, 
but given that you would have to do the above anyway



Huh?? My point was that we shouldn't have to change all those ebuilds 
but instead just changing the mirror://gentoo-mapping.




See, if we do it the ebuild way we can filter via EAPI.  If the ebuild has an
EAPI=2 SRC_URI but portage is only EAPI=0, then the ebuild is
automagically filtered, as opposed to the ebuild failing miserably.
It's getting close to the point where we can finally leverage EAPI to
push features out faster because backwards compatibility is maintained (
for portage ).  Infra is still screwed, essentially doing 2
implementations until such time as the old one can die.
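
The filtering idea, very roughly; this is a sketch and not portage's actual
visibility logic. The function name, the dictionary layout, and the package
names are all made up for illustration:

```python
def visible_ebuilds(ebuild_eapis, supported_eapis):
    """Keep only ebuilds whose EAPI this portage understands. Anything with
    an unknown EAPI is hidden rather than allowed to fail at fetch time."""
    return sorted(
        name for name, eapi in ebuild_eapis.items() if eapi in supported_eapis
    )


if __name__ == "__main__":
    # Hypothetical ebuilds: one using the old SRC_URI format, one the new.
    ebuilds = {"foo-1.0": "0", "foo-2.0": "2"}
    print(visible_ebuilds(ebuilds, supported_eapis={"0"}))
    print(visible_ebuilds(ebuilds, supported_eapis={"0", "2"}))
```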


I'd prefer the mirrors not be special-cased in a mapping, since URIs
are URIs are URIs...


-Alec Warner


-
[1] dev.gentoo.org/~antarus/URI_check.py




Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Stuart Herbert
Hi,

On 3/6/06, Michael Renner [EMAIL PROTECTED] wrote:
 Hi,

 as suggested by Mike in http://bugs.gentoo.org/show_bug.cgi?id=123335,
 here's my proposal for changing the layout of the distfiles tree:
 Introducing an additional directory hierarchy should fix this, and is
 the common solution for this problem for various projects, be it debian
 [1], cpan [2], slackware [3], etc.

Why not have the directory structure follow the package category
structure?  E.g. the distfiles for package foo/bar goes into the
directory ${MIRROR_ROOT}/foo/bar?

This should be easy enough to support in Portage, and if applied to
the /usr/portage/distfiles directory too, would solve a few other
problems.  It also has the advantage of grouping the distfiles in a
way that users would find natural to browse.

There is the problem of what happens when a package moves, but I think
that's easily solved too.

Best regards,
Stu




Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Alin Nastac
Stuart Herbert wrote:

Why not have the directory structure follow the package category
structure?  E.g. the distfiles for package foo/bar goes into the
directory ${MIRROR_ROOT}/foo/bar?

This should be easy enough to support in Portage, and if applied to
the /usr/portage/distfiles directory too, would solve a few other
problems.  It also has the advantage of grouping the distfiles in a
way that users would find natural to browse.

There is the problem of what happens when a package moves, but I think
that's easily solved too.
  

this has been discussed before.
summary: tarballs could be used by more than one package. this way
you'll manage to increase the disk space demands for our mirrors.




Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Jan Kundrát
Alin Nastac wrote:
 this has been discussed before.
 summary: tarballs could be used by more than one package. this way
 you'll manage to increase the disk space demands for our mirrors.

This one is about sorting by first letter of filename. It won't solve
multiple different files with same filename, though.

Cheers,
-jkt

-- 
cd /local/pub && more beer > /dev/mouth




Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Stuart Herbert
On 3/6/06, Alin Nastac [EMAIL PROTECTED] wrote:
 this has been discussed before.
 summary: tarballs could be used by more than one package. this way
 you'll manage to increase the disk space demands for our mirrors.

And you can't hard-link the files into multiple directories because ...?

Best regards,
Stu




Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Francesco Riosa
Michael Renner wrote:

 Introducing an additional directory hierarchy should fix this, and is
 the common solution for this problem for various projects, be it debian
 [1], cpan [2], slackware [3], etc.
 
 
 One migration scenario for a better future:
 
 Create subdirectories named after the first letter of each file and move
 the files in their respective directories.
 

Splitting the files using only one letter still leaves some directories
with too many files, IMHO.

letter  files
g       2879
l       2394
p       2049
s       2018

versus

letter  prefix  files
l       li      1652
k       kd      888
x       xf      670
g       gn      559

li* (lib) are still a lot, but more manageable.

The total number of files in my mirror directory is 32000, but I don't
delete old files, and I started some months ago.
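
A small sketch for comparing one-letter and two-letter splits on any file
list. The helper name and sample filenames are hypothetical, and names
shorter than the prefix length are simply filed under their whole name:

```python
from collections import Counter


def bucket_sizes(filenames, prefix_len=1):
    """Count files per bucket when bucketing by the first prefix_len
    characters of the filename (case preserved)."""
    return Counter(name[:prefix_len] for name in filenames if name)


if __name__ == "__main__":
    # Hypothetical sample; a real run would use os.listdir() on the mirror.
    sample = ["libpng.tar", "libxml2.tar", "lynx.tar", "kdelibs.tar", "gcc.tar"]
    for n in (1, 2):
        prefix, count = bucket_sizes(sample, n).most_common(1)[0]
        print("prefix length", n, "-> largest bucket", repr(prefix), "with", count)
```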



Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Ciaran McCreesh
On Mon, 6 Mar 2006 19:54:28 + Stuart Herbert
[EMAIL PROTECTED] wrote:
| On 3/6/06, Alin Nastac [EMAIL PROTECTED] wrote:
|  this has been discussed before.
|  summary: tarballs could be used by more than one package. this way
|  you'll manage to increase the disk space demands for our mirrors.
| 
| And you can't hard-link the files into multiple directories
| because ...?

...you have to find them first, and because there's a hard link limit
on some filesystems, and because some filesystems don't do hardlinks.
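
The link limit can be queried per path on POSIX systems. Below is a sketch
with hypothetical helper names; it treats an unknown limit as "no known
limit", which is an optimistic assumption:

```python
import os
import tempfile


def hardlink_limit(path):
    """Return the maximum link count the filesystem reports for `path`,
    or None if the platform or filesystem gives no usable answer."""
    try:
        limit = os.pathconf(path, "PC_LINK_MAX")
    except (AttributeError, ValueError, OSError):
        return None  # no os.pathconf here, or the name is not supported
    return limit if limit > 0 else None


def can_add_link(path):
    """True if one more hardlink to `path` stays within the reported limit.
    An unknown limit is treated as "no known limit" (optimistic)."""
    limit = hardlink_limit(path)
    return limit is None or os.stat(path).st_nlink < limit


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as handle:
        print("link limit:", hardlink_limit(handle.name))
        print("room for another link:", can_add_link(handle.name))
```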

-- 
Ciaran McCreesh : Gentoo Developer (Wearer of the shiny hat)
Mail: ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm





Re: [gentoo-dev] Change layout of distfiles

2006-03-06 Thread Alin Nastac
Jan Kundrát wrote:

Alin Nastac wrote:
  

this has been discussed before.
summary: tarballs could be used by more than one package. this way
you'll manage to increase the disk space demands for our mirrors.



This one is about sorting by first letter of filename. It won't solve
multiple different files with same filename, though.

  

I know what this is about, but Stuart was trying to reopen that old thread.

You can't solve the name conflict in a generic fashion without
increasing the resources required from our mirrors (either disk space or CPU
+ RAM).
Since the probability of such a conflict is very low, I say it's better to
solve one conflict at a time, by hosting a renamed version of those files on
mirror://gentoo.


