Re: Distributing the CPAN

2010-04-03 Thread Ask Bjørn Hansen

On Apr 2, 2010, at 14:03, Tim Bunce wrote:

 Imagine a cpan-all 'superproject' repo that has all the distros as
 submodules.  This repo would be tiny when cloned because it only
 contains empty directories for the distros plus the metadata for where
 the upstream distro repo lives and what the current commit is.
 When a distro is updated the cpan-all repo would be updated
 to reference the latest version of the distro.

That's a really good idea actually.   That'd mean, too, that it's possible to 
reset a distribution (to get rid of excessive size etc).

It'd be fun to try on the gitpan data...
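
A rough sketch of the metadata side of such a superproject, assuming the standard .gitmodules stanza format (one path and url per submodule). The directory names and the example.org URLs are made up for illustration. Note that the pinned commit is not stored in .gitmodules; git records it as a gitlink entry in the superproject's tree, so cloning the superproject alone does not fetch any distro history.

```python
def gitmodules_entry(path, url):
    """Render one [submodule] stanza in the .gitmodules file format."""
    return f'[submodule "{path}"]\n\tpath = {path}\n\turl = {url}\n'


def render_gitmodules(distros):
    """Render a stanza per distro, sorted by path so diffs stay stable."""
    return "".join(gitmodules_entry(p, u) for p, u in sorted(distros.items()))


# Hypothetical layout and upstream URLs; example.org stands in for
# wherever the per-distro repos would actually live.
distros = {
    "Foo/Bar/Foo-Bar.distro": "git://example.org/Foo-Bar.git",
    "Foo/Bar/Baz/Foo-Bar-Baz.distro": "git://example.org/Foo-Bar-Baz.git",
}

print(render_gitmodules(distros))
```

With 15,000 or so distros this file would hold that many three-line stanzas, which gives a first idea of how "tiny" the superproject really is.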


 - ask (on a sketchy 3g connection out in the country)

Re: Distributing the CPAN

2010-04-03 Thread Tim Bunce
On Fri, Apr 02, 2010 at 04:49:44PM +0200, Aristotle Pagaltzis wrote:
 * Tim Bunce tim.bu...@pobox.com [2010-04-02 15:55]:
  So, for a cpan-git-mirror to update itself it only needs to do:
 
  cd cpan-all && git pull && git submodule update
 
  The git pull of the cpan-all repo would be very fast as it's tiny.
 
 With 15,000(?) distributions = submodules = directories, it’s not
 *that* tiny.
 
 You don’t want to stuff those all in the top-level directory.

Naturally. The cpan-all repo would be focussed on distributions not
authors, so I figured a structure (for Foo-Bar and Foo-Bar-Baz distros)
something like:

/Foo
    /Bar
        /Foo-Bar.distro/...
        /Baz
            /Foo-Bar-Baz.distro/...

(Let's not bikeshed that at the moment - the key point is that a
hierarchy is needed and that it be focussed on distros.)
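
A minimal sketch of one possible mapping from distro name to submodule path, assuming the layout nests one directory per dash-separated name component and ends in a "<distro>.distro" directory. The function name is hypothetical and the exact scheme is precisely the part left open above.

```python
def distro_path(distro):
    """Map a distro name such as 'Foo-Bar-Baz' to its submodule path."""
    parts = distro.split("-")
    # One directory level per name component, then the distro itself.
    return "/".join(parts + [distro + ".distro"])


print(distro_path("Foo-Bar"))
print(distro_path("Foo-Bar-Baz"))
```

Because the path is derived from the name alone, a mirror could locate any distro without consulting an index, and no single directory has to hold all 15,000 entries.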

 [...] you still get comparatively much churn for some still
 rather big directories, because any change to a subdirectory
 causes the entire chain of objects representing the directory
 levels above it to also change. I don’t know if that churn is
 bad enough to require a different solution.

I doubt it, but we won't know unless someone tries it :)
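
To get a feel for the churn being described, here is a toy model (not git's actual object format): each directory's id is a hash over the names and ids of its entries, so changing one distro's pinned commit changes every directory id on the path up to the root, and nothing else. The tree contents and commit ids are invented.

```python
import copy
import hashlib


def tree_hashes(tree, path="", out=None):
    """Hash every directory in a nested dict; leaves are commit ids (str)."""
    if out is None:
        out = {}
    h = hashlib.sha1()
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            child_path = f"{path}/{name}"
            tree_hashes(child, child_path, out)
            child_id = out[child_path]
        else:
            child_id = child
        h.update(f"{name}\0{child_id}\n".encode())
    out[path or "/"] = h.hexdigest()
    return out


def changed_dirs(before, after):
    """List directories whose hash differs between two snapshots."""
    a, b = tree_hashes(before), tree_hashes(after)
    return sorted(p for p in b if a.get(p) != b[p])


before = {
    "Foo": {"Bar": {"Foo-Bar.distro": "c0ffee",
                    "Baz": {"Foo-Bar-Baz.distro": "abc123"}}},
    "Quux": {"Quux.distro": "111111"},
}
after = copy.deepcopy(before)
after["Foo"]["Bar"]["Baz"]["Foo-Bar-Baz.distro"] = "def456"  # new release

print(changed_dirs(before, after))
# ['/', '/Foo', '/Foo/Bar', '/Foo/Bar/Baz']
```

One release rewrites one directory object per level, so the cost depends on how deep the hierarchy is and how many entries each rewritten directory holds. Whether that is acceptable at CPAN scale is the open question.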

  Hopefully someone with more git foo than me can sanity check
  it. Assuming I'm not talking nonsense, I think this has great
  potential.
 
 It would take some trickery and thought to do well, but it’s not
 obviously broken as designed.

Great.

Tim.


Re: Distributing the CPAN

2010-04-02 Thread Michael G Schwern
On Thu, Apr 1, 2010 at 7:50 AM, Tim Bunce tim.bu...@pobox.com wrote:
 On Thu, Apr 01, 2010 at 12:39:27AM -0400, David Nicol wrote:
 On Wed, Mar 31, 2010 at 7:43 AM, Ask Bjørn Hansen a...@perl.org wrote:
  The main point here is that we can't use 20 inodes per distribution.

 so don't. How much reengineering would be needed to keep CPAN in a
 database instead of a file system?

 Random thoughts...

FWIW I've had similar thoughts.  I was discussing them with David
Wheeler in relation to the proposed PgAN (Postgres).


 * If you squint a little you can view git as a database with excellent
 replication support.

For bonus points, it's smaller.  A bare gitpan is 5 gigs.  BackPAN is 14.


 * cpanminus already supports installing from a git repo.

 * For backwards compatibility a simple perl web server could provide a
 classic CPAN http mirror 'view' over a git repo like gitpan.
 This cpan-git-server would create and serve up cached distro tarballs
 on demand.
 Someone could whip up one to work over gitpan as a proof of concept.

It's potentially even simpler over gitpan as github will produce
tarballs.  You just need to map the URLs.  I say potentially because
github will produce a tarball named after the commit checksum, not the
tag.  Something I've been on them to fix.
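
A sketch of what "map the URLs" could look like, assuming the distro name and version can be split out of the tarball filename with a simple pattern and that each release is tagged with its version. Both are assumptions: real CPAN filenames are messier than this regex allows, the tag naming is a guess, and the URL template below is a hypothetical placeholder rather than github's actual scheme.

```python
import re

# Simplified pattern: "<name>-<version>.<ext>", version starts with a digit.
DIST_RE = re.compile(
    r"^(?P<name>.+?)-v?(?P<version>\d[0-9A-Za-z._]*)\.(?:tar\.gz|tgz|zip)$"
)


def parse_cpan_path(path):
    """Split a classic CPAN tarball path into (distro name, version)."""
    filename = path.rsplit("/", 1)[-1]
    m = DIST_RE.match(filename)
    if m is None:
        raise ValueError(f"cannot identify distribution in {filename!r}")
    return m.group("name"), m.group("version")


def tarball_url(cpan_path, template):
    """Rewrite a CPAN mirror path into a git-host tarball URL."""
    name, version = parse_cpan_path(cpan_path)
    return template.format(distro=name, tag=version)


# Hypothetical host and URL layout, for illustration only.
TEMPLATE = "https://git.example.org/{distro}/tarball/{tag}"

print(tarball_url("authors/id/A/AU/AUTHOR/Foo-Bar-Baz-0.01.tar.gz", TEMPLATE))
```

A real cpan-git-server would need a proper filename parser and a fallback for releases that cannot be identified this way.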


 * The need for widespread mirroring is less significant than it was in
 years past. (Also using git as the inter-mirror transport of source files
 means there'll be much less traffic between mirrors. Effectively only
 the diffs between releases.)

Not being a sysadmin, this is my gut feeling.  Relative to hard drive
prices, CPAN (hell, BackPAN) has shrunk.  I'd imagine the same to be
the case relative to network capacity.


 * New approaches to replication, such as git, don't have to be supported
 by existing mirror providers. A new set of cpan-git-mirror providers could
 emerge.

 * Any cpan-git-mirror provider running a cpan-git-server could be
 included in the list of mirrors used by existing installers.

 * Over time the number of cpan-git-mirrors and cpan-git-servers could
 grow and the number of traditional CPAN ftp/rsync mirrors could fall.

The central thesis is correct: git provides a very simple, very
compact database that sorts things by version and by distribution.
The downside is CPAN doesn't really do things by distribution, so that
would have to be worked out.  IMO this is a Good Thing that needs to
be done.

See http://use.perl.org/~schwern/journal/40014 for gitpan's issues
with identifying distributions.


Re: Distributing the CPAN

2010-04-02 Thread Tim Bunce
On Thu, Apr 01, 2010 at 08:03:53PM +0300, Burak Gürsoy wrote:
  From: Tim Bunce [mailto:tim.bu...@gmail.com] On Behalf Of Tim Bunce
  Subject: Distributing the CPAN
 
  * cpanminus already supports installing from a git repo.
 
  * Over time the number of cpan-git-mirrors and cpan-git-servers could
  grow and the number of traditional CPAN ftp/rsync mirrors could fall.
 
 There is a part missing in this scenario. Mirroring gitPAN can be a
 good idea since it has the actual released distros [...]

Yes, I was envisaging something like gitPAN. Though if this took off
then moving the tarball-git import logic to the PAUSE server would
probably be a good idea.

Tim.


RE: Distributing the CPAN

2010-04-01 Thread Burak Gürsoy
 -----Original Message-----
 From: Tim Bunce [mailto:tim.bu...@gmail.com] On Behalf Of Tim Bunce
 Sent: Thursday, April 01, 2010 5:51 PM
 To: cpan-workers; module-authors@perl.org
 Subject: Distributing the CPAN

 * cpanminus already supports installing from a git repo.
 

 * Over time the number of cpan-git-mirrors and cpan-git-servers could
 grow and the number of traditional CPAN ftp/rsync mirrors could fall.

There is a part missing in this scenario. Mirroring gitPAN can be a
good idea since it has the actual released distros, but mirroring a
random repo (or somehow combining random repos and creating a single one)
is not logical IMHO, since many people use builder builders and a repo may
not necessarily contain a complete installable configuration, whereas the
tar.gz archives (or releases) have a complete set of files.



Re: Distributing the CPAN

2010-04-01 Thread David E. Wheeler
On Apr 1, 2010, at 1:12 PM, Tim Bunce wrote:

 Yes, I was envisaging something like gitPAN. Though if this took off
 then moving the tarball-git import logic to the PAUSE server would
 probably be a good idea.

/me stashes these ideas away for PGAN…