Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-23 Thread Andreas Gruenbacher
On Mon, 2005-02-21 at 20:45, [EMAIL PROTECTED] wrote:
> CVS was pretty good at keeping files sane, but I'll go for a solution that
> completely sidesteps said problem any day.

One way to get the benefits of both worlds would be to keep an
additional history of changes (in whatever form) that allows rebuilding
the ,v files.
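
As a rough illustration (not any existing tool's format; the file name and
record layout below are invented), such a side history could be as simple as
an append-only log of per-file diffs that a converter later replays into ,v
files:

    #!/usr/bin/env python
    # Hypothetical append-only change log from which per-file history
    # (and hence RCS-style ,v files) could later be reconstructed.
    import json, time

    LOG = "changes.log"   # one JSON record per line, never rewritten

    def record_change(path, diff_text, author):
        """Append one change record to the side history."""
        entry = {"time": time.time(), "path": path,
                 "author": author, "diff": diff_text}
        with open(LOG, "a") as f:
            f.write(json.dumps(entry) + "\n")

    def history_for(path):
        """Return all recorded diffs for one file, oldest first."""
        out = []
        with open(LOG) as f:
            for line in f:
                entry = json.loads(line)
                if entry["path"] == path:
                    out.append(entry)
        return out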

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-21 Thread walt


On Mon, 21 Feb 2005, David Roundy wrote:

<snip very technical discussion>

I just scanned the comparison of various source-code management
schemes at

http://zooko.com/revision_control_quick_ref.html

and found myself wishing for a similar review of bk (which was
excluded, not being open-source).

Would you (or anyone else) be willing to compose a similar
evaluation of bk so we amateurs can get a feeling for how
these systems differ?

Thanks!

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-21 Thread Horst von Brand
[EMAIL PROTECTED] said:
> On Mon, Feb 21, 2005 at 04:53:06PM +0100, Andrea Arcangeli wrote:

[...]

> > AFIK all other SCM except arch and darcs always modify the repo, I never
> > heard complains about it, as long as incremental backups are possible
> > and they definitely are possible.

> Well, as you seem to have never been bitten by that bug; let me assure you
> the problem is very real.  Each file (,v file) can live in the repo for
> many years and has to servive any spurious writes to be usable.  The
> curruption of such files (in my experience) only shows itself if you try
> to access its history; which may be weeks after the corruption started,
> and you can't use a backup for that since you will overwrite new versions
> added since.

Marking files read-only won't save you from corruption by NFS or the disk
or the kernel or... randomly scribbling around.
-- 
Dr. Horst H. von Brand   User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria  +56 32 654239
Casilla 110-V, Valparaiso, ChileFax:  +56 32 797513
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-21 Thread zander
On Mon, Feb 21, 2005 at 04:53:06PM +0100, Andrea Arcangeli wrote:
> Hello Miles,
> 
> On Mon, Feb 21, 2005 at 02:39:05PM +0900, Miles Bader wrote:
> > Yeah, the basic way arch organizes its repository seems _far_ more sane
> > than the crazy way CVS (or BK) does, for a variety of reasons[*].  No
> > doubt there are certain usage patterns which stress it, but I think it
> > makes a lot more sense to use a layer of caching to take care of those,
> > rather than screwing up the underlying organization.
> > 
> > [*] (a) Immutability of repository files (_massively_ good idea)
> 
> what is so important about never modifying the repo? Note that only the
> global changeset database and a few ,v files will be modified for each
> changeset, it's not like we're going to touch all the ,v files for each
> checkin. Touching the "modified" ,v files sounds a very minor overhead.
> 
> And you can incremental backup the stuff with recursive diffing the
> repo too.
> 
> AFIK all other SCM except arch and darcs always modify the repo, I never
> heard complains about it, as long as incremental backups are possible
> and they definitely are possible.

Well, as you seem to have never been bitten by that bug, let me assure you
the problem is very real.  Each file (,v file) can live in the repo for
many years and has to survive any spurious writes to be usable.  The
corruption of such files (in my experience) only shows itself when you try
to access its history, which may be weeks after the corruption started,
and you can't use a backup for that since you will overwrite new versions
added since.

Think about it this way: NFS servers are known to corrupt things;
reboots can corrupt files; different clients will quite often try to write
to the file at the same time during its lifetime; CVS clients get killed
during writes, or the network drops the connection during a session.
Considering that the ,v files have a lifetime of years, with many
modifications during that time, I think it's amazing corruption does not
happen more often.

CVS was pretty good at keeping files sane, but I'll go for a solution that
completely sidesteps said problem any day.

-- 
Thomas Zander


pgpTm8YJfFbYt.pgp
Description: PGP signature


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-21 Thread David Brown
On Mon, Feb 21, 2005 at 07:41:54AM -0500, David Roundy wrote:

> The catch is that then we'd have to implement a smart server to keep users
> from having to download the entire history with each update.  That's not a
> fundamentally hard issue, but it would definitely degrade darcs' ease of
> use, besides putting additional load on the server.  So if something like
> this were implemented, I think it would definitely have to be as an
> optional format.
> 
> Also, we couldn't actually store the data in CVS/SCCS format, since in
> darcs a patch to a single file isn't uniquely determined by two states of
> that file.  But storing separately the patches relevant to different files
> would definitely be an optimization worth considering.

What about just a cache file that records, for each "file", which patches
affect it?  Now that I think about it, this is a little tricky, since I'm
not sure what that file would be called.  It would be easy to do for
filenames in the current version.
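
Something like the following toy index (an invented format, not darcs' actual
cache; the patch metadata source is a stand-in) shows the idea, including the
lookup that would make single-file history cheap:

    #!/usr/bin/env python
    # Toy per-file patch cache: maps each tracked path to the list of
    # patch identifiers that touch it.  Purely illustrative.
    import json

    CACHE = "file-patch-cache.json"   # invented cache file name

    def build_index(patches):
        """patches: iterable of (patch_id, [paths the patch touches])."""
        index = {}
        for patch_id, paths in patches:
            for path in paths:
                index.setdefault(path, []).append(patch_id)
        return index

    def save_index(index):
        with open(CACHE, "w") as f:
            json.dump(index, f)

    def patches_affecting(path):
        with open(CACHE) as f:
            return json.load(f).get(path, [])

    # e.g. build_index([("p1", ["Makefile"]), ("p2", ["Makefile", "README"])])
    # then patches_affecting("Makefile") -> ["p1", "p2"]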

Dave
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-21 Thread Andrea Arcangeli
Hello Miles,

On Mon, Feb 21, 2005 at 02:39:05PM +0900, Miles Bader wrote:
> Yeah, the basic way arch organizes its repository seems _far_ more sane
> than the crazy way CVS (or BK) does, for a variety of reasons[*].  No
> doubt there are certain usage patterns which stress it, but I think it
> makes a lot more sense to use a layer of caching to take care of those,
> rather than screwing up the underlying organization.
> 
> [*] (a) Immutability of repository files (_massively_ good idea)

What is so important about never modifying the repo? Note that only the
global changeset database and a few ,v files will be modified for each
changeset; it's not like we're going to touch all the ,v files for each
checkin. Touching the "modified" ,v files sounds like a very minor overhead.

And you can back up the stuff incrementally by recursively diffing the
repo too.
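
A minimal sketch of that kind of incremental backup (the repository and
backup paths are just examples, and mtimes stand in for a real recursive
diff):

    #!/usr/bin/env python
    # Naive incremental backup of a CVS-style repository: copy only the
    # files that changed since the previous run.
    import json, os, shutil

    REPO, BACKUP, STATE = "/var/cvsroot", "/backup/cvsroot", "/backup/state.json"

    def load_state():
        try:
            with open(STATE) as f:
                return json.load(f)
        except IOError:
            return {}

    def incremental_backup():
        state = load_state()
        for root, _dirs, files in os.walk(REPO):
            for name in files:
                src = os.path.join(root, name)
                mtime = os.path.getmtime(src)
                if state.get(src) == mtime:
                    continue                  # unchanged since last backup
                dst = os.path.join(BACKUP, os.path.relpath(src, REPO))
                if not os.path.isdir(os.path.dirname(dst)):
                    os.makedirs(os.path.dirname(dst))
                shutil.copy2(src, dst)
                state[src] = mtime
        with open(STATE, "w") as f:
            json.dump(state, f)

    if __name__ == "__main__":
        incremental_backup()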

AFAIK all other SCMs except arch and darcs always modify the repo; I never
heard complaints about it, as long as incremental backups are possible,
and they definitely are possible.

> (b) Deals with tree-changes "naturally" (CVS-style ,v files are a
> complete mess for anything except file-content changes)

Certainly it's more complicated but I believe the end result will be a
better on-disk format.

Don't get me wrong, the current disk format is already great for small
projects, it's the simplest approach and it's very reliable, but I don't
think the current on-disk format scales up to the l-k with reasonable
performance.

> (c) Directly corresponds to traditional diff 'n' patch, easy to
> think about, no surprises

Nobody is supposed to touch the repo with an editor anyway; all that
matters is how fast the commands work.

And you'll be able to ask the SCM "show me all changesets touching
this file, or even this range of the file, in the last 2 years" and
get an answer in a dozen seconds like with cvsps today. Even cvsps
creates a huge cache, but it doesn't need to unpack >2 tar.gz
tarballs to create its cache. Feel free to prove me wrong and convert
the current kernel CVS to arch and see how big it grows unpacked ;).

Anyway this is quickly going off-topic for l-k, so we should take it to
the darcs and arch lists.

Thanks!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-21 Thread Patrick McFarland
On Saturday 19 February 2005 12:53 pm, Andrea Arcangeli wrote:
> Wouldn't using the CVS format help an order of magnitude here? With
> CVS/SCCS format you can extract all the patchsets that affected a single
> file in a extremely efficient manner, memory utilization will be
> extremely low too (like in cvsps indeed). You'll only have to lookup the
> "global changeset file", and then open the few ,v files that are
> affected and extract their patchsets. cvsps does this optimally
> already. The only difference is that what cvsps is a "readonly" cache,
> while with a real SCM it would be a global file that control all the
> changesets in an atomic way.

But then that makes darcs do stuff the CVS way, and would make darcs exactly
the opposite of how we darcs users want it, imho. If you're worried about
darcs needing to open a billion files, nothing stops people from, say,
hacking darcs to use a SQL database to store patches in (they just have to
code it, and I think I saw a SQL module for Haskell around somewhere...)
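
To make that concrete (in Python with sqlite3 rather than Haskell, and with
an invented schema; this is only a sketch of the idea, not a proposal for
darcs' actual format):

    #!/usr/bin/env python
    # Sketch: keep patches in one SQLite database instead of many
    # small files.  Table layout is made up for illustration.
    import sqlite3

    def open_store(path="patches.db"):
        db = sqlite3.connect(path)
        db.execute("""CREATE TABLE IF NOT EXISTS patches (
                          id      TEXT PRIMARY KEY,
                          author  TEXT,
                          date    TEXT,
                          content BLOB)""")
        return db

    def add_patch(db, patch_id, author, date, content):
        db.execute("INSERT INTO patches VALUES (?, ?, ?, ?)",
                   (patch_id, author, date, content))
        db.commit()

    def get_patch(db, patch_id):
        row = db.execute("SELECT content FROM patches WHERE id = ?",
                         (patch_id,)).fetchone()
        return row[0] if row else None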

Maybe I just don't understand the argument for why the CVS file format is
anything short of insane, backwards, and outdated. We want each chunk of
information to be both independent and have a clear history (i.e., what
patches does this patch rely on). CVS does not provide this; it is not
fine-grained enough for what darcs needs.

(David Roundy and Co can fill in the technical details of this, I'm not a 
versioning system expert)

In short, we need to move as far away from the CVS way of doing things as we
can, because ultimately it's the wrong way. This is why I am somewhat dismayed
when I hear of projects that move to SVN from CVS... SVN is just CVS with a
few flaws fixed, and a few things like atomic commits added. It isn't the next
step: it is just a small stepping stone between CVS and the next step.

-- 
Patrick "Diablo-D3" McFarland || [EMAIL PROTECTED]
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd 
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989


pgpB5eGVGeUue.pgp
Description: PGP signature


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-21 Thread David Roundy
On Sat, Feb 19, 2005 at 06:53:48PM +0100, Andrea Arcangeli wrote:
> On Sat, Feb 19, 2005 at 12:15:02PM -0500, David Roundy wrote:
> > The linux-2.5 tree right now (I'm re-doing the conversion, and am up to
> > October of last year, so far) is at 141M, if you don't count the pristine
> > cache or working directory.  That's already compressed, so you don't get
> > any extra bonuses.  Darcs stores with each changeset both the old and new
> > versions of each hunk, which gives you some redundancy, and probably
> > accounts for the factor of two greater size than CVS.  This gives a bit of
> > redundancy, which can be helpful in cases of repository corruption.
> 
> Double size of the compressed backup is about the same as SVM with fsfs
> (not tested on l-k tree but in something much smaller). Why not to
> simply checksum instead of having data redundancy? Knowing when
> something is corrupted is a great feature, but doing raid1 without the
> user asking for it, is a worthless overhead.

There are internal issues that would cause trouble here--darcs assumes that
if it knows a given patch, it also knows the patch's inverse.

> > I hope to someday (when more pressing issues are dealt with) add a per-file
> > cache indicating which patches affect which files, which should largely
> > address this problem, although it won't help at all with files that are
> > touched by most of the changesets, and won't be optimimal in any case. :(
> 
> Wouldn't using the CVS format help an order of magnitude here? With
> CVS/SCCS format you can extract all the patchsets that affected a single
> file in a extremely efficient manner, memory utilization will be
> extremely low too (like in cvsps indeed). You'll only have to lookup the
> "global changeset file", and then open the few ,v files that are
> affected and extract their patchsets. cvsps does this optimally
> already. The only difference is that what cvsps is a "readonly" cache,
> while with a real SCM it would be a global file that control all the
> changesets in an atomic way.

The catch is that then we'd have to implement a smart server to keep users
from having to download the entire history with each update.  That's not a
fundamentally hard issue, but it would definitely degrade darcs' ease of
use, besides putting additional load on the server.  So if something like
this were implemented, I think it would definitely have to be as an
optional format.

Also, we couldn't actually store the data in CVS/SCCS format, since in
darcs a patch to a single file isn't uniquely determined by two states of
that file.  But storing separately the patches relevant to different files
would definitely be an optimization worth considering.
-- 
David Roundy
http://www.darcs.net
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-20 Thread Miles Bader
Dustin Sallings  writes:
> but the nicest thing about arch is that a given commit is immutable.
> There are no tools to modify it.  This is also why the crypto
> signature stuff was so easy to fit in.
>
> RCS and SCCS storage throws away most of those features.

Yeah, the basic way arch organizes its repository seems _far_ more sane
than the crazy way CVS (or BK) does, for a variety of reasons[*].  No
doubt there are certain usage patterns which stress it, but I think it
makes a lot more sense to use a layer of caching to take care of those,
rather than screwing up the underlying organization.

[*] (a) Immutability of repository files (_massively_ good idea)
(b) Deals with tree-changes "naturally" (CVS-style ,v files are a
complete mess for anything except file-content changes)
(c) Directly corresponds to traditional diff 'n' patch, easy to
think about, no surprises

-Miles
-- 
Saa, shall we dance?  (from a dance-class advertisement)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-20 Thread Ralph Corderoy

Hi,

David Roundy, creator of darcs, wrote:
> On Sat, Feb 19, 2005 at 05:42:13PM +0100, Andrea Arcangeli wrote:
> > I read in the webpage of the darcs kernel repository that they had
> > to add RAM serveral times to avoid running out of memory. They
> > needed more than 1G IIRC, and that was enough for me to lose
> > interest into it.  You're right I blamed the functional approach and
> > so I felt it was going to be a mess to fix the ram utilization, but
> > as someone else pointed out, perhaps it's darcs to blame and not
> > haskell. I don't know.
> 
> Darcs' RAM use has indeed already improved somewhat... I'm not exactly
> sure how much.  I'm not quite sure how to measure peak virtual memory
> usage, and most of the time darcs' memory use while doing the linux
> kernel conversion is under a couple of hundred megabytes.

Wouldn't calling sbrk(0) help?  I don't know if the Haskell run-time
ever shrinks the data segment; if not, it could just be called at the
end.  Or a `strace -e trace=brk darcs ...' might do.  But I guess darcs
has other VM usage that doesn't show up in this figure?  Does /proc/$$/maps
help, if running under Linux?

A consistent way to measure would be handy for observing changes over
time.
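
For what it's worth, a rough /proc-based measurement along those lines (Linux
only; it just sums the mapped address ranges, so it over-counts untouched
mappings and has to be sampled repeatedly to approximate a peak):

    #!/usr/bin/env python
    # Estimate a process's current virtual-memory footprint by summing
    # the address ranges listed in /proc/<pid>/maps.
    import sys

    def mapped_bytes(pid):
        total = 0
        with open("/proc/%s/maps" % pid) as f:
            for line in f:
                start, end = line.split()[0].split("-")
                total += int(end, 16) - int(start, 16)
        return total

    if __name__ == "__main__":
        pid = sys.argv[1] if len(sys.argv) > 1 else "self"
        print("%.1f MB mapped" % (mapped_bytes(pid) / (1024.0 * 1024.0)))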

Cheers,


Ralph.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-19 Thread Andrea Arcangeli
On Sat, Feb 19, 2005 at 12:15:02PM -0500, David Roundy wrote:
> The linux-2.5 tree right now (I'm re-doing the conversion, and am up to
> October of last year, so far) is at 141M, if you don't count the pristine
> cache or working directory.  That's already compressed, so you don't get
> any extra bonuses.  Darcs stores with each changeset both the old and new
> versions of each hunk, which gives you some redundancy, and probably
> accounts for the factor of two greater size than CVS.  This gives a bit of
> redundancy, which can be helpful in cases of repository corruption.

Double the size of the compressed backup is about the same as SVN with fsfs
(not tested on the l-k tree but on something much smaller). Why not
simply checksum instead of having data redundancy? Knowing when
something is corrupted is a great feature, but doing raid1 without the
user asking for it is a worthless overhead.
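
Checksumming along those lines could be as simple as a detached digest per
stored patch, verified on read (a sketch; the .sha1 sidecar convention is
made up, and this only detects corruption, it cannot repair it):

    #!/usr/bin/env python
    # Detect repository corruption with detached checksums: store a
    # SHA-1 next to each patch file and verify it before using the patch.
    import hashlib

    def sha1_of(path):
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def write_checksum(patch_path):
        with open(patch_path + ".sha1", "w") as f:
            f.write(sha1_of(patch_path) + "\n")

    def verify(patch_path):
        with open(patch_path + ".sha1") as f:
            expected = f.read().strip()
        return sha1_of(patch_path) == expected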

The same is true for arch of course, last time I checked they were using
the default -U 3 format instead of -U 0.

> I presume you're referring to a local checkout? That is already done using
> hard links by darcs--only of course the working directory has to actually
> be copied over, since there are editors out there that aren't friendly to
> hard-linked files.

arch allows hardlinking the copy too (optionally), and it's up to you to
use the right switch in the editor (Davide had an LD_PRELOAD to do a
copy-on-write since the kernel doesn't provide the feature).
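
The hardlinked checkout itself is roughly this simple (paths are examples;
note this does nothing about the copy-on-write problem that the LD_PRELOAD
trick works around):

    #!/usr/bin/env python
    # Build a "checkout" by hard-linking every file from a pristine tree:
    # nearly instant and essentially free in disk space, but editors must
    # then write-new-and-rename instead of modifying files in place.
    import os

    def hardlink_tree(pristine, checkout):
        for root, _dirs, files in os.walk(pristine):
            rel = os.path.relpath(root, pristine)
            dest_dir = os.path.normpath(os.path.join(checkout, rel))
            if not os.path.isdir(dest_dir):
                os.makedirs(dest_dir)
            for name in files:
                os.link(os.path.join(root, name),
                        os.path.join(dest_dir, name))

    # hardlink_tree("/repos/linux-2.5/pristine", "/home/user/linux-2.5")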

> And here's where darcs really falls down.  To track the history of a single
> file it has to read the contents of every changeset since the creation of
> that file, which will take forever (well, not quite... but close).

Indeed, and as I mentioned this is the *major* feature as far as I'm
concerned (and frankly the only one I really care about and that helps a
lot to track changes in the tree and understand why the code evolved).

Note that cvsps works great for this, it's very efficient as well (not
remotely comparable to arch at least, even if arch provided a tool
equivalent to cvsps), the only problem is that CVS is out of sync...

> I hope to someday (when more pressing issues are dealt with) add a per-file
> cache indicating which patches affect which files, which should largely
> address this problem, although it won't help at all with files that are
> touched by most of the changesets, and won't be optimimal in any case. :(

Wouldn't using the CVS format help an order of magnitude here? With the
CVS/SCCS format you can extract all the patchsets that affected a single
file in an extremely efficient manner, and memory utilization will be
extremely low too (like in cvsps indeed). You'll only have to look up the
"global changeset file", and then open the few ,v files that are
affected and extract their patchsets. cvsps does this optimally
already. The only difference is that cvsps is a "readonly" cache,
while with a real SCM it would be a global file that controls all the
changesets in an atomic way.

In fact *that* global file could be a bsddb too; I don't care about how
the changeset file is encoded. All I care about is that the data is a ,v
file or SCCS file so cvsps won't have to read >2 files every time I
ask that question, which is currently unavoidable with both darcs and
arch.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-19 Thread David Roundy
On Sat, Feb 19, 2005 at 05:42:13PM +0100, Andrea Arcangeli wrote:
> But anyway the only thing I care about is that you import all dozen
> thousands changesets of the 2.5 kernel into it, and you show it's
> manageable with <1G of ram and that the backup size is not very far from
> the 75M of CVS.

The linux-2.5 tree right now (I'm re-doing the conversion, and am up to
October of last year, so far) is at 141M, if you don't count the pristine
cache or working directory.  That's already compressed, so you don't get
any extra bonuses.  Darcs stores with each changeset both the old and new
versions of each hunk, which gives you some redundancy, and probably
accounts for the factor of two greater size than CVS.  This gives a bit of
redundancy, which can be helpful in cases of repository corruption.

> I read in the webpage of the darcs kernel repository that they had to
> add RAM serveral times to avoid running out of memory. They needed more
> than 1G IIRC, and that was enough for me to lose interest into it.
> You're right I blamed the functional approach and so I felt it was going
> to be a mess to fix the ram utilization, but as someone else pointed
> out, perhaps it's darcs to blame and not haskell. I don't know.

Darcs' RAM use has indeed already improved somewhat... I'm not exactly sure
how much.  I'm not quite sure how to measure peak virtual memory usage, and
most of the time darcs' memory use while doing the linux kernel conversion
is under a couple of hundred megabytes.

There are indeed trickinesses involved in making sure garbage gets
collected in a timely manner when coding in a lazy language like haskell.

> On Sat, Feb 19, 2005 at 04:10:18AM -0500, Patrick McFarland wrote:
> > Thats all up to how the versioning system is written. Darcs developers
> > are working in a checkpoint system to allow you to just grab the newest
> > stuff,

Correction:  we already have a checkpoint system.  The work in progress is
making commands that examine the history fail gracefully when that history
isn't present.

> This is already available with arch. In fact I suggested myself how to
> improve it with hardlinks so that a checkout will take a few seconds no
> matter the size of the tree.

I presume you're referring to a local checkout? That is already done using
hard links by darcs--only of course the working directory has to actually
be copied over, since there are editors out there that aren't friendly to
hard-linked files.

> > and automatically grab anything else you need, instead of just grabbing
> > everything. In the case of the darcs linux repo, no one wants to
> > download 600 megs or so of changes.
> 
> If you use arch/darcs as a patch-download tool, then that's fine
...
> The major reason a versioning system is useful to me is to track all
> changesets that touched a single file since the start of 2.5 to the
> head. So I can't get away by downloading the last dozen patches and
> caching the previous tree (which is perfectly doable with arch for ages
> and with hardlinks as well).

And here's where darcs really falls down.  To track the history of a single
file it has to read the contents of every changeset since the creation of
that file, which will take forever (well, not quite... but close).

I hope to someday (when more pressing issues are dealt with) add a per-file
cache indicating which patches affect which files, which should largely
address this problem, although it won't help at all with files that are
touched by most of the changesets, and won't be optimal in any case. :(
-- 
David Roundy
http://www.darcs.net
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-19 Thread Andrea Arcangeli
On Sat, Feb 19, 2005 at 04:10:18AM -0500, Patrick McFarland wrote:
> In the case of darcs, RCS/SCCS works exactly opposite of how darcs does. By 
> using it's super magical method, it represents how code is written and how it 
> changes (patch theory at its best). You can clearly see the direction code is 
> going, where it came from, and how it relates to other patches.

I don't know anything about darcs, I was only talking about arch. I
failed to compile darcs after trying for a while, so I gave up; I'll
try again eventually.

But anyway the only thing I care about is that you import all dozen
thousand changesets of the 2.5 kernel into it, and you show it's
manageable with <1G of RAM and that the backup size is not very far from
the 75M of CVS.

I read in the webpage of the darcs kernel repository that they had to
add RAM several times to avoid running out of memory. They needed more
than 1G IIRC, and that was enough for me to lose interest in it.
You're right, I blamed the functional approach and so I felt it was going
to be a mess to fix the RAM utilization, but as someone else pointed
out, perhaps it's darcs to blame and not Haskell. I don't know.

To me backup size matters too and for example I'm quite happy the fsfs
backend of SVN generates very small backups compared to bsddb.

> Sure, you can do this with RCS/SCCS style versioning, but whats the point? It 
> is inefficient, and backwards.

It is saved in a compact format, and I don't think it'll run slower:
if it's in cache it'll be fast, and if it's I/O dominated, the more
compact it is, the faster it will be. Having a compact size both for the
repository and for the backup is more important to me.

In theory one could write a backup tool that extracts the thing and
rewrites it into a special backup backend that is as space efficient as
CVS and compresses as well as CVS, but this won't help the working copy.

> Thats all up to how the versioning system is written. Darcs developers are 
> working in a checkpoint system to allow you to just grab the newest stuff, 

This is already available with arch. In fact I suggested myself how to
improve it with hardlinks so that a checkout will take a few seconds no
matter the size of the tree.

> and automatically grab anything else you need, instead of just grabbing 
> everything. In the case of the darcs linux repo, no one wants to download 600 
> megs or so of changes.

If you use arch/darcs as a patch-download tool, then that's fine as you
say, and you can already do that with arch (which in this area already
seems a lot more advanced, and it's written in C btw).  Most people
just checking out the kernel with arch (or darcs) would never need to
download 600Megs of changes, but I need to download them all.

The major reason a versioning system is useful to me is to track all
changesets that touched a single file since the start of 2.5 to the
head. So I can't get away by downloading the last dozen patches and
caching the previous tree (which is perfectly doable with arch for ages
and with hardlinks as well).

> It may not even be space efficient. Code ultimately is just code, and changes 
> ultimately are changes. RCS isn't magical, and its far from it. Infact, the 

The way RCS stores the stuff compresses great. In that sense it is
"magical". I guess SCCS is the same. fsfs isn't bad either though, and in
fact I'd never use bsddb and I'd only use fsfs with SVN.

> The darcs repo which has the entire history since at least the start of 2.4 
> (iirc anyways) to *now* is around 600 to 700. 
> My suggestion is to convert _all_ dozen thousand changesets to darcs, and 
> then 
> compare the size with CVS. And no, darcs doesn't eat that much memory for the 

What is the above 600/700 number then? I thought that was the conversion
of all dozen thousand changesets of linux-2.5 CVS to darcs.

> amount of work its doing. (And yes, they are working on that).

I'll stay tuned.

To me the only argument for not using a "magic" format like CVS or SCCS
that is space efficient and compresses efficiently is if you claim it's
going to be a lot slower at checkouts (but in fact applying some dozen
thousand patchsets to run a checkout is going to be slower than
CVS/SCCS). I know it's so much simpler to keep each patchset in a
different file like arch is already doing, but that's not the best
on-disk format IMHO.

Note that some years ago I had the opposite idea, i.e. at some point I
got convinced it was so much better to keep each patch separate from the
others, like you're advocating above, until I figured out how big the
thing grows, how space-inefficient it is, how much I/O it forces me to
do, how much disk it wastes in the backup, and how slow it is as well to
check out dozen thousand patchsets.

For smaller projects without dozen thousand changesets, the
patch-per-file approach looks fine instead. For big projects IMHO being
space efficient is much more important.
-
To unsubscribe from this list: send the line "unsubscribe 

Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-19 Thread Sean
On Sat, February 19, 2005 4:10 am, Patrick McFarland said:
> On Friday 18 February 2005 07:50 am, Andrea Arcangeli wrote:
>> On Fri, Feb 18, 2005 at 12:53:09PM +0100, Erik Bågfors wrote:
>> > RCS/SCCS format doesn't make much sence for a changeset oriented SCM.
>>
>> The advantage it will provide is that it'll be compact and a backup will
>> compress at best too. Small compressed tarballs compress very badly
>> instead, it wouldn't be even comparable. Once the thing is very compact
>> it has a better chance to fit in cache, and if it fits in cache
>> extracting diffs from each file will be very fast. Once it'll be compact
>> the cost of a changeset will be diminished allowing it to scale better
>> too.
>
> In the case of darcs, RCS/SCCS works exactly opposite of how darcs does.
> By
> using it's super magical method, it represents how code is written and how
> it
> changes (patch theory at its best). You can clearly see the direction code
> is
> going, where it came from, and how it relates to other patches.
>
> Sure, you can do this with RCS/SCCS style versioning, but whats the point?
> It
> is inefficient, and backwards.
>
>> Now it's true new disks are bigger, but they're not much faster, so if
>> the size of the repository is much larger, it'll be much slower to
>> checkout if it doesn't fit in cache. And if it's smaller it has better
>> chances of fitting in cache too.
>
> Thats all up to how the versioning system is written. Darcs developers are
> working in a checkpoint system to allow you to just grab the newest stuff,
> and automatically grab anything else you need, instead of just grabbing
> everything. In the case of the darcs linux repo, no one wants to download
> 600
> megs or so of changes.
>
>> The thing is, RCS seems a space efficient format for storing patches,
>> and it's efficient at extracting them too (plus it's textual so it's not
>> going to get lost info even if something goes wrong).
>
> It may not even be space efficient. Code ultimately is just code, and
> changes
> ultimately are changes. RCS isn't magical, and its far from it. Infact,
> the
> format darcs uses probably stores more code in less space, but don't quote
> me
> on that.
>
>> The whole linux-2.5 CVS is 500M uncompressed and 75M tar.bz2 compressed.
>
> The darcs repo which has the entire history since at least the start of
> 2.4
> (iirc anyways) to *now* is around 600 to 700.
>
>> My suggestion is to convert _all_ dozen thousand changesets to arch or
>> SVN and then compare the size with CVS (also the compressed size is
>> interesting for backups IMHO). Unfortunately I know nothing about darcs
>> yet (except it eats quite some memory ;)
>
> My suggestion is to convert _all_ dozen thousand changesets to darcs, and
> then
> compare the size with CVS. And no, darcs doesn't eat that much memory for
> the
> amount of work its doing. (And yes, they are working on that).
>
> The only thing you haven't brought up is the whole "omgwtfbbq! BK sucks,
> lets
> switch to SVN or Arch!" thing everyone else in the known universe is
> doing.
> BK isn't clearly inferior or superior to SVN or Arch or Darcs (and the
> same
> goes for SVN vs Arch vs Darcs).
>
> (Start Generic BK Thread On LKML Rant)
>
> Dear Everyone,
>
> I think if Linus is happy with BK, he should stick with it. His opinion
> ultimately trumps all of ours because he does all the hard maintainership
> work, and we don't. The only guy that gets to bitch about how much a
> versioning system sucks is the maintainer of a project (unless its CVS,
> then
> all bets are off).
>
> Linus has so far indicated that he likes BK, so the kernel hacking
> community
> will be stuck using that for awhile. However, that doesn't stop the
> license
> kiddies from coming out of the woodwork and mindlessly quoting the bad
> parts
> of the BK license (which, yes, its non-free, but at this point, who gives
> a
> shit).
>
> IMO, yes, a non-free versioning system for the crown jewel of the FLOSS
> community is a little... odd, but it was LInus's choice, and we now have
> to
> respect it/deal with it.
>
> Now, I did say above (in this thread) that darcs would be really awesome
> for
> kernel hacking, especially since it's inherent support for multiple
> branches[1] and the ability to send changes from each other around easily
> would come in handy; however, darcs was not mature at the time of Linus's
> decision (and many say it is still not mature enough), so if Linus had
> actually chosen darcs, I (and other people here) would be now flaming him
> for
> choosing a versioning system that wasn't mature.
>
> Similarly, if he had chosen arch, everyone would have flamed him for
> choosing
> a hard to use tool. With svn, he would have met flamage by the hands of it
> being too much like cvs and not supporting arch/darcs style branch
> syncing.
> And if he stayed with cvs, he would have been roasted over an open fire
> for
> sticking with an out of date, useless, insane tool.
>
> And if he chose 

Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-19 Thread Patrick McFarland
On Friday 18 February 2005 07:50 am, Andrea Arcangeli wrote:
> On Fri, Feb 18, 2005 at 12:53:09PM +0100, Erik Bågfors wrote:
> > RCS/SCCS format doesn't make much sence for a changeset oriented SCM.
>
> The advantage it will provide is that it'll be compact and a backup will
> compress at best too. Small compressed tarballs compress very badly
> instead, it wouldn't be even comparable. Once the thing is very compact
> it has a better chance to fit in cache, and if it fits in cache
> extracting diffs from each file will be very fast. Once it'll be compact
> the cost of a changeset will be diminished allowing it to scale better
> too.

In the case of darcs, RCS/SCCS works exactly opposite to how darcs does. By
using its super magical method, darcs represents how code is written and how
it changes (patch theory at its best). You can clearly see the direction code
is going, where it came from, and how it relates to other patches.

Sure, you can do this with RCS/SCCS-style versioning, but what's the point?
It is inefficient, and backwards.

> Now it's true new disks are bigger, but they're not much faster, so if
> the size of the repository is much larger, it'll be much slower to
> checkout if it doesn't fit in cache. And if it's smaller it has better
> chances of fitting in cache too.

That's all up to how the versioning system is written. Darcs developers are
working on a checkpoint system to allow you to just grab the newest stuff,
and automatically grab anything else you need, instead of just grabbing
everything. In the case of the darcs linux repo, no one wants to download 600
megs or so of changes.

> The thing is, RCS seems a space efficient format for storing patches,
> and it's efficient at extracting them too (plus it's textual so it's not
> going to get lost info even if something goes wrong).

It may not even be space efficient. Code ultimately is just code, and changes
ultimately are changes. RCS isn't magical; far from it. In fact, the
format darcs uses probably stores more code in less space, but don't quote me
on that.

> The whole linux-2.5 CVS is 500M uncompressed and 75M tar.bz2 compressed.

The darcs repo which has the entire history since at least the start of 2.4 
(iirc anyways) to *now* is around 600 to 700. 

> My suggestion is to convert _all_ dozen thousand changesets to arch or
> SVN and then compare the size with CVS (also the compressed size is
> interesting for backups IMHO). Unfortunately I know nothing about darcs
> yet (except it eats quite some memory ;)

My suggestion is to convert _all_ dozen thousand changesets to darcs, and then
compare the size with CVS. And no, darcs doesn't eat that much memory for the
amount of work it's doing. (And yes, they are working on that.)

The only thing you haven't brought up is the whole "omgwtfbbq! BK sucks, lets 
switch to SVN or Arch!" thing everyone else in the known universe is doing. 
BK isn't clearly inferior or superior to SVN or Arch or Darcs (and the same 
goes for SVN vs Arch vs Darcs).

(Start Generic BK Thread On LKML Rant)

Dear Everyone,

I think if Linus is happy with BK, he should stick with it. His opinion
ultimately trumps all of ours because he does all the hard maintainership
work, and we don't. The only guy that gets to bitch about how much a
versioning system sucks is the maintainer of a project (unless it's CVS, then
all bets are off).

Linus has so far indicated that he likes BK, so the kernel hacking community
will be stuck using that for a while. However, that doesn't stop the license
kiddies from coming out of the woodwork and mindlessly quoting the bad parts
of the BK license (which, yes, is non-free, but at this point, who gives a
shit).

IMO, yes, a non-free versioning system for the crown jewel of the FLOSS
community is a little... odd, but it was Linus's choice, and we now have to
respect it/deal with it.

Now, I did say above (in this thread) that darcs would be really awesome for
kernel hacking, especially since its inherent support for multiple
branches[1] and the ability to send changes from each other around easily
would come in handy; however, darcs was not mature at the time of Linus's
decision (and many say it is still not mature enough), so if Linus had
actually chosen darcs, I (and other people here) would now be flaming him for
choosing a versioning system that wasn't mature.

Similarly, if he had chosen arch, everyone would have flamed him for choosing
a hard-to-use tool. With svn, he would have met flamage at the hands of it
being too much like cvs and not supporting arch/darcs-style branch syncing.
And if he stayed with cvs, he would have been roasted over an open fire for
sticking with an out-of-date, useless, insane tool.

And if he chose anything else that I didn't previously mention, everyone would
have donned flame-retardant suits and gone into the fray over the fact that
no one has heard of that versioning system.

No matter what choice Linus would have 


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-19 Thread Sean
On Sat, February 19, 2005 4:10 am, Patrick McFarland said:
 On Friday 18 February 2005 07:50 am, Andrea Arcangeli wrote:
 On Fri, Feb 18, 2005 at 12:53:09PM +0100, Erik Bågfors wrote:
  RCS/SCCS format doesn't make much sence for a changeset oriented SCM.

 The advantage it will provide is that it'll be compact and a backup will
 compress at best too. Small compressed tarballs compress very badly
 instead, it wouldn't be even comparable. Once the thing is very compact
 it has a better chance to fit in cache, and if it fits in cache
 extracting diffs from each file will be very fast. Once it'll be compact
 the cost of a changeset will be diminished allowing it to scale better
 too.

 In the case of darcs, RCS/SCCS works exactly opposite of how darcs does.
 By
 using it's super magical method, it represents how code is written and how
 it
 changes (patch theory at its best). You can clearly see the direction code
 is
 going, where it came from, and how it relates to other patches.

 Sure, you can do this with RCS/SCCS style versioning, but whats the point?
 It
 is inefficient, and backwards.

 Now it's true new disks are bigger, but they're not much faster, so if
 the size of the repository is much larger, it'll be much slower to
 checkout if it doesn't fit in cache. And if it's smaller it has better
 chances of fitting in cache too.

 Thats all up to how the versioning system is written. Darcs developers are
 working in a checkpoint system to allow you to just grab the newest stuff,
 and automatically grab anything else you need, instead of just grabbing
 everything. In the case of the darcs linux repo, no one wants to download
 600
 megs or so of changes.

 The thing is, RCS seems a space efficient format for storing patches,
 and it's efficient at extracting them too (plus it's textual so it's not
 going to get lost info even if something goes wrong).

 It may not even be space efficient. Code ultimately is just code, and
 changes
 ultimately are changes. RCS isn't magical, and its far from it. Infact,
 the
 format darcs uses probably stores more code in less space, but don't quote
 me
 on that.

 The whole linux-2.5 CVS is 500M uncompressed and 75M tar.bz2 compressed.

 The darcs repo which has the entire history since at least the start of
 2.4
 (iirc anyways) to *now* is around 600 to 700.

 My suggestion is to convert _all_ dozen thousand changesets to arch or
 SVN and then compare the size with CVS (also the compressed size is
 interesting for backups IMHO). Unfortunately I know nothing about darcs
 yet (except it eats quite some memory ;)

 My suggestion is to convert _all_ dozen thousand changesets to darcs, and
 then
 compare the size with CVS. And no, darcs doesn't eat that much memory for
 the
 amount of work its doing. (And yes, they are working on that).

 The only thing you haven't brought up is the whole omgwtfbbq! BK sucks,
 lets
 switch to SVN or Arch! thing everyone else in the known universe is
 doing.
 BK isn't clearly inferior or superior to SVN or Arch or Darcs (and the
 same
 goes for SVN vs Arch vs Darcs).

 (Start Generic BK Thread On LKML Rant)

 Dear Everyone,

 I think if Linus is happy with BK, he should stick with it. His opinion
 ultimately trumps all of ours because he does all the hard maintainership
 work, and we don't. The only guy that gets to bitch about how much a
 versioning system sucks is the maintainer of a project (unless its CVS,
 then
 all bets are off).

 Linus has so far indicated that he likes BK, so the kernel hacking
 community
 will be stuck using that for awhile. However, that doesn't stop the
 license
 kiddies from coming out of the woodwork and mindlessly quoting the bad
 parts
 of the BK license (which, yes, its non-free, but at this point, who gives
 a
 shit).

 IMO, yes, a non-free versioning system for the crown jewel of the FLOSS
 community is a little... odd, but it was LInus's choice, and we now have
 to
 respect it/deal with it.

 Now, I did say above (in this thread) that darcs would be really awesome
 for
 kernel hacking, especially since it's inherent support for multiple
 branches[1] and the ability to send changes from each other around easily
 would come in handy; however, darcs was not mature at the time of Linus's
 decision (and many say it is still not mature enough), so if Linus had
 actually chosen darcs, I (and other people here) would be now flaming him
 for
 choosing a versioning system that wasn't mature.

 Similarly, if he had chosen arch, everyone would have flamed him for
 choosing
 a hard to use tool. With svn, he would have met flamage by the hands of it
 being too much like cvs and not supporting arch/darcs style branch
 syncing.
 And if he stayed with cvs, he would have been roasted over an open fire
 for
 sticking with an out of date, useless, insane tool.

 And if he chose anything else that I didn't previously mention, everyone
 would
 have donned flame retardant suits and went into the fray over the fact
 that
 no one 

Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-19 Thread Andrea Arcangeli
On Sat, Feb 19, 2005 at 04:10:18AM -0500, Patrick McFarland wrote:
 In the case of darcs, RCS/SCCS works exactly opposite of how darcs does. By 
 using it's super magical method, it represents how code is written and how it 
 changes (patch theory at its best). You can clearly see the direction code is 
 going, where it came from, and how it relates to other patches.

I don't know anything about darcs, I was only talking about arch. I
failed to compile darcs after trying for a while, so I gave it up; I'll
try again eventually.

But anyway the only thing I care about is that you import all dozen
thousand changesets of the 2.5 kernel into it, and you show it's
manageable with 1G of RAM and that the backup size is not very far from
the 75M of CVS.

I read in the webpage of the darcs kernel repository that they had to
add RAM several times to avoid running out of memory. They needed more
than 1G IIRC, and that was enough for me to lose interest into it.
You're right I blamed the functional approach and so I felt it was going
to be a mess to fix the ram utilization, but as someone else pointed
out, perhaps it's darcs to blame and not haskell. I don't know.

To me backup size matters too and for example I'm quite happy the fsfs
backend of SVN generates very small backups compared to bsddb.

 Sure, you can do this with RCS/SCCS style versioning, but whats the point? It 
 is inefficient, and backwards.

It is saved in a compact format, and I don't think it'll run slower,
since if it's in cache it'll be fast, and if it's I/O dominated, the more
compact it is, the faster it will be. Having a compact size both for the
repository and for the backup is more important to me.

In theory one could write a backup tool that extracts the thing and
rewrite a special backup-backend that is as space efficient as CVS and
that can compress as well as CVS, but this won't help the working copy.

 Thats all up to how the versioning system is written. Darcs developers are 
 working in a checkpoint system to allow you to just grab the newest stuff, 

This is already available with arch. In fact I suggested myself how to
improve it with hardlinks so that a checkout will take a few seconds no
matter the size of the tree.
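
A minimal sketch of that hardlink trick, in Python (the function name and
paths are made up for illustration; this is not arch's or darcs' actual code):

import os

def hardlink_checkout(pristine, checkout):
    # Recreate the directory structure, but hard-link every file instead
    # of copying it, so the "checkout" costs a few inodes per directory
    # and no file data at all, regardless of tree size.
    for root, dirs, files in os.walk(pristine):
        dest = os.path.join(checkout, os.path.relpath(root, pristine))
        os.makedirs(dest, exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(dest, name))

# hardlink_checkout("linux-2.5-pristine", "linux-2.5-work")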

 and automatically grab anything else you need, instead of just grabbing 
 everything. In the case of the darcs linux repo, no one wants to download 600 
 megs or so of changes.

If you use arch/darcs as a patch-download tool, then that's fine as you
say, and you can already do that with arch (which in this respect seems
a lot more advanced already, and it's written in C, btw). Most people
just checking out the kernel with arch (or darcs) would never need to
download 600 megs of changes, but I need to download them all.

The major reason a versioning system is useful to me is to track all
changesets that touched a single file since the start of 2.5 to the
head. So I can't get away by downloading the last dozen patches and
caching the previous tree (which is perfectly doable with arch for ages
and with hardlinks as well).

 It may not even be space efficient. Code ultimately is just code, and changes 
 ultimately are changes. RCS isn't magical, and its far from it. Infact, the 

The way RCS stores the stuff compresses great. In that it is magical. I
guess SCCS is the same. fsfs isn't bad either though, and in fact I'd
never use bsddb and I'd only use fsfs with SVN.
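
As a toy illustration of why the RCS-style layout stays compact, here is a
reverse-delta store sketched in Python. It is not the real ,v syntax: the
newest revision is kept whole, and each older revision is kept only as the
lines needed to step back to it.

import difflib

def backward_delta(new, old):
    # Instructions for rebuilding `old` (a list of lines) from `new`:
    # identical runs are stored as (start, end) references into `new`,
    # and only the lines that actually differ are stored verbatim.
    ops = []
    matcher = difflib.SequenceMatcher(a=new, b=old, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        ops.append(("copy", i1, i2) if tag == "equal" else ("lines", old[j1:j2]))
    return ops

def apply_delta(new, ops):
    out = []
    for op in ops:
        out.extend(new[op[1]:op[2]] if op[0] == "copy" else op[1])
    return out

Checking out the head costs nothing extra, stepping back touches only the
changed lines, and what is stored is mostly plain source text, which is
presumably also why it compresses so well.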

 The darcs repo which has the entire history since at least the start of 2.4 
 (iirc anyways) to *now* is around 600 to 700. 
 My suggestion is to convert _all_ dozen thousand changesets to darcs, and 
 then 
 compare the size with CVS. And no, darcs doesn't eat that much memory for the 

What is the above 600/700 number then? I thought that was the conversion
of all dozen thousand changesets of linux-2.5 CVS to darcs.

 amount of work its doing. (And yes, they are working on that).

I'll stay tuned.

To me the only argument for not using a magic format like CVS or SCCS
that is space efficient and compresses efficiently, is if you claim it's
going to be a lot slower at checkouts (but in fact applying some dozen
thousand patchsets to run a checkout is going to be slower than
CVS/SCCS). I know it's so much simpler to keep each patchset in a
different file like arch is already doing, but that's not the best
on-disk format IMHO.

Note that some years ago I had the opposite idea, i.e. at some point I
got convinced it was so much better to keep each patch separated from
each other like you're advocating above, until I figured out how big the
thing grows and how space inefficient it is, and how much I/O it
forces me to do, how much disk it wastes in the backup and how slow it
is as well to checkout dozen thousand patchsets.

For smaller projects without dozen thousand changesets, the patch per
file looks fine instead. For big projects IMHO being space efficient is
much more important.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-19 Thread David Roundy
On Sat, Feb 19, 2005 at 05:42:13PM +0100, Andrea Arcangeli wrote:
 But anyway the only thing I care about is that you import all dozen
 thousands changesets of the 2.5 kernel into it, and you show it's
 manageable with 1G of ram and that the backup size is not very far from
 the 75M of CVS.

The linux-2.5 tree right now (I'm re-doing the conversion, and am up to
October of last year, so far) is at 141M, if you don't count the pristine
cache or working directory.  That's already compressed, so you don't get
any extra bonuses.  Darcs stores with each changeset both the old and new
versions of each hunk, which gives you some redundancy, and probably
accounts for the factor of two greater size than CVS.  This gives a bit of
redundancy, which can be helpful in cases of repository corruption.
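
As a rough illustration of that redundancy (not darcs' real patch format): a
hunk that carries its pre-image as well as its post-image can both detect a
mismatch and be inverted trivially.

def apply_hunk(lines, pos, old, new):
    # `pos` is 0-based; `old` and `new` are the stored pre- and post-image.
    assert lines[pos:pos + len(old)] == old, "pre-image mismatch: corruption?"
    return lines[:pos] + new + lines[pos + len(old):]

def invert_hunk(pos, old, new):
    # Both sides are stored, so undoing a hunk is just swapping them.
    return pos, new, old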

 I read in the webpage of the darcs kernel repository that they had to
 add RAM serveral times to avoid running out of memory. They needed more
 than 1G IIRC, and that was enough for me to lose interest into it.
 You're right I blamed the functional approach and so I felt it was going
 to be a mess to fix the ram utilization, but as someone else pointed
 out, perhaps it's darcs to blame and not haskell. I don't know.

Darcs' RAM use has indeed already improved somewhat... I'm not exactly sure
how much.  I'm not quite sure how to measure peak virtual memory usage, and
most of the time darcs' memory use while doing the linux kernel conversion
is under a couple of hundred megabytes.

There are indeed trickinesses involved in making sure garbage gets
collected in a timely manner when coding in a lazy language like haskell.

 On Sat, Feb 19, 2005 at 04:10:18AM -0500, Patrick McFarland wrote:
  Thats all up to how the versioning system is written. Darcs developers
  are working in a checkpoint system to allow you to just grab the newest
  stuff,

Correction:  we already have a checkpoint system.  The work in progress is
making commands that examine the history fail gracefully when that history
isn't present.

 This is already available with arch. In fact I suggested myself how to
 improve it with hardlinks so that a checkout will take a few seconds no
 matter the size of the tree.

I presume you're referring to a local checkout? That is already done using
hard links by darcs--only of course the working directory has to actually
be copied over, since there are editors out there that aren't friendly to
hard-linked files.

  and automatically grab anything else you need, instead of just grabbing
  everything. In the case of the darcs linux repo, no one wants to
  download 600 megs or so of changes.
 
 If you use arch/darcs as a patch-download tool, then that's fine
...
 The major reason a versioning system is useful to me is to track all
 changesets that touched a single file since the start of 2.5 to the
 head. So I can't get away by downloading the last dozen patches and
 caching the previous tree (which is perfectly doable with arch for ages
 and with hardlinks as well).

And here's where darcs really falls down.  To track the history of a single
file it has to read the contents of every changeset since the creation of
that file, which will take forever (well, not quite... but close).

I hope to someday (when more pressing issues are dealt with) add a per-file
cache indicating which patches affect which files, which should largely
address this problem, although it won't help at all with files that are
touched by most of the changesets, and won't be optimal in any case. :(
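
The shape of such a cache is simple; a sketch in Python, with the inputs
assumed for illustration rather than taken from darcs itself:

from collections import defaultdict

def build_file_index(changesets):
    # `changesets` is assumed to yield (patch_id, touched_files) pairs,
    # gathered once while walking the repository history.
    index = defaultdict(list)
    for patch_id, files in changesets:
        for path in files:
            index[path].append(patch_id)
    return index

# build_file_index(history)["mm/memory.c"] -> ids of the patches to read,
# instead of reading the contents of every changeset in the repository.
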
-- 
David Roundy
http://www.darcs.net
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-19 Thread Andrea Arcangeli
On Sat, Feb 19, 2005 at 12:15:02PM -0500, David Roundy wrote:
 The linux-2.5 tree right now (I'm re-doing the conversion, and am up to
 October of last year, so far) is at 141M, if you don't count the pristine
 cache or working directory.  That's already compressed, so you don't get
 any extra bonuses.  Darcs stores with each changeset both the old and new
 versions of each hunk, which gives you some redundancy, and probably
 accounts for the factor of two greater size than CVS.  This gives a bit of
 redundancy, which can be helpful in cases of repository corruption.

Double the size of the compressed backup is about the same as SVN with fsfs
(not tested on the l-k tree but on something much smaller). Why not
simply checksum instead of having data redundancy? Knowing when
something is corrupted is a great feature, but doing raid1 without the
user asking for it is a worthless overhead.

The same is true for arch of course, last time I checked they were using
the default -U 3 format instead of -U 0.
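
A sketch of the checksum alternative in Python (hypothetical, not something
darcs or arch actually does): store a digest of the pre-image instead of the
pre-image itself.

import hashlib

def preimage_digest(lines):
    # Enough to *detect* that the text a hunk is applied to is not what
    # the patch author saw, without storing the old text (no raid1).
    h = hashlib.sha1()
    for line in lines:
        h.update(line.encode() + b"\n")
    return h.hexdigest()

# Stored per hunk: (position, preimage_digest(old_lines), new_lines).
# The trade-off: corruption is caught, but the patch can no longer be
# inverted from the repository alone.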

 I presume you're referring to a local checkout? That is already done using
 hard links by darcs--only of course the working directory has to actually
 be copied over, since there are editors out there that aren't friendly to
 hard-linked files.

arch allows hardlinking the copy too (optionally), and it's up to you to
use the right switch in the editor (Davide had an LD_PRELOAD to do
copy-on-write since the kernel doesn't provide the feature).

 And here's where darcs really falls down.  To track the history of a single
 file it has to read the contents of every changeset since the creation of
 that file, which will take forever (well, not quite... but close).

Indeed, and as I mentioned this is the *major* feature as far as I'm
concerned (and frankly the only one I really care about and that helps a
lot to track changes in the tree and understand why the code evolved).

Note that cvsps works great for this, it's very efficient as well (not
remotely comparable to arch at least, even if arch provided a tool
equivalent to cvsps), the only problem is that CVS is out of sync...

 I hope to someday (when more pressing issues are dealt with) add a per-file
 cache indicating which patches affect which files, which should largely
 address this problem, although it won't help at all with files that are
 touched by most of the changesets, and won't be optimimal in any case. :(

Wouldn't using the CVS format help an order of magnitude here? With
CVS/SCCS format you can extract all the patchsets that affected a single
file in a extremely efficient manner, memory utilization will be
extremely low too (like in cvsps indeed). You'll only have to look up the
global changeset file, and then open the few ,v files that are
affected and extract their patchsets. cvsps does this optimally
already. The only difference is that cvsps is a readonly cache,
while with a real SCM it would be a global file that controls all the
changesets in an atomic way.

In fact *that* global file could be a bsddb too; I don't care about how
the changeset file is encoded, all I care about is that the data is a ,v
file or SCCS file so cvsps won't have to read 2 files every time I
ask that question, which is currently unavoidable with both darcs and
arch.
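
Roughly, the lookup being described, sketched in Python; changeset_log and
extract_rev are assumed interfaces here, not cvsps' actual ones:

import difflib

def patchsets_for(path, changeset_log, extract_rev):
    # `changeset_log` maps a changeset id to {file: (old_rev, new_rev)};
    # `extract_rev(path, rev)` pulls one revision out of the single ,v
    # file for `path`. Only that file plus the global log is ever read.
    history = []
    for cset, files in changeset_log.items():
        if path in files:
            old_rev, new_rev = files[path]
            old, new = extract_rev(path, old_rev), extract_rev(path, new_rev)
            history.append((cset, list(difflib.unified_diff(old, new))))
    return history
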
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-18 Thread Dustin Sallings
On Feb 18, 2005, at 1:09, Andrea Arcangeli wrote:
> darcs scares me a bit because it's in haskell, I don't believe very much
> in functional languages for compute intensive stuff, ram utilization
	It doesn't sound like you've programmed in functional languages all 
that much.  While I don't have any practical experience in Haskell, I 
can say for sure that my functional code in ocaml blows my C code away 
in maintainability and performance.  Now, if I could only get it to 
dump core...

	Haskell was a draw for me.  It's very rare to find someone who 
actually knows C and can write bulletproof code in it.

> On Fri, Feb 18, 2005 at 12:53:09PM +0100, Erik Bågfors wrote:
>> RCS/SCCS format doesn't make much sense for a changeset oriented SCM.
> The advantage it will provide is that it'll be compact and a backup will
> compress at best too. Small compressed tarballs compress very badly
> instead, it wouldn't be even comparable. Once the thing is very compact
> it has a better chance to fit in cache, and if it fits in cache
> extracting diffs from each file will be very fast. Once it'll be compact
> the cost of a changeset will be diminished allowing it to scale better
> too.
	Then what gets transferred over the wire?  The full modified ,v file?  
Do you need a smart server to create deltas to be applied to your ,v 
files?  What do you do when someone goes in with an rcs command to 
destroy part of the history (since the storage is now mutable)?

	I use both darcs and arch regularly.  darcs is a lot nicer to use from 
a human interface point of view (and the merging is really a lot 
nicer), but the nicest thing about arch is that a given commit is 
immutable.  There are no tools to modify it.  This is also why the 
crypto signature stuff was so easy to fit in.

RCS and SCCS storage throws away most of those features.
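
The immutability point can be made concrete with a small sketch (the file
layout here is invented, and a real setup would GPG-sign the digest rather
than merely store it):

import hashlib

def seal(patch_path):
    # One digest per commit file; since the file is never rewritten,
    # signing it once is enough for its whole lifetime.
    with open(patch_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    with open(patch_path + ".sha256", "w") as f:
        f.write(digest + "\n")

def verify(patch_path):
    with open(patch_path, "rb") as f:
        current = hashlib.sha256(f.read()).hexdigest()
    with open(patch_path + ".sha256") as f:
        return f.read().strip() == current
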
--
Dustin Sallings
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-18 Thread Andrea Arcangeli
On Fri, Feb 18, 2005 at 12:53:09PM +0100, Erik Bågfors wrote:
> RCS/SCCS format doesn't make much sense for a changeset oriented SCM.

The advantage it will provide is that it'll be compact and a backup will
compress at best too. Small compressed tarballs compress very badly
instead, it wouldn't be even comparable. Once the thing is very compact
it has a better chance to fit in cache, and if it fits in cache
extracting diffs from each file will be very fast. Once it'll be compact
the cost of a changeset will be diminished allowing it to scale better
too.
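
The compression claim is easy to check with a few lines of Python; `patches`
here is assumed to be a list of patch texts as bytes:

import bz2

def backup_sizes(patches):
    # One compact archive versus many individually compressed patches,
    # the way a backup of per-patch tarballs ends up being stored.
    as_one = len(bz2.compress(b"".join(patches)))
    one_each = sum(len(bz2.compress(p)) for p in patches)
    return as_one, one_each

With thousands of small, mostly similar patches the first number is normally
far smaller, because the redundancy between patches can only be exploited
when they are compressed together.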

Now it's true new disks are bigger, but they're not much faster, so if
the size of the repository is much larger, it'll be much slower to
checkout if it doesn't fit in cache. And if it's smaller it has better
chances of fitting in cache too.

The thing is, RCS seems a space efficient format for storing patches,
and it's efficient at extracting them too (plus it's textual, so it's not
going to lose info even if something goes wrong).

The whole linux-2.5 CVS is 500M uncompressed and 75M tar.bz2 compressed.

My suggestion is to convert _all_ dozen thousand changesets to arch or
SVN and then compare the size with CVS (also the compressed size is
interesting for backups IMHO). Unfortunately I know nothing about darcs
yet (except it eats quite some memory ;)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-18 Thread Erik Bågfors
On Fri, 18 Feb 2005 10:09:00 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
> On Thu, Feb 17, 2005 at 06:24:53PM -0800, Tupshin Harper wrote:
> > small to medium sized ones). Last I checked, Arch was still too slow in
> > some areas, though that might have changed in recent months. Also, many
> 
> IMHO someone needs to rewrite ARCH using the RCS or SCCS format for the
> backend and a single file for the changesets and with sane parameters
> conventions miming SVN.

RCS/SCCS format doesn't make much sense for a changeset oriented SCM.

/Erik
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-18 Thread Tomasz Zielonka
On Fri, Feb 18, 2005 at 10:09:00AM +0100, Andrea Arcangeli wrote:
> darcs scares me a bit because it's in haskell, I don't believe very much
> in functional languages for compute intensive stuff, ram utilization
> skyrockets sometime (I wouldn't like to need >1G of ram to manage the
> tree).

AFAICS, most of memory related problems in darcs are not necessarily a
result of using Haskell.

> Other languages like python or perl are much slower than C/C++ too but
> at least ram utilization can be normally dominated to sane levels with
> them and they can be greatly optimized easily with C/C++ extensions of
> the performance critical parts.

With those languages, you often have no other choice than resorting to
C. GHC is quite a good compiler and I've often been able to get my
programs to run almost as fast as programs written in C++ - however, if I
were to write those programs in C++, I would never do that, despite
being quite a good C++ programmer.

Also, in Haskell you can use extensions written in C, as easily as or even
more easily than in Python or Perl (I've done this in Perl, and heard the battle
stories about C extensions in Python). Haskell's FFI is quite good,
there are also many supporting tools.

Best regards
Tomasz

-- 
Szukamy programisty C++ i Haskell'a: http://tinyurl.com/5mw4e
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-18 Thread Andrea Arcangeli
On Thu, Feb 17, 2005 at 06:24:53PM -0800, Tupshin Harper wrote:
> small to medium sized ones). Last I checked, Arch was still too slow in 
> some areas, though that might have changed in recent months. Also, many 

IMHO someone needs to rewrite ARCH using the RCS or SCCS format for the
backend and a single file for the changesets, and with sane parameter
conventions mimicking SVN. The internal algorithms of arch seem the most
advanced possible. It's just the interface and the fs backend that are so
bad and don't compress in the backups either.  SVN's bsddb doesn't
compress either by default, but at least the new fsfs compresses pretty
well: not as well as CVS, but not as badly as bsddb and arch either.

I may be completely wrong, so take the above just as a humble
suggestion.

darcs scares me a bit because it's in haskell, I don't believe very much
in functional languages for compute intensive stuff, ram utilization
skyrockets sometimes (I wouldn't like to need >1G of ram to manage the
tree). Other languages like python or perl are much slower than C/C++
too, but at least ram utilization can normally be kept to sane
levels with them, and they can be easily optimized with C/C++
extensions for the performance critical parts.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-17 Thread Sean
On Thu, February 17, 2005 9:24 pm, Tupshin Harper said:

Hi Tupshin,

> Speaking as somebody that uses Darcs every day, my opinion is that the
> future of OSS SCM will be something like arch or darcs but that neither
> are ready for projects the size of the linux kernel yet. Darcs is
> definitely way too slow for really large projects (though great for
> small to medium sized ones). Last I checked, Arch was still too slow in
> some areas, though that might have changed in recent months. Also, many
> people, me included, find the usability of arch to be far from ideal.
>
> My hope and expectation is that Arch and Darcs will both improve their
> performance, features, and usability and that in the not too distant
> future both of them will be viable alternatives for large scale source
> tree management.

Falling into the same category probably is svk, although it's less mature
than the options you cite.

> The important thing for the health of the SCM ecosystem is that there be
> ways to losslessly convert and interoperate between them as well as
> between legacy/centralized systems such as CVS and SVN as well as with
> BK.

Amen.

Thanks,
Sean


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [darcs-users] Re: [BK] upgrade will be needed

2005-02-17 Thread Tupshin Harper
Patrick McFarland wrote:
> On Sunday 13 February 2005 09:08 pm, Larry McVoy wrote:
>> Something that unintentionally started a flamewar.
>
> Well, we just went through another round of 'BK sucks' and 'BK sucks, we need
> to switch to something else'.
>
> Sans the flamewar, are there any options? CVS and SVN are out because they do
> not support 'off server' branches (arch and darcs do). Darcs would probably
> be the best choice because it's easy to use, and the darcs team almost has a
> working linux-kernel import script (originally designed to just test darcs
> with a huge repo, but provides a mostly working linux tree).
>
> So, without the flamewar, what is everyone's opinion on this?

Speaking as somebody that uses Darcs every day, my opinion is that the 
future of OSS SCM will be something like arch or darcs but that neither 
are ready for projects the size of the linux kernel yet. Darcs is 
definitely way too slow for really large projects (though great for 
small to medium sized ones). Last I checked, Arch was still too slow in 
some areas, though that might have changed in recent months. Also, many 
people, me included, find the usability of arch to be far from ideal.

My hope and expectation is that Arch and Darcs will both improve their 
performance, features, and usability and that in the not too distant 
future both of them will be viable alternatives for large scale source 
tree management.

The important thing for the health of the SCM ecosystem is that there be 
ways to losslessly convert and interoperate between them as well as 
between legacy/centralized systems such as CVS and SVN as well as with BK.

-Tupshin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

