Re: [darcs-users] Re: [BK] upgrade will be needed
On Mon, 2005-02-21 at 20:45, [EMAIL PROTECTED] wrote:
> CVS was pretty good at keeping files sane, but I'll go for a solution
> that completely sidesteps said problem any day.

One way to get the benefits of both worlds would be to keep an additional
history of changes (in whatever form) that allows the ,v files to be
rebuilt.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
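The idea above — a separate, append-only history of changes from which the repository files could be rebuilt after corruption — can be sketched in a few lines. This is a hypothetical illustration (the `ChangeJournal` class and its methods are mine, not part of CVS or any SCM discussed here); the point is only that a log which is appended to but never rewritten survives the spurious-write problem:

```python
import json

class ChangeJournal:
    """Append-only log of file revisions; replaying it rebuilds the store."""

    def __init__(self):
        self.entries = []  # a real tool would use an append-only file on disk

    def record(self, path, new_content):
        # Entries are only ever appended, never modified, so a good copy
        # survives even if the derived ,v file is later corrupted.
        self.entries.append(json.dumps({"path": path, "content": new_content}))

    def rebuild(self, path):
        # Replay the whole journal to recover the latest content of `path`.
        content = None
        for raw in self.entries:
            entry = json.loads(raw)
            if entry["path"] == path:
                content = entry["content"]
        return content

journal = ChangeJournal()
journal.record("foo.c,v", "rev1")
journal.record("foo.c,v", "rev2")
print(journal.rebuild("foo.c,v"))  # latest recoverable revision: "rev2"
```

The cost is that a rebuild must scan the entire journal; in exchange, the mutable ,v files become disposable derived data.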
Re: [darcs-users] Re: [BK] upgrade will be needed
On Mon, 21 Feb 2005, David Roundy wrote:

[snip]

I just scanned the comparison of various source-code management schemes at
http://zooko.com/revision_control_quick_ref.html and found myself wishing
for a similar review of bk (which was excluded, not being open-source).
Would you (or anyone else) be willing to compose a similar evaluation of
bk, so we amateurs can get a feeling for how these systems differ?

Thanks!
Re: [darcs-users] Re: [BK] upgrade will be needed
[EMAIL PROTECTED] said:
> On Mon, Feb 21, 2005 at 04:53:06PM +0100, Andrea Arcangeli wrote:
[...]
> > AFAIK all other SCMs except arch and darcs always modify the repo; I
> > never heard complaints about it, as long as incremental backups are
> > possible, and they definitely are possible.
>
> Well, as you seem to have never been bitten by that bug, let me assure
> you the problem is very real. Each file (,v file) can live in the repo
> for many years and has to survive any spurious writes to be usable. The
> corruption of such files (in my experience) only shows itself when you
> try to access its history, which may be weeks after the corruption
> started, and you can't use a backup for that since you would overwrite
> new versions added since.

Marking files read-only won't save you from corruption by NFS or the disk
or the kernel or... randomly scribbling around.

--
Dr. Horst H. von Brand                       User #22616 counter.li.org
Departamento de Informatica                  Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria           +56 32 654239
Casilla 110-V, Valparaiso, Chile             Fax:  +56 32 797513
Re: [darcs-users] Re: [BK] upgrade will be needed
On Mon, Feb 21, 2005 at 04:53:06PM +0100, Andrea Arcangeli wrote:
> Hello Miles,
>
> On Mon, Feb 21, 2005 at 02:39:05PM +0900, Miles Bader wrote:
> > Yeah, the basic way arch organizes its repository seems _far_ more
> > sane than the crazy way CVS (or BK) does, for a variety of
> > reasons[*]. No doubt there are certain usage patterns which stress
> > it, but I think it makes a lot more sense to use a layer of caching
> > to take care of those, rather than screwing up the underlying
> > organization.
> >
> > [*] (a) Immutability of repository files (_massively_ good idea)
>
> What is so important about never modifying the repo? Note that only the
> global changeset database and a few ,v files will be modified for each
> changeset; it's not like we're going to touch all the ,v files for each
> checkin. Touching the "modified" ,v files sounds like a very minor
> overhead.
>
> And you can incrementally back up the stuff by recursively diffing the
> repo too.
>
> AFAIK all other SCMs except arch and darcs always modify the repo; I
> never heard complaints about it, as long as incremental backups are
> possible, and they definitely are possible.

Well, as you seem to have never been bitten by that bug, let me assure you
the problem is very real. Each file (,v file) can live in the repo for
many years and has to survive any spurious writes to be usable. The
corruption of such files (in my experience) only shows itself when you try
to access its history, which may be weeks after the corruption started,
and you can't use a backup for that since you would overwrite new versions
added since.

Think about it this way: NFS servers are known to corrupt things; reboots
can corrupt files; different clients will try to write to the file at the
same time quite often during the lifetime of the file; CVS clients get
killed during writes, or the network drops the connection during a
session. Considering that the ,v files have a lifetime of years, with many
modifications during that time, I think it's amazing corruption does not
happen more often.

CVS was pretty good at keeping files sane, but I'll go for a solution that
completely sidesteps said problem any day.

--
Thomas Zander

pgpTm8YJfFbYt.pgp
Description: PGP signature
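The checksum alternative raised elsewhere in this thread ("why not simply checksum instead of having data redundancy?") is easy to sketch. This is an illustrative fragment, not any SCM's actual format: the checksums are stored separately from the ,v files, and a verification pass catches a spurious write immediately rather than weeks later, when the history is finally consulted:

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Checksum recorded when each revision was committed, stored apart
# from the ,v file itself (e.g. alongside the changeset database).
recorded = {"foo.c,v": digest(b"head 1.2;\n...")}

def intact(path: str, current: bytes) -> bool:
    # Re-hash on read (or in a nightly sweep): corruption is flagged at
    # once, before newer revisions pile up on top of the damaged file.
    return recorded[path] == digest(current)

print(intact("foo.c,v", b"head 1.2;\n..."))   # True: file unchanged
print(intact("foo.c,v", b"head 1.2;\nXXX"))   # False: spurious write caught
```

Checksums only detect corruption, of course; recovering the damaged revision still requires a backup or stored redundancy.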
Re: [darcs-users] Re: [BK] upgrade will be needed
On Mon, Feb 21, 2005 at 07:41:54AM -0500, David Roundy wrote:
> The catch is that then we'd have to implement a smart server to keep
> users from having to download the entire history with each update.
> That's not a fundamentally hard issue, but it would definitely degrade
> darcs' ease of use, besides putting additional load on the server. So
> if something like this were implemented, I think it would definitely
> have to be as an optional format.
>
> Also, we couldn't actually store the data in CVS/SCCS format, since in
> darcs a patch to a single file isn't uniquely determined by two states
> of that file. But storing separately the patches relevant to different
> files would definitely be an optimization worth considering.

What about just a cache file that records, for each "file", which patches
affect it? Now that I think about it, this is a little tricky, since I'm
not sure what that file would be called. It would be easy to do for
filenames in the current version.

Dave
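The cache file Dave proposes is essentially an inverted index from filename to patch identifiers. A minimal sketch (names are mine; darcs never shipped this exact structure) — note how the rename problem he mentions shows up: the index is keyed by a filename, so a rename would invalidate its entries:

```python
from collections import defaultdict

def build_index(patches):
    """Invert (patch_id, touched files) pairs into filename -> [patch ids].

    Keyed by the filename at the time each patch was recorded, which is
    exactly the tricky part: after a rename, old entries point at a name
    that no longer exists in the current version.
    """
    index = defaultdict(list)
    for patch_id, files in patches:
        for name in files:
            index[name].append(patch_id)
    return index

idx = build_index([
    ("p1", ["Makefile", "init/main.c"]),
    ("p2", ["init/main.c"]),
    ("p3", ["Makefile"]),
])
# History of init/main.c: read two patches instead of all three.
print(idx["init/main.c"])  # ['p1', 'p2']
```

With such an index, "show me all changesets touching this file" reads only the listed patches rather than every changeset since the file's creation.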
Re: [darcs-users] Re: [BK] upgrade will be needed
Hello Miles,

On Mon, Feb 21, 2005 at 02:39:05PM +0900, Miles Bader wrote:
> Yeah, the basic way arch organizes its repository seems _far_ more sane
> than the crazy way CVS (or BK) does, for a variety of reasons[*]. No
> doubt there are certain usage patterns which stress it, but I think it
> makes a lot more sense to use a layer of caching to take care of those,
> rather than screwing up the underlying organization.
>
> [*] (a) Immutability of repository files (_massively_ good idea)

What is so important about never modifying the repo? Note that only the
global changeset database and a few ,v files will be modified for each
changeset; it's not like we're going to touch all the ,v files for each
checkin. Touching the "modified" ,v files sounds like a very minor
overhead.

And you can incrementally back up the stuff by recursively diffing the
repo too.

AFAIK all other SCMs except arch and darcs always modify the repo; I never
heard complaints about it, as long as incremental backups are possible,
and they definitely are possible.

> (b) Deals with tree-changes "naturally" (CVS-style ,v files are a
>     complete mess for anything except file-content changes)

Certainly it's more complicated, but I believe the end result will be a
better on-disk format. Don't get me wrong: the current disk format is
great already for small projects, it's the simplest approach and it's
very reliable, but I don't think the current on-disk format scales up to
the l-k tree with reasonable performance.

> (c) Directly corresponds to traditional diff 'n' patch, easy to
>     think about, no surprises

Nobody is supposed to touch the repo with an editor anyway; all that
matters is how fast the command works. And you'll be able to ask the SCM
"show me all changesets touching this file, or even this range of the
file, in the last 2 years" and get an answer in a dozen seconds, like with
cvsps today. Even cvsps creates a huge cache, but it doesn't need to
unpack >2 tar.gz tarballs to create its cache.

Feel free to prove me wrong and convert the current kernel CVS to arch and
see how big it grows unpacked ;). Anyway this is quickly going offtopic
for l-k, so we should take it to the darcs and arch lists.

Thanks!
Re: [darcs-users] Re: [BK] upgrade will be needed
On Saturday 19 February 2005 12:53 pm, Andrea Arcangeli wrote:
> Wouldn't using the CVS format help an order of magnitude here? With
> CVS/SCCS format you can extract all the patchsets that affected a
> single file in an extremely efficient manner; memory utilization will
> be extremely low too (like in cvsps indeed). You'll only have to look
> up the "global changeset file", and then open the few ,v files that are
> affected and extract their patchsets. cvsps does this optimally
> already. The only difference is that cvsps is a "readonly" cache, while
> with a real SCM it would be a global file that controls all the
> changesets in an atomic way.

But then that makes darcs do stuff the CVS way, which would make darcs
exactly the opposite of what we darcs users want, imho. If you're worried
about darcs needing to open a billion files, nothing stops people from,
say, hacking darcs to use a SQL database to store patches in (they just
have to code it, and I think I saw a SQL module for Haskell around
somewhere...).

Maybe I just don't understand the argument for why the CVS file format is
anything other than insane, backwards, and outdated. We want each chunk of
information to be both independent and have a clear history (i.e., what
patches does this patch rely on). CVS does not provide this; it is not
fine-grained enough for what darcs needs. (David Roundy and co. can fill
in the technical details of this; I'm not a versioning-system expert.)

In short, we need to move as far away from the CVS way of doing things as
possible, because ultimately it's the wrong way. This is why I am somewhat
dismayed when I hear of projects that move to SVN from CVS... SVN is just
CVS with a few flaws fixed, and a few things like atomic commits added. It
isn't the next step: it is just a small stepping stone between CVS and the
next step.

--
Patrick "Diablo-D3" McFarland || [EMAIL PROTECTED]
"Computer games don't affect kids; I mean if Pac-Man affected us as kids,
we'd all be running around in darkened rooms, munching magic pills and
listening to repetitive electronic music." -- Kristian Wilson, Nintendo,
Inc, 1989

pgpB5eGVGeUue.pgp
Description: PGP signature
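Patrick's two wishes — independent chunks of information plus an explicit "what patches does this patch rely on" history — map naturally onto a relational store. A sketch using SQLite (my substitution for the unnamed Haskell SQL module he mentions; the schema is illustrative, not darcs'):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real store would be a file on disk
# Each patch is an independent row, not a delta buried in a ,v file.
conn.execute("CREATE TABLE patch (id TEXT PRIMARY KEY, body BLOB)")
# The dependency history is explicit rows, not implied by file position.
conn.execute("CREATE TABLE dep (patch TEXT, depends_on TEXT)")

conn.execute("INSERT INTO patch VALUES (?, ?)", ("p1", b"hunk A"))
conn.execute("INSERT INTO patch VALUES (?, ?)", ("p2", b"hunk B"))
conn.execute("INSERT INTO dep VALUES (?, ?)", ("p2", "p1"))

# "What patches does this patch rely on?" -- one indexed query,
# instead of opening a billion patch files.
deps = [row[0] for row in
        conn.execute("SELECT depends_on FROM dep WHERE patch = ?", ("p2",))]
print(deps)  # ['p1']
```

A single database file also sidesteps the open-many-small-files cost Andrea raises, while keeping the patch-oriented model rather than the ,v model.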
Re: [darcs-users] Re: [BK] upgrade will be needed
On Sat, Feb 19, 2005 at 06:53:48PM +0100, Andrea Arcangeli wrote:
> On Sat, Feb 19, 2005 at 12:15:02PM -0500, David Roundy wrote:
> > The linux-2.5 tree right now (I'm re-doing the conversion, and am up
> > to October of last year, so far) is at 141M, if you don't count the
> > pristine cache or working directory. That's already compressed, so
> > you don't get any extra bonuses. Darcs stores with each changeset
> > both the old and new versions of each hunk, which gives you some
> > redundancy, and probably accounts for the factor of two greater size
> > than CVS. This gives a bit of redundancy, which can be helpful in
> > cases of repository corruption.
>
> Double the size of the compressed backup is about the same as SVN with
> fsfs (not tested on the l-k tree, but on something much smaller). Why
> not simply checksum instead of having data redundancy? Knowing when
> something is corrupted is a great feature, but doing raid1 without the
> user asking for it is a worthless overhead.

There are internal issues that would cause trouble here--darcs assumes
that if it knows a given patch, it also knows the patch's inverse.

> > I hope to someday (when more pressing issues are dealt with) add a
> > per-file cache indicating which patches affect which files, which
> > should largely address this problem, although it won't help at all
> > with files that are touched by most of the changesets, and won't be
> > optimal in any case. :(
>
> Wouldn't using the CVS format help an order of magnitude here? With
> CVS/SCCS format you can extract all the patchsets that affected a
> single file in an extremely efficient manner; memory utilization will
> be extremely low too (like in cvsps indeed). You'll only have to look
> up the "global changeset file", and then open the few ,v files that are
> affected and extract their patchsets. cvsps does this optimally
> already. The only difference is that cvsps is a "readonly" cache, while
> with a real SCM it would be a global file that controls all the
> changesets in an atomic way.

The catch is that then we'd have to implement a smart server to keep users
from having to download the entire history with each update. That's not a
fundamentally hard issue, but it would definitely degrade darcs' ease of
use, besides putting additional load on the server. So if something like
this were implemented, I think it would definitely have to be as an
optional format.

Also, we couldn't actually store the data in CVS/SCCS format, since in
darcs a patch to a single file isn't uniquely determined by two states of
that file. But storing separately the patches relevant to different files
would definitely be an optimization worth considering.

--
David Roundy
http://www.darcs.net
Re: [darcs-users] Re: [BK] upgrade will be needed
Dustin Sallings writes:
> but the nicest thing about arch is that a given commit is immutable.
> There are no tools to modify it. This is also why the crypto signature
> stuff was so easy to fit in.
>
> RCS and SCCS storage throws away most of those features.

Yeah, the basic way arch organizes its repository seems _far_ more sane
than the crazy way CVS (or BK) does, for a variety of reasons[*]. No doubt
there are certain usage patterns which stress it, but I think it makes a
lot more sense to use a layer of caching to take care of those, rather
than screwing up the underlying organization.

[*] (a) Immutability of repository files (_massively_ good idea)
    (b) Deals with tree-changes "naturally" (CVS-style ,v files are a
        complete mess for anything except file-content changes)
    (c) Directly corresponds to traditional diff 'n' patch, easy to
        think about, no surprises

-Miles
--
Saa, shall we dance?  (from a dance-class advertisement)
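The immutability property Dustin and Miles praise can be made concrete with a write-once, content-addressed store: each commit is written exactly once under the hash of its contents and then marked read-only. This is a sketch of the general technique, not arch's actual archive format (function names and layout are mine); the hash-as-name is also why crypto signatures fit in so easily — signing the hash signs the content:

```python
import hashlib
import os
import stat
import tempfile

def commit(store: str, data: bytes) -> str:
    """Write a commit once, keyed by its content hash, then mark read-only."""
    name = hashlib.sha256(data).hexdigest()
    path = os.path.join(store, name)
    if os.path.exists(path):
        return name  # immutable: an existing commit is never rewritten
    with open(path, "wb") as f:
        f.write(data)
    # The read-only bit deters casual edits; the hash-as-name means any
    # external scribbling is detectable by re-hashing the file.
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return name

store = tempfile.mkdtemp()
cid = commit(store, b"patch body")
print(cid == commit(store, b"patch body"))  # True: same content, same id
```

As von Brand notes elsewhere in the thread, the read-only bit alone won't stop NFS or a buggy kernel from scribbling; the detection comes from the hash, not the permission bits.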
Re: [darcs-users] Re: [BK] upgrade will be needed
Hi,

David Roundy, creator of darcs, wrote:
> On Sat, Feb 19, 2005 at 05:42:13PM +0100, Andrea Arcangeli wrote:
> > I read on the webpage of the darcs kernel repository that they had to
> > add RAM several times to avoid running out of memory. They needed
> > more than 1G IIRC, and that was enough for me to lose interest in it.
> > You're right, I blamed the functional approach and so I felt it was
> > going to be a mess to fix the RAM utilization, but as someone else
> > pointed out, perhaps it's darcs to blame and not Haskell. I don't
> > know.
>
> Darcs' RAM use has indeed already improved somewhat... I'm not exactly
> sure how much. I'm not quite sure how to measure peak virtual memory
> usage, and most of the time darcs' memory use while doing the linux
> kernel conversion is under a couple of hundred megabytes.

Wouldn't calling sbrk(0) help? I don't know if the Haskell run-time ever
shrinks the data segment; if not, it could just be called at the end. Or a
`strace -e trace=brk darcs ...' might do. But I guess darcs has other VM
usage that doesn't show in this figure? Does /proc/$$/maps help, if
running under Linux? A consistent way to measure would be handy for
observing changes over time.

Cheers,
Ralph.
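On Linux there is a more direct answer to "how do I measure peak virtual memory" than sbrk(0) or parsing /proc/$$/maps: the kernel records a high-water mark in /proc/[pid]/status as `VmPeak`, which covers mmap'd regions that the data segment (and hence sbrk) misses. A small sketch, assuming a Linux /proc filesystem:

```python
def vm_peak_kb() -> int:
    """Peak virtual memory of the current process, in kB (Linux only).

    Reads the VmPeak line from /proc/self/status; this is a whole
    address-space high-water mark, so it also counts mmap'd regions
    that an sbrk(0)-based measurement would miss.
    """
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmPeak:"):
                return int(line.split()[1])  # field is in kB
    raise OSError("VmPeak not reported by this kernel")

print(vm_peak_kb() > 0)
```

Run at exit (or polled from a wrapper process reading /proc/PID/status), this gives the consistent measurement-over-time Ralph asks for.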
Re: [darcs-users] Re: [BK] upgrade will be needed
On Sat, Feb 19, 2005 at 12:15:02PM -0500, David Roundy wrote:
> The linux-2.5 tree right now (I'm re-doing the conversion, and am up to
> October of last year, so far) is at 141M, if you don't count the pristine
> cache or working directory. That's already compressed, so you don't get
> any extra bonuses. Darcs stores with each changeset both the old and new
> versions of each hunk, which gives you some redundancy, and probably
> accounts for the factor of two greater size than CVS. This gives a bit of
> redundancy, which can be helpful in cases of repository corruption.

Double the size of the compressed backup is about the same as SVN with fsfs (not tested on the l-k tree, but on something much smaller). Why not simply checksum instead of having data redundancy? Knowing when something is corrupted is a great feature, but doing raid1 without the user asking for it is a worthless overhead. The same is true for arch of course; last time I checked they were using the default -U 3 format instead of -U 0.

> I presume you're referring to a local checkout? That is already done using
> hard links by darcs--only of course the working directory has to actually
> be copied over, since there are editors out there that aren't friendly to
> hard-linked files.

arch allows hardlinking the copy too (optionally), and it's up to you to use the right switch in the editor (Davide had an LD_PRELOAD to do a copy-on-write since the kernel doesn't provide the feature).

> And here's where darcs really falls down. To track the history of a single
> file it has to read the contents of every changeset since the creation of
> that file, which will take forever (well, not quite... but close).

Indeed, and as I mentioned this is the *major* feature as far as I'm concerned (and frankly the only one I really care about; it helps a lot to track changes in the tree and understand why the code evolved). Note that cvsps works great for this, and it's very efficient as well (not remotely comparable to arch at least, even if arch provided a tool equivalent to cvsps); the only problem is that CVS is out of sync...

> I hope to someday (when more pressing issues are dealt with) add a per-file
> cache indicating which patches affect which files, which should largely
> address this problem, although it won't help at all with files that are
> touched by most of the changesets, and won't be optimal in any case. :(

Wouldn't using the CVS format help an order of magnitude here? With the CVS/SCCS format you can extract all the patchsets that affected a single file in an extremely efficient manner; memory utilization will be extremely low too (like in cvsps indeed). You'll only have to look up the "global changeset file", and then open the few ,v files that are affected and extract their patchsets. cvsps does this optimally already. The only difference is that cvsps is a "read-only" cache, while with a real SCM it would be a global file that controls all the changesets in an atomic way. In fact *that* global file could be a bsddb too; I don't care about how the changeset file is encoded, all I care is that the data is a ,v file or SCCS file so cvsps won't have to read >2 files every time I ask that question, which is currently unavoidable with both darcs and arch.
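The checksum-instead-of-redundancy idea raised above can be sketched simply (a hypothetical illustration, not code from any of the tools discussed): record a digest per repository file once, and a later verification pass flags silent corruption of old ,v-style files without doubling the stored data.

```python
import hashlib
import os

def checksum_tree(root):
    """Map each file under `root` to its SHA-256 hex digest."""
    sums = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Stream in chunks so large repository files
                # don't need to fit in memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            sums[os.path.relpath(path, root)] = h.hexdigest()
    return sums

def verify_tree(root, expected):
    """Return the files whose current digest no longer matches."""
    current = checksum_tree(root)
    return [p for p, digest in expected.items()
            if current.get(p) != digest]
```

Unlike a backup, the digest table stays valid as history grows (new files just get new entries), so corruption that only surfaces weeks later is still detectable.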
Re: [darcs-users] Re: [BK] upgrade will be needed
On Sat, Feb 19, 2005 at 05:42:13PM +0100, Andrea Arcangeli wrote:
> But anyway the only thing I care about is that you import all dozen
> thousand changesets of the 2.5 kernel into it, and you show it's
> manageable with <1G of ram and that the backup size is not very far from
> the 75M of CVS.

The linux-2.5 tree right now (I'm re-doing the conversion, and am up to October of last year, so far) is at 141M, if you don't count the pristine cache or working directory. That's already compressed, so you don't get any extra bonuses. Darcs stores with each changeset both the old and new versions of each hunk, which probably accounts for the factor-of-two greater size than CVS. This gives a bit of redundancy, which can be helpful in cases of repository corruption.

> I read on the webpage of the darcs kernel repository that they had to
> add RAM several times to avoid running out of memory. They needed more
> than 1G IIRC, and that was enough for me to lose interest in it.
> You're right, I blamed the functional approach and so I felt it was going
> to be a mess to fix the ram utilization, but as someone else pointed
> out, perhaps it's darcs to blame and not haskell. I don't know.

Darcs' RAM use has indeed already improved somewhat... I'm not exactly sure how much. I'm not quite sure how to measure peak virtual memory usage, and most of the time darcs' memory use while doing the linux kernel conversion is under a couple of hundred megabytes. There are indeed trickinesses involved in making sure garbage gets collected in a timely manner when coding in a lazy language like Haskell.

> On Sat, Feb 19, 2005 at 04:10:18AM -0500, Patrick McFarland wrote:
> > Thats all up to how the versioning system is written. Darcs developers
> > are working on a checkpoint system to allow you to just grab the newest
> > stuff,

Correction: we already have a checkpoint system. The work in progress is making commands that examine the history fail gracefully when that history isn't present.

> This is already available with arch. In fact I suggested myself how to
> improve it with hardlinks so that a checkout will take a few seconds no
> matter the size of the tree.

I presume you're referring to a local checkout? That is already done using hard links by darcs--only of course the working directory has to actually be copied over, since there are editors out there that aren't friendly to hard-linked files.

> > and automatically grab anything else you need, instead of just grabbing
> > everything. In the case of the darcs linux repo, no one wants to
> > download 600 megs or so of changes.
>
> If you use arch/darcs as a patch-download tool, then that's fine ...
> The major reason a versioning system is useful to me is to track all
> changesets that touched a single file since the start of 2.5 to the
> head. So I can't get away by downloading the last dozen patches and
> caching the previous tree (which is perfectly doable with arch for ages
> and with hardlinks as well).

And here's where darcs really falls down. To track the history of a single file it has to read the contents of every changeset since the creation of that file, which will take forever (well, not quite... but close). I hope to someday (when more pressing issues are dealt with) add a per-file cache indicating which patches affect which files, which should largely address this problem, although it won't help at all with files that are touched by most of the changesets, and won't be optimal in any case. :(

--
David Roundy http://www.darcs.net
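The per-file cache David describes amounts to an inverted index. A minimal sketch (hypothetical names, not darcs internals): map each file to the changesets that touch it, so reconstructing one file's history reads only those changesets instead of all of them.

```python
from collections import defaultdict

def build_file_index(changesets):
    # `changesets` is an iterable of (changeset_id, touched_files)
    # pairs -- a stand-in for whatever metadata the SCM records.
    index = defaultdict(list)
    for cs_id, files in changesets:
        for path in files:
            index[path].append(cs_id)
    return dict(index)

# With the index, the history of mm/memory.c is found without
# ever opening changeset 3.
index = build_file_index([
    (1, ["Makefile", "mm/memory.c"]),
    (2, ["mm/memory.c"]),
    (3, ["Makefile"]),
])
```

As David notes, such an index helps little for files touched by most changesets: their entry lists nearly every changeset anyway, so nearly everything still has to be read.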
Re: [darcs-users] Re: [BK] upgrade will be needed
On Sat, Feb 19, 2005 at 04:10:18AM -0500, Patrick McFarland wrote:
> In the case of darcs, RCS/SCCS works exactly opposite of how darcs does. By
> using its super magical method, it represents how code is written and how it
> changes (patch theory at its best). You can clearly see the direction code is
> going, where it came from, and how it relates to other patches.

I don't know anything about darcs, I was only talking about arch. I failed to compile darcs after trying for a while, so I gave up; I'll try again eventually. But anyway the only thing I care about is that you import all dozen thousand changesets of the 2.5 kernel into it, and you show it's manageable with <1G of RAM and that the backup size is not very far from the 75M of CVS.

I read on the webpage of the darcs kernel repository that they had to add RAM several times to avoid running out of memory. They needed more than 1G IIRC, and that was enough for me to lose interest in it. You're right, I blamed the functional approach and so I felt it was going to be a mess to fix the RAM utilization, but as someone else pointed out, perhaps it's darcs to blame and not Haskell. I don't know. To me backup size matters too; for example I'm quite happy the fsfs backend of SVN generates very small backups compared to bsddb.

> Sure, you can do this with RCS/SCCS style versioning, but whats the point? It
> is inefficient, and backwards.

It is saved in a compact format, and I don't think it'll run slower, since if it's in cache it'll be fast, and if it's I/O dominated, the more compact it is, the faster it will be. Having a compact size both for the repository and for the backup is more important to me. In theory one could write a backup tool that extracts the thing and a special backup backend that is as space efficient as CVS and compresses as well as CVS, but this won't help the working copy.

> Thats all up to how the versioning system is written. Darcs developers are
> working on a checkpoint system to allow you to just grab the newest stuff,

This is already available with arch. In fact I suggested myself how to improve it with hardlinks so that a checkout will take a few seconds no matter the size of the tree.

> and automatically grab anything else you need, instead of just grabbing
> everything. In the case of the darcs linux repo, no one wants to download 600
> megs or so of changes.

If you use arch/darcs as a patch-download tool, then that's fine as you say, and you can already do that with arch (which in this part seems already a lot more advanced, and is written in C btw). Most people just checking out the kernel with arch (or darcs) would never need to download 600 megs of changes, but I need to download them all. The major reason a versioning system is useful to me is to track all changesets that touched a single file since the start of 2.5 to the head. So I can't get away with downloading the last dozen patches and caching the previous tree (which has been perfectly doable with arch for ages, and with hardlinks as well).

> It may not even be space efficient. Code ultimately is just code, and changes
> ultimately are changes. RCS isn't magical, and its far from it. In fact, the

The way RCS stores the stuff compresses great. In that it is "magical". I guess SCCS is the same. fsfs isn't bad either though; in fact I'd never use bsddb, I'd only use fsfs with SVN.

> The darcs repo which has the entire history since at least the start of 2.4
> (iirc anyways) to *now* is around 600 to 700.
> My suggestion is to convert _all_ dozen thousand changesets to darcs, and then
> compare the size with CVS. And no, darcs doesn't eat that much memory for the

What is the above 600/700 number then? I thought that was the conversion of all dozen thousand changesets of linux-2.5 CVS to darcs.

> amount of work its doing. (And yes, they are working on that).

I'll stay tuned.

To me the only argument for not using a "magic" format like CVS or SCCS that is space efficient and compresses efficiently is if you claim it's going to be a lot slower at checkouts (but in fact applying some dozen thousand patchsets to run a checkout is going to be slower than CVS/SCCS). I know it's so much simpler to keep each patchset in a different file like arch is already doing, but that's not the best on-disk format IMHO.

Note that some years ago I had the opposite idea, i.e. at some point I got convinced it was so much better to keep each patch separated from the others, like you're advocating above, until I figured out how big the thing grows, how little space efficient it is, how much I/O it forces me to do, how much disk it wastes in the backup, and how slow it is as well to check out dozen thousand patchsets. For smaller projects without dozen thousand changesets, the patch-per-file approach looks fine instead. For big projects, IMHO being space efficient is much more important.
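The space argument above is easy to demonstrate with a toy experiment (an illustration on synthetic data, not a benchmark of any real SCM): compressing near-identical revisions as one stream lets the compressor exploit the redundancy between them, while compressing each revision or patch as a separate file, as a patch-per-file store effectively does, cannot.

```python
import zlib

# Fifty revisions of a file that differ only in a trailing comment.
base = b"int main(void) { return 0; }\n" * 200
revisions = [base + b"/* rev %d */\n" % i for i in range(50)]

# One compressed blob per revision, like a patch-per-file store.
separate = sum(len(zlib.compress(rev)) for rev in revisions)

# A single compressed stream, closer to a packed ,v-style store.
together = len(zlib.compress(b"".join(revisions)))

# `together` comes out much smaller than `separate`, because the
# content shared between revisions is only compressed once.
```

The same effect is why a tarball of many small compressed files compresses so much worse than compressing one concatenated archive.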
Re: [darcs-users] Re: [BK] upgrade will be needed
On Friday 18 February 2005 07:50 am, Andrea Arcangeli wrote:
> On Fri, Feb 18, 2005 at 12:53:09PM +0100, Erik Bågfors wrote:
> > RCS/SCCS format doesn't make much sense for a changeset oriented SCM.
>
> The advantage it will provide is that it'll be compact and a backup will
> compress at best too. Small compressed tarballs compress very badly
> instead, it wouldn't be even comparable. Once the thing is very compact
> it has a better chance to fit in cache, and if it fits in cache
> extracting diffs from each file will be very fast. Once it'll be compact
> the cost of a changeset will be diminished allowing it to scale better
> too.

In the case of darcs, RCS/SCCS works exactly opposite of how darcs does. By using its super magical method, it represents how code is written and how it changes (patch theory at its best). You can clearly see the direction code is going, where it came from, and how it relates to other patches. Sure, you can do this with RCS/SCCS style versioning, but what's the point? It is inefficient, and backwards.

> Now it's true new disks are bigger, but they're not much faster, so if
> the size of the repository is much larger, it'll be much slower to
> checkout if it doesn't fit in cache. And if it's smaller it has better
> chances of fitting in cache too.

That's all up to how the versioning system is written. Darcs developers are working on a checkpoint system to allow you to just grab the newest stuff, and automatically grab anything else you need, instead of just grabbing everything. In the case of the darcs linux repo, no one wants to download 600 megs or so of changes.

> The thing is, RCS seems a space efficient format for storing patches,
> and it's efficient at extracting them too (plus it's textual so it's not
> going to get lost info even if something goes wrong).

It may not even be space efficient. Code ultimately is just code, and changes ultimately are changes. RCS isn't magical, and it's far from it. In fact, the format darcs uses probably stores more code in less space, but don't quote me on that.

> The whole linux-2.5 CVS is 500M uncompressed and 75M tar.bz2 compressed.

The darcs repo which has the entire history since at least the start of 2.4 (iirc anyways) to *now* is around 600 to 700.

> My suggestion is to convert _all_ dozen thousand changesets to arch or
> SVN and then compare the size with CVS (also the compressed size is
> interesting for backups IMHO). Unfortunately I know nothing about darcs
> yet (except it eats quite some memory ;)

My suggestion is to convert _all_ dozen thousand changesets to darcs, and then compare the size with CVS. And no, darcs doesn't eat that much memory for the amount of work it's doing. (And yes, they are working on that.)

The only thing you haven't brought up is the whole "omgwtfbbq! BK sucks, lets switch to SVN or Arch!" thing everyone else in the known universe is doing. BK isn't clearly inferior or superior to SVN or Arch or Darcs (and the same goes for SVN vs Arch vs Darcs).

(Start Generic BK Thread On LKML Rant)

Dear Everyone,

I think if Linus is happy with BK, he should stick with it. His opinion ultimately trumps all of ours because he does all the hard maintainership work, and we don't. The only guy that gets to bitch about how much a versioning system sucks is the maintainer of a project (unless it's CVS, then all bets are off).

Linus has so far indicated that he likes BK, so the kernel hacking community will be stuck using that for a while. However, that doesn't stop the license kiddies from coming out of the woodwork and mindlessly quoting the bad parts of the BK license (which, yes, is non-free, but at this point, who gives a shit).

IMO, yes, a non-free versioning system for the crown jewel of the FLOSS community is a little... odd, but it was Linus's choice, and we now have to respect it/deal with it.

Now, I did say above (in this thread) that darcs would be really awesome for kernel hacking, especially since its inherent support for multiple branches[1] and the ability to send changes to each other easily would come in handy; however, darcs was not mature at the time of Linus's decision (and many say it is still not mature enough), so if Linus had actually chosen darcs, I (and other people here) would now be flaming him for choosing a versioning system that wasn't mature.

Similarly, if he had chosen arch, everyone would have flamed him for choosing a hard-to-use tool. With svn, he would have met flamage at the hands of its being too much like cvs and not supporting arch/darcs style branch syncing. And if he stayed with cvs, he would have been roasted over an open fire for sticking with an out-of-date, useless, insane tool. And if he chose anything else that I didn't previously mention, everyone would have donned flame-retardant suits and gone into the fray over the fact that no one has heard of that versioning system. No matter what choice Linus would have made, he would have had
Re: [darcs-users] Re: [BK] upgrade will be needed
On Friday 18 February 2005 07:50 am, Andrea Arcangeli wrote: On Fri, Feb 18, 2005 at 12:53:09PM +0100, Erik Bågfors wrote: RCS/SCCS format doesn't make much sence for a changeset oriented SCM. The advantage it will provide is that it'll be compact and a backup will compress at best too. Small compressed tarballs compress very badly instead, it wouldn't be even comparable. Once the thing is very compact it has a better chance to fit in cache, and if it fits in cache extracting diffs from each file will be very fast. Once it'll be compact the cost of a changeset will be diminished allowing it to scale better too. In the case of darcs, RCS/SCCS works exactly opposite of how darcs does. By using it's super magical method, it represents how code is written and how it changes (patch theory at its best). You can clearly see the direction code is going, where it came from, and how it relates to other patches. Sure, you can do this with RCS/SCCS style versioning, but whats the point? It is inefficient, and backwards. Now it's true new disks are bigger, but they're not much faster, so if the size of the repository is much larger, it'll be much slower to checkout if it doesn't fit in cache. And if it's smaller it has better chances of fitting in cache too. Thats all up to how the versioning system is written. Darcs developers are working in a checkpoint system to allow you to just grab the newest stuff, and automatically grab anything else you need, instead of just grabbing everything. In the case of the darcs linux repo, no one wants to download 600 megs or so of changes. The thing is, RCS seems a space efficient format for storing patches, and it's efficient at extracting them too (plus it's textual so it's not going to get lost info even if something goes wrong). It may not even be space efficient. Code ultimately is just code, and changes ultimately are changes. RCS isn't magical, and its far from it. 
Infact, the format darcs uses probably stores more code in less space, but don't quote me on that. The whole linux-2.5 CVS is 500M uncompressed and 75M tar.bz2 compressed. The darcs repo which has the entire history since at least the start of 2.4 (iirc anyways) to *now* is around 600 to 700. My suggestion is to convert _all_ dozen thousand changesets to arch or SVN and then compare the size with CVS (also the compressed size is interesting for backups IMHO). Unfortunately I know nothing about darcs yet (except it eats quite some memory ;) My suggestion is to convert _all_ dozen thousand changesets to darcs, and then compare the size with CVS. And no, darcs doesn't eat that much memory for the amount of work its doing. (And yes, they are working on that). The only thing you haven't brought up is the whole omgwtfbbq! BK sucks, lets switch to SVN or Arch! thing everyone else in the known universe is doing. BK isn't clearly inferior or superior to SVN or Arch or Darcs (and the same goes for SVN vs Arch vs Darcs). (Start Generic BK Thread On LKML Rant) Dear Everyone, I think if Linus is happy with BK, he should stick with it. His opinion ultimately trumps all of ours because he does all the hard maintainership work, and we don't. The only guy that gets to bitch about how much a versioning system sucks is the maintainer of a project (unless its CVS, then all bets are off). Linus has so far indicated that he likes BK, so the kernel hacking community will be stuck using that for awhile. However, that doesn't stop the license kiddies from coming out of the woodwork and mindlessly quoting the bad parts of the BK license (which, yes, its non-free, but at this point, who gives a shit). IMO, yes, a non-free versioning system for the crown jewel of the FLOSS community is a little... odd, but it was LInus's choice, and we now have to respect it/deal with it. 
Now, I did say above (in this thread) that darcs would be really awesome for kernel hacking, especially since it's inherent support for multiple branches[1] and the ability to send changes from each other around easily would come in handy; however, darcs was not mature at the time of Linus's decision (and many say it is still not mature enough), so if Linus had actually chosen darcs, I (and other people here) would be now flaming him for choosing a versioning system that wasn't mature. Similarly, if he had chosen arch, everyone would have flamed him for choosing a hard to use tool. With svn, he would have met flamage by the hands of it being too much like cvs and not supporting arch/darcs style branch syncing. And if he stayed with cvs, he would have been roasted over an open fire for sticking with an out of date, useless, insane tool. And if he chose anything else that I didn't previously mention, everyone would have donned flame retardant suits and went into the fray over the fact that no one has heard of that versioning system. No matter what choice Linus would have made, he would have had
Re: [darcs-users] Re: [BK] upgrade will be needed
On Sat, February 19, 2005 4:10 am, Patrick McFarland said: On Friday 18 February 2005 07:50 am, Andrea Arcangeli wrote: On Fri, Feb 18, 2005 at 12:53:09PM +0100, Erik Bågfors wrote: RCS/SCCS format doesn't make much sence for a changeset oriented SCM. The advantage it will provide is that it'll be compact and a backup will compress at best too. Small compressed tarballs compress very badly instead, it wouldn't be even comparable. Once the thing is very compact it has a better chance to fit in cache, and if it fits in cache extracting diffs from each file will be very fast. Once it'll be compact the cost of a changeset will be diminished allowing it to scale better too. In the case of darcs, RCS/SCCS works exactly opposite of how darcs does. By using it's super magical method, it represents how code is written and how it changes (patch theory at its best). You can clearly see the direction code is going, where it came from, and how it relates to other patches. Sure, you can do this with RCS/SCCS style versioning, but whats the point? It is inefficient, and backwards. Now it's true new disks are bigger, but they're not much faster, so if the size of the repository is much larger, it'll be much slower to checkout if it doesn't fit in cache. And if it's smaller it has better chances of fitting in cache too. Thats all up to how the versioning system is written. Darcs developers are working in a checkpoint system to allow you to just grab the newest stuff, and automatically grab anything else you need, instead of just grabbing everything. In the case of the darcs linux repo, no one wants to download 600 megs or so of changes. The thing is, RCS seems a space efficient format for storing patches, and it's efficient at extracting them too (plus it's textual so it's not going to get lost info even if something goes wrong). It may not even be space efficient. Code ultimately is just code, and changes ultimately are changes. RCS isn't magical, and its far from it. 
Infact, the format darcs uses probably stores more code in less space, but don't quote me on that. The whole linux-2.5 CVS is 500M uncompressed and 75M tar.bz2 compressed. The darcs repo which has the entire history since at least the start of 2.4 (iirc anyways) to *now* is around 600 to 700. My suggestion is to convert _all_ dozen thousand changesets to arch or SVN and then compare the size with CVS (also the compressed size is interesting for backups IMHO). Unfortunately I know nothing about darcs yet (except it eats quite some memory ;) My suggestion is to convert _all_ dozen thousand changesets to darcs, and then compare the size with CVS. And no, darcs doesn't eat that much memory for the amount of work its doing. (And yes, they are working on that). The only thing you haven't brought up is the whole omgwtfbbq! BK sucks, lets switch to SVN or Arch! thing everyone else in the known universe is doing. BK isn't clearly inferior or superior to SVN or Arch or Darcs (and the same goes for SVN vs Arch vs Darcs). (Start Generic BK Thread On LKML Rant) Dear Everyone, I think if Linus is happy with BK, he should stick with it. His opinion ultimately trumps all of ours because he does all the hard maintainership work, and we don't. The only guy that gets to bitch about how much a versioning system sucks is the maintainer of a project (unless its CVS, then all bets are off). Linus has so far indicated that he likes BK, so the kernel hacking community will be stuck using that for awhile. However, that doesn't stop the license kiddies from coming out of the woodwork and mindlessly quoting the bad parts of the BK license (which, yes, its non-free, but at this point, who gives a shit). IMO, yes, a non-free versioning system for the crown jewel of the FLOSS community is a little... odd, but it was LInus's choice, and we now have to respect it/deal with it. 
Now, I did say above (in this thread) that darcs would be really awesome for kernel hacking, especially since it's inherent support for multiple branches[1] and the ability to send changes from each other around easily would come in handy; however, darcs was not mature at the time of Linus's decision (and many say it is still not mature enough), so if Linus had actually chosen darcs, I (and other people here) would be now flaming him for choosing a versioning system that wasn't mature. Similarly, if he had chosen arch, everyone would have flamed him for choosing a hard to use tool. With svn, he would have met flamage by the hands of it being too much like cvs and not supporting arch/darcs style branch syncing. And if he stayed with cvs, he would have been roasted over an open fire for sticking with an out of date, useless, insane tool. And if he chose anything else that I didn't previously mention, everyone would have donned flame retardant suits and went into the fray over the fact that no one
Re: [darcs-users] Re: [BK] upgrade will be needed
On Sat, Feb 19, 2005 at 04:10:18AM -0500, Patrick McFarland wrote:

> In the case of darcs, RCS/SCCS works exactly opposite of how darcs does. By using its super magical method, it represents how code is written and how it changes (patch theory at its best). You can clearly see the direction code is going, where it came from, and how it relates to other patches.

I don't know anything about darcs, I was only talking about arch. I failed to compile darcs after trying for a while, so I gave up; I'll try again eventually. But anyway, the only thing I care about is that you import all dozen thousand changesets of the 2.5 kernel into it, and you show it's manageable with 1G of RAM and that the backup size is not very far from the 75M of CVS. I read on the webpage of the darcs kernel repository that they had to add RAM several times to avoid running out of memory. They needed more than 1G IIRC, and that was enough for me to lose interest in it. You're right, I blamed the functional approach and so I felt it was going to be a mess to fix the RAM utilization, but as someone else pointed out, perhaps it's darcs to blame and not Haskell. I don't know. To me backup size matters too, and for example I'm quite happy the fsfs backend of SVN generates very small backups compared to bsddb.

> Sure, you can do this with RCS/SCCS style versioning, but what's the point? It is inefficient, and backwards.

It is saved in a compact format, and I don't think it'll run slower, since if it's in cache it'll be fast, and if it's I/O dominated, the more compact it is, the faster it will be. Having a compact size, both for the repository and for the backup, is more important to me. In theory one could write a backup tool that extracts the thing, and rewrite a special backup backend that is as space efficient as CVS and that compresses as well as CVS, but this won't help the working copy.

> That's all up to how the versioning system is written. 
> Darcs developers are working on a checkpoint system to allow you to just grab the newest stuff,

This is already available with arch. In fact I suggested myself how to improve it with hardlinks so that a checkout will take a few seconds no matter the size of the tree.

> and automatically grab anything else you need, instead of just grabbing everything. In the case of the darcs linux repo, no one wants to download 600 megs or so of changes.

If you use arch/darcs as a patch-download tool, then that's fine as you say, and you can already do that with arch (which in this part seems already a lot more advanced, and it's written in C, btw). Most people just checking out the kernel with arch (or darcs) would never need to download 600 megs of changes, but I need to download them all. The major reason a versioning system is useful to me is to track all changesets that touched a single file since the start of 2.5 to the head. So I can't get away with downloading the last dozen patches and caching the previous tree (which has been perfectly doable with arch for ages, and with hardlinks as well).

> It may not even be space efficient. Code ultimately is just code, and changes ultimately are changes. RCS isn't magical, and it's far from it.

In fact, the way RCS stores the stuff compresses great. In that, it is magical. I guess SCCS is the same. fsfs isn't bad either, though, and in fact I'd never use bsddb and I'd only use fsfs with SVN.

> The darcs repo which has the entire history since at least the start of 2.4 (iirc anyways) to *now* is around 600 to 700. My suggestion is to convert _all_ dozen thousand changesets to darcs, and then compare the size with CVS.

What is the above 600/700 number then? I thought that was the conversion of all dozen thousand changesets of linux-2.5 CVS to darcs.

> And no, darcs doesn't eat that much memory for the amount of work it's doing. (And yes, they are working on that.)

I'll stay tuned. 
To me the only argument for not using a magic format like CVS or SCCS, which is space efficient and compresses efficiently, is if you claim it's going to be a lot slower at checkouts (but in fact applying some dozen thousand patchsets to run a checkout is going to be slower than CVS/SCCS). I know it's so much simpler to keep each patchset in a different file like arch is already doing, but that's not the best on-disk format IMHO. Note that some years ago I had the opposite idea, i.e. at some point I got convinced it was so much better to keep each patch separated from the others like you're advocating above, until I figured out how big the thing grows, how space-inefficient it is, how much I/O it forces me to do, how much disk it wastes in the backup, and how slow it is as well to check out dozen thousand patchsets. For smaller projects without dozen thousand changesets, the patch-per-file approach looks fine instead. For big projects IMHO being space efficient is much more important.
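The compact-delta idea being argued for above can be sketched in a few lines of Python (this is an illustration of the concept only, not RCS code; real ,v files store the head in full and older revisions as terse ed-style edit commands):

```python
# Sketch of reverse-delta storage, the idea behind RCS's compact ,v files:
# keep the newest revision, and reconstruct older revisions from a recorded
# line-level delta instead of storing every revision in full.
import difflib

v1 = ["int main(void)\n", "{\n", "    return 0;\n", "}\n"]
v2 = ["int main(void)\n", "{\n", "    puts(\"hello\");\n", "    return 0;\n", "}\n"]

# Record how v1 became v2. (difflib.ndiff also keeps context lines; a real
# ,v file stores an even terser edit script.)
delta = list(difflib.ndiff(v1, v2))

# "Checkout" of either revision is a replay of the recorded delta.
old = list(difflib.restore(delta, 1))
head = list(difflib.restore(delta, 2))
assert old == v1 and head == v2
```

Since the delta is plain text that mostly repeats nearby source lines, a tar.bz2 of many such deltas compresses very well, which is the backup-size property being praised here.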
Re: [darcs-users] Re: [BK] upgrade will be needed
On Sat, Feb 19, 2005 at 05:42:13PM +0100, Andrea Arcangeli wrote:

> But anyway the only thing I care about is that you import all dozen thousand changesets of the 2.5 kernel into it, and you show it's manageable with 1G of RAM and that the backup size is not very far from the 75M of CVS.

The linux-2.5 tree right now (I'm re-doing the conversion, and am up to October of last year, so far) is at 141M, if you don't count the pristine cache or working directory. That's already compressed, so you don't get any extra bonuses. Darcs stores with each changeset both the old and new versions of each hunk, which probably accounts for the factor-of-two greater size than CVS. This gives a bit of redundancy, which can be helpful in cases of repository corruption.

> I read on the webpage of the darcs kernel repository that they had to add RAM several times to avoid running out of memory. They needed more than 1G IIRC, and that was enough for me to lose interest in it. You're right, I blamed the functional approach and so I felt it was going to be a mess to fix the RAM utilization, but as someone else pointed out, perhaps it's darcs to blame and not Haskell. I don't know.

Darcs' RAM use has indeed already improved somewhat... I'm not exactly sure how much. I'm not quite sure how to measure peak virtual memory usage, and most of the time darcs' memory use while doing the linux kernel conversion is under a couple of hundred megabytes. There are indeed trickinesses involved in making sure garbage gets collected in a timely manner when coding in a lazy language like Haskell.

On Sat, Feb 19, 2005 at 04:10:18AM -0500, Patrick McFarland wrote:

> That's all up to how the versioning system is written. Darcs developers are working on a checkpoint system to allow you to just grab the newest stuff,

Correction: we already have a checkpoint system. The work in progress is making commands that examine the history fail gracefully when that history isn't present. 
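The old-plus-new hunk storage David describes can be illustrated like this (a sketch of the idea only, not darcs' actual on-disk format):

```python
# Sketch: a darcs-style hunk records both the removed and the added lines.
# That roughly doubles the stored size, but the patch then carries its own
# inverse, so it can be applied in either direction without extra context.
hunk = {"file": "./fs/inode.c", "line": 3,
        "old": ["\tkfree(inode);\n"],
        "new": ["\tif (inode)\n", "\t\tkfree(inode);\n"]}

def invert(h):
    # The redundancy pays off here: inversion is just swapping the sides.
    return {**h, "old": h["new"], "new": h["old"]}

assert invert(invert(hunk)) == hunk
assert invert(hunk)["old"] == hunk["new"]
```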
> This is already available with arch. In fact I suggested myself how to improve it with hardlinks so that a checkout will take a few seconds no matter the size of the tree.

I presume you're referring to a local checkout? That is already done using hard links by darcs--only of course the working directory has to actually be copied over, since there are editors out there that aren't friendly to hard-linked files.

>> and automatically grab anything else you need, instead of just grabbing everything. In the case of the darcs linux repo, no one wants to download 600 megs or so of changes.
>
> If you use arch/darcs as a patch-download tool, then that's fine ... The major reason a versioning system is useful to me is to track all changesets that touched a single file since the start of 2.5 to the head. So I can't get away with downloading the last dozen patches and caching the previous tree (which has been perfectly doable with arch for ages, and with hardlinks as well).

And here's where darcs really falls down. To track the history of a single file it has to read the contents of every changeset since the creation of that file, which will take forever (well, not quite... but close). I hope to someday (when more pressing issues are dealt with) add a per-file cache indicating which patches affect which files, which should largely address this problem, although it won't help at all with files that are touched by most of the changesets, and won't be optimal in any case. :(

-- 
David Roundy http://www.darcs.net
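The hardlink trick mentioned above can be demonstrated in a few lines (a minimal sketch; the file names are made up):

```python
# Sketch: a "pristine copy" made by hardlinking instead of copying touches
# only metadata, so it takes roughly constant time per file regardless of
# file size; this is why a local checkout can be near-instant on a huge tree.
import os, tempfile

repo = tempfile.mkdtemp()
checkout = tempfile.mkdtemp()

src = os.path.join(repo, "file.c")
with open(src, "w") as f:
    f.write("int x;\n")

dst = os.path.join(checkout, "file.c")
os.link(src, dst)  # the "checkout": no file contents are copied

# Both names refer to the same inode; the data exists once on disk.
assert os.stat(src).st_ino == os.stat(dst).st_ino
```

The downside is exactly the one raised in the thread: an editor that writes in place (rather than write-to-temp-and-rename) would modify the pristine copy through the link, which is why the working directory still has to be a real copy.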
Re: [darcs-users] Re: [BK] upgrade will be needed
On Sat, Feb 19, 2005 at 12:15:02PM -0500, David Roundy wrote:

> The linux-2.5 tree right now (I'm re-doing the conversion, and am up to October of last year, so far) is at 141M, if you don't count the pristine cache or working directory. That's already compressed, so you don't get any extra bonuses. Darcs stores with each changeset both the old and new versions of each hunk, which probably accounts for the factor-of-two greater size than CVS. This gives a bit of redundancy, which can be helpful in cases of repository corruption.

Double the compressed backup size is about the same as SVN with fsfs (not tested on the l-k tree but on something much smaller). Why not simply checksum instead of having data redundancy? Knowing when something is corrupted is a great feature, but doing RAID1 without the user asking for it is a worthless overhead. The same is true for arch of course; last time I checked they were using the default -U 3 format instead of -U 0.

> I presume you're referring to a local checkout? That is already done using hard links by darcs--only of course the working directory has to actually be copied over, since there are editors out there that aren't friendly to hard-linked files.

arch allows hardlinking the copy too (optionally), and it's up to you to use the right switch in the editor (Davide had an LD_PRELOAD to do a copy-on-write since the kernel doesn't provide the feature).

> And here's where darcs really falls down. To track the history of a single file it has to read the contents of every changeset since the creation of that file, which will take forever (well, not quite... but close).

Indeed, and as I mentioned this is the *major* feature as far as I'm concerned (and frankly the only one I really care about, and one that helps a lot to track changes in the tree and understand why the code evolved). 
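The checksum-instead-of-redundancy alternative proposed above could look like this (a minimal sketch; the function names are mine, not any SCM's API):

```python
# Sketch of "checksum instead of redundancy": store a digest next to each
# changeset so corruption is detected on read, without keeping a second
# copy of the data.
import hashlib

def store_changeset(patch: bytes) -> dict:
    return {"patch": patch, "sha1": hashlib.sha1(patch).hexdigest()}

def verify_changeset(record: dict) -> bool:
    return hashlib.sha1(record["patch"]).hexdigest() == record["sha1"]

record = store_changeset(b"-old line\n+new line\n")
assert verify_changeset(record)

record["patch"] = b"-old line\n+scribbled\n"  # simulate on-disk corruption
assert not verify_changeset(record)
```

This detects the NFS/disk/kernel scribbling discussed earlier in the thread, at the cost of twenty bytes per changeset rather than a full duplicate; recovery, unlike detection, still needs a backup or the redundant copy.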
Note that cvsps works great for this; it's very efficient as well (not remotely comparable to arch at least, even if arch provided a tool equivalent to cvsps); the only problem is that CVS is out of sync...

> I hope to someday (when more pressing issues are dealt with) add a per-file cache indicating which patches affect which files, which should largely address this problem, although it won't help at all with files that are touched by most of the changesets, and won't be optimal in any case. :(

Wouldn't using the CVS format help by an order of magnitude here? With the CVS/SCCS format you can extract all the patchsets that affected a single file in an extremely efficient manner; memory utilization will be extremely low too (like in cvsps indeed). You'll only have to look up the global changeset file, and then open the few ,v files that are affected and extract their patchsets. cvsps does this optimally already. The only difference is that cvsps is a read-only cache, while with a real SCM it would be a global file that controls all the changesets in an atomic way. In fact *that* global file could be a bsddb too; I don't care how the changeset file is encoded, all I care is that the data is a ,v file or SCCS file so cvsps won't have to read two files every time I ask that question, which is currently unavoidable with both darcs and arch. 
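The per-file cache David mentions could be sketched like this (the names are illustrative, not darcs or cvsps internals):

```python
# Sketch of a per-file patch index: map each filename to the changesets
# that touch it, so asking for one file's history opens a handful of
# patches instead of scanning every changeset in the repository.
from collections import defaultdict

index = defaultdict(list)

def record_changeset(cs_id, touched_files):
    for path in touched_files:
        index[path].append(cs_id)

record_changeset("cs1", ["fs/inode.c", "fs/namei.c"])
record_changeset("cs2", ["mm/memory.c"])
record_changeset("cs3", ["fs/inode.c"])

# Single-file history is now a direct lookup.
assert index["fs/inode.c"] == ["cs1", "cs3"]
```

As David notes, this is no help for files touched by most changesets, where the index entry is nearly as long as the full changeset list.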
Re: [darcs-users] Re: [BK] upgrade will be needed
On Feb 18, 2005, at 1:09, Andrea Arcangeli wrote:

> darcs scares me a bit because it's in haskell, I don't believe very much in functional languages for compute intensive stuff, ram utilization

It doesn't sound like you've programmed in functional languages all that much. While I don't have any practical experience in Haskell, I can say for sure that my functional code in OCaml blows my C code away in maintainability and performance. Now, if I could only get it to dump core... Haskell was a draw for me. It's very rare to find someone who actually knows C and can write bulletproof code in it.

On Fri, Feb 18, 2005 at 12:53:09PM +0100, Erik Bågfors wrote:

>> RCS/SCCS format doesn't make much sense for a changeset oriented SCM.
>
> The advantage it will provide is that it'll be compact and a backup will compress very well too. Small compressed tarballs compress very badly instead; it wouldn't even be comparable. Once the thing is very compact it has a better chance to fit in cache, and if it fits in cache, extracting diffs from each file will be very fast. Once it's compact, the cost of a changeset will be diminished, allowing it to scale better too.

Then what gets transferred over the wire? The full modified ,v file? Do you need a smart server to create deltas to be applied to your ,v files? What do you do when someone goes in with an rcs command and destroys part of the history (since the storage is now mutable)? I use both darcs and arch regularly. darcs is a lot nicer to use from a human interface point of view (and the merging is really a lot nicer), but the nicest thing about arch is that a given commit is immutable. There are no tools to modify it. This is also why the crypto signature stuff was so easy to fit in. RCS and SCCS storage throws away most of those features. 
-- 
Dustin Sallings
Re: [darcs-users] Re: [BK] upgrade will be needed
On Fri, Feb 18, 2005 at 12:53:09PM +0100, Erik Bågfors wrote:

> RCS/SCCS format doesn't make much sense for a changeset oriented SCM.

The advantage it will provide is that it'll be compact and a backup will compress very well too. Small compressed tarballs compress very badly instead; it wouldn't even be comparable. Once the thing is very compact it has a better chance to fit in cache, and if it fits in cache, extracting diffs from each file will be very fast. Once it's compact, the cost of a changeset will be diminished, allowing it to scale better too. Now it's true new disks are bigger, but they're not much faster, so if the size of the repository is much larger, it'll be much slower to check out if it doesn't fit in cache. And if it's smaller, it has better chances of fitting in cache too. The thing is, RCS seems a space efficient format for storing patches, and it's efficient at extracting them too (plus it's textual, so it's not going to lose info even if something goes wrong). The whole linux-2.5 CVS is 500M uncompressed and 75M tar.bz2 compressed. My suggestion is to convert _all_ dozen thousand changesets to arch or SVN and then compare the size with CVS (also the compressed size is interesting for backups IMHO). Unfortunately I know nothing about darcs yet (except it eats quite some memory ;)
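For readers who have never looked inside a CVS repository, here is a schematic, hand-written sketch of the ,v layout being praised (abbreviated and illustrative; consult rcsfile(5) for the real grammar). The head revision is stored in full, and each older revision as ed-style edit commands, all in plain text:

```
head    1.2;
access;
symbols;
locks; strict;

1.2
date    2005.02.18.10.00.00;  author andrea;  state Exp;
branches;
next    1.1;

1.1
date    2005.02.17.09.00.00;  author andrea;  state Exp;
branches;
next    ;

desc
@@

1.2
log
@add greeting@
text
@int main(void)
{
    puts("hello");
    return 0;
}
@

1.1
log
@initial revision@
text
@d3 1
@
```

The `d3 1` under revision 1.1 is the reverse delta: delete one line starting at line 3 of the 1.2 text. Checking out the head is a plain read; older revisions replay a short chain of such edits, and the repeated source text is exactly what makes the whole file compress so well.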
Re: [darcs-users] Re: [BK] upgrade will be needed
On Fri, 18 Feb 2005 10:09:00 +0100, Andrea Arcangeli <[EMAIL PROTECTED]> wrote:

> On Thu, Feb 17, 2005 at 06:24:53PM -0800, Tupshin Harper wrote:
>> small to medium sized ones). Last I checked, Arch was still too slow in some areas, though that might have changed in recent months. Also, many
>
> IMHO someone needs to rewrite ARCH using the RCS or SCCS format for the backend and a single file for the changesets and with sane parameter conventions mimicking SVN.

RCS/SCCS format doesn't make much sense for a changeset oriented SCM.

/Erik
Re: [darcs-users] Re: [BK] upgrade will be needed
On Fri, Feb 18, 2005 at 10:09:00AM +0100, Andrea Arcangeli wrote:

> darcs scares me a bit because it's in haskell, I don't believe very much in functional languages for compute intensive stuff, ram utilization skyrockets sometimes (I wouldn't like to need >1G of ram to manage the tree).

AFAICS, most of the memory-related problems in darcs are not necessarily a result of using Haskell.

> Other languages like python or perl are much slower than C/C++ too but at least ram utilization can normally be kept to sane levels with them and they can be greatly optimized easily with C/C++ extensions for the performance critical parts.

With those languages, you often have no choice other than resorting to C. GHC is quite a good compiler and I've often been able to get my programs to run almost as fast as programs written in C++ - however, if I were to write those programs in C++, I would never do that, despite being quite a good C++ programmer. Also, in Haskell you can use extensions written in C, as easily as or even more easily than in Python or Perl (I've done this in Perl, and heard the battle stories about C extensions in Python). Haskell's FFI is quite good, and there are also many supporting tools.

Best regards
Tomasz

-- 
Szukamy programisty C++ i Haskell'a ("We're looking for a C++ and Haskell programmer"): http://tinyurl.com/5mw4e
Re: [darcs-users] Re: [BK] upgrade will be needed
On Thu, Feb 17, 2005 at 06:24:53PM -0800, Tupshin Harper wrote:

> small to medium sized ones). Last I checked, Arch was still too slow in some areas, though that might have changed in recent months. Also, many

IMHO someone needs to rewrite ARCH using the RCS or SCCS format for the backend, a single file for the changesets, and sane parameter conventions mimicking SVN. The internal algorithms of arch seem the most advanced possible. It's just the interface and the fs backend that are so bad and don't compress in the backups either. SVN bsddb doesn't compress either by default, but at least the new fsfs compresses pretty well; not as well as CVS, but not as badly as bsddb and arch either. I may be completely wrong, so take the above just as a humble suggestion. darcs scares me a bit because it's in haskell; I don't believe very much in functional languages for compute-intensive stuff, ram utilization skyrockets sometimes (I wouldn't like to need >1G of ram to manage the tree). Other languages like python or perl are much slower than C/C++ too, but at least ram utilization can normally be kept to sane levels with them, and they can be greatly optimized easily with C/C++ extensions for the performance-critical parts.
Re: [darcs-users] Re: [BK] upgrade will be needed
On Thu, February 17, 2005 9:24 pm, Tupshin Harper said:

Hi Tupshin,

> Speaking as somebody who uses Darcs every day, my opinion is that the future of OSS SCM will be something like arch or darcs, but that neither is ready for projects the size of the linux kernel yet. Darcs is definitely way too slow for really large projects (though great for small to medium sized ones). Last I checked, Arch was still too slow in some areas, though that might have changed in recent months. Also, many people, me included, find the usability of arch to be far from ideal.
>
> My hope and expectation is that Arch and Darcs will both improve their performance, features, and usability and that in the not too distant future both of them will be viable alternatives for large scale source tree management.

svk probably falls into the same category, although it's less mature than the options you cite.

> The important thing for the health of the SCM ecosystem is that there be ways to losslessly convert and interoperate between them as well as with legacy/centralized systems such as CVS and SVN and with BK.

Amen.

Thanks,
Sean
Re: [darcs-users] Re: [BK] upgrade will be needed
Patrick McFarland wrote:

> On Sunday 13 February 2005 09:08 pm, Larry McVoy wrote:
>> Something that unintentionally started a flamewar.
>
> Well, we just went through another round of 'BK sucks' and 'BK sucks, we need to switch to something else'. Sans the flamewar, are there any options? CVS and SVN are out because they do not support 'off server' branches (arch and darcs do). Darcs would probably be the best choice because it's easy to use, and the darcs team almost has a working linux-kernel import script (originally designed just to test darcs with a huge repo, but it provides a mostly working linux tree). So, without the flamewar, what is everyone's opinion on this?

Speaking as somebody who uses Darcs every day, my opinion is that the future of OSS SCM will be something like arch or darcs, but that neither is ready for projects the size of the linux kernel yet. Darcs is definitely way too slow for really large projects (though great for small to medium sized ones). Last I checked, Arch was still too slow in some areas, though that might have changed in recent months. Also, many people, me included, find the usability of arch to be far from ideal. My hope and expectation is that Arch and Darcs will both improve their performance, features, and usability, and that in the not too distant future both of them will be viable alternatives for large scale source tree management. The important thing for the health of the SCM ecosystem is that there be ways to losslessly convert and interoperate between them, as well as with legacy/centralized systems such as CVS and SVN and with BK.

-Tupshin