Re: space compression (again)

2005-04-19 Thread Martin Uecker
On Sat, Apr 16, 2005 at 07:37:02PM +0200, Martin Uecker wrote:
> On Sat, Apr 16, 2005 at 11:11:00AM -0400, C. Scott Ananian wrote:
> > The rsync approach does not use fixed chunk boundaries; this is necessary
> > to ensure good storage reuse for the expected case (ie; inserting a single
> > lin[...]
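To make the point about fixed versus content-derived boundaries concrete, here is a small sketch. It uses content-defined chunking with a polynomial (Rabin-Karp style) rolling hash. That technique is a stand-in chosen for illustration, not necessarily what the posters had in mind. The window size, cut mask, 1 KiB fixed chunk size and function names are all assumptions. A cut depends only on the last WINDOW bytes, so inserting a line at the top of a file changes only the first chunk. With fixed 1 KiB boundaries, every later chunk shifts and none can be reused.

```python
import random

WINDOW = 48           # bytes covered by the rolling hash (assumed value)
MASK = (1 << 10) - 1  # cut where the low 10 bits are zero: ~1 KiB average chunks (assumed)
BASE = 257
MOD = (1 << 61) - 1   # large prime modulus for the polynomial hash


def cdc_chunks(data: bytes) -> list[bytes]:
    """Cut wherever the hash of the last WINDOW bytes has its low bits clear.

    A cut depends only on those WINDOW bytes, not on its offset in the file,
    so an insertion near the top only disturbs the chunk it lands in.
    """
    out_weight = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * out_weight) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                         # add the newest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data: bytes, size: int = 1024) -> list[bytes]:
    """Fixed boundaries every `size` bytes, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]


rng = random.Random(2005)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
edited = b"#include <stdio.h>\n" + original  # "inserting a single line" at the top

old, new = cdc_chunks(original), cdc_chunks(edited)
fold, fnew = fixed_chunks(original), fixed_chunks(edited)
print("content-defined:", len(set(old) & set(new)), "of", len(old), "chunks reused")
print("fixed-size:     ", len(set(fold) & set(fnew)), "of", len(fold), "chunks reused")
```

Real chunkers usually also enforce minimum and maximum chunk sizes. They are left out here so that each cut stays a pure function of local content.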

Re: space compression (again)

2005-04-16 Thread Martin Uecker
On Sat, Apr 16, 2005 at 11:11:00AM -0400, C. Scott Ananian wrote:
> On Sat, 16 Apr 2005, Martin Uecker wrote:
>
> >The right thing (TM) is to switch from SHA1 of compressed
> >content for the complete monolithic file to a merkle hash tree
> >of the uncompressed content. This would make the hash [...]

Re: space compression (again)

2005-04-16 Thread C. Scott Ananian
On Sat, 16 Apr 2005, Martin Uecker wrote:
The right thing (TM) is to switch from SHA1 of compressed content for the complete monolithic file to a merkle hash tree of the uncompressed content. This would make the hash independent of the actual storage method (chunked or not). It would certainly be n[...]
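Below is a minimal sketch of the Merkle hash tree idea. The root is computed with SHA-1 over fixed-size leaves of the uncompressed content. The 4 KiB leaf size, the 0x00/0x01 leaf/node prefixes (borrowed from RFC 6962 style trees) and the name `merkle_root` are assumptions, not anything specified in the thread. The demonstration relies on the fact that zlib encodes the compression level in its header. Hashing the compressed bytes therefore gives different names for the same data, while the tree root depends only on the content.

```python
import hashlib
import zlib

LEAF_SIZE = 4096  # assumed leaf size; the thread does not specify one


def merkle_root(content: bytes) -> bytes:
    """SHA-1 Merkle root over fixed-size leaves of the *uncompressed* content."""
    leaves = [content[i:i + LEAF_SIZE] for i in range(0, len(content), LEAF_SIZE)] or [b""]
    # 0x00 / 0x01 prefixes keep leaf and interior hashes apart (an RFC 6962-style choice).
    level = [hashlib.sha1(b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(hashlib.sha1(b"\x01" + level[i] + level[i + 1]).digest())
            else:
                nxt.append(level[i])  # an unpaired node is carried up unchanged
        level = nxt
    return level[0]


content = b"".join(b"line %d of some source file\n" % n for n in range(2000))
fast, best = zlib.compress(content, 1), zlib.compress(content, 9)

# Hashing the compressed bytes ties the name to the compressor settings...
print(hashlib.sha1(fast).hexdigest() == hashlib.sha1(best).hexdigest())  # False
# ...while the tree over uncompressed content names the data itself.
print(merkle_root(zlib.decompress(fast)) == merkle_root(zlib.decompress(best)))  # True
```

Because the leaf size is fixed independently of how the object is stored, the same root results whether the content sits on disk as one compressed file or as separately stored pieces.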

Re: space compression (again)

2005-04-16 Thread Martin Uecker
On Fri, Apr 15, 2005 at 12:11:43PM -0700, Linus Torvalds wrote:
> On Fri, 15 Apr 2005, C. Scott Ananian wrote:
> >
> > So I guess I'll have to implement this and find out, won't I? =)
>
> The best way to shut somebody up is always to just do it, and say "hey, I
> told you so". It's hard to arg[...]

Re: space compression (again)

2005-04-16 Thread David Lang
Sorry for this email not threading properly. I have been lurking on the mail list archives and just had to reply to this message. I was planning to ask exactly this question, and Scott beat me to it. I even wanted to call them "chunks" too. :-)

Re: space compression (again)

2005-04-15 Thread Ray Heasman
Sorry for this email not threading properly. I have been lurking on the mail list archives and just had to reply to this message. I was planning to ask exactly this question, and Scott beat me to it. I even wanted to call them "chunks" too. :-) It's probably worthwhile for anyone discussing this su[...]

Re: space compression (again)

2005-04-15 Thread Linus Torvalds
On Fri, 15 Apr 2005, C. Scott Ananian wrote:
>
> So I guess I'll have to implement this and find out, won't I? =)

The best way to shut somebody up is always to just do it, and say "hey, I told you so". It's hard to argue with numbers.

Linus

Re: space compression (again)

2005-04-15 Thread Derek Fawcus
On Fri, Apr 15, 2005 at 02:45:55PM -0400, C. Scott Ananian wrote:
> > - we already have wasted space due to the low-level filesystem (as
> > opposed to "git") usually being block-based, which means that space
> > utilization for small objects tends to suck. So you really want to
> > prefer ob[...]
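The "space utilization for small objects tends to suck" point comes down to simple arithmetic. A file always occupies whole blocks. The sketch below assumes a 4 KiB block and ignores inodes, directories and any small-file packing a particular filesystem might do. The names `on_disk` and `utilization` are illustrative only.

```python
BLOCK = 4096  # assumed filesystem block size


def on_disk(size: int, block: int = BLOCK) -> int:
    """Data-block bytes used by a file of `size` bytes.

    Ignores inodes, directories, and any small-file packing the filesystem may do.
    """
    return -(-size // block) * block  # round up to whole blocks


def utilization(sizes: list[int], block: int = BLOCK) -> float:
    """Fraction of allocated block space that holds actual object bytes."""
    return sum(sizes) / sum(on_disk(s, block) for s in sizes)


objects = [300] * 1000  # a thousand small 300-byte objects
print(utilization(objects))         # 0.0732421875: one file per object
print(utilization([sum(objects)]))  # about 0.99: the same bytes in one file
```

Under these assumptions, a thousand 300-byte objects stored one per file leave over 90% of the allocated space empty.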

Re: space compression (again)

2005-04-15 Thread Derek Fawcus
On Fri, Apr 15, 2005 at 01:19:30PM -0400, C. Scott Ananian wrote:
> Why are blobs per-file? [After all, Linus insists that files are an
> illusion.] Why not just have 'chunks', and assemble *these*
> into blobs (read, 'files')? A good chunk size would fit evenly into some
> number of disk blo[...]

Re: space compression (again)

2005-04-15 Thread C. Scott Ananian
On Fri, 15 Apr 2005, Linus Torvalds wrote:
The problem with chunking is:
- it complicates a lot of the routines. Things like "is this file unchanged" suddenly become "is this file still the same set of chunks", which is just a _lot_ more code and a lot more likely to have bugs. The blob still h[...]
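The "is this file unchanged" concern can be illustrated with a hypothetical scheme in which a blob is stored as an ordered list of chunk IDs and named by hashing that list. The names `chunk_id`, `blob_id` and `unchanged` are made up for this sketch. Under that assumption the check collapses back to one comparison. However, it only works if chunk boundaries are a deterministic function of the content: identical bytes cut at different places get a different name.

```python
import hashlib


def chunk_id(chunk: bytes) -> str:
    return hashlib.sha1(chunk).hexdigest()


def blob_id(chunk_ids: list[str]) -> str:
    """Hypothetical: name a chunked blob by hashing its ordered list of chunk IDs."""
    return hashlib.sha1("\n".join(chunk_ids).encode()).hexdigest()


def unchanged(old_ids: list[str], new_ids: list[str]) -> bool:
    # With a single blob ID the check is one comparison again...
    return blob_id(old_ids) == blob_id(new_ids)


split_a = [chunk_id(c) for c in (b"hello ", b"world\n")]
split_b = [chunk_id(c) for c in (b"hello world\n",)]
print(unchanged(split_a, split_a))  # True
# ...but only if chunking is deterministic: the same bytes cut differently
# produce a different ID, so "unchanged" wrongly reports a change.
print(unchanged(split_a, split_b))  # False
```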

Re: space compression (again)

2005-04-15 Thread Linus Torvalds
On Fri, 15 Apr 2005, C. Scott Ananian wrote:
>
> Why are blobs per-file? [After all, Linus insists that files are an
> illusion.] Why not just have 'chunks', and assemble *these*
> into blobs (read, 'files')? A good chunk size would fit evenly into some
> number of disk blocks (no wasted s[...]

space compression (again)

2005-04-15 Thread C. Scott Ananian
I've been reading the archives (a bad idea, I know). Here's a concrete suggestion for GIT space-compression which is (I believe) consistent with the philosophy of GIT.

Why are blobs per-file? [After all, Linus insists that files are an illusion.] Why not just have 'chunks', and assemble *the[...]
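Here is a minimal sketch of the "assemble blobs from chunks" proposal. It is a content-addressed store in which each chunk is keyed by its SHA-1 and a blob is just the ordered list of its chunk IDs. The `ChunkStore` class, its methods and the 4 KiB chunk size (one chunk per disk block, following the "fit evenly into some number of disk blocks" idea) are assumptions made for illustration. The sketch shows that two versions differing only in their last block share the unchanged chunks.

```python
import hashlib

CHUNK = 4096  # assumed: one chunk per 4 KiB disk block, following the "fit evenly" idea


class ChunkStore:
    """Hypothetical content-addressed store: chunks keyed by SHA-1, blobs as chunk-ID lists."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def put_blob(self, data: bytes) -> list[str]:
        ids = []
        for i in range(0, len(data), CHUNK):
            piece = data[i:i + CHUNK]
            cid = hashlib.sha1(piece).hexdigest()
            self.chunks.setdefault(cid, piece)  # a chunk already present is not stored again
            ids.append(cid)
        return ids

    def get_blob(self, ids: list[str]) -> bytes:
        return b"".join(self.chunks[cid] for cid in ids)


store = ChunkStore()
v1 = b"A" * CHUNK + b"B" * CHUNK + b"C" * CHUNK
v2 = b"A" * CHUNK + b"B" * CHUNK + b"D" * CHUNK  # only the last block edited
ids1, ids2 = store.put_blob(v1), store.put_blob(v2)
print(len(store.chunks))           # 4 chunks stored for two 3-chunk blobs
print(store.get_blob(ids2) == v2)  # True
```

With fixed boundaries like these, reuse happens only when an edit does not shift the data that follows it.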