Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
> However you're right that the original structure proposed by Linus is
> too flat.

That was the only point I *meant* to defend. The rest was error.

-t

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
> Tom, please stop this ext* filesystem bashing ;-)

For one thing... yes, I'm totally embarrassed on this issue. I made a
late-night math error in a spec. I *hopefully* would have noticed it
on my own as I coded to that spec, but y'all have been wonderful at
pointing out my mistake to me even though I initially defended it.

As for ext* bashing: it's not bashing, exactly. I/O bandwidth gets a
little better, disks get a bunch cheaper --- then ext* doesn't look
bad at all in this respect. And we're awfully close to that point.
Meanwhile, for times measured in years, I've gotten complaints from
ext* users about software that is just fine on other filesystems,
over issues like the allocation size of small files. So ext*, from my
perspective, was a little too far ahead of its time and, yes, my
complaints about it are just about reaching their expiration date.

Anyway, thanks for all the sanity about directory layout. Really, it
was just an "I'm too sleepy to be doing this right now" error.

-t
Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
> [your 0:3/4:7 directory hierarchy is horked]

Absolutely. I made a dumb mistake the night I wrote that spec and I'm
embarrassed that I initially defended it. I had an arithmetic error.
Thanks, this time, for your persistence in pointing it out.

-t
Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
> Yes, it really doesn't make much sense to have so big keys on the
> directories.

It's official... I'm blushing wildly. Thank you for the various
replies that pointed out my thinko. That part of my spec hasn't been
coded yet --- I just wrote text. It really was a silly late-night
error of the sort: "hmm... let's see, 4 hex digits plus 4 hex digits,
that's 16 bits, sounds about right."

Really, I'll fix it.

-t
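The arithmetic behind the thinko is easy to check: each hex digit
carries 4 bits, so 4 + 4 hex digits address 32 bits, not 16. A quick
sketch (my own illustration, not code from the thread):

```python
# Each hex digit encodes 4 bits, so a key prefix of n hex digits
# spans 16**n possible directory names.
def fanout(hex_digits):
    return 16 ** hex_digits

assert fanout(4) == 65536            # a 4-digit first-level key: 64k top-level dirs
assert fanout(2) == 256              # a 2-digit key: 256 dirs, Linus' suggestion
assert fanout(4) * fanout(4) == 2 ** 32   # 4 + 4 hex digits address 32 bits, not 16
```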
Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
> Using your suggested indexing method that uses [0:4] as the 1st
> level key and [4:8] as the 2nd level key, I obtain an indexed
> archive that occupies 159M, where the top level contains 18665 1st
> level keys, the largest first level dir contains 5 entries, and all
> 2nd level dirs contain exactly 1 entry.

That's just a mistake in the spec. The format should probably be
multi-level but, yes, the fanout I suggested is currently quite
bogus. When I write that part of the code (today or tomorrow) I'll
fix it.

A few people pointed that out. Thanks.

-t
Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
Tomas Mraz <[EMAIL PROTECTED]> writes:

>> Btw, if, as you indicate above, you do believe that a 1 level
>> indexing should use [0:2], then it doesn't make much sense to me to
>> also suggest that a 2 level indexing should use [0:1] as primary
>> subkey :-)
>
> Why do you think so? IMHO we should always target a similar number
> of files/subdirectories in a directory of the blob archive. So if I
> always suppose that the archive would contain at most 16 millions of
> files then the possible indexing schemes are either 1 level with key
> length 3 (each directory would contain ~4096 files) or 2 levels with
> key length 2 (each directory would contain ~256 files).
> Which one is better could of course be filesystem and hardware
> dependent.

First off, I have been using Python slice notation, so when I write
[0:2] I mean a key of length 2 (the second index is not included). I
now realize that when you wrote the same you meant to include the
second index.

I believe that our disagreement comes from the fact that we are
asking different questions. You consider the question of how to best
index a fixed database, and I consider the question of how to best
index an ever increasing database.

Now consider why we even want multiple indexing levels: presumably
this is because certain operations become too costly when the size of
a directory becomes too large. If that's not the case, then we might
as well just have one big flat directory - perhaps that's even a
viable option for some filesystems.[1]

[1] there is the additional consideration that a hierarchical system
implements a form of key compression by sharing key prefixes. I don't
know at what point such an effect becomes beneficial, if ever.

Now suppose we need at least one level of indexing. Under an
assumption of uniform distribution of bits in keys, as more objects
are added to the database, the lower levels are going to fill up
uniformly. Therefore at those levels we are again faced with exactly
the same indexing problem and thus should come up with exactly the
same answer.

This is why I believe that the scheme I proposed is best: when a
bottom level directory fills up past a certain size, introduce under
it an additional level, and reindex the keys. Since the "certain
size" is fixed, this is a constant time operation.

One could also entertain the idea of reindexing not just a bottom
level directory but an entire subtree of the database (this would be
closer to your idea of finding an optimal reindexing of just this
part of the database). However this has the disadvantage that the
operation's cost grows exponentially with the depth of the tree.

Cheers,

--Denys
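The adaptive scheme described above can be sketched in a few lines of
Python. This is an illustrative sketch only, not code from the
thread; the threshold of 256 matches the one Denys mentions
elsewhere, and the helper names (`store_blob`, `maybe_split`) are
invented:

```python
import os

MAX_ENTRIES = 256   # threshold before a leaf directory is split (assumed)
SUBKEY_LEN = 2      # hex digits consumed per indexing level (assumed)

def store_blob(root, key, data):
    """Store `data` under hex `key`, descending through existing index levels."""
    path, rest = root, key
    # follow existing levels: each level is a subdirectory named by the next subkey
    while os.path.isdir(os.path.join(path, rest[:SUBKEY_LEN])):
        path = os.path.join(path, rest[:SUBKEY_LEN])
        rest = rest[SUBKEY_LEN:]
    with open(os.path.join(path, rest), "wb") as f:
        f.write(data)
    maybe_split(path)

def maybe_split(path):
    """If a leaf directory grows past MAX_ENTRIES, add one level and rehash.
    Constant time: only the entries of this one directory are moved."""
    entries = [e for e in os.listdir(path)
               if os.path.isfile(os.path.join(path, e))]
    if len(entries) <= MAX_ENTRIES:
        return
    for name in entries:
        sub = os.path.join(path, name[:SUBKEY_LEN])
        os.makedirs(sub, exist_ok=True)
        os.rename(os.path.join(path, name), os.path.join(sub, name[SUBKEY_LEN:]))
```

Lookup mirrors the store path: descend while the next subkey names an
existing directory, then read the remaining key as a file name.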
Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
On Thu, 2005-04-21 at 11:09 +0200, Denys Duchier wrote:
> Tomas Mraz <[EMAIL PROTECTED]> writes:
>
>> If we suppose the maximum number of stored blobs in the order of
>> millions, probably the optimal indexing would be 1 level [0:2]
>> indexing or 2 levels [0:1] [2:3]. However it would be necessary to
>> do some benchmarking first before setting this in stone.
>
> As I have suggested in a previous message, it is trivial to
> implement adaptive indexing: there is no need to hardwire a specific
> indexing scheme. Furthermore, I suspect that the optimal size of
> subkeys may well depend on the filesystem. My experiments seem to
> indicate that subkeys of length 2 achieve an excellent compromise
> between discriminatory power and disk footprint on ext2.
>
> Btw, if, as you indicate above, you do believe that a 1 level
> indexing should use [0:2], then it doesn't make much sense to me to
> also suggest that a 2 level indexing should use [0:1] as primary
> subkey :-)

Why do you think so? IMHO we should always target a similar number of
files/subdirectories in a directory of the blob archive. So if I
always suppose that the archive would contain at most 16 millions of
files then the possible indexing schemes are either 1 level with key
length 3 (each directory would contain ~4096 files) or 2 levels with
key length 2 (each directory would contain ~256 files). Which one is
better could of course be filesystem and hardware dependent.

Of course it might be best to allow adaptive indexing, but I think
that first some benchmarking should be done, and it's possible that
some fixed scheme could be chosen as optimal.

--
Tomas Mraz <[EMAIL PROTECTED]>
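Tomas' fanout arithmetic can be verified directly; a small sketch
assuming a 16-million-blob ceiling (the numbers are from his message,
the code is my own):

```python
MAX_BLOBS = 16 * 2 ** 20                 # ~16.7 million, the assumed archive ceiling

# 1 level, key length 3: 16**3 = 4096 top-level directories
dirs_1level = 16 ** 3
assert dirs_1level == 4096
assert MAX_BLOBS // dirs_1level == 4096  # ~4096 files per directory

# 2 levels, key length 2: 256 directories per level, 65536 leaf directories
dirs_per_level = 16 ** 2
assert dirs_per_level == 256
assert MAX_BLOBS // dirs_per_level ** 2 == 256   # ~256 files per leaf
```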
Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
Tomas Mraz <[EMAIL PROTECTED]> writes:

> If we suppose the maximum number of stored blobs in the order of
> millions, probably the optimal indexing would be 1 level [0:2]
> indexing or 2 levels [0:1] [2:3]. However it would be necessary to
> do some benchmarking first before setting this in stone.

As I have suggested in a previous message, it is trivial to implement
adaptive indexing: there is no need to hardwire a specific indexing
scheme. Furthermore, I suspect that the optimal size of subkeys may
well depend on the filesystem. My experiments seem to indicate that
subkeys of length 2 achieve an excellent compromise between
discriminatory power and disk footprint on ext2.

Btw, if, as you indicate above, you do believe that a 1 level
indexing should use [0:2], then it doesn't make much sense to me to
also suggest that a 2 level indexing should use [0:1] as primary
subkey :-)

Cheers,

--
Dr. Denys Duchier - IRI & LIFL - CNRS, Lille, France
AIM: duchierdenys
Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
On Wed, 2005-04-20 at 16:04 -0700, Tom Lord wrote:
> I think that to a large extent you are seeing artifacts of the
> questionable trade-offs that (reports tell me) the ext* filesystems
> make. With a different filesystem, the results would be very
> different.

Tom, please stop this ext* filesystem bashing ;-)

Even with a filesystem which compresses the tails of files into one
filesystem block, it wouldn't make a difference that there are
potentially (and very probably even with blob numbers in order of 10)
65536 directories on the first level. This doesn't help much in fast
reading of the first level.

> I'm imagining a blob database containing many revisions of the linux
> kernel. It will contain millions of blobs.
>
> It's fine that some filesystems and some blob operations work fine
> on a directory with millions of files but what about other
> operations on the database? I pity the poor program that has to
> `readdir' through millions of files.

Even with millions of files this key structure doesn't make much
sense - the keys on the first and second levels are too long. However
you're right that the original structure proposed by Linus is too
flat.

--
Tomas Mraz <[EMAIL PROTECTED]>
Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
Tom Lord <[EMAIL PROTECTED]> writes:

> Thank you for your experiment.

You are welcome.

> I think that to a large extent you are seeing artifacts of the
> questionable trade-offs that (reports tell me) the ext* filesystems
> make. With a different filesystem, the results would be very
> different.

No, this is not the only thing that we observe. For example, here are
the reports for the following two experiments:

    Indexing method = [2]
    Max keys at level  0:     256
    Max keys at level  1:     108
    Total number of dirs:     257
    Total number of keys:   21662
    Disk footprint      :    1.8M

    Indexing method = [4 4]
    Max keys at level  0:   18474
    Max keys at level  1:       5
    Max keys at level  2:       1
    Total number of dirs:   40137
    Total number of keys:   21662
    Disk footprint      :    157M

Notice the huge number of directories in the second experiment; they
don't help at all in performing discrimination.

> I'm imagining a blob database containing many revisions of the linux
> kernel. It will contain millions of blobs.

It is very easy to write code that uses an adaptive discrimination
method (i.e. when a directory becomes too full, introduce an
additional level of discrimination and rehash). In fact I have code
that does that (rehashing if the size of a leaf directory exceeds
256), but the [2] method above doesn't even need it, even though it
has 21662 keys.

Just in case there is some interest, I attach below the Python
scripts which I used for my experiments.

To create an indexed archive:

    python build.py SRC DST N1 ... Nk

where SRC is the root directory of the tree to be indexed, and DST
names the root directory of the indexed archive to be created. N1
through Nk are integers that each indicate how many chars to chop off
the key to create the next level indexing key.

    python info.py DST

collects and then prints out statistics about an indexed archive. For
example, the invocation that relates to your original proposal would
be:

    python build.py /usr/src/linux store 4 4
    python info.py store

build.py:

    import os, os.path, stat, sha

    tree = None
    archive = None
    slices = []
    lastslice = (0, None)

    def recurse(path):
        s = os.stat(path)
        if stat.S_ISDIR(s.st_mode):
            print path
            contents = []
            for n in os.listdir(path):
                uid = recurse(os.path.join(path, n))
                contents.append('\t'.join((n, uid)))
            contents = '\n'.join(contents)
            buf = sha.new(contents)
            uid = buf.hexdigest()
            uid = ','.join((uid, str(len(contents))))
            store(uid)
            return uid
        else:
            fd = file(path, "rb")
            contents = fd.read()
            fd.close()
            buf = sha.new(contents)
            uid = ','.join((buf.hexdigest(), str(s.st_size)))
            store(uid)
            return uid

    def store(uid):
        p = archive
        if not os.path.exists(p):
            os.mkdir(p)
        for s in slices:
            p = os.path.join(p, uid[s[0]:s[1]])
            if not os.path.exists(p):
                os.mkdir(p)
        p = os.path.join(p, uid[lastslice[0]:lastslice[1]])
        fd = file(p, "wb")
        fd.close()

    if __name__ == '__main__':
        import sys
        from optparse import OptionParser
        parser = OptionParser(usage="usage: %prog TREE ARCHIVE N1 ... Nk")
        (options, args) = parser.parse_args()
        if len(args) < 3:
            print >>sys.stderr, "expected at least 3 positional arguments"
            sys.exit(1)
        tree = args[0]
        archive = args[1]
        prev = 0
        for a in args[2:]:
            try:
                next = prev + int(a)
                slices.append((prev, next))
                prev = next
            except ValueError:
                print >>sys.stderr, "positional argument is not an integer:", a
                sys.exit(1)
        lastslice = (next, None)   # the remainder of the key is the file name
        recurse(tree)
        sys.exit(0)

info.py:

    import os, os.path, stat

    info = []
    archive = None
    total_keys = 0
    total_dirs = 0

    def collect_info(path, i):
        global total_dirs, total_keys
        s = os.stat(path)
        if stat.S_ISDIR(s.st_mode):
            total_dirs += 1
            l = os.listdir(path)
            n = len(l)
            if i == len(info):
                info.append(n)
            elif n > info[i]:
                info[i] = n
            i += 1
            for f in l:
                collect_info(os.path.join(path, f), i)
        else:
            total_keys += 1

    def print_info():
        i = 0
        for n in info:
            print "Max keys at level %2s: %7s" % (i, n)
            i += 1
        print "Total number of dirs: %7s" % total_dirs
        print "Total number of keys: %7s" % total_keys
        fd = os.popen("du -csh %s" % archive, "r")
        s = fd.read()
        fd.close()
        s = s.split()[0]
        print "Disk footprint      : %7s" % s

    if __name__ == '__main__':
        import sys
        from optparse import OptionParser
        parser = OptionParser(usage="usage: %prog ARCHIVE")
        (options, args) = parser.parse_args()
        if len(args) != 1:
            print >>sys.stderr, "expected exactly 1 positional argument"
            sys.exit(1)
        archive = args[0]
        collect_info(archive, 0)
        print_info()
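For concreteness, the key format and slicing used by build.py can be
rendered in a few lines of modern Python (my own sketch; `key_for`
and `index_path` are invented names mirroring the script's logic):

```python
import hashlib

def key_for(data):
    """build.py's key format for a file: SHA1,SIZE."""
    return "%s,%d" % (hashlib.sha1(data).hexdigest(), len(data))

def index_path(key, widths):
    """Split `key` into prefix directories of the given widths,
    with the remainder of the key as the file name."""
    parts, pos = [], 0
    for w in widths:
        parts.append(key[pos:pos + w])
        pos += w
    parts.append(key[pos:])
    return "/".join(parts)

key = key_for(b"hello\n")
assert key.split(",")[1] == "6"                         # SIZE part of the key
assert index_path(key, [4, 4]) == key[:4] + "/" + key[4:8] + "/" + key[8:]
assert index_path("f572d3,6", [2]) == "f5/72d3,6"       # Linus' 1-level scheme
```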
Re: chunking (Re: [ANNOUNCEMENT] /Arch/ embraces `git')
On Wed, 20 Apr 2005, Linus Torvalds wrote:

> What's the disk usage results? I'm on ext3, for example, which means
> that even small files invariably take up 4.125kB on disk (with the
> inode). Even uncompressed, most source files tend to be small.
> Compressed, I'm seeing the median blob size being ~1.6kB in my
> trivial checks. That's blobs only, btw.

I'm working on it. The format was chosen so that blobs under 1 block
long *stay* 1 block long; i.e. there's no 'chunk plus index file'
overhead. So the chunking should only kick in on multiple-block
files.

I hacked 'convert-cache' to do the conversion and it's running out of
memory on linux-2.6.git, however --- I found a few memory leaks in
your code =) but I certainly seem to be missing a big one still
(maybe it's in my code!).

When I get this working properly, my plan is to do a number of runs
over the linux-2.6 archive trying out various combinations of
chunking parameters. I *will* be watching both 'real' disk usage
(bunged up to block boundaries) and 'ideal' disk usage (on a
reiserfs-type system). The goal is to improve both, but if I can
improve 'ideal' usage significantly with a minimal penalty in 'real'
usage then I would argue it's still worth doing, since that will
improve network times.

The handshaking penalties you mention are significant, but that's why
rsync uses a pipelined approach. The 'upstream' part of your
full-duplex pipe is 'free' while you've got bits clogging your
'downstream' pipe. The wonders of full-duplex...

Anyway, "numbers talk, etc". I'm working on them.

--scott

LIONIZER LCPANES shortwave MKSEARCH ESGAIN Saddam Hussein Rijndael
WASHTUB Morwenstow ZPSEMANTIC SKIMMER cryptographic FJHOPEFUL
assassination
( http://cscott.net/ )
Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
From: [EMAIL PROTECTED]

Thank you for your experiment. I'm not surprised by the result, but
it is very nice to know that my expectations are right.

I think that to a large extent you are seeing artifacts of the
questionable trade-offs that (reports tell me) the ext* filesystems
make. With a different filesystem, the results would be very
different.

I'm imagining a blob database containing many revisions of the linux
kernel. It will contain millions of blobs.

It's fine that some filesystems and some blob operations work fine on
a directory with millions of files, but what about other operations
on the database? I pity the poor program that has to `readdir'
through millions of files.

That said: I may add an optional flat-directory format to my library,
just to avoid issues such as those you raise over the next couple of
years.

-t
Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
On Wed, 2005-04-20 at 19:15 +0200, [EMAIL PROTECTED] wrote:
...
> As data, I used my /usr/src/linux which uses 301M and contains 20753
> files and 1389 directories. To compute the key for a directory, I
> considered that its contents were a mapping from names to keys.

I suppose if you used the blob archive for storing many revisions the
number of stored blobs would be much higher. However even then we can
estimate that the maximum number of stored blobs will be in the order
of millions.

> When constructing the indexed archive, I actually stored empty files
> instead of blobs because I am only interested in overhead.
>
> Using your suggested indexing method that uses [0:4] as the 1st
> level key and

[0:3]

> [4:8] as the 2nd level key, I obtain an indexed archive that
> occupies 159M, where the top level contains 18665 1st level keys,
> the largest first level dir contains 5 entries, and all 2nd level
> dirs contain exactly 1 entry.

Yes, it really doesn't make much sense to have such big keys on the
directories. If we assume that SHA1 is a really good hashing
function, so that the probability of any hash value is the same, this
would allow storing 2^16 * 2^16 * 2^16 blobs with approximately the
same directory usage.

> Using Linus suggested 1 level [0:2] indexing, I obtain an indexed
> archive that

[0:1] I suppose

> occupies 1.8M, where the top level contains 256 1st level keys, and
> where the largest 1st level dir contains 110 entries.

The question is how many entries in a directory is the optimal
compromise between space and the speed of access to its files. If we
suppose the maximum number of stored blobs in the order of millions,
probably the optimal indexing would be 1 level [0:2] indexing or 2
levels [0:1] [2:3]. However it would be necessary to do some
benchmarking first before setting this in stone.

--
Tomas Mraz <[EMAIL PROTECTED]>
chunking (Re: [ANNOUNCEMENT] /Arch/ embraces `git')
On Wed, 20 Apr 2005, C. Scott Ananian wrote:
>
> I'm hoping my 'chunking' patches will fix this. This ought to reduce
> the size of the object store by (in effect) doing delta compression;
> rsync will then Do The Right Thing and only transfer the needed
> deltas. Running some benchmarks right now to see how well it lives
> up to this promise...

What's the disk usage results? I'm on ext3, for example, which means
that even small files invariably take up 4.125kB on disk (with the
inode). Even uncompressed, most source files tend to be small.
Compressed, I'm seeing the median blob size being ~1.6kB in my
trivial checks. That's blobs only, btw.

My point being that about 75% of all blobs already take up less than
the minimal amount of space that most filesystems can sanely
allocate. And I'm _not_ going to say "you have to use reiserfs" with
git. So the disk fragmentation really does matter. It doesn't help to
make a file smaller than 4kB, it hurts - while that can be offset by
sharing chunks, it might not be.

Also, while network performance is important, so is the handshaking
on which objects to get. Lots of small objects potentially need lots
of handshaking to figure out _which_ of the objects to do.

Linus
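Linus' 4.125kB figure is one 4096-byte data block plus a 128-byte
inode. A sketch of that accounting (the block and inode sizes are my
assumptions about classic ext3, not stated in the message):

```python
BLOCK = 4096      # ext3 data block size (assumed)
INODE = 128       # classic ext3 inode size (assumed)

def real_usage(size):
    """On-disk bytes for one file: whole blocks plus its inode."""
    blocks = max(1, -(-size // BLOCK))   # ceiling division, at least one block
    return blocks * BLOCK + INODE

# a ~1.6kB compressed blob still costs a full block plus inode: 4.125kB
assert real_usage(1600) == 4096 + 128
# shrinking a file below one block saves nothing on such a filesystem
assert real_usage(100) == real_usage(4000)
```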
Re: [ANNOUNCEMENT] /Arch/ embraces `git'
On Wed, 20 Apr 2005, Petr Baudis wrote:

> I think one thing git's objects database is not very well suited for
> are network transports. You want to have something smart doing the
> transports, comparing trees so that it can do some delta
> compression; that could probably reduce the amount of data needed to
> be sent significantly.

I'm hoping my 'chunking' patches will fix this. This ought to reduce
the size of the object store by (in effect) doing delta compression;
rsync will then Do The Right Thing and only transfer the needed
deltas. Running some benchmarks right now to see how well it lives up
to this promise...

--scott

terrorist AEROPLANE munitions PAPERCLIP MI5 Morwenstow WSHOOFS
CABOUNCE colonel Yakima AES MI6 nuclear NSA Cocaine Columbia
plastique LICOZY
( http://cscott.net/ )
Re: [ANNOUNCEMENT] /Arch/ embraces `git'
Dear diary, on Wed, Apr 20, 2005 at 12:00:36PM CEST, I got a letter
where Tom Lord <[EMAIL PROTECTED]> told me that...

> From the /Arch/ perspective: `git' technology will form the basis of
> a new archive/revlib/cache format and the basis of new network
> transports.

I think one thing git's objects database is not very well suited for
are network transports. You want to have something smart doing the
transports, comparing trees so that it can do some delta compression;
that could probably reduce the amount of data needed to be sent
significantly.

> From the `git' perspective, /Arch/ will replace the lame "directory
> cache" component of `git' with a proper revision control system.

I'm not sure if you fully grasped git's philosophy yet. The
"directory cache" component is not by itself any revision control
system - it is merely a staging area for any revision system on top
of it (IOW: subordinate, not competitor).

> I started here:
>
>    http://www.seyza.com/=clients/linus/tree/index.html
>
> and for those interested in `git'-theory, a good place to start is
>
>    http://www.seyza.com/=clients/linus/tree/src/liblob/index.html

These pages are surely very nice; unfortunately I have to enjoy them
only from the "HTML source" view. The HTML seems completely broken,
containing unterminated comments like "
Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
Hi Tom,

just as a datapoint, here is an experiment I carried out. I wanted to
evaluate how much overhead is incurred by using several levels of
directories to implement a discriminating index. I used the key
format you specified: SHA1,SIZE

As data, I used my /usr/src/linux which uses 301M and contains 20753
files and 1389 directories. To compute the key for a directory, I
considered that its contents were a mapping from names to keys.

When constructing the indexed archive, I actually stored empty files
instead of blobs because I am only interested in overhead.

Using your suggested indexing method that uses [0:4] as the 1st level
key and [4:8] as the 2nd level key, I obtain an indexed archive that
occupies 159M, where the top level contains 18665 1st level keys, the
largest first level dir contains 5 entries, and all 2nd level dirs
contain exactly 1 entry.

Using Linus suggested 1 level [0:2] indexing, I obtain an indexed
archive that occupies 1.8M, where the top level contains 256 1st
level keys, and where the largest 1st level dir contains 110 entries.

This experiment was performed on an ext3 file system.

Cheers,

--Denys
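The near-flat second level Denys observed is what uniform hashing
predicts for a 4-hex-digit first-level key: with roughly 21k keys
spread over 65536 possible prefixes, almost every key lands in its
own directory. A back-of-envelope check (my own, not from the thread;
the key count 21662 is the total Denys reports for this tree):

```python
n = 21662          # keys indexed from the /usr/src/linux tree
m = 16 ** 4        # possible 4-hex-digit first-level prefixes

# With uniformly distributed hashes, the expected number of occupied
# first-level directories is m * (1 - (1 - 1/m)**n).
expected = m * (1 - (1 - 1.0 / m) ** n)
assert 18000 < expected < 19000    # close to the observed 18665
```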
Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
Way to go.

-Miles

--
Do not taunt Happy Fun Ball.
[ANNOUNCEMENT] /Arch/ embraces `git'
`git', by Linus Torvalds, contains some very good ideas and some very
entertaining source code -- recommended reading for hackers.

/GNU Arch/ will adopt `git':

From the /Arch/ perspective: `git' technology will form the basis of
a new archive/revlib/cache format and the basis of new network
transports.

From the `git' perspective, /Arch/ will replace the lame "directory
cache" component of `git' with a proper revision control system.

In my view, the core ideas in `git' are quite profound and deserve an
impeccable implementation. This is practical because those ideas are
also pretty simple.

I started here:

    http://www.seyza.com/=clients/linus/tree/index.html

and for those interested in `git'-theory, a good place to start is

    http://www.seyza.com/=clients/linus/tree/src/liblob/index.html

(Linus is not literally a "client" of mine. That's just the directory
where this goes.)

-t