Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-21 Thread Tom Lord


  > However you're
  > right that the original structure proposed by Linus is too flat.

That was the only point I *meant* to defend.   The rest was error.

-t


Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-21 Thread Tom Lord


   > Tom, please stop this ext* filesystem bashing ;-) 

For one thing... yes, I'm totally embarrassed about this issue.
I made a late-night math error in a spec.  *Hopefully* I would
have noticed it on my own as I coded to that spec, but y'all
have been wonderful at pointing out my mistake to me even
though I initially defended it.

As for ext* bashing: it's not bashing, exactly.  I/O bandwidth
gets a little better, disks get a bunch cheaper --- then ext*
doesn't look bad at all in this respect.  And we're awfully close
to that point.

Meanwhile, for a span of time measured in years, I've gotten complaints
from ext* users about software that is just fine on other filesystems,
over issues like the allocation size of small files.
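
To make that complaint concrete (a rough sketch of the usual arithmetic;
the 4K block size is an assumed typical ext2/ext3 configuration, not a
figure from this thread):

    # each non-empty file occupies at least one full block, however small
    block, size, nfiles = 4096, 100, 10**6
    wasted = nfiles * (block - size)
    print "~%.1fG wasted by 1M 100-byte files" % (wasted / 1024.0 ** 3)
    # ~3.7G wasted by 1M 100-byte files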

So ext*, from my perspective, was a little too far ahead of its time
and, yes, my complaints about it are just about reaching their
expiration date.

Anyway, thanks for all the sanity about directory layout.  Really,
it was just an "I'm too sleepy to be doing this right now" error.

-t



Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-21 Thread Tom Lord

   > [your 0:3/4:7 directory hierarchy is horked]

Absolutely.  I made a dumb mistake the night I wrote that spec,
and I'm embarrassed that I initially defended it.  It was an arithmetic
error.  Thanks, this time, for your persistence in pointing it out.

-t


Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-21 Thread Tom Lord


   > Yes, it really doesn't make much sense to have such big keys on the
   > directories.

It's official... I'm blushing wildly.  Thank you for the various
replies that pointed out my thinko.

That part of my spec hasn't been coded yet --- I just wrote text.  It
really was a silly late-night error of the sort: "hmm... let's see, 4 hex
digits plus 4 hex digits, that's 16 bits, sounds about right."
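
For the record, the arithmetic (a quick sketch, not part of the original
mail): each hex digit carries 4 bits, so two 4-digit subkeys span 32
bits, not 16, and each 4-digit level gives a 65536-way fanout.

    for k in (2, 4):
        print "%d hex digits = %d bits = %d dirs per level" % (k, 4 * k, 16 ** k)
    # 2 hex digits =  8 bits =   256 dirs per level
    # 4 hex digits = 16 bits = 65536 dirs per level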

Really, I'll fix it.

-t


Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-21 Thread Tom Lord

   > Using your suggested indexing method that uses [0:4] as the 1st level key and
    [0:3]
   > [4:8] as the 2nd level key, I obtain an indexed archive that occupies 159M,
   > where the top level contains 18665 1st level keys, the largest first level dir
   > contains 5 entries, and all 2nd level dirs contain exactly 1 entry.


That's just a mistake in the spec.  The format should probably be
multi-level but, yes, the fanout I suggested is currently quite bogus.
When I write that part of the code (today or tomorrow) I'll fix it.

A few people pointed that out.  Thanks.

-t



Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-21 Thread duchier
Tomas Mraz <[EMAIL PROTECTED]> writes:

>> Btw, if, as you indicate above, you do believe that a 1 level indexing should
>> use [0:2], then it doesn't make much sense to me to also suggest that a 2
>> level indexing should use [0:1] as primary subkey :-)
>
> Why do you think so? IMHO we should always target a similar number of
> files/subdirectories in the directories of the blob archive. So if I
> suppose that the archive would contain at most 16 million files, then
> the possible indexing schemes are either 1 level with key length 3
> (each directory would contain ~4096 files) or 2 levels with key
> length 2 (each directory would contain ~256 files).
> Which one is better could of course be filesystem and hardware
> dependent.

First off, I have been using python slice notation, so when I write [0:2] I mean
a key of length 2 (the second index is not included).  I now realize that when
you wrote the same you meant to include the second index.
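
A one-line illustration of the two readings (mine, added for clarity):

    print "abcdef"[0:2]   # python slices exclude the end index -> "ab"
    # in the inclusive reading, [0:2] would instead mean chars 0..2, "abc"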

I believe that our disagreement comes from the fact that we are asking different
questions.  You consider the question of how to best index a fixed database and
I consider the question of how to best index an ever increasing database.

Now consider why we even want multiple indexing levels: presumably this is
because certain operations become too costly when the size of a directory
becomes too large.  If that's not the case, then we might as well just have one
big flat directory - perhaps that's even a viable option for some
filesystems.[1]

  [1] there is the additional consideration that a hierarchical system
  implements a form of key compression by sharing key prefixes.  I don't know at
  what point such an effect becomes beneficial, if ever.

Now suppose we need at least one level of indexing.  Under an assumption of
uniform distribution of bits in keys, as more objects are added to the database,
the lower levels are going to fill up uniformly.  Therefore at those levels we
are again faced with exactly the same indexing problem and thus should come up
with exactly the same answer.

This is why I believe that the scheme I proposed is best: when a bottom level
directory fills up past a certain size, introduce under it an additional level,
and reindex the keys.  Since the "certain size" is fixed, this is a constant
time operation.
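
A minimal sketch of that scheme (my own illustration, not the code Denys
mentions having; the function name and 2-character subkey are
assumptions, and the 256-entry threshold is the figure he cites
elsewhere in this thread):

    import os, os.path

    CAP = 256   # the fixed "certain size" past which a leaf directory splits
    K   = 2     # hex chars consumed per index level

    def adaptive_store(root, key):
        # root must already exist; starts flat and splits on demand
        p = root
        entries = os.listdir(p)
        # descend through existing index levels (directories of directories)
        while entries and os.path.isdir(os.path.join(p, entries[0])):
            p = os.path.join(p, key[:K])
            key = key[K:]
            if not os.path.exists(p):
                os.mkdir(p)
            entries = os.listdir(p)
        if len(entries) >= CAP:
            # leaf is full: introduce one more level beneath it and reindex
            # its keys; CAP is fixed, so the split is a constant-time operation
            for e in entries:
                sub = os.path.join(p, e[:K])
                if not os.path.exists(sub):
                    os.mkdir(sub)
                os.rename(os.path.join(p, e), os.path.join(sub, e[K:]))
            p = os.path.join(p, key[:K])
            if not os.path.exists(p):
                os.mkdir(p)
            key = key[K:]
        file(os.path.join(p, key), "wb").close()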

One could also entertain the idea of reindexing not just a bottom level
directory but an entire subtree of the database (this would be closer to your
idea of finding an optimal reindexing of just this part of the database).
However this has the disadvantage that the operation's cost grows exponentially
with the depth of the tree.
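
To put rough numbers on that growth (my arithmetic; the 256-way fanout
and 256-key leaf capacity are the figures used in this thread):

    f, cap = 256, 256   # fanout per level, keys per full leaf directory
    for d in (1, 2, 3):
        print "subtree %d level(s) above the leaves: up to %d keys" % (d, cap * f ** d)
    # up to 65536, then 16777216, then 4294967296 keys to reindex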

Cheers,

--Denys



Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-21 Thread Tomas Mraz
On Thu, 2005-04-21 at 11:09 +0200, Denys Duchier wrote:
> Tomas Mraz <[EMAIL PROTECTED]> writes:
> 
> > If we suppose the maximum number of stored blobs is in the order of millions,
> > probably the optimal indexing would be 1 level [0:2] indexing or 2
> > levels [0:1] [2:3]. However it would be necessary to do some
> > benchmarking first before setting this in stone.
> 
> As I have suggested in a previous message, it is trivial to implement adaptive
> indexing: there is no need to hardwire a specific indexing scheme.  Furthermore,
> I suspect that the optimal size of subkeys may well depend on the filesystem.
> My experiments seem to indicate that subkeys of length 2 achieve an excellent
> compromise between discriminatory power and disk footprint on ext2.
> 
> Btw, if, as you indicate above, you do believe that a 1 level indexing should
> use [0:2], then it doesn't make much sense to me to also suggest that a 2
> level indexing should use [0:1] as primary subkey :-)

Why do you think so? IMHO we should always target a similar number of
files/subdirectories in the directories of the blob archive. So if I
suppose that the archive would contain at most 16 million files, then
the possible indexing schemes are either 1 level with key length 3
(each directory would contain ~4096 files) or 2 levels with key
length 2 (each directory would contain ~256 files).
Which one is better could of course be filesystem and hardware
dependent.
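
The arithmetic behind those two figures (a quick check, not part of the
original mail):

    blobs = 16 * 10 ** 6
    print "1 level,  key length 3: ~%d files per dir" % (blobs / 16 ** 3)
    print "2 levels, key length 2: ~%d files per leaf dir" % (blobs / (16 ** 2 * 16 ** 2))
    # ~3906 and ~244, i.e. roughly the ~4096 and ~256 cited above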

Of course it might be best to allow adaptive indexing, but I think
some benchmarking should be done first, and it's possible that some
fixed scheme could be chosen as optimal.

-- 
Tomas Mraz <[EMAIL PROTECTED]>



Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-21 Thread Denys Duchier
Tomas Mraz <[EMAIL PROTECTED]> writes:

> If we suppose the maximum number of stored blobs is in the order of millions,
> probably the optimal indexing would be 1 level [0:2] indexing or 2
> levels [0:1] [2:3]. However it would be necessary to do some
> benchmarking first before setting this in stone.

As I have suggested in a previous message, it is trivial to implement adaptive
indexing: there is no need to hardwire a specific indexing scheme.  Furthermore,
I suspect that the optimal size of subkeys may well depend on the filesystem.
My experiments seem to indicate that subkeys of length 2 achieve an excellent
compromise between discriminatory power and disk footprint on ext2.

Btw, if, as you indicate above, you do believe that a 1 level indexing should
use [0:2], then it doesn't make much sense to me to also suggest that a 2 level
indexing should use [0:1] as primary subkey :-)

Cheers,

-- 
Dr. Denys Duchier - IRI & LIFL - CNRS, Lille, France
AIM: duchierdenys


Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-21 Thread Tomas Mraz
On Wed, 2005-04-20 at 16:04 -0700, Tom Lord wrote:

> I think that to a large extent you are seeing artifacts
> of the questionable trade-offs that (reports tell me) the
> ext* filesystems make.   With a different filesystem, the 
> results would be very different.

Tom, please stop this ext* filesystem bashing ;-)  Even with a filesystem
that compresses the tails of files into one filesystem block, it wouldn't
change the fact that there are potentially (and very probably, even with
blob counts in the order of millions) 65536 directories on the first
level.  That doesn't help much in reading the first level quickly.

> I'm imagining a blob database containing many revisions of the linux
> kernel.  It will contain millions of blobs.
> 
> It's fine that some filesystems and some blob operations work fine
> on a directory with millions of files but what about other operations
> on the database?   I pity the poor program that has to `readdir' through
> millions of files.

Even with millions of files this key structure doesn't make much sense -
the keys on the first and second levels are too long. However, you're
right that the original structure proposed by Linus is too flat.

-- 
Tomas Mraz <[EMAIL PROTECTED]>



Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-20 Thread Denys Duchier
Tom Lord <[EMAIL PROTECTED]> writes:

> Thank you for your experiment.

you are welcome.

> I think that to a large extent you are seeing artifacts
> of the questionable trade-offs that (reports tell me) the
> ext* filesystems make.   With a different filesystem, the 
> results would be very different.

No, this is not the only thing that we observe.  For example, here are the
reports for the following two experiments:

Indexing method = [2]

Max keys at level  0:     256
Max keys at level  1:     108
Total number of dirs:     257
Total number of keys:   21662
Disk footprint  :    1.8M
Indexing method = [4 4]

Max keys at level  0:   18474
Max keys at level  1:       5
Max keys at level  2:       1
Total number of dirs:   40137
Total number of keys:   21662
Disk footprint  :    157M

Notice the huge number of directories in the second experiment; they
don't help at all in performing discrimination.

> I'm imagining a blob database containing many revisions of the linux
> kernel.  It will contain millions of blobs.

It is very easy to write code that uses an adaptive discrimination method
(i.e. when a directory becomes too full, introduce an additional level of
discrimination and rehash).  In fact I have code that does that (rehashing
when the size of a leaf directory exceeds 256), but the [2] method above
doesn't even need it, even though it has 21662 keys.

Just in case there is some interest, I attach below the python scripts which I
used for my experiments:

To create an indexed archive:

python build.py SRC DST N1 ... Nk

where SRC is the root directory of the tree to be indexed, and DST names the
root directory of the indexed archive to be created.  N1 through Nk are integers
that each indicate how many chars to chop off the key to create the next level
indexing key.

python info.py DST

collects and then prints out statistics about an indexed archive.

For example, the invocation that relates to your original proposal would be:

python build.py /usr/src/linux store 4 4
python info.py store

# build.py
import os, os.path, stat, sha

tree      = None
archive   = None
slices    = []
lastslice = (0, None)

def recurse(path):
    # hash a tree bottom-up: a file's uid is "sha1,size"; a directory's
    # uid is derived from the listing of its entries' names and uids
    s = os.stat(path)
    if stat.S_ISDIR(s.st_mode):
        print path
        contents = []
        for n in os.listdir(path):
            uid = recurse(os.path.join(path, n))
            contents.append('\t'.join((n, uid)))
        contents = '\n'.join(contents)
        buf = sha.new(contents)
        uid = ','.join((buf.hexdigest(), str(len(contents))))
        store(uid)
        return uid
    else:
        fd = file(path, "rb")
        contents = fd.read()
        fd.close()
        buf = sha.new(contents)
        uid = ','.join((buf.hexdigest(), str(s.st_size)))
        store(uid)
        return uid

def store(uid):
    # one directory level per slice of the uid; the stored file is left
    # empty because only the indexing overhead is being measured
    p = archive
    if not os.path.exists(p):
        os.mkdir(p)
    for s in slices:
        p = os.path.join(p, uid[s[0]:s[1]])
        if not os.path.exists(p):
            os.mkdir(p)
    p = os.path.join(p, uid[lastslice[0]:lastslice[1]])
    fd = file(p, "wb")
    fd.close()

if __name__ == '__main__':
    import sys
    from optparse import OptionParser
    parser = OptionParser(usage="usage: %prog TREE ARCHIVE N1 ... Nk")
    (options, args) = parser.parse_args()
    if len(args) < 3:
        print >>sys.stderr, "expected at least 3 positional arguments"
        sys.exit(1)
    tree    = args[0]
    archive = args[1]
    prev    = 0
    for a in args[2:]:
        try:
            next = prev + int(a)
            slices.append((prev, next))
            prev = next
        except ValueError:
            print >>sys.stderr, "positional argument is not an integer:", a
            sys.exit(1)
    lastslice = (prev, None)   # the rest of the uid names the leaf file
    recurse(tree)
    sys.exit(0)

# info.py
import os, os.path, stat

info = []
archive = None
total_keys = 0
total_dirs = 0

def collect_info(path, i):
    # record the maximum fanout observed at each level, plus totals
    global total_dirs, total_keys
    s = os.stat(path)
    if stat.S_ISDIR(s.st_mode):
        total_dirs += 1
        l = os.listdir(path)
        n = len(l)
        if i == len(info):
            info.append(n)
        elif n > info[i]:
            info[i] = n
        i += 1
        for f in l:
            collect_info(os.path.join(path, f), i)
    else:
        total_keys += 1

def print_info():
    i = 0
    for n in info:
        print "Max keys at level %2s: %7s" % (i, n)
        i += 1
    print "Total number of dirs: %7s" % total_dirs
    print "Total number of keys: %7s" % total_keys
    fd = os.popen("du -csh %s" % archive, "r")
    s = fd.read()
    fd.close()
    s = s.split()[0]
    print "Disk footprint  : %7s" % s

if __name__ == '__main__':
    import sys
    from optparse import OptionParser
    parser = OptionParser(usage="usage: %prog ARCHIVE")
    (options, args) = parser.parse_args()
    if len(args) != 1:
        print >>sys.stderr, "expected exactly 1 positional argument"
        sys.exit(1)
    archive = args[0]
    collect_info(archive, 0)
    print_info()

Re: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'

2005-04-20 Thread Tomas Mraz
On Wed, 2005-04-20 at 19:15 +0200, [EMAIL PROTECTED] wrote:
...
> As data, I used my /usr/src/linux which uses 301M and contains 20753 files and
> 1389 directories.  To compute the key for a directory, I considered that its
> contents were a mapping from names to keys.

I suppose if you used the blob archive for storing many revisions, the
number of stored blobs would be much higher. However, even then we can
estimate that the maximum number of stored blobs will be in the order of
millions.

> When constructing the indexed archive, I actually stored empty files instead
> of blobs because I am only interested in overhead.
> 
> Using your suggested indexing method that uses [0:4] as the 1st level key and
 [0:3]
> [4:8] as the 2nd level key, I obtain an indexed archive that occupies 159M,
> where the top level contains 18665 1st level keys, the largest first level dir
> contains 5 entries, and all 2nd level dirs contain exactly 1 entry.

Yes, it really doesn't make much sense to have such big keys on the
directories. If we assume that SHA1 is a really good hash function, so
that the probability of any hash value is the same, this would allow
storing 2^16 * 2^16 * 2^16 blobs with approximately the same directory
usage.
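
Spelled out (my arithmetic, not in the original mail): two 16-bit
directory levels plus a 16-bit leaf level address 2^48 blobs before the
distribution becomes uneven.

    print 2 ** 16 * 2 ** 16 * 2 ** 16   # 281474976710656, i.e. ~2.8e14 blobs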

> Using Linus suggested 1 level [0:2] indexing, I obtain an indexed archive that
[0:1] I suppose
> occupies 1.8M, where the top level contains 256 1st level keys, and where the
> largest 1st level dir contains 110 entries.

The question is how many entries per directory is the optimal compromise
between space and the speed of access to its files.

If we suppose the maximum number of stored blobs is in the order of millions,
probably the optimal indexing would be 1 level [0:2] indexing or 2
levels [0:1] [2:3]. However it would be necessary to do some
benchmarking first before setting this in stone.

-- 
Tomas Mraz <[EMAIL PROTECTED]>


