It appears that increasing "MAX_DIRENT_COUNT" in
src/kernel/linux2.6/pvfs2-dev-proto.h has turned out to be a bad thing
for us. We had also set it to 96 and found some issues during stress
testing.
We hit a scenario where a single top-level directory on our file
system contained more than 800,000 files and directories, with many
subdirectories containing 10,000+ files each. When we ran 'ls -Rl' on
the top-level directory, after about 8 hours the 'ls' command was
consuming 800MB+ of memory, and it eventually exited with a "memory
exhausted" error. We definitely have some paths that are long enough
that 96 of them won't fit into a single 4K page.
We backed out only the "MAX_DIRENT_COUNT" change in
src/kernel/linux2.6/pvfs2-dev-proto.h, put it back at 0x00000020 (32),
and reran the test. The 'ls -Rl' now consistently runs in about an hour
and finishes correctly.
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Phil Carns
Sent: Thursday, September 11, 2008 9:33 AM
To: Bart Taylor
Cc: [email protected]
Subject: Re: [Pvfs2-developers] Listing performance patch
Hi Bart,
I just fixed a silly bug in our readdir logic, and your patch now
works fine for the case I was looking at. I applied the dirent
increase patch to trunk.
I now get the correct number of getdents calls on PVFS (matching
ext3 for the same directory):
getdents64(3, /* 170 entries */, 4096) = 4080
getdents64(3, /* 132 entries */, 4096) = 3168
getdents64(3, /* 0 entries */, 4096) = 0
So even with just 300 entries, your patch cuts an ls from 11 getdents
system calls down to 3.
Thanks!
-Phil
Phil Carns wrote:
I looked at the code a little just now. The getdents system call
passes a filldir() callback function into the file system readdir()
implementation, which lets it fill entries until the user's dirent
buffer is full. The dirent records at this level use variable-length
names. The only remaining cap at this point is the size of the
dirent buffer passed in from user space (and any artificial cap
introduced by the file system implementation).
http://lxr.linux.no/linux+v2.6.26.5/fs/readdir.c#L270
http://lxr.linux.no/linux+v2.6.26.5/fs/readdir.c#L232
If I do an strace on a directory with 300 entries on ext3, this is
what happens:
getdents64(3, /* 170 entries */, 4096) = 4080
getdents64(3, /* 132 entries */, 4096) = 3168
getdents64(3, /* 0 entries */, 4096) = 0
If I do the same thing on a PVFS volume, this is what happens:
getdents64(3, /* 34 entries */, 4096) = 816
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 32 entries */, 4096) = 768
getdents64(3, /* 12 entries */, 4096) = 288
getdents64(3, /* 0 entries */, 4096) = 0
The latter is not filling up the getdents buffer because our code
stops at 32 entries per iteration. If I then apply Bart's patch, each
getdents system call fits many more entries, but on my box at least
(2.6.24-19, 32-bit, current PVFS trunk) something new breaks:
getdents64(3, /* 170 entries */, 4096) = 4080
getdents64(3, /* 0 entries */, 4096) = 0
It looks like it stopped after one getdents (the actual output from
ls only shows 170 entries).
So... I would like to apply this patch, but first I need to dig a
little more and find out what bug on my system is making it stop
after the first getdents call. It must not be handling the token
correctly when PVFS returns more entries than filldir() can consume.
-Phil
Rob Ross wrote:
Has the internal kernel value changed since we last looked?
Rob
On Sep 4, 2008, at 4:16 PM, Phil Carns wrote:
Sam Lang wrote:
Hi Bart,
Thanks for the patch. For users with that many files in a
directory, using pvfs2-ls is probably a good alternative.
The kernel does readdir requests 32 entries at a time, so
increasing MAX_NUM_DIRENTS won't help for ls. Long listings
require getting the size of each file, which in PVFS is fairly
expensive.
Unfortunately, we haven't kept up with the readdirplus
implementation; some bugs have probably crept in since Murali
added that tool. If you were motivated to look at where the
servers were crashing, we'd certainly be interested in helping
with the debugging there.
Thanks again,
-sam
It does look like ls improved with the patches for some reason,
though.
The 256 and 512 results are also close enough to each other to be
noise. It looks like most of the benefit came from the jump from
32/64 to 256.
-Phil
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers