It appears that increasing "MAX_DIRENT_COUNT" in src/kernel/linux2.6/pvfs2-dev-proto.h has turned out to be a bad thing for us. We had also set it to 96, and found some issues during stress testing.
We've hit a scenario where a single directory on our file system contained > 800,000 files/directories, with many directories containing 10,000+ files each. When we executed 'ls -Rl' on the top level directory, after about 8 hours, the 'ls' command was consuming 800MB+ memory and eventually exited with a "memory exhausted" error. We definitely have some paths that are long enough that 96 of them won't fit into a single 4K page.

We backed out only the "MAX_DIRENT_COUNT" in the src/kernel/linux2.6/pvfs2-dev-proto.h and put it back at 0x00000020 (32) and reran the test. The 'ls -Rl' consistently runs in about an hour now, and finishes correctly.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Phil Carns
Sent: Thursday, September 11, 2008 9:33 AM
To: Bart Taylor
Cc: [email protected]
Subject: Re: [Pvfs2-developers] Listing performance patch

Hi Bart,

I fixed a silly bug in our readdir logic just now, and now your patch works fine for the case I was looking at. I applied the dirent increase patch to trunk.

I now get the correct number of getdents calls (using ext3 for comparison) on PVFS:

getdents64(3, /* 170 entries */, 4096) = 4080
getdents64(3, /* 132 entries */, 4096) = 3168
getdents64(3, /* 0 entries */, 4096) = 0

So even with just 300 entries your patch takes us from 11 getdents system calls down to 3 to do an ls.

Thanks!
-Phil

Phil Carns wrote:
> I looked at the code a little just now. The getdents system call passes
> a filldir() callback function into the file system readdir()
> implementation that lets it fill entries until the user's dentry buffer
> is full. The dentries at this level use variable length strings. The
> only remaining cap at this point is the size of the dentry buffer passed
> in from user space (and any artificial cap introduced by the file system
> implementation).
>
> http://lxr.linux.no/linux+v2.6.26.5/fs/readdir.c#L270
> http://lxr.linux.no/linux+v2.6.26.5/fs/readdir.c#L232
>
> If I do an strace on a directory with 300 entries on ext3, this is what
> happens:
>
> getdents64(3, /* 170 entries */, 4096) = 4080
> getdents64(3, /* 132 entries */, 4096) = 3168
> getdents64(3, /* 0 entries */, 4096) = 0
>
> If I do the same thing on a PVFS volume, this is what happens:
>
> getdents64(3, /* 34 entries */, 4096) = 816
> getdents64(3, /* 32 entries */, 4096) = 768
> getdents64(3, /* 32 entries */, 4096) = 768
> getdents64(3, /* 32 entries */, 4096) = 768
> getdents64(3, /* 32 entries */, 4096) = 768
> getdents64(3, /* 32 entries */, 4096) = 768
> getdents64(3, /* 32 entries */, 4096) = 768
> getdents64(3, /* 32 entries */, 4096) = 768
> getdents64(3, /* 32 entries */, 4096) = 768
> getdents64(3, /* 12 entries */, 4096) = 288
> getdents64(3, /* 0 entries */, 4096) = 0
>
> The latter is not filling up the getdents buffer because our code is
> stopping at 32 entries per iteration. If I then apply Bart's patch,
> things improve in terms of how much it fits into one getdents system
> call, but on my box at least (2.6.24-19, 32bit, current PVFS trunk)
> something new breaks:
>
> getdents64(3, /* 170 entries */, 4096) = 4080
> getdents64(3, /* 0 entries */, 4096) = 0
>
> It looks like it stopped after one getdents (the actual output from ls
> only shows 170 entries).
>
> So... I would like to apply this patch, but first I need to dig a little
> more and find out what the bug is on my system that is making it stop at
> the first getdents call. It must not be handling the token right in the
> case where PVFS returns more entries than filldir() can consume.
>
> -Phil
>
>
> Rob Ross wrote:
>> Has the internal kernel value changed since we last looked?
>>
>> Rob
>>
>> On Sep 4, 2008, at 4:16 PM, Phil Carns wrote:
>>
>>> Sam Lang wrote:
>>>> Hi Bart,
>>>> Thanks for the patch.
>>>> For users with that many files in a
>>>> directory, using pvfs2-ls is probably a good alternative.
>>>> The kernel does readdir requests 32 entries at a time, so increasing
>>>> MAX_NUM_DIRENTS won't help for ls. Long listings require getting
>>>> the size of files, which in PVFS is fairly expensive.
>>>> Unfortunately, we haven't kept up with the readdirplus
>>>> implementation, some bugs have probably crept in since Murali added
>>>> that tool. If you were motivated to look at where the servers were
>>>> crashing, we'd certainly be interested in helping with the debugging
>>>> there.
>>>> Thanks again,
>>>> -sam
>>>
>>> It does look like ls improved with the patches for some reason, though.
>>>
>>> The 256 and 512 results are also just about close enough to be noise.
>>> It looks like most of the benefit came from the jump from 32/64 to 256.
>>>
>>> -Phil
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> [email protected]
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
