On Wed, Mar 24, 2010 at 4:32 PM, Steve Simmons <[email protected]> wrote:
>
> On Mar 18, 2010, at 2:37 AM, Tom Keiser wrote:
>
>> On Wed, Mar 17, 2010 at 7:41 PM, Derrick Brashear <[email protected]> wrote:
>>> On Wed, Mar 17, 2010 at 12:50 PM, Steve Simmons <[email protected]> wrote:
>>>> We've been seeing issues for a while that seem to relate to the number of
>>>> volumes in a single vice partition. The numbers and data are inexact
>>>> because there are so many damned possible parameters that affect
>>>> performance, but it appears that somewhere between 10,000 and 14,000
>>>> volumes performance falls off significantly. That 40% difference in volume
>>>> count results in 2x to 3x falloffs for performance in issues that affect
>>>> the /vicep as a whole - backupsys, nightly dumps, vos listvol, etc.
>>>>
>>
>> First off, could you describe how you're measuring the performance drop-off?
>
> Wall clock, mostly. Operations which touch all the volumes on a server take
> disproportionately longer on servers with 14,000 volumes vs. servers with
> 10,000. The best operations to show this are vos backupsys and our nightly
> dumps, which call vos dump with various parameters on every volume on the
> server.
>
Ok. Well, this likely rules out the volume hash chain suggestion--we don't
directly use the hash table in the volserver (although we do perform at
least two lookups as a consequence of performing fssync ops as part of the
volume transaction). The reason I say it's unlikely is that fssync overhead
is an insignificant component of the execution time for the vos ops you're
talking about.

>> The fact that this relationship b/t volumes and performance is
>> superlinear makes me think you're exceeding a magic boundary (e.g.,
>> you're now causing eviction pressure on some cache where you weren't
>> previously...).
>
> Our estimate too. But before drilling down, it seemed worth checking if
> anyone else has a similar server - ext3 with 14,000 or more volumes in a
> single vice partition - and has seen a difference. Note, tho, that it's not
> #inodes or total disk usage in the partition. The servers that exhibited
> the problem had a large number of mostly empty volumes.
>

Sure. Makes sense. The one thing that does come to mind is that, regardless
of the number of inodes, ISTR some people were having trouble with ext
performance when htree indices were turned on: spatial locality of
reference against the inode tables goes way down when you process files in
the order returned by readdir(), since readdir() in htree mode returns
files in hash order rather than more-or-less inode order. This could
definitely have a huge impact on the salvager [especially
GetVolumeSummary(), and to a lesser extent ListViceInodes() and friends].
I'm less certain how it would affect things in the volserver, but it would
certainly have an effect on operations which delete clones, since the nuke
code also calls ListViceInodes().

In addition, with regard to ext htree indices, I'll pose the (completely
untested) hypothesis that htree indices aren't necessarily a net win for
the namei workload.
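[Editor's note: the readdir()-order locality problem described above has a
well-known mitigation: collect the directory entries first, sort them by
inode number, and only then stat()/open() them, so the inode tables are
read roughly sequentially. A minimal illustrative sketch in Python -- this
is not OpenAFS code, and the function name is invented for the example:]

```python
import os

def entries_in_inode_order(path):
    """List a directory's entries sorted by inode number.

    On htree-enabled ext3/ext4, readdir() returns names in hash order;
    processing files in that order scatters reads all over the inode
    tables. Sorting by d_ino before the stat()/open() pass restores
    rough spatial locality against the on-disk inode tables.
    """
    with os.scandir(path) as it:
        # DirEntry.inode() returns d_ino without an extra stat() call.
        pairs = [(entry.inode(), entry.name) for entry in it]
    pairs.sort()  # ascending inode number ~ on-disk inode table order
    return [name for _, name in pairs]
```

A scanner in the salvager's position would then process files in the
returned order; the benefit grows with directory size and cold caches.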
Given that namei goes to great lengths to avoid large directories (with the
notable exception of the /vicepXX root dir itself), it is conceivable that
htree overhead is actually a net loss. I don't know for sure, but I'd say
it's worth further study. In a volume with files >> dirs you're going to
see on the order of ~256 files per namei directory. Certainly a linear
search of, on average, 128 entries is expensive, but it may still be worth
verifying this empirically, because we don't know how much overhead htree
and its side effects introduce. Regrettably, there don't seem to be any
published results on the threshold above which htree becomes a net win...

Finally, you did run tune2fs -O dir_index <dev> before populating the file
system, right?

>> Another possibility accounting for the superlinearity, which would
>> very much depend upon your workload, is that by virtue of increased
>> volume count you're now experiencing higher volume operation
>> concurrency, thus causing higher rates of partition lock contention.
>> However, this would be very specific to the volume server and
>> salvager--it should not have any substantial effect on the file
>> server, aside from some increased VOL_LOCK contention...
>
> Salvager is not involved, or at least, hasn't yet been involved. It's vos
> backupsys and vos dump where we see it mostly.

What I was trying to say is: if the observed performance regression
involves either the volserver or the salvager, then it could involve
partition lock contention. However, this will only come into play if
you're running a lot of vos jobs in parallel against the same vice
partition...

Cheers,

-Tom

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
