I've been looking at the diskd code a little on someone's behalf, to see if I can mitigate the crashes under load. I didn't feel like trying to fix the main code paths to support re-entry :)
So here's what I've got thus far:

http://www.creative.net.au/diffs/20080119-diskd-2.diff

* Track the number of open storeIOStates per swapdir;
* Make magic1 limit the number of open files in that swapdir, rather than the number of away messages;
* Disable using diskd for unlink; just use unlinkd.

These are all an attempt to keep the queue size roughly proportional to the number of open storeIOStates for a given swapdir. Unfortunately it's not -quite- proportional, as I'm still seeing 3x to 4x as many away messages as storeIOStates for a given swapdir, but it doesn't reach magic2 anywhere near as often now and no longer ends up calling storeDirCallback() recursively under high load. Magic1 = 64, Magic2 = 128 here.

I think that's about as good a solution as I can come up with in the short term. I'm not going to commit it in its entirety - I may just commit the unlinkd change, as that by itself may mitigate the issues enough to be worth it - but if diskd is going to hang around in the future, it needs to be a way of dispatching queued disk events rather than being the queue itself (i.e., how aio works.)

(Hopefully it works for the poor guy who is stuck with diskd and the crashes!)

Adrian

-- 
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -
