On Mar 20, 2010, at 4:17 PM, David Holland wrote:

> On Sat, Mar 20, 2010 at 04:06:32PM -0400, Steven Bellovin wrote:
>>> That suggests that something is flushing buffers to a device that's
>>> suspended and it's throwing them away instead of rejecting them or
>>> panicing.
>>
>> Possibly....
>
> Although it doesn't quite make sense, because in most cases this could
> only corrupt the fs if the same block was left untouched afterwards
> for long enough for the (allegedly) clean buffer to be discarded, and
> that shouldn't cause a panic right after resume. Unless the fs was
> already broken from a previous suspend, I guess.
>
> Maybe there's suspend code somewhere that writes out and also discards
> buffers in the hopes of cleaning up for some future suspend-to-disk
> work? Could be, I guess, but I'd tend to think not. I ought to go look
> at the code but I don't think I have time for that this weekend. :-|
>
>>> Does stuffing a couple sync calls somewhere before it starts
>>> suspending devices (wherever that is, I don't know) make the problems
>>> go away?
>>
>> No -- I've had a sync call in my suspend script for years. More
>> precisely, at the moment it's
>>
>>     sync; sleep 1
>>
>> to let things flush. No joy.
>
> That might not be late enough; I was thinking of inside the kernel.
>
>> Of course, rejecting them wouldn't seem to do any good; what's
>> needed, I suspect, is for the device to queue them (as usual) but
>> not fire up the disk when in suspending mode.
>
> Or for the writes to not be issued at all until after resume.
>
> ISTM it must be either the syncer firing at the wrong time or
> something's gotten out of order in the suspend sequencing.
>
Let me see if I can find my first note on the subject -- it might give
a clue about the date of any changes.

		--Steve Bellovin, http://www.cs.columbia.edu/~smb