> -----Original Message-----
> From: Mark Fasheh [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, August 15, 2007 18:04
> To: Alexei_Roudnev
> Cc: Ulf Zimmermann; Sunil Mushran; [email protected]
> Subject: Re: [Ocfs2-users] 6 node cluster with unexplained reboots
>
> On Wed, Aug 15, 2007 at 05:52:49PM -0700, Alexei_Roudnev wrote:
> > ANY SCSI controller can quietly delay IO for 10 - 20 seconds, without
> > errors and explanations. 10 seconds threshold in OCFSv2 will never work
> > properly.
>
> That has nothing to do with what I'm asking him.
>
> Ulf described his controller thusly:
>
> "does write into cache on its two controllers, then acknowledges a
> write and then writes it actually to disk."
>
> I'm keying in on the part where it acknowledges a write (presumably to the
> host os) and _then_ pushes that write out to the disk. In general, that's
> the wrong order ;)
>
> Anyway, getting back to the task of trying to fix someone's problem, I
> admit that I don't really know whether it's possible for a controller to do
> writeback caching, I'm just trying to clarify what's going on, that's all.
> --Mark

I primarily posted the messages just as a follow-up for now. I am waiting
for 3Par to tell me if they have anything in the logs before I decide on
further progression, i.e. raising the write timeout or not.

The first 4 reboots we had, which may or may not have been OCFS2, happened
on our 3Par S400, which has 16GB of cache per controller. The last reboot,
for which I do have the console messages (thanks HP for iLO and virtual
serial plus Conserver :-) ), happened on our E200, which has 8GB of cache
per controller.

We also have some SCSI errors on some nodes, and I am currently awaiting a
maintenance window to replace two FC cables to see if that clears up the
errors.

As you can see, all kinds of things are unfortunately going on. And I am
officially on vacation right now too. Sigh.

Ulf.

_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
