On Fri, Mar 13, 2009 at 01:26:55PM +0800, Tao Ma wrote:
> ramya tn wrote:
> > Feb 20 23:36:41 ImageInt1 kernel: SCSI error : <1 0 2 1> return code = 0x20000
> > Feb 20 23:36:41 ImageInt1 kernel: end_request: I/O error, dev sdc, sector 656216192
> > Feb 20 23:36:41 ImageInt1 kernel: SCSI error : <1 0 2 1> return code = 0x20000
> > Feb 20 23:36:42 ImageInt1 kernel: end_request: I/O error, dev sdc, sector 657248384
> > Feb 20 23:36:42 ImageInt1 kernel: SCSI error : <1 0 2 1> return code = 0x20000
> > Feb 20 23:36:42 ImageInt1 kernel: end_request: I/O error, dev sdc, sector 667312256
> > Feb 20 23:36:42 ImageInt1 kernel: SCSI error : <1 0 2 1> return code = 0x20000
> > Feb 20 23:36:42 ImageInt1 kernel: end_request: I/O error, dev sdc, sector 670408832
> > Feb 20 23:36:42 ImageInt1 kernel: SCSI error : <1 0 2 1> return code = 0x20000
> > Feb 20 23:36:42 ImageInt1 kernel: end_request: I/O error, dev sdc, sector 670666880
> > .
> > Feb 20 23:53:21 ImageInt1 kernel: Index 13: took 0 ms to do submit_bio for write
> > Feb 20 23:53:21 ImageInt1 kernel: Index 14: took 0 ms to do checking slots
> > Feb 20 23:53:21 ImageInt1 kernel: Index 15: took 50 ms to do waiting for write completion
> > Feb 20 23:53:21 ImageInt1 kernel: Index 16: took 1904 ms to do msleep
> > Feb 20 23:53:21 ImageInt1 kernel: Index 17: took 0 ms to do allocating bios for read
> > Feb 20 23:53:21 ImageInt1 kernel: Index 18: took 0 ms to do bio alloc read
> > Feb 20 23:53:21 ImageInt1 kernel: Index 19: took 0 ms to do bio add page read
> > Feb 20 23:53:21 ImageInt1 kernel: Index 20: took 0 ms to do submit_bio for read
> > Feb 20 23:53:21 ImageInt1 kernel: Index 21: took 44652 ms to do waiting for read completion
> > Feb 20 23:53:21 ImageInt1 kernel: Index 22: took 0 ms to do bio alloc write
> > Feb 20 23:53:21 ImageInt1 kernel: Index 23: took 0 ms to do bio add page write
> > Feb 20 23:53:21 ImageInt1 kernel: Index 0: took 0 ms to do submit_bio for write
> > Feb 20 23:53:21 ImageInt1 kernel: Index 1: took 0 ms to do checking slots
> > Feb 20 23:53:21 ImageInt1 kernel: Index 2: took 9307 ms to do waiting for write completion
> > Feb 20 23:53:21 ImageInt1 kernel: Index 3: took 0 ms to do allocating bios for read
> > Feb 20 23:53:21 ImageInt1 kernel: Index 4: took 0 ms to do bio alloc read
> > Feb 20 23:53:21 ImageInt1 kernel: Index 5: took 0 ms to do bio add page read
> > Feb 20 23:53:21 ImageInt1 kernel: Index 6: took 0 ms to do submit_bio for read
> > Feb 20 23:53:22 ImageInt1 kernel: Index 7: took 35756 ms to do waiting for read completion
> > Feb 20 23:53:22 ImageInt1 kernel: Index 8: took 0 ms to do bio alloc write
> > Feb 20 23:53:22 ImageInt1 kernel: Index 9: took 0 ms to do bio add page write
> > Feb 20 23:53:22 ImageInt1 kernel: Index 10: took 0 ms to do submit_bio for write
> > Feb 20 23:53:22 ImageInt1 kernel: Index 11: took 0 ms to do checking slots
> > Feb 20 23:53:22 ImageInt1 kernel: Index 12: took 84549 ms to do waiting for write completion
> > Feb 20 23:53:22 ImageInt1 kernel: *** ocfs2 is very sorry to be fencing this system by restarting ***
> >
> > I found the same scsi errors each time it fences. Can anyone suggest
> > what could be the reason for these SCSI errors and is it those SCSI
> > errors which is causing fencing.
>
> I don't know the reason for SCSI errors. So just answer your second qs.
> Yes, SCSI error will cause ocfs2 fencing. OCFS2 need to heartbeat in the
> disk, so if it tries many times and still fails to write to disk because
> of the SCSI error, it will fence itself.
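
The "Index N: took X ms to do ..." lines in the quoted log can be scanned mechanically to pick out the slow steps. What follows is only a rough sketch, not an ocfs2 tool: the function name slow_steps, the SAMPLE list, and the 10000 ms cutoff are all made up for illustration, and the cutoff is not the poster's configured heartbeat timeout. It also assumes each log entry sits on one line, so entries wrapped by a mail client would need rejoining first.

```python
import re

# Matches timing lines such as:
#   "... kernel: Index 12: took 84549 ms to do waiting for write completion"
TIMING_RE = re.compile(r"Index (\d+): took (\d+) ms to do (.+)")

# A few entries copied from the quoted log, unwrapped to one line each.
SAMPLE = [
    "Feb 20 23:53:21 ImageInt1 kernel: Index 16: took 1904 ms to do msleep",
    "Feb 20 23:53:21 ImageInt1 kernel: Index 21: took 44652 ms to do waiting for read completion",
    "Feb 20 23:53:22 ImageInt1 kernel: Index 12: took 84549 ms to do waiting for write completion",
]


def slow_steps(lines, limit_ms):
    """Return (index, ms, step) for each timing line at or above limit_ms."""
    hits = []
    for line in lines:
        match = TIMING_RE.search(line)
        if match is None:
            continue  # not a timing line (e.g. a SCSI error line)
        index, ms, step = int(match.group(1)), int(match.group(2)), match.group(3).strip()
        if ms >= limit_ms:
            hits.append((index, ms, step))
    return hits


if __name__ == "__main__":
    # 10000 ms is an arbitrary illustrative cutoff, not a real ocfs2 setting.
    for index, ms, step in slow_steps(SAMPLE, 10000):
        print(f"Index {index}: {ms / 1000:.1f} s in '{step}'")
```

Run against the full log, this kind of filter makes it easy to see that the long waits are all in the "waiting for read/write completion" steps, i.e. time spent waiting on the disk rather than in ocfs2 itself.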

Like Tao says, if ocfs2 can't read or write the disk in a timely
fashion, it will fence. I think there's an issue with your storage.
That second hunk of log messages shows some I/Os taking 85 seconds
(84549ms) to complete. Your heartbeat timeouts are probably shorter
than that, and so ocfs2 eventually has to give up.

The earlier log messages, about I/O errors for sdc, are even more
worrying. Those are I/Os that failed. I would check your I/O topology.
Is it an overloaded SAN? Is it iSCSI without enough throughput? Do you
just have a failing disk?

Joel

-- 
You can use a screwdriver to screw in screws or to clean your ears,
however, the latter needs real skill, determination and a lack of fear
of injuring yourself. It is much the same with JavaScript.
	- Chris Heilmann

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127