Re: RHEL 5.2 and 5.3 - ISCSI Errors impacting database performance?

Mike Christie Wed, 18 Mar 2009 10:46:20 -0700

bigcatxjs wrote:
> Hi,
> We have encountered this error below.  This is the first time I have
> seen this before;



This is with the noop settings set to 0, right? Was this the RHEL 5.3 or 
the 5.2 setup?

Could you run

rpm -q iscsi-initiator-utils


> 
> 
> Mar 17 12:40:47 MYHOST53 kernel:   Vendor: DataCore  Model:
> SANmelody         Rev: DCS
> Mar 17 12:40:47 MYHOST53 kernel:   Type:   Direct-
> Access                      ANSI SCSI revision: 04
> Mar 17 12:40:47 MYHOST53 kernel: SCSI device sdd: 41943040 512-byte
> hdwr sectors (21475 MB)
> Mar 17 12:40:47 MYHOST53 kernel: sdd: Write Protect is off
> Mar 17 12:40:47 MYHOST53 kernel: SCSI device sdd: drive cache: write
> back w/ FUA
> Mar 17 12:40:47 MYHOST53 kernel: SCSI device sdd: 41943040 512-byte
> hdwr sectors (21475 MB)
> Mar 17 12:40:47 MYHOST53 kernel: sdd: Write Protect is off
> Mar 17 12:40:47 MYHOST53 kernel: SCSI device sdd: drive cache: write
> back w/ FUA
> Mar 17 12:40:47 MYHOST53 kernel:  sdd: sdd1
> Mar 17 12:40:47 MYHOST53 kernel: sd 5:0:0:0: Attached scsi disk sdd
> Mar 17 12:40:47 MYHOST53 kernel: sd 5:0:0:0: Attached scsi generic sg2
> type 0
> Mar 17 12:40:47 MYHOST53 iscsid: received iferror -38
> Mar 17 18:21:39 MYHOST53 last message repeated 20 times


> Mar 17 18:27:59 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead
> device


It looks like one of the following is happening:

1. You were using RHEL 5.2, and the target logged us out or dropped the 
session. When we tried to log back in, we got what we thought was a fatal 
error from the target (it may actually have been a transient one), so 
iscsid destroyed the session. When this happens, the devices are removed 
and IO to them is failed, which is what the "rejecting I/O to dead device" 
messages below show.

In RHEL 5.3 this should be fixed: we retry after a login error instead 
of giving up right away.

2. Someone ran an iscsiadm logout command.

3. iscsid bugged out and killed the session. I do not think this is what 
happened, because below, for session4 (connection4:0), we get an error and 
end up logging back in, so iscsid is up and running.
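As a rough illustration of the 5.2 versus 5.3 difference in case #1, here 
is a minimal Python sketch of a give-up versus retry login policy. It is 
not iscsid's code (iscsid is written in C) and does not reproduce how 
iscsid decides which errors are fatal; `try_login`, `TransientLoginError`, 
and the default retry count and delay are hypothetical.

```python
import time
from typing import Callable


class TransientLoginError(Exception):
    """Hypothetical: a login failure the target may recover from."""


def login_with_retry(try_login: Callable[[], None], attempts: int = 5,
                     delay: float = 1.0) -> int:
    """Call try_login until it succeeds; return the attempt number that worked.

    With attempts=1 the first error is final and is re-raised to the caller,
    which would then tear the session down. With attempts > 1 a transient
    error only costs a short wait before the next try.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            try_login()
            return attempt
        except TransientLoginError:
            if attempt == attempts:
                raise  # out of retries: report the error to the caller
            time.sleep(delay)  # wait before trying to log in again
    raise AssertionError("unreachable")
```

Calling it with `attempts=1` roughly mirrors the old behavior, where a 
single login error ended the session; a larger value gives a target that 
briefly refuses logins a chance to come back.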


But if it is #1, it makes me think the target may be dropping the 
session or logging us out. That would also explain some of the nops 
timing out or failing, and the conn failures in the other logs and below.
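To check whether the conn failures line up with the dead-device errors, 
one option is to count them per connection and per SCSI device. This is a 
minimal Python sketch, assuming unwrapped syslog lines in the same format 
as the excerpt in this thread (for example, /var/log/messages); the 
`summarize` function and its event names are hypothetical, not part of 
open-iscsi.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Patterns based on the messages quoted in this thread; other kernel or
# open-iscsi versions may word them differently.
PATTERNS = {
    # kernel:  connection4:0: iscsi: detected conn error (1011)
    "conn_error": re.compile(r"connection(\d+:\d+): iscsi: detected conn error \(\d+\)"),
    # iscsid: connection4:0 is operational after recovery (1 attempts)
    "recovered": re.compile(r"connection(\d+:\d+) is operational after recovery"),
    # kernel: scsi 2:0:0:0: rejecting I/O to dead device
    "dead_device": re.compile(r"scsi (\d+:\d+:\d+:\d+): rejecting I/O to dead device"),
}


def summarize(lines: Iterable[str]) -> Counter:
    """Count (event, id) pairs; id is a connection ("4:0") or a SCSI address."""
    counts: Counter = Counter()
    for line in lines:
        for event, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(event, match.group(1))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_iscsi.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for (event, ident), n in sorted(summarize(log).items()):
            print(f"{event:12} {ident:10} {n}")
```

A connection with `conn_error` entries and matching `recovered` entries 
(like connection4:0 here) came back; a device with only `dead_device` 
entries most likely had its session torn down. Lines that syslog collapses 
into "last message repeated N times" are not expanded, so the counts are a 
lower bound.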

Was there anything in the target logs at this time? Maybe something 
about a protocol error or about rebalancing IO? Or was anything going on 
on the target, such as a firmware upgrade?

I am afraid I do not know much about these targets; I have never used 
one. Have you raised this with the DataCore people? Do you have a support 
contact whose email address you could send me? Even a tech sales guy 
there might be useful for finding the right person.

Does anyone know anyone there?


> Mar 17 18:28:04 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead
> device
> Mar 17 18:28:04 MYHOST53 kernel: journal_bmap: journal block not found
> at offset 2616 on sdc1
> Mar 17 18:28:04 MYHOST53 kernel: Aborting journal on device sdc1.
> Mar 17 18:28:04 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead
> device
> Mar 17 18:28:04 MYHOST53 kernel: Buffer I/O error on device sdc1,
> logical block 1545
> Mar 17 18:28:04 MYHOST53 kernel: lost page write due to I/O error on
> sdc1
> Mar 17 23:03:40 MYHOST53 kernel:  connection4:0: iscsi: detected conn
> error (1011)
> Mar 17 23:03:41 MYHOST53 iscsid: Kernel reported iSCSI connection 4:0
> error (1011) state (3)
> Mar 17 23:03:44 MYHOST53 iscsid: received iferror -38
> Mar 17 23:03:44 MYHOST53 last message repeated 2 times
> Mar 17 23:03:44 MYHOST53 iscsid: connection4:0 is operational after
> recovery (1 attempts)
> Mar 17 23:46:17 MYHOST53 kernel:  connection4:0: iscsi: detected conn
> error (1011)
> Mar 17 23:46:18 MYHOST53 iscsid: Kernel reported iSCSI connection 4:0
> error (1011) state (3)
> Mar 17 23:46:20 MYHOST53 iscsid: received iferror -38
> Mar 17 23:46:20 MYHOST53 last message repeated 2 times
> Mar 17 23:46:20 MYHOST53 iscsid: connection4:0 is operational after
> recovery (1 attempts)
> Mar 18 04:04:27 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead
> device
> Mar 18 04:04:27 MYHOST53 kernel: EXT3-fs error (device sdc1):
> ext3_find_entry: reading directory #2 offset 0
> Mar 18 04:04:27 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead
> device
> Mar 18 04:04:27 MYHOST53 kernel: Buffer I/O error on device sdc1,
> logical block 0
> Mar 18 04:04:27 MYHOST53 kernel: lost page write due to I/O error on
> sdc1
> Mar 18 04:04:27 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead
> device
> Mar 18 04:04:27 MYHOST53 kernel: EXT3-fs error (device sdc1):
> ext3_find_entry: reading directory #2 offset 0
> Mar 18 04:04:27 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead
> device
> Mar 18 04:04:27 MYHOST53 kernel: Buffer I/O error on device sdc1,
> logical block 0
> Mar 18 04:04:27 MYHOST53 kernel: lost page write due to I/O error on
> sdc1
> Mar 18 14:56:49 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead
> device
> Mar 18 14:56:49 MYHOST53 kernel: ext3_abort called.
> Mar 18 14:56:49 MYHOST53 kernel: EXT3-fs error (device sdc1):
> ext3_journal_start_sb: Detected aborted journal
> 
> So quite a serious error.  I'm assuming that it would not be anything
> to do with the iscsi time-out parm changes we made previously....  the

Yeah, it should not. Turning them off, though, may have changed where the 
problem was detected, so we took a different error-handling path.




> disk was not under any i/o stress at all when the error occurred.
> 
> 
> Thanks,
> Richard.
> > 

