Thanks Mike...

On Mar 18, 5:45 pm, Mike Christie <micha...@cs.wisc.edu> wrote:
> bigcatxjs wrote:
> > Hi,
> > We have encountered the error below. This is the first time I have
> > seen it:
>
> This is with the noop settings set to 0, right? Was this the RHEL 5.3 or
> 5.2 setup?

It is our RHEL 5.3 host.

>
> Could you do
>
> rpm -q iscsi-initiator-utils

Sure...
rpm -q iscsi-initiator-utils

iscsi-initiator-utils-6.2.0.868-0.18.el5

>
> > Mar 17 12:40:47 MYHOST53 kernel:   Vendor: DataCore  Model: SANmelody         Rev: DCS
> > Mar 17 12:40:47 MYHOST53 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 04
> > Mar 17 12:40:47 MYHOST53 kernel: SCSI device sdd: 41943040 512-byte hdwr sectors (21475 MB)
> > Mar 17 12:40:47 MYHOST53 kernel: sdd: Write Protect is off
> > Mar 17 12:40:47 MYHOST53 kernel: SCSI device sdd: drive cache: write back w/ FUA
> > Mar 17 12:40:47 MYHOST53 kernel: SCSI device sdd: 41943040 512-byte hdwr sectors (21475 MB)
> > Mar 17 12:40:47 MYHOST53 kernel: sdd: Write Protect is off
> > Mar 17 12:40:47 MYHOST53 kernel: SCSI device sdd: drive cache: write back w/ FUA
> > Mar 17 12:40:47 MYHOST53 kernel:  sdd: sdd1
> > Mar 17 12:40:47 MYHOST53 kernel: sd 5:0:0:0: Attached scsi disk sdd
> > Mar 17 12:40:47 MYHOST53 kernel: sd 5:0:0:0: Attached scsi generic sg2 type 0
> > Mar 17 12:40:47 MYHOST53 iscsid: received iferror -38
> > Mar 17 18:21:39 MYHOST53 last message repeated 20 times
> > Mar 17 18:27:59 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead device
>
> It looks like one of the following is happening:
>
> 1. We were using RHEL 5.2, and the target logged us out or dropped the
> session, and when we tried to log in we got what we thought was a fatal
> error (but may have been a transient error) from the target, so iscsid
> destroyed the session. When this happens the devices are removed and IO
> to them fails, as you see below with the "rejecting I/O to dead device"
> messages.
>
> In RHEL 5.3 this should be fixed: we retry the login after that error
> instead of giving up right away.
>
> 2. Someone ran an iscsiadm logout command.

Unlikely; I am the only person working with this host at the moment.

>
> 3. iscsid bugged out and killed the session. I do not think this is what
> happened, because below, for session 4 (connection4:0), we get an error
> and end up logging back in, so iscsid is up and running.

Yes, iscsiadm -m session -P3 showed iSCSI as running. BUT the device
sdc1, mounted via this fstab entry:
/dev/sdc1               /sandisk1               ext3    _netdev         0 0

disappeared! In /dev, sdc reappeared as sdd, so I needed to update our
fstab to:
/dev/sdd1               /sandisk1               ext3    _netdev         0 0

... then remount the volume as /sandisk1, then log out and log back in to
iSCSI.

On our prod boxes (such as the RHEL 5.2 box) we mount by filesystem LABEL
instead of by device name.
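For anyone hitting the same renaming problem, a LABEL-based fstab entry
would look roughly like the sketch below. The label name sandisk1 is only
an example (not necessarily what we use), and it assumes the label has
already been written to the ext3 filesystem with e2label:

```
# Example only: assumes the label was set beforehand with
#   e2label /dev/sdd1 sandisk1
LABEL=sandisk1          /sandisk1               ext3    _netdev         0 0
```

Because mount resolves LABEL= to whatever device currently carries that
label, the entry keeps working whether the LUN comes back as sdc or sdd.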

>
> But if it is #1, it makes me think maybe the target is dropping the
> session or logging us out. This would explain the nops timing out or
> failing, and the conn failures in the other logs and below.
>
> Was there anything in the target logs at this time? Maybe something
> about a protocol error or about rebalancing IO? Or was anything going
> on on the target, like a firmware upgrade?

I have checked the logs on the SM node; unfortunately the logs are
circular, so the history has already been overwritten (an own goal on my
part!). I checked again this morning and so far there are only
informational messages (no errors reported).

>
> I am afraid I do not know much about these targets. I have never used
> one. Have you raised this with the DataCore people? Do you have a
> support contact whose email address you could send me? Even a tech sales
> contact there might be a useful way to find someone.
>

We have support with DataCore Europe and have logged support bundles
with them in the past. I plan to raise a new case shortly.

Thanks, Rich.

END.

> Does anyone know anyone there?
>
> > Mar 17 18:28:04 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead device
> > Mar 17 18:28:04 MYHOST53 kernel: journal_bmap: journal block not found at offset 2616 on sdc1
> > Mar 17 18:28:04 MYHOST53 kernel: Aborting journal on device sdc1.
> > Mar 17 18:28:04 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead device
> > Mar 17 18:28:04 MYHOST53 kernel: Buffer I/O error on device sdc1, logical block 1545
> > Mar 17 18:28:04 MYHOST53 kernel: lost page write due to I/O error on sdc1
> > Mar 17 23:03:40 MYHOST53 kernel:  connection4:0: iscsi: detected conn error (1011)
> > Mar 17 23:03:41 MYHOST53 iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)
> > Mar 17 23:03:44 MYHOST53 iscsid: received iferror -38
> > Mar 17 23:03:44 MYHOST53 last message repeated 2 times
> > Mar 17 23:03:44 MYHOST53 iscsid: connection4:0 is operational after recovery (1 attempts)
> > Mar 17 23:46:17 MYHOST53 kernel:  connection4:0: iscsi: detected conn error (1011)
> > Mar 17 23:46:18 MYHOST53 iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)
> > Mar 17 23:46:20 MYHOST53 iscsid: received iferror -38
> > Mar 17 23:46:20 MYHOST53 last message repeated 2 times
> > Mar 17 23:46:20 MYHOST53 iscsid: connection4:0 is operational after recovery (1 attempts)
> > Mar 18 04:04:27 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead device
> > Mar 18 04:04:27 MYHOST53 kernel: EXT3-fs error (device sdc1): ext3_find_entry: reading directory #2 offset 0
> > Mar 18 04:04:27 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead device
> > Mar 18 04:04:27 MYHOST53 kernel: Buffer I/O error on device sdc1, logical block 0
> > Mar 18 04:04:27 MYHOST53 kernel: lost page write due to I/O error on sdc1
> > Mar 18 04:04:27 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead device
> > Mar 18 04:04:27 MYHOST53 kernel: EXT3-fs error (device sdc1): ext3_find_entry: reading directory #2 offset 0
> > Mar 18 04:04:27 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead device
> > Mar 18 04:04:27 MYHOST53 kernel: Buffer I/O error on device sdc1, logical block 0
> > Mar 18 04:04:27 MYHOST53 kernel: lost page write due to I/O error on sdc1
> > Mar 18 14:56:49 MYHOST53 kernel: scsi 2:0:0:0: rejecting I/O to dead device
> > Mar 18 14:56:49 MYHOST53 kernel: ext3_abort called.
> > Mar 18 14:56:49 MYHOST53 kernel: EXT3-fs error (device sdc1): ext3_journal_start_sb: Detected aborted journal
>
> > So quite a serious error. I'm assuming it would not be anything to do
> > with the iSCSI timeout parameter changes we made previously... the
>
> Yeah, it should not. Turning them off, though, may have changed where
> the problem was detected, and so we took a different error-handling path.
>
> > disk was not under any I/O stress at all when the error occurred.
>
> > Thanks,
> > Richard.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---