UPDATE: The RHEL 5.3 host with NO disk I/O to the SAN volume has now
encountered errors:

Mar 13 10:38:49 PETDBLINUX01 kernel:  connection1:0: iscsi: detected
conn error (1011)
Mar 13 10:38:49 PETDBLINUX01 iscsid: Kernel reported iSCSI connection
1:0 error (1011) state (3)
Mar 13 10:38:52 PETDBLINUX01 iscsid: received iferror -38
Mar 13 10:38:52 PETDBLINUX01 last message repeated 2 times
Mar 13 10:38:52 PETDBLINUX01 iscsid: connection1:0 is operational
after recovery (1 attempts)
Mar 13 11:00:06 PETDBLINUX01 kernel:  connection1:0: iscsi: detected
conn error (1011)
Mar 13 11:00:06 PETDBLINUX01 iscsid: Kernel reported iSCSI connection
1:0 error (1011) state (3)
Mar 13 11:00:09 PETDBLINUX01 iscsid: received iferror -38
Mar 13 11:00:09 PETDBLINUX01 last message repeated 2 times
Mar 13 11:00:09 PETDBLINUX01 iscsid: connection1:0 is operational
after recovery (1 attempts)
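
For reference, the NOP-Out ("ping") behaviour behind these messages is
governed by the timeout settings in /etc/iscsi/iscsid.conf.  The values
below are the usual shipped defaults rather than necessarily what this
host is running, so they need checking against the live file (changes
only take effect on the next login):

  # How often iscsid sends a NOP-Out ping, and how long it waits for
  # the NOP-In reply before declaring a conn error (both in seconds)
  node.conn[0].timeo.noop_out_interval = 5
  node.conn[0].timeo.noop_out_timeout = 5
  # How long queued I/O is held while the session recovers before
  # being failed back up the SCSI stack
  node.session.timeo.replacement_timeout = 120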

Thanks, Rich

END.

On Mar 13, 10:01 am, bigcatxjs <ad...@richardjamestrading.co.uk>
wrote:
> Thanks Mike,
>
> > For this RHEL 5.2 setup, does it make a difference if you do not use
> > ifaces and setup the box like in 5.3 below?
>
> I have used bonded ifaces so that I/O requests can be split across
> multiple NICs (both server-side and on the DataCore SANmelody SM node
> NICs).  The split is achieved by ensuring that the Oracle volumes
> containing DATA and INDEX datafiles route through one named iface,
> while the volumes used for SYSTEM, BACKUP, and REDO data/logs route
> through the other (roughly as sketched below).  We have seen a
> performance uplift by maintaining this split despite the time-out
> issues.  We have a W2K3 x86_64 STD Oracle host that runs on one iface
> - it is much slower than the RHEL 5.2 x86_64 host even though the
> hardware is identical.  We also had RHEL 5.1 x86_64 Oracle hosts
> running on one iface - again, noticeably slower than the bonded
> ifaces approach.  These have since been upgraded to RHEL 5.2 with
> multiple ifaces.
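>
> (For reference, each named iface would have been created and bound
> roughly along these lines - here binding iface0 by the MAC address
> shown in the output further down; binding by netdev name via
> iface.net_ifacename is the usual alternative for a bond.  iface2 is
> set up the same way against the second NIC:)
>
>   iscsiadm -m iface -I iface0 --op=new
>   iscsiadm -m iface -I iface0 --op=update \
>            -n iface.hwaddress -v 00:14:22:0d:0a:fa
>   iscsiadm -m discovery -t sendtargets -p 172.16.200.9:3260 -I iface0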
>
> > There was a bug in 5.2 where the initiator would think it detected a
> > timeout when it did not. It is fixed in 5.3.
>
> Good.  Then I should expect to see fewer errors.
>
> > The messages can also occur when there really is a problem with the
> > network or if the target is bogged down.
>
> We have spread the primary volumes across both SM nodes.  The nodes
> are W2K3 x86 (no x64 option for the DataCore software) Dell 2850s.
> There are two switches (one for SM1, one for SM2) that are linked
> using teamed fibre (2GB/sec capacity), so I/O should route evenly
> across both switches, and the SM mirroring takes advantage of the
> fibre link.  With the RHEL 5.2 host, you will note that both ifaces
> go to the SM2 node but utilise different NICs on it.  These volumes
> are then mirrored to SM1 (except the BACKUP volume, which is a
> linear volume).  We know that the switches aren't congested, but we
> don't accurately know whether SM1 or SM2 is congested.  We only have
> a logical spread of volumes presented across multiple NICs to at
> least try and minimise congestion.
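>
> (One way to at least rule the initiator side in or out would be to
> watch the per-session iSCSI error counters and per-NIC throughput
> during office hours, along these lines - SIDs 1 and 2 are the ones
> shown in the iscsiadm output below, and sar needs the sysstat
> package:)
>
>   iscsiadm -m session -r 1 --stats    # error/timeout counters, iface0
>   iscsiadm -m session -r 2 --stats    # same for iface2
>   sar -n DEV 5                        # per-NIC rx/tx throughput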
>
> > At these times is there lots of disk IO? Is there anything in the target
> > logs?
>
> It is fair to say that all of these volumes take a heavy hit in
> terms of I/O.  Each host (excluding the RHEL 5.3 test host) runs two
> Oracle databases, some of which have intra-database replication
> (Oracle Streams) enabled.  The issue on the RHEL 5.2 host occurs
> every 10 secs or so during office hours when it is being utilised.
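>
> (To correlate the errors with load, something like the following run
> during office hours would show whether the iSCSI disks are saturated
> when the messages appear - device names as per the iscsiadm output
> below, and iostat comes from sysstat:)
>
>   iostat -x sdb sde sdf 5    # watch await and %util when errors fire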
>
> > So the RHEL5.3 box is having troubles too? There is nothing in the log
> > below.
>
> The error with the RHEL 5.3 host was as follows:
>
> > Mar 11 18:12:03 MYHOST53 iscsid: received iferror -38
> > Mar 11 18:12:03 MYHOST53 last message repeated 2 times
> > Mar 11 18:12:03 MYHOST53 iscsid: connection1:0 is operational now
>
> This looked similar to previous RHEL 5.2 errors.
>
> > Can you replicate this pretty easily? If you just login the session,
> > then let it sit (do not run the db or any disk IO), will you see the
> > ping timeout errors?
>
> I can test this with the RHEL 5.3 host.  Unfortunately, it will be
> difficult to down the RHEL 5.2 host's database services until we have
> a scheduled outage window.
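>
> (For that idle-session test on the 5.3 host, I have something like
> this in mind - log everything out, log a single session back in with
> nothing mounted, and just watch the log; target and portal are the
> ones from the output below:)
>
>   iscsiadm -m node --logoutall=all
>   iscsiadm -m node -T iqn.2000-08.com.datacore:sm2-3 \
>            -p 172.16.200.9:3260 --login
>   tail -f /var/log/messages | grep -iE 'iscsi|ping timeout'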
>
> Today, there have been no further errors on the RHEL 5.3 host :>).
>
> > It might be helpful to run ethereal/wireshark while you run your test
> > then send the /var/log/messages and trace so I can check and see if the
> > ping is really timing out or not. For the test you only need one session
> > logged in (this will reduce log and trace info), and once you see the
> > first ping timeout error you can stop tracing/logging and send it.
>
> Yes; there is also an Oracle tool (Orion) that we could use to
> generate the test I/O.
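>
> (For the trace itself, a capture limited to the one portal should
> keep the file manageable - the interface name here is illustrative:)
>
>   tcpdump -i eth2 -s 0 -w /tmp/iscsi-ping.pcap \
>           host 172.16.200.9 and port 3260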
>
> I think that I will monitor the RHEL 5.3 host for any further
> errors.  If the incidence of errors is reduced, that gives us
> justification for upgrading the RHEL 5.2 host to 5.3, and such an
> outage would also give me an opportunity to perform the tests above.
>
> Many thanks,
> Richard.
>
> END.
>
>
> On Mar 12, 5:53 pm, Mike Christie <micha...@cs.wisc.edu> wrote:
>
>
>
> > bigcatxjs wrote:
>
> > For this RHEL 5.2 setup, does it make a difference if you do not use
> > ifaces and setup the box like in 5.3 below?
>
> > > iscsiadm:
> > > iSCSI Transport Class version 2.0-724
> > > iscsiadm version 2.0-868
> > > Target: iqn.2000-08.com.datacore:sm2-3
> > >    Current Portal: 172.16.200.9:3260,1
> > >    Persistent Portal: 172.16.200.9:3260,1
> > >            **********
> > >            Interface:
> > >            **********
> > >            Iface Name: iface0
> > >            Iface Transport: tcp
> > >            Iface Initiatorname: iqn.1994-05.com.redhat:7fe2f44ea9de
> > >            Iface IPaddress: 172.16.200.39
> > >            Iface HWaddress: 00:14:22:0d:0a:fa
> > >            Iface Netdev: default
> > >            SID: 1
> > >            iSCSI Connection State: LOGGED IN
> > >            iSCSI Session State: Unknown
> > >            Internal iscsid Session State: NO CHANGE
> > >            ************************
> > >            Negotiated iSCSI params:
> > >            ************************
> > >            HeaderDigest: None
> > >            DataDigest: None
> > >            MaxRecvDataSegmentLength: 131072
> > >            MaxXmitDataSegmentLength: 262144
> > >            FirstBurstLength: 0
> > >            MaxBurstLength: 1048576
> > >            ImmediateData: No
> > >            InitialR2T: Yes
> > >            MaxOutstandingR2T: 1
> > >            ************************
> > >            Attached SCSI devices:
> > >            ************************
> > >            Host Number: 1  State: running
> > >            scsi1 Channel 00 Id 0 Lun: 0
> > >                    Attached scsi disk sdb          State: running
> > >            scsi1 Channel 00 Id 0 Lun: 1
> > >                    Attached scsi disk sde          State: running
> > >            scsi1 Channel 00 Id 0 Lun: 2
> > >                    Attached scsi disk sdf          State: running
> > > Target: iqn.2000-08.com.datacore:sm2-4
> > >    Current Portal: 172.16.200.10:3260,1
> > >    Persistent Portal: 172.16.200.10:3260,1
> > >            **********
> > >            Interface:
> > >            **********
> > >            Iface Name: iface2
> > >            Iface Transport: tcp
> > >            Iface Initiatorname: iqn.1994-05.com.redhat:7fe2f44ea9de
> > >            Iface IPaddress: 172.16.200.56
> > >            Iface HWaddress: 00:14:22:b1:d6:a6
> > >            Iface Netdev: default
> > >            SID: 2
> > >            iSCSI Connection State: LOGGED IN
> > >            iSCSI Session State: Unknown
> > >            Internal iscsid Session State: NO CHANGE
> > >            ************************
> > >            Negotiated iSCSI params:
> > >            ************************
> > >            HeaderDigest: None
> > >            DataDigest: None
> > >            MaxRecvDataSegmentLength: 131072
> > >            MaxXmitDataSegmentLength: 262144
> > >            FirstBurstLength: 0
> > >            MaxBurstLength: 1048576
> > >            ImmediateData: No
> > >            InitialR2T: Yes
> > >            MaxOutstandingR2T: 1
> > >            ************************
> > >            Attached SCSI devices:
> > >            ************************
> > >            Host Number: 2  State: running
> > >            scsi2 Channel 00 Id 0 Lun: 0
> > >                    Attached scsi disk sdc          State: running
> > >            scsi2 Channel 00 Id 0 Lun: 1
> > >                    Attached scsi disk sdd          State: running
>
> > > Log Errors:
> > > Mar 12 09:30:48 MYHOST52 last message repeated 2 times
> > > Mar 12 09:30:48 MYHOST52 iscsid: connection2:0 is operational after
> > > recovery (1 attempts)
> > > Mar 12 09:32:52 MYHOST52 kernel: ping timeout of 5 secs expired, last
> > > rx 19592296349, last ping 19592301349, now 19592306349
>
> > There was a bug in 5.2 where the initiator would think it detected a
> > timeout when it did not. It is fixed in 5.3.
>
> > The messages can also occur when there really is a problem with the
> > network or if the target is bogged down.
>
> > At these times is there lots of disk IO? Is there anything in the target
> > logs?
>
> > I am also not sure how well some targets handle bonding plus ifaces. Is
> > iface* using a bonded interface?
>
> > Can you replicate this pretty easily? If you just login the session,
> > then let it sit (do not run the db or any disk IO), will you see the
> > ping timeout errors?
>
> > It might be helpful to run ethereal/wireshark while you run your test
> > then send the /var/log/messages and trace so I can check and see if the
> > ping is really timing out or not. For the test you only need one session
> > logged in (this will reduce log and trace info), and once you see the
> > first ping timeout error you can stop tracing/logging and send it.
>
> > > From RHEL 5.3 x86 Host:
>
> > So the RHEL5.3 box is having troubles too? There is nothing in the log
> > below.
>
> > > iscsiadm:
> > > iSCSI Transport Class version 2.0-724
> > > iscsiadm version 2.0-868
> > > Target: iqn.2000-08.com.datacore:sm2-3
> > >    Current Portal: 172.16.200.9:3260,1
> > >    Persistent Portal: 172.16.200.9:3260,1
> > >            **********
> > >            Interface:
> > >            **********
> > >            Iface Name: default
> > >            Iface Transport: tcp
> > >            Iface Initiatorname: iqn.2005-03.com.redhat:01.406e5fd710e2
> > >            Iface IPaddress: 172.16.200.69
> > > ...