Thanks Mike,

> For this RHEL 5.2 setup, does it make a difference if you do not use
> ifaces and setup the box like in 5.3 below?
I have used bonded ifaces so that I/O requests can be split across
multiple NICs (both server-side and on the DataCore SANmelody (SM)
node NICs).  The split is achieved by ensuring that the volumes
containing the Oracle DATA and INDEX datafiles route through one named
iface, while the volumes used for SYSTEM, BACKUP, and REDO data/logs
etc. route through the other.  We have seen a performance uplift by
maintaining this split despite the time-out issues.  We have a W2K3
x86_64 STD Oracle host that runs on one iface - it is much slower than
the RHEL 5.2 x86_64 host even though the hardware is identical.  We
also had RHEL 5.1 x86_64 Oracle hosts running on one iface - again,
these were noticeably slower than the bonded-ifaces approach.  They
have since been upgraded to RHEL 5.2 with the multiple ifaces.
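For reference, the iface binding was done along these lines (a sketch
from memory, reusing the iface names and MAC addresses shown in the
iscsiadm output quoted below; the portal is just one of ours, and the
exact commands run on the box may have differed slightly):

  # create the two named ifaces
  iscsiadm -m iface -I iface0 -o new
  iscsiadm -m iface -I iface2 -o new

  # bind each iface to a server NIC by MAC address
  iscsiadm -m iface -I iface0 -o update -n iface.hwaddress -v 00:14:22:0d:0a:fa
  iscsiadm -m iface -I iface2 -o update -n iface.hwaddress -v 00:14:22:b1:d6:a6

  # discover through both ifaces, then log in to all node records
  iscsiadm -m discovery -t sendtargets -p 172.16.200.9:3260 -I iface0 -I iface2
  iscsiadm -m node -L all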

> There was a bug in 5.2 where the initiator would think it detected a
> timeout when it did not. It is fixed in 5.3.
Good.  Then I should expect to see fewer errors.

> The messages can also occur when there really is a problem with the
> network or if the target is bogged down.
We have spread the primary volumes across both SM nodes.  The nodes
are W2K3 x86 (there is no x64 option for the DataCore software) Dell
2850s.  There are two switches (one for SM1, one for SM2) that are
linked using teamed fibre (2 Gb/sec capacity), so I/O should route
evenly across both switches.  The SM mirroring takes advantage of the
fibre link.  With the RHEL 5.2 host, you will note that both ifaces
are going to the SM2 node, but utilising different NICs on that node.
These volumes are then mirrored to SM1 (except the BACKUP volume,
which is a linear volume).  We know that the switches aren't
congested, but we don't know for certain whether SM1 or SM2 is
congested.  We only have a logical spread of volumes presented across
multiple NICs to at least try to minimise congestion.
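On the host side we can at least watch per-NIC throughput to see
whether either path is saturating, e.g. with the standard sysstat
tools (the 2-second sampling interval is arbitrary):

  # per-interface RX/TX throughput every 2 seconds
  sar -n DEV 2

  # or a crude running view of the raw counters
  watch -n 2 'cat /proc/net/dev'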

> At these times is there lots of disk IO? Is there anything in the target
> logs?
It is fair to say that all these volumes take a heavy hit in terms of
I/O.  Each host (excluding the RHEL 5.3 test host) runs two Oracle
databases, some of which have intra-database replication (Oracle
Streams) enabled.  The issue on the RHEL 5.2 host occurs every 10
secs or so during office hours when it is being utilised.
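If the spurious timeouts persist in the meantime, I assume the
NOOP-out (ping) settings could be relaxed as a stop-gap - the "ping
timeout of 5 secs" message suggests the stock defaults are in force.
Something like this (the value 10 is purely illustrative, not a
recommendation):

  # /etc/iscsi/iscsid.conf (applies to newly created node records)
  node.conn[0].timeo.noop_out_interval = 10
  node.conn[0].timeo.noop_out_timeout = 10

  # or update an existing node record in place
  iscsiadm -m node -T iqn.2000-08.com.datacore:sm2-3 -p 172.16.200.9:3260 \
      -o update -n node.conn[0].timeo.noop_out_timeout -v 10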

> So the RHEL5.3 box is having troubles too? There is nothing in the log
> below.
The error with the RHEL 5.3 host was as follows:

> Mar 11 18:12:03 MYHOST53 iscsid: received iferror -38
> Mar 11 18:12:03 MYHOST53 last message repeated 2 times
> Mar 11 18:12:03 MYHOST53 iscsid: connection1:0 is operational now

This looked similar to previous RHEL 5.2 errors.

> Can you replicate this pretty easily? If you just login the session,
> then let it sit (do not run the db or any disk IO), will you see the
> ping timeout errors?
I can test this with the RHEL 5.3 host.  Unfortunately, it will be
difficult to down the RHEL 5.2 host's database services until we have
a scheduled outage window.
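For the idle test itself, I assume something like this is what you
have in mind (a single session logged in, no database I/O):

  # log in just one session and leave it idle
  iscsiadm -m node -T iqn.2000-08.com.datacore:sm2-3 -p 172.16.200.9:3260 --login

  # watch for ping timeouts while it sits
  tail -f /var/log/messages | grep -Ei 'ping timeout|iscsid'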

Today, there have been no further errors on the RHEL 5.3 host :>).

> It might be helpful to run ethereal/wireshark while you run your test
> then send the /var/log/messages and trace so I can check and see if the
> ping is really timing out or not. For the test you only need one session
> logged in (this will reduce log and trace info), and once you see the
> first ping timeout error you can stop tracing/logging and send it.
Yes; there is also an Oracle I/O-testing tool (Orion) that we could
use to generate the load.
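For the capture, I assume a plain tcpdump of the iSCSI port would do
(eth0 here is a placeholder for the initiator NIC; the resulting pcap
opens in ethereal/wireshark):

  # full-length capture of iSCSI traffic (TCP port 3260)
  tcpdump -i eth0 -s 0 -w iscsi-ping-test.pcap port 3260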

I think that I will monitor the RHEL 5.3 host for any further errors.
If the incidence of errors is reduced, that gives justification for
upgrading the RHEL 5.2 host to 5.3.  Such an outage would also provide
me with an opportunity to perform the tests above.



Many thanks,
Richard.

END.


On Mar 12, 5:53 pm, Mike Christie <micha...@cs.wisc.edu> wrote:
> bigcatxjs wrote:
>
> For this RHEL 5.2 setup, does it make a difference if you do not use
> ifaces and setup the box like in 5.3 below?
>
> > iscsiadm:
> > iSCSI Transport Class version 2.0-724
> > iscsiadm version 2.0-868
> > Target: iqn.2000-08.com.datacore:sm2-3
> >    Current Portal: 172.16.200.9:3260,1
> >    Persistent Portal: 172.16.200.9:3260,1
> >            **********
> >            Interface:
> >            **********
> >            Iface Name: iface0
> >            Iface Transport: tcp
> >            Iface Initiatorname: iqn.1994-05.com.redhat:7fe2f44ea9de
> >            Iface IPaddress: 172.16.200.39
> >            Iface HWaddress: 00:14:22:0d:0a:fa
> >            Iface Netdev: default
> >            SID: 1
> >            iSCSI Connection State: LOGGED IN
> >            iSCSI Session State: Unknown
> >            Internal iscsid Session State: NO CHANGE
> >            ************************
> >            Negotiated iSCSI params:
> >            ************************
> >            HeaderDigest: None
> >            DataDigest: None
> >            MaxRecvDataSegmentLength: 131072
> >            MaxXmitDataSegmentLength: 262144
> >            FirstBurstLength: 0
> >            MaxBurstLength: 1048576
> >            ImmediateData: No
> >            InitialR2T: Yes
> >            MaxOutstandingR2T: 1
> >            ************************
> >            Attached SCSI devices:
> >            ************************
> >            Host Number: 1  State: running
> >            scsi1 Channel 00 Id 0 Lun: 0
> >                    Attached scsi disk sdb          State: running
> >            scsi1 Channel 00 Id 0 Lun: 1
> >                    Attached scsi disk sde          State: running
> >            scsi1 Channel 00 Id 0 Lun: 2
> >                    Attached scsi disk sdf          State: running
> > Target: iqn.2000-08.com.datacore:sm2-4
> >    Current Portal: 172.16.200.10:3260,1
> >    Persistent Portal: 172.16.200.10:3260,1
> >            **********
> >            Interface:
> >            **********
> >            Iface Name: iface2
> >            Iface Transport: tcp
> >            Iface Initiatorname: iqn.1994-05.com.redhat:7fe2f44ea9de
> >            Iface IPaddress: 172.16.200.56
> >            Iface HWaddress: 00:14:22:b1:d6:a6
> >            Iface Netdev: default
> >            SID: 2
> >            iSCSI Connection State: LOGGED IN
> >            iSCSI Session State: Unknown
> >            Internal iscsid Session State: NO CHANGE
> >            ************************
> >            Negotiated iSCSI params:
> >            ************************
> >            HeaderDigest: None
> >            DataDigest: None
> >            MaxRecvDataSegmentLength: 131072
> >            MaxXmitDataSegmentLength: 262144
> >            FirstBurstLength: 0
> >            MaxBurstLength: 1048576
> >            ImmediateData: No
> >            InitialR2T: Yes
> >            MaxOutstandingR2T: 1
> >            ************************
> >            Attached SCSI devices:
> >            ************************
> >            Host Number: 2  State: running
> >            scsi2 Channel 00 Id 0 Lun: 0
> >                    Attached scsi disk sdc          State: running
> >            scsi2 Channel 00 Id 0 Lun: 1
> >                    Attached scsi disk sdd          State: running
>
> > Log Errors;
> > Mar 12 09:30:48 MYHOST52 last message repeated 2 times
> > Mar 12 09:30:48 MYHOST52 iscsid: connection2:0 is operational after
> > recovery (1 attempts)
> > Mar 12 09:32:52 MYHOST52 kernel: ping timeout of 5 secs expired, last
> > rx 19592296349, last ping 19592301349, now 19592306349
>
> There was a bug in 5.2 where the initiator would think it detected a
> timeout when it did not. It is fixed in 5.3.
>
> The messages can also occur when there really is a problem with the
> network or if the target is bogged down.
>
> At these times is there lots of disk IO? Is there anything in the target
> logs?
>
> I am also not sure how well some targets handle bonding plus ifaces. Is
> iface* using a bonded interface?
>
> Can you replicate this pretty easily? If you just login the session,
> then let it sit (do not run the db or any disk IO), will you see the
> ping timeout errors?
>
> It might be helpful to run ethereal/wireshark while you run your test
> then send the /var/log/messages and trace so I can check and see if the
> ping is really timing out or not. For the test you only need one session
> logged in (this will reduce log and trace info), and once you see the
> first ping timeout error you can stop tracing/logging and send it.
>
> > From RHEL 5.3 x86 Host;
>
> So the RHEL5.3 box is having troubles too? There is nothing in the log
> below.
>
> > iscsiadm;
> > iSCSI Transport Class version 2.0-724
> > iscsiadm version 2.0-868
> > Target: iqn.2000-08.com.datacore:sm2-3
> >    Current Portal: 172.16.200.9:3260,1
> >    Persistent Portal: 172.16.200.9:3260,1
> >            **********
> >            Interface:
> >            **********
> >            Iface Name: default
> >            Iface Transport: tcp
> >            Iface Initiatorname: iqn.2005-03.com.redhat:01.406e5fd710e2
> >            Iface IPaddress: 172.16.200.69
> >            Iface HWaddress: default
> >            Iface Netdev: default
> >            SID: 1
> >            iSCSI Connection State: LOGGED IN
> >            iSCSI Session State: Unknown
> >            Internal iscsid Session State: NO CHANGE
> >            ************************
> >            Negotiated iSCSI params:
> >            ************************
> >            HeaderDigest: None
> >            DataDigest: None
> >            MaxRecvDataSegmentLength: 131072
> >            MaxXmitDataSegmentLength: 262144
> >            FirstBurstLength: 0
> >            MaxBurstLength: 1048576
> >            ImmediateData: No
> >            InitialR2T: Yes
> >            MaxOutstandingR2T: 1
> >            ************************
> >            Attached SCSI devices:
> >            ************************
> >            Host Number: 2  State: running
> >            scsi2 Channel 00 Id 0 Lun: 0
> >                    Attached scsi disk sdc          State: running
>
> > Log Errors;
> > Mar 11 18:12:03 MYHOST53 kernel: md: Autodetecting RAID arrays.
> > Mar 11 18:12:03 MYHOST53 kernel: md: autorun ...
> > Mar 11 18:12:03 MYHOST53 kernel: md: ... autorun DONE.
> > Mar 11 18:12:03 MYHOST53 kernel: device-mapper: multipath: version
> > 1.0.5 loaded
> > Mar 11 18:12:03 MYHOST53 kernel: EXT3 FS on dm-0, internal journal
> > Mar 11 18:12:03 MYHOST53 kernel: kjournald starting.  Commit interval
> > 5 seconds
> > Mar 11 18:12:03 MYHOST53 kernel: EXT3 FS on sda1, internal journal
> > Mar 11 18:12:03 MYHOST53 kernel: EXT3-fs: mounted filesystem with
> > ordered data mode.
> > Mar 11 18:12:03 MYHOST53 kernel: Adding 2031608k swap on /dev/
> > VolGroup00/LogVol01.  Priority:-1 extents:1 across:2031608k
> > Mar 11 18:12:03 MYHOST53 kernel: IA-32 Microcode Update Driver: v1.14a
> > <tig...@veritas.com>
> > Mar 11 18:12:03 MYHOST53 kernel: microcode: CPU1 updated from revision
> > 0x7 to 0xc, date = 04212005
> > Mar 11 18:12:03 MYHOST53 kernel: microcode: CPU0 updated from revision
> > 0x7 to 0xc, date = 04212005
> > Mar 11 18:12:03 MYHOST53 kernel: Loading iSCSI transport class
> > v2.0-724.
> > Mar 11 18:12:03 MYHOST53 kernel: iscsi: registered transport (tcp)
> > Mar 11 18:12:03 MYHOST53 kernel: iscsi: registered transport (iser)
> > Mar 11 18:12:03 MYHOST53 kernel: ADDRCONF(NETDEV_UP): eth0: link is
> > not ready
> > Mar 11 18:12:03 MYHOST53 kernel: e1000: eth0: e1000_watchdog_task: NIC
> > Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> > Mar 11 18:12:03 MYHOST53 kernel: ADDRCONF(NETDEV_CHANGE): eth0: link
> > becomes ready
> > Mar 11 18:12:03 MYHOST53 kernel: ADDRCONF(NETDEV_UP): eth1: link is
> > not ready
> > Mar 11 18:12:03 MYHOST53 kernel: e1000: eth1: e1000_watchdog_task: NIC
> > Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> > Mar 11 18:12:03 MYHOST53 kernel: ADDRCONF(NETDEV_CHANGE): eth1: link
> > becomes ready
> > Mar 11 18:12:03 MYHOST53 kernel: scsi2 : iSCSI Initiator over TCP/IP
> > Mar 11 18:12:03 MYHOST53 kernel:   Vendor: DataCore  Model:
> > SANmelody         Rev: DCS
> > Mar 11 18:12:03 MYHOST53 kernel:   Type:   Direct-
> > Access                      ANSI SCSI revision: 04
> > Mar 11 18:12:03 MYHOST53 kernel: SCSI device sdc: 41943040 512-byte
> > hdwr sectors (21475 MB)
> > Mar 11 18:12:03 MYHOST53 kernel: sdc: Write Protect is off
> > Mar 11 18:12:03 MYHOST53 kernel: SCSI device sdc: drive cache: write
> > back w/ FUA
> > Mar 11 18:12:03 MYHOST53 kernel: SCSI device sdc: 41943040 512-byte
> > hdwr sectors (21475 MB)
> > Mar 11 18:12:03 MYHOST53 kernel: sdc: Write Protect is off
> > Mar 11 18:12:03 MYHOST53 kernel: SCSI device sdc: drive cache: write
> > back w/ FUA
> > Mar 11 18:12:03 MYHOST53 kernel:  sdc: sdc1
> > Mar 11 18:12:03 MYHOST53 kernel: sd 2:0:0:0: Attached scsi disk sdc
> > Mar 11 18:12:03 MYHOST53 kernel: sd 2:0:0:0: Attached scsi generic sg2
> > type 0
> > Mar 11 18:12:03 MYHOST53 rpc.statd[2160]: Version 1.0.9 Starting
> > Mar 11 18:12:03 MYHOST53 iscsid: received iferror -38
> > Mar 11 18:12:03 MYHOST53 last message repeated 2 times
> > Mar 11 18:12:03 MYHOST53 iscsid: connection1:0 is operational now
> > Mar 11 18:12:04 MYHOST53 kdump: kexec: loaded kdump kernel
> > Mar 11 18:12:04 MYHOST53 kdump: started up
> > Mar 11 18:12:04 MYHOST53 kernel: symev_rh_ES_5_2.6.18_53.el5_i686:
> > module license 'Proprietary' taints kernel.
> > Mar 11 18:12:04 MYHOST53 symev: loaded (symev-rh-ES-5-2.6.18-53.el5-
> > i686.ko)
> > Mar 11 18:12:04 MYHOST53 symap: loaded (symap-rh-ES-5-2.6.18-53.el5-
> > i686.ko)
>
> > END
>
> > Any help / suggestions gratefully received.  I can change the config
> > of the RHEL 5.3 x86 host on demand, but not the RHEL 5.2 x86_64 host
> > (prod box).
>
> > Many thanks,
> > Rich.