bigcatxjs wrote:
> UPDATE: RHEL 5.3 Host is showing errors. No Disk I/O to SAN volume
> (last I/O Thursday 12th March);
>

Is there anything in the log before this? Something about a ping or nop
timing out?
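If it is handy, something like the grep below should pull out any nop/ping
messages logged just before the conn error -- only a rough sketch, using the
standard RHEL log path:

    # look for iSCSI nop/ping and connection messages around the failure
    grep -iE 'iscsi|iscsid|nop|ping timeout' /var/log/messages | less

    # or just show some context either side of the first conn error
    grep -B 20 -A 5 'detected conn error (1011)' /var/log/messages | less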
> Mar 13 10:38:49 MYHOST53 kernel: connection1:0: iscsi: detected conn
> error (1011)
> Mar 13 10:38:49 MYHOST53 iscsid: Kernel reported iSCSI connection 1:0
> error (1011) state (3)
> Mar 13 10:38:52 MYHOST53 iscsid: received iferror -38
> Mar 13 10:38:52 MYHOST53 last message repeated 2 times
> Mar 13 10:38:52 MYHOST53 iscsid: connection1:0 is operational after
> recovery (1 attempts)
> Mar 13 11:00:06 MYHOST53 kernel: connection1:0: iscsi: detected conn
> error (1011)
> Mar 13 11:00:06 MYHOST53 iscsid: Kernel reported iSCSI connection 1:0
> error (1011) state (3)
> Mar 13 11:00:09 MYHOST53 iscsid: received iferror -38
> Mar 13 11:00:09 MYHOST53 last message repeated 2 times
> Mar 13 11:00:09 MYHOST53 iscsid: connection1:0 is operational after
> recovery (1 attempts)
>
> Thanks, Rich.
>
> END.
>
> On Mar 13, 10:01 am, bigcatxjs <ad...@richardjamestrading.co.uk>
> wrote:
>> Thanks Mike,
>>
>>> For this RHEL 5.2 setup, does it make a difference if you do not use
>>> ifaces and setup the box like in 5.3 below?
>> I have used bonded ifaces so that the I/O requests can be split across
>> multiple NICs (both server-side and on the Datacore San Melody SM node
>> NICs).  This split is achieved by ensuring that the volumes used by
>> Oracle containing DATA and INDEX datafiles route through one named
>> iface and that volumes used by Oracle for SYSTEM, BACKUP, and REDO
>> data/logs etc. route through the other.  We have seen a performance
>> uplift by maintaining this split despite the time-out issues.  We have
>> a W2K3 x86_64 STD Oracle host that runs on one iface - this is much
>> slower than the RHEL 5.2 x86_64 host even though the hardware is
>> identical.  We did have RHEL 5.1 x86_64 Oracle hosts running on one
>> iface - again, this was noticeably slower than the bonded ifaces
>> approach.  These have since been upgraded to RHEL 5.2 with the
>> multiple ifaces.
>>
>>> There was a bug in 5.2 where the initiator would think it detected a
>>> timeout when it did not. It is fixed in 5.3.
>> Good.  Then I should expect to see fewer errors.
>>
>>> The messages can also occur when there really is a problem with the
>>> network or if the target is bogged down.
>> We have spread the primary volumes across both SM nodes.  The nodes
>> are W2K3 x86 (no x64 option for the DataCore software) Dell 2850s.
>> There are two switches (one for SM1, one for SM2) that are linked
>> using teamed Fibre (2GB/sec capacity), so I/O should route evenly
>> across both switches.  The SM mirroring takes advantage of the Fibre.
>> With the RHEL 5.2 host, you will note that both ifaces are going to
>> the SM2 node, but utilising different NICs on the SM2 node.  These
>> volumes are then mirrored to SM1 (except the BACKUP volume, which is
>> a linear volume).  We know that the switches aren't congested, but we
>> don't accurately know whether SM1 or SM2 is congested.  We only have
>> a logical spread of volumes presented across multiple NICs to at
>> least try to minimise congestion.
>>
>>> At these times is there lots of disk IO? Is there anything in the target
>>> logs?
>> It is fair to say that all these volumes take a heavy hit in terms of
>> I/O.  Each host (excluding the RHEL 5.3 test host) runs two Oracle
>> databases, some of which have intra-database replication (Oracle
>> Streams) enabled.  The issue on the RHEL 5.2 host occurs every 10
>> secs or so during Office Hours when it is being utilised.
>>
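To put a number on that "heavy hit", running sysstat's iostat on the host
during office hours would show how busy the iSCSI LUNs actually are when the
errors appear -- a rough sketch only; watch the lines for the sdb/sde/sdf
devices shown in the session dump further down:

    # extended per-device statistics every 5 seconds
    # watch await and %util on the iSCSI disks (sdb, sde, sdf below)
    iostat -x -k 5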
>>> So the RHEL5.3 box is having troubles too?  There is nothing in the
>>> log below.
>> The error with the RHEL 5.3 host was as follows:
>>
>>> Mar 11 18:12:03 MYHOST53 iscsid: received iferror -38
>>> Mar 11 18:12:03 MYHOST53 last message repeated 2 times
>>> Mar 11 18:12:03 MYHOST53 iscsid: connection1:0 is operational now
>> This looked similar to the previous RHEL 5.2 errors.
>>
>>> Can you replicate this pretty easily? If you just login the session,
>>> then let it sit (do not run the db or any disk IO), will you see the
>>> ping timeout errors?
>> I can test this with the RHEL 5.3 host.  Unfortunately, it will be
>> difficult to take down the RHEL 5.2 host's database services until we
>> have a scheduled outage window.
>>
>> Today, there have been no further errors on the RHEL 5.3 host :>).
>>
>>> It might be helpful to run ethereal/wireshark while you run your test
>>> then send the /var/log/messages and trace so I can check and see if the
>>> ping is really timing out or not. For the test you only need one session
>>> logged in (this will reduce log and trace info), and once you see the
>>> first ping timeout error you can stop tracing/logging and send it.
>> Yes; there is also an Oracle tool (Orion) that we could use.
>>
>> I think that I will monitor the RHEL 5.3 host for any further errors.
>> If the incidence of errors is reduced, then that gives justification
>> for upgrading the RHEL 5.2 host to 5.3.  Such an outage would provide
>> me with an opportunity to perform the tests above as well.
>>
>> Many thanks,
>> Richard.
>>
>> END.
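For the ethereal/wireshark test mentioned above, a capture limited to the
iSCSI port keeps the trace small -- a rough sketch, assuming the test session
runs over eth2; the interface name and output filename are only placeholders:

    # capture full iSCSI frames on the initiator NIC carrying the test session
    tcpdump -i eth2 -s 0 -w iscsi-ping-timeout.pcap port 3260

Stop it with Ctrl-C once the first ping timeout message shows up in
/var/log/messages; the .pcap can then be opened in ethereal/wireshark and sent
along with the log.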
>> With the RHEL 5.2 host
>>
>> On Mar 12, 5:53 pm, Mike Christie <micha...@cs.wisc.edu> wrote:
>>
>>> bigcatxjs wrote:
>>> For this RHEL 5.2 setup, does it make a difference if you do not use
>>> ifaces and setup the box like in 5.3 below?
>>>> iscsiadm:
>>>> iSCSI Transport Class version 2.0-724
>>>> iscsiadm version 2.0-868
>>>> Target: iqn.2000-08.com.datacore:sm2-3
>>>> Current Portal: 172.16.200.9:3260,1
>>>> Persistent Portal: 172.16.200.9:3260,1
>>>> **********
>>>> Interface:
>>>> **********
>>>> Iface Name: iface0
>>>> Iface Transport: tcp
>>>> Iface Initiatorname: iqn.1994-05.com.redhat:7fe2f44ea9de
>>>> Iface IPaddress: 172.16.200.39
>>>> Iface HWaddress: 00:14:22:0d:0a:fa
>>>> Iface Netdev: default
>>>> SID: 1
>>>> iSCSI Connection State: LOGGED IN
>>>> iSCSI Session State: Unknown
>>>> Internal iscsid Session State: NO CHANGE
>>>> ************************
>>>> Negotiated iSCSI params:
>>>> ************************
>>>> HeaderDigest: None
>>>> DataDigest: None
>>>> MaxRecvDataSegmentLength: 131072
>>>> MaxXmitDataSegmentLength: 262144
>>>> FirstBurstLength: 0
>>>> MaxBurstLength: 1048576
>>>> ImmediateData: No
>>>> InitialR2T: Yes
>>>> MaxOutstandingR2T: 1
>>>> ************************
>>>> Attached SCSI devices:
>>>> ************************
>>>> Host Number: 1 State: running
>>>> scsi1 Channel 00 Id 0 Lun: 0
>>>> Attached scsi disk sdb State: running
>>>> scsi1 Channel 00 Id 0 Lun: 1
>>>> Attached scsi disk sde State: running
>>>> scsi1 Channel 00 Id 0 Lun: 2
>>>> Attached scsi disk sdf State: running
>>>> Target: iqn.2000-08.com.datacore:sm2-4
>>>> Current Portal: 172.16.200.10:3260,1
>>>> Persistent Portal: 172.16.200.10:3260,1
>>>> **********
>>>> Interface:
>>>> **********
>>>> Iface Name: iface2
>>>> Iface Transport: tcp
>>>> Iface Initiatorname: iqn.1994-05.com.redhat:7fe2f44ea9de
>>>> Iface IPaddress: 172.16.200.56
>>>> Iface HWaddress: 00:14:22:b1:d6:a6
>>>> Iface Netdev: default
>>>> SID: 2
>>>> iSCSI Connection State: LOGGED IN
>>>> iSCSI Session State: Unknown
>>>> Internal iscsid Session State: NO CHANGE
>>>> ************************
>>>> Negotiated iSCSI params:
>>>> ************************
>>>> HeaderDigest: None
>>>> DataDigest: None
>>>> MaxRecvDataSegmentLength: 131072
>>>> MaxXmitDataSegmentLength: 262144
>>>> FirstBurstLength: 0
>>>> MaxBurstLength: 1048576
>>>> ImmediateData: No
>>>> InitialR2T: Yes
>>>> MaxOutstandingR2T: 1
>>>> ************************
>>>> Attached SCSI devices:
>>>> ************************
>>>> Host Number: 2 State: running
>>>> scsi2 Channel 00 Id 0 Lun: 0
>>>> Attached scsi disk sdc State: running
>>>> scsi2 Channel 00 Id 0 Lun: 1
>>>> Attached scsi disk sdd State: running
>>>>
>>>> Log Errors:
>>>> Mar 12 09:30:48 MYHOST52 last message repeated 2 times
>>>> Mar 12 09:30:48 MYHOST52 iscsid: connection2:0 is operational after
>>>> recovery (1 attempts)
>>>> Mar 12 09:32:52 MYHOST52 kernel: ping timeout of 5 secs expired, last
>>>> rx 19592296349, last ping 19592301349, now 19592306349
>>> There was a bug in 5.2 where the initiator would think it detected a
>>> timeout when it did not. It is fixed in 5.3.
>>> The messages can also occur when there really is a problem with the
>>> network or if the target is bogged down.
>>> At these times is there lots of disk IO? Is there anything in the target
>>> logs?
>>> I am also not sure how well some targets handle bonding plus ifaces. Is
>>> iface* using a bonded interface?
>>> Can you replicate this pretty easily? If you just login the session,
>>> then let it sit (do not run the db or any disk IO), will you see the
>>> ping timeout errors?
>>> It might be helpful to run ethereal/wireshark while you run your test
>>> then send the /var/log/messages and trace so I can check and see if the
>>> ping is really timing out or not. For the test you only need one session
>>> logged in (this will reduce log and trace info), and once you see the
>>> first ping timeout error you can stop tracing/logging and send it.
>>>> From RHEL 5.3 x86 Host;
>>> So the RHEL5.3 box is having troubles too? There is nothing in the log
>>> below.
>>>> iscsiadm;
>>>> iSCSI Transport Class version 2.0-724
>>>> iscsiadm version 2.0-868
>>>> Target: iqn.2000-08.com.datacore:sm2-3
>>>> Current Portal: 172.16.200.9:3260,1
>>>> Persistent Portal: 172.16.200.9:3260,1
>>>> **********
>>>> Interface:
>>>> **********
>>>> Iface Name: default
>>>> Iface Transport: tcp
>>>> Iface Initiatorname: iqn.2005-03.com.redhat:01.406e5fd710e2
>>>> Iface IPaddress: 172.16.200.69
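As a footnote: if the nops really are timing out under load (rather than the
5.2 false-timeout bug), the 5 seconds in the "ping timeout of 5 secs expired"
message should correspond to the nop-out timeout, which can be relaxed -- a
rough sketch only, the values below are just examples, and a session has to
be logged out and back in for node settings to take effect:

    # defaults in /etc/iscsi/iscsid.conf (picked up by node records created later)
    node.conn[0].timeo.noop_out_interval = 10
    node.conn[0].timeo.noop_out_timeout = 30

    # or update an existing node record, e.g. the sm2-3 target above
    iscsiadm -m node -T iqn.2000-08.com.datacore:sm2-3 -p 172.16.200.9:3260 \
        --op update -n node.conn[0].timeo.noop_out_timeout -v 30

That only papers over the symptom, though; it is still worth confirming with
the trace whether the network or the target is the real bottleneck.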