Devs,

We seem to be having an issue with the time it takes to fail over on iSCSI. The end goal is to force a failover to an alternate path, as defined by dm-multipath, within 10 seconds.

Distro: CentOS
Kernel version: 2.6.29.5
dm-multipath version: device-mapper-multipath-0.4.7-17.el5
iscsid version: iscsi-initiator-utils-6.2.0.868-0.7.el5

We have dm-multipath installed and configured with the following settings:

    udev_dir                /dev
    polling_interval        3
    selector                "round-robin 0"
    path_grouping_policy    failover
    getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
    prio_callout            /bin/true
    path_checker            tur
    rr_min_io               10
    max_fds                 8192
    rr_weight               uniform
    failback                manual
    no_path_retry           fail
    user_friendly_names     yes

We have also modified the SCSI PDU timeout in /etc/udev/rules.d/50-udev.rules:

    ACTION=="add", SUBSYSTEM=="scsi" , SYSFS{type}=="0|7|14", \
        RUN+="/bin/sh -c 'echo 60 > /sys$$DEVPATH/timeout'"

We have also modified some parameters in /etc/iscsi/iscsi.conf:

    node.session.timeo.replacement_timeout = 5
    node.conn[0].timeo.login_timeout = 5
    node.conn[0].timeo.logout_timeout = 5
    node.conn[0].timeo.noop_out_interval = 5
    node.conn[0].timeo.noop_out_timeout = 10

With the above configuration, the failover takes 2 minutes. After reading a few posts on this group, I tried changing the SCSI PDU timeout and node.session.timeo.replacement_timeout, but that did not change the failover time.

However, if I set the SCSI PDU timeout to 3, node.conn[0].timeo.noop_out_interval to 1 and node.conn[0].timeo.noop_out_timeout to 2, we seem to get a failover in about 65 seconds (that's still too long for our purposes).

/etc/iscsi/iscsi.conf:

    node.session.timeo.replacement_timeout = 120
    node.conn[0].timeo.login_timeout = 15
    node.conn[0].timeo.logout_timeout = 15
    node.conn[0].timeo.noop_out_interval = 1
    node.conn[0].timeo.noop_out_timeout = 2

/etc/udev/rules.d/50-udev.rules:

    ACTION=="add", SUBSYSTEM=="scsi" , SYSFS{type}=="0|7|14", \
        RUN+="/bin/sh -c 'echo 60 > /sys$$DEVPATH/timeout'"

I am not sure why these particular values make a difference to the failover time, but changing any other parameter doesn't seem to help.
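
For reference, here is a rough sketch of how long the initiator should need just to notice a dead connection under the two sets of noop settings above. It assumes a nop-out is sent noop_out_interval seconds after the last received PDU and the connection is declared failed if no reply arrives within noop_out_timeout seconds. The function name is made up for illustration, and this covers detection only, not the time until dm-multipath actually fails the path.

```python
def nominal_detection_seconds(noop_out_interval, noop_out_timeout):
    """Approximate seconds from the last received PDU to the ping-timeout error.

    Assumption: one nop-out goes out after `noop_out_interval` seconds and the
    connection is failed `noop_out_timeout` seconds later if nothing comes back.
    """
    return noop_out_interval + noop_out_timeout


# (interval, timeout) pairs copied from the two configurations above
first_config = nominal_detection_seconds(5, 10)
second_config = nominal_detection_seconds(1, 2)

print("first config: ", first_config, "s")
print("second config:", second_config, "s")
```

Either figure is far below the failover times we are seeing, so if this model is right, most of the delay would have to come after the connection error is detected.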

/var/log/messages:

When the cable (power) is pulled from the primary:

    Jul 8 15:38:23 cschi-mbxdsg-0226.cleversafelabs.com kernel: connection2:0: ping timeout of 2 secs expired, last rx 4295169719, last ping 4295170719, now 4295172719
    Jul 8 15:38:23 cschi-mbxdsg-0226.cleversafelabs.com kernel: connection2:0: detected conn error (1011)
    Jul 8 15:38:24 cschi-mbxdsg-0226.cleversafelabs.com iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
    .
    .

No messages for a bit, and then (the failover occurs at this point):

    Jul 8 15:39:35 cschi-mbxdsg-0226.cleversafelabs.com kernel: sd 2:0:0:0: timing out command, waited 18s
    Jul 8 15:39:35 cschi-mbxdsg-0226.cleversafelabs.com kernel: sd 2:0:0:0: [sdb] Unhandled error code
    Jul 8 15:39:35 cschi-mbxdsg-0226.cleversafelabs.com kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK
    Jul 8 15:39:35 cschi-mbxdsg-0226.cleversafelabs.com kernel: end_request: I/O error, dev sdb, sector 1599400
    Jul 8 15:39:35 cschi-mbxdsg-0226.cleversafelabs.com kernel: device-mapper: multipath: Failing path 8:16.
    Jul 8 15:39:35 cschi-mbxdsg-0226.cleversafelabs.com kernel: sd 2:0:0:0: timing out command, waited 18s
    Jul 8 15:39:35 cschi-mbxdsg-0226.cleversafelabs.com kernel: sd 2:0:0:0: [sdb] Unhandled error code
    Jul 8 15:39:35 cschi-mbxdsg-0226.cleversafelabs.com kernel: sd 2:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK
    Jul 8 15:39:35 cschi-mbxdsg-0226.cleversafelabs.com kernel: end_request: I/O error, dev sdb, sector 1600424

Any clues on how we can reduce this failover time would be appreciated.

--
Akshay Lal
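
P.S. In case it helps anyone reading the log above, here is a small sketch that pulls the timing out of two of those lines: the gap between the ping-timeout message and the "Failing path" message, and the jiffies counters inside the ping-timeout message. The helper names are made up for illustration, and HZ = 1000 is an assumption about the kernel tick rate (it is not stated anywhere above, though it is consistent with the "2 secs" in the message).

```python
import re
from datetime import datetime

# Two lines copied from the /var/log/messages excerpt above.
PING_TIMEOUT = ("Jul 8 15:38:23 cschi-mbxdsg-0226.cleversafelabs.com kernel: "
                "connection2:0: ping timeout of 2 secs expired, "
                "last rx 4295169719, last ping 4295170719, now 4295172719")
PATH_FAILED = ("Jul 8 15:39:35 cschi-mbxdsg-0226.cleversafelabs.com kernel: "
               "device-mapper: multipath: Failing path 8:16.")

HZ = 1000  # assumption: kernel tick rate, not given in the post


def syslog_time(line):
    """Parse the leading 'Mon D HH:MM:SS' stamp (syslog omits the year)."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(stamp, "%b %d %H:%M:%S")


def ping_jiffies(line):
    """Return (last_rx, last_ping, now) from a ping-timeout line."""
    m = re.search(r"last rx (\d+), last ping (\d+), now (\d+)", line)
    return tuple(int(g) for g in m.groups())


gap = (syslog_time(PATH_FAILED) - syslog_time(PING_TIMEOUT)).total_seconds()
last_rx, last_ping, now = ping_jiffies(PING_TIMEOUT)

print("conn error -> path failed:", gap, "s")
print("last rx -> nop-out sent:  ", (last_ping - last_rx) / HZ, "s")
print("nop-out sent -> timeout:  ", (now - last_ping) / HZ, "s")
```

For this excerpt that works out to 72 seconds between the connection error and the path being failed, with the nop-out sent 1 second after the last received PDU and timed out 2 seconds later (under the HZ assumption).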