Re: connection, host resets, I/O errors eventually (DRBD, but not only)
Tomasz Chmielewski wrote:
> Mike Christie wrote:
>> The scsi layer sets a timeout on each command. If it does not execute
>> in X seconds, it will run the iscsi eh. So you can increase the scsi
>> command time:
>>
>> To modify the udev rule open /etc/udev/rules.d/50-udev.rules, and find
>> the following lines:
>>
>> ACTION=="add", SUBSYSTEM=="scsi" , SYSFS{type}=="0|7|14", \
>>     RUN+="/bin/sh -c 'echo 60 > /sys$$DEVPATH/timeout'"
>>
>> and you probably want to decrease the number of outstanding commands
>> by setting the node.session.cmds_max for that session. With 50 kB/s
>> you might as well set this to 1 command.
>
> This helps a bit, but after some time, something weird happens.
>
> I increased the timeout to 240 seconds. The data flows fine for some
> time, but after a couple of minutes, every program running on that
> initiator machine seems to freeze (i.e. ping stops pinging, top stops
> refreshing the data, they can't be interrupted / won't exit with
> ctrl+c). There is no traffic any more between the target and the
> initiator.
>
> The machine is a bit alive, as it replies to pings and responds to
> sysrq magic, and I can switch VTs (ctrl+alt+F1...).
>
> The machine has its root filesystem accessible via iSCSI (via fast LAN,
> to a different target), which can somehow contribute to the problem?
> It runs a 2.6.22 kernel.
>
> Some bad interaction if the initiator is connected to two targets with
> different IPs, and connection to one target is very slow?

There should not be. Each session/connection to the target is going to
get its own threads for sending IO. The receiving is done in the network
softirq and cannot sleep or dominate the use.

Did you set the queue limit lower too? If so, did you do it globally (set
it in iscsid.conf and discover the targets) or did you run it for a
specific session (run iscsiadm -m node -T target -p ip:port -o update
-n ..)?

Maybe if you did it globally the lower queue depth is slowing the IO
execution and affecting the apps. This is probably not the case though.
I only know of things like a big database not liking its IO being slowed
down, and I do not think other apps would notice the slowdown as long as
IO completes.

Or were there any iscsi or IO messages in the logs?

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---
Re: connection, host resets, I/O errors eventually (DRBD, but not only)
Mike Christie wrote:
> The scsi layer sets a timeout on each command. If it does not execute
> in X seconds, it will run the iscsi eh. So you can increase the scsi
> command time:
>
> To modify the udev rule open /etc/udev/rules.d/50-udev.rules, and find
> the following lines:
>
> ACTION=="add", SUBSYSTEM=="scsi" , SYSFS{type}=="0|7|14", \
>     RUN+="/bin/sh -c 'echo 60 > /sys$$DEVPATH/timeout'"
>
> and you probably want to decrease the number of outstanding commands
> by setting the node.session.cmds_max for that session. With 50 kB/s
> you might as well set this to 1 command.

This helps a bit, but after some time, something weird happens.

I increased the timeout to 240 seconds. The data flows fine for some
time, but after a couple of minutes, every program running on that
initiator machine seems to freeze (i.e. ping stops pinging, top stops
refreshing the data, they can't be interrupted / won't exit with
ctrl+c). There is no traffic any more between the target and the
initiator.

The machine is a bit alive, as it replies to pings and responds to sysrq
magic, and I can switch VTs (ctrl+alt+F1...).

The machine has its root filesystem accessible via iSCSI (via fast LAN,
to a different target), which can somehow contribute to the problem?
It runs a 2.6.22 kernel.

Some bad interaction if the initiator is connected to two targets with
different IPs, and connection to one target is very slow?

No such phenomenon on a machine with rootfs on SATA, where everything
works fine.

-- 
Tomasz Chmielewski
http://wpkg.org
Re: connection, host resets, I/O errors eventually (DRBD, but not only)
On Thu, Jan 8, 2009 at 12:44 PM, Tomasz Chmielewski man...@wpkg.org wrote:
> Anyone using iSCSI over DRBD? And a slow internet link perhaps?

How reliable is the link you are using? What percentage of packets is
lost? You can test this e.g. with the ping command. The following
command generates about 32 KB/s of network traffic and reports the
percentage of lost packets:

    # ping -q -i 0.01 -c1000 -s160 ${remote_ip}
    PING 192.168.1.102 (192.168.1.102) 160(188) bytes of data.

    --- 192.168.1.102 ping statistics ---
    1000 packets transmitted, 1000 received, 0% packet loss, time 8997ms
    rtt min/avg/max/mdev = 0.000/0.048/0.474/0.027 ms

Bart.
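If the ping test above is run repeatedly (for example from cron), the loss figure can be pulled out of the summary line automatically. The following is only a minimal sketch: the function name `parse_packet_loss` is made up for illustration, and it assumes the summary line has the "N% packet loss" wording shown in the output above, which may differ between ping implementations.

```python
import re


def parse_packet_loss(ping_output: str) -> float:
    """Return the packet loss percentage found in ping's summary line.

    Assumes the summary contains text of the form "<number>% packet loss",
    as in the sample output quoted in the thread.
    """
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    if match is None:
        raise ValueError("no packet loss summary found in ping output")
    return float(match.group(1))


# Summary lines copied from the ping run quoted above.
sample = """\
--- 192.168.1.102 ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 8997ms
rtt min/avg/max/mdev = 0.000/0.048/0.474/0.027 ms
"""

print(f"packet loss: {parse_packet_loss(sample)}%")
```

The text to parse could come from `subprocess.run([...], capture_output=True, text=True).stdout`; that part is left out here so the sketch runs without network access.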
Re: connection, host resets, I/O errors eventually (DRBD, but not only)
On Fri, Jan 9, 2009 at 3:22 PM, Tomasz Chmielewski man...@wpkg.org wrote:
> Bart Van Assche wrote:
>> # ping -q -i 0.01 -c1000 -s160 ${remote_ip}
>
> I get about 1% losses.

IMHO running iSCSI over a slow link should work, but a packet loss of 1%
is troublesome. On a local network the packet loss rate is about 0.001%
(1e-5) for 1000-byte packets.

Bart.
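To put the two loss rates side by side, here is a rough sketch of the chance that a burst of packets contains at least one loss. It assumes every packet is lost independently with the same probability, which real links often violate (losses tend to come in bursts). The function name `prob_any_loss` and the 1000-packet burst size are illustrative choices, not something from the thread.

```python
def prob_any_loss(loss_rate: float, packets: int) -> float:
    """Probability that at least one of `packets` packets is lost.

    Assumption: each packet is lost independently with probability
    `loss_rate` (a simplification of real network behaviour).
    """
    return 1.0 - (1.0 - loss_rate) ** packets


# 0.01 is the ~1% measured over the slow link; 1e-5 is the LAN figure
# quoted in the message above.
for rate in (0.01, 1e-5):
    p = prob_any_loss(rate, 1000)
    print(f"loss rate {rate:g}: P(at least one loss in 1000 packets) = {p:.4f}")
```

Under this model a 1000-packet transfer is almost certain to hit a loss at 1%, while at 1e-5 only about one such transfer in a hundred does. Every loss means a TCP retransmission, and the iSCSI commands waiting on that connection wait with it.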
Re: connection, host resets, I/O errors eventually (DRBD, but not only)
Bart Van Assche wrote:
> On Fri, Jan 9, 2009 at 3:22 PM, Tomasz Chmielewski man...@wpkg.org wrote:
>> Bart Van Assche wrote:
>>> # ping -q -i 0.01 -c1000 -s160 ${remote_ip}
>>
>> I get about 1% losses.
>
> IMHO running iSCSI over a slow link should work, but a packet loss of
> 1% is troublesome. On a local network the packet loss rate is about
> 0.001% (1e-5) for 1000-byte packets.

It's not really running iSCSI over a slow link in this case. DRBD
synchronizes two block devices, over a slow link in this case:

P - primary node, accessed by the target, accessed by the initiator
S - secondary node / synchronized area
U - unsynchronized area

PPP slow link SSS

The slow link is used to transfer data for the unsynchronized area.

Now, if the initiator begins to write data, DRBD has to transfer it to
the secondary node before the write is completed: writes flow over the
slow link and compete with background synchronization in the meantime.
As a result, we can say that iSCSI is running over a slow link.

Mike's suggestions help though: increasing timeouts and decreasing the
number of outstanding commands both help here.

One more note - I see such connection/host resets from time to time also
when using gigabit ethernet and a very loaded target (no I/O errors
though, everything recovers in time with the default values).

-- 
Tomasz Chmielewski
http://wpkg.org
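The competition between application writes and background synchronization described above can be put into numbers with a deliberately crude model: whatever bandwidth the resync uses is simply subtracted from the link, and a write completes only once all of its data has reached the secondary. Both the model and the figures (a 50 kB/s link, a resync taking 40 kB/s of it, a 1 MiB write) are assumptions for illustration; DRBD's actual scheduling between resync and application I/O is more involved.

```python
def replicated_write_seconds(write_bytes: int,
                             link_bytes_per_sec: float,
                             resync_bytes_per_sec: float) -> float:
    """Seconds needed to push `write_bytes` to the secondary.

    Simplified model (an assumption, not DRBD's real behaviour): the
    background resync takes a fixed share of the link and the write
    gets whatever is left.
    """
    leftover = link_bytes_per_sec - resync_bytes_per_sec
    if leftover <= 0:
        raise ValueError("resync consumes the whole link; writes never complete")
    return write_bytes / leftover


LINK_RATE = 50_000          # 50 kB/s, the slow-link figure used in the thread
WRITE_SIZE = 1024 * 1024    # hypothetical 1 MiB of dirty data

for resync_rate in (0, 40_000):  # hypothetical resync shares
    t = replicated_write_seconds(WRITE_SIZE, LINK_RATE, resync_rate)
    print(f"resync at {resync_rate} B/s: write acknowledged after ~{t:.0f} s")
```

Even this simple model shows how a fast resync can stretch individual writes past the SCSI command timeout, which matches the observation that longer timeouts and fewer outstanding commands help.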
connection, host resets, I/O errors eventually (DRBD, but not only)
Anyone using iSCSI over DRBD? And a slow internet link perhaps?

If yes, you are likely to see connection errors, host resets, and
eventually, I/O errors reported, for example:

Jan 7 21:47:09 vmware1 kernel: connection23:0: iscsi: detected conn error (1011)
Jan 7 21:47:10 vmware1 kernel: iscsi: host reset succeeded
Jan 7 21:47:50 vmware1 kernel: connection23:0: iscsi: detected conn error (1011)
Jan 7 21:47:50 vmware1 kernel: iscsi: host reset succeeded
Jan 7 21:48:00 vmware1 kernel: sd 22:0:0:1: SCSI error: return code = 0x0002
Jan 7 21:48:00 vmware1 kernel: end_request: I/O error, dev sdw, sector 1494720
Jan 7 21:48:00 vmware1 kernel: Buffer I/O error on device sdw, logical block 186840
Jan 7 21:48:00 vmware1 kernel: lost page write due to I/O error on sdw

This is due to the fact that open-iscsi doesn't seem to like low-speed
(but stable) connections to the target.

To reproduce:

1) set up a connection with limited speed between the target and the
   initiator; for example, with openvpn, one would use the --shaper 5
   option to limit the speed to 50 kB/s.

2) log the initiator in to the target over this connection (this can
   also be done in a LAN)

3) start reading and writing... after some time you will see connection
   errors and host resets followed by I/O errors, possibly data
   corruption

A harder (but somewhat more realistic) way to reproduce would be to set
up DRBD, start background synchronization at high speed (thus leaving
not much bandwidth for normal writes), start reading and writing...

I can reproduce it with tgtd and IET, so I guess open-iscsi is to be
blamed.

Ideas what's wrong and why it fails?

-- 
Tomasz Chmielewski
http://wpkg.org
Re: connection, host resets, I/O errors eventually (DRBD, but not only)
Tomasz Chmielewski wrote:
> Anyone using iSCSI over DRBD? And a slow internet link perhaps?
>
> If yes, you are likely to see connection errors, host resets, and
> eventually, I/O errors reported, for example:
>
> Jan 7 21:47:09 vmware1 kernel: connection23:0: iscsi: detected conn error (1011)
> Jan 7 21:47:10 vmware1 kernel: iscsi: host reset succeeded
> Jan 7 21:47:50 vmware1 kernel: connection23:0: iscsi: detected conn error (1011)
> Jan 7 21:47:50 vmware1 kernel: iscsi: host reset succeeded
> Jan 7 21:48:00 vmware1 kernel: sd 22:0:0:1: SCSI error: return code = 0x0002
> Jan 7 21:48:00 vmware1 kernel: end_request: I/O error, dev sdw, sector 1494720
> Jan 7 21:48:00 vmware1 kernel: Buffer I/O error on device sdw, logical block 186840
> Jan 7 21:48:00 vmware1 kernel: lost page write due to I/O error on sdw
>
> This is due to the fact that open-iscsi doesn't seem to like low-speed
> (but stable) connections to the target.
>
> To reproduce:
>
> 1) set up a connection with limited speed between the target and the
>    initiator; for example, with openvpn, one would use the --shaper 5
>    option to limit the speed to 50 kB/s.
>
> 2) log the initiator in to the target over this connection (this can
>    also be done in a LAN)
>
> 3) start reading and writing... after some time you will see
>    connection errors and host resets followed by I/O errors, possibly
>    data corruption
>
> A harder (but somewhat more realistic) way to reproduce would be to set
> up DRBD, start background synchronization at high speed (thus leaving
> not much bandwidth for normal writes), start reading and writing...
>
> I can reproduce it with tgtd and IET, so I guess open-iscsi is to be
> blamed.
>
> Ideas what's wrong and why it fails?

The scsi layer sets a timeout on each command. If it does not execute in
X seconds, it will run the iscsi eh (error handler). So you can increase
the scsi command time:

To modify the udev rule open /etc/udev/rules.d/50-udev.rules, and find
the following lines:

ACTION=="add", SUBSYSTEM=="scsi" , SYSFS{type}=="0|7|14", \
    RUN+="/bin/sh -c 'echo 60 > /sys$$DEVPATH/timeout'"

and you probably want to decrease the number of outstanding commands by
setting the node.session.cmds_max for that session. With 50 kB/s you
might as well set this to 1 command.
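A back-of-the-envelope sketch of why both knobs matter on a 50 kB/s link: if a full queue of commands has to cross the link one after another, the last one finishes only after all the data ahead of it has been sent. The 60-second timeout and the 50 kB/s rate come from the thread; the 128 KiB per command and the queue depths tried are hypothetical values chosen for illustration, and the model ignores protocol overhead and the exact moment the SCSI timer starts.

```python
def drain_time_seconds(cmds_max: int,
                       bytes_per_cmd: int,
                       link_bytes_per_sec: float) -> float:
    """Seconds until the last of `cmds_max` queued commands has crossed the link.

    Simplification: commands are sent back to back and each carries
    `bytes_per_cmd` bytes of data; overhead and retransmits are ignored.
    """
    return cmds_max * bytes_per_cmd / link_bytes_per_sec


LINK_RATE = 50_000       # 50 kB/s, from the reproduction recipe in the thread
CMD_SIZE = 128 * 1024    # hypothetical data size per command
SCSI_TIMEOUT = 60        # seconds, the value written by the udev rule above

for cmds_max in (128, 32, 1):  # illustrative queue depths
    t = drain_time_seconds(cmds_max, CMD_SIZE, LINK_RATE)
    verdict = "exceeds" if t > SCSI_TIMEOUT else "fits within"
    print(f"cmds_max={cmds_max:3d}: ~{t:6.1f} s, {verdict} the {SCSI_TIMEOUT} s timeout")
```

With these assumed numbers a deep queue overshoots the timeout several times over, while a single outstanding command completes in a few seconds. Raising the timeout and lowering node.session.cmds_max attack the same problem from opposite ends.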