Re: connection, host resets, I/O errors eventually (DRBD, but not only)

Tomasz Chmielewski Mon, 12 Jan 2009 03:46:50 -0800

Mike Christie wrote:

>> The machine has its root filesystem on iSCSI (via fast LAN, to a
>> different target). Could that somehow contribute to the problem? It
>> runs a 2.6.22 kernel.
>> Could there be a bad interaction if the initiator is connected to two
>> targets with different IPs, and the connection to one target is very slow?
>>
> 
> There should not be. Each session/connection to the target is going to 
> get its own threads for sending IO. The receiving is done in the network 
> softirq and cannot sleep or dominate the use.
> 
> Did you set the queue limit lower too? If so, did you do it globally (set 
> it in iscsid.conf and discover the targets) or did you set it for a 
> specific session (run iscsiadm -m node -T target -p ip:port -o update 
> -n ......)? Maybe if you did it globally, the lower queue depth is 
> slowing the IO execution and affecting the apps. This is probably not 
> the case though. I only know of things like a big database not liking its 
> IO being slowed down, and I do not think other apps would notice the 
> slowdown as long as IO completes.
> 
> Or were there any iscsi or IO messages in the logs?


I set the limit per host.


In fact, I have now changed only the timeout in /sys (the one settable via 
the udev rule you gave) to 720, with:

node.session.timeo.replacement_timeout = 1000000

and all other settings left at their defaults.
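
For reference, a minimal sketch of how the per-device timeout mentioned 
above could be checked from a script. It assumes the value is exposed as 
/sys/block/<dev>/device/timeout (the usual location for SCSI disks); the 
function names and the sysfs_root parameter are made up for illustration 
and are not part of open-iscsi.

```python
from pathlib import Path


def read_scsi_timeouts(sysfs_root="/sys"):
    """Return {device_name: timeout_seconds} for block devices that
    expose a device/timeout attribute under sysfs_root (assumed layout)."""
    timeouts = {}
    for timeout_file in sorted(Path(sysfs_root).glob("block/*/device/timeout")):
        # .../block/<dev>/device/timeout -> <dev>
        device = timeout_file.parent.parent.name
        try:
            timeouts[device] = int(timeout_file.read_text().strip())
        except (OSError, ValueError):
            # Attribute unreadable or not a plain integer: skip the device.
            continue
    return timeouts


def devices_not_at(expected, sysfs_root="/sys"):
    """List devices whose timeout differs from the expected value."""
    found = read_scsi_timeouts(sysfs_root)
    return sorted(dev for dev, value in found.items() if value != expected)


if __name__ == "__main__":
    for dev, value in read_scsi_timeouts().items():
        print(f"{dev}: {value}s")
    print("not at 720:", devices_not_at(720))
```

Any device listed by devices_not_at(720) would not have picked up the 
value that the udev rule was meant to set.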

However, since there are no disconnections involved, it shouldn't make 
any difference, am I correct?


So with timeout set to 720 in /sys/... and replacement timeout set very 
high, I still see this problem.


One thing I'm not sure I mentioned clearly:

rootfs is on iSCSI as well, and is mounted in the initrd via iscsistart. 
Does that make a difference here? Is this connection still "managed" by 
iscsid, even though iscsid starts later in the boot process?


Some more details:

- rootfs is connected to the target using 100 Mbit LAN
- another target/IP is connected via openvpn with "--shaper 50000" 
option to limit the bandwidth to about 50 kB/s
- I do lots of writes and reads to the ext3 filesystem from the target 
with limited bandwidth
- in the meantime, I ping the target using its VPN IP address and a real 
IP address
- after 20-30 minutes the ping to the VPN IP reports "sendmsg: No buffer 
space available"; it can't be interrupted with ctrl+c (it is in D+ state, 
as ps indicates)
- the other ping (to the real IP) keeps working
- all I/O activity to the target behind VPN is stopped/frozen
- I see occasional I/O activity to the rootfs target (connected via LAN)
- the last syslog entry is
iscsid: Nop-out timedout after 15 seconds on connection 2:0 state (3). 
Dropping session.
- session is not re-established, as VPN tunnel does not work for some reason
- any command from rootfs which is cached will start, for example "find 
/sys" or ps x will work (I started find /sys and ps earlier)
- any command from rootfs which is *not* cached will not start, i.e. 
"md5sum -h" will freeze
- kjournald is in D state (probably trying to access inaccessible 
device), which is interesting and may be the reason why everything seems 
to freeze?
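
The D-state observations in the list above (ping, kjournald) could be 
collected in one pass by reading /proc/<pid>/stat. A minimal sketch, 
assuming the standard Linux stat layout of "pid (comm) state ..."; the 
helper names and the proc_root parameter are hypothetical.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm may itself contain spaces or parentheses, so it is taken as
    everything between the first "(" and the last ")".
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for processes in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            # Process exited in the meantime, or the line was malformed.
            continue
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

On a machine in the state described, such a script would only start if 
the interpreter and its modules were already cached, like the "find /sys" 
and ps cases above.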


-- 
Tomasz Chmielewski
http://wpkg.org
