On 01/18/2012 12:58 PM, Dennis Jacobfeuerborn wrote:
> Thanks for the response. I'm currently toying with this on my Fedora 15
> system but eventually this will be implemented on a centos 6.2 system with:
>
> root@dus1san1:~# iscsiadm -V
> iscsiadm version 2.0-872.33.el6
> root@dus1san1:~# uname -a
> Linux dus1san1.cvsn.local 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22
> GMT 2011 x86_64 x86_64 x86_64 GNU/Linux
>
> My problem is that I'm not sure how the various timeouts relate to each
> other. What I basically want to be able to do is to guarantee that if e.g.
> a network outage lasts X seconds I want the virtual machines to recover and
> not get an I/O error resulting in a corrupt filesystem.
>
> From the readme it sound like the first thing that happens are the 5 "ping"
> retries and this would last 5*noop_out_timeout seconds. What happens after

There are no ping retries. The ping gets just one chance. The 5 retries are
for disk IO.

> that?
> It sounds like a re-establishment of the connection is then attempted. Will
> this then generate new noop retry cycle and last until the
> replacement_timeout has passed? At which point does the os device timeout
> come into play (/sys/block/sdX/...)?

No.

>
> I guess what I'm looking for is a sort of timeline. The network gets
> unplugged and an I/O request is issued (e.g. a simple "ls" on the
> filesystem on an iscsi device) to the device. What happens with this I/O
> request until it hits the wall and the failure manifest itself and show up
> as an I/O error on the console?

1. The initiator sends a ping if there is no activity (no READ/WRITE request
being sent) on the connection for timeo.noop_out_interval seconds.

2. If we do not get a response to the ping within noop_out_timeout seconds,
we fail the connection.

3. The iscsi layer will try to relogin to the target.

4.
A. If the command was running (it has not timed out and the scsi eh is not
running), then the IO will be failed to the scsi layer, and if it has retries
left (so if it has been retried less than 5 times for disk IO) it will be
queued in the block/scsi layer.

B. If the command had already timed out, then it is sort of stuck in the scsi
eh until we relogin or replacement_timeout fires. It will sit in there
waiting for the outcome of #5.

5.
A. If we relogin within replacement_timeout seconds, then the IO will be
restarted if the command had enough retries left.

B. If we cannot relogin within replacement_timeout seconds, then the IO will
be failed upwards (if you are using dm-multipath then it will handle the
problem).

> (Currently I'm not using multipath in the setup I'm experimenting with)
>
> Regards,
> Dennis
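
To put rough numbers on the steps above, here is a small Python sketch of the worst case for a connection that is idle when the network goes away. It is only an illustration: the function name and the example values 5/5/120 are made up for this mail (check your own iscsid.conf), and it assumes the replacement_timeout clock starts at the moment the connection is failed in step 2. It ignores the scsi command timer and the time a relogin itself takes.

```python
from typing import List, Tuple


def outage_timeline(noop_out_interval: int,
                    noop_out_timeout: int,
                    replacement_timeout: int) -> List[Tuple[int, str]]:
    """Worst-case event times, in seconds after the outage starts.

    Hypothetical helper, not part of open-iscsi. Assumes an idle
    connection and that replacement_timeout starts counting when the
    connection is failed.
    """
    # Step 1: the ping goes out after noop_out_interval seconds of idleness.
    ping_sent = noop_out_interval
    # Step 2: one unanswered ping fails the connection; step 3 (relogin) begins.
    conn_failed = ping_sent + noop_out_timeout
    # Step 5B: no successful relogin in time, so the IO is failed upwards.
    io_failed = conn_failed + replacement_timeout
    return [
        (ping_sent, "ping sent"),
        (conn_failed, "ping unanswered: connection failed, relogin attempts begin"),
        (io_failed, "replacement_timeout expired: IO failed upwards"),
    ]


if __name__ == "__main__":
    # Example values only; substitute the settings from your own setup.
    for t, event in outage_timeline(5, 5, 120):
        print(f"t+{t:>3}s  {event}")
```

Under those assumptions the relogin window closes noop_out_interval + noop_out_timeout + replacement_timeout seconds after the cable is pulled at the latest, and possibly earlier if the ping happened to be sent just before the outage. So if the goal is to ride out an outage of X seconds without multipath, replacement_timeout alone should be larger than X.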
