On 11/04/2011 04:27 AM, Sebastian Riemer wrote:
> On 03/11/11 17:58, Mike Christie wrote:
>>
>> Do you mean iscsi_tcp reconnects before the replacement_timeout? If so
>> iser should be doing this too. It is a bug if it is not coming back
>> until after replacement_timeout seconds if the problem has been fixed
>> within that many seconds.
>>   
> 
> Thanks for the explanation, I'll try with more debugging output.
> 
> Actually, everything is fine if the connection loss is fixed within the
> replacement_timeout (120 s). But if the connection loss lasts longer, lots
> of problems occur with the file system inside the VM. We can't control
> parameters inside the customer VMs, so we can't tweak FS parameters there.


That is expected. Once replacement_timeout seconds have gone by, the iscsi
layer unblocks the scsi devices, fails all IO that was queued, and fails
any new IO sent to them.
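
If you want to see what a session is actually using, you can check it at
run time with something like this (paths are just what I would expect on a
recent kernel, so double check on your setup):

  # per-session recovery timeout currently in effect
  cat /sys/class/iscsi_session/session*/recovery_tmo

  # or dump the session timeouts with iscsiadm
  iscsiadm -m session -P 3 | grep -i timeout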


> 
> What we want is that everything is blocked until we replace the failed
> component (e.g. an IB switch) without data corruption inside the VMs.
> With Ethernet (iscsi_tcp) we have that behaviour.
> But with RDMA the VM loses a lot of data. If we increase the
> replacement_timeout, then the guest kernel notices after 120s blocked IO
> timeout, discards the IOs and one of the mentioned three cases occurs.

I am not sure what the problem is then. Can't you just increase the
replacement_timeout? Set it to -1 if you want to turn it off and never
fail IO (if you do this then your system could die for other reasons,
like OOM).
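
Roughly how you would set it (the target name and portal below are just
placeholders for your setup):

  # default for new sessions, in /etc/iscsi/iscsid.conf:
  node.session.timeo.replace_timeout = -1

  # or update an existing node record, then log out/in for it to take
  # effect:
  iscsiadm -m node -T iqn.2011-01.com.example:storage -p 192.168.1.1 \
           -o update -n node.session.timeo.replace_timeout -v -1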

The replacement_timeout is there because
1. for multipath you want it shorter so we fail over to other paths quickly
(see the example below).
2. if you queue IO forever then your system could eventually run out of
memory. If something were to sit around and just keep doing writes to
the iscsi device then more and more memory is allocated until it runs
out, because the IO is never run.
3. I have no idea what people want in their specific setup (do you want
#1 or #2), so you have to set it to whatever fits your case.
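
For the #1 case the usual pattern is a short replace_timeout plus queueing
in dm-multipath. This is only a sketch and the values are examples, so
adjust them for your setup:

  # /etc/iscsi/iscsid.conf - fail the path quickly
  node.session.timeo.replace_timeout = 15

  # /etc/multipath.conf - have multipath queue IO until a path comes back
  defaults {
          no_path_retry    queue
  }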


> 
> I guess that some piece of software has to resend the pending IO after an
> IO error, once the connection becomes available again, so that the guest
> kernel does not notice. I think that QEMU/KVM does something like that.
> 
> Is there a difference in the IO errors which are reported to the
> application depending on the transport (iser or tcp)?

No. The libiscsi layer handles this for both drivers. It should be
exactly the same.

> 
> Btw: IPoIB and iscsi_tcp is even worse. There, the VM can't be shut down
> any more (IO error) and the QEMU/KVM process can't be killed by SIGKILL
> immediately. The defunct process remains in the system for a long time.
> 
> So, is this some kind of QEMU/KVM or open-iscsi issue?
> 

That is just how it works. With FC or SAS you have similar timers. If
the problem lasts longer than them, then they fail IO upwards in the
storage stack. The FS will then die or go into read-only mode depending
on what options you used for the FS.
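
For example, with ext3/ext4 the behavior on IO errors is picked at mount
time (the device below is just an example):

  mount -o errors=remount-ro /dev/sdb1 /mnt   # go read-only on errors
  mount -o errors=panic      /dev/sdb1 /mnt   # panic the box
  mount -o errors=continue   /dev/sdb1 /mnt   # keep going (risky)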
