Hi all,

We've found that open-iscsi (also with the newest userspace source from
Git and a 3.0.4 kernel) immediately tries to disconnect the iSER session
upon connection loss. Why is that? The disconnect is blocked if the
device is in use.

The Solaris COMSTAR iSER target doesn't show any errors.

Our QEMU/KVM VMs run Debian Squeeze with ext3 on the iSER storage. I ran
a continuous dd to a file inside the VM to produce some I/O.

Meanwhile on the host, I ran a little script with a timed loop of
"ibportstate <LID> <port> reset" to hold down the IB link ("disable"
only works for IB switches).
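
A minimal sketch of such a loop, for anyone who wants to reproduce this.
It is not the original script: the one-second interval, the function
names and the LID/port values in the usage comment are assumptions, and
only the "ibportstate <LID> <port> reset" call itself is taken from the
description above.

```python
import subprocess
import time


def reset_command(lid, port):
    """Build the ibportstate invocation quoted in the text."""
    return ["ibportstate", str(lid), str(port), "reset"]


def hold_link_down(lid, port, duration_s, interval_s=1.0,
                   run=subprocess.run, sleep=time.sleep,
                   clock=time.monotonic):
    """Reset the port repeatedly until duration_s has elapsed.

    interval_s is an assumed value; it has to be shorter than the time
    the link needs to come back up after a reset.
    Returns the number of resets issued.
    """
    deadline = clock() + duration_s
    resets = 0
    while clock() < deadline:
        # check=True raises if ibportstate exits with a non-zero status.
        run(reset_command(lid, port), check=True)
        resets += 1
        sleep(interval_s)
    return resets


# Hypothetical usage: keep port 1 of LID 4 down for 150 seconds.
# hold_link_down(4, 1, duration_s=150)
```

The run, sleep and clock parameters exist only so that the loop can be
exercised without real IB hardware.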

After the default replacement_timeout of 120s, the errors are reported
to the VM, and the moment the IB link becomes available again, ext3
always breaks (becomes read-only).
TCP over Ethernet would reconnect and resend packets as if there had
been no error. Can such behaviour be implemented with iSER, too?

If I increase the replacement_timeout, then the guest kernel notices
that the I/O is blocked for more than 120s. If the connection becomes
available again, there are three different cases:

1. no errors,
2. only the file being written is broken, or
3. ext3 is broken once again
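
For reference, a small sketch of how the timeout was raised per node
record. The helper name, the target IQN, the portal and the 600s value
are placeholders for illustration; the iscsiadm options and the
node.session.timeo.replacement_timeout setting are the standard
open-iscsi ones. As far as I know, the new value only takes effect at
the next login of the session.

```python
def replacement_timeout_update(target, portal, seconds):
    """Return the iscsiadm argv that updates replacement_timeout
    in the node record for one target/portal pair."""
    return [
        "iscsiadm", "-m", "node",
        "-T", target, "-p", portal,
        "-o", "update",
        "-n", "node.session.timeo.replacement_timeout",
        "-v", str(seconds),
    ]


# Placeholder target IQN and portal, not our real ones.
cmd = replacement_timeout_update(
    "iqn.2011-10.example:vmstore", "192.0.2.10:3260", 600)
print(" ".join(cmd))
```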

We'll try to handle this with dm-multipath.
Any ideas how to improve error handling here?

Cheers,

Sebastian


On 26/10/11 14:18, Mike Christie wrote:
> On 10/26/2011 06:48 AM, Sebastian Riemer wrote:
>   
>> Hi all,
>>
>> we have OpenIndiana storage servers as iSER targets and Debian Squeeze
>> with open-iscsi 2.0.871.3-2squeeze as initiators. The Debian systems run
>> lots of QEMU/KVM VMs on the iSER SCSI devices.
>>
>> What kind of connection error is shown here and why does iSER disconnect
>> instead of blocking?
>>     
> Is there some log before this part?
>
> Are you doing something on the target? In older tools if the target
> returned a login error indicating it was not coming back iscsid would
> logout the session destroying /dev/sdXs. I am not sure what is in
> debian's code.
>
>
>
>   
>> Oct 25 13:36:07 server14 kernel: [79490.310169] sd 6:0:0:215: [sdhi] Synchronizing SCSI cache
>> Oct 25 13:36:07 server14 kernel: [79490.568167] iser: iser_cma_handler:event 10 status 0 conn ffff8807ff2a5a80 id ffff88100755f800
>> Oct 25 13:36:07 server14 kernel: [79490.568259]  connection2:0: detected conn error (1011)
>> Oct 25 13:36:07 server14 kernel: [79490.818577] iser: iscsi_iser_ep_disconnect:ib conn ffff8807ff2a5a80 state 3
>> Oct 25 13:36:07 server14 kernel: [79490.818944] iser: iser_free_ib_conn_res:freeing conn ffff8807ff2a5a80 cma_id ffff88100755f800 fmr pool ffff8807ff305d00 qp ffff8807fe707e00
>> Oct 25 13:36:07 server14 kernel: [79490.863799] iser: iser_device_try_release:device ffff8807febc3140 refcount 0
>>
>> The VMs need their storage block devices. These are removed although
>> they are in use.
>>
>> We have pretty long device names (38 chars) and we use the device mapper
>> (36 char names) to be able to perform storage live migration.
>>
>> The error occurred during normal operation.
>>
>> Thanks in advance!
>>
>>     
