Hello all,
I have run into a strange issue with open-iscsi. I have a test machine with 500
iSCSI volumes backed by an IP SAN. The test machine performs reads and writes
with O_DIRECT on those 500 raw block devices. During the test I trigger a
failure on the IP SAN so that some iSCSI connections break. The iSCSI client is
able to reconnect and recover; however, immediately after recovery,
some iSCSI reads return corrupted data.
This issue happens frequently. After extensive tracing on the IP SAN
server, we are confident that the read requests that returned corrupted data
were never received by the iSCSI server on the IP SAN.
In the timeline below, the client issues the read around time t1, when the
connections are cut. The iSCSI connection recovers at time t2. The time
between t1 and t2 is about 15 to 20 seconds. The read returns several
seconds after t2.
 cut iSCSI connections                    iSCSI connection recovered
           |                                          |
----------t1-----------------------------------------t2---------->
The client machine uses Linux libaio to perform reads and writes, as
follows:
- Block devices are opened with O_DIRECT; the I/O buffer and the I/O
offset are both 4K-aligned.
- Call io_submit() to submit requests to the block device.
- Call io_getevents() to wait for completion events.
  * If the status is "N bytes done", assume the I/O was successful.
  * If the status is "-1", assume the I/O failed.
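
To make the status check in the list above concrete, here is a minimal sketch in C of a stricter way to classify one completion. It is not the actual test code. It assumes the `res` field of the libaio `struct io_event` has already been converted to a signed `long`, and that the caller knows how many bytes were submitted for the request. The names `io_outcome`, `classify_completion` and `report_completion` are hypothetical helpers, not part of libaio. As far as I know, a failed request carries a negated errno value (for example -EIO) in `res`, not necessarily -1, so the sketch treats any negative value as a failure and any byte count other than the submitted one, including 0, as suspect.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result categories for illustration; not part of libaio. */
enum io_outcome {
    IO_OK,        /* exactly the submitted number of bytes was transferred */
    IO_MISMATCH,  /* non-negative result that differs from the submitted size,
                     including a 0-byte completion */
    IO_ERROR      /* negative result, interpreted as a negated errno value */
};

/*
 * res:      the io_event result, converted to a signed long by the caller
 * expected: the byte count that was submitted for this request
 */
enum io_outcome classify_completion(long res, unsigned long expected)
{
    if (res < 0)
        return IO_ERROR;
    if ((unsigned long)res != expected)
        return IO_MISMATCH;
    return IO_OK;
}

/* Print a one-line description of a completion to stderr or stdout. */
void report_completion(long res, unsigned long expected)
{
    switch (classify_completion(res, expected)) {
    case IO_OK:
        printf("ok: %ld bytes\n", res);
        break;
    case IO_MISMATCH:
        fprintf(stderr, "unexpected size: got %ld of %lu bytes\n",
                res, expected);
        break;
    case IO_ERROR:
        fprintf(stderr, "I/O error: %s\n", strerror((int)-res));
        break;
    }
}
```

With a check like this, a 4K read that completes with 0 bytes would be reported as a size mismatch instead of being counted as successful, which would at least show whether such completions occur around t2.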
Is it possible that the iSCSI layer completes a block read or write
with 0 bytes done because the connection is unavailable, so that the upper
layer receives a completion with 0 bytes as the result?
Thank you for reading.
-Shawn