More details:

1. After the read returns, we parse the read buffer and find that it contains stale data from a previous read, i.e., the kernel apparently did not update the buffer at all. That is why I suspect the kernel iSCSI client never performed the read; it just bounced the request back to the upper layer and marked it completed.
2. The client is Ubuntu 18.04 with the stock open-iscsi package.

-Shawn

On Friday, February 19, 2021 at 8:28:14 PM UTC-8 [email protected] wrote:
> Hello all,
>
> I have run into a weird issue with open-iscsi. I have a test machine with 500
> iSCSI volumes backed by an IP SAN. The test machine performs reads/writes with
> O_DIRECT on those 500 raw block devices. During the test I trigger a
> failure on the IP SAN so that some iSCSI connections break. The iSCSI client is
> able to reconnect and recover; however, immediately after recovery,
> some iSCSI reads return corrupted data.
>
> This issue happens frequently. After a lot of tracing on the IP SAN
> server, we are sure that those corrupted read requests were never
> received by the iSCSI server on the IP SAN.
>
> In the following timeline diagram, the client issues the read around
> time t1, when the connections are cut. The iSCSI connection recovers at
> time t2. The time between t1 and t2 is about 15~20 seconds. The read returns
> several seconds after t2.
>
>        cut iSCSI connections            iSCSI connection recovered
> ------------- t1 ------------------------------- t2 ---------------------->
>
> The client machine uses Linux libaio to perform reads/writes, as follows:
>
> - Block devices are opened with O_DIRECT; the I/O buffer is 4K-aligned and
>   the I/O offset is 4K-aligned.
> - Call io_submit() to submit requests to the block device.
> - Call io_getevents() to wait for completion events.
>   * If the status is "N bytes done", assume the I/O was successful.
>   * If the status is "-1", assume an I/O failure.
>
> Is it possible that the iSCSI layer marks a blk_read/write as completed
> with 0 bytes done because the connection is not available, so that the upper
> layer receives a completion with 0 bytes as the result?
>
> Thank you for reading.
>
> -Shawn
>
--
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/open-iscsi/0c62fc2b-0be2-49af-8f57-9117ec4bdb6fn%40googlegroups.com.
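The libaio steps and the 0-byte question in the quoted message can be made concrete with a stricter completion check. This is a sketch under stated assumptions. With libaio, io_getevents() fills a struct io_event whose res field holds the byte count on success or a negative errno on failure (for example -EIO, not necessarily -1). libaio declares res as unsigned long, so it is cast to a signed type first. The enum and the helper classify_completion are hypothetical names, not part of libaio. The idea is that a completion counts as successful only when res equals the requested length, so a 0-byte or short completion is never silently treated as "N bytes done".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome categories for one completed read/write. */
enum io_outcome {
    IO_COMPLETE, /* res == requested: the full transfer was reported */
    IO_SHORT,    /* 0 <= res < requested: partial or empty transfer */
    IO_FAILED    /* res < 0: negative errno, e.g. -EIO */
};

/*
 * Classify one completion.
 * res:       io_event.res cast to long (libaio declares it unsigned long).
 * requested: the byte count passed to io_prep_pread()/io_prep_pwrite().
 */
static enum io_outcome classify_completion(long res, size_t requested)
{
    assert(requested > 0); /* a zero-length request would be a caller bug */
    if (res < 0)
        return IO_FAILED;
    if ((size_t)res < requested)
        return IO_SHORT;   /* a 0-byte completion lands here, not in IO_COMPLETE */
    return IO_COMPLETE;
}
```

In the completion loop, this would be called as classify_completion((long)events[i].res, len) for each event returned by io_getevents(), where len is the length given when the iocb was prepared. Counting IO_SHORT results separately around the reconnect window would show directly whether 0-byte completions actually occur.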
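The stale-buffer observation in point 1 could be tested more directly by poisoning each read buffer before submitting it. When the same buffer is reused without being cleared, a read that was never performed and a read that returned old content from storage both leave old-looking data behind. Filling the buffer with a known marker first separates the two cases.

This is a minimal sketch, assuming a fixed marker byte (POISON_BYTE, an arbitrary hypothetical value) and illustrative helper names:

1. Call poison_buffer() right before io_submit().
2. After a completion that reports success, call buffer_untouched().
3. If buffer_untouched() returns 1, no data was transferred into the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker value; any byte unlikely to fill a whole block works. */
#define POISON_BYTE 0xA5

/* Fill the read buffer with the marker right before io_submit(). */
static void poison_buffer(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, POISON_BYTE, len);
}

/*
 * After a completion that reported success, return 1 if every byte still
 * equals the marker (nothing was written into the buffer), 0 otherwise.
 */
static int buffer_untouched(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != POISON_BYTE)
            return 0;
    return 1;
}
```

Poisoning a 4K buffer costs one memset per request, which is small next to the I/O itself. A per-request pattern, such as the request's sequence number, would additionally catch data landing in the wrong buffer, at the cost of a slightly more involved check.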
