Hello Shlomo and Or,
I'm Guilherme Piccoli from LTC/IBM. First of all, sorry to bother you.
We are running some tests with iSCSI and we found an issue possibly
caused by commit 659743b02c41 ("libiscsi: Reduce locking contention in
fast path").
After some time (about 1 hour) of testing against a hardware target
(using the fio benchmark tool), we got a kernel oops. The following link
is a pastebin of the error messages (we got many of them, since our
system has multiple cores): http://codepad.org/KS2C9Jjt
Through some debugging we found the exact point of the crash: a NULL
pointer dereference, with sc == NULL when reading sc->device->lun at
libiscsi.c:369. However, as you can see in the error messages, some kind
of list corruption may be what leads to this NULL pointer.
After reverting the aforementioned commit, the issue is gone and we can
run the benchmark many times without a single failure. The issue is hard
to reproduce; we were only able to reproduce it in a high-bandwidth
environment (10Gb network) with our hardware target (IBM FlashSystem
840). Note that on the initiator side we're using software iSCSI
(iscsi_tcp/libiscsi_tcp).
We'd really appreciate it if you could give us some directions to help
us figure out what's going on: what path might have been taken that
leads to that NULL pointer read? It's hard to debug since I'm no expert
in iSCSI, so any clues or suggestions you can provide would be really
helpful.
If you need any additional information, please let me know and I'd be
glad to provide it. Again, sorry to bother you.
Thanks in advance,
Guilherme
--
You received this message because you are subscribed to the Google Groups
"open-iscsi" group.