On Thu, Nov 05, 2015 at 02:33:44PM -0200, Guilherme G. Piccoli wrote:
> Hello Shlomo and Or,
>
> I'm Guilherme Piccoli from LTC/IBM - firstly, sorry to bother you.
>
>
> We are running some tests with iSCSI and we found an issue caused possibly
> by commit 659743b02c41 ("libiscsi: Reduce locking contention in fast path").
>
> After some time (+/- 1 hour) of testing with a hardware target (using fio
> benchmark tool), we got a kernel oops; the following link is a pastebin of
> the error message (we got lots of these messages, since our system has
> multiple cores): http://codepad.org/KS2C9Jjt

Interesting. From the trace, the list debugging code is detecting corruption
when removing a task from some list. It could be the connection mgmtqueue,
cmdqueue, or requeue.

After the locking change, adding a task to any of those lists is done under
the session frwd_lock, but the call to iscsi_complete_task, which deletes the
task from whatever list it's on, is made under the back_lock.

Am I missing something, or is splitting a linked list across two locks a
major failing of this change?

- Chris

> With some debugging, we could find the exact point of the crash, caused by a
> null-pointer read: sc == NULL on sc->device->lun at libiscsi.c:369. But as
> you can see in error messages, some list issue seems to be possibly leading
> to this null-pointer situation.
>
> After reverting the aforementioned commit, the issue is gone and we can run
> the benchmark many times without a single failure. The issue is hard to
> reproduce; we only were able to reproduce in high bandwidth environment
> (10Gb network) with our hardware target (IBM FlashSystem 840). Notice
> that from the initiator side we're using software iSCSI
> (iscsi_tcp/libiscsi_tcp).
>
>
> We'd really appreciate if you could give us some directions to help us
> figuring what's going on - what path might have been taken leading to that
> null pointer read? It's hard to debug since I'm no expert in iSCSI, so any
> clues or suggestions you can provide would be really appreciated and
> helpful.
>
> Any additional information you want, please let me know and I'd be glad to
> provide. Again, sorry to bother you.
>
> Thanks in advance,
>
>
>
> Guilherme
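
To make the two-lock concern in the reply concrete, here is a minimal
userspace sketch in C. It is not the libiscsi code and not a proposed patch.
It assumes one possible way to keep list membership consistent: a single
lock (the hypothetical queue_lock below) taken around every add and every
delete, nested inside whichever of the forward or back lock the caller
already holds. All demo_* names, the list helpers, and the use of pthread
mutexes in place of kernel spinlocks are assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Minimal circular doubly linked list, loosely modeled on list_head. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *n) { n->prev = n->next = n; }

static int list_is_empty(const struct list_node *head)
{
    return head->next == head;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del_init_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical stand-ins; these are NOT the real libiscsi structures. */
struct demo_session {
    pthread_mutex_t frwd_lock;  /* held on the submit path */
    pthread_mutex_t back_lock;  /* held on the completion path */
    pthread_mutex_t queue_lock; /* assumed: sole owner of list membership */
    struct list_node cmdqueue;
};

struct demo_task {
    struct list_node running;
    int completed;
};

static void demo_session_init(struct demo_session *s)
{
    pthread_mutex_init(&s->frwd_lock, NULL);
    pthread_mutex_init(&s->back_lock, NULL);
    pthread_mutex_init(&s->queue_lock, NULL);
    list_init(&s->cmdqueue);
}

static void demo_task_init(struct demo_task *t)
{
    list_init(&t->running);
    t->completed = 0;
}

/* Submit path: the add happens under frwd_lock AND queue_lock. */
static void demo_queue_task(struct demo_session *s, struct demo_task *t)
{
    pthread_mutex_lock(&s->frwd_lock);
    pthread_mutex_lock(&s->queue_lock);
    assert(list_is_empty(&t->running)); /* must not already be queued */
    list_add_tail_node(&t->running, &s->cmdqueue);
    pthread_mutex_unlock(&s->queue_lock);
    pthread_mutex_unlock(&s->frwd_lock);
}

/* Completion path: the delete happens under back_lock AND queue_lock. */
static void demo_complete_task(struct demo_session *s, struct demo_task *t)
{
    pthread_mutex_lock(&s->back_lock);
    t->completed = 1;
    pthread_mutex_lock(&s->queue_lock);
    if (!list_is_empty(&t->running))
        list_del_init_node(&t->running);
    pthread_mutex_unlock(&s->queue_lock);
    pthread_mutex_unlock(&s->back_lock);
}
```

Because queue_lock is always the innermost lock in this sketch, the submit
and completion paths cannot deadlock against each other, and neighbouring
nodes are never modified by an add and a delete at the same time. Without
it, the add (under frwd_lock only) and the delete (under back_lock only)
could both rewrite the same prev/next pointers, which is the kind of
corruption the list debugging code reports.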