On Thu, Nov 05, 2015 at 02:33:44PM -0200, Guilherme G. Piccoli wrote:
> Hello Shlomo and Or,
> 
> I'm Guilherme Piccoli from LTC/IBM - firstly, sorry to bother you.
> 
> 
> We are running some tests with iSCSI and we found an issue caused possibly
> by commit 659743b02c41 ("libiscsi: Reduce locking contention in fast path").
> 
> After some time (+/- 1 hour) of testing with a hardware target (using fio
> benchmark tool), we got a kernel oops; the following link is a pastebin of
> the error message (we got lots of these messages, since our system has
> multiple cores): http://codepad.org/KS2C9Jjt

Interesting. From the trace, the list debugging code is detecting
corruption when removing a task from some list.  Could be the connection
mgmtqueue, cmdqueue, or requeue.

After the locking change, adding a task to any of those lists is done under
the session frwd_lock, but the call to iscsi_complete_task, which deletes
the task from whatever list it's on, is made under the back_lock.
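
To make the concern concrete, here is a minimal user-space sketch of the
suspected interleaving. It is not the libiscsi code: the node type and every
function name below are hypothetical, and the two "lock holders" are simulated
in a single thread by stopping an insert halfway. The validity check is only
modelled on the kind of neighbour test the kernel's list debugging performs
before unlinking an entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical doubly linked node, loosely modelled on struct list_head. */
struct node {
    struct node *prev, *next;
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

/* First half of a tail insert: everything except the final store that
 * makes the old tail point at the new node. */
static void add_tail_begin(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev = n;
}

/* Second half of the tail insert: publish the node to its predecessor. */
static void add_tail_finish(struct node *n)
{
    n->prev->next = n;
}

/* Neighbour check before unlinking: both neighbours must point back at
 * the entry, otherwise the list is considered corrupted. */
static bool del_entry_valid(const struct node *e)
{
    return e->prev->next == e && e->next->prev == e;
}

/* Unlink an entry; returns false (and leaves the list alone) if the
 * neighbour check fails. */
static bool del_entry(struct node *e)
{
    if (!del_entry_valid(e))
        return false;
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = e;
    return true;
}

/* Simulates: holder of lock A is halfway through queuing task b while
 * the holder of lock B unlinks task a from the same list.
 * Returns true if the unlink was rejected as corruption. */
bool del_during_half_insert_detected(void)
{
    struct node queue, a, b;

    list_init(&queue);
    add_tail_begin(&a, &queue);
    add_tail_finish(&a);

    add_tail_begin(&b, &queue);   /* "lock A" side paused mid-insert */
    return !del_entry(&a);        /* "lock B" side tries to unlink a */
}

/* Control case: the insert completes before the unlink, as it would if
 * both sides took the same lock. Returns true if the unlink succeeded
 * and the list is left as queue <-> b. */
bool del_after_full_insert_ok(void)
{
    struct node queue, a, b;

    list_init(&queue);
    add_tail_begin(&a, &queue);
    add_tail_finish(&a);
    add_tail_begin(&b, &queue);
    add_tail_finish(&b);

    return del_entry(&a) && queue.next == &b && b.prev == &queue;
}
```

In this sketch the first function reports corruption because the queue head
already points at b while a still believes it is the tail; the second function
succeeds because the insert and the unlink are fully serialized. Whether this
exact interleaving is what happens in libiscsi is an assumption, not something
the trace proves.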

Am I missing something, or is splitting a linked list across two locks a
major failing of this change?

- Chris

> With some debugging, we could find the exact point of the crash, caused by a
> null-pointer read: sc == NULL on sc->device->lun at libiscsi.c:369. But as
> you can see in error messages, some list issue seems to be possibly leading
> to this null-pointer situation.
> 
> After reverting the aforementioned commit, the issue is gone and we can run
> the benchmark many times without a single failure. The issue is hard to
> reproduce; we were only able to reproduce it in a high-bandwidth environment
> (10Gb network) with our hardware target (IBM FlashSystem 840). Notice
> that from the initiator side we're using software iSCSI
> (iscsi_tcp/libiscsi_tcp).
> 
> 
> We'd really appreciate it if you could give us some directions to help us
> figure out what's going on - what path might have been taken leading to that
> null pointer read? It's hard to debug since I'm no expert in iSCSI, so any
> clues or suggestions you can provide would be really appreciated and
> helpful.
> 
> Any additional information you want, please let me know and I'd be glad to
> provide. Again, sorry to bother you.
> 
> Thanks in advance,
> 
> 
> 
> Guilherme
