Nice debugging! Thanks. When I get off work today I will work on a fix. It looks like we need to rework how we bring up sessions/hosts: do a two-stage alloc then add, so the interface is not presented until after we have done all the allocations and setup. It also looks like we can hit a similar bug on the release/free side, where we kfree stuff while it could still be in use.
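
To make the ordering problem concrete, here is a minimal userspace sketch. It is not kernel code and not the actual fix: `struct host`, `struct session`, `host_read()` and the two `create_*` functions are hypothetical stand-ins for the shost, `tcp_sw_host->session`, `iscsi_sw_tcp_host_get_param()` and `iscsi_sw_tcp_session_create()`. The `visible` flag only models "the sysfs files exist after iscsi_host_add()". It is single-threaded on purpose; the `seen` out-parameter records what a reader would get if it ran inside the window between the two steps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real kernel structures. */
struct session {
    int id;
};

struct host {
    struct session *session; /* models tcp_sw_host->session */
    int visible;             /* models "sysfs files exist" after iscsi_host_add() */
};

enum read_result { READ_NOT_VISIBLE, READ_NULL_SESSION, READ_OK };

/* Models a sysfs read such as iscsi_sw_tcp_host_get_param(). */
enum read_result host_read(const struct host *h)
{
    if (!h->visible)
        return READ_NOT_VISIBLE;   /* userspace cannot reach the host yet */
    if (h->session == NULL)
        return READ_NULL_SESSION;  /* the kernel would dereference NULL here */
    return READ_OK;
}

/* Reported ordering: publish first, fill in the session afterwards.
 * *seen records what a reader running inside the window would get. */
void create_add_then_setup(struct host *h, struct session *s,
                           enum read_result *seen)
{
    h->visible = 1;        /* stands in for iscsi_host_add() */
    *seen = host_read(h);  /* a concurrent "iscsiadm -m session" lands here */
    h->session = s;
}

/* Proposed ordering: finish all setup, then publish as the last step. */
void create_setup_then_add(struct host *h, struct session *s,
                           enum read_result *seen)
{
    h->session = s;
    *seen = host_read(h);       /* same window, but the host is not visible yet */
    assert(h->session != NULL); /* never publish a half-built host */
    h->visible = 1;
}
```

With the first ordering the reader in the window sees a visible host with a NULL session; with the second it cannot reach the host at all until setup is complete. A real fix would also have to handle the teardown side and whatever locking the kernel paths need, which this sketch does not model.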

On 08/07/2013 09:40 AM, Lev Vainblat wrote:
> Hi Mike,
>
> I'm working together with Alex on this issue.
>
> I guess the problem is in iscsi_sw_tcp_session_create():
>
> It first calls iscsi_host_add(shost, NULL), and only later
> tcp_sw_host->session = session;
>
> iscsi_host_add() eventually calls scsi_sysfs_add_host(), so sysfs
> special file is created before tcp_sw_host->session is ready. If now
> iscsi_sw_tcp_host_get_param() is called from iscsiadm, it gets NULL
> session pointer and fails.
>
> The issue is easily reproducible if you insert some sleep in
> iscsi_sw_tcp_session_create() after iscsi_host_add() and call, for
> example, "iscsiadm -m session" during the sleep.
>
> -Lev.
>
> -----Original Message----- From: Alex Lyakas
> Sent: Monday, July 29, 2013 9:34 PM
> To: Mike Christie
> Cc: [email protected] ; Lev Vainblat ; Yair Hershko ; Liran
> Strugano
> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>
> Thank you, Mike.
> We are running kernel 3.8.13, and now we can repro this pretty reliably.
>
> Alex.
>
>
> -----Original Message----- From: Mike Christie
> Sent: 29 July, 2013 9:29 PM
> To: Alex Lyakas
> Cc: [email protected] ; Lev Vainblat ; Yair Hershko ; Liran
> Strugano
> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>
> The iscsiadm crash is sort of expected due to where the kernel is
> crashing initially.
>
> When I get some time at work, I will make a debug patch for you to run with
> that should spit out some extra info.
>
> On 07/28/2013 03:33 AM, Alex Lyakas wrote:
>> Hi Mike,
>> Attached is a trace of another repro, maybe it will give more info.
>> The crashing iscsiadm process (2704) was spawned via fork/exec by our
>> "zadara_vam" process (2657).
>>
>> Thanks,
>> Alex.
>>
>>
>> -----Original Message----- From: Mike Christie
>> Sent: 22 July, 2013 7:14 PM
>> To: [email protected]
>> Cc: Alex Lyakas ; Lev Vainblat ; Yair Hershko ; Liran Strugano
>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>
>> For me to replicate the problem I just login to the target using bnx2i.
>> Just one instance of iscsiadm -m node --login using the bnx2i driver
>> causes the problem.
>>
>> I have been trying to replicate when adding debug comments in the kernel
>> but that made the problem go away. I also tried getting a crash report
>> but with that enabled the problem went away.
>>
>>
>> On 07/22/2013 09:24 AM, Alex Lyakas wrote:
>>> Hi Mike,
>>> any advice on how to proceed further with this issue?
>>>
>>> Thanks,
>>> Alex.
>>>
>>>
>>> -----Original Message----- From: Alex Lyakas
>>> Sent: 02 July, 2013 9:41 PM
>>> To: Mike Christie ; [email protected]
>>> Cc: Lev Vainblat ; Yair Hershko
>>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>>
>>> Hi Mike,
>>> For us it happened only once till now; and from our kernel log, I don't
>>> think anything special was going on during that time, except that we were
>>> reading the sysfs entry. Can you pls share how do you replicate the problem
>>> with the Oracle kernel? If this narrows us down a bit on how to replicate,
>>> then, yes, we can apply a debugging patch.
>>>
>>> One thing our application is doing, is to run several iscsiadm commands
>>> (via fork/exec) in parallel. Is this, in general, a safe thing to do, i.e.,
>>> running multiple iscsiadm processes in parallel? Each iscsiadm process
>>> operates against a different iSCSI target.
>>>
>>> Thanks,
>>> Alex.
>>>
>>>
>>> -----Original Message----- From: Mike Christie
>>> Sent: 02 July, 2013 8:23 PM
>>> To: [email protected]
>>> Cc: Alex Lyakas ; Lev Vainblat ; Yair Hershko
>>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>>
>>> Hey,
>>>
>>> Is it easy for you to replicate this problem and if so would it be
>>> possible to run with a patch that spits out some extra debugging info?
>>>
>>> It is easy for me to replicate with the Oracle linux kernel, but when I
>>> add debugging it seems to move around or become difficult to hit.
>>>
>>>
>>> On 06/27/2013 04:54 AM, Alex Lyakas wrote:
>>>> Hello Mike,
>>>> thank you for responding to my bug report.
>>>> Here is the information you asked for:
>>>>
>>>> This issue happened within a virtual machine. The network interface,
>>>> that is used for iscsi within the VM is a SR-IOV Virtual Function. The
>>>> VM runs a stock ixgbevf driver from 3.8.13 mainline kernel. On the
>>>> physical machine, the Virtual Function is spawned out of Intel 82599EB
>>>> card. The ixgbe driver for the Intel card on the physical machine is
>>>> 3.11.33. The physical machine runs stock Ubuntu Precise kernel
>>>> "3.2.0-29-generic #46-Ubuntu", while the VM runs mainline 3.8.13 kernel.
>>>> From within the VM, we connect to targets that live both on the same
>>>> physical machine and on other physical machines.
>>>>
>>>> I am attaching a .config file for the VM kernel. We did not build the
>>>> kernel ourselves, this is a mainline build done by Ubuntu here:
>>>> http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8.13-raring/
>>>> I am also attaching the full kernel log with the information about the
>>>> crash, it has more stack traces.
>>>>
>>>> The iscsiadm commands that we use (from another application via
>>>> fork()/exec()) are:
>>>> iscsiadm --mode node --portal <portal> --targetname <name> --op new
>>>> iscsiadm --mode node --portal <portal> --targetname <name> --login
>>>> iscsiadm --mode node --portal <portal> --targetname <name> --logout
>>>> iscsiadm --mode node [--print <level>]
>>>> iscsiadm --mode session [--print <level>]
>>>> iscsiadm --mode host [--print <level>]
>>>>
>>>> Occasionally we also read sysfs attributes, with a script that collects
>>>> all system's sysfs entries for further analysis.
>>>>
>>>> We use version 2.0-871 of open-iscsi, we install them via apt-get:
>>>> dpkg -l:
>>>> ii open-iscsi 2.0.871-0ubuntu9.12.04.1
>>>> High performance, transport independent iSCSI implementation
>>>> ii open-iscsi-utils 2.0.871-0ubuntu9.12.04.1
>>>> iSCSI initiatior administrative utility
>>>>
>>>> We open one session to each target, but we connect to multiple targets.
>>>>
>>>> Please let me know if any other info is needed.
>>>>
>>>> Thanks for your help,
>>>> Alex.
>>>>
>>>>
>>>> -----Original Message----- From: Mike Christie
>>>> Sent: 27 June, 2013 10:13 AM
>>>> To: [email protected]
>>>> Cc: Alex Lyakas
>>>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>>>
>>>> On 06/26/2013 07:50 PM, Mike Christie wrote:
>>>>> On 06/26/2013 05:27 PM, Mike Christie wrote:
>>>>>> We have not seen it before. I am not seeing it here.
>>>>>
>>>>> Oh wait, I can hit it when using bnx2i and the OEL kernel
>>>>> 2.6.39-400.17.1.el6uek.x86_64 kernel. I do not hit it with iscsi_tcp
>>>>> though. Have not tried other upstream kernels with offload yet.
>>>>>
>>>>
>>>> Huh. I tried upstream 2.6.39 to 3.8 and also 3.8.10 and could not hit
>>>> the problem. I only hit it with that OEL kernel when using bnx2x. Also I
>>>> hit the oops in a slightly different place.
>>>>
>>>
>
--
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
