Maybe we want to do this in 2 steps. The attached patch is a simple
two-liner that seems to fix the problem for me. If it is OK with you guys, we
could send that upstream and into the stable kernel so it will reach
Debian and other distros.

I think, though, that there are other possible bugs, so the correct
fix is to rework the allocation/addition code and the release code (it
seems there are similar bugs in that path). The rework patch I am still
working on is more invasive and has dependency issues, so I am not sure
if it can easily go into the stable kernels you guys use.

Let me know what you guys think.
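
To make the ordering problem concrete, here is a minimal single-threaded
user-space sketch of the two setup orders discussed in this thread. It is
not kernel code: the demo_* structs and functions are hypothetical
stand-ins, the "visible" flag stands in for the sysfs file existing, and
the reader is simulated by a direct call placed in the window rather than
by a real concurrent process.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real structs. */
struct demo_session { int id; };

struct demo_host {
	struct demo_session *session;	/* what the attribute handler reads */
	int visible;			/* nonzero once the "sysfs" file exists */
};

/*
 * Models an attribute read arriving at an arbitrary moment.
 * Returns 0 if the host is not exposed yet, 1 if the reader found a
 * session, and -1 if it reached a host whose session is still NULL.
 */
static int demo_read_attr(const struct demo_host *h)
{
	assert(h != NULL);
	if (!h->visible)
		return 0;
	return h->session ? 1 : -1;
}

/* Order from the report: expose the host, then assign the session. */
static int demo_create_add_first(struct demo_host *h, struct demo_session *s)
{
	int seen;

	h->visible = 1;			/* stands in for iscsi_host_add() */
	seen = demo_read_attr(h);	/* a reader landing in the window */
	h->session = s;
	return seen;
}

/* Two-stage order: finish setup, then expose the host. */
static int demo_create_setup_first(struct demo_host *h, struct demo_session *s)
{
	int seen;

	h->session = s;
	seen = demo_read_attr(h);	/* same window, host not exposed yet */
	h->visible = 1;
	return seen;
}
```

With zero-initialized hosts, the add-first order lets the simulated reader
reach a NULL session, while the setup-first order keeps the reader out
until the session pointer is in place. The sketch says nothing about the
release side, which needs its own ordering.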


On 08/07/2013 01:56 PM, Mike Christie wrote:
> Nice debugging! Thanks.
> 
> When I get off of work today I will work on a fix. It looks like we need
> to rework how we bring up sessions/hosts (do the 2 stage alloc then add
> so the interface is not presented until after we have done all the
> allocations and setup) and then it looks like we can hit a similar bug
> on the release/free side where we kfree stuff while it could be in use.
> 
> 
> On 08/07/2013 09:40 AM, Lev Vainblat wrote:
>> Hi Mike,
>>
>> I'm working together with Alex on this issue.
>>
>> I guess the problem is in iscsi_sw_tcp_session_create():
>>
>> It first calls iscsi_host_add(shost, NULL), and only later
>> tcp_sw_host->session = session;
>>
>> iscsi_host_add() eventually calls scsi_sysfs_add_host(), so the sysfs
>> special file is created before tcp_sw_host->session is ready. If
>> iscsi_sw_tcp_host_get_param() is now called from iscsiadm, it gets a
>> NULL session pointer and fails.
>>
>> The issue is easily reproducible if you insert a sleep in
>> iscsi_sw_tcp_session_create() after iscsi_host_add() and run, for
>> example, "iscsiadm -m session" during the sleep.
>>
>> -Lev.
>>
>> -----Original Message----- From: Alex Lyakas
>> Sent: Monday, July 29, 2013 9:34 PM
>> To: Mike Christie
>> Cc: open-iscsi@googlegroups.com ; Lev Vainblat ; Yair Hershko ; Liran
>> Strugano
>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>
>> Thank you, Mike.
>> We are running kernel 3.8.13, and now we can repro this pretty reliably.
>>
>> Alex.
>>
>>
>> -----Original Message----- From: Mike Christie
>> Sent: 29 July, 2013 9:29 PM
>> To: Alex Lyakas
>> Cc: open-iscsi@googlegroups.com ; Lev Vainblat ; Yair Hershko ; Liran
>> Strugano
>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>
>> The iscsiadm crash is sort of expected due to where the kernel is
>> crashing initially.
>>
>> When I get some time at work, I will make a debug patch for you to run
>> with that should spit out some extra info.
>>
>> On 07/28/2013 03:33 AM, Alex Lyakas wrote:
>>> Hi Mike,
>>> Attached is a trace of another repro, maybe it will give more info.
>>> The crashing iscsiadm process (2704) was spawned via fork/exec by our
>>> "zadara_vam" process (2657).
>>>
>>> Thanks,
>>> Alex.
>>>
>>>
>>> -----Original Message----- From: Mike Christie
>>> Sent: 22 July, 2013 7:14 PM
>>> To: open-iscsi@googlegroups.com
>>> Cc: Alex Lyakas ; Lev Vainblat ; Yair Hershko ; Liran Strugano
>>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>>
>>> For me to replicate the problem I just login to the target using bnx2i.
>>> Just one instance of iscsiadm -m node --login using the bnx2i driver
>>> causes the problem.
>>>
>>> I have been trying to replicate when adding debug comments in the kernel
>>> but that made the problem go away. I also tried getting a crash report
>>> but with that enabled the problem went away.
>>>
>>>
>>> On 07/22/2013 09:24 AM, Alex Lyakas wrote:
>>>> Hi Mike,
>>>> any advice on how to proceed further with this issue?
>>>>
>>>> Thanks,
>>>> Alex.
>>>>
>>>>
>>>> -----Original Message----- From: Alex Lyakas
>>>> Sent: 02 July, 2013 9:41 PM
>>>> To: Mike Christie ; open-iscsi@googlegroups.com
>>>> Cc: Lev Vainblat ; Yair Hershko
>>>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>>>
>>>> Hi Mike,
>>>> For us it has happened only once so far, and from our kernel log I don't
>>>> think anything special was going on at the time, except that we were
>>>> reading the sysfs entry. Can you please share how you replicate the
>>>> problem with the Oracle kernel? If this helps us narrow down how to
>>>> replicate it, then, yes, we can apply a debugging patch.
>>>>
>>>> One thing our application does is run several iscsiadm commands (via
>>>> fork/exec) in parallel. Is this, in general, a safe thing to do, i.e.,
>>>> running multiple iscsiadm processes in parallel? Each iscsiadm process
>>>> operates against a different iSCSI target.
>>>>
>>>> Thanks,
>>>> Alex.
>>>>
>>>>
>>>> -----Original Message----- From: Mike Christie
>>>> Sent: 02 July, 2013 8:23 PM
>>>> To: open-iscsi@googlegroups.com
>>>> Cc: Alex Lyakas ; Lev Vainblat ; Yair Hershko
>>>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>>>
>>>> Hey,
>>>>
>>>> Is it easy for you to replicate this problem and if so would it be
>>>> possible to run with a patch that spits out some extra debugging info?
>>>>
>>>> It is easy for me to replicate with the Oracle linux kernel, but when I
>>>> add debugging it seems to move around or become difficult to hit.
>>>>
>>>>
>>>> On 06/27/2013 04:54 AM, Alex Lyakas wrote:
>>>>> Hello Mike,
>>>>> thank you for responding to my bug report.
>>>>> Here is the information you asked for:
>>>>>
>>>>> This issue happened within a virtual machine. The network interface,
>>>>> that is used for iscsi within the VM is a SR-IOV Virtual Function. The
>>>>> VM runs a stock ixgbevf driver from 3.8.13 mainline kernel. On the
>>>>> physical machine, the Virtual Function is spawned out of Intel 82599EB
>>>>> card. The ixgbe driver for the Intel card on the physical machine is
>>>>> 3.11.33. The physical machine runs stock Ubuntu Precise kernel
>>>>> "3.2.0-29-generic #46-Ubuntu", while the VM runs mainline 3.8.13
>>>>> kernel.
>>>>> From within the VM, we connect to targets that live both on the same
>>>>> physical machine and on other physical machines.
>>>>>
>>>>> I am attaching a .config file for the VM kernel. We did not build the
>>>>> kernel ourselves; this is a mainline build done by Ubuntu here:
>>>>> http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8.13-raring/
>>>>> I am also attaching the full kernel log with the information about the
>>>>> crash, it has more stack traces.
>>>>>
>>>>> The iscsiadm commands that we use (from another application via
>>>>> fork()/exec()) are:
>>>>> iscsiadm --mode node --portal <portal> --targetname <name> --op new
>>>>> iscsiadm --mode node --portal <portal> --targetname <name> --login
>>>>> iscsiadm --mode node --portal <portal> --targetname <name> --logout
>>>>> iscsiadm --mode node [--print <level>]
>>>>> iscsiadm --mode session [--print <level>]
>>>>> iscsiadm --mode host [--print <level>]
>>>>>
>>>>> Occasionally we also read sysfs attributes, with a script that collects
>>>>> all system's sysfs entries for further analysis.
>>>>>
>>>>> We use version 2.0-871 of open-iscsi, installed via apt-get:
>>>>> dpkg -l:
>>>>> ii  open-iscsi                          2.0.871-0ubuntu9.12.04.1
>>>>> High performance, transport independent iSCSI implementation
>>>>> ii  open-iscsi-utils                    2.0.871-0ubuntu9.12.04.1
>>>>> iSCSI initiatior administrative utility
>>>>>
>>>>> We open one session to each target, but we connect to multiple targets.
>>>>>
>>>>> Please let me know if any other info is needed.
>>>>>
>>>>> Thanks for your help,
>>>>> Alex.
>>>>>
>>>>>
>>>>> -----Original Message----- From: Mike Christie
>>>>> Sent: 27 June, 2013 10:13 AM
>>>>> To: open-iscsi@googlegroups.com
>>>>> Cc: Alex Lyakas
>>>>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>>>>
>>>>> On 06/26/2013 07:50 PM, Mike Christie wrote:
>>>>>> On 06/26/2013 05:27 PM, Mike Christie wrote:
>>>>>>> We have not seen it before. I am not seeing it here.
>>>>>>
>>>>>> Oh wait, I can hit it when using bnx2i and the OEL kernel
>>>>>> 2.6.39-400.17.1.el6uek.x86_64 kernel. I do not hit it with iscsi_tcp
>>>>>> though. Have not tried other upstream kernels with offload yet.
>>>>>>
>>>>>
>>>>> Huh. I tried upstream 2.6.39 to 3.8 and also 3.8.10 and could not hit
>>>>> the problem. I only hit it with that OEL kernel when using bnx2x.
>>>>> Also I hit the oops in a slightly different place.
>>>>>
>>>>
>>
> 



diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 9e2588a..a95af7c 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -758,6 +758,9 @@ static int iscsi_sw_tcp_host_get_param(struct Scsi_Host *shost,
 
        switch (param) {
        case ISCSI_HOST_PARAM_IPADDRESS:
+               if (!session)
+                       return -ENOTCONN;
+
                spin_lock_bh(&session->lock);
                conn = session->leadconn;
                if (!conn) {
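
For readers who want to try the guard outside the kernel, the following is
a small user-space model of the check the patch adds. The demo_* names are
hypothetical, the locking from the real function is left out, and the
return value for a missing lead connection is an assumption, since the hunk
above does not show that branch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structs carry far more state. */
struct demo_conn { const char *local_ip; };
struct demo_session { struct demo_conn *leadconn; };
struct demo_host { struct demo_session *session; };

/*
 * Models the ISCSI_HOST_PARAM_IPADDRESS branch with the added guard.
 * Returns 0 and stores the address in *out on success.
 */
static int demo_host_get_ipaddress(const struct demo_host *host,
				   const char **out)
{
	const struct demo_session *session;

	assert(host != NULL && out != NULL);

	session = host->session;
	if (!session)
		return -ENOTCONN;	/* the check added by the patch */

	/* Assumed: a missing lead connection is also reported as -ENOTCONN. */
	if (!session->leadconn)
		return -ENOTCONN;

	*out = session->leadconn->local_ip;
	return 0;
}
```

A caller that reads the attribute before the session pointer is assigned
gets -ENOTCONN back instead of dereferencing NULL. The guard only covers
the read side; it does not close the window on the release path mentioned
earlier in the thread.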
