Nice debugging! Thanks.

When I get off work today I will work on a fix. It looks like we need
to rework how we bring up sessions/hosts: do the two-stage alloc then add,
so the interface is not presented until after we have done all the
allocations and setup. It also looks like we can hit a similar bug on
the release/free side, where we kfree stuff while it could still be in use.
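Here is a minimal userspace sketch of that ordering. It assumes a mutex-protected "visible" flag standing in for the sysfs files, and all names in it (host_alloc(), host_add(), host_get_param(), host_remove_and_free()) are hypothetical, not libiscsi/iscsi_tcp APIs. Stage 1 allocates and sets up everything. Stage 2 publishes. Teardown unpublishes, waits for in-flight readers, and only then frees.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_session {
	char ipaddress[16];
};

struct model_host {
	pthread_mutex_t lock;
	pthread_cond_t idle;           /* signalled when users drops to 0 */
	bool visible;                  /* stand-in for "sysfs files exist" */
	int users;                     /* readers currently in host_get_param() */
	struct model_session *session; /* stand-in for tcp_sw_host->session */
};

/* Stage 1: allocate and fully set up. Nothing is visible to readers yet. */
static struct model_host *host_alloc(const char *ip)
{
	struct model_host *h = calloc(1, sizeof(*h));

	if (!h)
		return NULL;
	h->session = calloc(1, sizeof(*h->session));
	if (!h->session) {
		free(h);
		return NULL;
	}
	snprintf(h->session->ipaddress, sizeof(h->session->ipaddress), "%s", ip);
	pthread_mutex_init(&h->lock, NULL);
	pthread_cond_init(&h->idle, NULL);
	return h;
}

/* Stage 2: publish only once every pointer a reader needs is set. */
static void host_add(struct model_host *h)
{
	pthread_mutex_lock(&h->lock);
	h->visible = true;
	pthread_mutex_unlock(&h->lock);
}

/* Reader path, standing in for a sysfs show routine. */
static int host_get_param(struct model_host *h, char *buf, size_t len)
{
	int n;

	pthread_mutex_lock(&h->lock);
	if (!h->visible) {
		pthread_mutex_unlock(&h->lock);
		return -ENODEV;
	}
	h->users++;
	pthread_mutex_unlock(&h->lock);

	n = snprintf(buf, len, "%s", h->session->ipaddress);

	pthread_mutex_lock(&h->lock);
	if (--h->users == 0)
		pthread_cond_broadcast(&h->idle);
	pthread_mutex_unlock(&h->lock);
	return n;
}

/*
 * Teardown mirrors setup: unpublish, wait for in-flight readers, then free.
 * Assumes no new caller can reach h after this returns (roughly the job
 * sysfs removal does in the kernel).
 */
static void host_remove_and_free(struct model_host *h)
{
	pthread_mutex_lock(&h->lock);
	h->visible = false;
	while (h->users > 0)
		pthread_cond_wait(&h->idle, &h->lock);
	assert(h->users == 0);
	pthread_mutex_unlock(&h->lock);

	free(h->session);
	pthread_cond_destroy(&h->idle);
	pthread_mutex_destroy(&h->lock);
	free(h);
}
```

In this model, host_get_param() returns -ENODEV before host_add() and the address after it. The mutex plus user count is only the simplest correct way to model "don't kfree while a reader is mid-call". It is not a claim about how the real fix should be implemented.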


On 08/07/2013 09:40 AM, Lev Vainblat wrote:
> Hi Mike,
> 
> I'm working together with Alex on this issue.
> 
> I guess the problem is in iscsi_sw_tcp_session_create():
> 
> It first calls iscsi_host_add(shost, NULL), and only later sets
> tcp_sw_host->session = session;
> 
> iscsi_host_add() eventually calls scsi_sysfs_add_host(), so the sysfs
> special files are created before tcp_sw_host->session is set. If
> iscsi_sw_tcp_host_get_param() is called from iscsiadm in that window,
> it gets a NULL session pointer and fails.
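To make the window concrete without depending on sleep timing, here is a small deterministic userspace model. The names (publish(), create_add_then_assign(), create_assign_then_add()) are hypothetical, and this is not the iscsi_tcp code. It runs a reader at the exact moment the host becomes visible, which is the worst case of iscsiadm racing with session creation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the ordering Lev describes; not driver code. */
struct toy_session {
	int id;
};

struct toy_host {
	bool visible;                /* stand-in for "sysfs files created" */
	struct toy_session *session; /* stand-in for tcp_sw_host->session */
};

/*
 * Stand-in for iscsiadm reading a host attribute. The real get_param
 * dereferences the NULL session; here we only report what was seen:
 * -1 = attribute not visible, 0 = NULL session (the oops case), 1 = OK.
 */
static int reader_saw_session(const struct toy_host *h)
{
	if (!h->visible)
		return -1;
	return h->session != NULL;
}

/* Publishing runs the "concurrent" reader immediately, i.e. the worst case. */
static int publish(struct toy_host *h)
{
	assert(!h->visible); /* publishing twice would be a bug */
	h->visible = true;
	return reader_saw_session(h);
}

/* Order as described for iscsi_sw_tcp_session_create(): add, then assign. */
static int create_add_then_assign(struct toy_host *h, struct toy_session *s)
{
	int seen = publish(h);

	h->session = s;
	return seen;
}

/* Reordered: assign everything first, then publish. */
static int create_assign_then_add(struct toy_host *h, struct toy_session *s)
{
	h->session = s;
	return publish(h);
}
```

With add-then-assign, the reader sees a NULL session (0). With assign-then-add, it sees the session (1).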
> 
> The issue is easily reproducible if you insert a sleep in
> iscsi_sw_tcp_session_create() after iscsi_host_add() and run, for
> example, "iscsiadm -m session" during the sleep.
> 
> -Lev.
> 
> -----Original Message----- From: Alex Lyakas
> Sent: Monday, July 29, 2013 9:34 PM
> To: Mike Christie
> Cc: [email protected] ; Lev Vainblat ; Yair Hershko ; Liran
> Strugano
> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
> 
> Thank you, Mike.
> We are running kernel 3.8.13, and now we can repro this pretty reliably.
> 
> Alex.
> 
> 
> -----Original Message----- From: Mike Christie
> Sent: 29 July, 2013 9:29 PM
> To: Alex Lyakas
> Cc: [email protected] ; Lev Vainblat ; Yair Hershko ; Liran
> Strugano
> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
> 
> The iscsiadm crash is sort of expected due to where the kernel is
> crashing initially.
> 
> When I get some time at work, I will make a debug patch for you to run
> that should spit out some extra info.
> 
> On 07/28/2013 03:33 AM, Alex Lyakas wrote:
>> Hi Mike,
>> Attached is a trace of another repro; maybe it will give more info.
>> The crashing iscsiadm process (2704) was spawned via fork/exec by our
>> "zadara_vam" process (2657).
>>
>> Thanks,
>> Alex.
>>
>>
>> -----Original Message----- From: Mike Christie
>> Sent: 22 July, 2013 7:14 PM
>> To: [email protected]
>> Cc: Alex Lyakas ; Lev Vainblat ; Yair Hershko ; Liran Strugano
>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>
>> For me to replicate the problem I just login to the target using bnx2i.
>> Just one instance of iscsiadm -m node --login using the bnx2i driver
>> causes the problem.
>>
>> I have been trying to replicate it with debug messages added to the
>> kernel, but that made the problem go away. I also tried getting a crash
>> report, but with that enabled the problem went away too.
>>
>>
>> On 07/22/2013 09:24 AM, Alex Lyakas wrote:
>>> Hi Mike,
>>> any advice on how to proceed further with this issue?
>>>
>>> Thanks,
>>> Alex.
>>>
>>>
>>> -----Original Message----- From: Alex Lyakas
>>> Sent: 02 July, 2013 9:41 PM
>>> To: Mike Christie ; [email protected]
>>> Cc: Lev Vainblat ; Yair Hershko
>>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>>
>>> Hi Mike,
>>> So far it has happened only once for us, and from our kernel log I
>>> don't think anything special was going on at that time, except that
>>> we were reading the sysfs entry. Can you please share how you
>>> replicate the problem with the Oracle kernel? If that helps us narrow
>>> down how to replicate it, then yes, we can apply a debugging patch.
>>>
>>> One thing our application does is run several iscsiadm commands (via
>>> fork/exec) in parallel. Is it, in general, safe to run multiple
>>> iscsiadm processes in parallel? Each iscsiadm process operates against
>>> a different iSCSI target.
>>>
>>> Thanks,
>>> Alex.
>>>
>>>
>>> -----Original Message----- From: Mike Christie
>>> Sent: 02 July, 2013 8:23 PM
>>> To: [email protected]
>>> Cc: Alex Lyakas ; Lev Vainblat ; Yair Hershko
>>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>>
>>> Hey,
>>>
>>> Is it easy for you to replicate this problem and if so would it be
>>> possible to run with a patch that spits out some extra debugging info?
>>>
>>> It is easy for me to replicate with the Oracle Linux kernel, but when I
>>> add debugging it seems to move around or become difficult to hit.
>>>
>>>
>>> On 06/27/2013 04:54 AM, Alex Lyakas wrote:
>>>> Hello Mike,
>>>> thank you for responding to my bug report.
>>>> Here is the information you asked for:
>>>>
>>>> This issue happened within a virtual machine. The network interface
>>>> used for iSCSI within the VM is an SR-IOV Virtual Function. The VM
>>>> runs the stock ixgbevf driver from the 3.8.13 mainline kernel. On the
>>>> physical machine, the Virtual Function is spawned from an Intel 82599EB
>>>> card. The ixgbe driver for the Intel card on the physical machine is
>>>> version 3.11.33. The physical machine runs the stock Ubuntu Precise
>>>> kernel "3.2.0-29-generic #46-Ubuntu", while the VM runs the mainline
>>>> 3.8.13 kernel.
>>>> From within the VM, we connect to targets that live both on the same
>>>> physical machine and on other physical machines.
>>>>
>>>> I am attaching the .config file for the VM kernel. We did not build the
>>>> kernel ourselves; it is a mainline build done by Ubuntu, available here:
>>>> http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8.13-raring/
>>>> I am also attaching the full kernel log with the information about the
>>>> crash, it has more stack traces.
>>>>
>>>> The iscsiadm commands that we use (from another application via
>>>> fork()/exec()) are:
>>>> iscsiadm --mode node --portal <portal> --targetname <name> --op new
>>>> iscsiadm --mode node --portal <portal> --targetname <name> --login
>>>> iscsiadm --mode node --portal <portal> --targetname <name> --logout
>>>> iscsiadm --mode node [--print <level>]
>>>> iscsiadm --mode session [--print <level>]
>>>> iscsiadm --mode host [--print <level>]
>>>>
>>>> Occasionally we also read sysfs attributes, with a script that collects
>>>> all of the system's sysfs entries for further analysis.
>>>>
>>>> We use version 2.0-871 of open-iscsi, installed via apt-get:
>>>> dpkg -l:
>>>> ii  open-iscsi                          2.0.871-0ubuntu9.12.04.1
>>>> High performance, transport independent iSCSI implementation
>>>> ii  open-iscsi-utils                    2.0.871-0ubuntu9.12.04.1
>>>> iSCSI initiatior administrative utility
>>>>
>>>> We open one session to each target, but we connect to multiple targets.
>>>>
>>>> Please let me know if any other info is needed.
>>>>
>>>> Thanks for your help,
>>>> Alex.
>>>>
>>>>
>>>> -----Original Message----- From: Mike Christie
>>>> Sent: 27 June, 2013 10:13 AM
>>>> To: [email protected]
>>>> Cc: Alex Lyakas
>>>> Subject: Re: NULL pointer deref in iscsi_sw_tcp_host_get_param
>>>>
>>>> On 06/26/2013 07:50 PM, Mike Christie wrote:
>>>>> On 06/26/2013 05:27 PM, Mike Christie wrote:
>>>>>> We have not seen it before. I am not seeing it here.
>>>>>
>>>>> Oh wait, I can hit it when using bnx2i and the OEL
>>>>> 2.6.39-400.17.1.el6uek.x86_64 kernel. I do not hit it with iscsi_tcp
>>>>> though. Have not tried other upstream kernels with offload yet.
>>>>>
>>>>
>>>> Huh. I tried upstream 2.6.39 to 3.8 and also 3.8.10 and could not hit
>>>> the problem. I only hit it with that OEL kernel when using bnx2x.
>>>> Also, I hit the oops in a slightly different place.
>>>>
>>>
> 

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.