RE: Target reboot -> iscsiadm rescan Stuck

Cale, Yonatan Tue, 03 Jun 2014 04:48:08 -0700

Hi,

Regarding debug prints in "__scsi_scan_target", here is what I added:
{{{
static void __scsi_scan_target(struct device *parent, unsigned int channel,
...
                printk(KERN_INFO "__scsi_scan_target --> 
scsi_probe_and_add_lun: lun (%d) != SCAN_WILD_CARD, rescan=%d \n", lun, rescan);
                scsi_probe_and_add_lun(starget, lun, NULL, NULL, rescan, NULL);
...
        /*
         * Scan LUN 0, if there is some response, scan further. Ideally, we
         * would not configure LUN 0 until all LUNs are scanned.
         */
        printk(KERN_INFO "__scsi_scan_target --> scsi_probe_and_add_lun: before 
scanning LUN 0, rescan=%d \n", rescan);
        res = scsi_probe_and_add_lun(starget, 0, &bflags, NULL, rescan, NULL);
        printk(KERN_INFO "__scsi_scan_target --> scsi_probe_and_add_lun: 
scanning LUN 0: rescan=%d, res=%d \n", lun, res);
...
}}}

The output contains the following messages (a few times) (the first+last prints 
here are new, the third one I'll explain in a moment..):
{
dmesg.0:__scsi_scan_target --> scsi_probe_and_add_lun: before scanning LUN 0, 
rescan=0
dmesg.0:caley debug: scsi_probe_and_add_lun start, lun=0, rescan=0
dmesg.0:caley debug: scsi_probe_and_add_lun out_free_sdev, res=0
dmesg.0:__scsi_scan_target --> scsi_probe_and_add_lun: scanning LUN 0: 
rescan=-1, res=0
}

Regarding the third print ("...scsi_probe_and_add_lun out_free_sdev..."), I 
added it just before the LAST return at the bottom line of " 
scsi_probe_and_add_lun". Since it always appears in the log right after " 
scsi_probe_and_add_lun start", I concluded that the function always finished.
I agree that this does not add up very well with having the trace *currently in 
side* of "scsi_probe_and_add_lun".
Perhaps a timing issue of "when were the logs taken vs when was the trace 
taken" ? It would be more likely if we'd have an infinite loop (that is half 
time in the function and half time out of it)
Perhaps a bad stack trace? I don't think so, the "scsi_probe_and_add_lun" keeps 
appearing in stack traces I'm getting.

Regarding having "max_dev_lun=1":
MAX_SCSI_LUNS is defined as 512, as usual.
CONFIG_SCSI_MULTI_LUN - I'm not sure, I doubt we changed the default to "don't 
work with multi lun". I think that this "max_dev_lun=1" might be coming from 
earlier in the function, a lot of code there changes it. Anyway it seems to be 
less relevant since it's not causing an infinite loop as was expected and also 
because "scsi_sequential_lun_scan" doesn't even appear in the stack trace. (If 
you'd like, I'll run a test to check what it is defined to anyway).

-----Original Message-----
From: Mike Christie [mailto:[email protected]] 
Sent: Sunday, June 01, 2014 10:00 PM
To: [email protected]
Cc: Cale, Yonatan; [email protected]
Subject: Re: Target reboot -> iscsiadm rescan Stuck

On 06/01/2014 01:03 PM, Cale, Yonatan wrote:
> Hi again Mike,
> I'd like to update you on this topic. 
> The short story is that it probably isn't an infinite loop in 
> "scsi_sequential_lun_scan". I can't "prove" that it is or isn't in the 
> scsi or iscsi layer, so please decide based on your own judgment if 
> you'd like to take part in the investigation. Obviously I'd be glad to 
> hear from you. :)
> 
> Evidence:
> 
> 1)
> I added printk's to scsi_sequential_lun_scan:
> {{{
> 
> static void scsi_sequential_lun_scan(struct scsi_target *starget,
>                                      int bflags, int scsi_level, int 
> rescan) { ...
>         printk(KERN_INFO "caley debug: scsi scan: Sequential scan of"
>                                             "%s, max_dev_lun=%d, 
> bflags=%d\n", dev_name(&starget->dev), max_dev_lun, bflags); ...
>         for (lun = 1; lun < max_dev_lun; ++lun)
>         {
>                 probe_result = scsi_probe_and_add_lun(starget, lun, NULL, 
> NULL, rescan, NULL);
>                 printk(KERN_INFO "caley debug: scsi_probe_and_add_lun 
> (lun=%d, rescan=%d) = %d", lun, rescan, probe_result);
>                 if ((probe_result != SCSI_SCAN_LUN_PRESENT) &&
>                     !sparse_lun)
>                         return;
>         }
> }}}
> 
> And the output after a run is 2 of these prints:
> caley debug: scsi scan: Sequential scan oftarget0:0:0, max_dev_lun=1,
> bflags=0

How are you getting max_dev_lun=1? Do you build your kernel without 
CONFIG_SCSI_MULTI_LUN, or did you modify iscsi_tcp's max lun?

> 
> Showing that the "for" loop that was expected to be infinite - doesn't 
> run at all (!!)
> 
> 
> 2)
> Here is part of a core dump, giving us the stack trace of the (probably) 
> stuck iscsiadm process, while it is stuck:
> {
> #6 [ffff880094b6f8d0] memcpy at ffffffff8127d0b0
>     /emc/krnl/kernels/linux-3.0.56/arch/x86/lib/memcpy_64.S: 159
> #7 [ffff880094b6f8d0] vsnprintf at ffffffff8127b997
>     /emc/krnl/kernels/linux-3.0.56/lib/vsprintf.c: 1202
> #8 [ffff880094b6f940] kvasprintf at ffffffff8127ebae
>     /emc/krnl/kernels/linux-3.0.56/lib/kasprintf.c: 31
> #9 [ffff880094b6f990] kobject_set_name_vargs at ffffffff81273fcd
>     /emc/krnl/kernels/linux-3.0.56/lib/kobject.c: 225
> #10 [ffff880094b6f9b0] dev_set_name at ffffffff812dca69
>     /emc/krnl/kernels/linux-3.0.56/drivers/base/core.c: 814
> #11 [ffff880094b6fa90] scsi_sysfs_device_initialize at ffffffff81310c18
>     /emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_sysfs.c: 1082
> #12 [ffff880094b6fad0] scsi_alloc_sdev at ffffffff8130dc62
>     /emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_scan.c: 306
> #13 [ffff880094b6fb10] scsi_probe_and_add_lun at ffffffff8130e82d
>    /emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_scan.c: 1044
> #14 [ffff880094b6fc40] __scsi_scan_target at ffffffff8130f168
>     /emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_scan.c: 1224
> #15 [ffff880094b6fd30] scsi_scan_target at ffffffff8130fa28
>     /emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_scan.c: 1650
> #16 [ffff880094b6fd70] iscsi_user_scan_session at ffffffffa00111b6 
> [scsi_transport_iscsi] }
> 
> My point in this stack is that "scsi_sequential_lun_scan" is not called at 
> all. "scsi_probe_and_add_lun" is called directly from "__scsi_scan_target".
> 
> Later I also added some print in " scsi_probe_and_add_lun" and it is 
> definitely not called ~infinit'ish times. I can also say that every time it 
> started to run, it finished running.
> 

I am not sure how you can say it always completes when in the above trace we 
are stuck in scsi_probe_and_add_lun. If we are not really stuck in there (not 
sure if it is just a bad trace because you said you saw the sequential scan 
printk so we must have moved past that initial probe and add lun call), then 
where are we stuck in the scsi_scan_target/__scsi_scan_target code path? Did 
you already stick some printks in there to narrow it down? If not, then do that.

During this test, before you do the rescan command, are there luns/devices 
setup on the initiator side? I mean is there a scsi_device for LUN0 already 
setup or did you delete the luns/scsi_devices at some time before you did the 
rescan?

If there is a lun0 already present, then you should add printks to 
scsi_probe_and_add_lun to see why it was not found and we are going down the 
code path to add it. scsi_probe_and_add_lun should have returned 
SCSI_SCAN_LUN_PRESENT.

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

RE: Target reboot -> iscsiadm rescan Stuck

Reply via email to