On 6/3/14, 6:46 AM, Cale, Yonatan wrote:
Hi,

Regarding debug prints in "__scsi_scan_target", here is what I added:

I meant add more. We are trying to debug why the scan does not complete. For the printks you added below it does not tell us where we are stuck in the scan process. It could have got stuck after the printks you added there.

Also, did you even hang in that run you provided the log output for below? It looks like the probe and add for LUN0 returned SCSI_SCAN_NO_RESPONSE, so it looks like in that run you did not drop down to the report luns and sequential scan code like you did in some other runs.

You need to just add some printks or do a kprobe module or figure out how to turn on scsi scan logging or a combo of both.



{{{
static void __scsi_scan_target(struct device *parent, unsigned int channel,
...
                printk(KERN_INFO "__scsi_scan_target --> scsi_probe_and_add_lun: lun 
(%d) != SCAN_WILD_CARD, rescan=%d \n", lun, rescan);
                scsi_probe_and_add_lun(starget, lun, NULL, NULL, rescan, NULL);
...
        /*
         * Scan LUN 0, if there is some response, scan further. Ideally, we
         * would not configure LUN 0 until all LUNs are scanned.
         */
        printk(KERN_INFO "__scsi_scan_target --> scsi_probe_and_add_lun: before 
scanning LUN 0, rescan=%d \n", rescan);
        res = scsi_probe_and_add_lun(starget, 0, &bflags, NULL, rescan, NULL);
        printk(KERN_INFO "__scsi_scan_target --> scsi_probe_and_add_lun: scanning 
LUN 0: rescan=%d, res=%d \n", lun, res);
...
}}}

The output contains the following messages (a few times) (the first+last prints 
here are new, the third one I'll explain in a moment..):
{
dmesg.0:__scsi_scan_target --> scsi_probe_and_add_lun: before scanning LUN 0, 
rescan=0
dmesg.0:caley debug: scsi_probe_and_add_lun start, lun=0, rescan=0
dmesg.0:caley debug: scsi_probe_and_add_lun out_free_sdev, res=0
dmesg.0:__scsi_scan_target --> scsi_probe_and_add_lun: scanning LUN 0: 
rescan=-1, res=0
}

Regarding the third print ("...scsi_probe_and_add_lun out_free_sdev..."), I added it just before 
the LAST return at the bottom line of " scsi_probe_and_add_lun". Since it always appears in the log 
right after " scsi_probe_and_add_lun start", I concluded that the function always finished.
I agree that this does not add up very well with having the trace *currently in side* of 
"scsi_probe_and_add_lun".
Perhaps a timing issue of "when were the logs taken vs when was the trace 
taken" ? It would be more likely if we'd have an infinite loop (that is half time in 
the function and half time out of it)
Perhaps a bad stack trace? I don't think so, the "scsi_probe_and_add_lun" keeps 
appearing in stack traces I'm getting.

Regarding having "max_dev_lun=1":
MAX_SCSI_LUNS is defined as 512, as usual.
CONFIG_SCSI_MULTI_LUN - I'm not sure, I doubt we changed the default to "don't work with multi 
lun". I think that this "max_dev_lun=1" might be coming from earlier in the function, a lot of 
code there changes it. Anyway it seems to be less relevant since it's not causing an infinite loop as was 
expected and also because "scsi_sequential_lun_scan" doesn't even appear in the stack trace. (If 
you'd like, I'll run a test to check what it is defined to anyway).


-----Original Message-----
From: Mike Christie [mailto:[email protected]]
Sent: Sunday, June 01, 2014 10:00 PM
To: [email protected]
Cc: Cale, Yonatan; [email protected]
Subject: Re: Target reboot -> iscsiadm rescan Stuck

On 06/01/2014 01:03 PM, Cale, Yonatan wrote:
Hi again Mike,
I'd like to update you on this topic.
The short story is that it probably isn't an infinite loop in
"scsi_sequential_lun_scan". I can't "prove" that it is or isn't in the
scsi or iscsi layer, so please decide based on your own judgment if
you'd like to take part in the investigation. Obviously I'd be glad to
hear from you. :)

Evidence:

1)
I added printk's to scsi_sequential_lun_scan:
{{{

static void scsi_sequential_lun_scan(struct scsi_target *starget,
                                      int bflags, int scsi_level, int
rescan) { ...
         printk(KERN_INFO "caley debug: scsi scan: Sequential scan of"
                                             "%s, max_dev_lun=%d,
bflags=%d\n", dev_name(&starget->dev), max_dev_lun, bflags); ...
         for (lun = 1; lun < max_dev_lun; ++lun)
         {
                 probe_result = scsi_probe_and_add_lun(starget, lun, NULL, 
NULL, rescan, NULL);
                 printk(KERN_INFO "caley debug: scsi_probe_and_add_lun (lun=%d, 
rescan=%d) = %d", lun, rescan, probe_result);
                 if ((probe_result != SCSI_SCAN_LUN_PRESENT) &&
                     !sparse_lun)
                         return;
         }
}}}

And the output after a run is 2 of these prints:
caley debug: scsi scan: Sequential scan oftarget0:0:0, max_dev_lun=1,
bflags=0


How are you getting max_dev_lun=1? Do you build your kernel without 
CONFIG_SCSI_MULTI_LUN, or did you modify iscsi_tcp's max lun?



Showing that the "for" loop that was expected to be infinite - doesn't
run at all (!!)


2)
Here is part of a core dump, giving us the stack trace of the (probably) stuck 
iscsiadm process, while it is stuck:
{
#6 [ffff880094b6f8d0] memcpy at ffffffff8127d0b0
     /emc/krnl/kernels/linux-3.0.56/arch/x86/lib/memcpy_64.S: 159
#7 [ffff880094b6f8d0] vsnprintf at ffffffff8127b997
     /emc/krnl/kernels/linux-3.0.56/lib/vsprintf.c: 1202
#8 [ffff880094b6f940] kvasprintf at ffffffff8127ebae
     /emc/krnl/kernels/linux-3.0.56/lib/kasprintf.c: 31
#9 [ffff880094b6f990] kobject_set_name_vargs at ffffffff81273fcd
     /emc/krnl/kernels/linux-3.0.56/lib/kobject.c: 225
#10 [ffff880094b6f9b0] dev_set_name at ffffffff812dca69
     /emc/krnl/kernels/linux-3.0.56/drivers/base/core.c: 814
#11 [ffff880094b6fa90] scsi_sysfs_device_initialize at ffffffff81310c18
     /emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_sysfs.c: 1082
#12 [ffff880094b6fad0] scsi_alloc_sdev at ffffffff8130dc62
     /emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_scan.c: 306
#13 [ffff880094b6fb10] scsi_probe_and_add_lun at ffffffff8130e82d
    /emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_scan.c: 1044
#14 [ffff880094b6fc40] __scsi_scan_target at ffffffff8130f168
     /emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_scan.c: 1224
#15 [ffff880094b6fd30] scsi_scan_target at ffffffff8130fa28
     /emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_scan.c: 1650
#16 [ffff880094b6fd70] iscsi_user_scan_session at ffffffffa00111b6
[scsi_transport_iscsi] }

My point in this stack is that "scsi_sequential_lun_scan" is not called at all. 
"scsi_probe_and_add_lun" is called directly from "__scsi_scan_target".

Later I also added some print in " scsi_probe_and_add_lun" and it is definitely 
not called ~infinit'ish times. I can also say that every time it started to run, it 
finished running.


I am not sure how you can say it always completes when in the above trace we 
are stuck in scsi_probe_and_add_lun. If we are not really stuck in there (not 
sure if it is just a bad trace because you said you saw the sequential scan 
printk so we must have moved past that initial probe and add lun call), then 
where are we stuck in the scsi_scan_target/__scsi_scan_target code path? Did 
you already stick some printks in there to narrow it down? If not, then do that.

During this test, before you do the rescan command, are there luns/devices 
setup on the initiator side? I mean is there a scsi_device for LUN0 already 
setup or did you delete the luns/scsi_devices at some time before you did the 
rescan?

If there is a lun0 already present, then you should add printks to 
scsi_probe_and_add_lun to see why it was not found and we are going down the 
code path to add it. scsi_probe_and_add_lun should have returned 
SCSI_SCAN_LUN_PRESENT.


--
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

Reply via email to