{{{
static void __scsi_scan_target(struct device *parent, unsigned int channel,
...
printk(KERN_INFO "__scsi_scan_target --> scsi_probe_and_add_lun: lun
(%d) != SCAN_WILD_CARD, rescan=%d \n", lun, rescan);
scsi_probe_and_add_lun(starget, lun, NULL, NULL, rescan, NULL);
...
/*
* Scan LUN 0, if there is some response, scan further. Ideally, we
* would not configure LUN 0 until all LUNs are scanned.
*/
printk(KERN_INFO "__scsi_scan_target --> scsi_probe_and_add_lun: before
scanning LUN 0, rescan=%d \n", rescan);
res = scsi_probe_and_add_lun(starget, 0, &bflags, NULL, rescan, NULL);
printk(KERN_INFO "__scsi_scan_target --> scsi_probe_and_add_lun: scanning
LUN 0: rescan=%d, res=%d \n", lun, res);
...
}}}
The output contains the following messages (a few times) (the first+last prints
here are new, the third one I'll explain in a moment..):
{
dmesg.0:__scsi_scan_target --> scsi_probe_and_add_lun: before scanning LUN 0,
rescan=0
dmesg.0:caley debug: scsi_probe_and_add_lun start, lun=0, rescan=0
dmesg.0:caley debug: scsi_probe_and_add_lun out_free_sdev, res=0
dmesg.0:__scsi_scan_target --> scsi_probe_and_add_lun: scanning LUN 0:
rescan=-1, res=0
}
Regarding the third print ("...scsi_probe_and_add_lun out_free_sdev..."), I added it just before
the LAST return at the bottom line of " scsi_probe_and_add_lun". Since it always appears in the log
right after " scsi_probe_and_add_lun start", I concluded that the function always finished.
I agree that this does not add up very well with having the trace *currently in side* of
"scsi_probe_and_add_lun".
Perhaps a timing issue of "when were the logs taken vs when was the trace
taken" ? It would be more likely if we'd have an infinite loop (that is half time in
the function and half time out of it)
Perhaps a bad stack trace? I don't think so, the "scsi_probe_and_add_lun" keeps
appearing in stack traces I'm getting.
Regarding having "max_dev_lun=1":
MAX_SCSI_LUNS is defined as 512, as usual.
CONFIG_SCSI_MULTI_LUN - I'm not sure, I doubt we changed the default to "don't work with multi
lun". I think that this "max_dev_lun=1" might be coming from earlier in the function, a lot of
code there changes it. Anyway it seems to be less relevant since it's not causing an infinite loop as was
expected and also because "scsi_sequential_lun_scan" doesn't even appear in the stack trace. (If
you'd like, I'll run a test to check what it is defined to anyway).
-----Original Message-----
From: Mike Christie [mailto:[email protected]]
Sent: Sunday, June 01, 2014 10:00 PM
To: [email protected]
Cc: Cale, Yonatan; [email protected]
Subject: Re: Target reboot -> iscsiadm rescan Stuck
On 06/01/2014 01:03 PM, Cale, Yonatan wrote:
Hi again Mike,
I'd like to update you on this topic.
The short story is that it probably isn't an infinite loop in
"scsi_sequential_lun_scan". I can't "prove" that it is or isn't in the
scsi or iscsi layer, so please decide based on your own judgment if
you'd like to take part in the investigation. Obviously I'd be glad to
hear from you. :)
Evidence:
1)
I added printk's to scsi_sequential_lun_scan:
{{{
static void scsi_sequential_lun_scan(struct scsi_target *starget,
int bflags, int scsi_level, int
rescan) { ...
printk(KERN_INFO "caley debug: scsi scan: Sequential scan of"
"%s, max_dev_lun=%d,
bflags=%d\n", dev_name(&starget->dev), max_dev_lun, bflags); ...
for (lun = 1; lun < max_dev_lun; ++lun)
{
probe_result = scsi_probe_and_add_lun(starget, lun, NULL,
NULL, rescan, NULL);
printk(KERN_INFO "caley debug: scsi_probe_and_add_lun (lun=%d,
rescan=%d) = %d", lun, rescan, probe_result);
if ((probe_result != SCSI_SCAN_LUN_PRESENT) &&
!sparse_lun)
return;
}
}}}
And the output after a run is 2 of these prints:
caley debug: scsi scan: Sequential scan oftarget0:0:0, max_dev_lun=1,
bflags=0
How are you getting max_dev_lun=1? Do you build your kernel without
CONFIG_SCSI_MULTI_LUN, or did you modify iscsi_tcp's max lun?
Showing that the "for" loop that was expected to be infinite - doesn't
run at all (!!)
2)
Here is part of a core dump, giving us the stack trace of the (probably) stuck
iscsiadm process, while it is stuck:
{
#6 [ffff880094b6f8d0] memcpy at ffffffff8127d0b0
/emc/krnl/kernels/linux-3.0.56/arch/x86/lib/memcpy_64.S: 159
#7 [ffff880094b6f8d0] vsnprintf at ffffffff8127b997
/emc/krnl/kernels/linux-3.0.56/lib/vsprintf.c: 1202
#8 [ffff880094b6f940] kvasprintf at ffffffff8127ebae
/emc/krnl/kernels/linux-3.0.56/lib/kasprintf.c: 31
#9 [ffff880094b6f990] kobject_set_name_vargs at ffffffff81273fcd
/emc/krnl/kernels/linux-3.0.56/lib/kobject.c: 225
#10 [ffff880094b6f9b0] dev_set_name at ffffffff812dca69
/emc/krnl/kernels/linux-3.0.56/drivers/base/core.c: 814
#11 [ffff880094b6fa90] scsi_sysfs_device_initialize at ffffffff81310c18
/emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_sysfs.c: 1082
#12 [ffff880094b6fad0] scsi_alloc_sdev at ffffffff8130dc62
/emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_scan.c: 306
#13 [ffff880094b6fb10] scsi_probe_and_add_lun at ffffffff8130e82d
/emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_scan.c: 1044
#14 [ffff880094b6fc40] __scsi_scan_target at ffffffff8130f168
/emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_scan.c: 1224
#15 [ffff880094b6fd30] scsi_scan_target at ffffffff8130fa28
/emc/krnl/kernels/linux-3.0.56/drivers/scsi/scsi_scan.c: 1650
#16 [ffff880094b6fd70] iscsi_user_scan_session at ffffffffa00111b6
[scsi_transport_iscsi] }
My point in this stack is that "scsi_sequential_lun_scan" is not called at all.
"scsi_probe_and_add_lun" is called directly from "__scsi_scan_target".
Later I also added some print in " scsi_probe_and_add_lun" and it is definitely
not called ~infinit'ish times. I can also say that every time it started to run, it
finished running.
I am not sure how you can say it always completes when in the above trace we
are stuck in scsi_probe_and_add_lun. If we are not really stuck in there (not
sure if it is just a bad trace because you said you saw the sequential scan
printk so we must have moved past that initial probe and add lun call), then
where are we stuck in the scsi_scan_target/__scsi_scan_target code path? Did
you already stick some printks in there to narrow it down? If not, then do that.
During this test, before you do the rescan command, are there luns/devices
setup on the initiator side? I mean is there a scsi_device for LUN0 already
setup or did you delete the luns/scsi_devices at some time before you did the
rescan?
If there is a lun0 already present, then you should add printks to
scsi_probe_and_add_lun to see why it was not found and we are going down the
code path to add it. scsi_probe_and_add_lun should have returned
SCSI_SCAN_LUN_PRESENT.