Re: [lkp-robot] [scsi, block] 0dba1314d4: WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup
On Mon, 2017-02-06 at 21:09 -0700, Jens Axboe wrote: > On 02/06/2017 05:14 PM, James Bottomley wrote: > > On Sun, 2017-02-05 at 21:13 -0800, Dan Williams wrote: > > > On Sun, Feb 5, 2017 at 1:13 AM, Christoph Hellwig> > > wrote: > > > > Dan, > > > > > > > > can you please quote your emails? I can't find any content > > > > inbetween all these quotes. > > > > > > Sorry, I'm using gmail, but I'll switch to attaching the logs. > > > > > > So with help from Xiaolong I was able to reproduce this, and it > > > does > > > not appear to be a regression. We simply change the failure > > > output of > > > an existing bug. Attached is a log of the same test on v4.10-rc7 > > > (i.e. without the recent block/scsi fixes), and it shows sda > > > being > > > registered twice. > > > > > > "[6.647077] kobject (d5078ca4): tried to init an initialized > > > object, something is seriously wrong." > > > > > > The change that "scsi, block: fix duplicate bdi name registration > > > crashes" makes is to properly try to register sdb since the sda > > > devt > > > is still alive. However that's not a fix because we've managed to > > > call blk_register_queue() twice on the same queue. > > > > OK, time to involve others: linux-scsi and linux-block cc'd and > > I've > > inserted the log below. > > > > James > > > > --- > > > > [5.969672] scsi host0: scsi_debug: version 1.86 [20160430] > > [5.969672] dev_size_mb=8, opts=0x0, submit_queues=1, > > statistics=0 > > [5.971895] scsi 0:0:0:0: Direct-Access Linuxscsi_debug > > 0186 PQ: 0 ANSI: 7 > > [6.006983] sd 0:0:0:0: [sda] 16384 512-byte logical blocks: > > (8.39 MB/8.00 MiB) > > [6.026965] sd 0:0:0:0: [sda] Write Protect is off > > [6.027870] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08 > > [6.066962] sd 0:0:0:0: [sda] Write cache: enabled, read cache: > > enabled, supports DPO and FUA > > [6.486962] sd 0:0:0:0: [sda] Attached SCSI disk > > [6.488377] sd 0:0:0:0: [sda] Synchronizing SCSI cache > > [6.489455] sd 0:0:0:0: Attached scsi generic sg0 type 0 > > [6.526982] sd 0:0:0:0: [sda] 16384 512-byte logical blocks: > > (8.39 MB/8.00 MiB) > > [6.546964] sd 0:0:0:0: [sda] Write Protect is off > > [6.547873] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08 > > [6.586963] sd 0:0:0:0: [sda] Write cache: enabled, read cache: > > enabled, supports DPO and FUA > > [6.647077] kobject (d5078ca4): tried to init an initialized > > object, something is seriously wrong. > > So sda is probed twice, and hilarity ensues when we try to register > it twice. I can't reproduce this, using scsi_debug and with > scsi_async enabled. Actually, when you look closely, it's not a double add; it's an add/remove/add. You can see this from [6.488377] sd 0:0:0:0: [sda] Synchronizing SCSI cache That's from sd_shutdown() as the driver is removing. It looks like something with the config caused the built in SCSI debug to do this (not sure why), but since the stack trace is in block, I think the bug is in the remove path: something didn't release the mq object correctly. James
Re: [lkp-robot] [scsi, block] 0dba1314d4: WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup
On Wed, Feb 8, 2017 at 4:08 PM, James Bottomleywrote: > On Mon, 2017-02-06 at 21:42 -0800, Dan Williams wrote: [..] >> ...but it reproduces on current mainline with the same config. I >> haven't spotted what makes scsi_debug behave like this. > > Looking at the config, it's a static debug with report luns enabled. > Is it as simple as the fact that we probe lun 0 manually to see if the > target exists, but then we don't account for the fact that we already > did this, so if it turns up again in the report lun scan, we'll probe > it again leading to a double add. If that theory is correct, this may > be the fix (compile tested only). > > James > > --- > > diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c > index 6f7128f..ba4be08 100644 > --- a/drivers/scsi/scsi_scan.c > +++ b/drivers/scsi/scsi_scan.c > @@ -1441,6 +1441,10 @@ static int scsi_report_lun_scan(struct scsi_target > *starget, int bflags, > for (lunp = _data[1]; lunp <= _data[num_luns]; lunp++) { > lun = scsilun_to_int(lunp); > > + if (lun == 0) > + /* already scanned LUN 0 */ > + continue; > + > if (lun > sdev->host->max_lun) { > sdev_printk(KERN_WARNING, sdev, > "lun%llu has a LUN larger than" I gave this a shot on top of linux-next, but still hit the failure. Log attached. log Description: Binary data
Re: [lkp-robot] [scsi, block] 0dba1314d4: WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup
On Mon, Feb 6, 2017 at 8:09 PM, Jens Axboewrote: > On 02/06/2017 05:14 PM, James Bottomley wrote: >> On Sun, 2017-02-05 at 21:13 -0800, Dan Williams wrote: >>> On Sun, Feb 5, 2017 at 1:13 AM, Christoph Hellwig wrote: Dan, can you please quote your emails? I can't find any content inbetween all these quotes. >>> >>> Sorry, I'm using gmail, but I'll switch to attaching the logs. >>> >>> So with help from Xiaolong I was able to reproduce this, and it does >>> not appear to be a regression. We simply change the failure output of >>> an existing bug. Attached is a log of the same test on v4.10-rc7 >>> (i.e. without the recent block/scsi fixes), and it shows sda being >>> registered twice. >>> >>> "[6.647077] kobject (d5078ca4): tried to init an initialized >>> object, something is seriously wrong." >>> >>> The change that "scsi, block: fix duplicate bdi name registration >>> crashes" makes is to properly try to register sdb since the sda devt >>> is still alive. However that's not a fix because we've managed to >>> call blk_register_queue() twice on the same queue. >> >> OK, time to involve others: linux-scsi and linux-block cc'd and I've >> inserted the log below. >> >> James >> >> --- >> >> [5.969672] scsi host0: scsi_debug: version 1.86 [20160430] >> [5.969672] dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0 >> [5.971895] scsi 0:0:0:0: Direct-Access Linuxscsi_debug >> 0186 PQ: 0 ANSI: 7 >> [6.006983] sd 0:0:0:0: [sda] 16384 512-byte logical blocks: (8.39 >> MB/8.00 MiB) >> [6.026965] sd 0:0:0:0: [sda] Write Protect is off >> [6.027870] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08 >> [6.066962] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, >> supports DPO and FUA >> [6.486962] sd 0:0:0:0: [sda] Attached SCSI disk >> [6.488377] sd 0:0:0:0: [sda] Synchronizing SCSI cache >> [6.489455] sd 0:0:0:0: Attached scsi generic sg0 type 0 >> [6.526982] sd 0:0:0:0: [sda] 16384 512-byte logical blocks: (8.39 >> MB/8.00 MiB) >> [6.546964] sd 0:0:0:0: [sda] Write Protect is off >> [6.547873] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08 >> [6.586963] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, >> supports DPO and FUA >> [6.647077] kobject (d5078ca4): tried to init an initialized object, >> something is seriously wrong. > > So sda is probed twice, and hilarity ensues when we try to register it > twice. I can't reproduce this, using scsi_debug and with scsi_async > enabled. > > This is running linux-next? What's your .config? > The original failure report is here: http://marc.info/?l=linux-kernel=148619222300774=2 ...but it reproduces on current mainline with the same config. I haven't spotted what makes scsi_debug behave like this.
Re: [lkp-robot] [scsi, block] 0dba1314d4: WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup
On 02/06/2017 05:14 PM, James Bottomley wrote: > On Sun, 2017-02-05 at 21:13 -0800, Dan Williams wrote: >> On Sun, Feb 5, 2017 at 1:13 AM, Christoph Hellwigwrote: >>> Dan, >>> >>> can you please quote your emails? I can't find any content >>> inbetween all these quotes. >> >> Sorry, I'm using gmail, but I'll switch to attaching the logs. >> >> So with help from Xiaolong I was able to reproduce this, and it does >> not appear to be a regression. We simply change the failure output of >> an existing bug. Attached is a log of the same test on v4.10-rc7 >> (i.e. without the recent block/scsi fixes), and it shows sda being >> registered twice. >> >> "[6.647077] kobject (d5078ca4): tried to init an initialized >> object, something is seriously wrong." >> >> The change that "scsi, block: fix duplicate bdi name registration >> crashes" makes is to properly try to register sdb since the sda devt >> is still alive. However that's not a fix because we've managed to >> call blk_register_queue() twice on the same queue. > > OK, time to involve others: linux-scsi and linux-block cc'd and I've > inserted the log below. > > James > > --- > > [5.969672] scsi host0: scsi_debug: version 1.86 [20160430] > [5.969672] dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0 > [5.971895] scsi 0:0:0:0: Direct-Access Linuxscsi_debug 0186 > PQ: 0 ANSI: 7 > [6.006983] sd 0:0:0:0: [sda] 16384 512-byte logical blocks: (8.39 MB/8.00 > MiB) > [6.026965] sd 0:0:0:0: [sda] Write Protect is off > [6.027870] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08 > [6.066962] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, > supports DPO and FUA > [6.486962] sd 0:0:0:0: [sda] Attached SCSI disk > [6.488377] sd 0:0:0:0: [sda] Synchronizing SCSI cache > [6.489455] sd 0:0:0:0: Attached scsi generic sg0 type 0 > [6.526982] sd 0:0:0:0: [sda] 16384 512-byte logical blocks: (8.39 MB/8.00 > MiB) > [6.546964] sd 0:0:0:0: [sda] Write Protect is off > [6.547873] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08 > [6.586963] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, > supports DPO and FUA > [6.647077] kobject (d5078ca4): tried to init an initialized object, > something is seriously wrong. So sda is probed twice, and hilarity ensues when we try to register it twice. I can't reproduce this, using scsi_debug and with scsi_async enabled. This is running linux-next? What's your .config? -- Jens Axboe
Re: [lkp-robot] [scsi, block] 0dba1314d4: WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup
On Sun, 2017-02-05 at 21:13 -0800, Dan Williams wrote: > On Sun, Feb 5, 2017 at 1:13 AM, Christoph Hellwigwrote: > > Dan, > > > > can you please quote your emails? I can't find any content > > inbetween all these quotes. > > Sorry, I'm using gmail, but I'll switch to attaching the logs. > > So with help from Xiaolong I was able to reproduce this, and it does > not appear to be a regression. We simply change the failure output of > an existing bug. Attached is a log of the same test on v4.10-rc7 > (i.e. without the recent block/scsi fixes), and it shows sda being > registered twice. > > "[6.647077] kobject (d5078ca4): tried to init an initialized > object, something is seriously wrong." > > The change that "scsi, block: fix duplicate bdi name registration > crashes" makes is to properly try to register sdb since the sda devt > is still alive. However that's not a fix because we've managed to > call blk_register_queue() twice on the same queue. OK, time to involve others: linux-scsi and linux-block cc'd and I've inserted the log below. James --- [5.969672] scsi host0: scsi_debug: version 1.86 [20160430] [5.969672] dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0 [5.971895] scsi 0:0:0:0: Direct-Access Linuxscsi_debug 0186 PQ: 0 ANSI: 7 [6.006983] sd 0:0:0:0: [sda] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB) [6.026965] sd 0:0:0:0: [sda] Write Protect is off [6.027870] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08 [6.066962] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA [6.486962] sd 0:0:0:0: [sda] Attached SCSI disk [6.488377] sd 0:0:0:0: [sda] Synchronizing SCSI cache [6.489455] sd 0:0:0:0: Attached scsi generic sg0 type 0 [6.526982] sd 0:0:0:0: [sda] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB) [6.546964] sd 0:0:0:0: [sda] Write Protect is off [6.547873] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08 [6.586963] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA [6.647077] kobject (d5078ca4): tried to init an initialized object, something is seriously wrong. [6.648723] CPU: 0 PID: 99 Comm: kworker/u2:1 Not tainted 4.10.0-rc7 #932 [6.649811] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 [6.651418] Workqueue: events_unbound async_run_entry_fn [6.652347] Call Trace: [6.652987] dump_stack+0x79/0xa4 [6.653716] kobject_init+0x75/0x90 [6.654452] blk_mq_register_dev+0x2a/0x110 [6.655269] blk_register_queue+0x7b/0x130 [6.656080] device_add_disk+0x1c6/0x460 [6.656866] sd_probe_async+0xf1/0x1c0 [6.657634] ? __lock_acquire.isra.14+0x43b/0x940 [6.658501] async_run_entry_fn+0x30/0x190 [6.659311] ? process_one_work+0x12f/0x430 [6.660113] process_one_work+0x1aa/0x430 [6.660901] ? process_one_work+0x12f/0x430 [6.661716] worker_thread+0x1dd/0x470 [6.662479] kthread+0xd4/0x100 [6.663175] ? process_one_work+0x430/0x430 [6.663984] ? __kthread_create_on_node+0x180/0x180 [6.664869] ret_from_fork+0x21/0x2c [6.665638] kobject (ffab51ec): tried to init an initialized object, something is seriously wrong. [6.667290] CPU: 0 PID: 99 Comm: kworker/u2:1 Not tainted 4.10.0-rc7 #932 [6.668372] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 [6.669984] Workqueue: events_unbound async_run_entry_fn [6.670909] Call Trace: [6.671540] dump_stack+0x79/0xa4 [6.672266] kobject_init+0x75/0x90 [6.673011] blk_mq_register_dev+0x4c/0x110 [6.673832] blk_register_queue+0x7b/0x130 [6.674633] device_add_disk+0x1c6/0x460 [6.675413] sd_probe_async+0xf1/0x1c0 [6.676191] ? __lock_acquire.isra.14+0x43b/0x940 [6.677057] async_run_entry_fn+0x30/0x190 [6.677860] ? process_one_work+0x12f/0x430 [6.678667] process_one_work+0x1aa/0x430 [6.679455] ? process_one_work+0x12f/0x430 [6.680269] worker_thread+0x1dd/0x470 [6.681036] kthread+0xd4/0x100 [6.681737] ? process_one_work+0x430/0x430 [6.682540] ? __kthread_create_on_node+0x180/0x180 [6.683420] ret_from_fork+0x21/0x2c [6.684207] [ cut here ] [6.685067] WARNING: CPU: 0 PID: 99 at ./include/linux/kref.h:46 kobject_get+0x7f/0x90 [6.686592] CPU: 0 PID: 99 Comm: kworker/u2:1 Not tainted 4.10.0-rc7 #932 [6.687680] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 [6.689280] Workqueue: events_unbound async_run_entry_fn [6.690208] Call Trace: [6.690843] dump_stack+0x79/0xa4 [6.691563] __warn+0xd2/0xf0 [6.692246] ? kobject_get+0x7f/0x90 [6.692992] warn_slowpath_null+0x25/0x30 [6.693787] kobject_get+0x7f/0x90 [6.694505] kobject_add_internal+0x2e/0x360 [6.695322] ? kfree_const+0x18/0x20 [6.696071] ? kobject_set_name_vargs+0x62/0x80 [6.696914]