[block regression] kernel oops triggered by removing scsi device dring IO

2018-04-07 Thread Ming Lei
Hi,

The following kernel oops is triggered by 'removing scsi device' during
heavy IO.

'git bisect' shows that commit a063057d7c731cffa7d10740(block: Fix a race
between request queue removal and the block cgroup controller)
introduced this regression:

[   42.268257] BUG: unable to handle kernel NULL pointer dereference at 
0028
[   42.269339] PGD 26bd9f067 P4D 26bd9f067 PUD 26bfec067 PMD 0 
[   42.270077] Oops:  [#1] PREEMPT SMP NOPTI
[   42.270681] Dumping ftrace buffer:
[   42.271141](ftrace buffer empty)
[   42.271641] Modules linked in: scsi_debug iTCO_wdt iTCO_vendor_support 
crc32c_intel i2c_i801 i2c_core lpc_ich mfd_core usb_storage nvme shpchp 
nvme_core virtio_scsi qemu_fw_cfg ip_tables
[   42.273770] CPU: 5 PID: 1076 Comm: fio Not tainted 4.16.0+ #49
[   42.274530] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.10.2-2.fc27 04/01/2014
[   42.275634] RIP: 0010:blk_throtl_bio+0x41/0x904
[   42.276225] RSP: 0018:c900033cfaa0 EFLAGS: 00010246
[   42.276907] RAX: 8000 RBX: 8801bdcc5118 RCX: 0001
[   42.277818] RDX: 8801bdcc5118 RSI:  RDI: 8802641f8870
[   42.278733] RBP:  R08: 0001 R09: c900033cfb94
[   42.279651] R10: c900033cfc00 R11: 06ea R12: 8802641f8870
[   42.280567] R13: 88026f34f000 R14:  R15: 8801bdcc5118
[   42.281489] FS:  7fc123922d40() GS:880272f4() 
knlGS:
[   42.282525] CS:  0010 DS:  ES:  CR0: 80050033
[   42.283270] CR2: 0028 CR3: 00026d7ac004 CR4: 007606e0
[   42.284194] DR0:  DR1:  DR2: 
[   42.285116] DR3:  DR6: fffe0ff0 DR7: 0400
[   42.286036] PKRU: 5554
[   42.286393] Call Trace:
[   42.286725]  ? try_to_wake_up+0x3a3/0x3c9
[   42.287255]  ? blk_mq_hctx_notify_dead+0x135/0x135
[   42.287880]  ? gup_pud_range+0xb5/0x7e1
[   42.288381]  generic_make_request_checks+0x3cf/0x539
[   42.289027]  ? gup_pgd_range+0x8e/0xaa
[   42.289515]  generic_make_request+0x38/0x25b
[   42.290078]  ? submit_bio+0x103/0x11f
[   42.290555]  submit_bio+0x103/0x11f
[   42.291018]  ? bio_iov_iter_get_pages+0xe4/0x104
[   42.291620]  blkdev_direct_IO+0x2a3/0x3af
[   42.292151]  ? kiocb_free+0x34/0x34
[   42.292607]  ? ___preempt_schedule+0x16/0x18
[   42.293168]  ? preempt_schedule_common+0x4c/0x65
[   42.293771]  ? generic_file_read_iter+0x96/0x110
[   42.294377]  generic_file_read_iter+0x96/0x110
[   42.294962]  aio_read+0xca/0x13b
[   42.295388]  ? preempt_count_add+0x6d/0x8c
[   42.295926]  ? aio_read_events+0x287/0x2d6
[   42.296460]  ? do_io_submit+0x4d2/0x62c
[   42.296964]  do_io_submit+0x4d2/0x62c
[   42.297446]  ? do_syscall_64+0x9d/0x15e
[   42.297950]  do_syscall_64+0x9d/0x15e
[   42.298431]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   42.299090] RIP: 0033:0x7fc12244e687
[   42.299556] RSP: 002b:7ffe18388a68 EFLAGS: 0202 ORIG_RAX: 
00d1
[   42.300528] RAX: ffda RBX: 7fc0fde08670 RCX: 7fc12244e687
[   42.301442] RDX: 01d1b388 RSI: 0001 RDI: 7fc123782000
[   42.302359] RBP: 22d8 R08: 0001 R09: 01c461e0
[   42.303275] R10:  R11: 0202 R12: 7fc0fde08670
[   42.304195] R13:  R14: 01d1d0c0 R15: 01b872f0
[   42.305117] Code: 48 85 f6 48 89 7c 24 10 75 0e 48 8b b7 b8 05 00 00 31 ed 
48 85 f6 74 0f 48 63 05 75 a4 e4 00 48 8b ac c6 28 02 00 00 f6 43 15 02 <48> 8b 
45 28 48 89 04 24 0f 85 28 08 00 00 8b 43 10 45 31 e4 83 
[   42.307553] RIP: blk_throtl_bio+0x41/0x904 RSP: c900033cfaa0
[   42.308328] CR2: 0028
[   42.308920] ---[ end trace f53a144979f63b29 ]---
[   42.309520] Kernel panic - not syncing: Fatal exception
[   42.310635] Dumping ftrace buffer:
[   42.311087](ftrace buffer empty)
[   42.311583] Kernel Offset: disabled
[   42.312163] ---[ end Kernel panic - not syncing: Fatal exception ]---

-- 
Ming


Re: [PATCH V3 4/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-04-07 Thread Ming Lei
On Fri, Apr 06, 2018 at 11:49:47PM +0200, Thomas Gleixner wrote:
> On Fri, 6 Apr 2018, Thomas Gleixner wrote:
> 
> > On Fri, 6 Apr 2018, Ming Lei wrote:
> > > 
> > > I will post V4 soon by using cpu_present_mask in the 1st stage irq spread.
> > > And it should work fine for Kashyap's case in normal cases.
> > 
> > No need to resend. I've changed it already and will push it out after
> > lunch.
> 
> No. Lunch did not last that long :)
> 
> I pushed out the lot to
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core
> 
> Please double check the modifications I did. The first related commit fixes
> an existing error handling bug.

I think your modification is better, especially about adding comment in
irq_create_affinity_masks().

I also testes these patches again, and they just work fine.

Thanks,
Ming


Re: 4.15.14 crash with iscsi target and dvd

2018-04-07 Thread Bart Van Assche
On Sat, 2018-04-07 at 12:53 -0400, Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > > I know now why scsi_print_command isn't doing anything.  cmd->cmnd is 
> > > null.
> > > I added a dev_printk in scsi_print_command where the 2 if statements 
> > > return.
> > > Logs:
> > > [  29.866415] sr 3:0:0:0: cmd->cmnd is NULL
> > 
> > That's something that should never happen. As one can see in
> > scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> > that pointer. Since I have not yet been able to reproduce myself what you
> > reported, would it be possible for you to bisect this issue? You will need
> > to follow something like the following procedure (see also
> > https://git-scm.com/docs/git-bisect):
> 
> After doing 3 successful compiles with good/bad, I got this error and was
> not able to compile any more kernels:
>   CC  scripts/mod/devicetable-offsets.s
> scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
>  /* empty file to figure out endianness / word size */
>  
> scripts/mod/devicetable-offsets.c:1:0: error: code model kernel does not 
> support PIC mode
>  #include 
>  
> scripts/Makefile.build:153: recipe for target 
> 'scripts/mod/devicetable-offsets.s' failed
> 
> I don't think it found the bad commit.

Have you tried to modify the kernel Makefile as indicated in the following
e-mail? This should make the kernel build:

https://lists.ubuntu.com/archives/kernel-team/2016-May/077178.html

Thanks,

Bart.

Re: 4.15.14 crash with iscsi target and dvd

2018-04-07 Thread Wakko Warner
Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > > I know now why scsi_print_command isn't doing anything.  cmd->cmnd is 
> > > null.
> > > I added a dev_printk in scsi_print_command where the 2 if statements 
> > > return.
> > > Logs:
> > > [  29.866415] sr 3:0:0:0: cmd->cmnd is NULL
> > 
> > That's something that should never happen. As one can see in
> > scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> > that pointer. Since I have not yet been able to reproduce myself what you
> > reported, would it be possible for you to bisect this issue? You will need
> > to follow something like the following procedure (see also
> > https://git-scm.com/docs/git-bisect):
> 
> After doing 3 successful compiles with good/bad, I got this error and was
> not able to compile any more kernels:
>   CC  scripts/mod/devicetable-offsets.s
> scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
>  /* empty file to figure out endianness / word size */
>  
> scripts/mod/devicetable-offsets.c:1:0: error: code model kernel does not 
> support PIC mode
>  #include 
>  
> scripts/Makefile.build:153: recipe for target 
> 'scripts/mod/devicetable-offsets.s' failed
> 
> I don't think it found the bad commit.

I forgot to mention my gcc version.
gcc (Debian 6.2.1-7) 6.2.1 20161215

-- 
 Microsoft has beaten Volkswagen's world record.  Volkswagen only created 22
 million bugs.


Re: 4.15.14 crash with iscsi target and dvd

2018-04-07 Thread Wakko Warner
Bart Van Assche wrote:
> On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > I know now why scsi_print_command isn't doing anything.  cmd->cmnd is null.
> > I added a dev_printk in scsi_print_command where the 2 if statements return.
> > Logs:
> > [  29.866415] sr 3:0:0:0: cmd->cmnd is NULL
> 
> That's something that should never happen. As one can see in
> scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> that pointer. Since I have not yet been able to reproduce myself what you
> reported, would it be possible for you to bisect this issue? You will need
> to follow something like the following procedure (see also
> https://git-scm.com/docs/git-bisect):

After doing 3 successful compiles with good/bad, I got this error and was
not able to compile any more kernels:
  CC  scripts/mod/devicetable-offsets.s
scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
 /* empty file to figure out endianness / word size */
 
scripts/mod/devicetable-offsets.c:1:0: error: code model kernel does not 
support PIC mode
 #include 
 
scripts/Makefile.build:153: recipe for target 
'scripts/mod/devicetable-offsets.s' failed

I don't think it found the bad commit.

-- 
 Microsoft has beaten Volkswagen's world record.  Volkswagen only created 22
 million bugs.


Re: [PATCH] loop: fix LOOP_GET_STATUS lock imbalance

2018-04-07 Thread Dmitry Vyukov
On Sat, Apr 7, 2018 at 9:27 AM, Tetsuo Handa
 wrote:
> Omar Sandoval wrote:
>> From: Omar Sandoval 
>>
>> Commit 2d1d4c1e591f made loop_get_status() drop lo_ctx_mutex before
>> returning, but the loop_get_status_old(), loop_get_status64(), and
>> loop_get_status_compat() wrappers don't call loop_get_status() if the
>> passed argument is NULL. The callers expect that the lock is dropped, so
>> make sure we drop it in that case, too.
>>
>> Reported-by: syzbot+31e8daa8b3fc129e7...@syzkaller.appspotmail.com
>
> Well, it is me who reported this bug before syzbot reports it. ;-)

Robots are taking our jobs! :)

We could notify syzbot with just "#syz fix" tag in the report email,
rather than putting it into Reported-by.


>> Fixes: 2d1d4c1e591f ("loop: don't call into filesystem while holding 
>> lo_ctl_mutex")
>
> But I feel we should revert 2d1d4c1e591f rather than applying this patch.
>
>> Signed-off-by: Omar Sandoval 
>
> If the reason of dropping the lock is to avoid deadlock caused by recursing
> into the same lock from vfs_getattr(), it is correct thing to drop the lock.
>
> But when the reason is that vfs_getattr() cannot return when NFS server is
> dead, there is no point with dropping the lock. Anybody who tries to call
> loop_get_status() will get stuck. It is commit 3148ffbdb9162baa ("loop:
> use killable lock in ioctls") which actually helps minimizing number of
> stuck processes when NFS server is dead if we didn't drop the lock.
> If we drop the lock before calling vfs_getattr(), all threads who called
> loop_get_status() will reach vfs_getattr() and get stuck, won't it?


[PATCH v2] blk-cgroup: remove entries in blkg_tree before queue release

2018-04-07 Thread Alexandru Moise
The q->id is used as an index within the blkg_tree radix tree.

If the entry is not released before reclaiming the blk_queue_ida's id
blkcg_init_queue() within a different driver from which this id
was originally for can fail due to the entry at that index within
the radix tree already existing.

Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com>
---
v2: Added no-op for !CONFIG_BLK_CGROUP

 block/blk-cgroup.c | 2 +-
 block/blk-sysfs.c  | 4 
 include/linux/blk-cgroup.h | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 1c16694ae145..224e937dbb59 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -369,7 +369,7 @@ static void blkg_destroy(struct blkcg_gq *blkg)
  *
  * Destroy all blkgs associated with @q.
  */
-static void blkg_destroy_all(struct request_queue *q)
+void blkg_destroy_all(struct request_queue *q)
 {
struct blkcg_gq *blkg, *n;
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index d00d1b0ec109..a72866458f22 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -816,6 +816,10 @@ static void __blk_release_queue(struct work_struct *work)
if (q->bio_split)
bioset_free(q->bio_split);
 
+   spin_lock_irq(q->queue_lock);
+   blkg_destroy_all(q);
+   spin_unlock_irq(q->queue_lock);
+
ida_simple_remove(_queue_ida, q->id);
call_rcu(>rcu_head, blk_free_queue_rcu);
 }
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 6c666fd7de3c..3d60b1d1973d 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -179,6 +179,8 @@ struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
 int blkcg_init_queue(struct request_queue *q);
 void blkcg_drain_queue(struct request_queue *q);
 void blkcg_exit_queue(struct request_queue *q);
+void blkg_destroy_all(struct request_queue *q);
+
 
 /* Blkio controller policy registration */
 int blkcg_policy_register(struct blkcg_policy *pol);
@@ -740,6 +742,7 @@ static inline struct blkcg_gq *blkg_lookup(struct blkcg 
*blkcg, void *key) { ret
 static inline int blkcg_init_queue(struct request_queue *q) { return 0; }
 static inline void blkcg_drain_queue(struct request_queue *q) { }
 static inline void blkcg_exit_queue(struct request_queue *q) { }
+static inline void blkg_destroy_all(struct request_queue *q) { }
 static inline int blkcg_policy_register(struct blkcg_policy *pol) { return 0; }
 static inline void blkcg_policy_unregister(struct blkcg_policy *pol) { }
 static inline int blkcg_activate_policy(struct request_queue *q,
-- 
2.16.2



Re: [RFC PATCH 0/2] use larger max_request_size for virtio_blk

2018-04-07 Thread Weiping Zhang
2018-04-05 22:29 GMT+08:00 Jens Axboe :
> On 4/5/18 4:09 AM, Weiping Zhang wrote:
>> Hi,
>>
>> For virtio block device, actually there is no a hard limit for max request
>> size, and virtio_blk driver set -1 to blk_queue_max_hw_sectors(q, -1U);.
>> But it doesn't work, because there is a default upper limitation
>> BLK_DEF_MAX_SECTORS (1280 sectors). So this series want to add a new helper
>> blk_queue_max_hw_sectors_no_limit to set a proper max reqeust size.
>>
>> Weiping Zhang (2):
>>   blk-setting: add new helper blk_queue_max_hw_sectors_no_limit
>>   virtio_blk: add new module parameter to set max request size
>>
>>  block/blk-settings.c   | 20 
>>  drivers/block/virtio_blk.c | 32 ++--
>>  include/linux/blkdev.h |  2 ++
>>  3 files changed, 52 insertions(+), 2 deletions(-)
>
> The driver should just use blk_queue_max_hw_sectors() to set the limit,
> and then the soft limit can be modified by a udev rule. Technically the
> driver doesn't own the software limit, it's imposed to ensure that we
> don't introduce too much latency per request.
>
> Your situation is no different from many other setups, where the
> hw limit is much higher than the default 1280k.
>
Hi Martin, Jens,

It seems more reasonable to change software limitation by udev rule,
thanks you.

>
> ___
> Virtualization mailing list
> virtualizat...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] loop: fix LOOP_GET_STATUS lock imbalance

2018-04-07 Thread Tetsuo Handa
Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> Commit 2d1d4c1e591f made loop_get_status() drop lo_ctx_mutex before
> returning, but the loop_get_status_old(), loop_get_status64(), and
> loop_get_status_compat() wrappers don't call loop_get_status() if the
> passed argument is NULL. The callers expect that the lock is dropped, so
> make sure we drop it in that case, too.
> 
> Reported-by: syzbot+31e8daa8b3fc129e7...@syzkaller.appspotmail.com

Well, it is me who reported this bug before syzbot reports it. ;-)

> Fixes: 2d1d4c1e591f ("loop: don't call into filesystem while holding 
> lo_ctl_mutex")

But I feel we should revert 2d1d4c1e591f rather than applying this patch.

> Signed-off-by: Omar Sandoval 

If the reason of dropping the lock is to avoid deadlock caused by recursing
into the same lock from vfs_getattr(), it is correct thing to drop the lock.

But when the reason is that vfs_getattr() cannot return when NFS server is
dead, there is no point with dropping the lock. Anybody who tries to call
loop_get_status() will get stuck. It is commit 3148ffbdb9162baa ("loop:
use killable lock in ioctls") which actually helps minimizing number of
stuck processes when NFS server is dead if we didn't drop the lock.
If we drop the lock before calling vfs_getattr(), all threads who called
loop_get_status() will reach vfs_getattr() and get stuck, won't it?