Re: [PATCH BUGFIX V2 1/2] block, bfq: put async queues for root bfq groups too

2018-01-09 Thread Guoqing Jiang



On 01/10/2018 02:13 PM, Paolo Valente wrote:



Il giorno 10 gen 2018, alle ore 02:41, Guoqing Jiang <gqji...@suse.com> ha 
scritto:



On 01/09/2018 05:27 PM, Paolo Valente wrote:

For each pair [device for which bfq is selected as I/O scheduler,
group in blkio/io], bfq maintains a corresponding bfq group. Each such
bfq group contains a set of async queues, with each async queue
created on demand, i.e., when some I/O request arrives for it.  On
creation, an async queue gets an extra reference, to make sure that
the queue is not freed as long as its bfq group exists.  Accordingly,
to allow the queue to be freed after the group has exited, this extra
reference must be released on group exit.

The above also holds for a bfq root group, i.e., for the bfq group
corresponding to the blkio/io root for a given device. Yet, by
mistake, the references to the existing async queues of a root group
are not released when the latter exits. This causes a memory leak when
the instance of bfq for a given device exits. In a similar vein,
bfqg_stats_xfer_dead is not executed for a root group.

This commit fixes bfq_pd_offline so that the latter executes the above
missing operations for a root group too.

Reported-by: Holger Hoffstätte <hol...@applied-asynchrony.com>
Reported-by: Guoqing Jiang <gqji...@suse.com>
Signed-off-by: Davide Ferrari <davideferra...@gmail.com>
Signed-off-by: Paolo Valente <paolo.vale...@linaro.org>
---
  block/bfq-cgroup.c | 7 +++++--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index da1525ec4c87..d819dc77fe65 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -775,10 +775,11 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
 	unsigned long flags;
 	int i;
 
+	spin_lock_irqsave(&bfqd->lock, flags);
+
 	if (!entity) /* root group */
-		return;
+		goto put_async_queues;
 
-	spin_lock_irqsave(&bfqd->lock, flags);
 	/*
 	 * Empty all service_trees belonging to this group before
 	 * deactivating the group itself.
@@ -809,6 +810,8 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
 	}
 
 	__bfq_deactivate_entity(entity, false);
+
+put_async_queues:
 	bfq_put_async_queues(bfqd, bfqg);
 
 	spin_unlock_irqrestore(&bfqd->lock, flags);

With this change, bfqg_stats_xfer_dead will be called even if the entity does
not exist,

Actually, the entity associated with the bfq group being offlined
exists even if the local variable entity is NULL here.  Simply, that
variable gets NULL if the bfq group is the bfq root group for a
device.
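
For context, that local variable is derived in the prologue of bfq_pd_offline()
roughly as follows (a fragment of the 4.15-era code, shown only for
illustration):

	struct bfq_group *bfqg = pd_to_bfqg(pd);
	struct bfq_data *bfqd = bfqg->bfqd;
	/* my_entity is NULL for the root group of a device */
	struct bfq_entity *entity = bfqg->my_entity;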


is it necessary?

No, I opted for this solution so as not to shake the code too much, and
considering that:
- bfqg_stats_xfer_dead simply does nothing if applied to a root group;
- who knows, in the future that function may need to be invoked for a
root group too.

Yet, if you guys think that it would be cleaner to not invoke
bfqg_stats_xfer_dead at all for a root group, I'll change the code
accordingly (this would introduce a little asymmetry with
cfq_pd_offline, which invokes cfqg_stats_xfer_dead unconditionally).
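
For reference, bfqg_stats_xfer_dead() returns early when there is no parent
group to fold the stats into, which is why it is a no-op for a root group.
Roughly (a sketch of the 4.15-era helper in block/bfq-cgroup.c, not a verbatim
copy):

static void bfqg_stats_xfer_dead(struct bfq_group *bfqg)
{
	struct bfq_group *parent;

	if (!bfqg) /* root_group */
		return;

	parent = bfqg_parent(bfqg);
	if (unlikely(!parent))
		return; /* a root group has no parent: nothing to transfer */

	/* fold the dying group's stats into its parent, then reset them */
	bfqg_stats_add_aux(&parent->stats, &bfqg->stats);
	bfqg_stats_reset(&bfqg->stats);
}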


Thanks a lot for the explanation, I am fine with it.

Acked-by: Guoqing Jiang <gqji...@suse.com>

Regards,
Guoqing
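
For completeness, with the patch above applied, bfq_pd_offline() takes roughly
the following shape (a sketch reconstructed from the diff for illustration,
with the service-tree emptying and reparenting loop elided; not the verbatim
upstream function):

static void bfq_pd_offline(struct blkg_policy_data *pd)
{
	struct bfq_group *bfqg = pd_to_bfqg(pd);
	struct bfq_data *bfqd = bfqg->bfqd;
	struct bfq_entity *entity = bfqg->my_entity; /* NULL for a root group */
	unsigned long flags;

	spin_lock_irqsave(&bfqd->lock, flags);

	if (!entity) /* root group: nothing to reparent or deactivate */
		goto put_async_queues;

	/* ... empty all service_trees and reparent child entities ... */

	__bfq_deactivate_entity(entity, false);

put_async_queues:
	/* drop the extra reference taken at async-queue creation */
	bfq_put_async_queues(bfqd, bfqg);
	spin_unlock_irqrestore(&bfqd->lock, flags);

	/* now reached for root groups too */
	bfqg_stats_xfer_dead(bfqg);
}

Taking bfqd->lock before the root-group check is what lets the root case share
the bfq_put_async_queues() path instead of returning early.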


Re: [PATCH BUGFIX V2 1/2] block, bfq: put async queues for root bfq groups too

2018-01-09 Thread Guoqing Jiang



On 01/09/2018 05:27 PM, Paolo Valente wrote:

For each pair [device for which bfq is selected as I/O scheduler,
group in blkio/io], bfq maintains a corresponding bfq group. Each such
bfq group contains a set of async queues, with each async queue
created on demand, i.e., when some I/O request arrives for it.  On
creation, an async queue gets an extra reference, to make sure that
the queue is not freed as long as its bfq group exists.  Accordingly,
to allow the queue to be freed after the group has exited, this extra
reference must be released on group exit.

The above also holds for a bfq root group, i.e., for the bfq group
corresponding to the blkio/io root for a given device. Yet, by
mistake, the references to the existing async queues of a root group
are not released when the latter exits. This causes a memory leak when
the instance of bfq for a given device exits. In a similar vein,
bfqg_stats_xfer_dead is not executed for a root group.

This commit fixes bfq_pd_offline so that the latter executes the above
missing operations for a root group too.

Reported-by: Holger Hoffstätte <hol...@applied-asynchrony.com>
Reported-by: Guoqing Jiang <gqji...@suse.com>
Signed-off-by: Davide Ferrari <davideferra...@gmail.com>
Signed-off-by: Paolo Valente <paolo.vale...@linaro.org>
---
  block/bfq-cgroup.c | 7 +++++--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index da1525ec4c87..d819dc77fe65 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -775,10 +775,11 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
 	unsigned long flags;
 	int i;
 
+	spin_lock_irqsave(&bfqd->lock, flags);
+
 	if (!entity) /* root group */
-		return;
+		goto put_async_queues;
 
-	spin_lock_irqsave(&bfqd->lock, flags);
 	/*
 	 * Empty all service_trees belonging to this group before
 	 * deactivating the group itself.
@@ -809,6 +810,8 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
 	}
 
 	__bfq_deactivate_entity(entity, false);
+
+put_async_queues:
 	bfq_put_async_queues(bfqd, bfqg);
 
 	spin_unlock_irqrestore(&bfqd->lock, flags);


With this change, bfqg_stats_xfer_dead will be called even if the entity does
not exist,

is it necessary? Thanks.

Regards,
Guoqing


Re: Bug report - issue with bfq?

2018-01-03 Thread Guoqing Jiang



On 01/03/2018 03:44 PM, Paolo Valente wrote:



Il giorno 03 gen 2018, alle ore 04:58, Guoqing Jiang <gqji...@suse.com> ha 
scritto:

Hi,


Hi


In my test, I found some issues when trying bfq with xfs.
The test basically just sets the disk's scheduler to bfq,
creates xfs on top of it, mounts the fs, writes something,
then umounts the fs. After several rounds of iteration,
I can see different call traces appear.

For example, the one which happened frequently:

Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Mounting V5 Filesystem
Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Ending clean mount
Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Unmounting Filesystem
Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Mounting V5 Filesystem
Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Ending clean mount
Jan 03 11:35:19 linux-mainline kernel: BUG: unable to handle kernel paging 
request at 00029ec0
Jan 03 11:35:19 linux-mainline kernel: IP: __mod_node_page_state+0x5/0x50
Jan 03 11:35:19 linux-mainline kernel: PGD 0 P4D 0
Jan 03 11:35:19 linux-mainline kernel: Oops:  [#1] SMP KASAN
Jan 03 11:35:19 linux-mainline kernel: Modules linked in: bfq(E) joydev(E) 
uinput(E) fuse(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) 
snd_hda_codec_generic(E) crct10dif_pclmul(E) crc32_pclmul(E) xfs(E) 
ghash_clmulni_intel(E) libcrc32c(E) crc32c_intel(E) pcbc(E) snd_hda_intel(E) 
snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) ppdev(E) 
aesni_intel(E) snd_timer(E) aes_x86_64(E) crypto_simd(E) snd(E) glue_helper(E) 
cryptd(E) pcspkr(E) virtio_balloon(E) virtio_net(E) parport_pc(E) parport(E) 
soundcore(E) i2c_piix4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) virtio_console(E) 
virtio_rng(E) virtio_blk(E) ata_generic(E) ata_piix(E) ahci(E) libahci(E) 
floppy(E) ehci_pci(E) qxl(E) serio_raw(E) drm_kms_helper(E) syscopyarea(E) 
sysfillrect(E) sysimgblt(E) fb_sys_fops(E) sym53c8xx(E) scsi_transport_spi(E) 
button(E) libata(E)
Jan 03 11:35:19 linux-mainline kernel:  ttm(E) drm(E) uhci_hcd(E) ehci_hcd(E) 
usbcore(E) virtio_pci(E) virtio_ring(E) virtio(E) sg(E) dm_multipath(E) 
dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
Jan 03 11:35:19 linux-mainline kernel: CPU: 0 PID: 3349 Comm: ps Tainted: G 
   E4.15.0-rc1-69-default #1
Jan 03 11:35:19 linux-mainline kernel: Hardware name: QEMU Standard PC (i440FX 
+ PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
Jan 03 11:35:19 linux-mainline kernel: task: 880061efce80 task.stack: 
880058bd
Jan 03 11:35:19 linux-mainline kernel: RIP: 0010:__mod_node_page_state+0x5/0x50
Jan 03 11:35:19 linux-mainline kernel: RSP: 0018:880058bd7ce8 EFLAGS: 
00010a07
Jan 03 11:35:19 linux-mainline kernel: RAX: 03ff RBX: 
ea00011a3d80 RCX: 011a3d80
Jan 03 11:35:19 linux-mainline kernel: RDX:  RSI: 
000d RDI: 
Jan 03 11:35:19 linux-mainline kernel: RBP:  R08: 
88006378a630 R09: 880058bd7d98
Jan 03 11:35:19 linux-mainline kernel: R10: 7f7f4d806280 R11: 
 R12: ea00011a3d80
Jan 03 11:35:19 linux-mainline kernel: R13: 7f7f4f318000 R14: 
7f7f4f31c000 R15: 880058bd7e18
Jan 03 11:35:19 linux-mainline kernel: FS:  () 
GS:880066c0() knlGS:
Jan 03 11:35:19 linux-mainline kernel: CS:  0010 DS:  ES:  CR0: 
80050033
Jan 03 11:35:19 linux-mainline kernel: CR2: 00029ec0 CR3: 
01c0d006 CR4: 001606f0
Jan 03 11:35:19 linux-mainline kernel: Call Trace:
Jan 03 11:35:19 linux-mainline kernel: page_remove_rmap+0x11a/0x2b0
Jan 03 11:35:19 linux-mainline kernel: unmap_page_range+0x547/0xa30
Jan 03 11:35:19 linux-mainline kernel:  unmap_vmas+0x42/0x90
Jan 03 11:35:19 linux-mainline kernel:  exit_mmap+0x86/0x180
Jan 03 11:35:19 linux-mainline kernel:  mmput+0x4a/0x110
Jan 03 11:35:19 linux-mainline kernel:  do_exit+0x25d/0xae0
Jan 03 11:35:19 linux-mainline kernel:  do_group_exit+0x39/0xa0
Jan 03 11:35:19 linux-mainline kernel:  SyS_exit_group+0x10/0x10
Jan 03 11:35:19 linux-mainline kernel: entry_SYSCALL_64_fastpath+0x1a/0x7d
Jan 03 11:35:19 linux-mainline kernel: RIP: 0033:0x7f7f4eb8c338
Jan 03 11:35:19 linux-mainline kernel: RSP: 002b:7ffca4400d48 EFLAGS: 
0246 ORIG_RAX: 00e7
Jan 03 11:35:19 linux-mainline kernel: RAX: ffda RBX: 
 RCX: 7f7f4eb8c338
Jan 03 11:35:19 linux-mainline kernel: RDX:  RSI: 
003c RDI: 
Jan 03 11:35:19 linux-mainline kernel: RBP: 7ffca4400d40 R08: 
00e7 R09: fef8
Jan 03 11:35:19 linux-mainline kernel: R10: 7f7f4d806280 R11: 
0246 R12: 7f7f4f756000
Jan 03 11:35:19 linux-mainline kernel: R13: 7ffca4400cc8 R14: 
7f7f4f732b20 R15: 7f7f4d5ebc70
Jan 03 11:35:19 linux-mainline kernel: Code: f7 d9 48 39 ca 7c 05 65 88 50 0f c3 f0 
48 01 94

Bug report - issue with bfq?

2018-01-02 Thread Guoqing Jiang

Hi,

In my test, I found some issues when trying bfq with xfs.
The test basically just sets the disk's scheduler to bfq,
creates xfs on top of it, mounts the fs, writes something,
then umounts the fs. After several rounds of iteration,
I can see different call traces appear.

For example, the one which happened frequently:

Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Mounting V5 Filesystem
Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Ending clean mount
Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Unmounting Filesystem
Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Mounting V5 Filesystem
Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Ending clean mount
Jan 03 11:35:19 linux-mainline kernel: BUG: unable to handle kernel 
paging request at 00029ec0

Jan 03 11:35:19 linux-mainline kernel: IP: __mod_node_page_state+0x5/0x50
Jan 03 11:35:19 linux-mainline kernel: PGD 0 P4D 0
Jan 03 11:35:19 linux-mainline kernel: Oops:  [#1] SMP KASAN
Jan 03 11:35:19 linux-mainline kernel: Modules linked in: bfq(E) 
joydev(E) uinput(E) fuse(E) af_packet(E) iscsi_ibft(E) 
iscsi_boot_sysfs(E) snd_hda_codec_generic(E) crct10dif_pclmul(E) 
crc32_pclmul(E) xfs(E) ghash_clmulni_intel(E) libcrc32c(E) 
crc32c_intel(E) pcbc(E) snd_hda_intel(E) snd_hda_codec(E) 
snd_hda_core(E) snd_hwdep(E) snd_pcm(E) ppdev(E) aesni_intel(E) 
snd_timer(E) aes_x86_64(E) crypto_simd(E) snd(E) glue_helper(E) 
cryptd(E) pcspkr(E) virtio_balloon(E) virtio_net(E) parport_pc(E) 
parport(E) soundcore(E) i2c_piix4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) 
virtio_console(E) virtio_rng(E) virtio_blk(E) ata_generic(E) ata_piix(E) 
ahci(E) libahci(E) floppy(E) ehci_pci(E) qxl(E) serio_raw(E) 
drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) 
fb_sys_fops(E) sym53c8xx(E) scsi_transport_spi(E) button(E) libata(E)
Jan 03 11:35:19 linux-mainline kernel:  ttm(E) drm(E) uhci_hcd(E) 
ehci_hcd(E) usbcore(E) virtio_pci(E) virtio_ring(E) virtio(E) sg(E) 
dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) 
scsi_mod(E) autofs4(E)
Jan 03 11:35:19 linux-mainline kernel: CPU: 0 PID: 3349 Comm: ps 
Tainted: G    E    4.15.0-rc1-69-default #1
Jan 03 11:35:19 linux-mainline kernel: Hardware name: QEMU Standard PC 
(i440FX + PIIX, 1996), BIOS 
rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
Jan 03 11:35:19 linux-mainline kernel: task: 880061efce80 
task.stack: 880058bd
Jan 03 11:35:19 linux-mainline kernel: RIP: 
0010:__mod_node_page_state+0x5/0x50
Jan 03 11:35:19 linux-mainline kernel: RSP: 0018:880058bd7ce8 
EFLAGS: 00010a07
Jan 03 11:35:19 linux-mainline kernel: RAX: 03ff RBX: 
ea00011a3d80 RCX: 011a3d80
Jan 03 11:35:19 linux-mainline kernel: RDX:  RSI: 
000d RDI: 
Jan 03 11:35:19 linux-mainline kernel: RBP:  R08: 
88006378a630 R09: 880058bd7d98
Jan 03 11:35:19 linux-mainline kernel: R10: 7f7f4d806280 R11: 
 R12: ea00011a3d80
Jan 03 11:35:19 linux-mainline kernel: R13: 7f7f4f318000 R14: 
7f7f4f31c000 R15: 880058bd7e18
Jan 03 11:35:19 linux-mainline kernel: FS:  () 
GS:880066c0() knlGS:
Jan 03 11:35:19 linux-mainline kernel: CS:  0010 DS:  ES:  CR0: 
80050033
Jan 03 11:35:19 linux-mainline kernel: CR2: 00029ec0 CR3: 
01c0d006 CR4: 001606f0

Jan 03 11:35:19 linux-mainline kernel: Call Trace:
Jan 03 11:35:19 linux-mainline kernel: page_remove_rmap+0x11a/0x2b0
Jan 03 11:35:19 linux-mainline kernel: unmap_page_range+0x547/0xa30
Jan 03 11:35:19 linux-mainline kernel:  unmap_vmas+0x42/0x90
Jan 03 11:35:19 linux-mainline kernel:  exit_mmap+0x86/0x180
Jan 03 11:35:19 linux-mainline kernel:  mmput+0x4a/0x110
Jan 03 11:35:19 linux-mainline kernel:  do_exit+0x25d/0xae0
Jan 03 11:35:19 linux-mainline kernel:  do_group_exit+0x39/0xa0
Jan 03 11:35:19 linux-mainline kernel:  SyS_exit_group+0x10/0x10
Jan 03 11:35:19 linux-mainline kernel: entry_SYSCALL_64_fastpath+0x1a/0x7d
Jan 03 11:35:19 linux-mainline kernel: RIP: 0033:0x7f7f4eb8c338
Jan 03 11:35:19 linux-mainline kernel: RSP: 002b:7ffca4400d48 
EFLAGS: 0246 ORIG_RAX: 00e7
Jan 03 11:35:19 linux-mainline kernel: RAX: ffda RBX: 
 RCX: 7f7f4eb8c338
Jan 03 11:35:19 linux-mainline kernel: RDX:  RSI: 
003c RDI: 
Jan 03 11:35:19 linux-mainline kernel: RBP: 7ffca4400d40 R08: 
00e7 R09: fef8
Jan 03 11:35:19 linux-mainline kernel: R10: 7f7f4d806280 R11: 
0246 R12: 7f7f4f756000
Jan 03 11:35:19 linux-mainline kernel: R13: 7ffca4400cc8 R14: 
7f7f4f732b20 R15: 7f7f4d5ebc70
Jan 03 11:35:19 linux-mainline kernel: Code: f7 d9 48 39 ca 7c 05 65 88 
50 0f c3 f0 48 01 94 f7 00 05 00 00 f0 48 01 14 f5 c0 c4 c0 81 31 d2 eb 
e5 0f 1f 40 00 0f 1f 44 00 00 <48> 8b 8f c0 9e 02 00 89 f6 48 8d 04 31 
65 44 8a 40 01 4d 

Re: bfq: BUG bfq_queue: Objects remaining in bfq_queue on __kmem_cache_shutdown() after rmmod

2017-12-21 Thread Guoqing Jiang



On 12/21/2017 03:53 PM, Paolo Valente wrote:



Il giorno 21 dic 2017, alle ore 08:08, Guoqing Jiang <gqji...@suse.com> ha 
scritto:

Hi,


On 12/08/2017 08:34 AM, Holger Hoffstätte wrote:

So plugging in a device on USB with BFQ as scheduler now works without
hiccup (probably thanks to Ming Lei's last patch), but of course I found
another problem. Unmounting the device after use, changing the scheduler
back to deadline or kyber and rmmod'ing the BFQ module reproducibly gives me:

kernel: 
=
kernel: BUG bfq_queue (Tainted: GB  ): Objects remaining in 
bfq_queue on __kmem_cache_shutdown()
kernel: 
-
kernel:
kernel: INFO: Slab 0xea001601fc00 objects=37 used=3 fp=0x8805807f0360 
flags=0x80008100
kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: GB   4.14.5 #1
kernel: Hardware name: Gigabyte Technology Co., Ltd. P67-DS3-B3/P67-DS3-B3, 
BIOS F1 05/06/2011
kernel: Call Trace:
kernel:  dump_stack+0x46/0x5e
kernel:  slab_err+0x9e/0xb0
kernel:  ? on_each_cpu_mask+0x35/0x40
kernel:  ? ksm_migrate_page+0x60/0x60
kernel:  ? __kmalloc+0x1c9/0x1d0
kernel:  __kmem_cache_shutdown+0x177/0x350
kernel:  shutdown_cache+0xf/0x130
kernel:  kmem_cache_destroy+0x19e/0x1b0
kernel:  SyS_delete_module+0x168/0x230
kernel:  ? exit_to_usermode_loop+0x39/0x80
kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
kernel: RIP: 0033:0x7f53e4136b97
kernel: RSP: 002b:7ffd660061d8 EFLAGS: 0206 ORIG_RAX: 00b0
kernel: RAX: ffda RBX: 0003 RCX: 7f53e4136b97
kernel: RDX: 000a RSI: 0800 RDI: 006247f8
kernel: RBP:  R08: 7ffd66005171 R09: 
kernel: R10: 7f53e41acbc0 R11: 0206 R12: 00624790
kernel: R13: 7ffd660051d0 R14: 00624790 R15: 00623260
kernel: INFO: Object 0x8805807f @offset=0
kernel: INFO: Object 0x8805807f01b0 @offset=432
kernel: INFO: Object 0x8805807f3cc0 @offset=15552
kernel: kmem_cache_destroy bfq_queue: Slab cache still has objects
kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: GB   4.14.5 #1
kernel: Hardware name: Gigabyte Technology Co., Ltd. P67-DS3-B3/P67-DS3-B3, 
BIOS F1 05/06/2011
kernel: Call Trace:
kernel:  dump_stack+0x46/0x5e
kernel:  kmem_cache_destroy+0x191/0x1b0
kernel:  SyS_delete_module+0x168/0x230
kernel:  ? exit_to_usermode_loop+0x39/0x80
kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
kernel: RIP: 0033:0x7f53e4136b97
kernel: RSP: 002b:7ffd660061d8 EFLAGS: 0206 ORIG_RAX: 00b0
kernel: RAX: ffda RBX: 0003 RCX: 7f53e4136b97
kernel: RDX: 000a RSI: 0800 RDI: 006247f8
kernel: RBP:  R08: 7ffd66005171 R09: 
kernel: R10: 7f53e41acbc0 R11: 0206 R12: 00624790
kernel: R13: 7ffd660051d0 R14: 00624790 R15: 00623260


I also encountered a similar bug; it seems there are some async objects
which are not freed when BFQ_GROUP_IOSCHED is built as "Y".

If I revert part of commit e21b7a0b988772e82e7147e1c659a5afe2ae003c ("block, bfq:
add full hierarchical scheduling and cgroups support") like below, then I can't
see the bug again with a simple test.

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 889a8549d97f..c640e64e042f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4710,6 +4710,7 @@ static void bfq_exit_queue(struct elevator_queue *e)
 	spin_lock_irq(&bfqd->lock);
 	list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
 		bfq_deactivate_bfqq(bfqd, bfqq, false, false);
+	bfq_put_async_queues(bfqd, bfqd->root_group);
 	spin_unlock_irq(&bfqd->lock);
 
 	hrtimer_cancel(&bfqd->idle_slice_timer);
@@ -4718,7 +4719,6 @@ static void bfq_exit_queue(struct elevator_queue *e)
 	blkcg_deactivate_policy(bfqd->queue, &blkcg_policy_bfq);
 #else
 	spin_lock_irq(&bfqd->lock);
-	bfq_put_async_queues(bfqd, bfqd->root_group);
 	kfree(bfqd->root_group);
 	spin_unlock_irq(&bfqd->lock);

But perhaps we should do it inside blkcg_deactivate_policy, right?
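
For reference, blkcg_deactivate_policy() is the place that walks the queue's
blkgs and invokes the policy's offline and free callbacks, so releasing the
async queues from bfq_pd_offline() keeps that work on this path. Very roughly
(a trimmed sketch of the block/blk-cgroup.c logic of that era, with freezing
and locking omitted):

void blkcg_deactivate_policy(struct request_queue *q,
			     const struct blkcg_policy *pol)
{
	struct blkcg_gq *blkg;

	/* ... queue freezing and queue_lock handling omitted ... */

	list_for_each_entry(blkg, &q->blkg_list, q_node) {
		if (blkg->pd[pol->plid]) {
			if (pol->pd_offline_fn)
				pol->pd_offline_fn(blkg->pd[pol->plid]);
			pol->pd_free_fn(blkg->pd[pol->plid]);
			blkg->pd[pol->plid] = NULL;
		}
	}

	/* ... unlock and clear the policy bit omitted ... */
}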


Thank you very much for investigating this. And yes, I removed that
bfq_put_async_queues invocation long ago, because it had to be done in
blkcg_deactivate_policy; and it actually was invoked as expected. We are trying
to understand what is going wrong now.


Thanks for the information. I found another way; does it make sense?

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index da1525ec4c87..0a070daf96c7 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -474,7 +474,10 @@ static void bfq_pd_init(struct blkg_policy_data *pd)
 static void bfq_pd_free(struct blkg_policy_data *pd)
 {
    struct bfq_group *bfqg = pd_to_bfqg(pd);
+

Re: bfq: BUG bfq_queue: Objects remaining in bfq_queue on __kmem_cache_shutdown() after rmmod

2017-12-20 Thread Guoqing Jiang

Hi,


On 12/08/2017 08:34 AM, Holger Hoffstätte wrote:

So plugging in a device on USB with BFQ as scheduler now works without
hiccup (probably thanks to Ming Lei's last patch), but of course I found
another problem. Unmounting the device after use, changing the scheduler
back to deadline or kyber and rmmod'ing the BFQ module reproducibly gives me:

kernel: 
=
kernel: BUG bfq_queue (Tainted: GB  ): Objects remaining in 
bfq_queue on __kmem_cache_shutdown()
kernel: 
-
kernel:
kernel: INFO: Slab 0xea001601fc00 objects=37 used=3 fp=0x8805807f0360 
flags=0x80008100
kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: GB   4.14.5 #1
kernel: Hardware name: Gigabyte Technology Co., Ltd. P67-DS3-B3/P67-DS3-B3, 
BIOS F1 05/06/2011
kernel: Call Trace:
kernel:  dump_stack+0x46/0x5e
kernel:  slab_err+0x9e/0xb0
kernel:  ? on_each_cpu_mask+0x35/0x40
kernel:  ? ksm_migrate_page+0x60/0x60
kernel:  ? __kmalloc+0x1c9/0x1d0
kernel:  __kmem_cache_shutdown+0x177/0x350
kernel:  shutdown_cache+0xf/0x130
kernel:  kmem_cache_destroy+0x19e/0x1b0
kernel:  SyS_delete_module+0x168/0x230
kernel:  ? exit_to_usermode_loop+0x39/0x80
kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
kernel: RIP: 0033:0x7f53e4136b97
kernel: RSP: 002b:7ffd660061d8 EFLAGS: 0206 ORIG_RAX: 00b0
kernel: RAX: ffda RBX: 0003 RCX: 7f53e4136b97
kernel: RDX: 000a RSI: 0800 RDI: 006247f8
kernel: RBP:  R08: 7ffd66005171 R09: 
kernel: R10: 7f53e41acbc0 R11: 0206 R12: 00624790
kernel: R13: 7ffd660051d0 R14: 00624790 R15: 00623260
kernel: INFO: Object 0x8805807f @offset=0
kernel: INFO: Object 0x8805807f01b0 @offset=432
kernel: INFO: Object 0x8805807f3cc0 @offset=15552
kernel: kmem_cache_destroy bfq_queue: Slab cache still has objects
kernel: CPU: 0 PID: 9967 Comm: rmmod Tainted: GB   4.14.5 #1
kernel: Hardware name: Gigabyte Technology Co., Ltd. P67-DS3-B3/P67-DS3-B3, 
BIOS F1 05/06/2011
kernel: Call Trace:
kernel:  dump_stack+0x46/0x5e
kernel:  kmem_cache_destroy+0x191/0x1b0
kernel:  SyS_delete_module+0x168/0x230
kernel:  ? exit_to_usermode_loop+0x39/0x80
kernel:  entry_SYSCALL_64_fastpath+0x13/0x94
kernel: RIP: 0033:0x7f53e4136b97
kernel: RSP: 002b:7ffd660061d8 EFLAGS: 0206 ORIG_RAX: 00b0
kernel: RAX: ffda RBX: 0003 RCX: 7f53e4136b97
kernel: RDX: 000a RSI: 0800 RDI: 006247f8
kernel: RBP:  R08: 7ffd66005171 R09: 
kernel: R10: 7f53e41acbc0 R11: 0206 R12: 00624790
kernel: R13: 7ffd660051d0 R14: 00624790 R15: 00623260



I also encountered a similar bug; it seems there are some async objects
which are not freed when BFQ_GROUP_IOSCHED is built as "Y".

If I revert part of commit e21b7a0b988772e82e7147e1c659a5afe2ae003c ("block, bfq:
add full hierarchical scheduling and cgroups support") like below, then I can't
see the bug again with a simple test.


diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 889a8549d97f..c640e64e042f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4710,6 +4710,7 @@ static void bfq_exit_queue(struct elevator_queue *e)
 	spin_lock_irq(&bfqd->lock);
 	list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
 		bfq_deactivate_bfqq(bfqd, bfqq, false, false);
+	bfq_put_async_queues(bfqd, bfqd->root_group);
 	spin_unlock_irq(&bfqd->lock);
 
 	hrtimer_cancel(&bfqd->idle_slice_timer);
@@ -4718,7 +4719,6 @@ static void bfq_exit_queue(struct elevator_queue *e)
 	blkcg_deactivate_policy(bfqd->queue, &blkcg_policy_bfq);
 #else
 	spin_lock_irq(&bfqd->lock);
-	bfq_put_async_queues(bfqd, bfqd->root_group);
 	kfree(bfqd->root_group);
 	spin_unlock_irq(&bfqd->lock);

But perhaps we should do it inside blkcg_deactivate_policy, right?

Thanks,
Guoqing


Re: [PATCH] badblocks: fix wrong return value in badblocks_set if badblocks are disabled

2017-09-28 Thread Guoqing Jiang



On 09/29/2017 02:45 AM, Liu Bo wrote:

On Thu, Sep 28, 2017 at 09:57:41AM +0800, Guoqing Jiang wrote:


On 09/28/2017 06:13 AM, Liu Bo wrote:

MD's rdev_set_badblocks() expects that badblocks_set() returns 1 if
badblocks are disabled, otherwise, rdev_set_badblocks() will record
superblock changes and return success in that case and md will fail to
report an IO error which it should.

This bug has existed since badblocks were introduced in commit
9e0e252a048b ("badblocks: Add core badblock management code").

I don't think we need to change it since it originally returned 0 in the
original commit.

The difference is that the original md_set_badblocks() returns 0 for
both "being disabled" and "no room", while now badblocks_set() returns
1 for "no room" as a failure but 0 for "being disabled".


Thanks for pointing it out, you are right, and the comment below says:

 * Return:
 *  0: success
 *  1: failed to set badblocks (out of space)

Maybe it is better to add "badblocks are disabled" to the above comment.
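
In other words, the kernel-doc above badblocks_set() could be extended along
these lines (just a suggested wording, not an actual patch):

/**
 * badblocks_set() - Add a range of bad blocks to the table.
 * ...
 * Return:
 *  0: success
 *  1: failed to set badblocks (out of space, or badblocks are disabled)
 */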

Anyway, Acked-by: Guoqing Jiang <gqji...@suse.com>

Regards,
Guoqing


Re: [PATCH] badblocks: fix wrong return value in badblocks_set if badblocks are disabled

2017-09-27 Thread Guoqing Jiang



On 09/28/2017 06:13 AM, Liu Bo wrote:

MD's rdev_set_badblocks() expects that badblocks_set() returns 1 if
badblocks are disabled, otherwise, rdev_set_badblocks() will record
superblock changes and return success in that case and md will fail to
report an IO error which it should.

This bug has existed since badblocks were introduced in commit
9e0e252a048b ("badblocks: Add core badblock management code").


I don't think we need to change it since it originally returned 0 in the
original commit.

commit 2230dfe4ccc3add340dc6d437965b2de1d269fde
Author: NeilBrown 
Date:   Thu Jul 28 11:31:46 2011 +1000

md: beginnings of bad block management.

This the first step in allowing md to track bad-blocks per-device so
that we can fail individual blocks rather than the whole device.

This patch just adds a data structure for recording bad blocks, with
routines to add, remove, search the list.

Signed-off-by: NeilBrown 
Reviewed-by: Namhyung Kim 
+static int md_set_badblocks(struct badblocks *bb, sector_t s, int sectors,
+   int acknowledged)
+{
+   u64 *p;
+   int lo, hi;
+   int rv = 1;
+
+   if (bb->shift < 0)
+   /* badblocks are disabled */
+   return 0;

After a quick glance, I guess we need to fix it inside md, since the change
below seems not correct in fc974ee2bffdde47d1e4b220cf326952cc2c4794.

@@ -8734,114 +8485,19 @@ int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
 		s += rdev->new_data_offset;
 	else
 		s += rdev->data_offset;
-	rv = md_set_badblocks(&rdev->badblocks,
-			      s, sectors, 0);
-	if (rv) {
+	rv = badblocks_set(&rdev->badblocks, s, sectors, 0);
+	if (rv == 0) {


So we need to correct it like:

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0ff1bbf..245fe52 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8905,7 +8905,7 @@ int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
 	else
 		s += rdev->data_offset;
 	rv = badblocks_set(&rdev->badblocks, s, sectors, 0);
-	if (rv == 0) {
+	if (rv) {

Thanks,
Guoqing


[PATCH] block: use bio_io_error directly

2017-07-21 Thread Guoqing Jiang
bio_io_error is capable of replacing the two lines.
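
For reference, bio_io_error() is a small inline helper in include/linux/bio.h
that does exactly those two steps; after the bi_status conversion it looks
roughly like:

static inline void bio_io_error(struct bio *bio)
{
	bio->bi_status = BLK_STS_IOERR;
	bio_endio(bio);
}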

Cc: Christoph Hellwig <h...@lst.de>
Cc: Philipp Reisner <philipp.reis...@linbit.com>
Cc: Lars Ellenberg <lars.ellenb...@linbit.com>
Cc: Mike Snitzer <snit...@redhat.com>
Cc: Sagi Grimberg <s...@grimberg.me>
Cc: Alexander Viro <v...@zeniv.linux.org.uk>
Signed-off-by: Guoqing Jiang <gqji...@suse.com>
---
 drivers/block/drbd/drbd_int.h | 3 +--
 drivers/md/dm-mpath.c | 3 +--
 drivers/nvme/target/io-cmd.c  | 3 +--
 fs/block_dev.c| 3 +--
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index d17b6e6..dd21a1b 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -1630,8 +1630,7 @@ static inline void drbd_generic_make_request(struct 
drbd_device *device,
__release(local);
if (!bio->bi_bdev) {
drbd_err(device, "drbd_generic_make_request: bio->bi_bdev == 
NULL\n");
-   bio->bi_status = BLK_STS_IOERR;
-   bio_endio(bio);
+   bio_io_error(bio);
return;
}
 
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index 0e8ab5b..aa6dcc4 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -623,8 +623,7 @@ static void process_queued_bios(struct work_struct *work)
r = __multipath_map_bio(m, bio, get_mpio_from_bio(bio));
switch (r) {
case DM_MAPIO_KILL:
-   bio->bi_status = BLK_STS_IOERR;
-   bio_endio(bio);
+   bio_io_error(bio);
break;
case DM_MAPIO_REQUEUE:
bio->bi_status = BLK_STS_DM_REQUEUE;
diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c
index 3b4d47a..46baf19 100644
--- a/drivers/nvme/target/io-cmd.c
+++ b/drivers/nvme/target/io-cmd.c
@@ -145,8 +145,7 @@ static void nvmet_execute_discard(struct nvmet_req *req)
bio->bi_private = req;
bio->bi_end_io = nvmet_bio_done;
if (status) {
-   bio->bi_status = BLK_STS_IOERR;
-   bio_endio(bio);
+   bio_io_error(bio);
} else {
submit_bio(bio);
}
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9941dc8..3ceb8f4 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -370,8 +370,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
 
ret = bio_iov_iter_get_pages(bio, iter);
if (unlikely(ret)) {
-   bio->bi_status = BLK_STS_IOERR;
-   bio_endio(bio);
+   bio_io_error(bio);
break;
}
 
-- 
2.10.0



Re: [PATCH v2 11/51] md: raid1: initialize bvec table via bio_add_page()

2017-06-28 Thread Guoqing Jiang



On 06/28/2017 12:22 AM, Ming Lei wrote:



Seems the above section is similar to the reset_bvec_table introduced in the next patch;
why is there a difference between raid1 and raid10? Maybe adding reset_bvec_table
into md.c, then calling it from raid1 and raid10, would be better, just my 2 cents.

Hi Guoqing,

I think it is a good idea, and even the two patches can be put into
one, so how about the following patch?


Looks good.

Acked-by: Guoqing Jiang <gqji...@suse.com>

Thanks,
Guoqing



Shaohua, what do you think of this one?

---
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3d957ac1e109..7ffc622dd7fa 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9126,6 +9126,24 @@ void md_reload_sb(struct mddev *mddev, int nr)
  }
  EXPORT_SYMBOL(md_reload_sb);
  
+/* generally called after bio_reset() for resetting bvec */
+void reset_bvec_table(struct bio *bio, struct resync_pages *rp, int size)
+{
+   /* initialize bvec table again */
+   rp->idx = 0;
+   do {
+   struct page *page = resync_fetch_page(rp, rp->idx++);
+   int len = min_t(int, size, PAGE_SIZE);
+
+   /*
+* won't fail because the vec table is big
+* enough to hold all these pages
+*/
+   bio_add_page(bio, page, len, 0);
+   size -= len;
+   } while (rp->idx < RESYNC_PAGES && size > 0);
+}
+
  #ifndef MODULE
  
  /*

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 991f0fe2dcc6..f569831b1de9 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -783,4 +783,6 @@ static inline struct page *resync_fetch_page(struct 
resync_pages *rp,
return NULL;
return rp->pages[idx];
  }
+
+void reset_bvec_table(struct bio *bio, struct resync_pages *rp, int size);
  #endif /* _MD_MD_H */
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3febfc8391fb..6ae2665ecd58 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2086,10 +2086,7 @@ static void process_checks(struct r1bio *r1_bio)
/* Fix variable parts of all bios */
vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
for (i = 0; i < conf->raid_disks * 2; i++) {
-   int j;
-   int size;
blk_status_t status;
-   struct bio_vec *bi;
struct bio *b = r1_bio->bios[i];
struct resync_pages *rp = get_resync_pages(b);
if (b->bi_end_io != end_sync_read)
@@ -2098,8 +2095,6 @@ static void process_checks(struct r1bio *r1_bio)
status = b->bi_status;
bio_reset(b);
b->bi_status = status;
-   b->bi_vcnt = vcnt;
-   b->bi_iter.bi_size = r1_bio->sectors << 9;
b->bi_iter.bi_sector = r1_bio->sector +
conf->mirrors[i].rdev->data_offset;
b->bi_bdev = conf->mirrors[i].rdev->bdev;
@@ -2107,15 +2102,8 @@ static void process_checks(struct r1bio *r1_bio)
rp->raid_bio = r1_bio;
b->bi_private = rp;
  
-		size = b->bi_iter.bi_size;

-   bio_for_each_segment_all(bi, b, j) {
-   bi->bv_offset = 0;
-   if (size > PAGE_SIZE)
-   bi->bv_len = PAGE_SIZE;
-   else
-   bi->bv_len = size;
-   size -= PAGE_SIZE;
-   }
+   /* initialize bvec table again */
+   reset_bvec_table(b, rp, r1_bio->sectors << 9);
}
for (primary = 0; primary < conf->raid_disks * 2; primary++)
if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5026e7ad51d3..72f4fa07449b 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2087,8 +2087,8 @@ static void sync_request_write(struct mddev *mddev, 
struct r10bio *r10_bio)
rp = get_resync_pages(tbio);
bio_reset(tbio);
  
-		tbio->bi_vcnt = vcnt;

-   tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
+   reset_bvec_table(tbio, rp, fbio->bi_iter.bi_size);
+
rp->raid_bio = r10_bio;
tbio->bi_private = rp;
tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;

Thanks,
Ming





Re: [PATCH v2 11/51] md: raid1: initialize bvec table via bio_add_page()

2017-06-27 Thread Guoqing Jiang



On 06/26/2017 08:09 PM, Ming Lei wrote:

We will support multipage bvec soon, so initialize the bvec
table using the standard way instead of writing the
table directly. Otherwise it won't work any more once
multipage bvec is enabled.

Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Signed-off-by: Ming Lei 
---
  drivers/md/raid1.c | 27 ++-
  1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3febfc8391fb..835c42396861 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2086,10 +2086,8 @@ static void process_checks(struct r1bio *r1_bio)
/* Fix variable parts of all bios */
vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
for (i = 0; i < conf->raid_disks * 2; i++) {
-   int j;
int size;
blk_status_t status;
-   struct bio_vec *bi;
struct bio *b = r1_bio->bios[i];
struct resync_pages *rp = get_resync_pages(b);
if (b->bi_end_io != end_sync_read)
@@ -2098,8 +2096,6 @@ static void process_checks(struct r1bio *r1_bio)
status = b->bi_status;
bio_reset(b);
b->bi_status = status;
-   b->bi_vcnt = vcnt;
-   b->bi_iter.bi_size = r1_bio->sectors << 9;
b->bi_iter.bi_sector = r1_bio->sector +
conf->mirrors[i].rdev->data_offset;
b->bi_bdev = conf->mirrors[i].rdev->bdev;
@@ -2107,15 +2103,20 @@ static void process_checks(struct r1bio *r1_bio)
rp->raid_bio = r1_bio;
b->bi_private = rp;
  
-		size = b->bi_iter.bi_size;

-   bio_for_each_segment_all(bi, b, j) {
-   bi->bv_offset = 0;
-   if (size > PAGE_SIZE)
-   bi->bv_len = PAGE_SIZE;
-   else
-   bi->bv_len = size;
-   size -= PAGE_SIZE;
-   }
+   /* initialize bvec table again */
+   rp->idx = 0;
+   size = r1_bio->sectors << 9;
+   do {
+   struct page *page = resync_fetch_page(rp, rp->idx++);
+   int len = min_t(int, size, PAGE_SIZE);
+
+   /*
+* won't fail because the vec table is big
+* enough to hold all these pages
+*/
+   bio_add_page(b, page, len, 0);
+   size -= len;
+   } while (rp->idx < RESYNC_PAGES && size > 0);
}


Seems the above section is similar to the reset_bvec_table introduced in the next patch;
why is there a difference between raid1 and raid10? Maybe adding reset_bvec_table
into md.c, then calling it from raid1 and raid10, would be better, just my 2 cents.

Thanks,
Guoqing


[PATCH] block: simplify code with bio_io_error

2017-06-02 Thread Guoqing Jiang
bio_io_error was introduced in the commit 4246a0b
("block: add a bi_error field to struct bio"), so
use it directly.
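
For reference, at that point in time the helper in include/linux/bio.h was the
bi_error-based variant, roughly:

static inline void bio_io_error(struct bio *bio)
{
	bio->bi_error = -EIO;
	bio_endio(bio);
}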

Signed-off-by: Guoqing Jiang <gqji...@suse.com>
---
 block/blk-core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c706852..92bc663 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1662,8 +1662,7 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, 
struct bio *bio)
	blk_queue_split(q, &bio, q->bio_split);
 
if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
-   bio->bi_error = -EIO;
-   bio_endio(bio);
+   bio_io_error(bio);
return BLK_QC_T_NONE;
}
 
-- 
2.10.0



Re: [LSF/MM TOPIC] [LSF/MM ATTEND] md raid general discussion

2017-01-15 Thread Guoqing Jiang



On 01/10/2017 12:38 AM, Coly Li wrote:

Hi Folks,

I'd like to propose a general md raid discussion; it is quite necessary
for most of the active md raid developers to sit together to discuss the
current challenges of Linux software raid and development trends.

In the last years, we have had many development activities in md raid, e.g.
raid5 cache, raid1 clustering, partial parity log, fast fail
upstreaming, and some effort on raid1 & raid0 performance improvement.

I see there is some kind of functionality overlap between r5cache
(raid5 cache) and PPL (partial parity log); currently I have no idea
where we will go with these two development activities.
Also, I receive reports from users that better raid1 performance is desired
when it is built on NVMe SSDs as a cache (maybe bcache or dm-cache). I am
working on some raid1 performance improvements (e.g. a new raid1 I/O
barrier and lockless raid1 I/O submission), and have some more ideas to discuss.

Therefore, if md raid developers have a chance to sit together and
discuss how to collaborate efficiently in the next year, it will be much
more productive than communicating on the mailing list.


I would like to attend the raid discussion. Besides the above topics, I think we
can talk about improving the test suite of mdadm to make it more robust
(I can share the related test suite which is used for clustered raid).

And I could share the status of clustered raid, about what we have done
and what we can do in the future. Finally, I'd like to know/discuss
the roadmap of RAID.

Thanks a lot!
Guoqing