Re: [PATCH] scsi: core: Cap initial sdev queue depth at shost.can_queue

2021-04-19 Thread Ming Lei
On Tue, Apr 20, 2021 at 12:06:24AM +0800, John Garry wrote: > Function sdev_store_queue_depth() enforces that the sdev queue depth cannot > exceed shost.can_queue. > > However, the LLDD may still set cmd_per_lun > can_queue, which leads to an > initial sdev queue depth greater than can_queue. >

Re: [RFC PATCH 2/2] bfq/mq-deadline: remove redundant check for passthrough request

2021-04-15 Thread Ming Lei
blk_rq_is_passthrough(rq)) { > - if (at_head) > - list_add(&rq->queuelist, &dd->dispatch); > - else > - list_add_tail(&rq->queuelist, &dd->dispatch); > + if (at_head) { > + list_add(&rq->queuelist, &dd->dispatch); > } else { > deadline_add_rq_rb(dd, rq); > > -- > 2.30.2 > Looks fine: Reviewed-by: Ming Lei Thanks, Ming

Re: [PATCH 1/2] blk-mq: bypass IO scheduler's limit_depth for passthrough request

2021-04-15 Thread Ming Lei
v.h > +++ b/include/linux/blkdev.h > @@ -272,6 +272,12 @@ static inline bool bio_is_passthrough(struct bio *bio) > return blk_op_is_scsi(op) || blk_op_is_private(op); > } > > +static inline bool blk_op_is_passthrough(unsigned int op) > +{ > + return (blk_op_is_scsi(op & REQ_OP

Re: [RESEND,v5,1/2] bio: limit bio max size

2021-04-11 Thread Ming Lei
On Sun, Apr 11, 2021 at 10:13:01PM +, Damien Le Moal wrote: > On 2021/04/09 23:47, Bart Van Assche wrote: > > On 4/7/21 3:27 AM, Damien Le Moal wrote: > >> On 2021/04/07 18:46, Changheun Lee wrote: > >>> I'll prepare new patch as you recommend. It will be added setting of > >>> limit_bio_size

Re: [PATCH 3/3] loop: Charge i/o to mem and blk cg

2021-04-05 Thread Ming Lei
n tmpfs, memory is charged appropriately. > > This patch also exports cgroup_get_e_css and int_active_memcg so it > can be used by the loop module. > > Signed-off-by: Dan Schatzberg > Acked-by: Johannes Weiner Reviewed-by: Ming Lei -- Ming Lei

Re: [PATCH 1/3] loop: Use worker per cgroup instead of kworker

2021-04-05 Thread Ming Lei
lo_complete_rq, > }; > > @@ -2164,6 +2300,7 @@ static int loop_add(struct loop_device **l, int i) > mutex_init(&lo->lo_mutex); > lo->lo_number = i; > spin_lock_init(&lo->lo_lock); > + spin_lock_init(&lo->lo_work_lock); > disk->major = LOOP_MAJOR; > disk->first_minor = i << part_shift; > disk->fops = &lo_fops; > diff --git a/drivers/block/loop.h b/drivers/block/loop.h > index a3c04f310672..9289c1cd6374 100644 > --- a/drivers/block/loop.h > +++ b/drivers/block/loop.h > @@ -14,7 +14,6 @@ > #include > #include > #include > -#include > #include > > /* Possible states of device */ > @@ -54,8 +53,13 @@ struct loop_device { > > spinlock_t lo_lock; > int lo_state; > - struct kthread_worker worker; > - struct task_struct *worker_task; > + spinlock_t lo_work_lock; > + struct workqueue_struct *workqueue; > + struct work_struct rootcg_work; > + struct list_head rootcg_cmd_list; > + struct list_head idle_worker_list; > + struct rb_root worker_tree; > + struct timer_list timer; > bool use_dio; > bool sysfs_inited; > > @@ -66,7 +70,7 @@ struct loop_device { > }; > > struct loop_cmd { > - struct kthread_work work; > + struct list_head list_entry; > bool use_aio; /* use AIO interface to handle I/O */ > atomic_t ref; /* only for aio */ > long ret; > -- > 2.30.2 > Reviewed-by: Ming Lei -- Ming Lei

Re: [PATCH 1/2] block: shutdown blktrace in case of fatal signal pending

2021-04-03 Thread Ming Lei
On Sat, Apr 03, 2021 at 04:10:16PM +0800, Ming Lei wrote: > On Fri, Apr 02, 2021 at 07:27:30PM +0200, Christoph Hellwig wrote: > > On Wed, Mar 31, 2021 at 08:16:50AM +0800, Ming Lei wrote: > > > On Tue, Mar 30, 2021 at 06:53:30PM +0200, Christoph Hellwig wrote: > > > &

Re: [PATCH 1/2] block: shutdown blktrace in case of fatal signal pending

2021-04-03 Thread Ming Lei
On Fri, Apr 02, 2021 at 07:27:30PM +0200, Christoph Hellwig wrote: > On Wed, Mar 31, 2021 at 08:16:50AM +0800, Ming Lei wrote: > > On Tue, Mar 30, 2021 at 06:53:30PM +0200, Christoph Hellwig wrote: > > > On Tue, Mar 23, 2021 at 04:14:39PM +0800, Ming Lei wrote: > > > &

Re: Race condition in Kernel

2021-04-01 Thread Ming Lei
On Thu, Apr 01, 2021 at 04:27:37PM +, Gulam Mohamed wrote: > Hi Ming, > > Thanks for taking a look into this. Can you please see my inline > comments in below mail? > > Regards, > Gulam Mohamed. > > -Original Message- > From: Ming Lei > Se

Re: [PATCH 1/2] block: shutdown blktrace in case of fatal signal pending

2021-03-30 Thread Ming Lei
On Tue, Mar 30, 2021 at 06:53:30PM +0200, Christoph Hellwig wrote: > On Tue, Mar 23, 2021 at 04:14:39PM +0800, Ming Lei wrote: > > blktrace may allocate lots of memory, if the process is terminated > > by user or OOM, we need to provide one chance to remove the trace > > buf

Re: [PATCH 2/2] blktrace: limit allowed total trace buffer size

2021-03-29 Thread Ming Lei
On Tue, Mar 30, 2021 at 10:57:04AM +0800, Su Yue wrote: > > On Tue 23 Mar 2021 at 16:14, Ming Lei wrote: > > > On some ARCHs, such as aarch64, page size may be 64K, meantime there may > > be lots of CPU cores. relay_open() needs to allocate pages on each CPU > > b

Re: [PATCH 0/2] blktrace: fix trace buffer leak and limit trace buffer size

2021-03-29 Thread Ming Lei
On Tue, Mar 23, 2021 at 04:14:38PM +0800, Ming Lei wrote: > blktrace may pass a big trace buffer size via '-b', meantime the system > may have lots of CPU cores, so too much memory can be allocated for > blktrace. > > The 1st patch shuts down blktrace in blkdev_close() in case of

Re: [syzbot] KASAN: use-after-free Read in disk_part_iter_next (2)

2021-03-27 Thread Ming Lei
0 RCX: 00465d67 > RDX: 7ffda32c37f3 RSI: 004bfab2 RDI: 7ffda32c37e0 > RBP: R08: R09: 7ffda32c35a0 > R10: 7ffda32c3457 R11: 0202 R12: 0001 > R13: 0000 R14: 0001 R15: 7ffda32c37e0 This is a different and un-related warning from the original report, so I think the patch in the above tree fixes the issue. -- Ming Lei

Re: [syzbot] KASAN: use-after-free Read in disk_part_iter_next (2)

2021-03-26 Thread Ming Lei
On Sun, Mar 14, 2021 at 7:10 PM syzbot wrote: > > Hello, > > syzbot found the following issue on: > > HEAD commit:280d542f Merge tag 'drm-fixes-2021-03-05' of git://anongit.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=15ade5aed0 > kernel config:

Re: Race condition in Kernel

2021-03-24 Thread Ming Lei
On Wed, Mar 24, 2021 at 12:37:03PM +, Gulam Mohamed wrote: > Hi All, > > We are facing a stale link (of the device) issue during the iscsi-logout > process if we use parted command just before the iscsi logout. Here are the > details: > > As part of iscsi logout, the

Re: Race condition in Kernel

2021-03-24 Thread Ming Lei
> kernel between the systemd-udevd and iscsi-logout processing as described > above. We are able to reproduce this even with latest upstream kernel. > > We have come across a patch from Ming Lei which was created for "avoid to > drop & re-add partitions if part

[PATCH 2/2] blktrace: limit allowed total trace buffer size

2021-03-23 Thread Ming Lei
in case of 'blktrace -b 8192' which is used by the device-mapper test suite[1]. This could easily cause OOM. Fix the issue by limiting the max allowed pages to 1/8 of totalram_pages(). [1] https://github.com/jthornber/device-mapper-test-suite.git Signed-off-by: Ming Lei --- kernel/trace/blktrace.c

[PATCH 0/2] blktrace: fix trace buffer leak and limit trace buffer size

2021-03-23 Thread Ming Lei
for avoiding potential OOM. Ming Lei (2): block: shutdown blktrace in case of fatal signal pending blktrace: limit allowed total trace buffer size fs/block_dev.c | 6 ++ kernel/trace/blktrace.c | 32 2 files changed, 38 insertions

[PATCH 1/2] block: shutdown blktrace in case of fatal signal pending

2021-03-23 Thread Ming Lei
blktrace may allocate lots of memory; if the process is terminated by the user or by OOM, we need to provide one chance to remove the trace buffer, otherwise a memory leak may be caused. Fix the issue by shutting down blktrace when the task is exiting in blkdev_close(). Signed-off-by: Ming Lei --- fs

Re: [syzbot] KASAN: use-after-free Read in disk_part_iter_next (2)

2021-03-22 Thread Ming Lei
On Sun, Mar 14, 2021 at 7:10 PM syzbot wrote: > > Hello, > > syzbot found the following issue on: > > HEAD commit:280d542f Merge tag 'drm-fixes-2021-03-05' of git://anongit.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=15ade5aed0 > kernel config:

Re: [syzbot] KASAN: use-after-free Read in disk_part_iter_next (2)

2021-03-21 Thread Ming Lei
7ebdc It should be the same issue which was addressed by aebf5db91705 block: fix use-after-free in disk_part_iter_next but converting to xarray introduced the issue again. -- Ming Lei

Re: [PATCH v7 2/3] block: add bdev_interposer

2021-03-16 Thread Ming Lei
On Tue, Mar 16, 2021 at 07:35:44PM +0300, Sergei Shtepa wrote: > The 03/16/2021 11:09, Ming Lei wrote: > > On Fri, Mar 12, 2021 at 06:44:54PM +0300, Sergei Shtepa wrote: > > > bdev_interposer allows to redirect bio requests to another devices. > > > > &g

Re: [PATCH v7 2/3] block: add bdev_interposer

2021-03-16 Thread Ming Lei
On Fri, Mar 12, 2021 at 06:44:54PM +0300, Sergei Shtepa wrote: > bdev_interposer allows to redirect bio requests to another devices. > > Signed-off-by: Sergei Shtepa > --- > block/bio.c | 2 ++ > block/blk-core.c | 57 +++ >

Re: [RFC PATCH v3 2/3] blk-mq: Freeze and quiesce all queues for tagset in elevator_exit()

2021-03-10 Thread Ming Lei
On Fri, Mar 05, 2021 at 11:14:53PM +0800, John Garry wrote: > A use-after-free may occur if blk_mq_queue_tag_busy_iter() is run on a > queue when another queue associated with the same tagset is switching IO > scheduler: > > BUG: KASAN: use-after-free in bt_iter+0xa0/0x120 > Read of size 8 at

Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI

2021-03-05 Thread Ming Lei
e(entity); 1181 is_in_service = entity == sd->in_service_entity; 1182 1183 bfq_calc_finish(entity, entity->service); 1184 1185 if (is_in_service) Seems entity->sched_data points to NULL. > > Thanks, > Paolo > > > Il giorno 5 mar 2021,

Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI

2021-03-05 Thread Ming Lei
Hello Hillf, Thanks for the debug patch. On Fri, Mar 5, 2021 at 5:00 PM Hillf Danton wrote: > > On Thu, 4 Mar 2021 16:42:30 +0800 Ming Lei wrote: > > On Sat, Oct 10, 2020 at 1:40 PM Mikhail Gavrilov > > wrote: > > > > > > Paolo, Jens I am sorry for the noi

Re: [bugreport 5.9-rc8] general protection fault, probably for non-canonical address 0x46b1b0f0d8856e4a: 0000 [#1] SMP NOPTI

2021-03-04 Thread Ming Lei
t; FS: () GS:8dc90e0c() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 003e8ebe4000 CR3: 0007c2546000 CR4: 00350ee0 > Call Trace: > bfq_deactivate_entity+0x4f/0xc0 Hello, The same stack trace was observed in an RH internal test too, on kernel 5.11.0-0.rc6, but there isn't a reproducer yet. -- Ming Lei

Re: [PATCH V2 0/3] block: avoid to drop & re-add partitions if partitions aren't changed

2021-02-24 Thread Ming Lei
On Wed, Feb 24, 2021 at 09:18:25AM +0100, Christoph Hellwig wrote: > On Wed, Feb 24, 2021 at 11:58:26AM +0800, Ming Lei wrote: > > Hi Guys, > > > > The two patches changes block ioctl(BLKRRPART) for avoiding drop & > > re-add partitions if partitions state isn't ch

Re: [RFC PATCH v5 0/4] add simple copy support

2021-02-21 Thread Ming Lei
On Fri, Feb 19, 2021 at 06:15:13PM +0530, SelvaKumar S wrote: > This patchset tries to add support for TP4065a ("Simple Copy Command"), > v2020.05.04 ("Ratified") > > The Specification can be found in following link. > https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip >

Re: [PATCH 0/2] block: avoid to drop & re-add partitions if partitions aren't changed

2021-02-18 Thread Ming Lei
On Wed, Feb 17, 2021 at 08:16:29AM +0100, Christoph Hellwig wrote: > On Wed, Feb 17, 2021 at 11:07:14AM +0800, Ming Lei wrote: > > Do you think it is correct for ioctl(BLKRRPART) to always drop/re-add > > partition device node? > > Yes, that is what it is designed to do. Th

Re: [PATCH 0/2] block: avoid to drop & re-add partitions if partitions aren't changed

2021-02-16 Thread Ming Lei
On Tue, Feb 16, 2021 at 09:44:30AM +0100, Christoph Hellwig wrote: > On Mon, Feb 15, 2021 at 12:03:41PM +0800, Ming Lei wrote: > > Hello, > > I think this is a fundamentally bad idea. We should not keep the > parsed partition state around forever just to work around some

Re: [PATCH v2] block: recalculate segment count for multi-segment discards correctly

2021-02-14 Thread Ming Lei
ment discards. It calculates the correct discard segment > count by counting the number of bio as each discard bio is considered its > own segment. > > Fixes: 1e739730c5b9 ("block: optionally merge discontiguous discard bios into > a single request") > Signed-off-by: Da

Re: [PATCH 0/2] block: avoid to drop & re-add partitions if partitions aren't changed

2021-02-14 Thread Ming Lei
On Fri, Feb 05, 2021 at 10:17:06AM +0800, Ming Lei wrote: > Hi Guys, > > The two patches changes block ioctl(BLKRRPART) for avoiding drop & > re-add partitions if partitions state isn't changed. The current > behavior confuses userspace because partitions can disappear a

Re: [PATCH 2/2] block: avoid to drop & re-add partitions if partitions aren't changed

2021-02-04 Thread Ming Lei
On Fri, Feb 05, 2021 at 08:14:29AM +0100, Christoph Hellwig wrote: > On Fri, Feb 05, 2021 at 10:17:08AM +0800, Ming Lei wrote: > > block ioctl(BLKRRPART) always drops current partitions and adds > > partitions again, even though there isn't any change in partitions table. > >

[PATCH 1/2] block: move partitions check code into single helper

2021-02-04 Thread Ming Lei
No functional change: make the code more readable, and prepare for supporting safe re-reading of partitions. Cc: Ewan D. Milne Signed-off-by: Ming Lei --- block/partitions/core.c | 51 ++--- 1 file changed, 37 insertions(+), 14 deletions(-) diff --git a/block

[PATCH 0/2] block: avoid to drop & re-add partitions if partitions aren't changed

2021-02-04 Thread Ming Lei
Hi Guys, The two patches change block ioctl(BLKRRPART) to avoid dropping & re-adding partitions if the partitions state isn't changed. The current behavior confuses userspace because partitions can disappear at any time when ioctl(BLKRRPART) runs. Ming Lei (2): block: move partitions check code into si

[PATCH 2/2] block: avoid to drop & re-add partitions if partitions aren't changed

2021-02-04 Thread Ming Lei
may confuse userspace or users, for example, one normal workable partition device node may disappear any time. Fix this issue by checking if there is real change in partitions state, and only drop & re-add them when partitions state is really changed. Cc: Ewan D. Milne Signed-off-by: Ming

Re: [PATCH] block: recalculate segment count for multi-segment discard requests correctly

2021-02-03 Thread Ming Lei
return nr_phys_segs; > + } > + /* fall through */ > case REQ_OP_SECURE_ERASE: > case REQ_OP_WRITE_ZEROES: > return 0; blk_rq_nr_discard_segments() always returns >=1 segments, so there is no similar issue in case of a single-range discard. Reviewed-by: Ming Lei And it can be thought of as: Fixes: 1e739730c5b9 ("block: optionally merge discontiguous discard bios into a single request") -- Ming

Re: [PATCH] block: recalculate segment count for multi-segment discard requests correctly

2021-02-03 Thread Ming Lei
On Wed, Feb 03, 2021 at 11:23:37AM -0500, David Jeffery wrote: > On Wed, Feb 03, 2021 at 10:35:17AM +0800, Ming Lei wrote: > > > > On Tue, Feb 02, 2021 at 03:43:55PM -0500, David Jeffery wrote: > > > The return 0 does seem to be an old relic that does not make sense >

Re: [PATCH v4 1/2] bio: limit bio max size

2021-02-02 Thread Ming Lei
On Tue, Feb 02, 2021 at 01:12:04PM +0900, Changheun Lee wrote: > > On Mon, Feb 01, 2021 at 11:52:48AM +0900, Changheun Lee wrote: > > > > On Fri, Jan 29, 2021 at 12:49:08PM +0900, Changheun Lee wrote: > > > > > bio size can grow up to 4GB when muli-page bvec is enabled. > > > > > but sometimes it

Re: [PATCH] block: recalculate segment count for multi-segment discard requests correctly

2021-02-02 Thread Ming Lei
On Tue, Feb 02, 2021 at 03:43:55PM -0500, David Jeffery wrote: > On Tue, Feb 02, 2021 at 11:33:43AM +0800, Ming Lei wrote: > > > > On Mon, Feb 01, 2021 at 11:48:50AM -0500, David Jeffery wrote: > > > When a stacked block device inserts a request into another

Re: [PATCH] block: recalculate segment count for multi-segment discard requests correctly

2021-02-01 Thread Ming Lei
On Mon, Feb 01, 2021 at 11:48:50AM -0500, David Jeffery wrote: > When a stacked block device inserts a request into another block device > using blk_insert_cloned_request, the request's nr_phys_segments field gets > recalculated by a call to blk_recalc_rq_segments in > blk_cloned_rq_check_limits.

Re: [PATCH v4 1/2] bio: limit bio max size

2021-01-31 Thread Ming Lei
On Mon, Feb 01, 2021 at 11:52:48AM +0900, Changheun Lee wrote: > > On Fri, Jan 29, 2021 at 12:49:08PM +0900, Changheun Lee wrote: > > > bio size can grow up to 4GB when muli-page bvec is enabled. > > > but sometimes it would lead to inefficient behaviors. > > > in case of large chunk direct I/O, -

Re: [PATCH v4 1/2] bio: limit bio max size

2021-01-28 Thread Ming Lei
On Fri, Jan 29, 2021 at 12:49:08PM +0900, Changheun Lee wrote: > bio size can grow up to 4GB when multi-page bvec is enabled. > but sometimes it would lead to inefficient behaviors. > in case of large chunk direct I/O, - 32MB chunk read in user space - > all pages for 32MB would be merged to a bio

Re: [PATCH] Revert "block: simplify set_init_blocksize" to regain lost performance

2021-01-27 Thread Ming Lei
On Wed, Jan 27, 2021 at 09:44:50AM +0200, Maxim Mikityanskiy wrote: > On Wed, Jan 27, 2021 at 6:23 AM Bart Van Assche wrote: > > > > On 1/26/21 11:59 AM, Maxim Mikityanskiy wrote: > > > The cited commit introduced a serious regression with SATA write speed, > > > as found by bisecting. This patch

Re: [PATCH v3 1/2] bio: limit bio max size

2021-01-26 Thread Ming Lei
On Tue, Jan 26, 2021 at 06:26:02AM +, Damien Le Moal wrote: > On 2021/01/26 15:07, Ming Lei wrote: > > On Tue, Jan 26, 2021 at 04:06:06AM +, Damien Le Moal wrote: > >> On 2021/01/26 12:58, Ming Lei wrote: > >>> On Tue, Jan 26, 2021 at 10:32:34AM +0900, Changh

Re: [PATCH v3 1/2] bio: limit bio max size

2021-01-26 Thread Ming Lei
On Tue, Jan 26, 2021 at 04:06:06AM +, Damien Le Moal wrote: > On 2021/01/26 12:58, Ming Lei wrote: > > On Tue, Jan 26, 2021 at 10:32:34AM +0900, Changheun Lee wrote: > >> bio size can grow up to 4GB when muli-page bvec is enabled. > >> but sometimes it would le

Re: [PATCH v3 1/2] bio: limit bio max size

2021-01-26 Thread Ming Lei
On Tue, Jan 26, 2021 at 10:32:34AM +0900, Changheun Lee wrote: > bio size can grow up to 4GB when multi-page bvec is enabled. > but sometimes it would lead to inefficient behaviors. > in case of large chunk direct I/O, - 32MB chunk read in user space - > all pages for 32MB would be merged to a bio

Re: [PATCH v2] bio: limit bio max size.

2021-01-21 Thread Ming Lei
On Thu, Jan 21, 2021 at 09:58:03AM +0900, Changheun Lee wrote: > bio size can grow up to 4GB when multi-page bvec is enabled. > but sometimes it would lead to inefficient behaviors. > in case of large chunk direct I/O, - 32MB chunk read in user space - > all pages for 32MB would be merged to a bio

Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

2021-01-14 Thread Ming Lei
On Thu, Jan 14, 2021 at 11:24:35AM -0600, Brian King wrote: > On 1/13/21 7:27 PM, Ming Lei wrote: > > On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote: > >> On 1/12/21 6:33 PM, Tyrel Datwyler wrote: > >>> On 1/12/21 2:54 PM, Brian King wrote: > >&g

Re: [PATCH] bio: limit bio max size.

2021-01-13 Thread Ming Lei
On Wed, Jan 13, 2021 at 12:02:44PM +, Damien Le Moal wrote: > On 2021/01/13 20:48, Ming Lei wrote: > > On Wed, Jan 13, 2021 at 11:16:11AM +, Damien Le Moal wrote: > >> On 2021/01/13 19:25, Ming Lei wrote: > >>> On Wed, Jan 13, 2021 at 09:28:02AM +, Damien

Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

2021-01-13 Thread Ming Lei
On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote: > On 1/12/21 6:33 PM, Tyrel Datwyler wrote: > > On 1/12/21 2:54 PM, Brian King wrote: > >> On 1/11/21 5:12 PM, Tyrel Datwyler wrote: > >>> Introduce several new vhost fields for managing MQ state of the adapter > >>> as well as initial

Re: [PATCH] bio: limit bio max size.

2021-01-13 Thread Ming Lei
On Wed, Jan 13, 2021 at 11:16:11AM +, Damien Le Moal wrote: > On 2021/01/13 19:25, Ming Lei wrote: > > On Wed, Jan 13, 2021 at 09:28:02AM +, Damien Le Moal wrote: > >> On 2021/01/13 18:19, Ming Lei wrote: > >>> On Wed, Jan 13, 2021 at 12:09

Re: [PATCH] bio: limit bio max size.

2021-01-13 Thread Ming Lei
On Wed, Jan 13, 2021 at 09:28:02AM +, Damien Le Moal wrote: > On 2021/01/13 18:19, Ming Lei wrote: > > On Wed, Jan 13, 2021 at 12:09 PM Changheun Lee > > wrote: > >> > >>> On 2021/01/12 21:14, Changheun Lee wrote: > >>>>> On 2021/01/12

Re: Re: [PATCH] bio: limit bio max size.

2021-01-13 Thread Ming Lei
> > So what is the actual total latency difference for the entire 32MB user IO? That is I think what needs to be compared here. > > Also, what is your device max_sectors_kb and max queue depth? > > 32MB total latency is about 19ms including merge time without this patch. > But with this patch, total latency is about 17ms including merge time too. 19ms looks too big just for preparing one 32MB sized bio, which isn't supposed to take so long. Can you investigate where the 19ms is taken just for preparing one 32MB sized bio? It might be iov_iter_get_pages() for handling page faults. If yes, one suggestion is to enable THP (Transparent HugePage Support) in your application. -- Ming Lei

Re: [percpu_ref] 2b0d3d3e4f: reaim.jobs_per_min -18.4% regression

2021-01-11 Thread Ming Lei
On Sun, Jan 10, 2021 at 10:32:47PM +0800, kernel test robot wrote: > > Greeting, > > FYI, we noticed a -18.4% regression of reaim.jobs_per_min due to commit: > > > commit: 2b0d3d3e4fcfb19d10f9a82910b8f0f05c56ee3e ("percpu_ref: reduce memory > footprint of percpu_ref in fast path") >

Re: [PATCH v3 7/7] bio: don't copy bvec for direct IO

2021-01-10 Thread Ming Lei
g(bio, BIO_WORKINGSET); > diff --git a/include/linux/bio.h b/include/linux/bio.h > index d8f9077c43ef..1d30572a8c53 100644 > --- a/include/linux/bio.h > +++ b/include/linux/bio.h > @@ -444,10 +444,13 @@ static inline void bio_wouldblock_error(struct bio *bio) > > /* > * Calculate number of bvec segments that should be allocated to fit data > - * pointed by @iter. > + * pointed by @iter. If @iter is backed by bvec it's going to be reused > + * instead of allocating a new one. > */ > static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs) > { > + if (iov_iter_is_bvec(iter)) > + return 0; > return iov_iter_npages(iter, max_segs); > } > > -- > 2.24.0 > Reviewed-by: Ming Lei -- Ming

Re: [PATCH v3 6/7] bio: add a helper calculating nr segments to alloc

2021-01-10 Thread Ming Lei
lk_types.h */ > #include > +#include > > #define BIO_DEBUG > > @@ -441,6 +442,15 @@ static inline void bio_wouldblock_error(struct bio *bio) > bio_endio(bio); > } > > +/* > + * Calculate number of bvec segments that should be allocated to fit data > + * pointed by @iter. > + */ > +static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs) > +{ > + return iov_iter_npages(iter, max_segs); > +} > + > struct request_queue; > > extern int submit_bio_wait(struct bio *bio); > -- > 2.24.0 > Reviewed-by: Ming Lei -- Ming

Re: [PATCH v3 5/7] iov_iter: optimise bvec iov_iter_advance()

2021-01-10 Thread Ming Lei
if (iov_iter_is_bvec(i)) { > + iov_iter_bvec_advance(i, size); > + return; > + } > iterate_and_advance(i, size, v, 0, 0, 0) > } > EXPORT_SYMBOL(iov_iter_advance); > -- > 2.24.0 > Reviewed-by: Ming Lei -- Ming

Re: [PATCH v3 4/7] target/file: allocate the bvec array as part of struct target_core_file_cmd

2021-01-10 Thread Ming Lei
iov_iter_bvec(&iter, is_write, aio_cmd->bvecs, sgl_nents, len); > > aio_cmd->cmd = cmd; > aio_cmd->len = len; > @@ -307,8 +301,6 @@ fd_execute_rw_aio(struct se_cmd *cmd, struct scatterlist > *sgl, u32 sgl_nents, > else > ret = call_read_iter(file, &aio_cmd->iocb, &iter); > > - kfree(bvec); > > if (ret != -EIOCBQUEUED) > cmd_rw_aio_complete(&aio_cmd->iocb, ret, 0); > > -- > 2.24.0 > Reviewed-by: Ming Lei -- Ming

Re: [PATCH v3 3/7] block/psi: remove PSI annotations from direct IO

2021-01-10 Thread Ming Lei
ct-io.c > @@ -426,6 +426,8 @@ static inline void dio_bio_submit(struct dio *dio, struct > dio_submit *sdio) > unsigned long flags; > > bio->bi_private = dio; > + /* don't account direct I/O as memory stall */ > + bio_clear_flag(bio, BIO_WORKINGSET); > > spin_lock_irqsave(&dio->bio_lock, flags); > dio->refcount++; > -- > 2.24.0 > Reviewed-by: Ming Lei -- Ming

Re: [PATCH v3 2/7] bvec/iter: disallow zero-length segment bvecs

2021-01-10 Thread Ming Lei
continue; \ > (void)(STEP); \ > } \ > } > -- > 2.24.0 > Reviewed-by: Ming Lei -- Ming

Re: [PATCH v3 1/7] splice: don't generate zero-len segement bvecs

2021-01-10 Thread Ming Lei
pipe, buf); > if (unlikely(ret)) { > @@ -680,6 +682,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe, > struct file *out, > array[n].bv_len = this_len; > array[n].bv_offset = buf->offset; > left

Re: [RFC PATCH] fs: block_dev: compute nr_vecs hint for improving writeback bvecs allocation

2021-01-08 Thread Ming Lei
On Thu, Jan 07, 2021 at 09:21:11AM +1100, Dave Chinner wrote: > On Wed, Jan 06, 2021 at 04:45:48PM +0800, Ming Lei wrote: > > On Tue, Jan 05, 2021 at 07:39:38PM +0100, Christoph Hellwig wrote: > > > At least for iomap I think this is the wrong approach. Betwe

Re: [RFC PATCH] fs: block_dev: compute nr_vecs hint for improving writeback bvecs allocation

2021-01-06 Thread Ming Lei
On Tue, Jan 05, 2021 at 07:39:38PM +0100, Christoph Hellwig wrote: > At least for iomap I think this is the wrong approach. Between the > iomap and writeback_control we know the maximum size of the writeback > request and can just use that. I think writeback_control can tell us nothing about max

[RFC PATCH] fs: block_dev: compute nr_vecs hint for improving writeback bvecs allocation

2021-01-05 Thread Ming Lei
b_vcnt.bt Cc: Alexander Viro Cc: Darrick J. Wong Cc: linux-...@vger.kernel.org Cc: linux-fsde...@vger.kernel.org Signed-off-by: Ming Lei --- fs/block_dev.c| 1 + fs/iomap/buffered-io.c| 13 + include/linux/bio.h | 2 -- include/linux/blk

Re: [PATCH] fs/buffer: try to submit writeback bio in unit of page

2021-01-04 Thread Ming Lei
On Mon, Jan 04, 2021 at 09:44:15AM +0100, Christoph Hellwig wrote: > On Wed, Dec 30, 2020 at 08:08:15AM +0800, Ming Lei wrote: > > It is observed that __block_write_full_page() always submit bio with size > > of block size, > > which is often 512 bytes. > > > >

[PATCH 1/6] block: manage bio slab cache by xarray

2020-12-29 Thread Ming Lei
Managing the bio slab cache via xarray, by using the slab cache size as the xarray index and storing the 'struct bio_slab' instance into the xarray. So the code is simplified a lot, and meanwhile it is more readable than before. Signed-off-by: Ming Lei --- block/bio.c | 104

[PATCH 5/6] block: move three bvec helpers declaration into private helper

2020-12-29 Thread Ming Lei
bvec_alloc(), bvec_free() and bvec_nr_vecs() are only used inside block layer core functions, so there is no need to declare them in a public header. Signed-off-by: Ming Lei --- block/blk.h | 4 include/linux/bio.h | 3 --- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/block

[PATCH 6/6] bcache: don't pass BIOSET_NEED_BVECS for the 'bio_set' embedded in 'cache_set'

2020-12-29 Thread Ming Lei
This bioset is just for allocating bios only from bio_next_split, and it doesn't need bvecs, so remove the flag. Cc: linux-bca...@vger.kernel.org Cc: Coly Li Signed-off-by: Ming Lei --- drivers/md/bcache/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/bcache

[PATCH 4/6] block: set .bi_max_vecs as actual allocated vector number

2020-12-29 Thread Ming Lei
bvec_alloc() may allocate more bio vectors than requested, so set .bi_max_vecs to the actually allocated vector number, instead of the requested number. This helps the filesystem build bigger bios, because a new bio often won't be allocated until the current one becomes full. Signed-off-by: Ming Lei

[PATCH 2/6] block: don't pass BIOSET_NEED_BVECS for q->bio_split

2020-12-29 Thread Ming Lei
q->bio_split is only used by bio_split() for fast bio cloning, and there is no need to allocate bvecs, so remove this flag. Signed-off-by: Ming Lei --- block/blk-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-core.c b/block/blk-core.c index 96e5fcd7f

[PATCH 0/6] block: improvement on bioset & bvec allocation

2020-12-29 Thread Ming Lei
Hello, All are bioset / bvec improvement, and most of them are quite straightforward. Ming Lei (6): block: manage bio slab cache by xarray block: don't pass BIOSET_NEED_BVECS for q->bio_split block: don't allocate inline bvecs if this bioset needn't bvecs block: set .bi_max_v

[PATCH 3/6] block: don't allocate inline bvecs if this bioset needn't bvecs

2020-12-29 Thread Ming Lei
The inline bvecs won't be used if the user doesn't need bvecs (by not passing BIOSET_NEED_BVECS), so don't allocate inline bvecs in this situation. Signed-off-by: Ming Lei --- block/bio.c | 11 +++ include/linux/bio.h | 1 + 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/block

[PATCH] fs/buffer: try to submit writeback bio in unit of page

2020-12-29 Thread Ming Lei
Cc: Christoph Hellwig Cc: Jens Axboe Signed-off-by: Ming Lei --- fs/buffer.c | 112 +--- 1 file changed, 90 insertions(+), 22 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 32647d2011df..6bcf9ce5d7f8 100644 --- a/fs/buffer.c +++ b/fs/buffe

Re: [PATCH 1/3] blk-mq: allow hardware queue to get more tag while sharing a tag set

2020-12-28 Thread Ming Lei
On Mon, Dec 28, 2020 at 05:02:50PM +0800, yukuai (C) wrote: > Hi > > On 2020/12/28 16:28, Ming Lei wrote: > > Another candidate solution may be to always return true from > > hctx_may_queue() > > for this kind of queue because queue_depth has provided fair allocation

Re: [PATCH 1/3] blk-mq: allow hardware queue to get more tag while sharing a tag set

2020-12-28 Thread Ming Lei
On Mon, Dec 28, 2020 at 09:56:15AM +0800, yukuai (C) wrote: > Hi, > > On 2020/12/27 19:58, Ming Lei wrote: > > Hi Yu Kuai, > > > > On Sat, Dec 26, 2020 at 06:28:06PM +0800, Yu Kuai wrote: > > > When sharing a tag set, if most disks are issuing small amount of

Re: [PATCH 1/3] blk-mq: allow hardware queue to get more tag while sharing a tag set

2020-12-27 Thread Ming Lei
Hi Yu Kuai, On Sat, Dec 26, 2020 at 06:28:06PM +0800, Yu Kuai wrote: > When sharing a tag set, if most disks are issuing small amount of IO, and > only a few is issuing a large amount of IO. Current approach is to limit > the max amount of tags a disk can get equally to the average of total >

Re: [RFC PATCH v2 2/2] blk-mq: Lockout tagset iter when freeing rqs

2020-12-22 Thread Ming Lei
On Tue, Dec 22, 2020 at 11:22:19AM +, John Garry wrote: > Resend without p...@codeaurora.org, which bounces for me > > On 22/12/2020 02:13, Bart Van Assche wrote: > > On 12/21/20 10:47 AM, John Garry wrote: > >> Yes, I agree, and I'm not sure what I wrote to give that impression. > >> > >>

Re: [RFC PATCH v2 2/2] blk-mq: Lockout tagset iter when freeing rqs

2020-12-17 Thread Ming Lei
On Thu, Dec 17, 2020 at 07:07:53PM +0800, John Garry wrote: > References to old IO sched requests are currently cleared from the > tagset when freeing those requests; switching elevator or changing > request queue depth is such a scenario in which this occurs. > > However, this does not stop the

Re: [PATCH] blktrace: fix 'BUG: sleeping function called from invalid context' in case of PREEMPT_RT

2020-12-15 Thread Ming Lei
On Mon, Dec 14, 2020 at 10:24:22AM -0500, Steven Rostedt wrote: > On Mon, 14 Dec 2020 10:22:17 +0800 > Ming Lei wrote: > > > trace_note_tsk() is called by __blk_add_trace(), which is covered by RCU > > read lock. > > So in case of PREEMPT_RT, warning of 'BUG: sl

Re: [PATCH v1 0/6] no-copy bvec

2020-12-15 Thread Ming Lei
On Tue, Dec 15, 2020 at 11:14:20AM +, Pavel Begunkov wrote: > On 15/12/2020 01:41, Ming Lei wrote: > > On Tue, Dec 15, 2020 at 12:20:19AM +, Pavel Begunkov wrote: > >> Instead of creating a full copy of iter->bvec into bio in direct I/O, > >> the patchset

Re: [PATCH v1 0/6] no-copy bvec

2020-12-14 Thread Ming Lei
On Tue, Dec 15, 2020 at 12:20:19AM +, Pavel Begunkov wrote: > Instead of creating a full copy of iter->bvec into bio in direct I/O, > the patchset makes use of the one provided. It changes semantics and > obliges users of asynchronous kiocb to track bvec lifetime, and [1/6] > converts the only

[PATCH] blktrace: fix 'BUG: sleeping function called from invalid context' in case of PREEMPT_RT

2020-12-13 Thread Ming Lei
into raw_spin_lock(). Cc: Christoph Hellwig Cc: Steven Rostedt Cc: Ingo Molnar Cc: linux-kernel@vger.kernel.org Signed-off-by: Ming Lei --- kernel/trace/blktrace.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index

Re: KASAN: use-after-free Read in disk_part_iter_next

2020-12-13 Thread Ming Lei
On Fri, Dec 11, 2020 at 01:03:11PM -0800, syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit:15ac8fdb Add linux-next specific files for 20201207 > git tree: linux-next > console output: https://syzkaller.appspot.com/x/log.txt?x=15d8ad3750 > kernel

Re: [PATCH] x86/apic/vector: Fix ordering in vector assignment

2020-12-10 Thread Ming Lei
> return 0; > + > + if (node != NUMA_NO_NODE) { > + /* Try the node mask */ > + if (!assign_vector_locked(irqd, cpumask_of_node(node))) > + return 0; > + } > + > /* Try the full online mask */ > return assign_vector_locked(irqd, cpu_online_mask); > } > Reviewed-by: Ming Lei Thanks, Ming

Re: [RFC PATCH] blk-mq: Clean up references when freeing rqs

2020-12-10 Thread Ming Lei
On Thu, Dec 10, 2020 at 10:44:54AM +, John Garry wrote: > Hi Ming, > > On 10/12/2020 02:07, Ming Lei wrote: > > > Apart from this, my concern is that we come with for a solution, but it's > > > a > > > complicated solution and may not b

Re: [PATCH] blk-mq-tag: make blk_mq_tag_busy() return void

2020-12-09 Thread Ming Lei
> - return false; > + return; > > - return __blk_mq_tag_busy(hctx); > + __blk_mq_tag_busy(hctx); The above can be simplified as: if (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) __blk_mq_tag_busy(hctx); Otherwise, looks fine: Reviewed-by: Ming Lei Thanks, Ming

Re: [RFC PATCH] blk-mq: Clean up references when freeing rqs

2020-12-09 Thread Ming Lei
On Wed, Dec 09, 2020 at 09:55:30AM +, John Garry wrote: > On 09/12/2020 01:01, Ming Lei wrote: > > blk_mq_queue_tag_busy_iter() can be run on another request queue just > > between one driver tag is allocated and updating the request map, so one > > extra request reference

Re: [RFC PATCH] blk-mq: Clean up references when freeing rqs

2020-12-08 Thread Ming Lei
On Tue, Dec 08, 2020 at 11:36:58AM +, John Garry wrote: > On 03/12/2020 09:26, John Garry wrote: > > On 03/12/2020 00:55, Ming Lei wrote: > > > > Hi Ming, > > > > > > Yeah, so I said that was another problem which you mentioned > > > > t

Re: [PATCH V2 0/3] blk-mq/nvme-loop: use nvme-loop's lock class for addressing lockdep false positive warning

2020-12-07 Thread Ming Lei
On Thu, Dec 03, 2020 at 09:26:35AM +0800, Ming Lei wrote: > Hi, > > Qian reported there is hang during booting when shared host tagset is > introduced on megaraid sas. Sumit reported the whole SCSI probe takes > about ~45min in his test. > > Turns out it is caused by n

Re: [RFC PATCH] blk-mq: Clean up references when freeing rqs

2020-12-02 Thread Ming Lei
On Wed, Dec 02, 2020 at 11:18:31AM +, John Garry wrote: > On 02/12/2020 03:31, Ming Lei wrote: > > On Tue, Dec 01, 2020 at 09:02:18PM +0800, John Garry wrote: > > > It has been reported many times that a use-after-free can be > > > intermittently > > >

Re: [RFC PATCH] blk-mq: Clean up references when freeing rqs

2020-12-01 Thread Ming Lei
On Tue, Dec 01, 2020 at 09:02:18PM +0800, John Garry wrote: > It has been reported many times that a use-after-free can be intermittently > found when iterating busy requests: > > - > https://lore.kernel.org/linux-block/8376443a-ec1b-0cef-8244-ed584b96f...@huawei.com/ > - >

Re: [PATCH v2] blk-mq: Remove 'running from the wrong CPU' warning

2020-11-30 Thread Ming Lei
and the request is still processed > correctly, better remove the warning as this is the fast path. > > Suggested-by: Ming Lei > Signed-off-by: Daniel Wagner > --- > > v2: > - remove the warning as suggested by Ming > v1: > - initial version > > https:/

Re: [PATCH] blk-mq: Make running from the wrong CPU less scary

2020-11-26 Thread Ming Lei
On Thu, Nov 26, 2020 at 10:51:52AM +0100, Daniel Wagner wrote: > The current warning looks awfully like a proper crash. This is > confusing. There is not much information to be gained from the stack > trace anyway, let's drop it. > > While at it print the cpumask as there might be additional helpful >

Re: [PATCH v2 2/4] sbitmap: remove swap_lock

2020-11-26 Thread Ming Lei
On Thu, Nov 26, 2020 at 01:44:36PM +, Pavel Begunkov wrote: > On 26/11/2020 02:46, Ming Lei wrote: > > On Sun, Nov 22, 2020 at 03:35:46PM +, Pavel Begunkov wrote: > >> map->swap_lock protects map->cleared from concurrent modification, > >> however sb

Re: [PATCH v2 2/4] sbitmap: remove swap_lock

2020-11-25 Thread Ming Lei
On Sun, Nov 22, 2020 at 03:35:46PM +, Pavel Begunkov wrote: > map->swap_lock protects map->cleared from concurrent modification, > however sbitmap_deferred_clear() already drains it atomically, so > it's guaranteed not to lose bits on concurrent > sbitmap_deferred_clear(). > > A one

Re: [PATCH 5.11] block: optimise for_each_bvec() advance

2020-11-24 Thread Ming Lei
_advance((bio_vec), &(iter), \ > - (bvl).bv_len) : bvec_iter_skip_zero_bvec(&(iter))) > + bvec_iter_advance_single((bio_vec), &(iter), (bvl).bv_len)) > > /* for iterating one bio from start to end */ > #define BVEC_ITER_ALL_INIT (struct bvec_iter) > \ > -- > 2.24.0 > Looks fine, Reviewed-by: Ming Lei Thanks, Ming

Re: [PATCH v2 1/2] iov_iter: optimise iov_iter_npages for bvec

2020-11-19 Thread Ming Lei
On Fri, Nov 20, 2020 at 02:06:10AM +, Matthew Wilcox wrote: > On Fri, Nov 20, 2020 at 01:56:22AM +, Pavel Begunkov wrote: > > On 20/11/2020 01:49, Matthew Wilcox wrote: > > > On Fri, Nov 20, 2020 at 01:39:05AM +, Pavel Begunkov wrote: > > >> On 20/11/2020 01:20, Matthew Wilcox wrote: >

Re: [PATCH v2 1/2] iov_iter: optimise iov_iter_npages for bvec

2020-11-19 Thread Ming Lei
On Fri, Nov 20, 2020 at 01:39:05AM +, Pavel Begunkov wrote: > On 20/11/2020 01:20, Matthew Wilcox wrote: > > On Thu, Nov 19, 2020 at 11:24:38PM +, Pavel Begunkov wrote: > >> The block layer spends quite a while in iov_iter_npages(), but for the > >> bvec case the number of pages is already
