On Thu, Jan 21, 2021 at 09:58:03AM +0900, Changheun Lee wrote:
> bio size can grow up to 4GB when multi-page bvec is enabled,
> but sometimes this leads to inefficient behavior.
> In the case of large-chunk direct I/O - a 32MB chunk read in user space -
> all pages for the 32MB would be merged into a single bio
On Thu, Jan 14, 2021 at 11:24:35AM -0600, Brian King wrote:
> On 1/13/21 7:27 PM, Ming Lei wrote:
> > On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
> >> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> >>> On 1/12/21 2:54 PM, Brian King wrote:
> >>
On Wed, Jan 13, 2021 at 12:02:44PM +, Damien Le Moal wrote:
> On 2021/01/13 20:48, Ming Lei wrote:
> > On Wed, Jan 13, 2021 at 11:16:11AM +, Damien Le Moal wrote:
> >> On 2021/01/13 19:25, Ming Lei wrote:
> >>> On Wed, Jan 13, 2021 at 09:28:02AM +, Damien
On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> > On 1/12/21 2:54 PM, Brian King wrote:
> >> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> >>> Introduce several new vhost fields for managing MQ state of the adapter
> >>> as well as initial
On Wed, Jan 13, 2021 at 11:16:11AM +, Damien Le Moal wrote:
> On 2021/01/13 19:25, Ming Lei wrote:
> > On Wed, Jan 13, 2021 at 09:28:02AM +, Damien Le Moal wrote:
> >> On 2021/01/13 18:19, Ming Lei wrote:
> >>> On Wed, Jan 13, 2021 at 12:09
On Wed, Jan 13, 2021 at 09:28:02AM +, Damien Le Moal wrote:
> On 2021/01/13 18:19, Ming Lei wrote:
> > On Wed, Jan 13, 2021 at 12:09 PM Changheun Lee
> > wrote:
> >>
> >>> On 2021/01/12 21:14, Changheun Lee wrote:
> >>>>> On 2021/01/12
So what is the actual total
> > latency
> > difference for the entire 32MB user IO ? That is I think what needs to be
> > compared here.
> >
> > Also, what is your device max_sectors_kb and max queue depth ?
> >
>
> 32MB total latency is about 19ms including merge time without this patch.
> But with this patch, total latency is about 17ms including merge time too.
19ms looks too big just for preparing one 32MB sized bio, which isn't supposed
to take so long. Can you investigate where the 19ms is taken just for preparing
one 32MB sized bio?
It might be iov_iter_get_pages() handling page faults. If yes, one suggestion
is to enable THP (Transparent HugePage Support) in your application.
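A minimal user-space sketch of that suggestion (mine, not from the thread):
align the direct-I/O buffer to 2MB and madvise() it, so the faults populate
huge pages instead of 4KB pages:

#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>

#define CHUNK_SIZE	(32UL << 20)	/* the 32MB chunk discussed above */

/* Allocate a 2MB-aligned buffer and ask for THP backing, so that
 * iov_iter_get_pages() faults in 2MB steps instead of 4KB ones. */
static void *alloc_dio_buffer(void)
{
	void *buf;

	if (posix_memalign(&buf, 2UL << 20, CHUNK_SIZE))
		return NULL;
	madvise(buf, CHUNK_SIZE, MADV_HUGEPAGE);	/* best effort */
	return buf;
}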
--
Ming Lei
On Sun, Jan 10, 2021 at 10:32:47PM +0800, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed a -18.4% regression of reaim.jobs_per_min due to commit:
>
>
> commit: 2b0d3d3e4fcfb19d10f9a82910b8f0f05c56ee3e ("percpu_ref: reduce memory
> footprint of percpu_ref in fast path")
>
g(bio, BIO_WORKINGSET);
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index d8f9077c43ef..1d30572a8c53 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -444,10 +444,13 @@ static inline void bio_wouldblock_error(struct bio *bio)
>
> /*
> * Calculate number of bvec segments that should be allocated to fit data
> - * pointed by @iter.
> + * pointed by @iter. If @iter is backed by bvec it's going to be reused
> + * instead of allocating a new one.
> */
> static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs)
> {
> + if (iov_iter_is_bvec(iter))
> + return 0;
> return iov_iter_npages(iter, max_segs);
> }
>
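For context, a hypothetical caller sketch (names assumed, not the upstream
caller): the helper sizes the inline bvec allocation, and 0 is a valid answer
for a bvec-backed iter since iter->bvec is reused rather than copied:

/* Size a new bio's inline vectors from the iterator. */
static struct bio *dio_bio_alloc(struct iov_iter *iter, gfp_t gfp)
{
	int nr_vecs = bio_iov_vecs_to_alloc(iter, BIO_MAX_PAGES);

	/* nr_vecs == 0 for a bvec-backed iter: no inline vecs needed */
	return bio_alloc(gfp, nr_vecs);
}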
> --
> 2.24.0
>
Reviewed-by: Ming Lei
--
Ming
lk_types.h */
> #include <linux/blk_types.h>
> +#include <linux/uio.h>
>
> #define BIO_DEBUG
>
> @@ -441,6 +442,15 @@ static inline void bio_wouldblock_error(struct bio *bio)
> bio_endio(bio);
> }
>
> +/*
> + * Calculate number of bvec segments that should be allocated to fit data
> + * pointed by @iter.
> + */
> +static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs)
> +{
> + return iov_iter_npages(iter, max_segs);
> +}
> +
> struct request_queue;
>
> extern int submit_bio_wait(struct bio *bio);
> --
> 2.24.0
>
Reviewed-by: Ming Lei
--
Ming
if (iov_iter_is_bvec(i)) {
> + iov_iter_bvec_advance(i, size);
> + return;
> + }
> iterate_and_advance(i, size, v, 0, 0, 0)
> }
> EXPORT_SYMBOL(iov_iter_advance);
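What the fast path buys: advancing a bvec-backed iterator is a plain walk over
the bio_vec array, with no per-page iteration. A user-space model with
simplified stand-in types (not the kernel's structs):

#include <stddef.h>

struct bio_vec_model { size_t bv_len; };

struct bvec_iter_model {
	const struct bio_vec_model *bvec;	/* current segment */
	size_t nr_segs;
	size_t iov_offset;	/* offset inside the current segment */
	size_t count;		/* bytes left in the iterator */
};

static void bvec_advance(struct bvec_iter_model *i, size_t size)
{
	i->count -= size;
	size += i->iov_offset;
	while (i->nr_segs && size >= i->bvec->bv_len) {
		size -= i->bvec->bv_len;
		i->bvec++;
		i->nr_segs--;
	}
	i->iov_offset = size;
}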
> --
> 2.24.0
>
Reviewed-by: Ming Lei
--
Ming
iov_iter_bvec(&iter, is_write, aio_cmd->bvecs, sgl_nents, len);
>
> aio_cmd->cmd = cmd;
> aio_cmd->len = len;
> @@ -307,8 +301,6 @@ fd_execute_rw_aio(struct se_cmd *cmd, struct scatterlist
> *sgl, u32 sgl_nents,
> else
> ret = call_read_iter(file, &aio_cmd->iocb, &iter);
>
> - kfree(bvec);
> -
> if (ret != -EIOCBQUEUED)
> cmd_rw_aio_complete(&aio_cmd->iocb, ret, 0);
>
> --
> 2.24.0
>
Reviewed-by: Ming Lei
--
Ming
fs/direct-io.c
> @@ -426,6 +426,8 @@ static inline void dio_bio_submit(struct dio *dio, struct
> dio_submit *sdio)
> unsigned long flags;
>
> bio->bi_private = dio;
> + /* don't account direct I/O as memory stall */
> + bio_clear_flag(bio, BIO_WORKINGSET);
>
> spin_lock_irqsave(&dio->bio_lock, flags);
> dio->refcount++;
> --
> 2.24.0
>
Reviewed-by: Ming Lei
--
Ming
continue; \
> (void)(STEP); \
> } \
> }
> --
> 2.24.0
>
Reviewed-by: Ming Lei
--
Ming
ret = pipe_buf_confirm(pipe, buf);
> if (unlikely(ret)) {
> @@ -680,6 +682,7 @@ iter_file_splice_write(struct pipe_inode_info *pipe,
> struct file *out,
> array[n].bv_len = this_len;
> array[n].bv_offset = buf->offset;
left -= this_len;
On Thu, Jan 07, 2021 at 09:21:11AM +1100, Dave Chinner wrote:
> On Wed, Jan 06, 2021 at 04:45:48PM +0800, Ming Lei wrote:
> > On Tue, Jan 05, 2021 at 07:39:38PM +0100, Christoph Hellwig wrote:
> > > At least for iomap I think this is the wrong approach. Betwe
On Tue, Jan 05, 2021 at 07:39:38PM +0100, Christoph Hellwig wrote:
> At least for iomap I think this is the wrong approach. Between the
> iomap and writeback_control we know the maximum size of the writeback
> request and can just use that.
I think writeback_control can tell us nothing about max bi_vcnt.
Cc: Alexander Viro
Cc: Darrick J. Wong
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Signed-off-by: Ming Lei
---
fs/block_dev.c | 1 +
fs/iomap/buffered-io.c | 13 +
include/linux/bio.h | 2 --
include/linux/blk
On Mon, Jan 04, 2021 at 09:44:15AM +0100, Christoph Hellwig wrote:
> On Wed, Dec 30, 2020 at 08:08:15AM +0800, Ming Lei wrote:
> > It is observed that __block_write_full_page() always submits a bio with the
> > size of the block size,
> > which is often 512 bytes.
> >
> >
Manage the bio slab cache via an xarray, using the slab cache size as the
xarray index and storing the 'struct bio_slab' instance in the xarray.
This simplifies the code a lot, and the result is more readable than before.
Signed-off-by: Ming Lei
---
block/bio.c | 104
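The lookup pattern is roughly the following; a sketch close in spirit to the
patch, where create_bio_slab() stands in for the kmem_cache_create() path:

/* bio slab caches, keyed by slab size */
static DEFINE_XARRAY(bio_slabs);

static struct bio_slab *bio_find_or_create_slab(unsigned int size)
{
	struct bio_slab *bslab = xa_load(&bio_slabs, size);

	if (bslab)
		return bslab;

	bslab = create_bio_slab(size);
	if (bslab && xa_insert(&bio_slabs, size, bslab, GFP_KERNEL))
		bslab = NULL;	/* lost a race or OOM */
	return bslab;
}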
bvec_alloc(), bvec_free() and bvec_nr_vecs() are only used inside block
layer core functions, so there is no need to declare them in a public header.
Signed-off-by: Ming Lei
---
block/blk.h | 4
include/linux/bio.h | 3 ---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/block
This bioset is only used by bio_next_split for allocating bios, and it
doesn't need bvecs, so remove the flag.
Cc: linux-bca...@vger.kernel.org
Cc: Coly Li
Signed-off-by: Ming Lei
---
drivers/md/bcache/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/bcache
bvec_alloc() may allocate more bio vectors than requested, so set .bi_max_vecs
to the actual number of allocated vectors instead of the requested number. This
helps filesystems build bigger bios, because a new bio often won't be allocated
until the current one becomes full.
Signed-off-by: Ming Lei
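The rounding can be modeled like this (a user-space sketch; the sizes mirror
the kernel's bvec_slabs[] table of that era, and the helper name is made up):

/* An allocation request is rounded up to the next slab size; the
 * rounded-up value is what becomes .bi_max_vecs. */
static const unsigned short bvec_slab_sizes[] = { 1, 4, 16, 64, 128, 256 };

static unsigned short bvec_capacity(unsigned short nr_requested)
{
	unsigned int i;

	for (i = 0; i < sizeof(bvec_slab_sizes) / sizeof(bvec_slab_sizes[0]); i++)
		if (nr_requested <= bvec_slab_sizes[i])
			return bvec_slab_sizes[i];
	return 0;	/* larger than the biggest slab */
}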
q->bio_split is only used by bio_split() for fast bio cloning, and there is
no need to allocate bvecs, so remove this flag.
Signed-off-by: Ming Lei
---
block/blk-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 96e5fcd7f
Hello,
All are bioset/bvec improvements, and most of them are quite
straightforward.
Ming Lei (6):
block: manage bio slab cache by xarray
block: don't pass BIOSET_NEED_BVECS for q->bio_split
block: don't allocate inline bvecs if this bioset needn't bvecs
block: set .bi_max_v
The inline bvecs won't be used if the user doesn't need bvecs (by not passing
BIOSET_NEED_BVECS), so don't allocate them in this situation.
Signed-off-by: Ming Lei
---
block/bio.c | 11 +++
include/linux/bio.h | 1 +
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/block
Cc: Christoph Hellwig
Cc: Jens Axboe
Signed-off-by: Ming Lei
---
fs/buffer.c | 112 +---
1 file changed, 90 insertions(+), 22 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 32647d2011df..6bcf9ce5d7f8 100644
--- a/fs/buffer.c
+++ b/fs/buffe
On Mon, Dec 28, 2020 at 05:02:50PM +0800, yukuai (C) wrote:
> Hi
>
> On 2020/12/28 16:28, Ming Lei wrote:
> > Another candidate solution may be to always return true from
> > hctx_may_queue()
> > for this kind of queue because queue_depth has provided fair allocation
On Mon, Dec 28, 2020 at 09:56:15AM +0800, yukuai (C) wrote:
> Hi,
>
> On 2020/12/27 19:58, Ming Lei wrote:
> > Hi Yu Kuai,
> >
> > On Sat, Dec 26, 2020 at 06:28:06PM +0800, Yu Kuai wrote:
> > > When sharing a tag set, if most disks are issuing a small amount of
Hi Yu Kuai,
On Sat, Dec 26, 2020 at 06:28:06PM +0800, Yu Kuai wrote:
> When sharing a tag set, if most disks are issuing a small amount of IO and
> only a few are issuing a large amount, the current approach is to limit the
> max number of tags a disk can get equally to the average of the total
>
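The cap being discussed works roughly like this; a model of the shared-tag
depth calculation, not the exact hctx_may_queue() code:

/* Each active queue gets about total_depth / active_users tags,
 * with a small floor so lightly loaded queues aren't starved out. */
static unsigned int shared_tag_depth(unsigned int total_depth,
				     unsigned int active_users)
{
	unsigned int depth;

	if (!active_users)
		return total_depth;
	depth = (total_depth + active_users - 1) / active_users;
	return depth < 4 ? 4 : depth;
}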
On Tue, Dec 22, 2020 at 11:22:19AM +, John Garry wrote:
> Resend without p...@codeaurora.org, which bounces for me
>
> On 22/12/2020 02:13, Bart Van Assche wrote:
> > On 12/21/20 10:47 AM, John Garry wrote:
> >> Yes, I agree, and I'm not sure what I wrote to give that impression.
> >>
> >>
On Thu, Dec 17, 2020 at 07:07:53PM +0800, John Garry wrote:
> References to old IO sched requests are currently cleared from the
> tagset when freeing those requests; switching elevator or changing
> request queue depth is such a scenario in which this occurs.
>
> However, this does not stop the
On Mon, Dec 14, 2020 at 10:24:22AM -0500, Steven Rostedt wrote:
> On Mon, 14 Dec 2020 10:22:17 +0800
> Ming Lei wrote:
>
> > trace_note_tsk() is called by __blk_add_trace(), which is covered by RCU
> > read lock.
> > So in case of PREEMPT_RT, warning of 'BUG: sl
On Tue, Dec 15, 2020 at 11:14:20AM +, Pavel Begunkov wrote:
> On 15/12/2020 01:41, Ming Lei wrote:
> > On Tue, Dec 15, 2020 at 12:20:19AM +, Pavel Begunkov wrote:
> >> Instead of creating a full copy of iter->bvec into bio in direct I/O,
> >> the patchset
On Tue, Dec 15, 2020 at 12:20:19AM +, Pavel Begunkov wrote:
> Instead of creating a full copy of iter->bvec into bio in direct I/O,
> the patchset makes use of the one provided. It changes semantics and
> obliges users of asynchronous kiocb to track bvec lifetime, and [1/6]
> converts the only
into raw_spin_lock().
Cc: Christoph Hellwig
Cc: Steven Rostedt
Cc: Ingo Molnar
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ming Lei
---
kernel/trace/blktrace.c | 14 +++---
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index
On Fri, Dec 11, 2020 at 01:03:11PM -0800, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 15ac8fdb Add linux-next specific files for 20201207
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=15d8ad3750
> kernel
> return 0;
> +
> + if (node != NUMA_NO_NODE) {
> + /* Try the node mask */
> + if (!assign_vector_locked(irqd, cpumask_of_node(node)))
> + return 0;
> + }
> +
> /* Try the full online mask */
> return assign_vector_locked(irqd, cpu_online_mask);
> }
>
Reviewed-by: Ming Lei
Thanks,
Ming
On Thu, Dec 10, 2020 at 10:44:54AM +, John Garry wrote:
> Hi Ming,
>
> On 10/12/2020 02:07, Ming Lei wrote:
> > > Apart from this, my concern is that we come with for a solution, but it's
> > > a
> > > complicated solution and may not b
> - return false;
> + return;
>
> - return __blk_mq_tag_busy(hctx);
> + __blk_mq_tag_busy(hctx);
The above can be simplified as:
	if (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
		__blk_mq_tag_busy(hctx);
Otherwise, looks fine:
Reviewed-by: Ming Lei
Thanks,
Ming
On Wed, Dec 09, 2020 at 09:55:30AM +, John Garry wrote:
> On 09/12/2020 01:01, Ming Lei wrote:
> > blk_mq_queue_tag_busy_iter() can be run on another request queue just
> > between one driver tag is allocated and updating the request map, so one
> > extra request reference
On Tue, Dec 08, 2020 at 11:36:58AM +, John Garry wrote:
> On 03/12/2020 09:26, John Garry wrote:
> > On 03/12/2020 00:55, Ming Lei wrote:
> >
> > Hi Ming,
> >
> > > > Yeah, so I said that was another problem which you mentioned
> > > > t
On Thu, Dec 03, 2020 at 09:26:35AM +0800, Ming Lei wrote:
> Hi,
>
> Qian reported there is hang during booting when shared host tagset is
> introduced on megaraid sas. Sumit reported the whole SCSI probe takes
> about 45min in his test.
>
> Turns out it is caused by n
On Wed, Dec 02, 2020 at 11:18:31AM +, John Garry wrote:
> On 02/12/2020 03:31, Ming Lei wrote:
> > On Tue, Dec 01, 2020 at 09:02:18PM +0800, John Garry wrote:
> > > It has been reported many times that a use-after-free can be
> > > intermittently
> > >
On Tue, Dec 01, 2020 at 09:02:18PM +0800, John Garry wrote:
> It has been reported many times that a use-after-free can be intermittently
> found when iterating busy requests:
>
> -
> https://lore.kernel.org/linux-block/8376443a-ec1b-0cef-8244-ed584b96f...@huawei.com/
> -
>
and the request is still processed
> correctly, better remove the warning as this is the fast path.
>
> Suggested-by: Ming Lei
> Signed-off-by: Daniel Wagner
> ---
>
> v2:
> - remove the warning as suggested by Ming
> v1:
> - initial version
>
> https:/
On Thu, Nov 26, 2020 at 10:51:52AM +0100, Daniel Wagner wrote:
> The current warning looks awfully like a proper crash. This is
> confusing. There is not much information to be gained from the stack
> trace anyway, so let's drop it.
>
> While at it, print the cpumask as there might be additional helpful
>
On Thu, Nov 26, 2020 at 01:44:36PM +, Pavel Begunkov wrote:
> On 26/11/2020 02:46, Ming Lei wrote:
> > On Sun, Nov 22, 2020 at 03:35:46PM +, Pavel Begunkov wrote:
> >> map->swap_lock protects map->cleared from concurrent modification,
> >> however sb
On Sun, Nov 22, 2020 at 03:35:46PM +, Pavel Begunkov wrote:
> map->swap_lock protects map->cleared from concurrent modification;
> however, sbitmap_deferred_clear() already atomically drains it, so it's
> guaranteed not to lose bits on concurrent
> sbitmap_deferred_clear().
>
> A one
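The atomic drain being relied on can be modeled in user space like this
(a sketch of the xchg-based logic, not the kernel code):

#include <stdatomic.h>

/* Take the cleared bits with an atomic exchange, so two concurrent
 * drains can never both observe the same bits, then return them to
 * the free mask. */
static unsigned long drain_cleared(_Atomic unsigned long *cleared,
				   _Atomic unsigned long *word)
{
	unsigned long mask = atomic_exchange(cleared, 0);

	atomic_fetch_and(word, ~mask);
	return mask;
}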
_advance((bio_vec), &(iter), \
> - (bvl).bv_len) : bvec_iter_skip_zero_bvec(&(iter)))
> + bvec_iter_advance_single((bio_vec), &(iter), (bvl).bv_len))
>
> /* for iterating one bio from start to end */
> #define BVEC_ITER_ALL_INIT (struct bvec_iter)
> \
> --
> 2.24.0
>
Looks fine,
Reviewed-by: Ming Lei
Thanks,
Ming
On Fri, Nov 20, 2020 at 02:06:10AM +, Matthew Wilcox wrote:
> On Fri, Nov 20, 2020 at 01:56:22AM +, Pavel Begunkov wrote:
> > On 20/11/2020 01:49, Matthew Wilcox wrote:
> > > On Fri, Nov 20, 2020 at 01:39:05AM +, Pavel Begunkov wrote:
> > >> On 20/11/2020 01:20, Matthew Wilcox wrote:
>
On Fri, Nov 20, 2020 at 01:39:05AM +, Pavel Begunkov wrote:
> On 20/11/2020 01:20, Matthew Wilcox wrote:
> > On Thu, Nov 19, 2020 at 11:24:38PM +, Pavel Begunkov wrote:
> >> The block layer spends quite a while in iov_iter_npages(), but for the
> >> bvec case the number of pages is already
On Fri, Nov 13, 2020 at 01:36:16PM -0800, Sagi Grimberg wrote:
>
> > > But if you think this has a better home, I'm assuming that the guys
> > > will be open to that.
> >
> > Also see the reply from Ming. It's a balancing act - don't want to add
> > extra overhead to the core, but also don't
Hello,
On Thu, Nov 12, 2020 at 09:07:52AM -0500, Rachit Agarwal wrote:
> From: Rachit Agarwal
>
> Hi All,
>
> I/O batching is beneficial for optimizing IOPS and throughput for various
> applications. For instance, several kernel block drivers would benefit from
> batching,
> including mmc
On Wed, Nov 11, 2020 at 09:42:17AM -0500, Qian Cai wrote:
> On Wed, 2020-11-11 at 17:27 +0800, Ming Lei wrote:
> > Can this issue disappear by applying the following change?
>
> This makes the system boot again as well.
OK, actually it isn't necessary to register one new lock ke
On Wed, Nov 11, 2020 at 12:57:59PM +0530, Sumit Saxena wrote:
> On Tue, Nov 10, 2020 at 11:12 PM John Garry wrote:
> >
> > On 09/11/2020 14:05, John Garry wrote:
> > > On 09/11/2020 13:39, Qian Cai wrote:
> > >>> I suppose I could try do this myself also, but an authentic version
> > >>> would be
GCC: gcc version 10.2.1 20200826 (Red Hat 10.2.1-3) (GCC)
--
Ming Lei
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15bf75b850
I can't reproduce this issue with the above C reproducer and the kernel config
after hours of running on the linus tree.
Thanks,
Ming Lei
+ if (queue_is_mq(q)) {
> + struct blk_mq_hw_ctx *hctx;
> + int i;
> +
> cancel_delayed_work_sync(&q->requeue_work);
>
> + queue_for_each_hw_ctx(q, hctx, i)
> + cancel_delayed_work_sync(&hctx->run_work);
> + }
> +
Looks fine:
Reviewed-by: Ming Lei
Thanks,
Ming
On Thu, Oct 08, 2020 at 07:23:02PM -0600, Jens Axboe wrote:
> On 10/8/20 2:28 PM, syzbot wrote:
> > syzbot has bisected this issue to:
> >
> > commit 2b0d3d3e4fcfb19d10f9a82910b8f0f05c56ee3e
> > Author: Ming Lei
> > Date: Thu Oct 1 15:48:41 2020 +
>
On Thu, Oct 01, 2020 at 11:48:40PM +0800, Ming Lei wrote:
> Hi,
>
> The 1st patch reduces the memory footprint of percpu_ref in the fast path
> from 7 words to 2 words, since it is often used in the fast path and
> embedded in user structs.
>
> The 2nd patch moves .q_usage_cou
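For reference, the fast-path layout the 1st patch arrives at looks roughly
like this (simplified, not the literal code): only two words stay embedded in
the user structure, and the cold fields move out of line:

struct percpu_ref_data;	/* atomic count, release callback, flags, ... */

struct percpu_ref {
	/* percpu counter pointer, low bits doubling as state flags */
	unsigned long percpu_count_ptr;
	struct percpu_ref_data *data;
};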
Cc: Christoph Hellwig
Cc: Jens Axboe
Cc: Bart Van Assche
Signed-off-by: Ming Lei
---
include/linux/blkdev.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d5a3e1a4c2f7..67935b3bef6c 100644
--- a/include/linux/blkdev.h
V2:
- pass 'gfp' to kzalloc() for fixing block/027 failure reported by
kernel test robot
- protect percpu_ref_is_zero() against concurrent percpu-refcount destruction
  with a spin lock
Ming Lei (2):
percpu_ref: reduce memory footprint of percpu_ref in fast path
block: m
Signed-off-by: Ming Lei
---
drivers/infiniband/sw/rdmavt/mr.c | 2 +-
include/linux/percpu-refcount.h | 52 ++--
lib/percpu-refcount.c | 131 ++
3 files changed, 123 insertions(+), 62 deletions(-)
diff --git a/drivers/infiniband/sw/rdmavt/mr.c
b/drivers
On Wed, Sep 30, 2020 at 12:00:15PM -0400, Tejun Heo wrote:
> On Wed, Sep 30, 2020 at 04:26:56PM +0800, Ming Lei wrote:
> > diff --git a/include/linux/percpu-refcount.h
> > b/include/linux/percpu-refcount.h
> > index 87d8a38bdea1..1d6ed9ca23dd 100644
> > --- a/incl
Signed-off-by: Ming Lei
---
drivers/infiniband/sw/rdmavt/mr.c | 2 +-
include/linux/percpu-refcount.h | 45 --
lib/percpu-refcount.c | 131 ++
3 files changed, 116 insertions(+), 62 deletions(-)
diff --git a/drivers/infiniband/sw/rdmavt/mr.c
b/drivers
Cc: Christoph Hellwig
Cc: Jens Axboe
Cc: Bart Van Assche
Signed-off-by: Ming Lei
---
include/linux/blkdev.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d5a3e1a4c2f7..67935b3bef6c 100644
--- a/include/linux/blkdev.h
- pass 'gfp' to kzalloc() for fixing block/027 failure reported by
kernel test robot
- protect percpu_ref_is_zero() against concurrent percpu-refcount destruction
  with a spin lock
Ming Lei (2):
percpu_ref: reduce memory footprint of percpu_ref in fast path
block: move 'q_usage_counter' into front of 'request_queue'
drivers/infinib
Cc: Christoph Hellwig
Cc: Jens Axboe
Cc: Bart Van Assche
Signed-off-by: Ming Lei
---
include/linux/blkdev.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d5a3e1a4c2f7..67935b3bef6c 100644
--- a/include/linux/blkdev.h
Signed-off-by: Ming Lei
---
drivers/infiniband/sw/rdmavt/mr.c | 2 +-
include/linux/percpu-refcount.h | 45 --
lib/percpu-refcount.c | 131 ++
3 files changed, 116 insertions(+), 62 deletions(-)
diff --git a/drivers/infiniband/sw/rdmavt/mr.c
b/drivers
...@vger.kernel.org
Cc: Sagi Grimberg
Cc: Tejun Heo
Cc: Christoph Hellwig
Cc: Jens Axboe
Cc: Bart Van Assche
Signed-off-by: Ming Lei
---
drivers/md/md.c | 2 +-
include/linux/percpu-refcount.h | 1 +
lib/percpu-refcount.c | 6 ++
3 files changed, 8 insertions(+), 1
- protect percpu_ref_is_zero() against concurrent percpu-refcount destruction
  with a spin lock
Ming Lei (3):
percpu_ref: add percpu_ref_is_initialized for MD
percpu_ref: reduce memory footprint of percpu_ref in fast path
block: move 'q_usage_counter' into front of 'request_queue'
drivers/infiniband/sw/rdmavt/mr.c | 2 +-
drivers/md/md.c | 2 +-
incl
m_cache *cachep)
> @@ -3402,9 +3406,9 @@ static void cache_flusharray(struct kmem_cache *cachep,
> struct array_cache *ac)
> }
> #endif
> spin_unlock(&n->list_lock);
> - slabs_destroy(cachep, &list);
> ac->avail -= batchcount;
> memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void *)*ac->avail);
> + slabs_destroy(cachep, &list);
> }
The issue can't be reproduced after applying this patch:
Tested-by: Ming Lei
Thanks,
Ming
On Fri, Sep 25, 2020 at 03:31:45PM +0800, Ming Lei wrote:
> On Thu, Sep 24, 2020 at 09:13:11PM -0400, Theodore Y. Ts'o wrote:
> > On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> > > HOWEVER, thanks to a hint from a colleague at $WORK, and realizing
> >
On Thu, Sep 24, 2020 at 09:13:11PM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> > HOWEVER, thanks to a hint from a colleague at $WORK, and realizing
> > that one of the stack traces had virtio balloon in the trace, I
> > realized that when I
On Fri, Sep 25, 2020 at 09:14:16AM +0800, Ming Lei wrote:
> On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> > On Thu, Sep 24, 2020 at 08:59:01AM +0800, Ming Lei wrote:
> > >
> > > The list corruption issue can be reproduced on kvm/qumu guest too
On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 24, 2020 at 08:59:01AM +0800, Ming Lei wrote:
> >
> > The list corruption issue can be reproduced on kvm/qumu guest too when
> > running xfstests(ext4) generic/038.
> >
> &
On Thu, Sep 17, 2020 at 10:30:12AM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 17, 2020 at 10:20:51AM +0800, Ming Lei wrote:
> >
> > Obviously there is other more serious issue, since 568f27006577 is
> > completely reverted in your test, and you still see list corruption
&
On Thu, Sep 17, 2020 at 09:04:55AM +0100, Christoph Hellwig wrote:
> On Wed, Sep 16, 2020 at 09:07:14AM -0400, Brian Foster wrote:
> > Dave described the main purpose earlier in this thread [1]. The initial
> > motivation is that we've had downstream reports of soft lockup problems
> > in
On Thu, Sep 17, 2020 at 10:30:12AM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 17, 2020 at 10:20:51AM +0800, Ming Lei wrote:
> >
> > Obviously there is other more serious issue, since 568f27006577 is
> > completely reverted in your test, and you still see list corruption
&
On Wed, Sep 16, 2020 at 04:20:26PM -0400, Theodore Y. Ts'o wrote:
> On Wed, Sep 16, 2020 at 07:09:41AM +0800, Ming Lei wrote:
> > > The problem is it's a bit tricky to revert 568f27006577, since there
> > > is a merge conflict in blk_kick_flush(). I attempted to do the bisec
On Tue, Sep 15, 2020 at 06:45:41PM -0400, Theodore Y. Ts'o wrote:
> On Tue, Sep 15, 2020 at 03:33:03PM +0800, Ming Lei wrote:
> > Hi Theodore,
> >
> > On Tue, Sep 15, 2020 at 12:45:19AM -0400, Theodore Y. Ts'o wrote:
> > > On Thu, Sep 03, 2020 at 11:55:28PM
Hi Theodore,
On Tue, Sep 15, 2020 at 12:45:19AM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 03, 2020 at 11:55:28PM -0400, Theodore Y. Ts'o wrote:
> > Worse, right now, -rc1 and -rc2 is causing random crashes in my
> > gce-xfstests framework. Sometimes it happens before we've run even a
> >
Cc: Christoph Hellwig
Cc: Jens Axboe
Cc: Bart Van Assche
Signed-off-by: Ming Lei
---
include/linux/blkdev.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7d82959e7b86..7b1e53084799 100644
--- a/include/linux/blkdev.h
...@vger.kernel.org
Cc: Sagi Grimberg
Cc: Tejun Heo
Cc: Christoph Hellwig
Cc: Jens Axboe
Cc: Bart Van Assche
Signed-off-by: Ming Lei
---
drivers/md/md.c | 2 +-
include/linux/percpu-refcount.h | 1 +
lib/percpu-refcount.c | 6 ++
3 files changed, 8 insertions(+), 1
Tested-by: Veronika Kabatova
V2:
- pass 'gfp' to kzalloc() for fixing block/027 failure reported by
kernel test robot
- protect percpu_ref_is_zero() against concurrent percpu-refcount destruction
  with a spin lock
Ming Lei (3):
percpu_ref: add percpu_ref_is_initialized for MD
percpu_ref: reduce
Signed-off-by: Ming Lei
---
drivers/infiniband/sw/rdmavt/mr.c | 2 +-
include/linux/percpu-refcount.h | 45 --
lib/percpu-refcount.c | 131 ++
3 files changed, 116 insertions(+), 62 deletions(-)
diff --git a/drivers/infiniband/sw/rdmavt/mr.c
b/drivers
*same_page = false;
> return false;
> + }
> bv->bv_len += len;
> bio->bi_iter.bi_size += len;
> return true;
Reviewed-by: Ming Lei
--
Ming Lei
Hello Haifeng,
On Wed, Sep 09, 2020 at 02:11:20AM +, Zhao, Haifeng wrote:
> Ming, Christoph,
> Could you point out the patch aimed at fixing this issue? I would like to
> try it. This issue has blocked my other PCI patch development and
> verification work;
> I am not a BLOCK/NVMe expert,
ueue *q)
>
> kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
> kobject_del(q->mq_kobj);
> +out_kobj:
> kobject_put(&dev->kobj);
> return ret;
> }
> --
> 2.28.0
>
Looks like a good fix:
Reviewed-by: Ming Lei
--
Ming
Signed-off-by: Ming Lei
---
include/linux/blkdev.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d0d61bc81615..7575fa0aae6e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -397,6 +397,8 @@ struct
(), then the memory footprint of 'percpu_ref' in the fast path is reduced a lot
and becomes suitable to put into a hot cacheline of the user structure.
Cc: Sagi Grimberg
Cc: Tejun Heo
Cc: Christoph Hellwig
Cc: Jens Axboe
Cc: Bart Van Assche
Signed-off-by: Ming Lei
---
drivers/infiniband/sw/rdmavt/mr.c | 2
(two threads
per core) machine, dual socket/numa.
V2:
- pass 'gfp' to kzalloc() for fixing block/027 failure reported by
kernel test robot
- protect percpu_ref_is_zero() against concurrent percpu-refcount destruction
  with a spin lock
Ming Lei (2):
percpu_ref: reduce memory footprint
left;\
> skip = __v.iov_len; \
>
> and end up seeing overflows ("n" is supposed to be less than PAGE_SIZE)
> before the
> soft-lockups and a dead system,
>
> [ 4300.249180][T470195] ITER_IOVEC left = 0, n = 48566423
>
> Thoughts?
Does the following patch make a difference for you?
https://lore.kernel.org/linux-block/20200817100055.2495905-1-ming@redhat.com/
thanks,
Ming Lei
On Tue, Aug 25, 2020 at 10:49:17AM -0400, Brian Foster wrote:
> cc Ming
>
> On Tue, Aug 25, 2020 at 10:42:03AM +1000, Dave Chinner wrote:
> > On Mon, Aug 24, 2020 at 11:48:41AM -0400, Brian Foster wrote:
> > > On Mon, Aug 24, 2020 at 04:04:17PM +0100, Christoph Hellwig wrote:
> > > > On Mon, Aug
On Wed, Aug 26, 2020 at 10:06:51AM +0800, Xianting Tian wrote:
> Replace various magic -1 constants for tags with BLK_MQ_NO_TAG.
> And move the definition of BLK_MQ_NO_TAG from 'block/blk-mq-tag.h'
> to 'include/linux/blk-mq.h'
All three symbols are supposed to be used by block core internal code only, so
Signed-off-by: Ming Lei
---
include/linux/blkdev.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bb5636cc17b9..d8dba550ecac 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -396,6 +396,8 @@ struct
(two threads
per core) machine, dual socket/numa.
Ming Lei (2):
percpu_ref: reduce memory footprint of percpu_ref in fast path
block: move 'q_usage_counter' into front of 'request_queue'
drivers/infiniband/sw/rdmavt/mr.c | 2 +-
include/linux/blkdev.h | 3 +-
include/linux
(), then the memory footprint of 'percpu_ref' in the fast path is reduced a lot
and becomes suitable to put into a hot cacheline of the user structure.
Cc: Sagi Grimberg
Cc: Tejun Heo
Cc: Christoph Hellwig
Cc: Jens Axboe
Cc: Bart Van Assche
Signed-off-by: Ming Lei
---
drivers/infiniband/sw/rdmavt/mr.c | 2
*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
> - if (!*same_page && pfn_to_page(PFN_DOWN(vec_end_addr)) + 1 != page)
> - return false;
> - return true;
> + if (*same_page)
> + return true;
> + return (bv->bv_page + bv_end / PAGE_SIZE) == (page + off / PAGE_SIZE);
This way looks more straightforward, and meanwhile it can cover compound
pages:
Reviewed-by: Ming Lei
Thanks,
Ming