Re: [PATCH 4.21 V3] blk-mq: not embed .mq_kobj and ctx->kobj into queue instance
On 11/20/18 9:45 AM, Guenter Roeck wrote:
> On Tue, Nov 20, 2018 at 08:51:50AM +0100, Greg Kroah-Hartman wrote:
>> On Tue, Nov 20, 2018 at 09:44:35AM +0800, Ming Lei wrote:
>>> Even though .mq_kobj, ctx->kobj and q->kobj share the same lifetime
>>> from the block layer's point of view, they actually don't, because
>>> userspace may grab a kobject at any time via sysfs.
>>>
>>> This patch fixes the issue with the following approach:
>>>
>>> 1) introduce 'struct blk_mq_ctxs' for holding .mq_kobj and managing
>>>    all ctxs
>>>
>>> 2) free all allocated ctxs and the 'blk_mq_ctxs' instance in the
>>>    release handler of .mq_kobj
>>>
>>> 3) grab one ref of .mq_kobj before initializing each ctx->kobj, so
>>>    that .mq_kobj is always released after all ctxs are freed.
>>>
>>> This patch fixes a kernel panic during boot when
>>> DEBUG_KOBJECT_RELEASE is enabled.
>>>
>>> Reported-by: Guenter Roeck
>>> Cc: "jianchao.wang"
>>> Cc: Guenter Roeck
>>> Cc: Greg Kroah-Hartman
>>> Cc: sta...@vger.kernel.org
>>> Signed-off-by: Ming Lei
>>> ---
>>> V3:
>>>  - keep allocating q->queue_ctx via the percpu allocator, so one
>>>    extra pointer dereference is saved when getting a ctx
>>> V2:
>>>  - allocate 'blk_mq_ctx' inside blk_mq_init_allocated_queue()
>>>  - allocate q->mq_kobj directly
>>
>> Not tested, but seems sane from a kobject point of view:
>>
>> Reviewed-by: Greg Kroah-Hartman
>
> Tested-by: Guenter Roeck
>
> with v4.14.y and v4.19.y.
>
> The patch is marked for v4.21. I would kindly suggest not waiting for
> v4.21 but applying it to v4.20. This would let us enable
> DEBUG_KOBJECT_RELEASE with syzbot on upstream and stable kernels.

I'd very much like to put this into 4.21, and not 4.20, as that's much
less risky. This isn't a new regression anyway, so there's no rush to
put it into 4.20 as far as I'm concerned.

-- 
Jens Axboe
Re: [PATCH 4.21 V3] blk-mq: not embed .mq_kobj and ctx->kobj into queue instance
On Tue, Nov 20, 2018 at 08:51:50AM +0100, Greg Kroah-Hartman wrote:
> On Tue, Nov 20, 2018 at 09:44:35AM +0800, Ming Lei wrote:
> > Even though .mq_kobj, ctx->kobj and q->kobj share the same lifetime
> > from the block layer's point of view, they actually don't, because
> > userspace may grab a kobject at any time via sysfs.
> >
> > This patch fixes the issue with the following approach:
> >
> > 1) introduce 'struct blk_mq_ctxs' for holding .mq_kobj and managing
> >    all ctxs
> >
> > 2) free all allocated ctxs and the 'blk_mq_ctxs' instance in the
> >    release handler of .mq_kobj
> >
> > 3) grab one ref of .mq_kobj before initializing each ctx->kobj, so
> >    that .mq_kobj is always released after all ctxs are freed.
> >
> > This patch fixes a kernel panic during boot when
> > DEBUG_KOBJECT_RELEASE is enabled.
> >
> > Reported-by: Guenter Roeck
> > Cc: "jianchao.wang"
> > Cc: Guenter Roeck
> > Cc: Greg Kroah-Hartman
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Ming Lei
> > ---
> > V3:
> >  - keep allocating q->queue_ctx via the percpu allocator, so one
> >    extra pointer dereference is saved when getting a ctx
> > V2:
> >  - allocate 'blk_mq_ctx' inside blk_mq_init_allocated_queue()
> >  - allocate q->mq_kobj directly
>
> Not tested, but seems sane from a kobject point of view:
>
> Reviewed-by: Greg Kroah-Hartman

Tested-by: Guenter Roeck

with v4.14.y and v4.19.y.

The patch is marked for v4.21. I would kindly suggest not waiting for
v4.21 but applying it to v4.20. This would let us enable
DEBUG_KOBJECT_RELEASE with syzbot on upstream and stable kernels.

Greg, applying the patch to v4.14.y will require a backport due to a
minor context conflict. I'll send that to you after the patch is
available in mainline.

Thanks,
Guenter
Re: [PATCH 4.21 V3] blk-mq: not embed .mq_kobj and ctx->kobj into queue instance
On Tue, Nov 20, 2018 at 09:44:35AM +0800, Ming Lei wrote:
> Even though .mq_kobj, ctx->kobj and q->kobj share the same lifetime
> from the block layer's point of view, they actually don't, because
> userspace may grab a kobject at any time via sysfs.
>
> This patch fixes the issue with the following approach:
>
> 1) introduce 'struct blk_mq_ctxs' for holding .mq_kobj and managing
>    all ctxs
>
> 2) free all allocated ctxs and the 'blk_mq_ctxs' instance in the
>    release handler of .mq_kobj
>
> 3) grab one ref of .mq_kobj before initializing each ctx->kobj, so
>    that .mq_kobj is always released after all ctxs are freed.
>
> This patch fixes a kernel panic during boot when DEBUG_KOBJECT_RELEASE
> is enabled.
>
> Reported-by: Guenter Roeck
> Cc: "jianchao.wang"
> Cc: Guenter Roeck
> Cc: Greg Kroah-Hartman
> Cc: sta...@vger.kernel.org
> Signed-off-by: Ming Lei
> ---
> V3:
>  - keep allocating q->queue_ctx via the percpu allocator, so one extra
>    pointer dereference is saved when getting a ctx
> V2:
>  - allocate 'blk_mq_ctx' inside blk_mq_init_allocated_queue()
>  - allocate q->mq_kobj directly

Not tested, but seems sane from a kobject point of view:

Reviewed-by: Greg Kroah-Hartman
[PATCH 4.21 V3] blk-mq: not embed .mq_kobj and ctx->kobj into queue instance
Even though .mq_kobj, ctx->kobj and q->kobj share the same lifetime
from the block layer's point of view, they actually don't, because
userspace may grab a kobject at any time via sysfs.

This patch fixes the issue with the following approach:

1) introduce 'struct blk_mq_ctxs' for holding .mq_kobj and managing
   all ctxs

2) free all allocated ctxs and the 'blk_mq_ctxs' instance in the
   release handler of .mq_kobj

3) grab one ref of .mq_kobj before initializing each ctx->kobj, so
   that .mq_kobj is always released after all ctxs are freed.

This patch fixes a kernel panic during boot when DEBUG_KOBJECT_RELEASE
is enabled.

Reported-by: Guenter Roeck
Cc: "jianchao.wang"
Cc: Guenter Roeck
Cc: Greg Kroah-Hartman
Cc: sta...@vger.kernel.org
Signed-off-by: Ming Lei
---
V3:
 - keep allocating q->queue_ctx via the percpu allocator, so one extra
   pointer dereference is saved when getting a ctx
V2:
 - allocate 'blk_mq_ctx' inside blk_mq_init_allocated_queue()
 - allocate q->mq_kobj directly

 block/blk-mq-sysfs.c   | 34 ++++++++++++++++++++++++++--------
 block/blk-mq.c         | 39 +++++++++++++++++++++++++++++++--------
 block/blk-mq.h         |  6 ++++++
 include/linux/blkdev.h |  2 +-
 4 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 3d25b9c419e9..6efef1f679f0 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -15,6 +15,18 @@
 
 static void blk_mq_sysfs_release(struct kobject *kobj)
 {
+	struct blk_mq_ctxs *ctxs = container_of(kobj, struct blk_mq_ctxs, kobj);
+
+	free_percpu(ctxs->queue_ctx);
+	kfree(ctxs);
+}
+
+static void blk_mq_ctx_sysfs_release(struct kobject *kobj)
+{
+	struct blk_mq_ctx *ctx = container_of(kobj, struct blk_mq_ctx, kobj);
+
+	/* ctx->ctxs won't be released until all ctx are freed */
+	kobject_put(&ctx->ctxs->kobj);
 }
 
 static void blk_mq_hw_sysfs_release(struct kobject *kobj)
@@ -213,7 +225,7 @@ static struct kobj_type blk_mq_ktype = {
 static struct kobj_type blk_mq_ctx_ktype = {
 	.sysfs_ops	= &blk_mq_sysfs_ops,
 	.default_attrs	= default_ctx_attrs,
-	.release	= blk_mq_sysfs_release,
+	.release	= blk_mq_ctx_sysfs_release,
 };
 
 static struct kobj_type blk_mq_hw_ktype = {
@@ -245,7 +257,7 @@ static int blk_mq_register_hctx(struct blk_mq_hw_ctx *hctx)
 	if (!hctx->nr_ctx)
 		return 0;
 
-	ret = kobject_add(&hctx->kobj, &q->mq_kobj, "%u", hctx->queue_num);
+	ret = kobject_add(&hctx->kobj, q->mq_kobj, "%u", hctx->queue_num);
 	if (ret)
 		return ret;
 
@@ -268,8 +280,8 @@ void blk_mq_unregister_dev(struct device *dev, struct request_queue *q)
 	queue_for_each_hw_ctx(q, hctx, i)
 		blk_mq_unregister_hctx(hctx);
 
-	kobject_uevent(&q->mq_kobj, KOBJ_REMOVE);
-	kobject_del(&q->mq_kobj);
+	kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
+	kobject_del(q->mq_kobj);
 	kobject_put(&dev->kobj);
 
 	q->mq_sysfs_init_done = false;
@@ -289,7 +301,7 @@ void blk_mq_sysfs_deinit(struct request_queue *q)
 		ctx = per_cpu_ptr(q->queue_ctx, cpu);
 		kobject_put(&ctx->kobj);
 	}
-	kobject_put(&q->mq_kobj);
+	kobject_put(q->mq_kobj);
 }
 
 void blk_mq_sysfs_init(struct request_queue *q)
@@ -297,10 +309,12 @@ void blk_mq_sysfs_init(struct request_queue *q)
 	struct blk_mq_ctx *ctx;
 	int cpu;
 
-	kobject_init(&q->mq_kobj, &blk_mq_ktype);
+	kobject_init(q->mq_kobj, &blk_mq_ktype);
 
 	for_each_possible_cpu(cpu) {
 		ctx = per_cpu_ptr(q->queue_ctx, cpu);
+
+		kobject_get(q->mq_kobj);
 		kobject_init(&ctx->kobj, &blk_mq_ctx_ktype);
 	}
 }
@@ -313,11 +327,11 @@ int __blk_mq_register_dev(struct device *dev, struct request_queue *q)
 	WARN_ON_ONCE(!q->kobj.parent);
 	lockdep_assert_held(&q->sysfs_lock);
 
-	ret = kobject_add(&q->mq_kobj, kobject_get(&dev->kobj), "%s", "mq");
+	ret = kobject_add(q->mq_kobj, kobject_get(&dev->kobj), "%s", "mq");
 	if (ret < 0)
 		goto out;
 
-	kobject_uevent(&q->mq_kobj, KOBJ_ADD);
+	kobject_uevent(q->mq_kobj, KOBJ_ADD);
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		ret = blk_mq_register_hctx(hctx);
@@ -334,8 +348,8 @@ int __blk_mq_register_dev(struct device *dev, struct request_queue *q)
 	while (--i >= 0)
 		blk_mq_unregister_hctx(q->queue_hw_ctx[i]);
 
-	kobject_uevent(&q->mq_kobj, KOBJ_REMOVE);
-	kobject_del(&q->mq_kobj);
+	kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
+	kobject_del(q->mq_kobj);
 	kobject_put(&dev->kobj);
 	return ret;
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 32b246ed44c0..9228f2e9bd40 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2515,6 +2515,34 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
 	mutex_unlock(&set->tag_list_lock);
 }
 
+/* All allocations will be freed in release handler of q->mq_kobj */
+static