Re: [PATCH v2 3/6] block: loop: convert to blk-mq
Hi Robert, Great thanks for your so detailed review. On Sun, Aug 31, 2014 at 6:03 AM, Elliott, Robert (Server Storage) wrote: > > >> -Original Message- >> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel- >> ow...@vger.kernel.org] On Behalf Of Ming Lei >> Sent: Saturday, 30 August, 2014 11:08 AM >> To: Jens Axboe; linux-kernel@vger.kernel.org; Dave Kleikamp >> Cc: Zach Brown; Christoph Hellwig; Maxim Patlasov; Ming Lei >> Subject: [PATCH v2 3/6] block: loop: convert to blk-mq >> > ... >> -static inline void loop_handle_bio(struct loop_device *lo, struct >> bio *bio) >> +static inline int loop_handle_bio(struct loop_device *lo, struct bio >> *bio) >> { >> - if (unlikely(!bio->bi_bdev)) { >> - do_loop_switch(lo, bio->bi_private); >> - bio_put(bio); >> - } else { >> - int ret = do_bio_filebacked(lo, bio); >> - bio_endio(bio, ret); >> - } >> + int ret = do_bio_filebacked(lo, bio); >> + return ret; > > No need for the temporary variable; just return the function > call. > > ... >> +static int loop_prepare_hctxs(struct loop_device *lo) >> +{ >> + struct request_queue *q = lo->lo_queue; >> + struct blk_mq_hw_ctx *hctx; >> + struct loop_hctx_data *data; >> + unsigned int i; >> + >> + queue_for_each_hw_ctx(q, hctx, i) { >> + BUG_ON(i >= lo->tag_set.nr_hw_queues); > > If something goes bad in the loop driver like that, is it > necessary to crash the whole kernel? Actually, the BUG_ON() shouldn't have been added, will remove it. > >> + data = hctx->driver_data; >> + >> + data->lo = lo; >> + init_kthread_worker(>worker); >> + data->worker_task = kthread_run(kthread_worker_fn, >> + >worker, "loop%d-%d", >> + lo->lo_number, i); >> + if (IS_ERR(data->worker_task)) { >> + loop_unprepare_hctxs(lo, i); >> + return -ENOMEM; >> + } >> + set_user_nice(data->worker_task, MIN_NICE); >> + sched_getaffinity(data->worker_task->pid, hctx->cpumask); > > Is that supposed to be sched_setaffinity? It looks like > getaffinity does have a side-effect of updating the CPU mask > based on the current active cpus, but there won't be a CPU mask > to update unless setaffinity was called. Hmm, it is a typo, and I meant sched_setaffinity(), but it isn't exported, so set_cpus_allowed_ptr() should be used. > > ... >> @@ -906,14 +848,10 @@ static int loop_set_fd(struct loop_device *lo, >> fmode_t mode, >> >> set_blocksize(bdev, lo_blocksize); >> >> - lo->lo_thread = kthread_create(loop_thread, lo, "loop%d", >> - lo->lo_number); >> - if (IS_ERR(lo->lo_thread)) { >> - error = PTR_ERR(lo->lo_thread); >> + if ((error = loop_prepare_hctxs(lo)) != 0) >> goto out_clr; > > I've been told that linux kernel style is: > error = x(); > if (error) > > ... >> @@ -1014,7 +951,7 @@ static int loop_clr_fd(struct loop_device *lo) >> lo->lo_state = Lo_rundown; >> spin_unlock_irq(>lo_lock); >> >> - kthread_stop(lo->lo_thread); >> + loop_unprepare_hctxs(lo, lo->tag_set.nr_hw_queues); >> >> spin_lock_irq(>lo_lock); >> lo->lo_backing_file = NULL; > ... >> + >> +static int loop_prepare_flush_rq(void *data, struct request_queue >> *q, >> + struct request *flush_rq, >> + const struct request *src_rq) >> +{ >> + /* borrow initialization helper for common rq */ >> + loop_init_request(data, flush_rq, 0, -1, NUMA_NO_NODE); >> + return 0; >> +} > > In patch 2/6 that function is called with: > if (orig_rq->q->mq_ops->prepare_flush_rq) > ret = orig_rq->q->mq_ops->prepare_flush_rq( > orig_rq->q->tag_set->driver_data, > orig_rq->q, flush_rq, orig_rq); > > > The src_rq argument is not used in this new function. > Do you anticipate it might be necessary in other drivers? Yes, sooner or later the problem will be triggered in blk-mq conversion for current drivers, and current default behaviour is to copy pdu of src_rq to q->flush_rq, that is why the src_rq is passed. > Is this new function allowed to modify *data, *flush_rq, > or *src_rq? If not, const might be good to add. Instance pointed by data and src_rq shouldn't be modified. > > If orig_rq is always passed, then this function could > decode orig_rq->q and orig_rq->q->tag_set_>driver_data > on its own, eliminating the need for the first two arguments. Looks a good idea. > > Similarly, in the unprepare_flush_rq function in patch 2, > which is not provided by the loop driver in this patch: > > + if (q->mq_ops->unprepare_flush_rq) > + q->mq_ops->unprepare_flush_rq( > +
Re: [PATCH v2 3/6] block: loop: convert to blk-mq
Hi Robert, Great thanks for your so detailed review. On Sun, Aug 31, 2014 at 6:03 AM, Elliott, Robert (Server Storage) elli...@hp.com wrote: -Original Message- From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel- ow...@vger.kernel.org] On Behalf Of Ming Lei Sent: Saturday, 30 August, 2014 11:08 AM To: Jens Axboe; linux-kernel@vger.kernel.org; Dave Kleikamp Cc: Zach Brown; Christoph Hellwig; Maxim Patlasov; Ming Lei Subject: [PATCH v2 3/6] block: loop: convert to blk-mq ... -static inline void loop_handle_bio(struct loop_device *lo, struct bio *bio) +static inline int loop_handle_bio(struct loop_device *lo, struct bio *bio) { - if (unlikely(!bio-bi_bdev)) { - do_loop_switch(lo, bio-bi_private); - bio_put(bio); - } else { - int ret = do_bio_filebacked(lo, bio); - bio_endio(bio, ret); - } + int ret = do_bio_filebacked(lo, bio); + return ret; No need for the temporary variable; just return the function call. ... +static int loop_prepare_hctxs(struct loop_device *lo) +{ + struct request_queue *q = lo-lo_queue; + struct blk_mq_hw_ctx *hctx; + struct loop_hctx_data *data; + unsigned int i; + + queue_for_each_hw_ctx(q, hctx, i) { + BUG_ON(i = lo-tag_set.nr_hw_queues); If something goes bad in the loop driver like that, is it necessary to crash the whole kernel? Actually, the BUG_ON() shouldn't have been added, will remove it. + data = hctx-driver_data; + + data-lo = lo; + init_kthread_worker(data-worker); + data-worker_task = kthread_run(kthread_worker_fn, + data-worker, loop%d-%d, + lo-lo_number, i); + if (IS_ERR(data-worker_task)) { + loop_unprepare_hctxs(lo, i); + return -ENOMEM; + } + set_user_nice(data-worker_task, MIN_NICE); + sched_getaffinity(data-worker_task-pid, hctx-cpumask); Is that supposed to be sched_setaffinity? It looks like getaffinity does have a side-effect of updating the CPU mask based on the current active cpus, but there won't be a CPU mask to update unless setaffinity was called. Hmm, it is a typo, and I meant sched_setaffinity(), but it isn't exported, so set_cpus_allowed_ptr() should be used. ... @@ -906,14 +848,10 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, set_blocksize(bdev, lo_blocksize); - lo-lo_thread = kthread_create(loop_thread, lo, loop%d, - lo-lo_number); - if (IS_ERR(lo-lo_thread)) { - error = PTR_ERR(lo-lo_thread); + if ((error = loop_prepare_hctxs(lo)) != 0) goto out_clr; I've been told that linux kernel style is: error = x(); if (error) ... @@ -1014,7 +951,7 @@ static int loop_clr_fd(struct loop_device *lo) lo-lo_state = Lo_rundown; spin_unlock_irq(lo-lo_lock); - kthread_stop(lo-lo_thread); + loop_unprepare_hctxs(lo, lo-tag_set.nr_hw_queues); spin_lock_irq(lo-lo_lock); lo-lo_backing_file = NULL; ... + +static int loop_prepare_flush_rq(void *data, struct request_queue *q, + struct request *flush_rq, + const struct request *src_rq) +{ + /* borrow initialization helper for common rq */ + loop_init_request(data, flush_rq, 0, -1, NUMA_NO_NODE); + return 0; +} In patch 2/6 that function is called with: if (orig_rq-q-mq_ops-prepare_flush_rq) ret = orig_rq-q-mq_ops-prepare_flush_rq( orig_rq-q-tag_set-driver_data, orig_rq-q, flush_rq, orig_rq); The src_rq argument is not used in this new function. Do you anticipate it might be necessary in other drivers? Yes, sooner or later the problem will be triggered in blk-mq conversion for current drivers, and current default behaviour is to copy pdu of src_rq to q-flush_rq, that is why the src_rq is passed. Is this new function allowed to modify *data, *flush_rq, or *src_rq? If not, const might be good to add. Instance pointed by data and src_rq shouldn't be modified. If orig_rq is always passed, then this function could decode orig_rq-q and orig_rq-q-tag_set_driver_data on its own, eliminating the need for the first two arguments. Looks a good idea. Similarly, in the unprepare_flush_rq function in patch 2, which is not provided by the loop driver in this patch: + if (q-mq_ops-unprepare_flush_rq) + q-mq_ops-unprepare_flush_rq( + q-tag_set-driver_data, + q, q-flush_rq); if q is always passed, then that function could calculate q-tag_set_driver_data and q-flush_rq itself, eliminating two arguments. I suggest to keep 'q' and
RE: [PATCH v2 3/6] block: loop: convert to blk-mq
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel- > ow...@vger.kernel.org] On Behalf Of Ming Lei > Sent: Saturday, 30 August, 2014 11:08 AM > To: Jens Axboe; linux-kernel@vger.kernel.org; Dave Kleikamp > Cc: Zach Brown; Christoph Hellwig; Maxim Patlasov; Ming Lei > Subject: [PATCH v2 3/6] block: loop: convert to blk-mq > ... > -static inline void loop_handle_bio(struct loop_device *lo, struct > bio *bio) > +static inline int loop_handle_bio(struct loop_device *lo, struct bio > *bio) > { > - if (unlikely(!bio->bi_bdev)) { > - do_loop_switch(lo, bio->bi_private); > - bio_put(bio); > - } else { > - int ret = do_bio_filebacked(lo, bio); > - bio_endio(bio, ret); > - } > + int ret = do_bio_filebacked(lo, bio); > + return ret; No need for the temporary variable; just return the function call. ... > +static int loop_prepare_hctxs(struct loop_device *lo) > +{ > + struct request_queue *q = lo->lo_queue; > + struct blk_mq_hw_ctx *hctx; > + struct loop_hctx_data *data; > + unsigned int i; > + > + queue_for_each_hw_ctx(q, hctx, i) { > + BUG_ON(i >= lo->tag_set.nr_hw_queues); If something goes bad in the loop driver like that, is it necessary to crash the whole kernel? > + data = hctx->driver_data; > + > + data->lo = lo; > + init_kthread_worker(>worker); > + data->worker_task = kthread_run(kthread_worker_fn, > + >worker, "loop%d-%d", > + lo->lo_number, i); > + if (IS_ERR(data->worker_task)) { > + loop_unprepare_hctxs(lo, i); > + return -ENOMEM; > + } > + set_user_nice(data->worker_task, MIN_NICE); > + sched_getaffinity(data->worker_task->pid, hctx->cpumask); Is that supposed to be sched_setaffinity? It looks like getaffinity does have a side-effect of updating the CPU mask based on the current active cpus, but there won't be a CPU mask to update unless setaffinity was called. ... > @@ -906,14 +848,10 @@ static int loop_set_fd(struct loop_device *lo, > fmode_t mode, > > set_blocksize(bdev, lo_blocksize); > > - lo->lo_thread = kthread_create(loop_thread, lo, "loop%d", > - lo->lo_number); > - if (IS_ERR(lo->lo_thread)) { > - error = PTR_ERR(lo->lo_thread); > + if ((error = loop_prepare_hctxs(lo)) != 0) > goto out_clr; I've been told that linux kernel style is: error = x(); if (error) ... > @@ -1014,7 +951,7 @@ static int loop_clr_fd(struct loop_device *lo) > lo->lo_state = Lo_rundown; > spin_unlock_irq(>lo_lock); > > - kthread_stop(lo->lo_thread); > + loop_unprepare_hctxs(lo, lo->tag_set.nr_hw_queues); > > spin_lock_irq(>lo_lock); > lo->lo_backing_file = NULL; ... > + > +static int loop_prepare_flush_rq(void *data, struct request_queue > *q, > + struct request *flush_rq, > + const struct request *src_rq) > +{ > + /* borrow initialization helper for common rq */ > + loop_init_request(data, flush_rq, 0, -1, NUMA_NO_NODE); > + return 0; > +} In patch 2/6 that function is called with: if (orig_rq->q->mq_ops->prepare_flush_rq) ret = orig_rq->q->mq_ops->prepare_flush_rq( orig_rq->q->tag_set->driver_data, orig_rq->q, flush_rq, orig_rq); The src_rq argument is not used in this new function. Do you anticipate it might be necessary in other drivers? Is this new function allowed to modify *data, *flush_rq, or *src_rq? If not, const might be good to add. If orig_rq is always passed, then this function could decode orig_rq->q and orig_rq->q->tag_set_>driver_data on its own, eliminating the need for the first two arguments. Similarly, in the unprepare_flush_rq function in patch 2, which is not provided by the loop driver in this patch: + if (q->mq_ops->unprepare_flush_rq) + q->mq_ops->unprepare_flush_rq( + q->tag_set->driver_data, + q, q->flush_rq); if q is always passed, then that function could calculate q->tag_set_driver_data and q->flush_rq itself, eliminating two arguments. > + > +static int loop_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, > + unsigned int index) > +{ > + hctx->driver_data = kmalloc(sizeof(struct loop_hctx_data), > + GFP_KERNEL); hctx->numa_node has been set before this; how about passing it along to kmalloc_node in case it has a useful value? blk_mq_init_hw_queues sets it before calling this function: node = hctx->numa_node; if (node == NUMA_NO_NODE) node = hctx->numa_node = set->numa_node; ...
RE: [PATCH v2 3/6] block: loop: convert to blk-mq
-Original Message- From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel- ow...@vger.kernel.org] On Behalf Of Ming Lei Sent: Saturday, 30 August, 2014 11:08 AM To: Jens Axboe; linux-kernel@vger.kernel.org; Dave Kleikamp Cc: Zach Brown; Christoph Hellwig; Maxim Patlasov; Ming Lei Subject: [PATCH v2 3/6] block: loop: convert to blk-mq ... -static inline void loop_handle_bio(struct loop_device *lo, struct bio *bio) +static inline int loop_handle_bio(struct loop_device *lo, struct bio *bio) { - if (unlikely(!bio-bi_bdev)) { - do_loop_switch(lo, bio-bi_private); - bio_put(bio); - } else { - int ret = do_bio_filebacked(lo, bio); - bio_endio(bio, ret); - } + int ret = do_bio_filebacked(lo, bio); + return ret; No need for the temporary variable; just return the function call. ... +static int loop_prepare_hctxs(struct loop_device *lo) +{ + struct request_queue *q = lo-lo_queue; + struct blk_mq_hw_ctx *hctx; + struct loop_hctx_data *data; + unsigned int i; + + queue_for_each_hw_ctx(q, hctx, i) { + BUG_ON(i = lo-tag_set.nr_hw_queues); If something goes bad in the loop driver like that, is it necessary to crash the whole kernel? + data = hctx-driver_data; + + data-lo = lo; + init_kthread_worker(data-worker); + data-worker_task = kthread_run(kthread_worker_fn, + data-worker, loop%d-%d, + lo-lo_number, i); + if (IS_ERR(data-worker_task)) { + loop_unprepare_hctxs(lo, i); + return -ENOMEM; + } + set_user_nice(data-worker_task, MIN_NICE); + sched_getaffinity(data-worker_task-pid, hctx-cpumask); Is that supposed to be sched_setaffinity? It looks like getaffinity does have a side-effect of updating the CPU mask based on the current active cpus, but there won't be a CPU mask to update unless setaffinity was called. ... @@ -906,14 +848,10 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode, set_blocksize(bdev, lo_blocksize); - lo-lo_thread = kthread_create(loop_thread, lo, loop%d, - lo-lo_number); - if (IS_ERR(lo-lo_thread)) { - error = PTR_ERR(lo-lo_thread); + if ((error = loop_prepare_hctxs(lo)) != 0) goto out_clr; I've been told that linux kernel style is: error = x(); if (error) ... @@ -1014,7 +951,7 @@ static int loop_clr_fd(struct loop_device *lo) lo-lo_state = Lo_rundown; spin_unlock_irq(lo-lo_lock); - kthread_stop(lo-lo_thread); + loop_unprepare_hctxs(lo, lo-tag_set.nr_hw_queues); spin_lock_irq(lo-lo_lock); lo-lo_backing_file = NULL; ... + +static int loop_prepare_flush_rq(void *data, struct request_queue *q, + struct request *flush_rq, + const struct request *src_rq) +{ + /* borrow initialization helper for common rq */ + loop_init_request(data, flush_rq, 0, -1, NUMA_NO_NODE); + return 0; +} In patch 2/6 that function is called with: if (orig_rq-q-mq_ops-prepare_flush_rq) ret = orig_rq-q-mq_ops-prepare_flush_rq( orig_rq-q-tag_set-driver_data, orig_rq-q, flush_rq, orig_rq); The src_rq argument is not used in this new function. Do you anticipate it might be necessary in other drivers? Is this new function allowed to modify *data, *flush_rq, or *src_rq? If not, const might be good to add. If orig_rq is always passed, then this function could decode orig_rq-q and orig_rq-q-tag_set_driver_data on its own, eliminating the need for the first two arguments. Similarly, in the unprepare_flush_rq function in patch 2, which is not provided by the loop driver in this patch: + if (q-mq_ops-unprepare_flush_rq) + q-mq_ops-unprepare_flush_rq( + q-tag_set-driver_data, + q, q-flush_rq); if q is always passed, then that function could calculate q-tag_set_driver_data and q-flush_rq itself, eliminating two arguments. + +static int loop_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, + unsigned int index) +{ + hctx-driver_data = kmalloc(sizeof(struct loop_hctx_data), + GFP_KERNEL); hctx-numa_node has been set before this; how about passing it along to kmalloc_node in case it has a useful value? blk_mq_init_hw_queues sets it before calling this function: node = hctx-numa_node; if (node == NUMA_NO_NODE) node = hctx-numa_node = set-numa_node; ... if (set-ops-init_hctx set-ops-init_hctx(hctx, set-driver_data, i)) break;