Re: [PATCH v2 3/6] block: loop: convert to blk-mq

2014-08-31 Thread Ming Lei
Hi Robert,

Great thanks for your so detailed review.

On Sun, Aug 31, 2014 at 6:03 AM, Elliott, Robert (Server Storage)
 wrote:
>
>
>> -Original Message-
>> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
>> ow...@vger.kernel.org] On Behalf Of Ming Lei
>> Sent: Saturday, 30 August, 2014 11:08 AM
>> To: Jens Axboe; linux-kernel@vger.kernel.org; Dave Kleikamp
>> Cc: Zach Brown; Christoph Hellwig; Maxim Patlasov; Ming Lei
>> Subject: [PATCH v2 3/6] block: loop: convert to blk-mq
>>
> ...
>> -static inline void loop_handle_bio(struct loop_device *lo, struct
>> bio *bio)
>> +static inline int loop_handle_bio(struct loop_device *lo, struct bio
>> *bio)
>>  {
>> - if (unlikely(!bio->bi_bdev)) {
>> - do_loop_switch(lo, bio->bi_private);
>> - bio_put(bio);
>> - } else {
>> - int ret = do_bio_filebacked(lo, bio);
>> - bio_endio(bio, ret);
>> - }
>> + int ret = do_bio_filebacked(lo, bio);
>> + return ret;
>
> No need for the temporary variable; just return the function
> call.
>
> ...
>> +static int loop_prepare_hctxs(struct loop_device *lo)
>> +{
>> + struct request_queue *q = lo->lo_queue;
>> + struct blk_mq_hw_ctx *hctx;
>> + struct loop_hctx_data *data;
>> + unsigned int i;
>> +
>> + queue_for_each_hw_ctx(q, hctx, i) {
>> + BUG_ON(i >= lo->tag_set.nr_hw_queues);
>
> If something goes bad in the loop driver like that, is it
> necessary to crash the whole kernel?

Actually, the BUG_ON() shouldn't have been added, will remove it.

>
>> + data = hctx->driver_data;
>> +
>> + data->lo = lo;
>> + init_kthread_worker(>worker);
>> + data->worker_task = kthread_run(kthread_worker_fn,
>> + >worker, "loop%d-%d",
>> + lo->lo_number, i);
>> + if (IS_ERR(data->worker_task)) {
>> + loop_unprepare_hctxs(lo, i);
>> + return -ENOMEM;
>> + }
>> + set_user_nice(data->worker_task, MIN_NICE);
>> + sched_getaffinity(data->worker_task->pid, hctx->cpumask);
>
> Is that supposed to be sched_setaffinity?  It looks like
> getaffinity does have a side-effect of updating the CPU mask
> based on the current active cpus, but there won't be a CPU mask
> to update unless setaffinity was called.

Hmm, it is a typo, and I meant sched_setaffinity(), but it isn't exported,
so set_cpus_allowed_ptr() should be used.

>
> ...
>> @@ -906,14 +848,10 @@ static int loop_set_fd(struct loop_device *lo,
>> fmode_t mode,
>>
>>   set_blocksize(bdev, lo_blocksize);
>>
>> - lo->lo_thread = kthread_create(loop_thread, lo, "loop%d",
>> - lo->lo_number);
>> - if (IS_ERR(lo->lo_thread)) {
>> - error = PTR_ERR(lo->lo_thread);
>> + if ((error = loop_prepare_hctxs(lo)) != 0)
>>   goto out_clr;
>
> I've been told that linux kernel style is:
> error = x();
> if (error)
>
> ...
>> @@ -1014,7 +951,7 @@ static int loop_clr_fd(struct loop_device *lo)
>>   lo->lo_state = Lo_rundown;
>>   spin_unlock_irq(>lo_lock);
>>
>> - kthread_stop(lo->lo_thread);
>> + loop_unprepare_hctxs(lo, lo->tag_set.nr_hw_queues);
>>
>>   spin_lock_irq(>lo_lock);
>>   lo->lo_backing_file = NULL;
> ...
>> +
>> +static int loop_prepare_flush_rq(void *data, struct request_queue
>> *q,
>> + struct request *flush_rq,
>> + const struct request *src_rq)
>> +{
>> + /* borrow initialization helper for common rq */
>> + loop_init_request(data, flush_rq, 0, -1, NUMA_NO_NODE);
>> + return 0;
>> +}
>
> In patch 2/6 that function is called with:
> if (orig_rq->q->mq_ops->prepare_flush_rq)
> ret = orig_rq->q->mq_ops->prepare_flush_rq(
> orig_rq->q->tag_set->driver_data,
> orig_rq->q, flush_rq, orig_rq);
>
>
> The src_rq argument is not used in this new function.
> Do you anticipate it might be necessary in other drivers?

Yes, sooner or later the problem will be triggered in blk-mq
conversion for current drivers, and current default behaviour is to
copy pdu of src_rq to q->flush_rq, that is why the src_rq is passed.

> Is this new function allowed to modify *data, *flush_rq,
> or *src_rq?  If not, const might be good to add.

Instance pointed by data and src_rq shouldn't be modified.

>
> If orig_rq is always passed, then this function could
> decode orig_rq->q and orig_rq->q->tag_set_>driver_data
> on its own, eliminating the need for the first two arguments.

Looks a good idea.

>
> Similarly, in the unprepare_flush_rq function in patch 2,
> which is not provided by the loop driver in this patch:
>
> +   if (q->mq_ops->unprepare_flush_rq)
> +   q->mq_ops->unprepare_flush_rq(
> +   

Re: [PATCH v2 3/6] block: loop: convert to blk-mq

2014-08-31 Thread Ming Lei
Hi Robert,

Great thanks for your so detailed review.

On Sun, Aug 31, 2014 at 6:03 AM, Elliott, Robert (Server Storage)
elli...@hp.com wrote:


 -Original Message-
 From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
 ow...@vger.kernel.org] On Behalf Of Ming Lei
 Sent: Saturday, 30 August, 2014 11:08 AM
 To: Jens Axboe; linux-kernel@vger.kernel.org; Dave Kleikamp
 Cc: Zach Brown; Christoph Hellwig; Maxim Patlasov; Ming Lei
 Subject: [PATCH v2 3/6] block: loop: convert to blk-mq

 ...
 -static inline void loop_handle_bio(struct loop_device *lo, struct
 bio *bio)
 +static inline int loop_handle_bio(struct loop_device *lo, struct bio
 *bio)
  {
 - if (unlikely(!bio-bi_bdev)) {
 - do_loop_switch(lo, bio-bi_private);
 - bio_put(bio);
 - } else {
 - int ret = do_bio_filebacked(lo, bio);
 - bio_endio(bio, ret);
 - }
 + int ret = do_bio_filebacked(lo, bio);
 + return ret;

 No need for the temporary variable; just return the function
 call.

 ...
 +static int loop_prepare_hctxs(struct loop_device *lo)
 +{
 + struct request_queue *q = lo-lo_queue;
 + struct blk_mq_hw_ctx *hctx;
 + struct loop_hctx_data *data;
 + unsigned int i;
 +
 + queue_for_each_hw_ctx(q, hctx, i) {
 + BUG_ON(i = lo-tag_set.nr_hw_queues);

 If something goes bad in the loop driver like that, is it
 necessary to crash the whole kernel?

Actually, the BUG_ON() shouldn't have been added, will remove it.


 + data = hctx-driver_data;
 +
 + data-lo = lo;
 + init_kthread_worker(data-worker);
 + data-worker_task = kthread_run(kthread_worker_fn,
 + data-worker, loop%d-%d,
 + lo-lo_number, i);
 + if (IS_ERR(data-worker_task)) {
 + loop_unprepare_hctxs(lo, i);
 + return -ENOMEM;
 + }
 + set_user_nice(data-worker_task, MIN_NICE);
 + sched_getaffinity(data-worker_task-pid, hctx-cpumask);

 Is that supposed to be sched_setaffinity?  It looks like
 getaffinity does have a side-effect of updating the CPU mask
 based on the current active cpus, but there won't be a CPU mask
 to update unless setaffinity was called.

Hmm, it is a typo, and I meant sched_setaffinity(), but it isn't exported,
so set_cpus_allowed_ptr() should be used.


 ...
 @@ -906,14 +848,10 @@ static int loop_set_fd(struct loop_device *lo,
 fmode_t mode,

   set_blocksize(bdev, lo_blocksize);

 - lo-lo_thread = kthread_create(loop_thread, lo, loop%d,
 - lo-lo_number);
 - if (IS_ERR(lo-lo_thread)) {
 - error = PTR_ERR(lo-lo_thread);
 + if ((error = loop_prepare_hctxs(lo)) != 0)
   goto out_clr;

 I've been told that linux kernel style is:
 error = x();
 if (error)

 ...
 @@ -1014,7 +951,7 @@ static int loop_clr_fd(struct loop_device *lo)
   lo-lo_state = Lo_rundown;
   spin_unlock_irq(lo-lo_lock);

 - kthread_stop(lo-lo_thread);
 + loop_unprepare_hctxs(lo, lo-tag_set.nr_hw_queues);

   spin_lock_irq(lo-lo_lock);
   lo-lo_backing_file = NULL;
 ...
 +
 +static int loop_prepare_flush_rq(void *data, struct request_queue
 *q,
 + struct request *flush_rq,
 + const struct request *src_rq)
 +{
 + /* borrow initialization helper for common rq */
 + loop_init_request(data, flush_rq, 0, -1, NUMA_NO_NODE);
 + return 0;
 +}

 In patch 2/6 that function is called with:
 if (orig_rq-q-mq_ops-prepare_flush_rq)
 ret = orig_rq-q-mq_ops-prepare_flush_rq(
 orig_rq-q-tag_set-driver_data,
 orig_rq-q, flush_rq, orig_rq);


 The src_rq argument is not used in this new function.
 Do you anticipate it might be necessary in other drivers?

Yes, sooner or later the problem will be triggered in blk-mq
conversion for current drivers, and current default behaviour is to
copy pdu of src_rq to q-flush_rq, that is why the src_rq is passed.

 Is this new function allowed to modify *data, *flush_rq,
 or *src_rq?  If not, const might be good to add.

Instance pointed by data and src_rq shouldn't be modified.


 If orig_rq is always passed, then this function could
 decode orig_rq-q and orig_rq-q-tag_set_driver_data
 on its own, eliminating the need for the first two arguments.

Looks a good idea.


 Similarly, in the unprepare_flush_rq function in patch 2,
 which is not provided by the loop driver in this patch:

 +   if (q-mq_ops-unprepare_flush_rq)
 +   q-mq_ops-unprepare_flush_rq(
 +   q-tag_set-driver_data,
 +   q, q-flush_rq);

 if q is always passed, then that function could calculate
 q-tag_set_driver_data and q-flush_rq itself, eliminating
 two arguments.

I suggest to keep 'q' and 

RE: [PATCH v2 3/6] block: loop: convert to blk-mq

2014-08-30 Thread Elliott, Robert (Server Storage)


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> ow...@vger.kernel.org] On Behalf Of Ming Lei
> Sent: Saturday, 30 August, 2014 11:08 AM
> To: Jens Axboe; linux-kernel@vger.kernel.org; Dave Kleikamp
> Cc: Zach Brown; Christoph Hellwig; Maxim Patlasov; Ming Lei
> Subject: [PATCH v2 3/6] block: loop: convert to blk-mq
> 
...
> -static inline void loop_handle_bio(struct loop_device *lo, struct
> bio *bio)
> +static inline int loop_handle_bio(struct loop_device *lo, struct bio
> *bio)
>  {
> - if (unlikely(!bio->bi_bdev)) {
> - do_loop_switch(lo, bio->bi_private);
> - bio_put(bio);
> - } else {
> - int ret = do_bio_filebacked(lo, bio);
> - bio_endio(bio, ret);
> - }
> + int ret = do_bio_filebacked(lo, bio);
> + return ret;

No need for the temporary variable; just return the function
call.

...
> +static int loop_prepare_hctxs(struct loop_device *lo)
> +{
> + struct request_queue *q = lo->lo_queue;
> + struct blk_mq_hw_ctx *hctx;
> + struct loop_hctx_data *data;
> + unsigned int i;
> +
> + queue_for_each_hw_ctx(q, hctx, i) {
> + BUG_ON(i >= lo->tag_set.nr_hw_queues);

If something goes bad in the loop driver like that, is it
necessary to crash the whole kernel?  

> + data = hctx->driver_data;
> +
> + data->lo = lo;
> + init_kthread_worker(>worker);
> + data->worker_task = kthread_run(kthread_worker_fn,
> + >worker, "loop%d-%d",
> + lo->lo_number, i);
> + if (IS_ERR(data->worker_task)) {
> + loop_unprepare_hctxs(lo, i);
> + return -ENOMEM;
> + }
> + set_user_nice(data->worker_task, MIN_NICE);
> + sched_getaffinity(data->worker_task->pid, hctx->cpumask);

Is that supposed to be sched_setaffinity?  It looks like
getaffinity does have a side-effect of updating the CPU mask
based on the current active cpus, but there won't be a CPU mask
to update unless setaffinity was called.

...
> @@ -906,14 +848,10 @@ static int loop_set_fd(struct loop_device *lo,
> fmode_t mode,
> 
>   set_blocksize(bdev, lo_blocksize);
> 
> - lo->lo_thread = kthread_create(loop_thread, lo, "loop%d",
> - lo->lo_number);
> - if (IS_ERR(lo->lo_thread)) {
> - error = PTR_ERR(lo->lo_thread);
> + if ((error = loop_prepare_hctxs(lo)) != 0)
>   goto out_clr;

I've been told that linux kernel style is:
error = x();
if (error)

...
> @@ -1014,7 +951,7 @@ static int loop_clr_fd(struct loop_device *lo)
>   lo->lo_state = Lo_rundown;
>   spin_unlock_irq(>lo_lock);
> 
> - kthread_stop(lo->lo_thread);
> + loop_unprepare_hctxs(lo, lo->tag_set.nr_hw_queues);
> 
>   spin_lock_irq(>lo_lock);
>   lo->lo_backing_file = NULL;
...
> +
> +static int loop_prepare_flush_rq(void *data, struct request_queue
> *q,
> + struct request *flush_rq,
> + const struct request *src_rq)
> +{
> + /* borrow initialization helper for common rq */
> + loop_init_request(data, flush_rq, 0, -1, NUMA_NO_NODE);
> + return 0;
> +}

In patch 2/6 that function is called with:
if (orig_rq->q->mq_ops->prepare_flush_rq)
ret = orig_rq->q->mq_ops->prepare_flush_rq(
orig_rq->q->tag_set->driver_data,
orig_rq->q, flush_rq, orig_rq);


The src_rq argument is not used in this new function.
Do you anticipate it might be necessary in other drivers?

Is this new function allowed to modify *data, *flush_rq,
or *src_rq?  If not, const might be good to add.

If orig_rq is always passed, then this function could 
decode orig_rq->q and orig_rq->q->tag_set_>driver_data
on its own, eliminating the need for the first two arguments.

Similarly, in the unprepare_flush_rq function in patch 2,
which is not provided by the loop driver in this patch:

+   if (q->mq_ops->unprepare_flush_rq)
+   q->mq_ops->unprepare_flush_rq(
+   q->tag_set->driver_data,
+   q, q->flush_rq);

if q is always passed, then that function could calculate
q->tag_set_driver_data and q->flush_rq itself, eliminating
two arguments.

> +
> +static int loop_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
> + unsigned int index)
> +{
> + hctx->driver_data = kmalloc(sizeof(struct loop_hctx_data),
> + GFP_KERNEL);

hctx->numa_node has been set before this; how about passing it 
along to kmalloc_node in case it has a useful value?

blk_mq_init_hw_queues sets it before calling this function:
node = hctx->numa_node;
if (node == NUMA_NO_NODE)
node = hctx->numa_node = set->numa_node;
...

RE: [PATCH v2 3/6] block: loop: convert to blk-mq

2014-08-30 Thread Elliott, Robert (Server Storage)


 -Original Message-
 From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
 ow...@vger.kernel.org] On Behalf Of Ming Lei
 Sent: Saturday, 30 August, 2014 11:08 AM
 To: Jens Axboe; linux-kernel@vger.kernel.org; Dave Kleikamp
 Cc: Zach Brown; Christoph Hellwig; Maxim Patlasov; Ming Lei
 Subject: [PATCH v2 3/6] block: loop: convert to blk-mq
 
...
 -static inline void loop_handle_bio(struct loop_device *lo, struct
 bio *bio)
 +static inline int loop_handle_bio(struct loop_device *lo, struct bio
 *bio)
  {
 - if (unlikely(!bio-bi_bdev)) {
 - do_loop_switch(lo, bio-bi_private);
 - bio_put(bio);
 - } else {
 - int ret = do_bio_filebacked(lo, bio);
 - bio_endio(bio, ret);
 - }
 + int ret = do_bio_filebacked(lo, bio);
 + return ret;

No need for the temporary variable; just return the function
call.

...
 +static int loop_prepare_hctxs(struct loop_device *lo)
 +{
 + struct request_queue *q = lo-lo_queue;
 + struct blk_mq_hw_ctx *hctx;
 + struct loop_hctx_data *data;
 + unsigned int i;
 +
 + queue_for_each_hw_ctx(q, hctx, i) {
 + BUG_ON(i = lo-tag_set.nr_hw_queues);

If something goes bad in the loop driver like that, is it
necessary to crash the whole kernel?  

 + data = hctx-driver_data;
 +
 + data-lo = lo;
 + init_kthread_worker(data-worker);
 + data-worker_task = kthread_run(kthread_worker_fn,
 + data-worker, loop%d-%d,
 + lo-lo_number, i);
 + if (IS_ERR(data-worker_task)) {
 + loop_unprepare_hctxs(lo, i);
 + return -ENOMEM;
 + }
 + set_user_nice(data-worker_task, MIN_NICE);
 + sched_getaffinity(data-worker_task-pid, hctx-cpumask);

Is that supposed to be sched_setaffinity?  It looks like
getaffinity does have a side-effect of updating the CPU mask
based on the current active cpus, but there won't be a CPU mask
to update unless setaffinity was called.

...
 @@ -906,14 +848,10 @@ static int loop_set_fd(struct loop_device *lo,
 fmode_t mode,
 
   set_blocksize(bdev, lo_blocksize);
 
 - lo-lo_thread = kthread_create(loop_thread, lo, loop%d,
 - lo-lo_number);
 - if (IS_ERR(lo-lo_thread)) {
 - error = PTR_ERR(lo-lo_thread);
 + if ((error = loop_prepare_hctxs(lo)) != 0)
   goto out_clr;

I've been told that linux kernel style is:
error = x();
if (error)

...
 @@ -1014,7 +951,7 @@ static int loop_clr_fd(struct loop_device *lo)
   lo-lo_state = Lo_rundown;
   spin_unlock_irq(lo-lo_lock);
 
 - kthread_stop(lo-lo_thread);
 + loop_unprepare_hctxs(lo, lo-tag_set.nr_hw_queues);
 
   spin_lock_irq(lo-lo_lock);
   lo-lo_backing_file = NULL;
...
 +
 +static int loop_prepare_flush_rq(void *data, struct request_queue
 *q,
 + struct request *flush_rq,
 + const struct request *src_rq)
 +{
 + /* borrow initialization helper for common rq */
 + loop_init_request(data, flush_rq, 0, -1, NUMA_NO_NODE);
 + return 0;
 +}

In patch 2/6 that function is called with:
if (orig_rq-q-mq_ops-prepare_flush_rq)
ret = orig_rq-q-mq_ops-prepare_flush_rq(
orig_rq-q-tag_set-driver_data,
orig_rq-q, flush_rq, orig_rq);


The src_rq argument is not used in this new function.
Do you anticipate it might be necessary in other drivers?

Is this new function allowed to modify *data, *flush_rq,
or *src_rq?  If not, const might be good to add.

If orig_rq is always passed, then this function could 
decode orig_rq-q and orig_rq-q-tag_set_driver_data
on its own, eliminating the need for the first two arguments.

Similarly, in the unprepare_flush_rq function in patch 2,
which is not provided by the loop driver in this patch:

+   if (q-mq_ops-unprepare_flush_rq)
+   q-mq_ops-unprepare_flush_rq(
+   q-tag_set-driver_data,
+   q, q-flush_rq);

if q is always passed, then that function could calculate
q-tag_set_driver_data and q-flush_rq itself, eliminating
two arguments.

 +
 +static int loop_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
 + unsigned int index)
 +{
 + hctx-driver_data = kmalloc(sizeof(struct loop_hctx_data),
 + GFP_KERNEL);

hctx-numa_node has been set before this; how about passing it 
along to kmalloc_node in case it has a useful value?

blk_mq_init_hw_queues sets it before calling this function:
node = hctx-numa_node;
if (node == NUMA_NO_NODE)
node = hctx-numa_node = set-numa_node;
...
   if (set-ops-init_hctx 
set-ops-init_hctx(hctx, set-driver_data, i))
break;