Re: [patch] cfq-iosched: fix incorrect filing of rt async cfqq

2015-01-21 Thread Jens Axboe
On 01/12/2015 01:21 PM, Jeff Moyer wrote:
> Hi,
> 
> If you can manage to submit an async write as the first async I/O from
> the context of a process with realtime scheduling priority, then a
> cfq_queue is allocated, but filed into the wrong async_cfqq bucket.  It
> ends up in the best effort array, but actually has realtime I/O
> scheduling priority set in cfqq->ioprio.

Applied, thanks Jeff. I've added a stable tag to it as well.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




Re: [patch] cfq-iosched: fix incorrect filing of rt async cfqq

2015-01-20 Thread Hidehiro Kawai
Hi,

Thank you for the patch, Jeff!

(2015/01/13 5:21), Jeff Moyer wrote:
> Hi,
>
> If you can manage to submit an async write as the first async I/O from
> the context of a process with realtime scheduling priority, then a
> cfq_queue is allocated, but filed into the wrong async_cfqq bucket.  It
> ends up in the best effort array, but actually has realtime I/O
> scheduling priority set in cfqq->ioprio.

Actually, this bug causes severe I/O starvation in the following scenario,
which we have experienced:

1. An RT daemon issues an O_SYNC write to an ext2 filesystem early in the
   boot sequence (one way to make the first async write to a device come
   from an RT process context)

2. Heavy buffered writes are then directed at the device (these writes
   get treated as RT class)

3. A sync I/O issued to the device then takes tens of seconds to
   complete, starved behind the many RT class I/Os

I confirmed Jeff's patch solves the above problem.

Tested-by: Hidehiro Kawai 

>
> The reason is that cfq_get_queue assumes the default scheduling class and
> priority when there is no information present (i.e. when the async cfqq
> is created):
>
> static struct cfq_queue *
> cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
> struct bio *bio, gfp_t gfp_mask)
> {
>   const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
>   const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
>
> cic->ioprio starts out as 0, which is "invalid".  So, class of 0
> (IOPRIO_CLASS_NONE) is passed to cfq_async_queue_prio like so:
>
>   async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);
>
> static struct cfq_queue **
> cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio)
> {
> 	switch (ioprio_class) {
> 	case IOPRIO_CLASS_RT:
> 		return &cfqd->async_cfqq[0][ioprio];
> 	case IOPRIO_CLASS_NONE:
> 		ioprio = IOPRIO_NORM;
> 		/* fall through */
> 	case IOPRIO_CLASS_BE:
> 		return &cfqd->async_cfqq[1][ioprio];
> 	case IOPRIO_CLASS_IDLE:
> 		return &cfqd->async_idle_cfqq;
> 	default:
> 		BUG();
> 	}
> }
>
> Here, instead of returning a class mapped from the process' scheduling
> priority, we get back the bucket associated with IOPRIO_CLASS_BE.
>
> Now, there is no queue allocated there yet, so we create it:
>
>   cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask);
>
> That function ends up doing this:
>
>   cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync);
>   cfq_init_prio_data(cfqq, cic);
>
> cfq_init_cfqq marks the priority as having changed.  Then,
> cfq_init_prio_data does this:
>
> 	ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
> 	switch (ioprio_class) {
> 	default:
> 		printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class);
> 	case IOPRIO_CLASS_NONE:
> 		/*
> 		 * no prio set, inherit CPU scheduling settings
> 		 */
> 		cfqq->ioprio = task_nice_ioprio(tsk);
> 		cfqq->ioprio_class = task_nice_ioclass(tsk);
> 		break;
>
> So we basically have two code paths that treat IOPRIO_CLASS_NONE
> differently, which results in an RT async cfqq filed into a best effort
> bucket.
>
> Attached is a patch which fixes the problem.  I'm not sure how to make
> it cleaner.  Suggestions would be welcome.
>
> Signed-off-by: Jeff Moyer 
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 6f2751d..b9abdca 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -3656,12 +3656,17 @@ static struct cfq_queue *
>  cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
> struct bio *bio, gfp_t gfp_mask)
>  {
> - const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
> - const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
> + int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
> + int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
>   struct cfq_queue **async_cfqq = NULL;
>   struct cfq_queue *cfqq = NULL;
>  
>   if (!is_sync) {
> + if (!ioprio_valid(cic->ioprio)) {
> + struct task_struct *tsk = current;
> + ioprio = task_nice_ioprio(tsk);
> + ioprio_class = task_nice_ioclass(tsk);
> + }
>   async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);
>   cfqq = *async_cfqq;
>   }
>
>


--
Hidehiro Kawai
Hitachi, Yokohama Research Laboratory


[patch] cfq-iosched: fix incorrect filing of rt async cfqq

2015-01-12 Thread Jeff Moyer
Hi,

If you can manage to submit an async write as the first async I/O from
the context of a process with realtime scheduling priority, then a
cfq_queue is allocated, but filed into the wrong async_cfqq bucket.  It
ends up in the best effort array, but actually has realtime I/O
scheduling priority set in cfqq->ioprio.

The reason is that cfq_get_queue assumes the default scheduling class and
priority when there is no information present (i.e. when the async cfqq
is created):

static struct cfq_queue *
cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
  struct bio *bio, gfp_t gfp_mask)
{
const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);

cic->ioprio starts out as 0, which is "invalid".  So, class of 0
(IOPRIO_CLASS_NONE) is passed to cfq_async_queue_prio like so:

async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);

static struct cfq_queue **
cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio)
{
	switch (ioprio_class) {
	case IOPRIO_CLASS_RT:
		return &cfqd->async_cfqq[0][ioprio];
	case IOPRIO_CLASS_NONE:
		ioprio = IOPRIO_NORM;
		/* fall through */
	case IOPRIO_CLASS_BE:
		return &cfqd->async_cfqq[1][ioprio];
	case IOPRIO_CLASS_IDLE:
		return &cfqd->async_idle_cfqq;
	default:
		BUG();
	}
}

Here, instead of returning a class mapped from the process' scheduling
priority, we get back the bucket associated with IOPRIO_CLASS_BE.

Now, there is no queue allocated there yet, so we create it:

cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask);

That function ends up doing this:

cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync);
cfq_init_prio_data(cfqq, cic);

cfq_init_cfqq marks the priority as having changed.  Then,
cfq_init_prio_data does this:

	ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
	switch (ioprio_class) {
	default:
		printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class);
	case IOPRIO_CLASS_NONE:
		/*
		 * no prio set, inherit CPU scheduling settings
		 */
		cfqq->ioprio = task_nice_ioprio(tsk);
		cfqq->ioprio_class = task_nice_ioclass(tsk);
		break;

So we basically have two code paths that treat IOPRIO_CLASS_NONE
differently, which results in an RT async cfqq filed into a best effort
bucket.

Attached is a patch which fixes the problem.  I'm not sure how to make
it cleaner.  Suggestions would be welcome.

Signed-off-by: Jeff Moyer 

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 6f2751d..b9abdca 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3656,12 +3656,17 @@ static struct cfq_queue *
 cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
  struct bio *bio, gfp_t gfp_mask)
 {
-   const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
-   const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
+   int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
+   int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
struct cfq_queue **async_cfqq = NULL;
struct cfq_queue *cfqq = NULL;
 
if (!is_sync) {
+   if (!ioprio_valid(cic->ioprio)) {
+   struct task_struct *tsk = current;
+   ioprio = task_nice_ioprio(tsk);
+   ioprio_class = task_nice_ioclass(tsk);
+   }
async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);
cfqq = *async_cfqq;
}