Re: [RFC] blk-mq and I/O scheduling

2015-11-30 Thread Andreas Herrmann
On Wed, Nov 25, 2015 at 12:47:21PM -0700, Jens Axboe wrote:
> On 11/19/2015 05:02 AM, Andreas Herrmann wrote:

 --8<--

> >The latter helped to improve performance for sequential reads and
> >writes. But it's not on a par with CFQ. Increasing the time slice is
> >suboptimal (as shown with the 2ms results, see below). It might be
> >possible to get better performance by further reducing the initial
> >time slice and adapting it upward to a maximum value when there are
> >repeatedly pending requests for a CPU.
> >
> >After these observations, and assuming that non-rotational devices are
> >most likely fine using blk-mq without I/O scheduling support, I wonder
> >whether
> >
> >- it's really a good idea to re-implement scheduling support for
> >   blk-mq that eventually behaves like CFQ for rotational devices.
> >
> >- it's technically possible to support both blk-mq and CFQ for different
> >   devices on the same host adapter. This would allow using "good old"
> >   code for "good old" rotational devices. (But this might not be a
> >   choice if in the long run a goal is to get rid of non-blk-mq code --
> >   not sure what the plans are.)
> >
> >What do you think about this?
> 
> Sorry I did not get around to properly looking at this this week,
> I'll tend to it next week. I think the concept of tying the idling
> to a specific CPU is likely fine, though I wonder if there are cases
> where we preempt more heavily and subsequently miss breaking the
> idling properly. I don't think we want/need cfq for blk-mq, but
> basic idling could potentially be enough. That's still a far cry
> from a full cfq implementation. The long term plans are still to
> move away from the legacy IO path, though with things like
> scheduling, that's sure to take some time.

FYI, I plan to send an updated patch later today.

I've slightly changed it to allow specifying the time slice in µs
(instead of ms) and to extend a software queue's slice when requests
from that software queue were actually put into the hardware queue.
This improved performance a little.

> That is actually where the mq-deadline work comes in. The
> mq-deadline work is missing a test patch to limit tag allocations,
> and a bunch of other little things to truly make it functional.
> There might be some options for folding it all together, with
> idling, as that would still be important on rotating storage going
> forward.


Thanks for your comments,

Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] blk-mq and I/O scheduling

2015-11-30 Thread Andreas Herrmann
On Tue, Nov 24, 2015 at 09:19:32AM +0100, Christoph Hellwig wrote:
> Hi Andreas,

Hi Christoph,

> I don't understand the time slicing algorithm too well, but from the
> blk-mq integration perspective this looks nice, and anything that
> helps improve blk-mq for spinning rust is useful.

I'll put a description and comments in the next patch version that
hopefully explain it.

> As a nitpick, some of the larger "if (use_time_slice)" blocks should
> be moved into separate helper functions.

I'll address this as well.


Thanks,

Andreas


Re: [RFC] blk-mq and I/O scheduling

2015-11-25 Thread Jens Axboe

On 11/19/2015 05:02 AM, Andreas Herrmann wrote:

Hi,

I've looked into blk-mq and possible support for I/O scheduling.

The reason for this is to minimize performance degradation with
rotational devices when scsi_mod.use_blk_mq=1 is switched on.

I think the degradation is well reflected in fio measurements. With
an increasing number of jobs you'll encounter a significant
performance drop for sequential reads and writes with blk-mq in
contrast to CFQ. blk-mq ensures that requests from different processes
(CPUs) are "perfectly shuffled" in a hardware queue. This is no
problem for the non-rotational devices that blk-mq is aimed at, but
not so nice for rotational disks.

   (i) I've done some tests with patch c2ed2f2dcf92 (blk-mq: first cut
   deadline scheduling) from the mq-deadline branch of the linux-block
   repository. I've not seen a significant performance impact when
   enabling it (neither for non-rotational nor for rotational
   disks).

  (ii) I've played with code to enable sorting/merging of requests. I
   did this in flush_busy_ctxs. This didn't have a performance
   impact either. On closer look this was due to the high frequency
   of calls to __blk_mq_run_hw_queue. There was almost nothing to
   sort (too few requests). I guess that's also the reason why (i)
   didn't have much impact.

(iii) With CFQ I've observed performance patterns similar to blk-mq's
   when slice_idle was set to 0.

  (iv) I thought about introducing a per-software-queue time slice
   during which blk-mq will service only one software queue (one
   CPU) and not flush all software queues. This could help to
   enqueue multiple requests belonging to the same process (as long
   as it runs on the same CPU) into a hardware queue. A minimal patch
   to implement this is attached below.

The latter helped to improve performance for sequential reads and
writes. But it's not on a par with CFQ. Increasing the time slice is
suboptimal (as shown with the 2ms results, see below). It might be
possible to get better performance by further reducing the initial
time slice and adapting it upward to a maximum value when there are
repeatedly pending requests for a CPU.

After these observations, and assuming that non-rotational devices are
most likely fine using blk-mq without I/O scheduling support, I wonder
whether

- it's really a good idea to re-implement scheduling support for
   blk-mq that eventually behaves like CFQ for rotational devices.

- it's technically possible to support both blk-mq and CFQ for different
   devices on the same host adapter. This would allow using "good old"
   code for "good old" rotational devices. (But this might not be a
   choice if in the long run a goal is to get rid of non-blk-mq code --
   not sure what the plans are.)

What do you think about this?


Sorry I did not get around to properly looking at this this week, I'll 
tend to it next week. I think the concept of tying the idling to a 
specific CPU is likely fine, though I wonder if there are cases where we 
preempt more heavily and subsequently miss breaking the idling properly. 
I don't think we want/need cfq for blk-mq, but basic idling could 
potentially be enough. That's still a far cry from a full cfq 
implementation. The long term plans are still to move away from the 
legacy IO path, though with things like scheduling, that's sure to take 
some time.


That is actually where the mq-deadline work comes in. The mq-deadline 
work is missing a test patch to limit tag allocations, and a bunch of 
other little things to truly make it functional. There might be some 
options for folding it all together, with idling, as that would still be 
important on rotating storage going forward.


--
Jens Axboe



Re: [RFC] blk-mq and I/O scheduling

2015-11-24 Thread Christoph Hellwig
Hi Andreas,

I don't understand the time slicing algorithm too well, but from the
blk-mq integration perspective this looks nice, and anything that
helps improve blk-mq for spinning rust is useful.

As a nitpick, some of the larger "if (use_time_slice)" blocks should
be moved into separate helper functions.
