Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-29 Thread Paolo Valente

> On 29 Oct 2016, at 16:12, Jens Axboe wrote:
> 
> On 10/28/2016 11:38 PM, Paolo Valente wrote:
>> 
>>> On 26 Oct 2016, at 18:12, Jens Axboe wrote:
>>> 
>>> On 10/26/2016 10:04 AM, Paolo Valente wrote:
 
>>>>> On 26 Oct 2016, at 17:32, Jens Axboe wrote:
>>>>> 
>>>>> On 10/26/2016 09:29 AM, Christoph Hellwig wrote:
>>>>>> On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote:
>>>>>>> The question to ask first is whether to actually have pluggable
>>>>>>> schedulers on blk-mq at all, or just have one that is meant to
>>>>>>> do the right thing in every case (and possibly can be bypassed
>>>>>>> completely).
>>>>>> 
>>>>>> That would be my preference.  Have a BFQ-variant for blk-mq as an
>>>>>> option (default to off unless opted in by the driver or user), and
>>>>>> no other scheduler for blk-mq.  Don't bother with bfq for non
>>>>>> blk-mq.  It's not like there is any advantage in the legacy-request
>>>>>> device even for slow devices, except for the option of having I/O
>>>>>> scheduling.
>>>>> 
>>>>> It's the only right way forward. blk-mq might not offer any substantial
>>>>> advantages to rotating storage, but with scheduling, it won't offer a
>>>>> downside either. And it'll take us towards the real goal, which is to
>>>>> have just one IO path.
>>>> 
>>>> ok
>>>> 
>>>>> Adding a new scheduler for the legacy IO path
>>>>> makes no sense.
>>>> 
>>>> I would fully agree if effective and stable I/O scheduling were
>>>> available in blk-mq within one or two months.  But I guess that it
>>>> will take at least one year, optimistically, given the current status
>>>> of the needed infrastructure, and given the great difficulties of
>>>> doing effective scheduling at the high parallelism and extreme target
>>>> speeds of blk-mq.  Of course, this holds true unless only minimal
>>>> scheduling is performed.
>>>> 
>>>> So, what's the point in forcing a lot of users to wait another year
>>>> or more for a solution that has yet to be even defined, when they
>>>> could enjoy a much better system now, and then switch to an even
>>>> better system once scheduling is ready in blk-mq too?
>>> 
>>> That same argument could have been made 2 years ago. Saying no to a new
>>> scheduler for the legacy framework goes back roughly that long. We could
>>> have had BFQ for mq NOW, if we didn't keep coming back to this very
>>> point.
>>> 
>>> I'm hesitant to add a new scheduler because it's very easy to add, very
>>> difficult to get rid of. If we do add BFQ as a legacy scheduler now,
>>> it'll take us years and years to get rid of it again. We should be
>>> moving towards LESS moving parts in the legacy path, not more.
>>> 
>>> We can keep having this discussion every few years, but I think we'd
>>> both prefer to make some actual progress here.
>> 
>> ok Jens, I give up
>> 
>>> It's perfectly fine to
>>> add an interface for a single queue interface for an IO scheduler for
>>> blk-mq, since we don't care too much about scalability there. And that
>>> won't take years, that should be a few weeks. Retrofitting BFQ on top of
>>> that should not be hard either. That can co-exist with a real multiqueue
>>> scheduler as well, something that's geared towards some fairness for
>>> faster devices.
>>> 
>> 
>> AFAICT this solution is good, for many practical reasons.  I don't
>> have the expertise to build such an infrastructure well on my own.  At
>> least not in an acceptable amount of time, because working on this
>> nice stuff is unfortunately not my job (although Linaro is now
>> supporting me for BFQ).
>> 
>> Then, assuming that this solution may be of general interest, and that
>> BFQ benefits convinced you a little bit too, may I get significant
>> collaboration/help on implementing this infrastructure?
> 
> Of course, I already offered to help with this.
> 

Yep, I just did not want to take this important point for granted.

>> If so, Jens
>> and all possibly interested parties, could we have a sort of short
>> kick-off technical meeting during KS/LPC?
> 
> I'm not a huge fan of setting up a BoF to discuss something technical,
> when there's no code to discuss yet. We need some actual meat on the
> bone in the shape of code, and that's much better dealt with in email.
> Timing is pretty advanced at this point, otherwise I'd offer to cook
> something up that we COULD discuss, but I will not have time to do that
> for KS.
> 

Sorry, I was not thinking of any BoF or the like.  I just meant, with
a stuffy phrase, "let's get it started concretely". 

> If you are at LPC, why don't the two of us sit down and talk about it
> Wednesday or Thursday?

I'm also at KS.  I'm available from Sunday evening to Wednesday
evening.  I'm leaving on Thursday morning.  If Wednesday is in any
case your preferred day, then let's do it on Wednesday.  At what time?

If I understand correctly, Bart will join us too.

> I'd like to try and understand what parts of
> blk-mq you aren't up to speed on, and how we can best get a simple
> framework going that will allow us to entertain single queue scheduling
> within blk-mq.

Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-29 Thread Jens Axboe

On 10/28/2016 11:38 PM, Paolo Valente wrote:
> 
>> On 26 Oct 2016, at 18:12, Jens Axboe wrote:
>> 
>> On 10/26/2016 10:04 AM, Paolo Valente wrote:
>>> 
>>>> On 26 Oct 2016, at 17:32, Jens Axboe wrote:
>>>> 
>>>> On 10/26/2016 09:29 AM, Christoph Hellwig wrote:
>>>>> On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote:
>>>>>> The question to ask first is whether to actually have pluggable
>>>>>> schedulers on blk-mq at all, or just have one that is meant to
>>>>>> do the right thing in every case (and possibly can be bypassed
>>>>>> completely).
>>>>> 
>>>>> That would be my preference.  Have a BFQ-variant for blk-mq as an
>>>>> option (default to off unless opted in by the driver or user), and
>>>>> no other scheduler for blk-mq.  Don't bother with bfq for non
>>>>> blk-mq.  It's not like there is any advantage in the legacy-request
>>>>> device even for slow devices, except for the option of having I/O
>>>>> scheduling.
>>>> 
>>>> It's the only right way forward. blk-mq might not offer any substantial
>>>> advantages to rotating storage, but with scheduling, it won't offer a
>>>> downside either. And it'll take us towards the real goal, which is to
>>>> have just one IO path.
>>> 
>>> ok
>>> 
>>>> Adding a new scheduler for the legacy IO path
>>>> makes no sense.
>>> 
>>> I would fully agree if effective and stable I/O scheduling were
>>> available in blk-mq within one or two months.  But I guess that it
>>> will take at least one year, optimistically, given the current status
>>> of the needed infrastructure, and given the great difficulties of
>>> doing effective scheduling at the high parallelism and extreme target
>>> speeds of blk-mq.  Of course, this holds true unless only minimal
>>> scheduling is performed.
>>> 
>>> So, what's the point in forcing a lot of users to wait another year
>>> or more for a solution that has yet to be even defined, when they
>>> could enjoy a much better system now, and then switch to an even
>>> better system once scheduling is ready in blk-mq too?
>> 
>> That same argument could have been made 2 years ago. Saying no to a new
>> scheduler for the legacy framework goes back roughly that long. We could
>> have had BFQ for mq NOW, if we didn't keep coming back to this very
>> point.
>> 
>> I'm hesitant to add a new scheduler because it's very easy to add, very
>> difficult to get rid of. If we do add BFQ as a legacy scheduler now,
>> it'll take us years and years to get rid of it again. We should be
>> moving towards LESS moving parts in the legacy path, not more.
>> 
>> We can keep having this discussion every few years, but I think we'd
>> both prefer to make some actual progress here.
> 
> ok Jens, I give up
> 
>> It's perfectly fine to
>> add an interface for a single queue interface for an IO scheduler for
>> blk-mq, since we don't care too much about scalability there. And that
>> won't take years, that should be a few weeks. Retrofitting BFQ on top of
>> that should not be hard either. That can co-exist with a real multiqueue
>> scheduler as well, something that's geared towards some fairness for
>> faster devices.
> 
> AFAICT this solution is good, for many practical reasons.  I don't
> have the expertise to build such an infrastructure well on my own.  At
> least not in an acceptable amount of time, because working on this
> nice stuff is unfortunately not my job (although Linaro is now
> supporting me for BFQ).
> 
> Then, assuming that this solution may be of general interest, and that
> BFQ benefits convinced you a little bit too, may I get significant
> collaboration/help on implementing this infrastructure?

Of course, I already offered to help with this.

> If so, Jens
> and all possibly interested parties, could we have a sort of short
> kick-off technical meeting during KS/LPC?

I'm not a huge fan of setting up a BoF to discuss something technical,
when there's no code to discuss yet. We need some actual meat on the
bone in the shape of code, and that's much better dealt with in email.
Timing is pretty advanced at this point, otherwise I'd offer to cook
something up that we COULD discuss, but I will not have time to do that
for KS.

If you are at LPC, why don't the two of us sit down and talk about it
Wednesday or Thursday? I'd like to try and understand what parts of
blk-mq you aren't up to speed on, and how we can best get a simple
framework going that will allow us to entertain single queue scheduling
within blk-mq.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-29 Thread Bart Van Assche
On 10/28/16 22:38, Paolo Valente wrote:
> Then, assuming that this solution may be of general interest, and that
> BFQ benefits convinced you a little bit too, may I get significant
> collaboration/help on implementing this infrastructure?  If so, Jens
> and all possibly interested parties, could we have a sort of short
> kick-off technical meeting during KS/LPC?

Hello Paolo and Jens,

Please keep me in the loop for any communication about BFQ / blk-mq 
scheduling. My employer was kind enough to allow me to spend some of my time 
to work on this. I plan to attend the KS.

Bart.



Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Paolo Valente

> On 26 Oct 2016, at 18:12, Jens Axboe wrote:
> 
> On 10/26/2016 10:04 AM, Paolo Valente wrote:
>> 
>>> On 26 Oct 2016, at 17:32, Jens Axboe wrote:
>>> 
>>> On 10/26/2016 09:29 AM, Christoph Hellwig wrote:
>>>> On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote:
>>>>> The question to ask first is whether to actually have pluggable
>>>>> schedulers on blk-mq at all, or just have one that is meant to
>>>>> do the right thing in every case (and possibly can be bypassed
>>>>> completely).
>>>> 
>>>> That would be my preference.  Have a BFQ-variant for blk-mq as an
>>>> option (default to off unless opted in by the driver or user), and
>>>> no other scheduler for blk-mq.  Don't bother with bfq for non
>>>> blk-mq.  It's not like there is any advantage in the legacy-request
>>>> device even for slow devices, except for the option of having I/O
>>>> scheduling.
>>> 
>>> It's the only right way forward. blk-mq might not offer any substantial
>>> advantages to rotating storage, but with scheduling, it won't offer a
>>> downside either. And it'll take us towards the real goal, which is to
>>> have just one IO path.
>> 
>> ok
>> 
>>> Adding a new scheduler for the legacy IO path
>>> makes no sense.
>> 
>> I would fully agree if effective and stable I/O scheduling were
>> available in blk-mq within one or two months.  But I guess that it
>> will take at least one year, optimistically, given the current status
>> of the needed infrastructure, and given the great difficulties of
>> doing effective scheduling at the high parallelism and extreme target
>> speeds of blk-mq.  Of course, this holds true unless only minimal
>> scheduling is performed.
>> 
>> So, what's the point in forcing a lot of users to wait another year
>> or more for a solution that has yet to be even defined, when they
>> could enjoy a much better system now, and then switch to an even
>> better system once scheduling is ready in blk-mq too?
> 
> That same argument could have been made 2 years ago. Saying no to a new
> scheduler for the legacy framework goes back roughly that long. We could
> have had BFQ for mq NOW, if we didn't keep coming back to this very
> point.
> 
> I'm hesitant to add a new scheduler because it's very easy to add, very
> difficult to get rid of. If we do add BFQ as a legacy scheduler now,
> it'll take us years and years to get rid of it again. We should be
> moving towards LESS moving parts in the legacy path, not more.
> 
> We can keep having this discussion every few years, but I think we'd
> both prefer to make some actual progress here.

ok Jens, I give up

> It's perfectly fine to
> add an interface for a single queue interface for an IO scheduler for
> blk-mq, since we don't care too much about scalability there. And that
> won't take years, that should be a few weeks. Retrofitting BFQ on top of
> that should not be hard either. That can co-exist with a real multiqueue
> scheduler as well, something that's geared towards some fairness for
> faster devices.
> 

AFAICT this solution is good, for many practical reasons.  I don't
have the expertise to build such an infrastructure well on my own.  At
least not in an acceptable amount of time, because working on this
nice stuff is unfortunately not my job (although Linaro is now
supporting me for BFQ).

Then, assuming that this solution may be of general interest, and that
BFQ benefits convinced you a little bit too, may I get significant
collaboration/help on implementing this infrastructure?  If so, Jens
and all possibly interested parties, could we have a sort of short
kick-off technical meeting during KS/LPC?

Thanks,
Paolo

> -- 
> Jens Axboe



Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Linus Walleij
On Fri, Oct 28, 2016 at 5:29 PM, Christoph Hellwig  wrote:
> On Fri, Oct 28, 2016 at 11:32:21AM +0200, Linus Walleij wrote:
>> So I'm not just complaining by the way, I'm trying to fix this. Also
>> Bartlomiej from Samsung has done some stabs at switching MMC/SD
>> to blk-mq. I just rebased my latest stab at a naïve switch to blk-mq
>> to v4.9-rc2 with these results.
>>
>> The patch to enable MQ looks like this:
>> https://git.kernel.org/cgit/linux/kernel/git/linusw/linux-stericsson.git/commit/?h=mmc-mq&id=8f79b527e2e854071d8da019451da68d4753f71d
>>
>> I run these tests directly after boot with cold caches. The results
>> are consistent: I ran the same commands 10 times in a row.
>
> A couple comments from a quick look over the patch:
>
> In the changelog you complain:
>
> ". Lack of front- and back-end merging in the MQ block layer creating
> several small requests instead of a few large ones."
>
> In blk-mq, merging is controlled by the BLK_MQ_F_SHOULD_MERGE and
> BLK_MQ_F_SG_MERGE flags.  You set the former, but not the latter.
> BLK_MQ_F_SG_MERGE controls whether multiple physically contiguous pages get
> merged into a single segment.  For a dd after a fresh boot that is
> probably very common.  Except for the polarity of the merge flags the
> basic merge functionality between the legacy and blk-mq path should be
> the same, and if they aren't you've found a bug we need to address.

Aha OK I will make sure to set both flags next time. (I will also stop
guessing about that as a cause since that part probably works.)

> You also say that you disable the pipelining.  How much of a performance
> gain did this feature give when added? How much does just removing that
> on its own cost you?

Interestingly, the original commit doesn't say.
http://marc.info/?l=linaro-dev&m=137645684811479&w=2

How much is gained depends, however, on the cache architecture of the
machine: the heavier the cache flushes, the more it gains.

I guess I need to make a patch removing that mechanism to bench
it. It's pretty hard to get rid of because it goes really deep into the
MMC subsystem. It's massaged in like shampoo.

> While I think that features is rather messy and
> should be avoided if possible I don't see how it's impossible to
> implement in blk-mq.

It's probably possible. What I discussed with Arnd was to let
the blk-mq core call out to these pre-request and post-request
hooks on new requests in parallel with processing a request or
a queue of requests. I.e. add .prep_request() and .unprep_request()
callbacks to struct blk_mq_ops.

I tried to understand if the existing .init_request and .exit_request
callbacks could be used. But as I understand it they are only used
to allocate and prepare the extra per-request-associated memory
and state, and does not have access to the request per se,
so it doesn't know anything about the actual request when
.init_request() is called.

So we're looking for something called whenever the contents of
a request are done, right before queueing it, and right after
dequeueing it after being served.

>  If you just increase your queue depth and use
> the old scheme you should get it - if you currently can't handle the
> second command for some reason (i.e. the special request magic) you
> can just return BLK_MQ_RQ_QUEUE_BUSY from the queue_rq function.

Bartlomiej's patch set did that, but I haven't been able to reproduce it.

I will try to make a clean patch in the spirit of his.

Yours,
Linus Walleij


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Linus Walleij
On Fri, Oct 28, 2016 at 4:22 PM, Jens Axboe  wrote:
> On 10/28/2016 03:32 AM, Linus Walleij wrote:
>>
>> This is without using Bartlomiej's clever hack to pretend we have
>> 2 elements in the HW queue though. His early tests indicate that
>> it doesn't help much: the performance regression we see is due to
>> lack of block scheduling.
>
> A simple dd test, I don't see how that can be slower due to lack of
> scheduling. There's nothing to schedule there, just issue them in order?

Yeah, I guess you're right; it could be due in part to not having
activated front- and back-end merges properly, as Christoph pointed
out. I'll look closer at this.

> So that would probably be where I would start looking. A blktrace of the
> in-kernel code and the blk-mq enabled code would perhaps be
> enlightening. I don't think it's worth looking at the more complex test
> cases until the dd test case is at least as fast as the non-mq version.

Yeah.

> Was that with CFQ, btw, or what scheduler did it run?

CFQ, just plain defconfig.

> It'd be nice to NOT have to rely on that fake QD=2 setup, since it will
> mess with the IO scheduling as well.

I agree.

>> I try to find a way forward with this, and also massage the MMC/SD
>> code to be more MQ friendly to begin with (like only pick requests
>> when we get a request notification and stop pulling NULL requests
>> off the queue) but it's really a messy piece of code.
>
> Yeah, it does look pretty messy... I'd be happy to help out with that,
> and particularly in figuring out why the direct conversion is slower for
> a basic 'dd' test case.

I'm looking into it.

Yours,
Linus Walleij


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Mark Brown
On Fri, Oct 28, 2016 at 08:17:01AM -0600, Jens Axboe wrote:
> On 10/28/2016 12:36 AM, Ulf Hansson wrote:

> > You have been pushing Paolo in different directions throughout the
> > years with his work in BFQ, wasting lots of his time/effort.

> I have not. Various entities have advised Paolo to approach it in various ways.
> We've had blk-mq for 3 years now, my position should have been pretty clear
> on that.

Having come to this somewhat late I have to say that that hasn't been
100% clear as a set opinion from everyone - in the time I've been
following things there's been engagement about the meat of the code
which gave the impression the patches were being seriously considered.

But like I said in a previous mail this is all in the past anyway, we
need to focus on the present situation.




Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Arnd Bergmann
On Friday, October 28, 2016 9:30:07 AM CEST Jens Axboe wrote:
> On 10/28/2016 03:32 AM, Linus Walleij wrote:
> > The patch to enable MQ looks like this:
> > https://git.kernel.org/cgit/linux/kernel/git/linusw/linux-stericsson.git/commit/?h=mmc-mq&id=8f79b527e2e854071d8da019451da68d4753f71d
> 
> BTW, another viable "hack" for the depth issue would be to expose more
> than one hardware queue. It's meant to map to a distinct submission
> region in the hardware, but there's nothing stopping the driver from
> using it differently. Might not be cleaner than just increasing the
> queue depth on a single queue, though.
> 
> That still won't solve the issue of lying about it and causing IO
> scheduler confusion, of course.
> 
> Also, 4.8 and newer have support for BLK_MQ_F_BLOCKING, if you need to
> block in ->queue_rq(). That could eliminate the need to offload to a
> kthread manually.

I think the main reason for the kthread is that on ARM and other
architectures, the dma mapping operations are fairly slow (for
cache flushes or bounce buffering) and we want to minimize the
time between subsequent requests being handled by the hardware.

This is not unique to MMC in any way, MMC just happens to be
common on ARM and it is limited by its lack of hardware
command queuing.
It would be nice to do a similar trick for SCSI disks,
especially USB mass storage, maybe also SATA, which are the
next most common storage devices on non-coherent ARM systems
(SATA nowadays often comes with NCQ, so it's less of an
issue).

It may be reasonable to tie this in with the I/O scheduler:
if you don't have a scheduler, the access to the device is
probably rather direct and you want to avoid any complexity
in the kernel, but if preparing a request is expensive
and the hardware has no queuing, you probably also want to
use a scheduler.

We should probably also try to understand how this could
work out with USB mass storage, if there is a solution at
all, and then do it for MMC in a way that would work on
both. I don't think the USB core can currently split the
dma_map_sg() operation from the USB command submission,
so this may require some deeper surgery there.

Arnd


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Bartlomiej Zolnierkiewicz

Hi,

On Friday, October 28, 2016 09:30:07 AM Jens Axboe wrote:
> On 10/28/2016 03:32 AM, Linus Walleij wrote:
> > The patch to enable MQ looks like this:
> > https://git.kernel.org/cgit/linux/kernel/git/linusw/linux-stericsson.git/commit/?h=mmc-mq&id=8f79b527e2e854071d8da019451da68d4753f71d
> 
> BTW, another viable "hack" for the depth issue would be to expose more
> than one hardware queue. It's meant to map to a distinct submission
> region in the hardware, but there's nothing stopping the driver from
> using it differently. Might not be cleaner than just increasing the
> queue depth on a single queue, though.

Yes, I'm already considering this for rewritten version of my
patch set as it may also help with performance when compared to
non blk-mq case.

Significant amount of time is spent on DMA map/unmap operations
on ARM MMC hosts and I would like to do these DMA (un)mapping-s
in parallel for two (or more) requests to check whether it helps
the performance (hopefully the cache controller doesn't serialize
these operations).

BTW I'm following the discussion and still would like to help with
getting blk-mq work for MMC.  I'm just quite busy with other things
at the moment.

> That still won't solve the issue of lying about it and causing IO
> scheduler confusion, of course.
> 
> Also, 4.8 and newer have support for BLK_MQ_F_BLOCKING, if you need to
> block in ->queue_rq(). That could eliminate the need to offload to a
> kthread manually.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics



Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Jens Axboe

On 10/28/2016 03:32 AM, Linus Walleij wrote:
> The patch to enable MQ looks like this:
> https://git.kernel.org/cgit/linux/kernel/git/linusw/linux-stericsson.git/commit/?h=mmc-mq&id=8f79b527e2e854071d8da019451da68d4753f71d


BTW, another viable "hack" for the depth issue would be to expose more
than one hardware queue. It's meant to map to a distinct submission
region in the hardware, but there's nothing stopping the driver from
using it differently. Might not be cleaner than just increasing the
queue depth on a single queue, though.

That still won't solve the issue of lying about it and causing IO
scheduler confusion, of course.

Also, 4.8 and newer have support for BLK_MQ_F_BLOCKING, if you need to
block in ->queue_rq(). That could eliminate the need to offload to a
kthread manually.

--
Jens Axboe



Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Jens Axboe

On 10/28/2016 12:36 AM, Ulf Hansson wrote:
> [...]
> 
>>> Moreover, I am still trying to understand what's the big deal to why
>>> you say no to BFQ as a legacy scheduler. Ideally it shouldn't cause
>>> you any maintenance burden and it doesn't make the removal of the
>>> legacy blk layer any more difficult, right?
>> 
>> Not sure I can state it much clearer. It's a new scheduler, and a
>> complicated one at that. It WILL carry a maintenance burden. And I'm
> 
> Really? Either you maintain the code or not. And if Paolo would do it,
> then you are off the hook!

Are you trying to be deliberately obtuse? If so, good job. I'd advise
you to look into how code in the kernel is maintained in general. A
maintenance burden exists for code A, but it also carries over to the
subsystem it is under, and the kernel in general. Adding code is never free.

>> really not that interested in adding such a burden for something that
>> will be defunct as soon as the single queue blk-mq version is done.
>> Additionally, if we put BFQ in right now, the motivation to do the real
>> work will be gone.
> 
> You have been pushing Paolo in different directions throughout the
> years with his work in BFQ, wasting lots of his time/effort.

I have not. Various entities have advised Paolo to approach it in
various ways. We've had blk-mq for 3 years now, my position should have
been pretty clear on that.

> You have not given him any credibility for his work in BFQ and now you
> point him yet in another direction.

I don't even know what that means. But I'm not pointing him in a new
direction.

Ulf, I'm done discussing with you. I've made my position clear, yet you
continue to beat on a dead horse. As far as I'm concerned, there's
nothing further to discuss here. I'll be happy to discuss when there's
some meat on the bone (i.e. code). Until then, EOD.


--
Jens Axboe



Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Richard Weinberger
On Fri, Oct 28, 2016 at 2:07 PM, Arnd Bergmann  wrote:
>> > I don't think that's an accurate statement. In terms of coverage, most
>> > drivers do support blk-mq. Anything SCSI, nvme, virtio-blk, SATA runs on
>> > (or can run on) top of blk-mq.
>>
>> Well, I just used "git grep" and found that many drivers didn't use
>> blkmq. Apologize if I gave the wrong impressions.
>
> To clarify, this seems to be a complete list:
>
> $ git grep -wl '\(__\|\)blk_\(fetch\|end\|start\)_request' | xargs grep -L blk_mq
> Documentation/scsi/scsi_eh.txt
> arch/um/drivers/ubd_kern.c

AFAICT Daniel looked at the UML block driver and did an initial
conversion some time ago.
Daniel?
Anton is also working on a patch series to speed up the driver.
Maybe it is time to bite the bullet and do the conversion.

-- 
Thanks,
//richard


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Arnd Bergmann
On Thursday, October 27, 2016 8:13:08 PM CEST Ulf Hansson wrote:
> On 27 October 2016 at 19:43, Jens Axboe  wrote:
> > On 10/27/2016 11:32 AM, Ulf Hansson wrote:
> >>
> >> [...]
> >>
> >>>
> >>> I'm hesistant to add a new scheduler because it's very easy to add, very
> >>> difficult to get rid of. If we do add BFQ as a legacy scheduler now,
> >>> it'll take us years and years to get rid of it again. We should be
> >>> moving towards LESS moving parts in the legacy path, not more.
> >>
> >>
> >> Jens, I think you are wrong here and let me try to elaborate on why.
> >>
> >> 1)
> >> We already have legacy schedulers like CFQ, DEADLINE, etc - and most
> >> block device drivers are still using the legacy blk interface.
> >
> >
> > I don't think that's an accurate statement. In terms of coverage, most
> > drivers do support blk-mq. Anything SCSI, nvme, virtio-blk, SATA runs on
> > (or can run on) top of blk-mq.
> 
> Well, I just used "git grep" and found that many drivers didn't use
> blkmq. Apologize if I gave the wrong impressions.

To clarify, this seems to be a complete list:

$ git grep -wl '\(__\|\)blk_\(fetch\|end\|start\)_request' | xargs grep -L blk_mq
Documentation/scsi/scsi_eh.txt
arch/um/drivers/ubd_kern.c
block/blk-tag.c
block/bsg-lib.c
drivers/block/DAC960.c
drivers/block/amiflop.c
drivers/block/aoe/aoeblk.c
drivers/block/aoe/aoecmd.c
drivers/block/aoe/aoedev.c
drivers/block/ataflop.c
drivers/block/cciss.c
drivers/block/floppy.c
drivers/block/hd.c
drivers/block/mg_disk.c
drivers/block/osdblk.c
drivers/block/paride/pcd.c
drivers/block/paride/pd.c
drivers/block/paride/pf.c
drivers/block/ps3disk.c
drivers/block/skd_main.c
drivers/block/sunvdc.c
drivers/block/swim.c
drivers/block/swim3.c
drivers/block/sx8.c
drivers/block/xsysace.c
drivers/block/z2ram.c
drivers/cdrom/gdrom.c
drivers/ide/ide-atapi.c
drivers/ide/ide-io.c
drivers/ide/ide-pm.c
drivers/memstick/core/ms_block.c
drivers/memstick/core/mspro_block.c
drivers/mmc/card/block.c
drivers/mmc/card/queue.c
drivers/mtd/mtd_blkdevs.c
drivers/s390/block/dasd.c
drivers/s390/block/scm_blk.c
drivers/sbus/char/jsflash.c
drivers/scsi/osd/osd_initiator.c
drivers/scsi/scsi_transport_fc.c
drivers/scsi/scsi_transport_sas.c
samples/bpf/tracex3_kern.c

From what I can tell, most of these are hopelessly obsolete, but
there are some notable exceptions: aoe, osdblk, skd, sunvdc, mtdblk,
mmc, dasd and scm. I've never used any of the first four, but the
last four of the list are certainly important (for very different
reasons).
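
For readers unfamiliar with the pipeline: `git grep -wl` lists files that
match the legacy request API, and the trailing `grep -L` inverts the second
match, keeping only files with no `blk_mq` hit at all. A toy reproduction of
that two-stage filter (in-memory contents standing in for a kernel tree):

```python
import re

# Stage 1 selects files that call the legacy request API; stage 2 keeps
# those that never mention blk_mq. In-memory contents stand in for a tree.
files = {
    "converted.c": "blk_end_request();\nblk_mq_init_queue();\n",
    "legacy.c": "blk_fetch_request();\n",
    "other.c": "int main(void) { return 0; }\n",
}

legacy_api = re.compile(r"\b(__)?blk_(fetch|end|start)_request\b")

# git grep -wl: file names with at least one whole-word legacy-API match
stage1 = [name for name, text in files.items() if legacy_api.search(text)]
# xargs grep -L blk_mq: of those, names with no blk_mq hit at all
unconverted = [name for name in stage1 if "blk_mq" not in files[name]]

print(unconverted)  # -> ['legacy.c']
```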

Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Linus Walleij
On Fri, Oct 28, 2016 at 12:27 AM, Linus Walleij
 wrote:
> On Thu, Oct 27, 2016 at 11:08 PM, Jens Axboe  wrote:
>
>> blk-mq has evolved to support a variety of devices, there's nothing
>> special about mmc that can't work well within that framework.
>
> There is. Read mmc_queue_thread() in drivers/mmc/card/queue.c

So I'm not just complaining, by the way; I'm trying to fix this. Also,
Bartlomiej from Samsung has taken some stabs at switching MMC/SD
to blk-mq. I just rebased my latest stab at a naïve switch to blk-mq
onto v4.9-rc2, with these results.

The patch to enable MQ looks like this:
https://git.kernel.org/cgit/linux/kernel/git/linusw/linux-stericsson.git/commit/?h=mmc-mq&id=8f79b527e2e854071d8da019451da68d4753f71d

I run these tests directly after boot with cold caches. The results
are consistent: I ran the same commands 10 times in a row.


BEFORE switching to BLK-MQ (clean v4.9-rc2):

time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 47.781464 seconds, 21.4MB/s
real    0m 47.79s
user    0m 0.02s
sys     0m 9.35s

mount /dev/mmcblk0p1 /mnt/
cd /mnt/
time find . > /dev/null
real    0m 3.60s
user    0m 0.25s
sys     0m 1.58s

mount /dev/mmcblk0p1 /mnt/
iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
(kBytes/second)
                                                           random    random
       kB  reclen    write  rewrite     read   reread       read     write
    20480       4     2112     2157     6052     6060       6025        40
    20480       8     4820     5074     9163     9121       9125        81
    20480      16     5755     5242    12317    12320      12280       165
    20480      32     6176     6261    14981    14987      14962       336
    20480      64     6547     5875    16826    16828      16810       692
    20480     128     6762     6828    17899    17896      17896      1408
    20480     256     6802     6871    16960    17513      18373      3048
    20480     512     7220     7252    18675    18746      18741      7228
    20480    1024     7222     7304    18436    17858      18246      7322
    20480    2048     7316     7398    18744    18751      18526      7419
    20480    4096     7520     7636    20774    20995      20703      7609
    20480    8192     7519     7704    21850    21489      21467      7663
    20480   16384     7395     7782    22399    22210      22215      7781


AFTER switching to BLK-MQ:

time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.0GB) copied, 60.551117 seconds, 16.9MB/s
real    1m 0.56s
user    0m 0.02s
sys     0m 9.81s

mount /dev/mmcblk0p1 /mnt/
cd /mnt/
time find . > /dev/null
real    0m 4.42s
user    0m 0.24s
sys     0m 1.81s

mount /dev/mmcblk0p1 /mnt/
iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
(kBytes/second)
                                                           random    random
       kB  reclen    write  rewrite     read   reread       read     write
    20480       4     2086     2201     6024     6036       6006        40
    20480       8     4812     5036     8014     9121       9090        82
    20480      16     5432     5633    12267     9776      12212       168
    20480      32     6180     6233    14870    14891      14852       340
    20480      64     6382     5454    16744    16771      16746       702
    20480     128     6761     6776    17816    17846      17836      1394
    20480     256     6828     6842    17789    17895      17094      3084
    20480     512     7158     7222    17957    17681      17698      7232
    20480    1024     7215     7274    18642    17679      18031      7300
    20480    2048     7229     7269    17943    18642      17732      7358
    20480    4096     7212     7360    18272    18157      18889      7371
    20480    8192     7008     7271    18632    18707      18225      7282
    20480   16384     6889     7211    18243    18429      18018      7246


A simple dd read test of 1 GB is consistently 10+ seconds
slower with MQ. find in the rootfs is a second slower.
iozone results show consistently lower or equal throughput.

This is without using Bartlomiej's clever hack to pretend we have
2 elements in the HW queue though. His early tests indicate that
it doesn't help much: the performance regression we see is due to
lack of block scheduling.
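
The headline dd numbers above can be put in relative terms; a small sketch
that just recomputes throughput and slowdown from the figures quoted in this
email (no new measurements):

```python
# Quantify the dd regression reported above: 1 GiB sequential read,
# clean v4.9-rc2 vs the naive blk-mq conversion.
BYTES = 1073741824          # 1 GiB, as printed by dd

legacy_s = 47.781464        # clean v4.9-rc2
blkmq_s = 60.551117         # naive blk-mq conversion

# dd's "21.4MB/s" is MiB-based; these are decimal MB/s for comparison.
legacy_mbs = BYTES / legacy_s / 1e6
blkmq_mbs = BYTES / blkmq_s / 1e6
slowdown = (blkmq_s - legacy_s) / legacy_s * 100

print(f"legacy: {legacy_mbs:.1f} MB/s, blk-mq: {blkmq_mbs:.1f} MB/s")
print(f"blk-mq is {slowdown:.0f}% slower on this sequential read")
```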

I'm trying to find a way forward with this, and also to massage the MMC/SD
code to be more MQ-friendly to begin with (like only picking requests
when we get a request notification, and not pulling NULL requests
off the queue), but it's really a messy piece of code.

Yours,
Linus Walleij


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Jan Kara
On Thu 27-10-16 10:26:18, Jens Axboe wrote:
> On 10/27/2016 03:26 AM, Jan Kara wrote:
> >On Wed 26-10-16 10:12:38, Jens Axboe wrote:
> >>On 10/26/2016 10:04 AM, Paolo Valente wrote:
> >>>
> Il giorno 26 ott 2016, alle ore 17:32, Jens Axboe  ha 
> scritto:
> 
> On 10/26/2016 09:29 AM, Christoph Hellwig wrote:
> >On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote:
> >>The question to ask first is whether to actually have pluggable
> >>schedulers on blk-mq at all, or just have one that is meant to
> >>do the right thing in every case (and possibly can be bypassed
> >>completely).
> >
> >That would be my preference.  Have a BFQ-variant for blk-mq as an
> >option (default to off unless opted in by the driver or user), and
> >not other scheduler for blk-mq.  Don't bother with bfq for non
> >blk-mq.  It's not like there is any advantage in the legacy-request
> >device even for slow devices, except for the option of having I/O
> >scheduling.
> 
> It's the only right way forward. blk-mq might not offer any substantial
> advantages to rotating storage, but with scheduling, it won't offer a
> downside either. And it'll take us towards the real goal, which is to
> have just one IO path.
> >>>
> >>>ok
> >>>
> Adding a new scheduler for the legacy IO path
> makes no sense.
> >>>
> >>>I would fully agree if effective and stable I/O scheduling would be
> >>>available in blk-mq in one or two months.  But I guess that it will
> >>>take at least one year optimistically, given the current status of the
> >>>needed infrastructure, and given the great difficulties of doing
> >>>effective scheduling at the high parallelism and extreme target speeds
> >>>of blk-mq.  Of course, this holds true unless little clever scheduling
> >>>is performed.
> >>>
> >>>So, what's the point in forcing a lot of users wait another year or
> >>>more, for a solution that has yet to be even defined, while they could
> >>>enjoy a much better system, and then switch an even better system when
> >>>scheduling is ready in blk-mq too?
> >>
> >>That same argument could have been made 2 years ago. Saying no to a new
> >>scheduler for the legacy framework goes back roughly that long. We could
> >>have had BFQ for mq NOW, if we didn't keep coming back to this very
> >>point.
> >>
> >>I'm hesitant to add a new scheduler because it's very easy to add, very
> >>difficult to get rid of. If we do add BFQ as a legacy scheduler now,
> >>it'll take us years and years to get rid of it again. We should be
> >>moving towards LESS moving parts in the legacy path, not more.
> >>
> >>We can keep having this discussion every few years, but I think we'd
> >>both prefer to make some actual progress here. It's perfectly fine to
> >>add an interface for a single queue interface for an IO scheduler for
> >>blk-mq, since we don't care too much about scalability there. And that
> >>won't take years, that should be a few weeks. Retrofitting BFQ on top of
> >>that should not be hard either. That can co-exist with a real multiqueue
> >>scheduler as well, something that's geared towards some fairness for
> >>faster devices.
> >
> >OK, so some solution like having a variant of blk_sq_make_request() that
> >will consume requests, do IO scheduling decisions on them, and feed them
> >into the HW queue as it sees fit would be acceptable? That will provide the
> >IO scheduler a global view that it needs for complex scheduling decisions
> >so it should indeed be relatively easy to port BFQ to work like that.
> 
> I'd probably start off Omar's base [1] that switches the software queues
> to store bios instead of requests, since that lifts the restriction of the 1:1
> mapping between what we can queue up and what we can dispatch. Without
> that, the IO scheduler won't have too much to work with. And with that
> in place, it'll be a "bio in, request out" type of setup, which is
> similar to what we have in the legacy path.
>
> I'd keep the software queues, but as a starting point, mandate 1
> hardware queue to keep that as the per-device view of the state. The IO
> scheduler would be responsible for moving one or more bios from the
> software queues to the hardware queue, when they are ready to dispatch.
> 
> [1] 
> https://github.com/osandov/linux/commit/8ef3508628b6cf7c4712cd3d8084ee11ef5d2530

Yeah, but what would software queues actually be good for on a single
queue device with device-global IO scheduling? The IO scheduler doing
complex decisions will keep all the bios / requests in a single structure
anyway, so there's no scalability to gain from per-cpu software queues...
So you can directly consume bios in your ->make_request handler, place them
in IO scheduler structures, and then push requests out to the HW queue in
response to HW tags getting freed (i.e. IO completion). No need
for intermediate software queues. But maybe I miss something.
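
As a toy model of the dispatch discipline described here — consume bios into
one device-global scheduler structure, and push to the hardware queue only
when a HW tag frees up — the following plain-Python sketch may help. All
class, method, and variable names are made up; this resembles nothing in the
real block-layer API:

```python
from collections import deque

class ToySingleQueueScheduler:
    """Toy model: one structure holds all pending bios; dispatch to the
    HW queue is gated by free HW tags, and completion drives dispatch."""

    def __init__(self, hw_tags: int):
        self.pending = deque()        # the single, device-global structure
        self.free_tags = hw_tags      # HW queue depth
        self.inflight = {}            # tag -> bio currently on the device

    def make_request(self, bio):
        """Consume a bio (the ->make_request role), then try to dispatch."""
        self.pending.append(bio)
        self._dispatch()

    def complete(self, tag):
        """An IO completion frees a tag, which drives further dispatch."""
        del self.inflight[tag]
        self.free_tags += 1
        self._dispatch()

    def _dispatch(self):
        # A real scheduler would reorder/merge here; FIFO keeps it simple.
        while self.free_tags and self.pending:
            self.free_tags -= 1
            tag = next(t for t in range(len(self.inflight) + 1)
                       if t not in self.inflight)
            self.inflight[tag] = self.pending.popleft()

sched = ToySingleQueueScheduler(hw_tags=2)
for bio in ["bio0", "bio1", "bio2", "bio3"]:
    sched.make_request(bio)
# Only 2 tags: bio2/bio3 wait in the scheduler until completions free tags.
assert sorted(sched.inflight.values()) == ["bio0", "bio1"]
sched.complete(0)
assert "bio2" in sched.inflight.values()
```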

 

Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-28 Thread Ulf Hansson
[...]

>
>> Moreover, I am still trying to understand what the big deal is with
>> saying no to BFQ as a legacy scheduler. Ideally it shouldn't cause
>> you any maintenance burden, and it doesn't make the removal of the
>> legacy blk layer any more difficult, right?
>
>
> Not sure I can state it much clearer. It's a new scheduler, and a
> complicated one at that. It WILL carry a maintenance burden. And I'm

Really? Either you maintain the code or not. And if Paolo would do it,
then you are off the hook!

> really not that interested in adding such a burden for something that
> will be defunct as soon as the single queue blk-mq version is done.
> Additionally, if we put BFQ in right now, the motivation to do the real
> work will be gone.

You have been pushing Paolo in different directions throughout the
years with his work in BFQ, wasting lots of his time/effort.

You have not given him any credit for his work on BFQ, and now you
point him in yet another direction.

I understand Paolo is a very persistent, hard-working guy, most likely
because he is really confident about his work on BFQ - and he should be!

But, regarding motivation: if you continue to push him in different
directions without giving him any credit, then at some point you
probably know what will happen.

>
> The path forward is clear. It'd be a lot better to put some work behind
> that, rather than continue this email thread.

Yes, it seems so!

Kind regards
Ulf Hansson


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-27 Thread Jens Axboe

On 10/27/2016 01:34 PM, Ulf Hansson wrote:

[...]


Instead, what I can tell you is that we have been looking into converting
mmc (which I maintain), and that is indeed a significant amount of work.
We will need to rip out all of the mmc request management, and most
likely we will also need to extend the blkmq interface, to be able to
re-implement all the current request optimizations. We are looking
into this, but it just takes time.



It's usually as much work as you make it; for most cases it's
pretty straightforward, and usually removes more code than it adds.
Hence the end result is better for it as well - less code in a driver is
better.


From a scalability and maintenance point of view, converting to blkmq
makes perfect sense.

Although, personally, I don't want to sacrifice performance (or at
least only very little), just for the sake of gaining
scalability/maintainability.


Nobody has said anything about sacrificing performance. And whether you
like it or not, maintainability is always the most important aspect.
Even performance takes a backseat to maintainability.


I would rather strive to adapt the blkmq framework to also suit my
needs. That simply takes more time.

For example, in the mmc case we have implemented an asynchronous
request path, which greatly improves performance on some systems.


blk-mq has evolved to support a variety of devices, there's nothing
special about mmc that can't work well within that framework.


3)
While we work on scheduling in blkmq (at least for single queue
devices), it's of course important that we set high goals. Having BFQ
(and the other schedulers) in the legacy blk, provides a good
reference for what we could aim for.




Sure, but you don't need BFQ to be included in the kernel for that.



Perhaps not.

But does that mean, you expect Paolo to maintain an up to date BFQ
tree for you?



I don't expect anything. If Paolo or others want to compare with BFQ on
the legacy IO path, then they can do that however way they want. If you
(and others) want to have that reference point, it's up to you how to
accomplish that.


Do I get this right? You personally don't care about using BFQ as
reference when evolving blkmq for single queue devices?

Paolo and lots of other Linux users certainly do care about this.


I'm getting a little tired of this putting words in my mouth... That is
not what I'm saying at all. What I'm saying is that the people working
on BFQ can do what they need to do to have a reference implementation to
compare against. You don't need BFQ in the kernel for that. I said it's
up to YOU, with the you here meaning the people that want to work on it,
how that goes down.


Moreover, I am still trying to understand what the big deal is with
saying no to BFQ as a legacy scheduler. Ideally it shouldn't cause
you any maintenance burden, and it doesn't make the removal of the
legacy blk layer any more difficult, right?


Not sure I can state it much clearer. It's a new scheduler, and a
complicated one at that. It WILL carry a maintenance burden. And I'm
really not that interested in adding such a burden for something that
will be defunct as soon as the single queue blk-mq version is done.
Additionally, if we put BFQ in right now, the motivation to do the real
work will be gone.

The path forward is clear. It'd be a lot better to put some work behind
that, rather than continue this email thread.

--
Jens Axboe



Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-27 Thread Christoph Hellwig
On Thu, Oct 27, 2016 at 08:41:27PM +0100, Mark Brown wrote:
> Plus the benchmarking to verify that it works well of course, especially
> initially where it'll also be a new queue infrastructure as well as the
> blk-mq conversion itself.  It does feel like something that's going to
> take at least a couple of kernel releases to get through.

Or to put it the other way around: it could have been done long ago
if people had started the first time it was suggested. Instead you guys
keep arguing and nothing gets done. Get started now; waiting won't
make anything go faster.

> I think there's also value in having improvements there for people who
> benefit from them while queue infrastructure for blk-mq is being worked
> on.  

Well, apply it to your vendor tree then and maintain it yourself if you
disagree with our direction.


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-27 Thread Mark Brown
On Thu, Oct 27, 2016 at 12:21:06PM -0600, Jens Axboe wrote:
> On 10/27/2016 12:13 PM, Ulf Hansson wrote:

> > I can imagine that it's not always a straightforward "convert to
> > blk-mq" patch for every block device driver.

> Well, I've actually done a few conversions, and it's not difficult at
> all. The grunt of the work is usually around converting to using some of
> the blk-mq features for parts of the driver that it had implemented
> privately, like timeout handling, etc.

Plus the benchmarking to verify that it works well of course, especially
initially where it'll also be a new queue infrastructure as well as the
blk-mq conversion itself.  It does feel like something that's going to
take at least a couple of kernel releases to get through.

> > > > 3)
> > > > While we work on scheduling in blkmq (at least for single queue
> > > > devices), it's of course important that we set high goals. Having BFQ
> > > > (and the other schedulers) in the legacy blk, provides a good
> > > > reference for what we could aim for.

> > > Sure, but you don't need BFQ to be included in the kernel for that.

> > Perhaps not.

> > But does that mean, you expect Paolo to maintain an up to date BFQ
> > tree for you?

> I don't expect anything. If Paolo or others want to compare with BFQ on
> the legacy IO path, then they can do that however way they want. If you
> (and others) want to have that reference point, it's up to you how to
> accomplish that.

I think there's also value in having improvements there for people who
benefit from them while queue infrastructure for blk-mq is being worked
on.  


signature.asc
Description: PGP signature


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-27 Thread Ulf Hansson
[...]

>> Instead, what I can tell you is that we have been looking into converting
>> mmc (which I maintain), and that is indeed a significant amount of work.
>> We will need to rip out all of the mmc request management, and most
>> likely we will also need to extend the blkmq interface, to be able to
>> re-implement all the current request optimizations. We are looking
>> into this, but it just takes time.
>
>
> It's usually as much work as you make it into, for most cases it's
> pretty straight forward and usually removes more code than it adds.
> Hence the end result is better for it as well - less code in a driver is
> better.

From a scalability and maintenance point of view, converting to blkmq
makes perfect sense.

Although, personally, I don't want to sacrifice performance (or at
least only very little), just for the sake of gaining
scalability/maintainability.

I would rather strive to adapt the blkmq framework to also suit my
needs. That simply takes more time.

For example, in the mmc case we have implemented an asynchronous
request path, which greatly improves performance on some systems.
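
The benefit of such an asynchronous path is essentially pipelining:
preparing request N+1 (sg setup, DMA mapping) while request N is on the
wire. A back-of-the-envelope model, with made-up illustrative costs rather
than measured MMC numbers:

```python
# Toy pipelining model for an asynchronous request path: overlap the
# CPU-side preparation of the next request with the transfer of the
# current one. Costs below are illustrative, not measured.
def synchronous_total(n_reqs, prep_us, xfer_us):
    # prepare, transfer, prepare, transfer, ... strictly in sequence
    return n_reqs * (prep_us + xfer_us)

def asynchronous_total(n_reqs, prep_us, xfer_us):
    # Only the first prep is exposed; every later prep hides under a
    # transfer (assuming prep_us <= xfer_us, so prep never stalls).
    return prep_us + n_reqs * xfer_us

sync_t = synchronous_total(100, prep_us=50, xfer_us=400)
async_t = asynchronous_total(100, prep_us=50, xfer_us=400)
print(f"sync: {sync_t} us, async: {async_t} us, "
      f"saved: {100 * (sync_t - async_t) / sync_t:.1f}%")
```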

>
>> I can imagine that it's not always a straightforward "convert to
>> blk-mq" patch for every block device driver.
>
>
> Well, I've actually done a few conversions, and it's not difficult at
> all. The grunt of the work is usually around converting to using some of
> the blk-mq features for parts of the driver that it had implemented
> privately, like timeout handling, etc.
>
> I'm always happy to help people with converting drivers.

Great, we'll ping you if we need some help! Thanks!

>
 3)
 While we work on scheduling in blkmq (at least for single queue
 devices), it's of course important that we set high goals. Having BFQ
 (and the other schedulers) in the legacy blk, provides a good
 reference for what we could aim for.
>>>
>>>
>>>
>>> Sure, but you don't need BFQ to be included in the kernel for that.
>>
>>
>> Perhaps not.
>>
>> But does that mean, you expect Paolo to maintain an up to date BFQ
>> tree for you?
>
>
> I don't expect anything. If Paolo or others want to compare with BFQ on
> the legacy IO path, then they can do that however way they want. If you
> (and others) want to have that reference point, it's up to you how to
> accomplish that.

Do I get this right? You personally don't care about using BFQ as
reference when evolving blkmq for single queue devices?

Paolo and lots of other Linux users certainly do care about this.

Moreover, I am still trying to understand what the big deal is with
saying no to BFQ as a legacy scheduler. Ideally it shouldn't cause
you any maintenance burden, and it doesn't make the removal of the
legacy blk layer any more difficult, right?

Kind regards
Ulf Hansson


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-27 Thread Jens Axboe

On 10/27/2016 11:32 AM, Ulf Hansson wrote:

[...]



I'm hesitant to add a new scheduler because it's very easy to add, very
difficult to get rid of. If we do add BFQ as a legacy scheduler now,
it'll take us years and years to get rid of it again. We should be
moving towards LESS moving parts in the legacy path, not more.


Jens, I think you are wrong here and let me try to elaborate on why.

1)
We already have legacy schedulers like CFQ, DEADLINE, etc - and most
block device drivers are still using the legacy blk interface.


I don't think that's an accurate statement. In terms of coverage, most
drivers do support blk-mq. Anything SCSI, nvme, virtio-blk, SATA runs on
(or can run on) top of blk-mq.


To be able to remove the legacy blk layer, all block device drivers
must be converted to blkmq - of course.


That's a given.


So to reach that goal, we will not only need to evolve blkmq to allow
scheduling (at least for single queue devices), but we also need to
convert *all* block device drivers to blkmq. For sure this will take
*years* and not months.


Correct.


More important, when the transition to blkmq has been completed, then
there is absolutely no difference (from effort point of view) in
removing the legacy blk layer - no matter if we have BFQ in there or
not.

I do understand if you have concerns from a maintenance point of view, as
I assume you would rather focus on evolving blkmq than care about
legacy blk code. So, would it help if Paolo volunteers to maintain the
BFQ code in the meantime?


We're obviously still maintaining the legacy IO path. But we don't want
to actively develop it, and we haven't, for a long time.

And Paolo maintaining it is a strict requirement for inclusion, legacy
or blk-mq aside. That would go for both. I'd never accept a major
feature from an individual or company if they weren't willing and
capable of maintaining it. Throwing submissions over the wall is not
viable.


2)
While we work on evolving blkmq and converting block device drivers to
it, BFQ could, as a separate legacy scheduler, help *lots* of Linux
users get a significantly improved experience. Should we really
prevent them from that? I think you block maintainer guys really need
to consider this fact.


You still seem to be basing that assumption on the notion that we have
to convert tons of drivers for BFQ to make sense under the blk-mq
umbrella. That's not the case.


3)
While we work on scheduling in blkmq (at least for single queue
devices), it's of course important that we set high goals. Having BFQ
(and the other schedulers) in the legacy blk, provides a good
reference for what we could aim for.


Sure, but you don't need BFQ to be included in the kernel for that.


We can keep having this discussion every few years, but I think we'd
both prefer to make some actual progress here. It's perfectly fine to
add an interface for a single queue interface for an IO scheduler for
blk-mq, since we don't care too much about scalability there. And that
won't take years, that should be a few weeks. Retrofitting BFQ on top of
that should not be hard either. That can co-exist with a real multiqueue
scheduler as well, something that's geared towards some fairness for
faster devices.


That's really great news!

I hope we get a possibility to meet and discuss the plans for this at
Kernel summit/Linux Plumbers the next week!


I'll be there.

--
Jens Axboe



Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-27 Thread Ulf Hansson
[...]

>
> I'm hesitant to add a new scheduler because it's very easy to add, very
> difficult to get rid of. If we do add BFQ as a legacy scheduler now,
> it'll take us years and years to get rid of it again. We should be
> moving towards LESS moving parts in the legacy path, not more.

Jens, I think you are wrong here and let me try to elaborate on why.

1)
We already have legacy schedulers like CFQ, DEADLINE, etc - and most
block device drivers are still using the legacy blk interface.

To be able to remove the legacy blk layer, all block device drivers
must be converted to blkmq - of course.

So to reach that goal, we will not only need to evolve blkmq to allow
scheduling (at least for single queue devices), but we will also need to
convert *all* block device drivers to blkmq. For sure this will take
*years*, not months.

More importantly, when the transition to blkmq has been completed,
there is absolutely no difference (from an effort point of view) in
removing the legacy blk layer - no matter whether we have BFQ in there or
not.

I do understand if you have concerns from a maintenance point of view, as
I assume you would rather focus on evolving blkmq than care about
legacy blk code. So, would it help if Paolo volunteers to maintain the
BFQ code in the meantime?

2)
While we work on evolving blkmq and converting block device drivers to
it, BFQ could, as a separate legacy scheduler, help *lots* of Linux
users get a significantly improved experience. Should we really
prevent them from that? I think you block maintainer guys really need
to consider this fact.

3)
While we work on scheduling in blkmq (at least for single queue
devices), it's of course important that we set high goals. Having BFQ
(and the other schedulers) in the legacy blk, provides a good
reference for what we could aim for.

>
> We can keep having this discussion every few years, but I think we'd
> both prefer to make some actual progress here. It's perfectly fine to
> add an interface for a single queue interface for an IO scheduler for
> blk-mq, since we don't care too much about scalability there. And that
> won't take years, that should be a few weeks. Retrofitting BFQ on top of
> that should not be hard either. That can co-exist with a real multiqueue
> scheduler as well, something that's geared towards some fairness for
> faster devices.

That's really great news!

I hope we get a possibility to meet and discuss the plans for this at
Kernel summit/Linux Plumbers the next week!

>
> --
> Jens Axboe

Kind regards
Ulf Hansson


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-27 Thread Jens Axboe

On 10/27/2016 03:26 AM, Jan Kara wrote:

On Wed 26-10-16 10:12:38, Jens Axboe wrote:

On 10/26/2016 10:04 AM, Paolo Valente wrote:



Il giorno 26 ott 2016, alle ore 17:32, Jens Axboe  ha scritto:

On 10/26/2016 09:29 AM, Christoph Hellwig wrote:

On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote:

The question to ask first is whether to actually have pluggable
schedulers on blk-mq at all, or just have one that is meant to
do the right thing in every case (and possibly can be bypassed
completely).


That would be my preference.  Have a BFQ-variant for blk-mq as an
option (default to off unless opted in by the driver or user), and
not other scheduler for blk-mq.  Don't bother with bfq for non
blk-mq.  It's not like there is any advantage in the legacy-request
device even for slow devices, except for the option of having I/O
scheduling.


It's the only right way forward. blk-mq might not offer any substantial
advantages to rotating storage, but with scheduling, it won't offer a
downside either. And it'll take us towards the real goal, which is to
have just one IO path.


ok


Adding a new scheduler for the legacy IO path
makes no sense.


I would fully agree if effective and stable I/O scheduling would be
available in blk-mq in one or two months.  But I guess that it will
take at least one year optimistically, given the current status of the
needed infrastructure, and given the great difficulties of doing
effective scheduling at the high parallelism and extreme target speeds
of blk-mq.  Of course, this holds true unless little clever scheduling
is performed.

So, what's the point in forcing a lot of users wait another year or
more, for a solution that has yet to be even defined, while they could
enjoy a much better system, and then switch an even better system when
scheduling is ready in blk-mq too?


That same argument could have been made 2 years ago. Saying no to a new
scheduler for the legacy framework goes back roughly that long. We could
have had BFQ for mq NOW, if we didn't keep coming back to this very
point.

I'm hesitant to add a new scheduler because it's very easy to add, very
difficult to get rid of. If we do add BFQ as a legacy scheduler now,
it'll take us years and years to get rid of it again. We should be
moving towards LESS moving parts in the legacy path, not more.

We can keep having this discussion every few years, but I think we'd
both prefer to make some actual progress here. It's perfectly fine to
add an interface for a single queue interface for an IO scheduler for
blk-mq, since we don't care too much about scalability there. And that
won't take years, that should be a few weeks. Retrofitting BFQ on top of
that should not be hard either. That can co-exist with a real multiqueue
scheduler as well, something that's geared towards some fairness for
faster devices.


OK, so some solution like having a variant of blk_sq_make_request() that
will consume requests, do IO scheduling decisions on them, and feed them
into the HW queue as it sees fit would be acceptable? That will provide the
IO scheduler a global view that it needs for complex scheduling decisions
so it should indeed be relatively easy to port BFQ to work like that.


I'd probably start off Omar's base [1] that switches the software queues
to store bios instead of requests, since that lifts the restriction of the 1:1
mapping between what we can queue up and what we can dispatch. Without
that, the IO scheduler won't have too much to work with. And with that
in place, it'll be a "bio in, request out" type of setup, which is
similar to what we have in the legacy path.

I'd keep the software queues, but as a starting point, mandate 1
hardware queue to keep that as the per-device view of the state. The IO
scheduler would be responsible for moving one or more bios from the
software queues to the hardware queue, when they are ready to dispatch.

[1] 
https://github.com/osandov/linux/commit/8ef3508628b6cf7c4712cd3d8084ee11ef5d2530


--
Jens Axboe



Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-27 Thread Jens Axboe

On 10/27/2016 08:34 AM, Grozdan wrote:


Hello,

Let me first say that I'm in no way associated with Paolo Valente or
any other BFQ developer. I'm a mere user who has had a great experience
using BFQ.

My workload is one that takes my disks to their limits. I often use
large files like raw Blu-ray streams, which I then remux to MKVs while
at the same time streaming at least 2 movies to various devices in the
house and using my system as I normally do while the remuxing process
is going on. At times, I'm also pushing video files to my NAS at close
to Gbps speed while the stuff I mentioned is in progress.

My experience with BFQ is that it has never resulted in the video
streams being interrupted due to disk thrashing. I've extensively used
all the other Linux disk schedulers in the past, and what I've observed
is that whenever I start the remuxing (and copying) process, the
streams begin to hiccup and stutter, and often multi-second-long
"waits" occur. It gets even worse: when I do this kind of workload,
the whole system comes almost to a halt and interactivity goes out the
window. It becomes impossible to start an app in a reasonable amount of
time. Loading a visited website makes Chrome hang while trying to get
the contents from its cache, etc.

BFQ has greatly helped me have a responsive system during such
operations and, as I said, I have never experienced any interruption of
the video streams. Do I think BFQ is the best 

Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-27 Thread Grozdan

Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-27 Thread Jan Kara

OK, so some solution like having a variant of blk_sq_make_request() that
consumes requests, makes IO scheduling decisions on them, and feeds them
into the HW queue as it sees fit would be acceptable? That would provide the
IO scheduler with the global view it needs for complex scheduling decisions,
so it should indeed be relatively easy to port BFQ to work like that.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-26 Thread Jens Axboe

On 10/26/2016 10:04 AM, Paolo Valente wrote:



On 26 Oct 2016, at 17:32, Jens Axboe  wrote:

On 10/26/2016 09:29 AM, Christoph Hellwig wrote:

On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote:

The question to ask first is whether to actually have pluggable
schedulers on blk-mq at all, or just have one that is meant to
do the right thing in every case (and possibly can be bypassed
completely).


That would be my preference.  Have a BFQ-variant for blk-mq as an
option (default to off unless opted in by the driver or user), and
no other scheduler for blk-mq.  Don't bother with bfq for
non-blk-mq.  It's not like there is any advantage in the legacy-request
device even for slow devices, except for the option of having I/O
scheduling.


It's the only right way forward. blk-mq might not offer any substantial
advantages to rotating storage, but with scheduling, it won't offer a
downside either. And it'll take us towards the real goal, which is to
have just one IO path.


ok


Adding a new scheduler for the legacy IO path
makes no sense.


I would fully agree if effective and stable I/O scheduling were
available in blk-mq in one or two months.  But I guess that it will
take at least one year, optimistically, given the current status of the
needed infrastructure, and given the great difficulty of doing
effective scheduling at the high parallelism and extreme target speeds
of blk-mq.  Of course, this holds true unless only simplistic
scheduling is performed.

So, what's the point in forcing a lot of users to wait another year or
more, for a solution that has yet to be even defined, while they could
enjoy a much better system now, and then switch to an even better system
when scheduling is ready in blk-mq too?


That same argument could have been made 2 years ago. Saying no to a new
scheduler for the legacy framework goes back roughly that long. We could
have had BFQ for mq NOW, if we didn't keep coming back to this very
point.

I'm hesitant to add a new scheduler because it's very easy to add, very
difficult to get rid of. If we do add BFQ as a legacy scheduler now,
it'll take us years and years to get rid of it again. We should be
moving towards LESS moving parts in the legacy path, not more.

We can keep having this discussion every few years, but I think we'd
both prefer to make some actual progress here. It's perfectly fine to
add a single-queue interface for an IO scheduler for
blk-mq, since we don't care too much about scalability there. And that
won't take years, that should be a few weeks. Retrofitting BFQ on top of
that should not be hard either. That can co-exist with a real multiqueue
scheduler as well, something that's geared towards some fairness for
faster devices.

--
Jens Axboe



Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-26 Thread Jens Axboe

On 10/26/2016 09:29 AM, Christoph Hellwig wrote:

On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote:

The question to ask first is whether to actually have pluggable
schedulers on blk-mq at all, or just have one that is meant to
do the right thing in every case (and possibly can be bypassed
completely).


That would be my preference.  Have a BFQ-variant for blk-mq as an
option (default to off unless opted in by the driver or user), and
no other scheduler for blk-mq.  Don't bother with bfq for
non-blk-mq.  It's not like there is any advantage in the legacy-request
device even for slow devices, except for the option of having I/O
scheduling.


It's the only right way forward. blk-mq might not offer any substantial
advantages to rotating storage, but with scheduling, it won't offer a
downside either. And it'll take us towards the real goal, which is to
have just one IO path. Adding a new scheduler for the legacy IO path
makes no sense. Adding one for blk-mq and phasing out the old path is
what we need to do.

--
Jens Axboe



Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-26 Thread Christoph Hellwig
On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote:
> The question to ask first is whether to actually have pluggable
> schedulers on blk-mq at all, or just have one that is meant to
> do the right thing in every case (and possibly can be bypassed
> completely).

That would be my preference.  Have a BFQ-variant for blk-mq as an
option (default to off unless opted in by the driver or user), and
no other scheduler for blk-mq.  Don't bother with bfq for
non-blk-mq.  It's not like there is any advantage in the legacy-request
device even for slow devices, except for the option of having I/O
scheduling.


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-26 Thread Arnd Bergmann
On Wednesday, October 26, 2016 8:05:11 AM CEST Bart Van Assche wrote:
> On 10/26/2016 04:34 AM, Jan Kara wrote:
> > On Wed 26-10-16 03:19:03, Christoph Hellwig wrote:
> >> Just as last time:
> >>
> >> big NAK for introducing giant new infrastructure like a new I/O scheduler
> >> for the legacy request structure.
> >>
> >> Please direct your engergy towards blk-mq instead.
> >
> > Christoph, we will probably talk about this next week, but IMO rotating
> > disks and SATA-based SSDs are going to stay with us for another 15 years,
> > likely more. For them blk-mq is no win, while relatively complex IO
> > scheduling like that of CFQ or BFQ is a big win in some cases. So I think
> > IO scheduling (and thus a place for something like BFQ) is going to stay
> > with us for quite a long time still. So are we going to add hooks in
> > blk-mq to support full-blown IO scheduling at least for single-queue
> > devices? Or how else do we want to support that HW?
> 
> Hello Jan,
> 
> Having two versions (one for non-blk-mq, one for blk-mq) of every I/O 
> scheduler would be a maintenance nightmare. Has anyone already analyzed 
> whether it would be possible to come up with an API for I/O schedulers 
> that makes it possible to use the same I/O scheduler for both blk-mq and 
> the traditional block layer?

The question to ask first is whether to actually have pluggable
schedulers on blk-mq at all, or just have one that is meant to
do the right thing in every case (and possibly can be bypassed
completely).

Arnd


Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-26 Thread Christoph Hellwig
Just as last time:

big NAK for introducing giant new infrastructure like a new I/O scheduler
for the legacy request structure.

Please direct your energy towards blk-mq instead.


[PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-26 Thread Paolo Valente
Hi,
this new patch series turns back to the initial approach, i.e., it
adds BFQ as an extra scheduler, instead of replacing CFQ with
BFQ. This patch series also contains all the improvements and bug
fixes recommended by Tejun [5], plus new features of BFQ-v8r5. Details
about old and new features in patch descriptions.

The first version of BFQ was submitted a few years ago [1]. It is
denoted as v0 in this patchset, to distinguish it from the version I
am submitting now, v8r5. In particular, the first two patches
introduce BFQ-v0, whereas the remaining patches progressively turn
BFQ-v0 into BFQ-v8r5.

Some patches generate WARNINGS with checkpatch.pl, but these WARNINGS
seem to be either unavoidable for the pieces of code involved (which
the patches just extend), or false positives.

For your convenience, a slightly updated and extended description of
BFQ follows.

On average CPUs, the current version of BFQ can handle devices
performing at most ~30K IOPS; at most ~50K IOPS on faster CPUs. These
are about the same limits as CFQ's. There may be room for noticeable
improvement of these limits but, given the overall limitations of blk
itself, I decided not to further delay this new submission.

Here are some nice features of BFQ-v8r5.

Low latency for interactive applications

Regardless of the actual background workload, BFQ guarantees that, for
interactive tasks, the storage device is virtually as responsive as if
it were idle. For example, even if one or more of the following
background workloads are being executed:
- one or more large files are being read, written or copied,
- a tree of source files is being compiled,
- one or more virtual machines are performing I/O,
- a software update is in progress,
- indexing daemons are scanning filesystems and updating their
  databases,
starting an application or loading a file from within an application
takes about the same time as if the storage device were idle. By
comparison, with CFQ, NOOP or DEADLINE, and under the same conditions,
applications experience high latencies, or even become unresponsive
until the background workload terminates (also on SSDs).

Low latency for soft real-time applications

Soft real-time applications, such as audio and video
players/streamers, also enjoy low latency and a low drop rate,
regardless of the background I/O workload. As a consequence, these
applications suffer from almost no glitches due to the background
workload.

Higher speed for code-development tasks

If some additional workload happens to be executed in parallel, then
BFQ executes the I/O-related components of typical code-development
tasks (compilation, checkout, merge, ...) much more quickly than CFQ,
NOOP or DEADLINE.

High throughput

On hard disks, BFQ achieves up to 30% higher throughput than CFQ, and
up to 150% higher throughput than DEADLINE and NOOP, with all the
sequential workloads considered in our tests. With random workloads,
and with all the workloads on flash-based devices, BFQ achieves,
instead, about the same throughput as the other schedulers.

Strong fairness, bandwidth and delay guarantees

BFQ distributes the device throughput, and not just the device time,
among I/O-bound applications in proportion to their weights, with any
workload and regardless of the device parameters. From these bandwidth
guarantees, it is possible to compute tight per-I/O-request delay
guarantees via a simple formula. If not configured for strict service
guarantees, BFQ switches to time-based resource sharing (only) for
applications that would otherwise cause a throughput loss.
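As an illustrative sketch (not BFQ's actual internals), "distributing throughput in proportion to weights" means an I/O-bound application i with weight w_i receives the fraction w_i / sum_j(w_j) of the device bandwidth; the function name below is made up:

```python
def proportional_shares(weights, device_bw):
    """Split the device bandwidth among applications in proportion to
    their weights: share_i = device_bw * w_i / sum(w)."""
    total = sum(weights.values())
    return {app: device_bw * w / total for app, w in weights.items()}

# Example: a database with weight 300 and a backup job with weight 100
# sharing a device that sustains 200 MB/s.
shares = proportional_shares({"db": 300, "backup": 100}, 200)
# shares == {"db": 150.0, "backup": 50.0}
```

From such a bandwidth share, a per-request delay bound follows directly: a request of size S issued by application i is served within roughly S divided by that application's share, independently of what the other applications do.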


BFQ achieves the above service properties thanks to the combination of
its accurate scheduling engine (patches 1-2), and a set of simple
heuristics and improvements (patches 3-14). Details on how BFQ and
its components work are provided in the descriptions of the
patches. In addition, an organic description of the main BFQ algorithm
and of most of its features can be found in this paper [2].

What BFQ can do in practice is shown, e.g., in this 8-minute demo with
an SSD: [3]. I made this demo with an older version of BFQ (v7r6) and
under Linux 3.17.0, but, for the tests considered in the demo,
performance has remained about the same with more recent BFQ and
kernel versions. More details about this point can be found here [4],
together with graphs showing the performance of BFQ, as compared with
CFQ, DEADLINE and NOOP, and on: a fast and a slow hard disk, a RAID1,
an SSD, a microSDHC Card and an eMMC. As an example, our results on
the SSD are also reported in a table at the end of this email.

Finally, as for testing in everyday use, BFQ is the default I/O
scheduler in, e.g., Mageia, Manjaro, Sabayon, OpenMandriva and Arch
Linux ARM, plus several kernel forks for PCs and smartphones. In
addition, BFQ is optionally available in, e.g., Arch, PCLinuxOS and
Gentoo, and we record several downloads a day from people using other
distributions. The feedback received so far