Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-30 Thread Tom Seewald
> I'm not convinced it's the domain of an IO scheduler to be involved,
> rather than it being explicit UX intended by the desktop environment.
> Seems to me the desktop environment is in a better position to know
> what users expect.

Well, wouldn't bfq just be enforcing the bandwidth weights, if any, that were 
explicitly set in the various groups? If something is already creating and 
modifying control groups, then that something should have total control over 
setting the bandwidth weights. It's not obvious to me how the IO scheduler 
would be bypassing or otherwise ignoring whatever manages control groups.

It's also not clear to me how anything except an IO scheduler would be able to 
directly control how device bandwidth is shared.
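To make the mechanics concrete: with bfq active on a device, cgroup-v2 exposes a per-group weight file that whatever manages the groups can set. A minimal sketch of doing it by hand (requires root; the group name "demo" is hypothetical, and this assumes a cgroup-v2 hierarchy mounted at /sys/fs/cgroup):

```shell
# Create a control group and halve its IO weight relative to the
# default of 100 ("demo" is an illustrative name).
mkdir /sys/fs/cgroup/demo
echo 50 > /sys/fs/cgroup/demo/io.bfq.weight
# Move the current shell into the group so its IO inherits the weight.
echo $$ > /sys/fs/cgroup/demo/cgroup.procs
```

bfq then shares the device's bandwidth among active groups in proportion to these weights, which is exactly the "enforcing" role described above.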
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-30 Thread Chris Murphy
On Tue, Jun 30, 2020 at 6:02 PM Tom Seewald  wrote:
>
> I forgot to mention that bfq appears to be the only IO scheduler that 
> supports cgroups-v2 IO controllers [1]. Perhaps I am wrong, but I wasn't able 
> to find documentation indicating that mq-deadline is cgroup-aware, at the 
> very least it's not documented in the official deadline tunables section [2].
>

I'm not convinced it's the domain of an IO scheduler to be involved,
rather than it being explicit UX intended by the desktop environment.
Seems to me the desktop environment is in a better position to know
what users expect.

> I'm mentioning this because btrfs' support for cgroups-v2 (and the IO 
> isolation/fairness capability it provides) was listed as one of the key 
> reasons to move to btrfs. While I am not clear on exactly how the IO 
> scheduler and file system interact when it comes to IO cgroups, I thought it 
> was worth bringing up.

It is, and it's very relevant. I think there are more questions than
answers as to what degree there may be unexpected conflicts. In this
particular case I'm not that concerned about the majority. I'm in fact
concerned about a distinct minority who could end up having a
significantly worse experience, have no idea why, and then be told it
is they who should do more testing and provide bug reports to prove
it's the IO scheduler causing the problem.


-- 
Chris Murphy


Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-30 Thread Tom Seewald
I forgot to mention that bfq appears to be the only IO scheduler that supports 
cgroups-v2 IO controllers [1]. Perhaps I am wrong, but I wasn't able to find 
documentation indicating that mq-deadline is cgroup-aware, at the very least 
it's not documented in the official deadline tunables section [2].

I'm mentioning this because btrfs' support for cgroups-v2 (and the IO 
isolation/fairness capability it provides) was listed as one of the key reasons 
to move to btrfs. While I am not clear on exactly how the IO scheduler and 
file system interact when it comes to IO cgroups, I thought it was worth 
bringing up.

[1] 
https://www.kernel.org/doc/html/latest/block/bfq-iosched.html#group-scheduling-with-bfq
[2] https://www.kernel.org/doc/html/latest/block/deadline-iosched.html
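As a quick sanity check on a running system (device name is an example):

```shell
# Show the available schedulers for a device; the active one is the
# entry in brackets.
cat /sys/block/sda/queue/scheduler
# List the controllers enabled at the root of the cgroup-v2 hierarchy;
# "io" must appear here for IO weighting to be available at all.
cat /sys/fs/cgroup/cgroup.controllers
```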


Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-30 Thread Zbigniew Jędrzejewski-Szmek
On Tue, Jun 30, 2020 at 07:28:53PM +0100, Ankur Sinha wrote:
> On Tue, Jun 30, 2020 17:23:16 +, Zbigniew Jędrzejewski-Szmek wrote:
> > On Tue, Jun 30, 2020 at 04:25:23PM +0100, Ankur Sinha wrote:
> > > On Mon, Jun 29, 2020 15:01:24 -0600, Chris Murphy wrote:
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1851783
> > > > 
> > > > The main argument is that for typical and varied workloads in Fedora,
> > > > mostly on consumer hardware, we should use mq-deadline scheduler
> > > > rather than either none or bfq.
> > > > 
> > > > It may be true most folks with NVMe won't see anything bad with none,
> > > > but those who have heavier IO workloads are likely to be better off
> > > > with mq-deadline.
> > > > 
> > > > Further details are in the bug, but let's discuss it on list. Thanks!
> > > 
> > > There was this thread about our systems hanging, and the workaround was
> > > to revert to mq-deadline from bfq:
> > > 
> > > https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/MJJFT5AOYUFZ3SO2EDVLJSDAZMZI4HAP/#DA7RCQFIAD4Z3Q7HQBW2ELPTLPYDKJMT
> > 
> > To clarify: you could reliably reproduce the issue when running build
> > steps in mock. Did you verify that it is reliably fixed simply by
> > switching bfq→mq-deadline?
> 
> Yes, that was the first change I made, and it stopped the
> hanging. As a permanent fix, though, I switched to using isolation =
> simple in mock, and since that works, I've not changed it.

OK, thanks.

> (I make it a point to provide the needed information for bugs, but this
> release my quota is currently being used up on getting Docker + minikube
> to work on F32 for $dayjob)
> 
> > > There are a few threads on AskFedora about systems hanging. They're not
> > > the easiest to debug but we did suggest people try switching to
> > > mq-deadline to see if it helps:
> > > 
> > > https://ask.fedoraproject.org/t/whole-os-freezes-watching-a-video-with-mpv/6770/10
> > > 
> > > I don't know enough about this to say if it's a bug and if it has been
> > > fixed.
> > 
> > There's a lot of noise in those bug reports. For heisenbugs, the fact
> > that something was an issue, and after a flurry of half-random changes
> > to the system no longer is, does not allow us to conclude _anything_.
> > We need somebody who understands what they are doing to isolate the
> > issue. In particular, if this is a kernel hang, then we need a proper
> > traceback from the kernel, and not just assume it's the scheduler.
> 
> There is a kernel trace in the related bug that was cited there:
> https://bugzilla.redhat.com/show_bug.cgi?id=1767097#c7
> 
> which links to another bfq bug here that's currently needinfo:
> https://bugzilla.redhat.com/show_bug.cgi?id=1767539
> 
> > (In particular, if this is a race condition, changing the scheduler
> > could be just making the condition less likely because the system is
> > slower or faster or just schedules processes in a different order,
> > without the scheduler being relevant to the bug).
> 
> Like I said, I don't know. I'm a fairly advanced Linux user but you can
> hardly expect me to also be a kernel hacker. :)
> 
> For kernel bugs, I'd strongly suggest giving reporters step-by-step
> instructions, or links on how to use a "serial console" or a "netconsole".
> These are not part of my working vocabulary (I cannot speak for others).

Thanks for the links. This seems to be a tough cookie and I hope it
gets resolved at some point. And to clarify: my comment about
debugging was not directed at you in particular, apart from the
question above, which you have already answered.

Zbyszek


Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-30 Thread Ankur Sinha
On Tue, Jun 30, 2020 17:23:16 +, Zbigniew Jędrzejewski-Szmek wrote:
> On Tue, Jun 30, 2020 at 04:25:23PM +0100, Ankur Sinha wrote:
> > On Mon, Jun 29, 2020 15:01:24 -0600, Chris Murphy wrote:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1851783
> > > 
> > > The main argument is that for typical and varied workloads in Fedora,
> > > mostly on consumer hardware, we should use mq-deadline scheduler
> > > rather than either none or bfq.
> > > 
> > > It may be true most folks with NVMe won't see anything bad with none,
> > > but those who have heavier IO workloads are likely to be better off
> > > with mq-deadline.
> > > 
> > > Further details are in the bug, but let's discuss it on list. Thanks!
> > 
> > There was this thread about our systems hanging, and the workaround was
> > to revert to mq-deadline from bfq:
> > 
> > https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/MJJFT5AOYUFZ3SO2EDVLJSDAZMZI4HAP/#DA7RCQFIAD4Z3Q7HQBW2ELPTLPYDKJMT
> 
> To clarify: you could reliably reproduce the issue when running build steps
> in mock. Did you verify that it is reliably fixed simply by switching
> bfq→mq-deadline?

Yes, that was the first change I made, and it stopped the
hanging. As a permanent fix, though, I switched to using isolation =
simple in mock, and since that works, I've not changed it.

(I make it a point to provide the needed information for bugs, but this
release my quota is currently being used up on getting Docker + minikube
to work on F32 for $dayjob)

> > There are a few threads on AskFedora about systems hanging. They're not
> > the easiest to debug but we did suggest people try switching to
> > mq-deadline to see if it helps:
> > 
> > https://ask.fedoraproject.org/t/whole-os-freezes-watching-a-video-with-mpv/6770/10
> > 
> > I don't know enough about this to say if it's a bug and if it has been
> > fixed.
> 
> There's a lot of noise in those bug reports. For heisenbugs, the fact
> that something was an issue, and after a flurry of half-random changes
> to the system no longer is, does not allow us to conclude _anything_.
> We need somebody who understands what they are doing to isolate the
> issue. In particular, if this is a kernel hang, then we need a proper
> traceback from the kernel, and not just assume it's the scheduler.

There is a kernel trace in the related bug that was cited there:
https://bugzilla.redhat.com/show_bug.cgi?id=1767097#c7

which links to another bfq bug here that's currently needinfo:
https://bugzilla.redhat.com/show_bug.cgi?id=1767539

> (In particular, if this is a race condition, changing the scheduler
> could be just making the condition less likely because the system is
> slower or faster or just schedules processes in a different order,
> without the scheduler being relevant to the bug).

Like I said, I don't know. I'm a fairly advanced Linux user but you can
hardly expect me to also be a kernel hacker. :)

For kernel bugs, I'd strongly suggest giving reporters step-by-step
instructions, or links on how to use a "serial console" or a "netconsole".
These are not part of my working vocabulary (I cannot speak for others).

-- 
Thanks,
Regards,
Ankur Sinha "FranciscoD" (He / Him / His) | 
https://fedoraproject.org/wiki/User:Ankursinha
Time zone: Europe/London




Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-30 Thread Zbigniew Jędrzejewski-Szmek
On Tue, Jun 30, 2020 at 04:25:23PM +0100, Ankur Sinha wrote:
> On Mon, Jun 29, 2020 15:01:24 -0600, Chris Murphy wrote:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1851783
> > 
> > The main argument is that for typical and varied workloads in Fedora,
> > mostly on consumer hardware, we should use mq-deadline scheduler
> > rather than either none or bfq.
> > 
> > It may be true most folks with NVMe won't see anything bad with none,
> > but those who have heavier IO workloads are likely to be better off
> > with mq-deadline.
> > 
> > Further details are in the bug, but let's discuss it on list. Thanks!
> 
> There was this thread about our systems hanging, and the workaround was
> to revert to mq-deadline from bfq:
> 
> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/MJJFT5AOYUFZ3SO2EDVLJSDAZMZI4HAP/#DA7RCQFIAD4Z3Q7HQBW2ELPTLPYDKJMT

To clarify: you could reliably reproduce the issue when running build steps in
mock. Did you verify that it is reliably fixed simply by switching bfq→mq-deadline?

Zbyszek

> There are a few threads on AskFedora about systems hanging. They're not
> the easiest to debug but we did suggest people try switching to
> mq-deadline to see if it helps:
> 
> https://ask.fedoraproject.org/t/whole-os-freezes-watching-a-video-with-mpv/6770/10
> 
> I don't know enough about this to say if it's a bug and if it has been
> fixed.

There's a lot of noise in those bug reports. For heisenbugs, the fact
that something was an issue, and after a flurry of half-random changes
to the system no longer is, does not allow us to conclude _anything_.
We need somebody who understands what they are doing to isolate the
issue. In particular, if this is a kernel hang, then we need a proper
traceback from the kernel, and not just assume it's the scheduler.
(In particular, if this is a race condition, changing the scheduler
could be just making the condition less likely because the system is
slower or faster or just schedules processes in a different order,
without the scheduler being relevant to the bug).

Zbyszek


Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-30 Thread Ankur Sinha
On Mon, Jun 29, 2020 15:01:24 -0600, Chris Murphy wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=1851783
> 
> The main argument is that for typical and varied workloads in Fedora,
> mostly on consumer hardware, we should use mq-deadline scheduler
> rather than either none or bfq.
> 
> It may be true most folks with NVMe won't see anything bad with none,
> but those who have heavier IO workloads are likely to be better off
> with mq-deadline.
> 
> Further details are in the bug, but let's discuss it on list. Thanks!

There was this thread about our systems hanging, and the workaround was
to revert to mq-deadline from bfq:

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/MJJFT5AOYUFZ3SO2EDVLJSDAZMZI4HAP/#DA7RCQFIAD4Z3Q7HQBW2ELPTLPYDKJMT

There are a few threads on AskFedora about systems hanging. They're not
the easiest to debug but we did suggest people try switching to
mq-deadline to see if it helps:

https://ask.fedoraproject.org/t/whole-os-freezes-watching-a-video-with-mpv/6770/10

I don't know enough about this to say if it's a bug and if it has been
fixed.

-- 
Thanks,
Regards,
Ankur Sinha "FranciscoD" (He / Him / His) | 
https://fedoraproject.org/wiki/User:Ankursinha
Time zone: Europe/London




Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-30 Thread Michael Catanzaro

On Tue, Jun 30, 2020 at 4:44 am, Tom Seewald  wrote:
> In case you find it useful, Paolo has posted his own results from
> testing IO schedulers on Linux [1][2] as well as the scripts he used
> to generate the load [3]. I don't claim that these results have been
> independently verified or that they are good representations of the
> real world, but they may be a useful set of data points.


Chris, notwithstanding your previous comments on benchmarks, Paolo's 
results look extraordinarily good for bfq. Any comment on these...?




Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-29 Thread Tom Seewald
> It's super annoying for me to post, because benchmarks drive me crazy,
> and yet here I am posting one. This is almost like self-flagellation
> to paste this...
> 
> https://www.phoronix.com/scan.php?page=article=linux-56-nvme;...
> 
> None of these benchmarks are representative of a generic desktop. The
> difficulty with desktop workloads is their heterogeneity. Some people
> are mixing music, others compiling, still others lots of web browsing
> (Chrome OS I guess went to bfq around the same time we did), and we
> just don't really know what people are going to do. Some even use
> Workstation as a base for more typical server operations.
> 
> The geometric mean isn't helpful either, because none of the tests are
> run concurrently or attempt to produce tag starvation which would
> result in latency spikes. That's where mq-deadline would do better
> than none.

In case you find it useful, Paolo has posted his own results from testing IO 
schedulers on Linux [1][2] as well as the scripts he used to generate the load 
[3]. I don't claim that these results have been independently verified or that 
they are good representations of the real world, but they may be a useful set 
of data points.

[1] http://algo.ing.unimo.it/people/paolo/disk_sched/results.php
[2] https://www.youtube.com/watch?v=w2bREYTe0-0
[3] https://github.com/Algodev-github/S


Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-29 Thread Chris Murphy
On Mon, Jun 29, 2020 at 9:45 PM Tom Seewald  wrote:
>
> > The latter but considering they're a broad variety of workloads I
> > think it's misleading to call them server workloads as if that's one
> > particular type of thing, or not applicable to a desktop under IO
> > pressure. Why? (a) they're using consumer storage devices (b) these
> > are real workloads rather than simulations (c) even by upstream's own
> > descriptions of the various IO schedulers only mq-deadline is intended
> > to be generic. (d) it's really hard to prove anything in this area
> > without a lot of data.
>
> You are right that the difference between them is blurry. My question comes 
> from being unsure if it's the case that Fedora users are experiencing 
> problems with bfq but are not reporting them, or if there is something 
> specific that is causing that pathological scheduling behavior at Facebook.

They're using mq-deadline most everywhere, not just the servers, but
local computers and VMs. They use kyber (which is Facebook-contributed)
for high-end storage, and it's not indicated for our
usage. I'm not sure they're seeing anything wrong per se with bfq,
it's just consistently not performing as well as mq-deadline due to
latencies. I'm not sure that's a bug if it's improving performance in
other areas that are relevant for the intended workloads. The gotcha
is, what are the intended workloads? What is even a desktop workload?

> It was also my understanding that Facebook primarily uses NVMe drives [1][2],
> and that is the class of storage Fedora does not use bfq with. Is it possible
> these latency problems occurred when using bfq with NVMe drives?

Not certain. But in our case we use 'none' for NVMe drives. For most
people that's OK, but then some workloads will suffer if you get a
task that has a heavy demand for tags, because there's no scheduler to
spread them out among those demanding them. So, pulling a number out
of my butt: none could be fine for 90% and not great for 10%. If
anything, 'none' and NVMe is a server-like configuration, if it's
running a typically homogeneous workload.

> I now see that Paolo was cc'd in comment #9 of the bugzilla ticket, so 
> hopefully he responds.
>
> > But fair enough, I'll see about collecting some data before asking to
> > change the IO scheduler yet again.
>
> For the record, I definitely agree that mq-deadline should become the default 
> scheduler for NVMe drives.

The other question I have: I'm pretty sure we're using the same udev
rule across all of Fedora, not just on the desktops. My Fedora Server
is using bfq for everything. VMs are using mq-deadline for /dev/vd*
virtio devices and bfq for /dev/sr* and /dev/sd* devices. I have
nothing against bfq, but I'm inclined to go with the most generic IO
scheduler as the default, and let people optimize for their specific
workload, rather than the other way around.
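For anyone who wants to optimize that way, the per-device default can be overridden with a local udev rule. A sketch (the filename and device match are illustrative, not Fedora's shipped rule):

```shell
# Write a hypothetical local rule forcing mq-deadline for SATA/SCSI disks.
cat > /etc/udev/rules.d/60-local-scheduler.rules <<'EOF'
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"
EOF
# Reload the rules and re-trigger block devices so the change applies.
udevadm control --reload
udevadm trigger --subsystem-match=block
```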

It's super annoying for me to post, because benchmarks drive me crazy,
and yet here I am posting one. This is almost like self-flagellation
to paste this...

https://www.phoronix.com/scan.php?page=article&item=linux-56-nvme&num=4

None of these benchmarks are representative of a generic desktop. The
difficulty with desktop workloads is their heterogeneity. Some people
are mixing music, others compiling, still others lots of web browsing
(Chrome OS I guess went to bfq around the same time we did), and we
just don't really know what people are going to do. Some even use
Workstation as a base for more typical server operations.

The geometric mean isn't helpful either, because none of the tests are
run concurrently or attempt to produce tag starvation which would
result in latency spikes. That's where mq-deadline would do better
than none.
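One way to approximate that kind of pressure, as opposed to serial benchmark runs, is several concurrent writers with deep queues, e.g. with fio (parameters here are illustrative, not a calibrated benchmark):

```shell
# Four concurrent sequential writers, deep queues, direct IO; the
# latency percentiles in fio's output show whether any job is starved.
fio --name=pressure --rw=write --bs=1M --size=1G --numjobs=4 \
    --ioengine=libaio --iodepth=32 --direct=1 --group_reporting
```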


-- 
Chris Murphy


Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-29 Thread Tom Seewald
> The latter but considering they're a broad variety of workloads I
> think it's misleading to call them server workloads as if that's one
> particular type of thing, or not applicable to a desktop under IO
> pressure. Why? (a) they're using consumer storage devices (b) these
> are real workloads rather than simulations (c) even by upstream's own
> descriptions of the various IO schedulers only mq-deadline is intended
> to be generic. (d) it's really hard to prove anything in this area
> without a lot of data.

You are right that the difference between them is blurry. My question comes 
from being unsure if it's the case that Fedora users are experiencing problems 
with bfq but are not reporting them, or if there is something specific that is 
causing that pathological scheduling behavior at Facebook. It was also my 
understanding that Facebook primarily uses NVMe drives [1][2], and that is the 
class of storage Fedora does not use bfq with. Is it possible these latency 
problems occurred when using bfq with NVMe drives?

I now see that Paolo was cc'd in comment #9 of the bugzilla ticket, so 
hopefully he responds.

> But fair enough, I'll see about collecting some data before asking to
> change the IO scheduler yet again.

For the record, I definitely agree that mq-deadline should become the default 
scheduler for NVMe drives.

[1] 
https://nvmexpress.org/how-facebook-leverages-nvme-cloud-storage-in-the-datacenter/
[2] 
https://engineering.fb.com/data-center-engineering/introducing-lightning-a-flexible-nvme-jbof/


Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-29 Thread Chris Murphy
On Mon, Jun 29, 2020 at 8:24 PM Tom Seewald  wrote:
>
> > https://bugzilla.redhat.com/show_bug.cgi?id=1851783
> >
> > The main argument is that for typical and varied workloads in Fedora,
> > mostly on consumer hardware, we should use mq-deadline scheduler
> > rather than either none or bfq.
> >
> > It may be true most folks with NVMe won't see anything bad with none,
> > but those who have heavier IO workloads are likely to be better off
> > with mq-deadline.
> >
> Further details are in the bug, but let's discuss it on list. Thanks!
>
> I'm a little confused by this proposal because last year the author of bfq, 
> Paolo Valente, worked with the Fedora community to switch to bfq by default 
> on non-NVMe drives [1]. Now another kernel developer is telling us that bfq 
> has performance problems that ostensibly aren't being fixed. So my immediate 
> question is: have these problems been reported to Paolo and what has his 
> response been?

Thanks for the background, I'd forgotten about that.

I am seeing only one Fedora bug report upstream (maybe it's two in
one) and they've been responsive, but it was an oops, not a
performance complaint.
https://bugzilla.kernel.org/show_bug.cgi?id=205447

> From what I can tell bfq was chosen because it improved the responsiveness of 
> the desktop, and so I'm curious where it's falling short. Are there 
> performance issues with workloads that Fedora users are running, or have 
> these latency spikes primarily been seen with Facebook's server workloads?

The latter but considering they're a broad variety of workloads I
think it's misleading to call them server workloads as if that's one
particular type of thing, or not applicable to a desktop under IO
pressure. Why? (a) they're using consumer storage devices (b) these
are real workloads rather than simulations (c) even by upstream's own
descriptions of the various IO schedulers only mq-deadline is intended
to be generic. (d) it's really hard to prove anything in this area
without a lot of data.

But fair enough, I'll see about collecting some data before asking to
change the IO scheduler yet again.


-- 
Chris Murphy


Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-29 Thread Tom Seewald
> https://bugzilla.redhat.com/show_bug.cgi?id=1851783
> 
> The main argument is that for typical and varied workloads in Fedora,
> mostly on consumer hardware, we should use mq-deadline scheduler
> rather than either none or bfq.
> 
> It may be true most folks with NVMe won't see anything bad with none,
> but those who have heavier IO workloads are likely to be better off
> with mq-deadline.
> 
> Further details are in the bug, but let's discuss it on list. Thanks!

I'm a little confused by this proposal because last year the author of bfq, 
Paolo Valente, worked with the Fedora community to switch to bfq by default on 
non-NVMe drives [1]. Now another kernel developer is telling us that bfq has 
performance problems that ostensibly aren't being fixed. So my immediate 
question is: have these problems been reported to Paolo and what has his 
response been?

From what I can tell bfq was chosen because it improved the responsiveness of 
the desktop, and so I'm curious where it's falling short. Are there performance 
issues with workloads that Fedora users are running, or have these latency 
spikes primarily been seen with Facebook's server workloads?

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1738828


Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-29 Thread Chris Murphy
On Mon, Jun 29, 2020 at 4:38 PM Richard Shaw  wrote:
>
> On Mon, Jun 29, 2020 at 4:01 PM Chris Murphy  wrote:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1851783
>>
>> The main argument is that for typical and varied workloads in Fedora,
>> mostly on consumer hardware, we should use mq-deadline scheduler
>> rather than either none or bfq.
>>
>> It may be true most folks with NVMe won't see anything bad with none,
>> but those who have heavier IO workloads are likely to be better off
>> with mq-deadline.
>
>
> How would one go about forcing the scheduler so as to experiment and see if
> there is any perceived difference between them?

# echo 'mq-deadline' > /sys/block/mmcblk0/queue/scheduler
# cat /sys/block/mmcblk0/queue/scheduler

I expect none and mq-deadline come up about the same unless you're
doing concurrent heavy IO tasks, and in that case there's a good
chance one of them gets IO starved if you use none.


-- 
Chris Murphy


Re: drop bfq scheduler, instead use mq-deadline across the board

2020-06-29 Thread Richard Shaw
On Mon, Jun 29, 2020 at 4:01 PM Chris Murphy 
wrote:

> https://bugzilla.redhat.com/show_bug.cgi?id=1851783
>
> The main argument is that for typical and varied workloads in Fedora,
> mostly on consumer hardware, we should use mq-deadline scheduler
> rather than either none or bfq.
>
> It may be true most folks with NVMe won't see anything bad with none,
> but those who have heavier IO workloads are likely to be better off
> with mq-deadline.
>

How would one go about forcing the scheduler so as to experiment and see if
there is any perceived difference between them?

Thanks,
Richard