Re: Switching to MQ by default may generate some bug reports

2017-08-10 Thread Mel Gorman
On Wed, Aug 09, 2017 at 11:49:17PM +0200, Paolo Valente wrote:
> > This discrepancy with your results makes it a little bit harder for me to
> > understand how to better proceed, as I see no regression.  Anyway,
> > since this reader-throttling issue seems relevant, I have investigated
> > it a little more in depth.  The cause of the throttling is that the
> > fdatasync frequently performed by the writers in this test turns the
> > I/O of the writers into 100% sync I/O.  And neither bfq nor cfq
> > differentiates bandwidth between sync reads and sync writes.  Basically
> > both cfq and bfq are willing to dispatch the I/O requests of each
> > writer for a time slot equal to that devoted to the reader.  But write
> > requests, after reaching the device, occupy it for much more time
> > than reads.  This delays the completion of the requests of the reader,
> > and, since the I/O is sync, the issuing of the next I/O requests by the
> > reader.  The final result is that the device spends most of the time
> > serving write requests, while the reader issues its read requests very
> > slowly.
> > 
> > It might not be so difficult to balance this unfairness, although I'm
> > a little worried about changing bfq without being able to see the
> > regression you report.  In case I give it a try, could I then count on
> > some testing on your machines?
> > 
> 
> Hi Mel,
> I've investigated this test case a little bit more, and the outcome is
> unfortunately rather drastic, unless I'm missing some important point.
> It is impossible to control the rate of the reader with the exact
> configuration of this test. 

Correct, both are simply competing for access to IO. Very broadly speaking,
it's only checking for loose (but not perfect) fairness with different IO
patterns.  While it's not a recent problem, historically (2+ years ago) we
had problems whereby a heavy reader or writer could starve IO completely. It
had odd effects like some multi-threaded benchmarks being artificially good
simply because one thread would dominate, artificially complete faster and
exit prematurely. "Fixing" it had a tendency to help real workloads while
hurting some benchmarks so it's not straightforward to control for properly.
Bottom line, I'm not necessarily worried if a particular benchmark shows
an apparent regression once I understand why and can convince myself that a
"real" workload benefits from it (preferably proving it).

> In fact, since iodepth is equal to 1, the
> reader issues one I/O request at a time.  When one such request is
> dispatched, after some write requests have already been dispatched
> (and then queued in the device), the time to serve the request is
> controlled only by the device.  The longer the device makes the read
> request wait before being served, the later the reader will see the
> completion of its request, and then the later the reader will issue a
> new request, and so on.  So, for this test, it is mainly the device
> controller that decides the rate of the reader.
> 

Understood. It's less than ideal but not a completely silly test either.
That said, the fio tests are relatively new compared to some of the tests
monitored by mmtests looking for issues. It can take time to finalise a
test configuration before it's giving useful data 100% of the time.
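For context, the scenario under discussion maps to a fio job along the lines
of the sketch below. This is only an approximation with made-up section names,
sizes and runtimes, not the actual config-global-dhp__io-fio-randread-sync-heavywrite
job file shipped with mmtests:

  [global]
  ; buffered, synchronous I/O as in the test being discussed
  ioengine=sync
  bs=4k
  size=1g
  runtime=300
  time_based

  [heavy-writer]
  ; several writers whose frequent fdatasync turns their I/O into sync I/O
  numjobs=4
  rw=randwrite
  fdatasync=32

  [reader]
  ; one reader with a single outstanding request; its rate is set by how
  ; long the device makes each read wait behind the already-queued writes
  numjobs=1
  rw=randread
  iodepth=1

The key ingredients are the fdatasync option on the writers, which makes their
writes effectively synchronous, and the single outstanding read on the reader.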

> On the other hand, the scheduler can regain control of the
> bandwidth of the reader if the reader issues more than one request at
> a time. 

Ok, I'll take it as a todo item to increase the depth as a depth of 1 is
not that interesting as such. It's also on my todo list to add fio
configs that add think time.
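A variant along those lines might bump the queue depth and add think time on
the reader side; again this is a sketch with invented values rather than a
finalised mmtests configuration:

  [reader]
  ; an async engine with several requests in flight gives the scheduler
  ; something to work with when apportioning bandwidth
  ioengine=libaio
  direct=1
  iodepth=8
  rw=randread
  bs=4k
  ; pause between batches of I/O (microseconds) to mimic an application
  ; doing some work between reads
  thinktime=2000
  thinktime_blocks=16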

> Anyway, before analyzing this second, controllable case, I
> wanted to test responsiveness with this heavy write workload in the
> background.  And it was very bad!  After some hours of mild panic, I
> found out that this failure depends on a bug in bfq, a bug that,
> luckily, happens to be triggered by these heavy writes as a background
> workload ...
> 
> I've already found and am testing a fix for this bug. Yet, it will
> probably take me a few weeks to submit this fix, because I'm finally
> going on vacation.
> 

This is obviously both good and bad. Bad in that the bug exists at all,
good in that you detected it and a fix is possible. I don't think you have
to panic considering that some of the pending fixes include Ming's work
which won't be merged for quite some time and tests take a long time anyway.
Whenever you get around to a fix after your vacation, just cc me and I'll
queue it across a range of machines so you have some independent tests.
A review from me would not be worth much as I haven't spent the time to
fully understand BFQ yet.

If the fixes do not hit until the next merge window or the window after that
then someone who cares enough can do a performance-based -stable backport. If
there are any bugs in the meantime (e.g. after 4.13 comes out) then there
will be a series for the reporter to test. I think it's still reasonably
positive.

Re: Switching to MQ by default may generate some bug reports

2017-08-09 Thread Paolo Valente

> On 08 Aug 2017, at 19:33, Paolo Valente wrote:
> 
>> 
>> On 08 Aug 2017, at 10:06, Paolo Valente wrote:
>> 
>>> 
>>> On 07 Aug 2017, at 20:42, Paolo Valente wrote:
>>> 
 
 On 07 Aug 2017, at 19:32, Paolo Valente wrote:
 
> 
> On 05 Aug 2017, at 00:05, Paolo Valente wrote:
> 
>> 
>> On 04 Aug 2017, at 13:01, Mel Gorman wrote:
>> 
>> On Fri, Aug 04, 2017 at 09:26:20AM +0200, Paolo Valente wrote:
 I took that into account. BFQ with low-latency was also tested and the
 impact was not a universal improvement although it can be a noticeable
 improvement. From the same machine;
 
 dbench4 Loadfile Execution Time
                           4.12.0             4.12.0             4.12.0
                       legacy-cfq             mq-bfq        mq-bfq-tput
 Amean 1      80.67 (   0.00%)   83.68 (  -3.74%)   84.70 (  -5.00%)
 Amean 2      92.87 (   0.00%)  121.63 ( -30.96%)   88.74 (   4.45%)
 Amean 4     102.72 (   0.00%)  474.33 (-361.77%)  113.97 ( -10.95%)
 Amean 32   2543.93 (   0.00%) 1927.65 (  24.23%) 2038.74 (  19.86%)
 
>>> 
>>> Thanks for trying with low_latency disabled.  If I read numbers
>>> correctly, we move from a worst case of 361% higher execution time to
>>> a worst case of 11%.  With a best case of 20% of lower execution time.
>>> 
>> 
>> Yes.
>> 
>>> I asked you about none and mq-deadline in a previous email, because
>>> actually we have a double change here: change of the I/O stack, and
>>> change of the scheduler, with the first change probably not irrelevant
>>> with respect to the second one.
>>> 
>> 
>> True. However, the difference between legacy-deadline mq-deadline is
>> roughly around the 5-10% mark across workloads for SSD. It's not
>> universally true but the impact is not as severe. While this is not
>> proof that the stack change is the sole root cause, it makes it less
>> likely.
>> 
> 
> I'm getting a little lost here.  If I'm not mistaken, you are saying,
> since the difference between two virtually identical schedulers
> (legacy-deadline and mq-deadline) is only around 5-10%, while the
> difference between cfq and mq-bfq-tput is higher, then in the latter
> case it is not the stack's fault.  Yet the loss of mq-bfq-tput in the
> above test is exactly in the 5-10% range?  What am I missing?  Other
> tests with mq-bfq-tput not yet reported?
> 
>>> By chance, according to what you have measured so far, is there any
>>> test where, instead, you expect or have seen bfq-mq-tput to always
>>> lose?  I could start from there.
>>> 
>> 
>> global-dhp__io-fio-randread-async-randwrite-xfs but marginal enough that
>> it could be the stack change.
>> 
>> global-dhp__io-dbench4-fsync-ext4 was a universal loss across any
>> machine tested. This is global-dhp__io-dbench4-fsync from mmtests using
>> ext4 as a filesystem. The same is not true for XFS so the filesystem
>> matters.
>> 
> 
> Ok, then I will try to repeat global-dhp__io-dbench4-fsync-ext4 as
> soon as I can, thanks.
> 
> 
 
 I've run this test and tried to further investigate this regression.
 For the moment, the gist seems to be that blk-mq plays an important
 role, not only with bfq (unless I'm considering the wrong numbers).
 Even if your main purpose in this thread was just to give a heads-up,
 I guess it may be useful to share what I have found out.  In addition,
 I want to ask for some help, to try to get closer to the possible
 causes of at least this regression.  If you think it would be better
 to open a new thread on this stuff, I'll do it.
 
 First, I got mixed results on my system.  I'll focus only on the
 case where mq-bfq-tput achieves its worst relative performance w.r.t.
 cfq, which happens with 64 clients.  Still, also in this case
 mq-bfq is better than cfq in all average values except Flush.  I don't
 know which are the best/right values to look at, so, here's the final
 report for both schedulers:
 
 CFQ
 
 Operation                 Count     AvgLat     MaxLat
 ------------------------------------------------------
 Flush                     13120     20.069    348.594
 Close                    133696      0.008     14.642
 LockX                       512      0.009      0.059
 Rename                     7552      1.857    415.418
 ReadX                    270720      0.141    535.632
 WriteX                    89591    421.961   6363.271

Re: Switching to MQ by default may generate some bug reports

2017-08-08 Thread Mel Gorman
On Tue, Aug 08, 2017 at 07:33:37PM +0200, Paolo Valente wrote:
> > Differently from bfq-sq, setting slice_idle to 0 doesn't provide any
> > benefit, which lets me suspect that there is some other issue in
> > blk-mq (only a suspicion).  I think I may have already understood how to
> > guarantee that bfq almost never idles the device uselessly also for
> > this workload.  Yet, since in blk-mq there is no gain even after
> > excluding useless idling, I'll wait for at least Ming's patches to be
> > merged before possibly proposing this contribution.  Maybe some other
> > little issue related to this lack of gain in blk-mq will be found and
> > solved in the meantime.
> > 
> > Moving to the read-write unfairness problem.
> > 
> 
> I've reproduced the unfairness issue (rand reader throttled by heavy
> writers) with bfq, using
> configs/config-global-dhp__io-fio-randread-sync-heavywrite, but with
> an important side problem: cfq suffers from exactly the same
> unfairness (785kB/s writers, 13.4kB/s reader).  Of course, this
> happens in my system, with a HITACHI HTS727550A9E364.
> 

It's interesting that CFQ suffers the same on your system. It's possible
that this is down to luck and the results depend not only on the disk but
also on the number of CPUs. At absolute minimum we saw different latency figures
from dbench even if the only observation is "different machines behave
differently, news at 11". If the results are inconsistent, then the value of
the benchmark can be dropped as a basis of comparison between IO schedulers
(although I'll be keeping it for detecting regressions between releases).

When the v4 results from Ming's patches complete, I'll double check the
results from this config.

> This discrepancy with your results makes it a little bit harder for me to
> understand how to better proceed, as I see no regression.  Anyway,
> since this reader-throttling issue seems relevant, I have investigated
> it a little more in depth.  The cause of the throttling is that the
> fdatasync frequently performed by the writers in this test turns the
> I/O of the writers into 100% sync I/O.  And neither bfq nor cfq
> differentiates bandwidth between sync reads and sync writes.  Basically
> both cfq and bfq are willing to dispatch the I/O requests of each
> writer for a time slot equal to that devoted to the reader.  But write
> requests, after reaching the device, occupy it for much more time
> than reads.  This delays the completion of the requests of the reader,
> and, since the I/O is sync, the issuing of the next I/O requests by the
> reader.  The final result is that the device spends most of the time
> serving write requests, while the reader issues its read requests very
> slowly.
> 

That is certainly plausible and implies that the actual results depend
too heavily on random timing factors and disk model to be really useful.

> It might not be so difficult to balance this unfairness, although I'm
> a little worried about changing bfq without being able to see the
> regression you report.  In case I give it a try, could I then count on
> some testing on your machines?
> 

Yes with the caveat that results take a variable amount of time depending
on how many problems I'm juggling in the air and how many of them are
occupying time on the machines.

-- 
Mel Gorman
SUSE Labs


Re: Switching to MQ by default may generate some bug reports

2017-08-08 Thread Paolo Valente

> On 08 Aug 2017, at 10:06, Paolo Valente wrote:
> 
>> 
>> On 07 Aug 2017, at 20:42, Paolo Valente wrote:
>> 
>>> 
>>> On 07 Aug 2017, at 19:32, Paolo Valente wrote:
>>> 
 
 On 05 Aug 2017, at 00:05, Paolo Valente wrote:
 
> 
> On 04 Aug 2017, at 13:01, Mel Gorman wrote:
> 
> On Fri, Aug 04, 2017 at 09:26:20AM +0200, Paolo Valente wrote:
>>> I took that into account. BFQ with low-latency was also tested and the
>>> impact was not a universal improvement although it can be a noticeable
>>> improvement. From the same machine;
>>> 
>>> dbench4 Loadfile Execution Time
>>>                           4.12.0             4.12.0             4.12.0
>>>                       legacy-cfq             mq-bfq        mq-bfq-tput
>>> Amean 1      80.67 (   0.00%)   83.68 (  -3.74%)   84.70 (  -5.00%)
>>> Amean 2      92.87 (   0.00%)  121.63 ( -30.96%)   88.74 (   4.45%)
>>> Amean 4     102.72 (   0.00%)  474.33 (-361.77%)  113.97 ( -10.95%)
>>> Amean 32   2543.93 (   0.00%) 1927.65 (  24.23%) 2038.74 (  19.86%)
>>> 
>> 
>> Thanks for trying with low_latency disabled.  If I read numbers
>> correctly, we move from a worst case of 361% higher execution time to
>> a worst case of 11%.  With a best case of 20% of lower execution time.
>> 
> 
> Yes.
> 
>> I asked you about none and mq-deadline in a previous email, because
>> actually we have a double change here: change of the I/O stack, and
>> change of the scheduler, with the first change probably not irrelevant
>> with respect to the second one.
>> 
> 
> True. However, the difference between legacy-deadline mq-deadline is
> roughly around the 5-10% mark across workloads for SSD. It's not
> universally true but the impact is not as severe. While this is not
> proof that the stack change is the sole root cause, it makes it less
> likely.
> 
 
 I'm getting a little lost here.  If I'm not mistaken, you are saying,
 since the difference between two virtually identical schedulers
 (legacy-deadline and mq-deadline) is only around 5-10%, while the
 difference between cfq and mq-bfq-tput is higher, then in the latter
 case it is not the stack's fault.  Yet the loss of mq-bfq-tput in the
 above test is exactly in the 5-10% range?  What am I missing?  Other
 tests with mq-bfq-tput not yet reported?
 
>> By chance, according to what you have measured so far, is there any
>> test where, instead, you expect or have seen bfq-mq-tput to always
>> lose?  I could start from there.
>> 
> 
> global-dhp__io-fio-randread-async-randwrite-xfs but marginal enough that
> it could be the stack change.
> 
> global-dhp__io-dbench4-fsync-ext4 was a universal loss across any
> machine tested. This is global-dhp__io-dbench4-fsync from mmtests using
> ext4 as a filesystem. The same is not true for XFS so the filesystem
> matters.
> 
 
 Ok, then I will try to repeat global-dhp__io-dbench4-fsync-ext4 as
 soon as I can, thanks.
 
 
>>> 
>>> I've run this test and tried to further investigate this regression.
>>> For the moment, the gist seems to be that blk-mq plays an important
>>> role, not only with bfq (unless I'm considering the wrong numbers).
>>> Even if your main purpose in this thread was just to give a heads-up,
>>> I guess it may be useful to share what I have found out.  In addition,
>>> I want to ask for some help, to try to get closer to the possible
>>> causes of at least this regression.  If you think it would be better
>>> to open a new thread on this stuff, I'll do it.
>>> 
>>> First, I got mixed results on my system.  I'll focus only on the
>>> case where mq-bfq-tput achieves its worst relative performance w.r.t.
>>> cfq, which happens with 64 clients.  Still, also in this case
>>> mq-bfq is better than cfq in all average values except Flush.  I don't
>>> know which are the best/right values to look at, so, here's the final
>>> report for both schedulers:
>>> 
>>> CFQ
>>> 
>>> Operation                 Count     AvgLat     MaxLat
>>> ------------------------------------------------------
>>> Flush                     13120     20.069    348.594
>>> Close                    133696      0.008     14.642
>>> LockX                       512      0.009      0.059
>>> Rename                     7552      1.857    415.418
>>> ReadX                    270720      0.141    535.632
>>> WriteX                    89591    421.961   6363.271
>>> Unlink                    34048      1.281    662.467
>>> UnlockX                     512      0.007      0.057
>>> FIND_FIRST                62016      0.086     25.060
>>> SET_FILE_INFORMATION      15616      0.995    176.621

Re: Switching to MQ by default may generate some bug reports

2017-08-08 Thread Paolo Valente

> On 08 Aug 2017, at 12:30, Mel Gorman wrote:
> 
> On Mon, Aug 07, 2017 at 07:32:41PM +0200, Paolo Valente wrote:
 global-dhp__io-dbench4-fsync-ext4 was a universal loss across any
 machine tested. This is global-dhp__io-dbench4-fsync from mmtests using
 ext4 as a filesystem. The same is not true for XFS so the filesystem
 matters.
 
>>> 
>>> Ok, then I will try to repeat global-dhp__io-dbench4-fsync-ext4 as
>>> soon as I can, thanks.
>>> 
>>> 
>> 
>> I've run this test and tried to further investigate this regression.
>> For the moment, the gist seems to be that blk-mq plays an important
>> role, not only with bfq (unless I'm considering the wrong numbers).
>> Even if your main purpose in this thread was just to give a heads-up,
>> I guess it may be useful to share what I have found out.  In addition,
>> I want to ask for some help, to try to get closer to the possible
>> causes of at least this regression.  If you think it would be better
>> to open a new thread on this stuff, I'll do it.
>> 
> 
> I don't think it's necessary unless Christoph or Jens object and I doubt
> they will.
> 
>> First, I got mixed results on my system. 
> 
> For what it's worth, this is standard. In my experience, IO benchmarks
> are always multi-modal, particularly on rotary storage. Cases of universal
> win or universal loss for a scheduler or set of tuning are rare.
> 
>> I'll focus only on the
>> case where mq-bfq-tput achieves its worst relative performance w.r.t.
>> cfq, which happens with 64 clients.  Still, also in this case
>> mq-bfq is better than cfq in all average values except Flush.  I don't
>> know which are the best/right values to look at, so, here's the final
>> report for both schedulers:
>> 
> 
> For what it's worth, it has often been observed that dbench overall
> performance was dominated by flush costs. This is also true for the
> standard reported throughput figures rather than the modified load file
> elapsed time that mmtests reports. In dbench3 it was even worse where the
> "performance" was dominated by whether the temporary files were deleted
> before writeback started.
> 
>> CFQ
>> 
>> Operation                 Count     AvgLat     MaxLat
>> ------------------------------------------------------
>> Flush                     13120     20.069    348.594
>> Close                    133696      0.008     14.642
>> LockX                       512      0.009      0.059
>> Rename                     7552      1.857    415.418
>> ReadX                    270720      0.141    535.632
>> WriteX                    89591    421.961   6363.271
>> Unlink                    34048      1.281    662.467
>> UnlockX                     512      0.007      0.057
>> FIND_FIRST                62016      0.086     25.060
>> SET_FILE_INFORMATION      15616      0.995    176.621
>> QUERY_FILE_INFORMATION    28734      0.004      1.372
>> QUERY_PATH_INFORMATION   170240      0.163    820.292
>> QUERY_FS_INFORMATION      28736      0.017      4.110
>> NTCreateX                178688      0.437    905.567
>> 
>> MQ-BFQ-TPUT
>> 
>> Operation                 Count     AvgLat     MaxLat
>> ------------------------------------------------------
>> Flush                     13504     75.828  11196.035
>> Close                    136896      0.004      3.855
>> LockX                       640      0.005      0.031
>> Rename                     8064      1.020    288.989
>> ReadX                    297600      0.081    685.850
>> WriteX                    93515    391.637  12681.517
>> Unlink                    34880      0.500    146.928
>> UnlockX                     640      0.004      0.032
>> FIND_FIRST                63680      0.045    222.491
>> SET_FILE_INFORMATION      16000      0.436    686.115
>> QUERY_FILE_INFORMATION    30464      0.003      0.773
>> QUERY_PATH_INFORMATION   175552      0.044    148.449
>> QUERY_FS_INFORMATION      29888      0.009      1.984
>> NTCreateX                183152      0.289    300.867
>> 
>> Are these results in line with yours for this test?
>> 
> 
> Very broadly speaking yes, but it varies. On a small machine, the differences
> in flush latency are visible but not as dramatic. It only has a few
> CPUs. On a machine that tops out with 32 CPUs, it is more noticeable. On
> the one machine I have that topped out with CFQ/BFQ at 64 threads, the
> latency of flush is vaguely similar
> 
>                            CFQ                 BFQ                 BFQ-TPUT
> latency avg-Flush-64    287.05 (   0.00%)   389.14 ( -35.57%)   349.90 ( -21.90%)
> latency avg-Close-64      0.00 (   0.00%)     0.00 ( -33.33%)     0.00 (   0.00%)
> latency avg-LockX-64      0.01 (   0.00%)     0.01 ( -16.67%)     0.01 (   0.00%)
> latency avg-Rename-64     0.18 (   0.00%)     0.21 ( -16.39%)     0.18 (   3.28%)
> latency avg-ReadX-64      0.10 (   0.00%)     0.15 ( -40.95%)     0.15 ( -40.95%)
> latency avg-WriteX-64     0.86 (   0.00%)     0.81 (   6.18%)     0.74 (  13.75%)

Re: Switching to MQ by default may generate some bug reports

2017-08-08 Thread Mel Gorman
On Tue, Aug 08, 2017 at 07:49:53PM +0800, Ming Lei wrote:
> On Tue, Aug 8, 2017 at 7:27 PM, Mel Gorman  
> wrote:
> > On Tue, Aug 08, 2017 at 06:43:03PM +0800, Ming Lei wrote:
> >> Hi Mel Gorman,
> >>
> >> On Tue, Aug 8, 2017 at 6:30 PM, Mel Gorman  
> >> wrote:
> >> 
> >> >
> >> > o I've queued a subset of tests with Ming's v3 patchset as that was the
> >> >   latest branch at the time I looked. It'll take quite some time to 
> >> > execute
> >> >   as the grid I use to collect data is backlogged with other work
> >>
> >> The latest patchset is in the following post:
> >>
> >>   http://marc.info/?l=linux-block&m=150191624318513&w=2
> >>
> >> And you can find it in my github:
> >>
> >>   https://github.com/ming1/linux/commits/blk-mq-dispatch_for_scsi.V4
> >>
> >
> > Unfortunately, the tests were queued last Friday and are partially complete
> > depending on when machines become available. As it is, v3 will take a few
> > days to complete and a requeue would incur further delays. If you believe
> > the results will be substantially different then I'll discard v3 and 
> > requeue.
> 
> Firstly, V3 on github (never posted out) causes a boot hang if the number of CPU cores is >= 16,
> so you need to check if the test is still running, :-(
> 

By coincidence, the few machines that have completed had core counts
below this so I'll discard existing results and requeue.

Thanks.

-- 
Mel Gorman
SUSE Labs


Re: Switching to MQ by default may generate some bug reports

2017-08-08 Thread Ming Lei
On Tue, Aug 8, 2017 at 7:27 PM, Mel Gorman  wrote:
> On Tue, Aug 08, 2017 at 06:43:03PM +0800, Ming Lei wrote:
>> Hi Mel Gorman,
>>
>> On Tue, Aug 8, 2017 at 6:30 PM, Mel Gorman  
>> wrote:
>> 
>> >
>> > o I've queued a subset of tests with Ming's v3 patchset as that was the
>> >   latest branch at the time I looked. It'll take quite some time to execute
>> >   as the grid I use to collect data is backlogged with other work
>>
>> The latest patchset is in the following post:
>>
>>   http://marc.info/?l=linux-block&m=150191624318513&w=2
>>
>> And you can find it in my github:
>>
>>   https://github.com/ming1/linux/commits/blk-mq-dispatch_for_scsi.V4
>>
>
> Unfortunately, the tests were queued last Friday and are partially complete
> depending on when machines become available. As it is, v3 will take a few
> days to complete and a requeue would incur further delays. If you believe
> the results will be substantially different then I'll discard v3 and requeue.

Firstly, V3 on github (never posted out) causes a boot hang if the number of CPU cores is >= 16,
so you need to check if the test is still running, :-(

Also, V3 on github may not perform well on IB SRP (or other low-latency
SCSI disks), so I improved bio merge in V4 and made IB SRP's perf better
too; it depends on the device.

I suggest focusing on V2 posted on the mailing list (V4 on github).

-- 
Ming Lei


Re: Switching to MQ by default may generate some bug reports

2017-08-08 Thread Mel Gorman
On Tue, Aug 08, 2017 at 06:43:03PM +0800, Ming Lei wrote:
> Hi Mel Gorman,
> 
> On Tue, Aug 8, 2017 at 6:30 PM, Mel Gorman  
> wrote:
> 
> >
> > o I've queued a subset of tests with Ming's v3 patchset as that was the
> >   latest branch at the time I looked. It'll take quite some time to execute
> >   as the grid I use to collect data is backlogged with other work
> 
> The latest patchset is in the following post:
> 
>   http://marc.info/?l=linux-block&m=150191624318513&w=2
> 
> And you can find it in my github:
> 
>   https://github.com/ming1/linux/commits/blk-mq-dispatch_for_scsi.V4
> 

Unfortunately, the tests were queued last Friday and are partially complete
depending on when machines become available. As it is, v3 will take a few
days to complete and a requeue would incur further delays. If you believe
the results will be substantially different then I'll discard v3 and requeue.

-- 
Mel Gorman
SUSE Labs


Re: Switching to MQ by default may generate some bug reports

2017-08-08 Thread Ming Lei
Hi Mel Gorman,

On Tue, Aug 8, 2017 at 6:30 PM, Mel Gorman  wrote:

>
> o I've queued a subset of tests with Ming's v3 patchset as that was the
>   latest branch at the time I looked. It'll take quite some time to execute
>   as the grid I use to collect data is backlogged with other work

The latest patchset is in the following post:

  http://marc.info/?l=linux-block&m=150191624318513&w=2

And you can find it in my github:

  https://github.com/ming1/linux/commits/blk-mq-dispatch_for_scsi.V4

-- 
Ming Lei


Re: Switching to MQ by default may generate some bug reports

2017-08-08 Thread Mel Gorman
On Mon, Aug 07, 2017 at 07:32:41PM +0200, Paolo Valente wrote:
> >> global-dhp__io-dbench4-fsync-ext4 was a universal loss across any
> >> machine tested. This is global-dhp__io-dbench4-fsync from mmtests using
> >> ext4 as a filesystem. The same is not true for XFS so the filesystem
> >> matters.
> >> 
> > 
> > Ok, then I will try to repeat global-dhp__io-dbench4-fsync-ext4 as
> > soon as I can, thanks.
> > 
> > 
> 
> I've run this test and tried to further investigate this regression.
> For the moment, the gist seems to be that blk-mq plays an important
> role, not only with bfq (unless I'm considering the wrong numbers).
> Even if your main purpose in this thread was just to give a heads-up,
> I guess it may be useful to share what I have found out.  In addition,
> I want to ask for some help, to try to get closer to the possible
> causes of at least this regression.  If you think it would be better
> to open a new thread on this stuff, I'll do it.
> 

I don't think it's necessary unless Christoph or Jens object and I doubt
they will.

> First, I got mixed results on my system. 

For what it's worth, this is standard. In my experience, IO benchmarks
are always multi-modal, particularly on rotary storage. Cases of universal
win or universal loss for a scheduler or set of tuning are rare.

> I'll focus only on the
> case where mq-bfq-tput achieves its worst relative performance w.r.t.
> cfq, which happens with 64 clients.  Still, also in this case
> mq-bfq is better than cfq in all average values except Flush.  I don't
> know which are the best/right values to look at, so, here's the final
> report for both schedulers:
> 

For what it's worth, it has often been observed that dbench overall
performance was dominated by flush costs. This is also true for the
standard reported throughput figures rather than the modified load file
elapsed time that mmtests reports. In dbench3 it was even worse where the
"performance" was dominated by whether the temporary files were deleted
before writeback started.

> CFQ
> 
>  Operation                 Count     AvgLat     MaxLat
>  ------------------------------------------------------
>  Flush                     13120     20.069    348.594
>  Close                    133696      0.008     14.642
>  LockX                       512      0.009      0.059
>  Rename                     7552      1.857    415.418
>  ReadX                    270720      0.141    535.632
>  WriteX                    89591    421.961   6363.271
>  Unlink                    34048      1.281    662.467
>  UnlockX                     512      0.007      0.057
>  FIND_FIRST                62016      0.086     25.060
>  SET_FILE_INFORMATION      15616      0.995    176.621
>  QUERY_FILE_INFORMATION    28734      0.004      1.372
>  QUERY_PATH_INFORMATION   170240      0.163    820.292
>  QUERY_FS_INFORMATION      28736      0.017      4.110
>  NTCreateX                178688      0.437    905.567
> 
> MQ-BFQ-TPUT
> 
>  Operation                 Count     AvgLat     MaxLat
>  ------------------------------------------------------
>  Flush                     13504     75.828  11196.035
>  Close                    136896      0.004      3.855
>  LockX                       640      0.005      0.031
>  Rename                     8064      1.020    288.989
>  ReadX                    297600      0.081    685.850
>  WriteX                    93515    391.637  12681.517
>  Unlink                    34880      0.500    146.928
>  UnlockX                     640      0.004      0.032
>  FIND_FIRST                63680      0.045    222.491
>  SET_FILE_INFORMATION      16000      0.436    686.115
>  QUERY_FILE_INFORMATION    30464      0.003      0.773
>  QUERY_PATH_INFORMATION   175552      0.044    148.449
>  QUERY_FS_INFORMATION      29888      0.009      1.984
>  NTCreateX                183152      0.289    300.867
> 
> Are these results in line with yours for this test?
> 

Very broadly speaking yes, but it varies. On a small machine, the differences
in flush latency are visible but not as dramatic. It only has a few
CPUs. On a machine that tops out with 32 CPUs, it is more noticeable. On
the one machine I have that topped out with CFQ/BFQ at 64 threads, the
latency of flush is vaguely similar

                          CFQ                 BFQ                 BFQ-TPUT
latency avg-Flush-64    287.05 (   0.00%)   389.14 ( -35.57%)   349.90 ( -21.90%)
latency avg-Close-64      0.00 (   0.00%)     0.00 ( -33.33%)     0.00 (   0.00%)
latency avg-LockX-64      0.01 (   0.00%)     0.01 ( -16.67%)     0.01 (   0.00%)
latency avg-Rename-64     0.18 (   0.00%)     0.21 ( -16.39%)     0.18 (   3.28%)
latency avg-ReadX-64      0.10 (   0.00%)     0.15 ( -40.95%)     0.15 ( -40.95%)
latency avg-WriteX-64     0.86 (   0.00%)     0.81 (   6.18%)     0.74 (  13.75%)
latency avg-Unlink-64     1.49 (   0.00%)     1.52 (  -2.28%)     1.14 (  23.69%)
latency avg-UnlockX-64    0.00 (   0.00%)     0.00 (   0.00%)     0.00 (   0.00%)

Re: Switching to MQ by default may generate some bug reports

2017-08-08 Thread Paolo Valente

> On 07 Aug 2017, at 20:42, Paolo Valente wrote:
> 
>> 
>> On 07 Aug 2017, at 19:32, Paolo Valente wrote:
>> 
>>> 
>>> On 05 Aug 2017, at 00:05, Paolo Valente wrote:
>>> 
 
 On 04 Aug 2017, at 13:01, Mel Gorman wrote:
 
 On Fri, Aug 04, 2017 at 09:26:20AM +0200, Paolo Valente wrote:
>> I took that into account. BFQ with low-latency was also tested and the
>> impact was not a universal improvement although it can be a noticeable
>> improvement. From the same machine;
>> 
>> dbench4 Loadfile Execution Time
>>                           4.12.0             4.12.0             4.12.0
>>                       legacy-cfq             mq-bfq        mq-bfq-tput
>> Amean 1      80.67 (   0.00%)   83.68 (  -3.74%)   84.70 (  -5.00%)
>> Amean 2      92.87 (   0.00%)  121.63 ( -30.96%)   88.74 (   4.45%)
>> Amean 4     102.72 (   0.00%)  474.33 (-361.77%)  113.97 ( -10.95%)
>> Amean 32   2543.93 (   0.00%) 1927.65 (  24.23%) 2038.74 (  19.86%)
>> 
> 
> Thanks for trying with low_latency disabled.  If I read numbers
> correctly, we move from a worst case of 361% higher execution time to
> a worst case of 11%.  With a best case of 20% of lower execution time.
> 
 
 Yes.
 
> I asked you about none and mq-deadline in a previous email, because
> actually we have a double change here: change of the I/O stack, and
> change of the scheduler, with the first change probably not irrelevant
> with respect to the second one.
> 
 
 True. However, the difference between legacy-deadline mq-deadline is
 roughly around the 5-10% mark across workloads for SSD. It's not
 universally true but the impact is not as severe. While this is not
 proof that the stack change is the sole root cause, it makes it less
 likely.
 
>>> 
>>> I'm getting a little lost here.  If I'm not mistaken, you are saying,
>>> since the difference between two virtually identical schedulers
>>> (legacy-deadline and mq-deadline) is only around 5-10%, while the
>>> difference between cfq and mq-bfq-tput is higher, then in the latter
>>> case it is not the stack's fault.  Yet the loss of mq-bfq-tput in the
>>> above test is exactly in the 5-10% range?  What am I missing?  Other
>>> tests with mq-bfq-tput not yet reported?
>>> 
> By chance, according to what you have measured so far, is there any
> test where, instead, you expect or have seen bfq-mq-tput to always
> lose?  I could start from there.
> 
 
 global-dhp__io-fio-randread-async-randwrite-xfs but marginal enough that
 it could be the stack change.
 
 global-dhp__io-dbench4-fsync-ext4 was a universal loss across any
 machine tested. This is global-dhp__io-dbench4-fsync from mmtests using
 ext4 as a filesystem. The same is not true for XFS so the filesystem
 matters.
 
>>> 
>>> Ok, then I will try to repeat global-dhp__io-dbench4-fsync-ext4 as
>>> soon as I can, thanks.
>>> 
>>> 
>> 
>> I've run this test and tried to further investigate this regression.
>> For the moment, the gist seems to be that blk-mq plays an important
>> role, not only with bfq (unless I'm considering the wrong numbers).
>> Even if your main purpose in this thread was just to give a heads-up,
>> I guess it may be useful to share what I have found out.  In addition,
>> I want to ask for some help, to try to get closer to the possible
>> causes of at least this regression.  If you think it would be better
>> to open a new thread on this stuff, I'll do it.
>> 
>> First, I got mixed results on my system.  I'll focus only on the
>> case where mq-bfq-tput achieves its worst relative performance w.r.t.
>> cfq, which happens with 64 clients.  Still, also in this case
>> mq-bfq is better than cfq in all average values except Flush.  I don't
>> know which are the best/right values to look at, so, here's the final
>> report for both schedulers:
>> 
>> CFQ
>> 
>> Operation                 Count     AvgLat     MaxLat
>> ------------------------------------------------------
>> Flush                     13120     20.069    348.594
>> Close                    133696      0.008     14.642
>> LockX                       512      0.009      0.059
>> Rename                     7552      1.857    415.418
>> ReadX                    270720      0.141    535.632
>> WriteX                    89591    421.961   6363.271
>> Unlink                    34048      1.281    662.467
>> UnlockX                     512      0.007      0.057
>> FIND_FIRST                62016      0.086     25.060
>> SET_FILE_INFORMATION      15616      0.995    176.621
>> QUERY_FILE_INFORMATION    28734      0.004      1.372
>> QUERY_PATH_INFORMATION   170240      0.163    820.292
>> QUERY_FS_INFORMATION      28736      0.017      4.110

Re: Switching to MQ by default may generate some bug reports

2017-08-07 Thread Paolo Valente

> On 07 Aug 2017, at 19:32, Paolo Valente wrote:
> 
>> 
>> On 05 Aug 2017, at 00:05, Paolo Valente wrote:
>> 
>>> 
>>> On 04 Aug 2017, at 13:01, Mel Gorman wrote:
>>> 
>>> On Fri, Aug 04, 2017 at 09:26:20AM +0200, Paolo Valente wrote:
> I took that into account. BFQ with low-latency was also tested and the
> impact was not a universal improvement although it can be a noticeable
> improvement. From the same machine;
> 
> dbench4 Loadfile Execution Time
>                           4.12.0             4.12.0             4.12.0
>                       legacy-cfq             mq-bfq        mq-bfq-tput
> Amean 1      80.67 (   0.00%)   83.68 (  -3.74%)   84.70 (  -5.00%)
> Amean 2      92.87 (   0.00%)  121.63 ( -30.96%)   88.74 (   4.45%)
> Amean 4     102.72 (   0.00%)  474.33 (-361.77%)  113.97 ( -10.95%)
> Amean 32   2543.93 (   0.00%) 1927.65 (  24.23%) 2038.74 (  19.86%)
> 
 
 Thanks for trying with low_latency disabled.  If I read numbers
 correctly, we move from a worst case of 361% higher execution time to
 a worst case of 11%.  With a best case of 20% of lower execution time.
 
>>> 
>>> Yes.
>>> 
 I asked you about none and mq-deadline in a previous email, because
 actually we have a double change here: change of the I/O stack, and
 change of the scheduler, with the first change probably not irrelevant
 with respect to the second one.
 
>>> 
>>> True. However, the difference between legacy-deadline mq-deadline is
>>> roughly around the 5-10% mark across workloads for SSD. It's not
>>> universally true but the impact is not as severe. While this is not
>>> proof that the stack change is the sole root cause, it makes it less
>>> likely.
>>> 
>> 
>> I'm getting a little lost here.  If I'm not mistaken, you are saying,
>> since the difference between two virtually identical schedulers
>> (legacy-deadline and mq-deadline) is only around 5-10%, while the
>> difference between cfq and mq-bfq-tput is higher, then in the latter
>> case it is not the stack's fault.  Yet the loss of mq-bfq-tput in the
>> above test is exactly in the 5-10% range?  What am I missing?  Other
>> tests with mq-bfq-tput not yet reported?
>> 
 By chance, according to what you have measured so far, is there any
 test where, instead, you expect or have seen bfq-mq-tput to always
 lose?  I could start from there.
 
>>> 
>>> global-dhp__io-fio-randread-async-randwrite-xfs but marginal enough that
>>> it could be the stack change.
>>> 
>>> global-dhp__io-dbench4-fsync-ext4 was a universal loss across any
>>> machine tested. This is global-dhp__io-dbench4-fsync from mmtests using
>>> ext4 as a filesystem. The same is not true for XFS so the filesystem
>>> matters.
>>> 
>> 
>> Ok, then I will try to repeat global-dhp__io-dbench4-fsync-ext4 as
>> soon as I can, thanks.
>> 
>> 
> 
> I've run this test and tried to further investigate this regression.
> For the moment, the gist seems to be that blk-mq plays an important
> role, not only with bfq (unless I'm considering the wrong numbers).
> Even if your main purpose in this thread was just to give a heads-up,
> I guess it may be useful to share what I have found out.  In addition,
> I want to ask for some help, to try to get closer to the possible
> causes of at least this regression.  If you think it would be better
> to open a new thread on this stuff, I'll do it.
> 
> First, I got mixed results on my system.  I'll focus only on the
> case where mq-bfq-tput achieves its worst relative performance w.r.t.
> cfq, which happens with 64 clients.  Still, also in this case
> mq-bfq is better than cfq in all average values except Flush.  I don't
> know which are the best/right values to look at, so, here's the final
> report for both schedulers:
> 
> CFQ
> 
> Operation                 Count     AvgLat     MaxLat
> ------------------------------------------------------
> Flush                     13120     20.069    348.594
> Close                    133696      0.008     14.642
> LockX                       512      0.009      0.059
> Rename                     7552      1.857    415.418
> ReadX                    270720      0.141    535.632
> WriteX                    89591    421.961   6363.271
> Unlink                    34048      1.281    662.467
> UnlockX                     512      0.007      0.057
> FIND_FIRST                62016      0.086     25.060
> SET_FILE_INFORMATION      15616      0.995    176.621
> QUERY_FILE_INFORMATION    28734      0.004      1.372
> QUERY_PATH_INFORMATION   170240      0.163    820.292
> QUERY_FS_INFORMATION      28736      0.017      4.110
> NTCreateX                178688      0.437    905.567
> 
> MQ-BFQ-TPUT
> 
> Operation                 Count     AvgLat     MaxLat
> ------------------------------------------------------
> Flush                     13504     75.828  11196.035

Re: Switching to MQ by default may generate some bug reports

2017-08-07 Thread Paolo Valente

> On 05 Aug 2017, at 13:54, Mel Gorman wrote:
> ...
> 
>> In addition, as for coverage, we made the empiric assumption that
>> start-up time measured with each of the above easy-to-benchmark
>> applications gives an idea of the time that it would take with any
>> application of the same size and complexity.  User feedback confirmed
>> this assumptions so far.  Of course there may well be exceptions.
>> 
> 
> FWIW, I also have anecdotal evidence from at least one user that using
> BFQ is way better on their desktop than CFQ ever was even under the best
> of circumstances. I've had problems directly measuring it empirically but
> this was also the first time I switched on BFQ to see what fell out so
> it's early days yet.
> 

Yeah, I'm constantly trying (without great success so far :) ) to turn
this folklore into shared, repeatable tests and numbers.  The latter
could then be reliably evaluated, questioned or defended.

Thanks,
Paolo

> -- 
> Mel Gorman
> SUSE Labs



Re: Switching to MQ by default may generate some bug reports

2017-08-07 Thread Paolo Valente

> On 05 Aug 2017, at 00:05, Paolo Valente wrote:
> 
>> 
>> On 04 Aug 2017, at 13:01, Mel Gorman wrote:
>> 
>> On Fri, Aug 04, 2017 at 09:26:20AM +0200, Paolo Valente wrote:
 I took that into account. BFQ with low-latency was also tested and the
 impact was not a universal improvement although it can be a noticeable
 improvement. From the same machine;
 
 dbench4 Loadfile Execution Time
                           4.12.0             4.12.0             4.12.0
                       legacy-cfq             mq-bfq        mq-bfq-tput
 Amean 1      80.67 (   0.00%)   83.68 (  -3.74%)   84.70 (  -5.00%)
 Amean 2      92.87 (   0.00%)  121.63 ( -30.96%)   88.74 (   4.45%)
 Amean 4     102.72 (   0.00%)  474.33 (-361.77%)  113.97 ( -10.95%)
 Amean 32   2543.93 (   0.00%) 1927.65 (  24.23%) 2038.74 (  19.86%)
 
>>> 
>>> Thanks for trying with low_latency disabled.  If I read numbers
>>> correctly, we move from a worst case of 361% higher execution time to
>>> a worst case of 11%.  With a best case of 20% of lower execution time.
>>> 
>> 
>> Yes.
>> 
>>> I asked you about none and mq-deadline in a previous email, because
>>> actually we have a double change here: change of the I/O stack, and
>>> change of the scheduler, with the first change probably not irrelevant
>>> with respect to the second one.
>>> 
>> 
>> True. However, the difference between legacy-deadline mq-deadline is
>> roughly around the 5-10% mark across workloads for SSD. It's not
>> universally true but the impact is not as severe. While this is not
>> proof that the stack change is the sole root cause, it makes it less
>> likely.
>> 
> 
> I'm getting a little lost here.  If I'm not mistaken, you are saying,
> since the difference between two virtually identical schedulers
> (legacy-deadline and mq-deadline) is only around 5-10%, while the
> difference between cfq and mq-bfq-tput is higher, then in the latter
> case it is not the stack's fault.  Yet the loss of mq-bfq-tput in the
> above test is exactly in the 5-10% range?  What am I missing?  Other
> tests with mq-bfq-tput not yet reported?
> 
>>> By chance, according to what you have measured so far, is there any
>>> test where, instead, you expect or have seen bfq-mq-tput to always
>>> lose?  I could start from there.
>>> 
>> 
>> global-dhp__io-fio-randread-async-randwrite-xfs but marginal enough that
>> it could be the stack change.
>> 
>> global-dhp__io-dbench4-fsync-ext4 was a universal loss across any
>> machine tested. This is global-dhp__io-dbench4-fsync from mmtests using
>> ext4 as a filesystem. The same is not true for XFS so the filesystem
>> matters.
>> 
> 
> Ok, then I will try to repeat global-dhp__io-dbench4-fsync-ext4 as
> soon as I can, thanks.
> 
> 

I've run this test and tried to further investigate this regression.
For the moment, the gist seems to be that blk-mq plays an important
role, not only with bfq (unless I'm considering the wrong numbers).
Even if your main purpose in this thread was just to give a heads-up,
I guess it may be useful to share what I have found out.  In addition,
I want to ask for some help, to try to get closer to the possible
causes of at least this regression.  If you think it would be better
to open a new thread on this stuff, I'll do it.

First, I got mixed results on my system.  I'll focus only on the
case where mq-bfq-tput achieves its worst relative performance w.r.t.
cfq, which happens with 64 clients.  Still, also in this case
mq-bfq is better than cfq in all average values except Flush.  I don't
know which are the best/right values to look at, so, here's the final
report for both schedulers:

CFQ

 Operation                 Count     AvgLat     MaxLat
 ------------------------------------------------------
 Flush                     13120     20.069    348.594
 Close                    133696      0.008     14.642
 LockX                       512      0.009      0.059
 Rename                     7552      1.857    415.418
 ReadX                    270720      0.141    535.632
 WriteX                    89591    421.961   6363.271
 Unlink                    34048      1.281    662.467
 UnlockX                     512      0.007      0.057
 FIND_FIRST                62016      0.086     25.060
 SET_FILE_INFORMATION      15616      0.995    176.621
 QUERY_FILE_INFORMATION    28734      0.004      1.372
 QUERY_PATH_INFORMATION   170240      0.163    820.292
 QUERY_FS_INFORMATION      28736      0.017      4.110
 NTCreateX                178688      0.437    905.567

MQ-BFQ-TPUT

 Operation                 Count     AvgLat     MaxLat
 ------------------------------------------------------
 Flush                     13504     75.828  11196.035
 Close                    136896      0.004      3.855
 LockX                       640      0.005      0.031
 Rename                     8064      1.020    288.989
 ReadX                    297600      0.081    685.850

Re: Switching to MQ by default may generate some bug reports

2017-08-05 Thread Mel Gorman
On Sat, Aug 05, 2017 at 12:05:00AM +0200, Paolo Valente wrote:
> > 
> > True. However, the difference between legacy-deadline mq-deadline is
> > roughly around the 5-10% mark across workloads for SSD. It's not
> > universally true but the impact is not as severe. While this is not
> > proof that the stack change is the sole root cause, it makes it less
> > likely.
> > 
> 
> I'm getting a little lost here.  If I'm not mistaken, you are saying,
> since the difference between two virtually identical schedulers
> (legacy-deadline and mq-deadline) is only around 5-10%, while the
> difference between cfq and mq-bfq-tput is higher, then in the latter
> case it is not the stack's fault.  Yet the loss of mq-bfq-tput in the
> above test is exactly in the 5-10% range?  What am I missing?  Other
> tests with mq-bfq-tput not yet reported?
> 

Unfortunately it's due to very broad generalisations. 10 configurations
from mmtests were used in total when I was checking this. Multiply those by
4 for each tested filesystem and then multiply again for each io scheduler
on a total of 7 machines taking 3-4 weeks to execute all tests. The deltas
between each configuration on different machines varies a lot. It also
is an impractical amount of information to present and discuss and the
point of the original mail was to highlight that switching the default
may create some bug reports, so as not to be too surprised or to panic.

The general trend observed was that legacy-deadline vs mq-deadline generally
showed a small regression switching to mq-deadline but it was not universal
and it wasn't consistent. If nothing else, IO tests that are borderline
are difficult to test for significance as distributions are multimodal.
However, it was generally close enough to conclude "this could be tolerated
and more mq work is on the way". However, it's impossible to give a precise
range of how much of a hit it would take but it generally seemed to be
around the 5% mark.

CFQ switching to BFQ was often more dramatic. Sometimes it doesn't really
matter and sometimes turning off low_latency helped enough. bonnie, which
is a single IO issuer, didn't show much difference in throughput. It had
a few problems with file create/delete but the absolute times there are
so small that tiny differences look relatively large and were ignored.
For the moment, I'll be temporarily ignoring bonnie because it was a
sniff-test only and I didn't expect many surprises from a single IO issuer.

The workload that cropped up as being most alarming was dbench, which is ironic
given that it's not actually that IO intensive and tends to be limited by
fsync times. The benchmark has a number of other weaknesses.  It's more
often dominated by scheduler performance, can be gamed by starving all
but one thread of IO to give "better" results and is sensitive to the
exact timing of when writeback occurs which mmtests tries to mitigate by
reducing the loadfile size. If it turns out that it's the only benchmark
that really suffers then I think we would live with or find ways of tuning
around it but fio concerned me.

The fio ones were a concern because of different read/write throughputs
and the fact that it was not consistently read or write that was favoured. These
changes are not necessarily good or bad but I've seen in the past that writes
that get starved tend to impact workloads that periodically fsync dirty
data (think databases) and had to be tuned by reducing dirty_ratio. I've
also seen cases where syncing of metadata on some filesystems would cause
large stalls if there was a lot of write starvation. I regretted not adding
pgioperf (basic simulator of postgres IO behaviour) to the original set
of tests because it tends to be very good at detecting fsync stalls due
to write starvation.
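Something with a similar flavour (purely illustrative, not the actual pgioperf
implementation from mmtests) can be approximated in fio by pairing a
commit-log style writer that fsyncs after every write with light, rate-capped
random readers; a stall in the writer then shows up directly as commit latency.
All section names and values below are invented for the sketch:

  [wal-writer]
  ; sequential writer that syncs after every write, similar to a database
  ; commit log; write starvation shows up here as long fsync times
  ioengine=sync
  rw=write
  bs=8k
  size=512m
  fsync=1

  [readers]
  ; light random readers standing in for query traffic
  ioengine=sync
  rw=randread
  bs=8k
  size=1g
  numjobs=2
  rate=1m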

> > 
> > Sure, but if during those handful of seconds the throughput is 10% of
> > what is used to be, it'll still be noticeable.
> > 
> 
> I did not have the time yet to repeat this test (I will try soon), but
> I had the time think about it a little bit.  And I soon realized that
> actually this is not a responsiveness test against background
> workload, or, it is at most an extreme corner case for it.  Both the
> write and the read thread start at the same time.  So, we are
> mimicking a user starting, e.g., a file copy, and, exactly at the same
> time, an app (in addition, the file copy starts to cause heavy writes
> immediately).
> 

Yes, although it's not entirely unrealistic to have light random readers
and heavy writers starting at the same time. A write-intensive database
can behave like this.

Also, I wouldn't panic about needing time to repeat this test. This is
not blocking me as such; all I was interested in was checking whether the
switch could be safely made now or should be deferred while keeping an
eye on how it's doing. It's perfectly possible others will make the switch
and find the majority of their workloads are fine. If others report bugs
and they're using rotary storage then it should

Re: Switching to MQ by default may generate some bug reports

2017-08-04 Thread Paolo Valente

> On 04 Aug 2017, at 13:01, Mel Gorman wrote:
> 
> On Fri, Aug 04, 2017 at 09:26:20AM +0200, Paolo Valente wrote:
>>> I took that into account. BFQ with low-latency was also tested and the
>>> impact was not a universal improvement although it can be a noticeable
>>> improvement. From the same machine;
>>> 
>>> dbench4 Loadfile Execution Time
>>>                           4.12.0             4.12.0             4.12.0
>>>                       legacy-cfq             mq-bfq        mq-bfq-tput
>>> Amean 1      80.67 (   0.00%)   83.68 (  -3.74%)   84.70 (  -5.00%)
>>> Amean 2      92.87 (   0.00%)  121.63 ( -30.96%)   88.74 (   4.45%)
>>> Amean 4     102.72 (   0.00%)  474.33 (-361.77%)  113.97 ( -10.95%)
>>> Amean 32   2543.93 (   0.00%) 1927.65 (  24.23%) 2038.74 (  19.86%)
>>> 
>> 
>> Thanks for trying with low_latency disabled.  If I read numbers
>> correctly, we move from a worst case of 361% higher execution time to
>> a worst case of 11%.  With a best case of 20% of lower execution time.
>> 
> 
> Yes.
> 
>> I asked you about none and mq-deadline in a previous email, because
>> actually we have a double change here: change of the I/O stack, and
>> change of the scheduler, with the first change probably not irrelevant
>> with respect to the second one.
>> 
> 
> True. However, the difference between legacy-deadline mq-deadline is
> roughly around the 5-10% mark across workloads for SSD. It's not
> universally true but the impact is not as severe. While this is not
> proof that the stack change is the sole root cause, it makes it less
> likely.
> 

I'm getting a little lost here.  If I'm not mistaken, you are saying,
since the difference between two virtually identical schedulers
(legacy-deadline and mq-deadline) is only around 5-10%, while the
difference between cfq and mq-bfq-tput is higher, then in the latter
case it is not the stack's fault.  Yet the loss of mq-bfq-tput in the
above test is exactly in the 5-10% range?  What am I missing?  Other
tests with mq-bfq-tput not yet reported?

>> By chance, according to what you have measured so far, is there any
>> test where, instead, you expect or have seen bfq-mq-tput to always
>> lose?  I could start from there.
>> 
> 
> global-dhp__io-fio-randread-async-randwrite-xfs but marginal enough that
> it could be the stack change.
> 
> global-dhp__io-dbench4-fsync-ext4 was a universal loss across any
> machine tested. This is global-dhp__io-dbench4-fsync from mmtests using
> ext4 as a filesystem. The same is not true for XFS so the filesystem
> matters.
> 

Ok, then I will try to repeat global-dhp__io-dbench4-fsync-ext4 as
soon as I can, thanks.


>>> However, it's not a universal gain and there are also fairness issues.
>>> For example, this is a fio configuration with a single random reader and
>>> a single random writer on the same machine
>>> 
>>> fio Throughput
>>>                                       4.12.0             4.12.0             4.12.0
>>>                                   legacy-cfq             mq-bfq        mq-bfq-tput
>>> Hmean kb/sec-writer-write  398.15 (   0.00%) 4659.18 (1070.21%) 4934.52 (1139.37%)
>>> Hmean kb/sec-reader-read   507.00 (   0.00%)   66.36 ( -86.91%)   14.68 ( -97.10%)
>>> 
>>> With CFQ, there is some fairness between the readers and writers and
>>> with BFQ, there is a strong preference to writers. Again, this is not
>>> universal. It'll be a mix and sometimes it'll be classed as a gain and
>>> sometimes a regression.
>>> 
>> 
>> Yes, that's why I didn't pay too much attention so far to such an
>> issue.  I preferred to tune for maximum responsiveness and minimal
>> latency for soft real-time applications, w.r.t. reducing a kind of
>> unfairness for which no user happened to complain (so far).  Do you
>> have some real application (or benchmark simulating a real
>> application) in which we can see actual problems because of this form
>> of unfairness? 
> 
> I don't have data on that. This was a preliminary study only to see if
> a switch was safe running workloads that would appear in internal bug
> reports related to benchmarking.
> 
>> I was thinking of, e.g., two virtual machines, one
>> doing heavy writes and the other heavy reads.  But in that case,
>> cgroups have to be used, and I'm not sure we would still see this
>> problem.  Any suggestion is welcome.
>> 
> 
> I haven't spent time designing such a thing. Even if I did, I know I would
> get hit within weeks of a switch during distro development with reports
> related to fio, dbench and other basic IO benchmarks.
> 

I see.

>>> I had seen this assertion so one of the fio configurations had multiple
>>> heavy writers in the background and a random reader of small files to
>>> simulate that scenario. The intent was to simulate heavy IO in the presence
>>> of application startup

Re: Switching to MQ by default may generate some bug reports

2017-08-04 Thread Mel Gorman
On Fri, Aug 04, 2017 at 09:26:20AM +0200, Paolo Valente wrote:
> > I took that into account. BFQ with low-latency was also tested and the
> > impact was not a universal improvement although it can be a noticeable
> > improvement. From the same machine;
> > 
> > dbench4 Loadfile Execution Time
> >                           4.12.0             4.12.0             4.12.0
> >                       legacy-cfq             mq-bfq        mq-bfq-tput
> > Amean 1      80.67 (   0.00%)   83.68 (  -3.74%)   84.70 (  -5.00%)
> > Amean 2      92.87 (   0.00%)  121.63 ( -30.96%)   88.74 (   4.45%)
> > Amean 4     102.72 (   0.00%)  474.33 (-361.77%)  113.97 ( -10.95%)
> > Amean 32   2543.93 (   0.00%) 1927.65 (  24.23%) 2038.74 (  19.86%)
> > 
> 
> Thanks for trying with low_latency disabled.  If I read numbers
> correctly, we move from a worst case of 361% higher execution time to
> a worst case of 11%.  With a best case of 20% of lower execution time.
> 

Yes.

> I asked you about none and mq-deadline in a previous email, because
> actually we have a double change here: change of the I/O stack, and
> change of the scheduler, with the first change probably not irrelevant
> with respect to the second one.
> 

True. However, the difference between legacy-deadline mq-deadline is
roughly around the 5-10% mark across workloads for SSD. It's not
universally true but the impact is not as severe. While this is not
proof that the stack change is the sole root cause, it makes it less
likely.

> By chance, according to what you have measured so far, is there any
> test where, instead, you expect or have seen bfq-mq-tput to always
> lose?  I could start from there.
> 

global-dhp__io-fio-randread-async-randwrite-xfs but marginal enough that
it could be the stack change.

global-dhp__io-dbench4-fsync-ext4 was a universal loss across any
machine tested. This is global-dhp__io-dbench4-fsync from mmtests using
ext4 as a filesystem. The same is not true for XFS so the filesystem
matters.

> > However, it's not a universal gain and there are also fairness issues.
> > For example, this is a fio configuration with a single random reader and
> > a single random writer on the same machine
> > 
> > fio Throughput
> >                                       4.12.0             4.12.0             4.12.0
> >                                   legacy-cfq             mq-bfq        mq-bfq-tput
> > Hmean kb/sec-writer-write  398.15 (   0.00%) 4659.18 (1070.21%) 4934.52 (1139.37%)
> > Hmean kb/sec-reader-read   507.00 (   0.00%)   66.36 ( -86.91%)   14.68 ( -97.10%)
> > 
> > With CFQ, there is some fairness between the readers and writers and
> > with BFQ, there is a strong preference to writers. Again, this is not
> > universal. It'll be a mix and sometimes it'll be classed as a gain and
> > sometimes a regression.
> > 
> 
> Yes, that's why I didn't pay too much attention so far to such an
> issue.  I preferred to tune for maximum responsiveness and minimal
> latency for soft real-time applications, w.r.t. reducing a kind of
> unfairness for which no user happened to complain (so far).  Do you
> have some real application (or benchmark simulating a real
> application) in which we can see actual problems because of this form
> of unfairness? 

I don't have data on that. This was a preliminary study, only to see
whether a switch was safe when running workloads that would appear in
internal bug reports related to benchmarking.

> I was thinking of, e.g., two virtual machines, one
> doing heavy writes and the other heavy reads.  But in that case,
> cgroups have to be used, and I'm not sure we would still see this
> problem.  Any suggestion is welcome.
> 

I haven't spent time designing such a thing. Even if I did, I know I would
get hit within weeks of a switch during distro development with reports
related to fio, dbench and other basic IO benchmarks.

> > I had seen this assertion so one of the fio configurations had multiple
> > heavy writers in the background and a random reader of small files to
> > simulate that scenario. The intent was to simulate heavy IO in the presence
> > of application startup
> > 
> >                                        4.12.0               4.12.0               4.12.0
> >                                    legacy-cfq               mq-bfq          mq-bfq-tput
> > Hmean kb/sec-writer-write  1997.75 (   0.00%)  2035.65 (   1.90%)  2014.50 (   0.84%)
> > Hmean kb/sec-reader-read    128.50 (   0.00%)    79.46 ( -38.16%)    12.78 ( -90.06%)
> > 
> > Write throughput is steady-ish across each IO scheduler, but readers get
> > badly starved, which I expect would slow application startup, and disabling
> > low_latency makes it much worse.
> 
> A greedy random reader that goes on steadily mimics an 

Re: Switching to MQ by default may generate some bug reports

2017-08-04 Thread Paolo Valente

> On 3 Aug 2017, at 13:01, Mel Gorman 
>  wrote:
> 
> On Thu, Aug 03, 2017 at 11:21:59AM +0200, Paolo Valente wrote:
>>> For Paolo, if you want to try preemptively dealing with regression reports
>>> before 4.13 releases then all the tests in question can be reproduced with
>>> https://github.com/gormanm/mmtests . The most relevant test configurations
>>> I've seen so far are
>>> 
>>> configs/config-global-dhp__io-dbench4-async
>>> configs/config-global-dhp__io-fio-randread-async-randwrite
>>> configs/config-global-dhp__io-fio-randread-async-seqwrite
>>> configs/config-global-dhp__io-fio-randread-sync-heavywrite
>>> configs/config-global-dhp__io-fio-randread-sync-randwrite
>>> configs/config-global-dhp__pgioperf
>>> 
>> 
>> Hi Mel,
>> as has already happened with the latest Phoronix benchmark article (and
>> with other test results reported several months ago on this list), bad
>> results may be caused (also) by the fact that the low-latency, default
>> configuration of BFQ is being used. 
> 
> I took that into account; BFQ with low-latency was also tested and the
> impact was not a universal improvement, although it can be a noticeable
> improvement. From the same machine:
> 
> dbench4 Loadfile Execution Time
>                        4.12.0              4.12.0              4.12.0
>                    legacy-cfq              mq-bfq         mq-bfq-tput
> Amean  1      80.67 (   0.00%)    83.68 (  -3.74%)    84.70 (  -5.00%)
> Amean  2      92.87 (   0.00%)   121.63 ( -30.96%)    88.74 (   4.45%)
> Amean  4     102.72 (   0.00%)   474.33 (-361.77%)   113.97 ( -10.95%)
> Amean  32   2543.93 (   0.00%)  1927.65 (  24.23%)  2038.74 (  19.86%)
> 

Thanks for trying with low_latency disabled.  If I read the numbers
correctly, we move from a worst case of 361% higher execution time to
a worst case of 11%, with a best case of 20% lower execution time.

I asked you about none and mq-deadline in a previous email, because
actually we have a double change here: change of the I/O stack, and
change of the scheduler, with the first change probably not irrelevant
with respect to the second one.

Are we sure that part of the small losses and gains with bfq-mq-tput
isn't due to the change of I/O stack?  My problem is that it may be
hard to find issues or anomalies in BFQ that justify a 5% or 11% loss
in two cases, while the same scheduler has a 4% and a 20% gain in the
other two cases.

By chance, according to what you have measured so far, is there any
test where, instead, you expect or have seen bfq-mq-tput to always
lose?  I could start from there.

> However, it's not a universal gain and there are also fairness issues.
> For example, this is a fio configuration with a single random reader and
> a single random writer on the same machine
> 
> fio Throughput
>                                        4.12.0               4.12.0               4.12.0
>                                    legacy-cfq               mq-bfq          mq-bfq-tput
> Hmean kb/sec-writer-write   398.15 (   0.00%)  4659.18 (1070.21%)  4934.52 (1139.37%)
> Hmean kb/sec-reader-read    507.00 (   0.00%)    66.36 ( -86.91%)    14.68 ( -97.10%)
> 
> With CFQ, there is some fairness between the readers and writers and
> with BFQ, there is a strong preference to writers. Again, this is not
> universal. It'll be a mix and sometimes it'll be classed as a gain and
> sometimes a regression.
> 

Yes, that's why I didn't pay too much attention so far to such an
issue.  I preferred to tune for maximum responsiveness and minimal
latency for soft real-time applications, w.r.t. reducing a kind of
unfairness for which no user happened to complain (so far).  Do you
have some real application (or benchmark simulating a real
application) in which we can see actual problems because of this form
of unfairness?  I was thinking of, e.g., two virtual machines, one
doing heavy writes and the other heavy reads.  But in that case,
cgroups have to be used, and I'm not sure we would still see this
problem.  Any suggestion is welcome.

In any case, if needed, changing the read/write throughput ratio should
not be a problem.

> While I accept that BFQ can be tuned, tuning IO schedulers is not something
> that normal users get right and they'll only look at "out of box" performance
> which, right now, will trigger bug reports. This is neither good nor bad,
> it simply is.
> 
>> This configuration is the default
>> one because the motivation for yet-another-scheduler as BFQ is that it
>> drastically reduces latency for interactive and soft real-time tasks
>> (e.g., opening an app or playing/streaming a video), when there is
>> some background I/O.  Low-latency heuristics are willing to sacrifice
>> throughput when this provides a large benefit in terms of the above
>> latency.
>> 
> 
> I had seen this assertion so one of the fio con

Re: Switching to MQ by default may generate some bug reports

2017-08-03 Thread Ming Lei
On Thu, Aug 3, 2017 at 6:47 PM, Mel Gorman  wrote:
> On Thu, Aug 03, 2017 at 05:57:50PM +0800, Ming Lei wrote:
>> On Thu, Aug 3, 2017 at 5:42 PM, Mel Gorman  
>> wrote:
>> > On Thu, Aug 03, 2017 at 05:17:21PM +0800, Ming Lei wrote:
>> >> Hi Mel Gorman,
>> >>
>> >> On Thu, Aug 3, 2017 at 4:51 PM, Mel Gorman  
>> >> wrote:
>> >> > Hi Christoph,
>> >> >
>> >> > I know the reasons for switching to MQ by default but just be aware 
>> >> > that it's
>> >> > not without hazards, albeit the biggest issues I've seen are switching
>> >> > CFQ to BFQ. On my home grid, there is some experimental automatic 
>> >> > testing
>> >> > running every few weeks searching for regressions. Yesterday, it noticed
>> >> > that creating some work files for a postgres simulator called pgioperf
>> >> > was 38.33% slower and it auto-bisected to the switch to MQ. This is just
>> >> > linearly writing two files for testing on another benchmark and is not
>> >> > remarkable. The relevant part of the report is
>> >>
>> >> We saw some SCSI-MQ performance issue too, please see if the following
>> >> patchset fixes your issue:
>> >>
>> >> http://marc.info/?l=linux-block&m=150151989915776&w=2
>> >>
>> >
>> > That series is dealing with problems with legacy-deadline vs mq-none, whereas
>> > the bulk of the problems reported in this mail are related to
>> > legacy-CFQ vs mq-BFQ.
>>
>> The series deals with none and all mq schedulers, and you can see
>> the improvement on mq-deadline in the cover letter. :-)
>>
>
> Would it be expected to fix a 2x to 4x slowdown as experienced by BFQ
> that was not observed on other schedulers?

Actually, if you look at the cover letter, you will see this patchset
increases sequential I/O IOPS on mq-deadline by more than 10X, so it
would be reasonable to see it help with a 2x to 4x BFQ slowdown, but I
didn't test BFQ.

Thanks,
Ming Lei


Re: Switching to MQ by default may generate some bug reports

2017-08-03 Thread Mel Gorman
On Thu, Aug 03, 2017 at 11:21:59AM +0200, Paolo Valente wrote:
> > For Paolo, if you want to try preemptively dealing with regression reports
> > before 4.13 releases then all the tests in question can be reproduced with
> > https://github.com/gormanm/mmtests . The most relevant test configurations
> > I've seen so far are
> > 
> > configs/config-global-dhp__io-dbench4-async
> > configs/config-global-dhp__io-fio-randread-async-randwrite
> > configs/config-global-dhp__io-fio-randread-async-seqwrite
> > configs/config-global-dhp__io-fio-randread-sync-heavywrite
> > configs/config-global-dhp__io-fio-randread-sync-randwrite
> > configs/config-global-dhp__pgioperf
> > 
> 
> Hi Mel,
> as has already happened with the latest Phoronix benchmark article (and
> with other test results reported several months ago on this list), bad
> results may be caused (also) by the fact that the low-latency, default
> configuration of BFQ is being used. 

I took that into account; BFQ with low-latency was also tested and the
impact was not a universal improvement, although it can be a noticeable
improvement. From the same machine:

dbench4 Loadfile Execution Time
                       4.12.0              4.12.0              4.12.0
                   legacy-cfq              mq-bfq         mq-bfq-tput
Amean  1      80.67 (   0.00%)    83.68 (  -3.74%)    84.70 (  -5.00%)
Amean  2      92.87 (   0.00%)   121.63 ( -30.96%)    88.74 (   4.45%)
Amean  4     102.72 (   0.00%)   474.33 (-361.77%)   113.97 ( -10.95%)
Amean  32   2543.93 (   0.00%)  1927.65 (  24.23%)  2038.74 (  19.86%)

However, it's not a universal gain and there are also fairness issues.
For example, this is a fio configuration with a single random reader and
a single random writer on the same machine

fio Throughput
                                       4.12.0               4.12.0               4.12.0
                                   legacy-cfq               mq-bfq          mq-bfq-tput
Hmean kb/sec-writer-write   398.15 (   0.00%)  4659.18 (1070.21%)  4934.52 (1139.37%)
Hmean kb/sec-reader-read    507.00 (   0.00%)    66.36 ( -86.91%)    14.68 ( -97.10%)

With CFQ, there is some fairness between the readers and writers and
with BFQ, there is a strong preference to writers. Again, this is not
universal. It'll be a mix and sometimes it'll be classed as a gain and
sometimes a regression.
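
The exact job file comes from mmtests, but the shape of the test is simple
enough to sketch: one buffered random reader and one buffered random writer
sharing the same filesystem. All values below (paths, sizes, block size,
runtime) are illustrative guesses rather than the configuration behind the
numbers above.

# Rough stand-in for the single-reader/single-writer fairness test.
cat > fairness.fio <<'EOF'
[global]
directory=/mnt/testdisk
ioengine=psync
bs=4k
size=2g
runtime=300
time_based=1

[reader]
rw=randread

[writer]
rw=randwrite
EOF

fio fairness.fio

The kb/sec-reader-read and kb/sec-writer-write rows above are, presumably,
the per-job bandwidths of jobs along these lines.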

While I accept that BFQ can be tuned, tuning IO schedulers is not something
that normal users get right and they'll only look at "out of box" performance
which, right now, will trigger bug reports. This is neither good nor bad,
it simply is.
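
To make the out-of-box versus tuned distinction concrete: the difference
between the mq-bfq and mq-bfq-tput columns above comes down to a single
per-device sysfs attribute, and the scheduler itself is selected the same
way. A minimal sketch, with the device name as a placeholder and run as
root:

# List the available blk-mq schedulers; the active one is in brackets.
cat /sys/block/sda/queue/scheduler

# Select BFQ for the device ("mq-bfq" in the tables above).
echo bfq > /sys/block/sda/queue/scheduler

# "mq-bfq-tput" is BFQ with its low-latency heuristics disabled.
echo 0 > /sys/block/sda/queue/iosched/low_latency

# Restore the default latency-oriented behaviour.
echo 1 > /sys/block/sda/queue/iosched/low_latency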

> This configuration is the default
> one because the motivation for yet-another-scheduler as BFQ is that it
> drastically reduces latency for interactive and soft real-time tasks
> (e.g., opening an app or playing/streaming a video), when there is
> some background I/O.  Low-latency heuristics are willing to sacrifice
> throughput when this provides a large benefit in terms of the above
> latency.
> 

I had seen this assertion so one of the fio configurations had multiple
heavy writers in the background and a random reader of small files to
simulate that scenario. The intent was to simulate heavy IO in the presence
of application startup

                                       4.12.0               4.12.0               4.12.0
                                   legacy-cfq               mq-bfq          mq-bfq-tput
Hmean kb/sec-writer-write  1997.75 (   0.00%)  2035.65 (   1.90%)  2014.50 (   0.84%)
Hmean kb/sec-reader-read    128.50 (   0.00%)    79.46 ( -38.16%)    12.78 ( -90.06%)

Write throughput is steady-ish across each IO scheduler, but readers get
badly starved, which I expect would slow application startup, and disabling
low_latency makes it much worse. The mmtests configuration in question
is global-dhp__io-fio-randread-sync-heavywrite, albeit edited to create
a fresh XFS filesystem on a test partition.

This is not exactly equivalent to real application startup but that can
be difficult to quantify properly.
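
As a rough analog of that configuration, rather than the literal mmtests
job file, the mix is a handful of buffered sequential writers plus one sync
random reader at iodepth 1; job counts, sizes and the fdatasync interval
below are placeholders.

# Sketch loosely mirroring global-dhp__io-fio-randread-sync-heavywrite.
cat > heavywrite.fio <<'EOF'
[global]
directory=/mnt/testdisk
runtime=300
time_based=1

[writer]
rw=write
bs=1m
size=4g
numjobs=4
fdatasync=256

[reader]
ioengine=psync
iodepth=1
rw=randread
bs=4k
size=1g
EOF

fio heavywrite.fio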

> Of course, BFQ may not be optimal for every workload, even if
> low-latency mode is switched off.  In addition, there may still be
> some bug.  I'll repeat your tests on a machine of mine ASAP.
> 

The intent here is not to rag on BFQ because I know it's going to have some
wins and some losses and will take time to fix up. The primary intent was
to flag that 4.13 might have some "blah blah blah is slower on 4.13" reports
due to the switching of defaults that will bisect to a misleading commit.

-- 
Mel Gorman
SUSE Labs


Re: Switching to MQ by default may generate some bug reports

2017-08-03 Thread Mel Gorman
On Thu, Aug 03, 2017 at 05:57:50PM +0800, Ming Lei wrote:
> On Thu, Aug 3, 2017 at 5:42 PM, Mel Gorman  
> wrote:
> > On Thu, Aug 03, 2017 at 05:17:21PM +0800, Ming Lei wrote:
> >> Hi Mel Gorman,
> >>
> >> On Thu, Aug 3, 2017 at 4:51 PM, Mel Gorman  
> >> wrote:
> >> > Hi Christoph,
> >> >
> >> > I know the reasons for switching to MQ by default but just be aware that 
> >> > it's
> >> > not without hazards, albeit the biggest issues I've seen are switching
> >> > CFQ to BFQ. On my home grid, there is some experimental automatic testing
> >> > running every few weeks searching for regressions. Yesterday, it noticed
> >> > that creating some work files for a postgres simulator called pgioperf
> >> > was 38.33% slower and it auto-bisected to the switch to MQ. This is just
> >> > linearly writing two files for testing on another benchmark and is not
> >> > remarkable. The relevant part of the report is
> >>
> >> We saw some SCSI-MQ performance issue too, please see if the following
> >> patchset fixes your issue:
> >>
> >> http://marc.info/?l=linux-block&m=150151989915776&w=2
> >>
> >
> > That series is dealing with problems with legacy-deadline vs mq-none, whereas
> > the bulk of the problems reported in this mail are related to
> > legacy-CFQ vs mq-BFQ.
> 
> The series deals with none and all mq schedulers, and you can see
> the improvement on mq-deadline in the cover letter. :-)
> 

Would it be expected to fix a 2x to 4x slowdown as experienced by BFQ
that was not observed on other schedulers?

-- 
Mel Gorman
SUSE Labs


Re: Switching to MQ by default may generate some bug reports

2017-08-03 Thread Mel Gorman
On Thu, Aug 03, 2017 at 11:44:06AM +0200, Paolo Valente wrote:
> > That series is dealing with problems with legacy-deadline vs mq-none, whereas
> > the bulk of the problems reported in this mail are related to
> > legacy-CFQ vs mq-BFQ.
> > 
> 
> Out of curiosity: do you get no regression with mq-none or mq-deadline?
> 

I didn't test mq-none as the underlying storage was not fast enough to
make a legacy-noop vs mq-none comparison meaningful. legacy-deadline vs
mq-deadline did show small regressions on some workloads, but they were
not as dramatic and were small enough that they could go unnoticed in
some cases.
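
For reference, the legacy-* and mq-* results are obtained by toggling the
SCSI blk-mq path at boot and then picking the scheduler per device. A
sketch for kernels of this vintage, with the device name as a placeholder
and bootloader details left out:

# Is the SCSI blk-mq path active?  (Y selects mq-*, N the legacy path)
cat /sys/module/scsi_mod/parameters/use_blk_mq

# Boot with "scsi_mod.use_blk_mq=0" on the kernel command line for the
# legacy path (noop/deadline/cfq), or "scsi_mod.use_blk_mq=1" for blk-mq
# (none/mq-deadline/bfq/kyber), then select the scheduler under test:
echo mq-deadline > /sys/block/sda/queue/scheduler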

-- 
Mel Gorman
SUSE Labs


Re: Switching to MQ by default may generate some bug reports

2017-08-03 Thread Ming Lei
On Thu, Aug 3, 2017 at 5:42 PM, Mel Gorman  wrote:
> On Thu, Aug 03, 2017 at 05:17:21PM +0800, Ming Lei wrote:
>> Hi Mel Gorman,
>>
>> On Thu, Aug 3, 2017 at 4:51 PM, Mel Gorman  
>> wrote:
>> > Hi Christoph,
>> >
>> > I know the reasons for switching to MQ by default but just be aware that 
>> > it's
>> > not without hazards, albeit the biggest issues I've seen are switching
>> > CFQ to BFQ. On my home grid, there is some experimental automatic testing
>> > running every few weeks searching for regressions. Yesterday, it noticed
>> > that creating some work files for a postgres simulator called pgioperf
>> > was 38.33% slower and it auto-bisected to the switch to MQ. This is just
>> > linearly writing two files for testing on another benchmark and is not
>> > remarkable. The relevant part of the report is
>>
>> We saw some SCSI-MQ performance issue too, please see if the following
>> patchset fixes your issue:
>>
>> http://marc.info/?l=linux-block&m=150151989915776&w=2
>>
>
> That series is dealing with problems with legacy-deadline vs mq-none, whereas
> the bulk of the problems reported in this mail are related to
> legacy-CFQ vs mq-BFQ.

The series deals with none and all mq schedulers, and you can see
the improvement on mq-deadline in the cover letter. :-)

Thanks,
Ming Lei


Re: Switching to MQ by default may generate some bug reports

2017-08-03 Thread Paolo Valente

> On 3 Aug 2017, at 11:42, Mel Gorman 
>  wrote:
> 
> On Thu, Aug 03, 2017 at 05:17:21PM +0800, Ming Lei wrote:
>> Hi Mel Gorman,
>> 
>> On Thu, Aug 3, 2017 at 4:51 PM, Mel Gorman  
>> wrote:
>>> Hi Christoph,
>>> 
>>> I know the reasons for switching to MQ by default but just be aware that 
>>> it's
>>> not without hazards, albeit the biggest issues I've seen are switching
>>> CFQ to BFQ. On my home grid, there is some experimental automatic testing
>>> running every few weeks searching for regressions. Yesterday, it noticed
>>> that creating some work files for a postgres simulator called pgioperf
>>> was 38.33% slower and it auto-bisected to the switch to MQ. This is just
>>> linearly writing two files for testing on another benchmark and is not
>>> remarkable. The relevant part of the report is
>> 
>> We saw some SCSI-MQ performance issue too, please see if the following
>> patchset fixes your issue:
>> 
>> http://marc.info/?l=linux-block&m=150151989915776&w=2
>> 
> 
> That series is dealing with problems with legacy-deadline vs mq-none, whereas
> the bulk of the problems reported in this mail are related to
> legacy-CFQ vs mq-BFQ.
> 

Out of curiosity: do you get no regression with mq-none or mq-deadline?

Thanks,
Paolo

> -- 
> Mel Gorman
> SUSE Labs



Re: Switching to MQ by default may generate some bug reports

2017-08-03 Thread Mel Gorman
On Thu, Aug 03, 2017 at 05:17:21PM +0800, Ming Lei wrote:
> Hi Mel Gorman,
> 
> On Thu, Aug 3, 2017 at 4:51 PM, Mel Gorman  
> wrote:
> > Hi Christoph,
> >
> > I know the reasons for switching to MQ by default but just be aware that 
> > it's
> > not without hazards, albeit the biggest issues I've seen are switching
> > CFQ to BFQ. On my home grid, there is some experimental automatic testing
> > running every few weeks searching for regressions. Yesterday, it noticed
> > that creating some work files for a postgres simulator called pgioperf
> > was 38.33% slower and it auto-bisected to the switch to MQ. This is just
> > linearly writing two files for testing on another benchmark and is not
> > remarkable. The relevant part of the report is
> 
> We saw some SCSI-MQ performance issue too, please see if the following
> patchset fixes your issue:
> 
> http://marc.info/?l=linux-block&m=150151989915776&w=2
> 

That series is dealing with problems with legacy-deadline vs mq-none, whereas
the bulk of the problems reported in this mail are related to
legacy-CFQ vs mq-BFQ.

-- 
Mel Gorman
SUSE Labs


Re: Switching to MQ by default may generate some bug reports

2017-08-03 Thread Ming Lei
On Thu, Aug 3, 2017 at 5:17 PM, Ming Lei  wrote:
> Hi Mel Gorman,
>
> On Thu, Aug 3, 2017 at 4:51 PM, Mel Gorman  
> wrote:
>> Hi Christoph,
>>
>> I know the reasons for switching to MQ by default but just be aware that it's
>> not without hazards, albeit the biggest issues I've seen are switching
>> CFQ to BFQ. On my home grid, there is some experimental automatic testing
>> running every few weeks searching for regressions. Yesterday, it noticed
>> that creating some work files for a postgres simulator called pgioperf
>> was 38.33% slower and it auto-bisected to the switch to MQ. This is just
>> linearly writing two files for testing on another benchmark and is not
>> remarkable. The relevant part of the report is
>
> We saw some SCSI-MQ performance issue too, please see if the following
> patchset fixes your issue:
>
> http://marc.info/?l=linux-block&m=150151989915776&w=2

BTW, the above patches (V1) can be found in the following tree:

  https://github.com/ming1/linux/commits/blk-mq-dispatch_for_scsi.V1

V2 has already been done but not posted yet, because the performance test
on SRP isn't complete:

  https://github.com/ming1/linux/commits/blk-mq-dispatch_for_scsi.V2
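
A quick way to pull either branch into an existing kernel tree for testing,
assuming the branch names above are still current:

# Fetch and check out the V1 branch referenced above.
git remote add ming https://github.com/ming1/linux.git
git fetch ming blk-mq-dispatch_for_scsi.V1
git checkout -b blk-mq-dispatch-v1 FETCH_HEAD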


Thanks,
Ming Lei


Re: Switching to MQ by default may generate some bug reports

2017-08-03 Thread Paolo Valente

> On 3 Aug 2017, at 10:51, Mel Gorman 
>  wrote:
> 
> Hi Christoph,
> 
> I know the reasons for switching to MQ by default but just be aware that it's
> not without hazards, albeit the biggest issues I've seen are switching
> CFQ to BFQ. On my home grid, there is some experimental automatic testing
> running every few weeks searching for regressions. Yesterday, it noticed
> that creating some work files for a postgres simulator called pgioperf
> was 38.33% slower and it auto-bisected to the switch to MQ. This is just
> linearly writing two files for testing on another benchmark and is not
> remarkable. The relevant part of the report is
> 
> Last good/First bad commit
> ==
> Last good commit: 6d311fa7d2c18659d040b9beba5e41fe24c2a6f5
> First bad commit: 5c279bd9e40624f4ab6e688671026d6005b066fa
> From 5c279bd9e40624f4ab6e688671026d6005b066fa Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Fri, 16 Jun 2017 10:27:55 +0200
> Subject: [PATCH] scsi: default to scsi-mq
> Remove the SCSI_MQ_DEFAULT config option and default to the blk-mq I/O
> path now that we had plenty of testing, and have I/O schedulers for
> blk-mq.  The module option to disable the blk-mq path is kept around for
> now.
> Signed-off-by: Christoph Hellwig 
> Signed-off-by: Martin K. Petersen 
> drivers/scsi/Kconfig | 11 ---
> drivers/scsi/scsi.c  |  4 
> 2 files changed, 15 deletions(-)
> 
> Comparison
> ==
>                          initial             initial                last               penup               first
>                       good-v4.12    bad-16f73eb02d7e       good-6d311fa7       good-d06c587d        bad-5c279bd9
> User     min         0.06 (   0.00%)    0.14 (-133.33%)    0.14 (-133.33%)    0.06 (   0.00%)    0.19 (-216.67%)
> User     mean        0.06 (   0.00%)    0.14 (-133.33%)    0.14 (-133.33%)    0.06 (   0.00%)    0.19 (-216.67%)
> User     stddev      0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)
> User     coeffvar    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)
> User     max         0.06 (   0.00%)    0.14 (-133.33%)    0.14 (-133.33%)    0.06 (   0.00%)    0.19 (-216.67%)
> System   min        10.04 (   0.00%)   10.75 (  -7.07%)   10.05 (  -0.10%)   10.16 (  -1.20%)   10.73 (  -6.87%)
> System   mean       10.04 (   0.00%)   10.75 (  -7.07%)   10.05 (  -0.10%)   10.16 (  -1.20%)   10.73 (  -6.87%)
> System   stddev      0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)
> System   coeffvar    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)
> System   max        10.04 (   0.00%)   10.75 (  -7.07%)   10.05 (  -0.10%)   10.16 (  -1.20%)   10.73 (  -6.87%)
> Elapsed  min       251.53 (   0.00%)  351.05 ( -39.57%)  252.83 (  -0.52%)  252.96 (  -0.57%)  347.93 ( -38.33%)
> Elapsed  mean      251.53 (   0.00%)  351.05 ( -39.57%)  252.83 (  -0.52%)  252.96 (  -0.57%)  347.93 ( -38.33%)
> Elapsed  stddev      0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)
> Elapsed  coeffvar    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)
> Elapsed  max       251.53 (   0.00%)  351.05 ( -39.57%)  252.83 (  -0.52%)  252.96 (  -0.57%)  347.93 ( -38.33%)
> CPU      min         4.00 (   0.00%)    3.00 (  25.00%)    4.00 (   0.00%)    4.00 (   0.00%)    3.00 (  25.00%)
> CPU      mean        4.00 (   0.00%)    3.00 (  25.00%)    4.00 (   0.00%)    4.00 (   0.00%)    3.00 (  25.00%)
> CPU      stddev      0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)
> CPU      coeffvar    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)    0.00 (   0.00%)
> CPU      max         4.00 (   0.00%)    3.00 (  25.00%)    4.00 (   0.00%)    4.00 (   0.00%)    3.00 (  25.00%)
> 
> The "Elapsed mean" line is what the testing and auto-bisection was paying
> attention to. Commit 16f73eb02d7e is simply the head commit at the time
> the continuous testing started. The first "bad commit" is the last column.
> 
> It's not the only slowdown that has been observed from other testing when
> examining whether it's ok to switch to MQ by default. The biggest slowdown
> observed was with a modified version of dbench4 -- the modifications use
> shorter, but representative, load files to avoid timing

Re: Switching to MQ by default may generate some bug reports

2017-08-03 Thread Ming Lei
Hi Mel Gorman,

On Thu, Aug 3, 2017 at 4:51 PM, Mel Gorman  wrote:
> Hi Christoph,
>
> I know the reasons for switching to MQ by default but just be aware that it's
> not without hazards, albeit the biggest issues I've seen are switching
> CFQ to BFQ. On my home grid, there is some experimental automatic testing
> running every few weeks searching for regressions. Yesterday, it noticed
> that creating some work files for a postgres simulator called pgioperf
> was 38.33% slower and it auto-bisected to the switch to MQ. This is just
> linearly writing two files for testing on another benchmark and is not
> remarkable. The relevant part of the report is

We saw some SCSI-MQ performance issue too, please see if the following
patchset fixes your issue:

http://marc.info/?l=linux-block&m=150151989915776&w=2

Thanks,
Ming