Re: [PATCH] blk-mq: provide helper for setting up an SQ queue and tag set

2018-10-16 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn                        SUSE Labs
jthumsh...@suse.de                        +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH, RFC] ubd: remove use of blk_rq_map_sg

2018-10-16 Thread Richard Weinberger
On Tuesday, 16 October 2018, 04:19:51 CEST, Jens Axboe wrote:
> On 10/15/18 4:44 PM, Richard Weinberger wrote:
> > On Tuesday, 16 October 2018, 00:04:20 CEST, Jens Axboe wrote:
> >> On 10/15/18 3:46 PM, Richard Weinberger wrote:
> >>> On Monday, 15 October 2018, 22:55:29 CEST, Christoph Hellwig wrote:
>  On Mon, Oct 15, 2018 at 10:42:47PM +0200, Richard Weinberger wrote:
> >> Sadly not. I'm checking now what exactly is broken.
> >
> > I take this back. Christoph's fixup makes reading work.
> > The previous version corrupted my test block device in interesting ways
> > and confused all tests.
> > But the removal of blk_rq_map_sg() still has issues.
> > Now the device blocks endlessly upon flush.
> 
>  I suspect we still need to special case flush.  Updated patch below
>  including your other suggestion:
> >>>
> >>> While playing further with the patch I managed to hit
> >>> BUG_ON(blk_queued_rq(rq)) in blk_mq_requeue_request().
> >>>
> >>> UML requeues the request in ubd_queue_one_vec() if it was not able
> >>> to submit the request to the host io-thread.
> >>> The fd can return -EAGAIN, then UML has to try later.
> >>>
> >>> Isn't this allowed in that context?
> >>
> >> It is, the problem is that queue_one_vec() doesn't always return an
> >> error. The caller is doing a loop per bio, so we can encounter an
> >> error, requeue, and then the caller will call us again. We're in
> >> an illegal state at that point, and the next requeue will make that
> >> obvious since it's already pending. Actually, both the caller and
> >> ubd_queue_one_vec() also requeue. So it's a bit of a mess, the below
> >> might help.
> > 
> > I agree, the driver *is* a mess.
> > Unless someone else volunteers to clean it up, I'll push that task on
> > my never ending TODO list.
> 
> I doubt you'll have to fight anyone for that task ;-)
> 
> > Thanks for your hint with the illegal state.
> > Now with correct requeuing the driver seems to work fine!
> > Write/Flush support also suffered from that but didn't trigger the 
> > BUG_ON()...
> 
> OK good, at least we're making progress!

Yes. Shall I send a patch with your suggestion or will you?

I have one more question, in your first conversion you set queue_depth to 2.
How does one pick this value?
My conversion has 64, which is more or less an educated guess... ;)

Thanks,
//richard
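
To make the pattern concrete, here is a minimal sketch of the single-requeue
approach discussed above, with the flush special case Christoph mentioned:
the ->queue_rq() handler alone decides whether to ask blk-mq to retry,
instead of also requeueing from a per-bio helper. This is illustrative only,
not the actual ubd patch; ubd_submit_request() and ubd_submit_flush() are
hypothetical helpers standing in for the real submission path to the host
io-thread.

#include <linux/blk-mq.h>
#include <linux/blkdev.h>

/* Hypothetical submission helpers; they return 0 or a -errno such as -EAGAIN. */
static int ubd_submit_flush(struct request *req);
static int ubd_submit_request(struct request *req);

static blk_status_t ubd_queue_rq(struct blk_mq_hw_ctx *hctx,
                                 const struct blk_mq_queue_data *bd)
{
        struct request *req = bd->rq;
        int ret;

        blk_mq_start_request(req);

        /* Flush requests carry no data segments and are special-cased. */
        if (req_op(req) == REQ_OP_FLUSH)
                ret = ubd_submit_flush(req);
        else
                ret = ubd_submit_request(req);

        if (ret == -EAGAIN) {
                /*
                 * The host fd is busy: return a resource error so blk-mq
                 * retries the whole request later.  Requeueing here *and*
                 * in a per-bio helper is what leads to the
                 * BUG_ON(blk_queued_rq(rq)) mentioned above.
                 */
                return BLK_STS_DEV_RESOURCE;
        }

        return ret ? BLK_STS_IOERR : BLK_STS_OK;
}

Returning BLK_STS_DEV_RESOURCE here, rather than calling
blk_mq_requeue_request() from two places, keeps the request in a single,
well-defined requeue path.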





Re: [PATCH v2] block: BFQ default for single queue devices

2018-10-16 Thread Jens Axboe
On 10/15/18 1:44 PM, Paolo Valente wrote:
> 
> 
>> On 15 Oct 2018, at 21:26, Jens Axboe  wrote:
>>
>> On 10/15/18 12:26 PM, Paolo Valente wrote:
>>>
>>>
 On 15 Oct 2018, at 17:39, Jens Axboe  wrote:

 On 10/15/18 8:10 AM, Linus Walleij wrote:
> This sets BFQ as the default scheduler for single queue
> block devices (nr_hw_queues == 1) if it is available. This
> affects notably MMC/SD-cards but also UBI and the loopback
> device.
>
> I have been running it for a while without any negative
> effects on my pet systems and I want some wider testing
> so let's throw it out there and see what people say.
> Admittedly my use cases are limited. I need to keep this
> patch around for my personal needs anyway.
>
> We take special care to avoid using BFQ on zoned devices
> (in particular SMR, shingled magnetic recording devices)
> as these currently require mq-deadline to group writes
> together.
>
> I have opted against introducing any default scheduler
> through Kconfig as the mq-deadline enforcement for
> zoned devices has to be done at runtime anyways and
> too many config options will make things confusing.
>
> My argument for setting a default policy in the kernel
> as opposed to user space is the "reasonable defaults"
> type, analogous to how we have one default CPU scheduling
> policy (CFS) that makes the most sense for most tasks, and
> how automatic process group scheduling happens in most
> distributions without userspace involvement. The BFQ
> scheduling policy makes most sense for single hardware
> queue devices and many embedded systems will not have
> the clever userspace tools (such as udev) to make an
> educated choice of scheduling policy. Defaults should be
> those that make most sense for the hardware.

 I still don't like this. There are going to be tons of
 cases where the single queue device is some hw raid setup
 or similar, where performance is going to be much worse with
 BFQ than it is with mq-deadline, for instance. That's just
 one case.

>>>
>>> Hi Jens,
>>> in my RAID tests bfq performed as well as in non-RAID tests.  Probably
>>> you refer to the fact that, in a RAID configuration, IOPS can become
>>> very high.  But, if that is the case, then the response to your
>>> objections already emerged in the previous thread.  Let me sum it up
>>> again.
>>>
> >>> I tested bfq on virtually every device in the range from a few hundred
> >>> IOPS to 50-100 KIOPS.  Then, through the public script I already
>>> mentioned, I found the maximum number of IOPS that bfq can handle:
>>> about 400K with a commodity CPU.
>>>
> >>> In particular, in all my tests with real hardware, bfq performance
> >>> - is not even comparable to that of any of the other schedulers, in
>>>  terms of responsiveness, latency for real-time applications, ability
>>>  to provide strong bandwidth guarantees, ability to boost throughput
>>>  while guaranteeing bandwidths;
> >>> - is a little worse than the other schedulers for only one test, on
> >>>  only some hardware: total throughput with random reads, where it may
> >>>  lose up to 10-15% of throughput.  Of course, the schedulers that reach
> >>>  a higher throughput leave the machine unusable during the test.
>>>
>>> So I really cannot see a reason why bfq could do worse than any of
>>> these other schedulers for some single-queue device (conservatively)
>>> below 300KIOPS.
>>>
>>> Finally, since, AFAICT, single-queue devices doing 400+ KIOPS are
>>> probably less than 1% of all the single-queue storage around (USB
>>> drives, HDDs, eMMC, standard SSDs, ...), by sticking to mq-deadline we
>>> are sacrificing 99% of the hardware, to help 1% of the hardware, for
> >>> one kind of test case.
>>
>> I should have been more clear - I'm not worried about IOPS overhead,
>> I'm worried about scheduling decisions that lower performance on
>> (for instance) raid composed of many drives (rotational or otherwise).
>>
>> If you have actual data (on what hardware, and what kind of tests)
>> to disprove that worry, then that's great, and I'd love to see that.
>>
> 
> Here are some old results with a very simple configuration:
> http://algo.ing.unimo.it/people/paolo/disk_sched/old-results/4.4.0-v7r11/
> http://algo.ing.unimo.it/people/paolo/disk_sched/old-results/3.14.0-v7r3/
> http://algo.ing.unimo.it/people/paolo/disk_sched/old-results/3.13.0-v7r2/
> 
> Then I stopped repeating tests that always yielded the same good results.
> 
> As for more professional systems, a well-known company doing
> real-time packet-traffic dumping asked me to modify bfq so as to
> guarantee lossless data writing also during queries.  The involved box
> had a RAID reaching a few Gbps, and everything worked well.
> 
> Anyway, if you have specific issues in mind, I can check more deeply.

Do you have anything more recent? All of these 

Re: [PATCH 0/15 v2] loop: Fix oops and possible deadlocks

2018-10-16 Thread Omar Sandoval
On Tue, Oct 16, 2018 at 01:36:54PM +0200, Jan Kara wrote:
> On Wed 10-10-18 14:28:09, Jan Kara wrote:
> > On Wed 10-10-18 13:42:27, Johannes Thumshirn wrote:
> > > On Wed, Oct 10, 2018 at 07:19:00PM +0900, Tetsuo Handa wrote:
> > > > On 2018/10/10 19:04, Jan Kara wrote:
> > > > > Hi,
> > > > > 
> > > > > this patch series fixes oops and possible deadlocks as reported by 
> > > > > syzbot [1]
> > > > > [2]. The second patch in the series (from Tetsuo) fixes the oops, the 
> > > > > remaining
> > > > > patches are cleaning up the locking in the loop driver so that we can 
> > > > > in the
> > > > > end reasonably easily switch to rereading partitions without holding 
> > > > > mutex
> > > > > protecting the loop device.
> > > > > 
> > > > > I have lightly tested the patches by creating, deleting, and 
> > > > > modifying loop
> > > > > devices but if there's some more comprehensive loopback device 
> > > > > testsuite, I
> > > > > can try running it. Review is welcome!
> > > > 
> > > > Testing on linux-next by syzbot will be the most comprehensive. ;-)
> > > 
> > > Apart from that blktests has a loop category and I think it could also be
> > > worthwhile to add the C reproducer from syzkaller to blktests.
> > 
> > Yeah, I did run loop tests now and they ran fine. I can try converting the
> > syzbot reproducers into something legible but it will take a while.
> 
> So I took a stab at this. But I hit two issues:
> 
> 1) For the reproducer triggering the lockdep warning, you need a 32-bit
> binary (so that it uses compat_ioctl). I don't think we want to introduce
> a 32-bit devel environment dependency into blktests. With 64-bit binaries
> the problem is also there, but someone noticed and silenced lockdep (with
> a reason that I consider incorrect)... I think the test is still worth it
> though, as I'll remove the lockdep-fooling code in my patches and thus new
> breakage will be noticed.

Agreed, even if it doesn't trigger lockdep now, it's a good regression
test.

> 2) For the oops (use-after-free) issue I was not able to reproduce that in
> my test KVM in a couple of hours. The race window is rather narrow and
> syzbot, with KASAN and everything, hit it only 11 times. So I'm not sure
> how useful that test is. Any opinions?

I'd say we should add it anyways. If anything, it's a smoke test for
changing fds on a loop device. You could add a note that the race it's
testing for is very narrow.
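
For reference, a rough userspace sketch of what such a "change fds on a loop
device" smoke test would exercise (this is an assumption for illustration,
not the syzbot reproducer or a blktests script; the device and file paths
are hypothetical):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

int main(void)
{
        /*
         * LOOP_CHANGE_FD is only permitted on a read-only loop device and
         * requires the new backing file to have the same size, so both
         * hypothetical backing files are opened read-only and are assumed
         * to be equal-sized.
         */
        int loopfd = open("/dev/loop0", O_RDWR);
        int backing1 = open("/tmp/backing1.img", O_RDONLY);
        int backing2 = open("/tmp/backing2.img", O_RDONLY);

        if (loopfd < 0 || backing1 < 0 || backing2 < 0)
                return 1;

        if (ioctl(loopfd, LOOP_SET_FD, backing1))       /* attach backing file */
                return 1;
        ioctl(loopfd, LOOP_CHANGE_FD, backing2);        /* swap backing file */
        ioctl(loopfd, LOOP_CLR_FD, 0);                  /* detach */

        close(backing2);
        close(backing1);
        close(loopfd);
        return 0;
}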


Re: [PATCH v2] block: BFQ default for single queue devices

2018-10-16 Thread Paolo Valente



> On 16 Oct 2018, at 18:14, Federico Motta  wrote:
> 
> On 10/15/18 5:02 PM, Bart Van Assche wrote:
>> On Mon, 2018-10-15 at 16:10 +0200, Linus Walleij wrote:
>>> + * For blk-mq devices, we default to using:
>>> + * - "none" for multiqueue devices (nr_hw_queues != 1)
>>> + * - "bfq", if available, for single queue devices
>>> + * - "mq-deadline" if "bfq" is not available for single queue devices
>>> + * - "none" for single queue devices as well as last resort
>> 
>> For SATA SSDs nr_hw_queues == 1 so this patch will also affect these SSDs.
>> Since this patch is an attempt to improve performance, I'd like to see
>> measurement data for one or more recent SATA SSDs before a decision is
>> taken about what to do with this patch. 
>> 
>> Thanks,
>> 
>> Bart.
>> 
> 
> Hi,
> although these tests should be run for single-queue devices, I tried to
> run them on an NVMe high-performance device. Imho if results are good
> in such a "difficult to deal with" multi-queue device, they should be
> good enough also in a "simpler" single-queue storage device..
> 
> Testbed specs:
> kernel = 4.18.0 (from bfq dev branch [1], where bfq already contains
> also the commits that will be available from 4.20)
> fs = ext4
> drive  = ssd samsung 960 pro NVMe m.2 512gb
> 
> Device data sheet specs state that under random IO:
> * QD  1 thread 1
>  * read  = 14 kIOPS
>  * write = 50 kIOPS
> * QD 32 thread 4
>  * read = write = 330 kIOPS
> 
> What follows is a results summary; on request I can provide all
> results. The workload notation (e.g. 5r5w-seq) means:
> - num_readers  (5r)
> - num_writers  (5w)
> - sequential_io or random_io   (-seq)
> 
> 
> # replayed gnome-terminal startup time (lower is better)
> workload    bfq-mq [s]  none [s]  % gain
> ---------   ----------  --------  ------
>  10r-seq         0.3725      2.79   86.65
> 5r5w-seq         0.9725      5.53   82.41
> 
> # throughput (higher is better)
> workload    bfq-mq [mb/s]  none [mb/s]   % gain
> ---------   -------------  -----------  -------
>  10r-rand         394.806      429.735   -8.128
>  10r-seq         1387.63      1431.81    -3.086
>   1r-seq          838.13       798.872    4.914
> 5r5w-rand        1118.12      1297.46   -13.822
> 5r5w-seq         1187         1313.8     -9.651
> 

Somewhat unexpectedly for me, the throughput loss for random I/O is even
lower than what I obtained with my nasty SATA SSD (and reported in my
public results).

I didn't expect such a small loss with sequential parallel reads.
Probably, when going multiqueue, there are changes I haven't even
thought about (I have never even tested bfq on a multi-queue device).

Thanks,
Paolo

> Thanks,
> Federico
> 
> [1] https://github.com/Algodev-github/bfq-mq/commits/bfq-mq



Re: [PATCH, RFC] ubd: remove use of blk_rq_map_sg

2018-10-16 Thread Richard Weinberger
On Mon, Oct 15, 2018 at 8:56 AM Christoph Hellwig  wrote:
>
> There is no good reason to create a scatterlist in the ubd driver,
> it can just iterate the request directly.

BTW: Does it make sense to drop blk_rq_map_sg() from
drivers/mtd/ubi/block.c too?
If so, we have to allocate a temporary structure for the worker thread
for each segment, just like UBD does already. I'm not sure whether that
is cheaper than blk_rq_map_sg().

-- 
Thanks,
//richard
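
For context, the direct iteration Christoph refers to boils down to walking
the request with rq_for_each_segment() instead of first building a
scatterlist. A minimal sketch, not taken from the actual patch:
ubd_submit_segment() is a hypothetical per-segment helper, and lowmem pages
are assumed so that page_address() suffices.

#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/mm.h>

/* Hypothetical per-segment submission helper. */
static int ubd_submit_segment(struct request *req, void *buf, unsigned int len);

static int ubd_submit_all_segments(struct request *req)
{
        struct req_iterator iter;
        struct bio_vec bvec;
        int ret;

        rq_for_each_segment(bvec, req, iter) {
                void *buf = page_address(bvec.bv_page) + bvec.bv_offset;

                ret = ubd_submit_segment(req, buf, bvec.bv_len);
                if (ret)
                        return ret;     /* e.g. -EAGAIN from the host fd */
        }
        return 0;
}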


Re: [PATCH v2] block: BFQ default for single queue devices

2018-10-16 Thread Paolo Valente



> On 15 Oct 2018, at 20:34, Paolo Valente  wrote:
> 
> 
> 
>> On 15 Oct 2018, at 17:02, Bart Van Assche  wrote:
>> 
>> On Mon, 2018-10-15 at 16:10 +0200, Linus Walleij wrote:
>>> + * For blk-mq devices, we default to using:
>>> + * - "none" for multiqueue devices (nr_hw_queues != 1)
>>> + * - "bfq", if available, for single queue devices
>>> + * - "mq-deadline" if "bfq" is not available for single queue devices
>>> + * - "none" for single queue devices as well as last resort
>> 
>> For SATA SSDs nr_hw_queues == 1 so this patch will also affect these SSDs.
>> Since this patch is an attempt to improve performance, I'd like to see
>> measurement data for one or more recent SATA SSDs before a decision is
>> taken about what to do with this patch. 
>> 
> 
> Hi Bart,
> as I just wrote to Jens, I don't think we need this test any longer.
> To save you one hop, I'll paste my reply to Jens below.
> 
> Anyway, it is very easy to run the tests you ask for:
> - take a kernel containing the last bfq commits, such as for-next
> - do, e.g.,
> git clone https://github.com/Algodev-github/S.git
> cd S/run_multiple_benchmarks
> sudo ./run_main_benchmarks.sh "throughput replayed-startup" "bfq none"
> - compare results
> 

Two things:

1) By mistake, I put 'none' in the last command line above, but it should be 
mq-deadline:

sudo ./run_main_benchmarks.sh "throughput replayed-startup" "bfq mq-deadline"

2) If you are worried about wearing out your device with writes, then just append 
'raw' to the last command line. So:

sudo ./run_main_benchmarks.sh "throughput replayed-startup" "bfq mq-deadline" 
raw

'raw' means: "don't even create files for the background traffic, but just read 
raw sectors".

Thanks,
Paolo

> Of course, do not do it for multi-queue devices or for single-queue
> devices on steroids that do 400-500 KIOPS.
> 
> I'll see if I can convince someone to repeat these tests with a recent
> SSD.
> 
> And here again is my reply to Jens, which I think holds for your repeated
> objection too.
> 
> I tested bfq on virtually every device in the range from a few hundred
> IOPS to 50-100 KIOPS.  Then, through the public script I already
> mentioned, I found the maximum number of IOPS that bfq can handle:
> about 400K with a commodity CPU.
> 
> In particular, in all my tests with real hardware, bfq performance
> - is not even comparable to that of any of the other schedulers, in
> terms of responsiveness, latency for real-time applications, ability
> to provide strong bandwidth guarantees, ability to boost throughput
> while guaranteeing bandwidths;
> - is a little worse than the other schedulers for only one test, on
> only some hardware: total throughput with random reads, where it may
> lose up to 10-15% of throughput.  Of course, the schedulers that reach
> a higher throughput leave the machine unusable during the test.
> 
> So I really cannot see a reason why bfq could do worse than any of
> these other schedulers for some single-queue device (conservatively)
> below 300KIOPS.
> 
> Finally, since, AFAICT, single-queue devices doing 400+ KIOPS are
> probably less than 1% of all the single-queue storage around (USB
> drives, HDDs, eMMC, standard SSDs, ...), by sticking to mq-deadline we
> are sacrificing 99% of the hardware, to help 1% of the hardware for
> one kind of test case.
> 
> Thanks,
> Paolo
> 
>> Thanks,
>> 
>> Bart.
>> 
> 



Re: [PATCH v2] block: BFQ default for single queue devices

2018-10-16 Thread Ulf Hansson
On 15 October 2018 at 16:10, Linus Walleij  wrote:
> This sets BFQ as the default scheduler for single queue
> block devices (nr_hw_queues == 1) if it is available. This
> affects notably MMC/SD-cards but also UBI and the loopback
> device.
>
> I have been running it for a while without any negative
> effects on my pet systems and I want some wider testing
> so let's throw it out there and see what people say.
> Admittedly my use cases are limited. I need to keep this
> patch around for my personal needs anyway.
>
> We take special care to avoid using BFQ on zoned devices
> (in particular SMR, shingled magnetic recording devices)
> as these currently require mq-deadline to group writes
> together.
>
> I have opted against introducing any default scheduler
> through Kconfig as the mq-deadline enforcement for
> zoned devices has to be done at runtime anyways and
> too many config options will make things confusing.
>
> My argument for setting a default policy in the kernel
> as opposed to user space is the "reasonable defaults"
> type, analogous to how we have one default CPU scheduling
> policy (CFS) that makes the most sense for most tasks, and
> how automatic process group scheduling happens in most
> distributions without userspace involvement. The BFQ
> scheduling policy makes most sense for single hardware
> queue devices and many embedded systems will not have
> the clever userspace tools (such as udev) to make an
> educated choice of scheduling policy. Defaults should be
> those that make most sense for the hardware.

As already stated for v1, this makes perfect sense to me, thanks for posting it!

I do understand there is some pushback from Bart and Jens, around how
to move this forward. However, let's hope they get convinced to try
this out.

When it comes to potential "performance" regressions, I am sure Paolo
is standing by to help out with BFQ changes, if needed. Moreover, we
can always do a simple revert in the worst-case scenario, especially since
the change is really limited.

>
> Cc: Pavel Machek 
> Cc: Paolo Valente 
> Cc: Jens Axboe 
> Cc: Ulf Hansson 
> Cc: Richard Weinberger 
> Cc: Adrian Hunter 
> Cc: Bart Van Assche 
> Cc: Jan Kara 
> Cc: Artem Bityutskiy 
> Cc: Christoph Hellwig 
> Cc: Alan Cox 
> Cc: Mark Brown 
> Cc: Damien Le Moal 
> Cc: Johannes Thumshirn 
> Cc: Oleksandr Natalenko 
> Cc: Jonathan Corbet 
> Signed-off-by: Linus Walleij 

So FWIW:

Reviewed-by: Ulf Hansson 

Kind regards
Uffe

> ---
> ChangeLog v1->v2:
> - Add a quirk so that devices with zoned writes are forced
>   to use the deadline scheduler, this is necessary since only
>   that scheduler supports zoned writes.
> - There is a summary article in LWN for subscribers:
>   https://lwn.net/Articles/767987/
> ---
>  block/elevator.c | 22 ++
>  1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/block/elevator.c b/block/elevator.c
> index 8fdcd64ae12e..6e6048ca3471 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -948,13 +948,16 @@ int elevator_switch_mq(struct request_queue *q,
>  }
>
>  /*
> - * For blk-mq devices, we default to using mq-deadline, if available, for single
> - * queue devices.  If deadline isn't available OR we have multiple queues,
> - * default to "none".
> + * For blk-mq devices, we default to using:
> + * - "none" for multiqueue devices (nr_hw_queues != 1)
> + * - "bfq", if available, for single queue devices
> + * - "mq-deadline" if "bfq" is not available for single queue devices
> + * - "none" for single queue devices as well as last resort
>   */
>  int elevator_init_mq(struct request_queue *q)
>  {
> struct elevator_type *e;
> +   const char *policy;
> int err = 0;
>
> if (q->nr_hw_queues != 1)
> @@ -968,7 +971,18 @@ int elevator_init_mq(struct request_queue *q)
> if (unlikely(q->elevator))
> goto out_unlock;
>
> -   e = elevator_get(q, "mq-deadline", false);
> +   /*
> +* Zoned devices must use a deadline scheduler because currently
> +* that is the only scheduler respecting zoned writes.
> +*/
> +   if (blk_queue_is_zoned(q))
> +   policy = "mq-deadline";
> +   else if (IS_ENABLED(CONFIG_IOSCHED_BFQ))
> +   policy = "bfq";
> +   else
> +   policy = "mq-deadline";
> +
> +   e = elevator_get(q, policy, false);
> if (!e)
> goto out_unlock;
>
> --
> 2.17.2
>


Re: [PATCH 0/15 v2] loop: Fix oops and possible deadlocks

2018-10-16 Thread Jan Kara
On Wed 10-10-18 14:28:09, Jan Kara wrote:
> On Wed 10-10-18 13:42:27, Johannes Thumshirn wrote:
> > On Wed, Oct 10, 2018 at 07:19:00PM +0900, Tetsuo Handa wrote:
> > > On 2018/10/10 19:04, Jan Kara wrote:
> > > > Hi,
> > > > 
> > > > this patch series fixes oops and possible deadlocks as reported by 
> > > > syzbot [1]
> > > > [2]. The second patch in the series (from Tetsuo) fixes the oops, the 
> > > > remaining
> > > > patches are cleaning up the locking in the loop driver so that we can 
> > > > in the
> > > > end reasonably easily switch to rereading partitions without holding 
> > > > mutex
> > > > protecting the loop device.
> > > > 
> > > > I have lightly tested the patches by creating, deleting, and modifying 
> > > > loop
> > > > devices but if there's some more comprehensive loopback device 
> > > > testsuite, I
> > > > can try running it. Review is welcome!
> > > 
> > > Testing on linux-next by syzbot will be the most comprehensive. ;-)
> > 
> > Apart from that blktests has a loop category and I think it could also be
> > worthwhile to add the C reproducer from syzkaller to blktests.
> 
> Yeah, I did run loop tests now and they ran fine. I can try converting the
> syzbot reproducers into something legible but it will take a while.

So I took a stab at this. But I hit two issues:

1) For the reproducer triggering the lockdep warning, you need a 32-bit
binary (so that it uses compat_ioctl). I don't think we want to introduce
a 32-bit devel environment dependency into blktests. With 64-bit binaries
the problem is also there, but someone noticed and silenced lockdep (with
a reason that I consider incorrect)... I think the test is still worth it
though, as I'll remove the lockdep-fooling code in my patches and thus new
breakage will be noticed.

2) For the oops (use-after-free) issue I was not able to reproduce that in
my test KVM in a couple of hours. The race window is rather narrow and
syzbot, with KASAN and everything, hit it only 11 times. So I'm not sure
how useful that test is. Any opinions?

Honza

-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 0/15 v2] loop: Fix oops and possible deadlocks

2018-10-16 Thread Johannes Thumshirn
On 16/10/18 13:36, Jan Kara wrote:
[...]
> 2) For the oops (use-after-free) issue I was not able to reproduce that in
> my test KVM in a couple of hours. The race window is rather narrow and
> syzbot, with KASAN and everything, hit it only 11 times. So I'm not sure
> how useful that test is. Any opinions?

Personally I think a test that has varying outcomes depending on how
often you run it (just to hit the race) isn't really suitable for a
suite like blktests.

But that's my personal opinion only; Omar, what's your opinion here?

-- 
Johannes Thumshirn                        SUSE Labs
jthumsh...@suse.de                        +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH, RFC] ubd: remove use of blk_rq_map_sg

2018-10-16 Thread Jens Axboe
On 10/16/18 2:38 AM, Richard Weinberger wrote:
> On Tuesday, 16 October 2018, 04:19:51 CEST, Jens Axboe wrote:
>> On 10/15/18 4:44 PM, Richard Weinberger wrote:
>>> On Tuesday, 16 October 2018, 00:04:20 CEST, Jens Axboe wrote:
 On 10/15/18 3:46 PM, Richard Weinberger wrote:
> On Monday, 15 October 2018, 22:55:29 CEST, Christoph Hellwig wrote:
>> On Mon, Oct 15, 2018 at 10:42:47PM +0200, Richard Weinberger wrote:
 Sadly not. I'm checking now what exactly is broken.
>>>
>>> I take this back. Christoph's fixup makes reading work.
>>> The previous version corrupted my test block device in interesting ways
>>> and confused all tests.
>>> But the removal of blk_rq_map_sg() still has issues.
>>> Now the device blocks endlessly upon flush.
>>
>> I suspect we still need to special case flush.  Updated patch below
>> including your other suggestion:
>
> While playing further with the patch I managed to hit
> BUG_ON(blk_queued_rq(rq)) in blk_mq_requeue_request().
>
> UML requeues the request in ubd_queue_one_vec() if it was not able
> to submit the request to the host io-thread.
> The fd can return -EAGAIN, then UML has to try later.
>
> Isn't this allowed in that context?

 It is, the problem is that queue_one_vec() doesn't always return an
 error. The caller is doing a loop per bio, so we can encounter an
 error, requeue, and then the caller will call us again. We're in
 an illegal state at that point, and the next requeue will make that
 obvious since it's already pending. Actually, both the caller and
 ubd_queue_one_vec() also requeue. So it's a bit of a mess, the below
 might help.
>>>
>>> I agree, the driver *is* a mess.
>>> Unless someone else volunteers to clean it up, I'll push that task on
>>> my never ending TODO list.
>>
>> I doubt you'll have to fight anyone for that task ;-)
>>
>>> Thanks for your hint with the illegal state.
>>> Now with correct requeuing the driver seems to work fine!
>>> Write/Flush support also suffered from that but didn't trigger the 
>>> BUG_ON()...
>>
>> OK good, at least we're making progress!
> 
> Yes. Shall I send a patch with your suggestion or will you?

Christoph should just fold it in, the bug only exists after his
change to it.

> I have one more question, in your first conversion you set queue_depth to 2.
> How does one pick this value?
> My conversion has 64, which is more or less an educated guess... ;)

64 is most likely just fine. Some drivers rely on having only 1 request in
flight, and it's easier to just let blk-mq take care of that. Outside of that,
there aren't any magic values. We should probably just use BLKDEV_MAX_RQ
for ones that don't have a specific hw limit or need.

-- 
Jens Axboe
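
To make the BLKDEV_MAX_RQ suggestion concrete, here is a minimal sketch of a
single-queue tag set filled in that way. The names ubd_tag_set, ubd_mq_ops
and ubd_queue_rq are illustrative assumptions (ubd_queue_rq being the
driver's ->queue_rq() handler), not code from the actual driver or from the
SQ-queue helper patch.

#include <linux/blk-mq.h>
#include <linux/blkdev.h>
#include <linux/numa.h>

/* Hypothetical ->queue_rq() handler provided elsewhere by the driver. */
static blk_status_t ubd_queue_rq(struct blk_mq_hw_ctx *hctx,
                                 const struct blk_mq_queue_data *bd);

static const struct blk_mq_ops ubd_mq_ops = {
        .queue_rq = ubd_queue_rq,
};

static struct blk_mq_tag_set ubd_tag_set;

static int ubd_init_tag_set(void)
{
        ubd_tag_set.ops          = &ubd_mq_ops;
        ubd_tag_set.nr_hw_queues = 1;                 /* single queue device */
        ubd_tag_set.queue_depth  = BLKDEV_MAX_RQ;     /* no specific hw limit */
        ubd_tag_set.numa_node    = NUMA_NO_NODE;
        ubd_tag_set.flags        = BLK_MQ_F_SHOULD_MERGE;

        return blk_mq_alloc_tag_set(&ubd_tag_set);
}

With queue_depth left at BLKDEV_MAX_RQ, blk-mq rather than the driver bounds
the number of requests in flight, which matches the advice above.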



Re: [PATCH v2] block: BFQ default for single queue devices

2018-10-16 Thread Federico Motta
On 10/15/18 5:02 PM, Bart Van Assche wrote:
> On Mon, 2018-10-15 at 16:10 +0200, Linus Walleij wrote:
>> + * For blk-mq devices, we default to using:
>> + * - "none" for multiqueue devices (nr_hw_queues != 1)
>> + * - "bfq", if available, for single queue devices
>> + * - "mq-deadline" if "bfq" is not available for single queue devices
>> + * - "none" for single queue devices as well as last resort
> 
> For SATA SSDs nr_hw_queues == 1 so this patch will also affect these SSDs.
> Since this patch is an attempt to improve performance, I'd like to see
> measurement data for one or more recent SATA SSDs before a decision is
> taken about what to do with this patch. 
> 
> Thanks,
> 
> Bart.
> 

Hi,
although these tests should be run for single-queue devices, I tried to
run them on an NVMe high-performance device. Imho if results are good
in such a "difficult to deal with" multi-queue device, they should be
good enough also in a "simpler" single-queue storage device..

Testbed specs:
kernel = 4.18.0 (from bfq dev branch [1], where bfq already contains
 also the commits that will be available from 4.20)
fs = ext4
drive  = ssd samsung 960 pro NVMe m.2 512gb

Device data sheet specs state that under random IO:
* QD  1 thread 1
  * read  = 14 kIOPS
  * write = 50 kIOPS
* QD 32 thread 4
  * read = write = 330 kIOPS

What follows is a results summary; on request I can provide all
results. The workload notation (e.g. 5r5w-seq) means:
- num_readers  (5r)
- num_writers  (5w)
- sequential_io or random_io   (-seq)


# replayed gnome-terminal startup time (lower is better)
workload    bfq-mq [s]  none [s]  % gain
---------   ----------  --------  ------
 10r-seq         0.3725      2.79   86.65
5r5w-seq         0.9725      5.53   82.41

# throughput (higher is better)
workload    bfq-mq [mb/s]  none [mb/s]   % gain
---------   -------------  -----------  -------
 10r-rand         394.806      429.735   -8.128
 10r-seq         1387.63      1431.81    -3.086
  1r-seq          838.13       798.872    4.914
5r5w-rand        1118.12      1297.46   -13.822
5r5w-seq         1187         1313.8     -9.651

Thanks,
Federico

[1] https://github.com/Algodev-github/bfq-mq/commits/bfq-mq