Bug 109761 FreeBSD UFS2 file systems are not recognized on MS DOS (MBR) partitions

2017-05-16 Thread Richard
To allow FreeBSD filesystems to be recognized on Linux,
I submitted the following bug and patch in December 2015.

  https://bugzilla.kernel.org/show_bug.cgi?id=109761

The patch still works on current kernels (4.4.68, 4.9.28, 4.12.1-rc1).

Richard Narron
rich...@aaazen.com
(comet.berke...@gmail.com)


Re: [PATCH 16/25] fuse: Convert to separately allocated bdi

2017-05-16 Thread Rakesh Pandit
On Tue, May 16, 2017 at 12:48:31PM +0200, Jan Kara wrote:
> On Mon 15-05-17 23:34:00, Rakesh Pandit wrote:
> > Hi Jan, Miklos,
> > 
> > On Wed, Apr 12, 2017 at 12:24:40PM +0200, Jan Kara wrote:
> > > Allocate struct backing_dev_info separately instead of embedding it
> > > inside the superblock. This unifies handling of bdi among users.
> > >

> > 
> > ...
> > 
> > >  static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
> > >  {
> > >   int err;
> > > + char *suffix = "";
> > >  
> > > - fc->bdi.name = "fuse";
> > > - fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_SIZE;
> > > - /* fuse does it's own writeback accounting */
> > > - fc->bdi.capabilities = BDI_CAP_NO_ACCT_WB | BDI_CAP_STRICTLIMIT;
> > > -
> > > - err = bdi_init(&fc->bdi);
> > > + if (sb->s_bdev)
> > > + suffix = "-fuseblk";
> > > + err = super_setup_bdi_name(sb, "%u:%u%s", MAJOR(fc->dev),
> > > +MINOR(fc->dev), suffix);
> > >   if (err)
> > >   return err;
> > >
> > 
> > This call to super_setup_bdi_name() only works for "fuse" but not for
> > "fuseblk", because mounting a block device in userspace goes through
> > mount_bdev(), where set_bdev_super() takes a reference on the block
> > device's BDI.  But super_setup_bdi_name() allocates a new bdi and
> > ignores the already existing reference, which triggers:
> > 
> > WARN_ON(sb->s_bdi != &noop_backing_dev_info);
> > 
> > as sb->s_bdi already holds the reference taken by set_bdev_super().
> > This works for "fuse" (without a block device) for obvious reasons.
> > I can reproduce this on -rc1 and also found a report on lkml:
> > https://lkml.org/lkml/2017/5/2/445
> > 
> > The only sane solution seems to be maintaining a private bdi instance
> > just for fuseblk and letting fuse use the common new infrastructure.
> 
> Thanks for the analysis! Does the attached patch fix the warning for you?
> 

Yes, tested. Feel free to add:
Tested-by: Rakesh Pandit 

> From 5b0cfc37b45670a35228c96cbaee2b99cd3d447c Mon Sep 17 00:00:00 2001
> From: Jan Kara 
> Date: Tue, 16 May 2017 12:22:22 +0200
> Subject: [PATCH] fuseblk: Fix warning in super_setup_bdi_name()
> 
> Commit 5f7f7543f52e "fuse: Convert to separately allocated bdi" didn't
> properly handle the fuseblk filesystem. When fuse_bdi_init() is called for
> that filesystem type, sb->s_bdi is already initialized (by
> set_bdev_super()) to point to the block device's bdi, and consequently
> super_setup_bdi_name() complains about this when resetting bdi to
> the private one.
> 
> Fix the problem by properly dropping bdi reference in fuse_bdi_init()
> before creating a private bdi in super_setup_bdi_name().
> 
> Fixes: 5f7f7543f52eee03ed35c9d671fbb1cdbd4bc9b5
> Reported-by: Rakesh Pandit 
> Signed-off-by: Jan Kara 
> ---
>  fs/fuse/inode.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 5a1b58f8fef4..65c88379a3a1 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -975,8 +975,15 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
>   int err;
>   char *suffix = "";
>  
> - if (sb->s_bdev)
> + if (sb->s_bdev) {
>   suffix = "-fuseblk";
> + /*
> +  * sb->s_bdi points to blkdev's bdi however we want to redirect
> +  * it to our private bdi...
> +  */
> + bdi_put(sb->s_bdi);
> + sb->s_bdi = &noop_backing_dev_info;
> + }
>   err = super_setup_bdi_name(sb, "%u:%u%s", MAJOR(fc->dev),
>  MINOR(fc->dev), suffix);
>   if (err)
> -- 
> 2.12.0
> 
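
For readers following the thread: the warning comes from the new helper itself.
A rough sketch, from memory, of the relevant part of super_setup_bdi_name() in
4.12-era fs/super.c (helper names and details may differ slightly; this is not
a verbatim copy):

int super_setup_bdi_name(struct super_block *sb, char *fmt, ...)
{
	struct backing_dev_info *bdi;
	va_list args;
	int err;

	bdi = bdi_alloc(GFP_KERNEL);	/* always allocates a fresh, private bdi */
	if (!bdi)
		return -ENOMEM;

	va_start(args, fmt);
	err = bdi_register_va(bdi, fmt, args);	/* register under the given name */
	va_end(args);
	if (err) {
		bdi_put(bdi);
		return err;
	}

	/*
	 * For plain "fuse", sb->s_bdi still points at noop_backing_dev_info
	 * here, so the check stays quiet.  For "fuseblk", set_bdev_super()
	 * has already pointed sb->s_bdi at the block device's bdi (and taken
	 * a reference), so this WARN_ON fires -- hence the bdi_put() and the
	 * reset to &noop_backing_dev_info in Jan's fix above.
	 */
	WARN_ON(sb->s_bdi != &noop_backing_dev_info);
	sb->s_bdi = bdi;
	return 0;
}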



Re: [RFC PATCH 3/3] mmc: core: Allow mmc block device to re-claim the host

2017-05-16 Thread Ulf Hansson
On 16 May 2017 at 15:24, Adrian Hunter  wrote:
> On 15/05/17 17:05, Ulf Hansson wrote:
>> On 12 May 2017 at 10:36, Adrian Hunter  wrote:
>>> On 11/05/17 15:39, Ulf Hansson wrote:
 The current mmc block device implementation is tricky when it comes to
 claiming and releasing the host while processing I/O requests. In principle
 we need to claim the host when the first request enters the queue and then
 release the host as soon as the queue becomes empty. This complexity
 relates to the asynchronous request mechanism that the mmc block device
 driver implements.

 For the legacy block interface that we currently implement, the above
 issue can be addressed, as we can find out when the queue really becomes
 empty.

 However, finding out whether the queue is empty isn't really an applicable
 method when using the new blk-mq interface, as requests are instead pushed
 to us via struct blk_mq_ops and its function pointers.
>>>
>>> That is not entirely true.  We can pull requests by running the queue i.e.
>>> blk_mq_run_hw_queues(q, false), returning BLK_MQ_RQ_QUEUE_BUSY and stopping
>>> / starting the queue as needed.
>>
>> I am not sure how that would work. It doesn't sound very effective to
>> me, but I may be wrong.
>
> The queue depth is not the arbiter of whether we can issue a request.  That
> means there will certainly be times when we have to return
> BLK_MQ_RQ_QUEUE_BUSY from ->queue_rq() and perhaps stop the queue as well.
>
> We could start with ->queue_rq() feeding every request to the existing
> thread and work towards having it submit requests immediately when possible.
>  Currently mmc core cannot submit mmc_requests without waiting, but the
> command queue implementation can for read/write requests when the host
> controller and card are runtime resumed and the card is switched to the
> correct internal partition (and we are not currently discarding, flushing or
> recovering), so it might be simpler to start with cmdq ;-)

In the end I guess the only thing to do is to compare the patchsets to
see what the result would look like. :-)

My current observation is that our current implementation of the mmc
block device and corresponding mmc queue is still rather messy, even
though you and Linus have recently worked hard to improve the situation.

Moreover, it looks quite different from other block device drivers,
which is not good when we are striving to make it more robust and
maintainable.

Therefore, I am not really comfortable with replacing one mmc hack for
block device management with yet another, as that seems to be what
your approach would do - unless I am mistaken, of course.

Instead I would like us to move towards a more generic block device
approach. Whatever that means. :-)

>
>>
>>>
>>> But, as I have written before, we could start with the most trivial
>>> implementation.  ->queue_rq() puts the requests in a list and then the
>>> thread removes them from the list.
>>
>> That would work...
>>
>>>
>>> That would be a good start because it would avoid having to deal with other
>>> issues at the same time.
>>
>> ...however this doesn't seem like a step in the direction we want to
>> take when porting to blkmq.
>>
>> There will be an extra context switch for each and every request, won't there?
>
> No, for synchronous requests, it would be the same as now. ->queue_rq()
> would be called in the context of the submitter and would wake the thread
> just like ->request_fn() does now.

You are right!

However, in my comparison I was thinking of how it *can* work if we
were able to submit/prepare the request in the context of the caller.

>
>>
>> My point is, to be able to convert to blk-mq, we must also avoid
>> performance regressions - at least for as long as possible.
>
> It would still be better to start simple, and measure the performance, than
> to guess where the bottlenecks are.

Yes, starting simple is always good!

Although, even if we start simple, we need to stop adding more
mmc-specific hacks to the mmc block device layer.

[...]

Kind regards
Uffe
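
For reference, a minimal sketch of the "most trivial" blk-mq conversion Adrian
describes above: ->queue_rq() only parks the request on a list and wakes the
existing mmc queue thread, which then pulls requests off that list.  The
function name and the list/list_lock fields of struct mmc_queue are
illustrative assumptions, not code from any posted patch; the return values
follow the 4.12-era blk-mq API referred to in the discussion.

static int mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
			   const struct blk_mq_queue_data *bd)
{
	struct mmc_queue *mq = hctx->queue->queuedata;
	struct request *req = bd->rq;
	unsigned long flags;

	blk_mq_start_request(req);

	/* Hand the request over to the existing mmc queue worker thread. */
	spin_lock_irqsave(&mq->list_lock, flags);
	list_add_tail(&req->queuelist, &mq->list);
	spin_unlock_irqrestore(&mq->list_lock, flags);

	wake_up_process(mq->thread);

	/* A real driver would return BLK_MQ_RQ_QUEUE_BUSY when it must throttle. */
	return BLK_MQ_RQ_QUEUE_OK;
}

With this shape, a synchronous request still only wakes the existing thread
from the submitter's context, which matches Adrian's point that it behaves
like the current ->request_fn() path.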


Re: [RFC PATCH 0/2] Introduce blkdev_issue_flush_no_wait()

2017-05-16 Thread Bart Van Assche
On Tue, 2017-05-16 at 17:39 +0800, Anand Jain wrote:
> BTRFS wanted a block device flush function which does not wait for
> its completion, so that the flush for the next device can be called
> in the same thread.
> 
> Here is a RFC patch to provide the function
> 'blkdev_issue_flush_no_wait()', which is based on the current device
> flush function 'blkdev_issue_flush()', however it uses submit_bio()
> instead of submit_bio_wait().
> 
> This patch is for review comments, will send out a final patch based
> on the comments received.

Hello Anand,

Since the block layer can reorder requests, I think using
blkdev_issue_flush_no_wait() will only yield the intended result if
the caller waits until the requests that have to be flushed have completed.
Is that how you intend to use this function?

Thanks,

Bart.

Re: Playing with BFQ

2017-05-16 Thread Sedat Dilek
[...]

 What about those two (Kconfig) patches which are in your current
 bfq-4.11.y patchset?

>>>
>>> I'm not sure I fully understand the purpose of the two patches you
>>> propose (in your following emails).  The first patch seems to move BFQ
>>> config options to a different position in Kconfig.iosched, but the
>>> position of those items should be irrelevant.  Am I missing something?
>>> The second patch seems to have to do with configuration bits of bfq
>>> for blk, yet such a bfq version is not available in mainline.
>>>
>>> In any case, for possible new submissions, you should inline your
>>> patches.  For complete instructions on submitting patches, have a look
>>> at Documentation/process/submitting-patches
>>>
>>
>> All my testing was done with your patchset against Linux v4.11.y.
>> You have the same kconfig/kbuild stuff in your 0001 patch [1], so :-)?
>> What you have in 0001 is missing in Linux v4.12-rc1.
>> Not sure if this is intended.
>
> 4.12-rc1 contains bfq for blk-mq, while the patch you mention is for
> bfq for blk (never accepted in mainline).  Maybe you could get a
> clearer idea by having a look at the commits that add bfq (for blk-mq)
> in 4.12-rc1.
>

I recall Markus pointed me to that difference between the BFQ implementations.
For testing purposes: do you want people to test BFQ in Linux v4.12?

- Sedat -

> Hope this helps,
> Paolo
>
>> I will re-submit and add a "4.12" in the subject line when I am at home.
>>
>> - Sedat -
>>
>> [1] 
>> http://algo.ing.unimo.it/people/paolo/disk_sched/patches/4.11.0-v8r11/0001-block-cgroups-kconfig-build-bits-for-BFQ-v7r11-4.11..patch
>


Re: [PATCH 16/25] fuse: Convert to separately allocated bdi

2017-05-16 Thread Jan Kara
On Mon 15-05-17 23:34:00, Rakesh Pandit wrote:
> Hi Jan, Miklos,
> 
> On Wed, Apr 12, 2017 at 12:24:40PM +0200, Jan Kara wrote:
> > Allocate struct backing_dev_info separately instead of embedding it
> > inside the superblock. This unifies handling of bdi among users.
> >
> > CC: Miklos Szeredi 
> > CC: linux-fsde...@vger.kernel.org
> > Acked-by: Miklos Szeredi 
> > Reviewed-by: Christoph Hellwig 
> > Signed-off-by: Jan Kara 
> 
> ...
> 
> >  static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
> >  {
> > int err;
> > +   char *suffix = "";
> >  
> > -   fc->bdi.name = "fuse";
> > -   fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_SIZE;
> > -   /* fuse does it's own writeback accounting */
> > -   fc->bdi.capabilities = BDI_CAP_NO_ACCT_WB | BDI_CAP_STRICTLIMIT;
> > -
> > -   err = bdi_init(&fc->bdi);
> > +   if (sb->s_bdev)
> > +   suffix = "-fuseblk";
> > +   err = super_setup_bdi_name(sb, "%u:%u%s", MAJOR(fc->dev),
> > +  MINOR(fc->dev), suffix);
> > if (err)
> > return err;
> >
> 
> This call to super_setup_bdi_name() only works for "fuse" but not for
> "fuseblk", because mounting a block device in userspace goes through
> mount_bdev(), where set_bdev_super() takes a reference on the block
> device's BDI.  But super_setup_bdi_name() allocates a new bdi and
> ignores the already existing reference, which triggers:
> 
> WARN_ON(sb->s_bdi != &noop_backing_dev_info);
> 
> as sb->s_bdi already holds the reference taken by set_bdev_super().
> This works for "fuse" (without a block device) for obvious reasons.
> I can reproduce this on -rc1 and also found a report on lkml:
> https://lkml.org/lkml/2017/5/2/445
> 
> The only sane solution seems to be maintaining a private bdi instance
> just for fuseblk and letting fuse use the common new infrastructure.

Thanks for the analysis! Does the attached patch fix the warning for you?

Honza
-- 
Jan Kara 
SUSE Labs, CR
From 5b0cfc37b45670a35228c96cbaee2b99cd3d447c Mon Sep 17 00:00:00 2001
From: Jan Kara 
Date: Tue, 16 May 2017 12:22:22 +0200
Subject: [PATCH] fuseblk: Fix warning in super_setup_bdi_name()

Commit 5f7f7543f52e "fuse: Convert to separately allocated bdi" didn't
properly handle the fuseblk filesystem. When fuse_bdi_init() is called for
that filesystem type, sb->s_bdi is already initialized (by
set_bdev_super()) to point to the block device's bdi, and consequently
super_setup_bdi_name() complains about this when resetting bdi to
the private one.

Fix the problem by properly dropping bdi reference in fuse_bdi_init()
before creating a private bdi in super_setup_bdi_name().

Fixes: 5f7f7543f52eee03ed35c9d671fbb1cdbd4bc9b5
Reported-by: Rakesh Pandit 
Signed-off-by: Jan Kara 
---
 fs/fuse/inode.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 5a1b58f8fef4..65c88379a3a1 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -975,8 +975,15 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
 	int err;
 	char *suffix = "";
 
-	if (sb->s_bdev)
+	if (sb->s_bdev) {
 		suffix = "-fuseblk";
+		/*
+		 * sb->s_bdi points to blkdev's bdi however we want to redirect
+		 * it to our private bdi...
+		 */
+		bdi_put(sb->s_bdi);
+		sb->s_bdi = &noop_backing_dev_info;
+	}
 	err = super_setup_bdi_name(sb, "%u:%u%s", MAJOR(fc->dev),
    MINOR(fc->dev), suffix);
 	if (err)
-- 
2.12.0



[PATCH 2/2] btrfs: Use blkdev_issue_flush_no_wait()

2017-05-16 Thread Anand Jain
Signed-off-by: Anand Jain 
---
 fs/btrfs/disk-io.c | 108 -
 fs/btrfs/volumes.h |   2 +-
 2 files changed, 33 insertions(+), 77 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8685d67185d0..f059f9bdbbd7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3478,121 +3478,77 @@ static int write_dev_supers(struct btrfs_device *device,
 }
 
 /*
- * endio for the write_dev_flush, this will wake anyone waiting
+ * endio for blkdev_issue_flush_no_wait, this will wake anyone waiting
  * for the barrier when it is done
  */
 static void btrfs_end_empty_barrier(struct bio *bio)
 {
-   if (bio->bi_private)
-   complete(bio->bi_private);
+   struct btrfs_device *device;
+
+   device = container_of(bio->bi_private, struct btrfs_device,
+   flush_wait);
+
+   device->last_flush_error = bio->bi_error;
+   complete(&device->flush_wait);
+
bio_put(bio);
 }
 
-/*
- * trigger flushes for one the devices.  If you pass wait == 0, the flushes are
- * sent down.  With wait == 1, it waits for the previous flush.
- *
- * any device where the flush fails with eopnotsupp are flagged as not-barrier
- * capable
- */
-static int write_dev_flush(struct btrfs_device *device, int wait)
+static void write_dev_flush(struct btrfs_device *device)
 {
+   int ret;
struct request_queue *q = bdev_get_queue(device->bdev);
-   struct bio *bio;
-   int ret = 0;
 
if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags))
-   return 0;
-
-   if (wait) {
-   bio = device->flush_bio;
-   if (!bio)
-   return 0;
-
-   wait_for_completion(&device->flush_wait);
-
-   if (bio->bi_error) {
-   ret = bio->bi_error;
-   btrfs_dev_stat_inc_and_print(device,
-   BTRFS_DEV_STAT_FLUSH_ERRS);
-   }
-
-   /* drop the reference from the wait == 0 run */
-   bio_put(bio);
-   device->flush_bio = NULL;
-
-   return ret;
-   }
-
-   /*
-* one reference for us, and we leave it for the
-* caller
-*/
-   device->flush_bio = NULL;
-   bio = btrfs_io_bio_alloc(GFP_NOFS, 0);
-   if (!bio)
-   return -ENOMEM;
-
-   bio->bi_end_io = btrfs_end_empty_barrier;
-   bio->bi_bdev = device->bdev;
-   bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
-   init_completion(&device->flush_wait);
-   bio->bi_private = &device->flush_wait;
-   device->flush_bio = bio;
-
-   bio_get(bio);
-   btrfsic_submit_bio(bio);
+   return;
 
-   return 0;
+   ret = blkdev_issue_flush_no_wait(device->bdev, GFP_NOFS,
+   btrfs_end_empty_barrier, &device->flush_wait);
+   if (ret)
+   device->last_flush_error = ret;
+   else
+   device->last_flush_error = -1;
 }
 
 /*
- * send an empty flush down to each device in parallel,
- * then wait for them
+ * send flush down to each device in parallel, then wait for them.
  */
 static int barrier_all_devices(struct btrfs_fs_info *info)
 {
struct list_head *head;
struct btrfs_device *dev;
-   int errors_send = 0;
int errors_wait = 0;
-   int ret;
 
/* send down all the barriers */
head = &info->fs_devices->devices;
list_for_each_entry_rcu(dev, head, dev_list) {
+   dev->last_flush_error = 0;
if (dev->missing)
continue;
+
+   if (!dev->in_fs_metadata || !dev->writeable)
+   continue;
+
if (!dev->bdev) {
-   errors_send++;
+   dev->last_flush_error = -ENXIO;
continue;
}
-   if (!dev->in_fs_metadata || !dev->writeable)
-   continue;
 
-   ret = write_dev_flush(dev, 0);
-   if (ret)
-   errors_send++;
+   write_dev_flush(dev);
}
 
/* wait for all the barriers */
list_for_each_entry_rcu(dev, head, dev_list) {
-   if (dev->missing)
-   continue;
-   if (!dev->bdev) {
-   errors_wait++;
-   continue;
-   }
-   if (!dev->in_fs_metadata || !dev->writeable)
-   continue;
+   if (dev->last_flush_error == -1)
+   wait_for_completion(&dev->flush_wait);
 
-   ret = write_dev_flush(dev, 1);
-   if (ret)
+   if (dev->last_flush_error)
errors_wait++;
}
-   if (errors_send > info->num_tolerated_disk_barrier_failures ||
-   errors_wait > info->num_tolerated_disk_barrier_failures)
+
+   if 

[PATCH 1/2] block: Introduce blkdev_issue_flush_no_wait()

2017-05-16 Thread Anand Jain
blkdev_issue_flush() is a blocking function and returns only after
the flush bio has completed, so a module handling more than one
device can't issue flushes for all of its devices unless it uses a
worker thread.

This patch adds a new function blkdev_issue_flush_no_wait(), which
uses submit_bio() instead of submit_bio_wait(), and accepts the
completion function and data from the caller.

Signed-off-by: Anand Jain 
---
 block/blk-flush.c  | 47 +++
 include/linux/blkdev.h |  8 
 2 files changed, 55 insertions(+)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index c4e0880b54bb..16b9c61f4cd3 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -488,6 +488,53 @@ void blk_insert_flush(struct request *rq)
blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
 }
 
+/*
+ * blkdev_issue_flush_no_wait - Issue a flush for the given block device
+ * @bdev:   blockdev to issue flush for
+ * @gfp_mask:   memory allocation flags (for bio_alloc)
+ * @endio_fn:   completion function for the flush bio
+ * @data:   flush bio private data
+ *
+ * Description:
+ *    Issue a flush for the block device in question. Caller must provide
+ *    the completion function and the private data, to be called when the
+ *    issued flush bio completes.
+ */
+int blkdev_issue_flush_no_wait(struct block_device *bdev, gfp_t gfp_mask,
+   bio_end_io_t *endio_fn, void *data)
+{
+   struct request_queue *q;
+   struct bio *bio;
+
+   if (bdev->bd_disk == NULL)
+   return -ENXIO;
+
+   q = bdev_get_queue(bdev);
+   if (!q)
+   return -ENXIO;
+
+   /*
+* some block devices may not have their queue correctly set up here
+* (e.g. loop device without a backing file) and so issuing a flush
+* here will panic. Ensure there is a request function before issuing
+* the flush.
+*/
+   if (!q->make_request_fn)
+   return -ENXIO;
+
+   bio = bio_alloc(gfp_mask, 0);
+   bio->bi_bdev = bdev;
+   bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
+   bio->bi_end_io = endio_fn;
+   bio->bi_private = data;
+
+   init_completion(data);
+   submit_bio(bio);
+
+   return 0;
+}
+EXPORT_SYMBOL(blkdev_issue_flush_no_wait);
+
 /**
  * blkdev_issue_flush - queue a flush
  * @bdev:  blockdev to issue flush for
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b5d1e27631ee..5d2b62239251 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1322,6 +1322,8 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
 }
 
 extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);
+extern int blkdev_issue_flush_no_wait(struct block_device *bdev,
+   gfp_t gfp_mask, bio_end_io_t *endio_fn, void *data);
 extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct page *page);
 
@@ -1994,6 +1996,12 @@ static inline int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
return 0;
 }
 
+static inline int blkdev_issue_flush_no_wait(struct block_device *bdev,
+   gfp_t gfp_mask, bio_end_io_t *endio_fn, void *data)
+{
+   return 0;
+}
+
 #endif /* CONFIG_BLOCK */
 
 #endif
-- 
2.10.0
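
To make the intended calling convention concrete (fan out the flushes, then
wait), here is a hedged caller sketch.  struct my_dev, my_flush_endio() and
flush_all_devices() are invented for illustration; the real consumer is the
btrfs patch in this series, and note that blkdev_issue_flush_no_wait() above
already calls init_completion() on the data pointer itself.

struct my_dev {
	struct block_device	*bdev;
	struct completion	flush_done;
	int			flush_error;
	bool			flush_issued;
};

/* Completion callback passed to blkdev_issue_flush_no_wait(). */
static void my_flush_endio(struct bio *bio)
{
	struct my_dev *dev = container_of(bio->bi_private, struct my_dev,
					  flush_done);

	dev->flush_error = bio->bi_error;
	complete(&dev->flush_done);
	bio_put(bio);
}

static int flush_all_devices(struct my_dev *devs, int ndevs)
{
	int i, ret, err = 0;

	/* Issue all flushes without blocking. */
	for (i = 0; i < ndevs; i++) {
		devs[i].flush_error = 0;
		devs[i].flush_issued = false;
		ret = blkdev_issue_flush_no_wait(devs[i].bdev, GFP_NOFS,
						 my_flush_endio,
						 &devs[i].flush_done);
		if (ret)
			devs[i].flush_error = ret;
		else
			devs[i].flush_issued = true;
	}

	/* Then wait for each issued flush and collect errors. */
	for (i = 0; i < ndevs; i++) {
		if (devs[i].flush_issued)
			wait_for_completion(&devs[i].flush_done);
		if (devs[i].flush_error)
			err = devs[i].flush_error;
	}
	return err;
}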



[RFC PATCH 0/2] Introduce blkdev_issue_flush_no_wait()

2017-05-16 Thread Anand Jain
BTRFS wanted a block device flush function which does not wait for
its completion, so that the flush for the next device can be called
in the same thread.

Here is an RFC patch to provide the function
'blkdev_issue_flush_no_wait()', which is based on the current device
flush function 'blkdev_issue_flush()'; however, it uses submit_bio()
instead of submit_bio_wait().

This patch is for review comments; I will send out a final patch based
on the comments received.

Thanks, Anand

Anand Jain (2):
  block: Introduce blkdev_issue_flush_no_wait()
  btrfs: Use blkdev_issue_flush_no_wait()

 block/blk-flush.c  |  47 +
 fs/btrfs/disk-io.c | 108 +++--
 fs/btrfs/volumes.h |   2 +-
 include/linux/blkdev.h |   8 
 4 files changed, 88 insertions(+), 77 deletions(-)

-- 
2.10.0



Re: [PATCH 4/5] mmc: block: move single ioctl() commands to block requests

2017-05-16 Thread Ulf Hansson
On 10 May 2017 at 10:24, Linus Walleij  wrote:
> This wraps single ioctl() commands into block requests using
> the custom block layer request types REQ_OP_DRV_IN and
> REQ_OP_DRV_OUT.
>
> By doing this we are loosening the grip on the big host lock,
> since two calls to mmc_get_card()/mmc_put_card() are removed.
>
> We are storing the ioctl() in/out argument as a pointer in
> the per-request struct mmc_blk_request container. Since we
> now let the block layer allocate this data, blk_get_request()
> will allocate it for us and we can immediately dereference
> it and use it to pass the argument into the block layer.
>
> Tested on the ux500 with the userspace:
> mmc extcsd read /dev/mmcblk3
> resulting in a successful EXTCSD info dump back to the
> console.
>
> Signed-off-by: Linus Walleij 
> ---
>  drivers/mmc/core/block.c | 56 ++--

[...]

> @@ -1854,7 +1882,13 @@ void mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
> goto out;
> }
>
> -   if (req && req_op(req) == REQ_OP_DISCARD) {
> +   if (req &&
> +   (req_op(req) == REQ_OP_DRV_IN || req_op(req) == REQ_OP_DRV_OUT)) {
> +   /* complete ongoing async transfer before issuing ioctl()s */
> +   if (mq->qcnt)
> +   mmc_blk_issue_rw_rq(mq, NULL);
> +   mmc_blk_ioctl_cmd_issue(mq, req);
> +   } else if (req && req_op(req) == REQ_OP_DISCARD) {

While you are at it, would you mind converting this if-else-if to a
switch clause instead?

[...]

Kind regards
Uffe
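
To make the mechanism in the commit message above concrete, here is an
illustrative sketch of issuing one driver-private request.  The
blk_get_request(q, REQ_OP_DRV_IN, __GFP_RECLAIM) and req_to_mq_rq() calls are
taken from the commit message itself; the function name and the
drv_op_data/drv_op_result fields are assumptions, not the exact code in the
patch.

static int mmc_blk_issue_drv_ioctl(struct request_queue *q,
				   struct mmc_blk_ioc_data *idata)
{
	struct request *req;
	struct mmc_queue_req *mq_rq;
	int err;

	/* Allocate a custom request; the block core also allocates the
	 * per-request mmc_queue_req right behind it (see patch 2/5). */
	req = blk_get_request(q, REQ_OP_DRV_IN, __GFP_RECLAIM);
	if (IS_ERR(req))
		return PTR_ERR(req);

	mq_rq = req_to_mq_rq(req);
	mq_rq->drv_op_data = idata;		/* assumed field name */

	/* Run it through the queue; the driver's issue path sees
	 * REQ_OP_DRV_IN and dispatches it to the ioctl handler. */
	blk_execute_rq(q, NULL, req, 0);

	err = mq_rq->drv_op_result;		/* assumed field name */
	blk_put_request(req);
	return err;
}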


Re: [PATCH 2/5] mmc: core: Allocate per-request data using the block layer core

2017-05-16 Thread Ulf Hansson
On 10 May 2017 at 10:24, Linus Walleij  wrote:
> The mmc_queue_req is a per-request state container the MMC core uses
> to carry bounce buffers, pointers to asynchronous requests and so on.
> It is currently allocated as a static array of objects; then, as a
> request comes in, a mmc_queue_req is assigned to it and used during
> the lifetime of the request.
>
> This is backwards compared to how other block layer drivers work:
> they usually let the block core provide a per-request struct that gets
> allocated right behind the struct request, and which can be obtained
> using the blk_mq_rq_to_pdu() helper. (The _mq_ infix in this function
> name is misleading: it is used by both the old and the MQ block
> layer.)
>
> The per-request struct gets allocated to the size stored in the queue
> variable .cmd_size, initialized using .init_rq_fn() and
> cleaned up using .exit_rq_fn().
>
> The block layer code makes the MMC core rely on this mechanism to
> allocate the per-request mmc_queue_req state container.
>
> Doing this makes a lot of complicated queue handling go away. We only
> need to keep the .qcnt that keeps count of how many requests are
> currently being processed by the MMC layer. The MQ block layer will
> replace this too once we transition to it.
>
> Doing this refactoring is necessary to move the ioctl() operations
> into custom block layer requests tagged with REQ_OP_DRV_[IN|OUT]
> instead of the custom code using the BigMMCHostLock that we have
> today: those require that per-request data be obtainable easily from
> a request after creating a custom request with e.g.:
>
> struct request *rq = blk_get_request(q, REQ_OP_DRV_IN, __GFP_RECLAIM);
> struct mmc_queue_req *mq_rq = req_to_mq_rq(rq);
>
> And this is not possible with the current construction, as the request
> is not immediately assigned the per-request state container, but
> instead it gets assigned when the request finally enters the MMC
> queue, which is way too late for custom requests.
>
> Signed-off-by: Linus Walleij 
> ---
>  drivers/mmc/core/block.c |  38 ++--
>  drivers/mmc/core/queue.c | 222 +--
>  drivers/mmc/core/queue.h |  22 ++---
>  include/linux/mmc/card.h |   2 -
>  4 files changed, 80 insertions(+), 204 deletions(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index 8273b078686d..be782b8d4a0d 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c

[...]

> @@ -1662,7 +1655,7 @@ static void mmc_blk_rw_try_restart(struct mmc_queue *mq, struct request *req,
> if (mmc_card_removed(mq->card)) {
> req->rq_flags |= RQF_QUIET;
> blk_end_request_all(req, -EIO);
> -   mmc_queue_req_free(mq, mqrq);
> +   mq->qcnt--; /* FIXME: just set to 0? */

As mentioned below, perhaps this FIXME is fine to add, as I assume you
intend to take care of it soon, right?

[...]

> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 545466342fb1..65a8e0e63012 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c

[...]

> +/**
> + * mmc_init_request() - initialize the MMC-specific per-request data
> + * @q: the request queue
> + * @req: the request
> + * @gfp: memory allocation policy
> + */
> +static int mmc_init_request(struct request_queue *q, struct request *req,
> +   gfp_t gfp)
>  {
> -   int i;
> +   struct mmc_queue_req *mq_rq = req_to_mq_rq(req);
> +   struct mmc_queue *mq = q->queuedata;
> +   struct mmc_card *card = mq->card;
> +   struct mmc_host *host = card->host;
>
> -   for (i = 0; i < qdepth; i++) {
> -   mqrq[i].sg = mmc_alloc_sg(max_segs);
> -   if (!mqrq[i].sg)
> +   /* FIXME: use req_to_mq_rq() everywhere this is dereferenced */

Why not do that right now, instead of adding a FIXME comment?

> +   mq_rq->req = req;
> +
> +   if (card->bouncesz) {
> +   mq_rq->bounce_buf = kmalloc(card->bouncesz, gfp);
> +   if (!mq_rq->bounce_buf)
> +   return -ENOMEM;
> +   if (card->bouncesz > 512) {
> +   mq_rq->sg = mmc_alloc_sg(1, gfp);
> +   if (!mq_rq->sg)
> +   return -ENOMEM;
> +   mq_rq->bounce_sg = mmc_alloc_sg(card->bouncesz / 512,
> +   gfp);
> +   if (!mq_rq->bounce_sg)
> +   return -ENOMEM;
> +   }
> +   } else {
> +   mq_rq->bounce_buf = NULL;
> +   mq_rq->bounce_sg = NULL;
> +   mq_rq->sg = mmc_alloc_sg(host->max_segs, gfp);
> +   if (!mq_rq->sg)
> return -ENOMEM;
> }
>
> return 0;
>  }
>

[...]

> @@ -360,13 +248,21 @@ int mmc_init_queue(struct mmc_queue *mq, struct 
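
As a side note, here is a minimal, generic sketch of the
.cmd_size/.init_rq_fn/.exit_rq_fn mechanism named in the commit message above
(4.12-era legacy request path).  The foo_* names and the 16-entry scatterlist
are invented for illustration; blk_mq_rq_to_pdu() is the helper the commit
message refers to.

struct foo_rq {				/* driver's per-request state */
	struct scatterlist *sg;
};

static int foo_init_request(struct request_queue *q, struct request *req,
			    gfp_t gfp)
{
	/* The per-request data lives right behind the struct request. */
	struct foo_rq *frq = blk_mq_rq_to_pdu(req);

	frq->sg = kmalloc_array(16, sizeof(*frq->sg), gfp);
	if (!frq->sg)
		return -ENOMEM;
	sg_init_table(frq->sg, 16);
	return 0;
}

static void foo_exit_request(struct request_queue *q, struct request *req)
{
	struct foo_rq *frq = blk_mq_rq_to_pdu(req);

	kfree(frq->sg);
}

/* At queue setup time, before the queue is (re)initialized: */
static void foo_setup_pdu(struct request_queue *q)
{
	q->cmd_size = sizeof(struct foo_rq);
	q->init_rq_fn = foo_init_request;
	q->exit_rq_fn = foo_exit_request;
}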

Re: Playing with BFQ

2017-05-16 Thread Paolo Valente

> On 13 May 2017, at 09:50, Sedat Dilek  wrote:
> 
> On Wed, May 3, 2017 at 11:21 AM, Paolo Valente  
> wrote:
>> 
>>> On 3 May 2017, at 10:00, Sedat Dilek  wrote:
>>> 
>>> On Tue, May 2, 2017 at 2:16 PM, Markus Trippelsdorf
>>>  wrote:
 On 2017.05.02 at 14:07 +0200, Sedat Dilek wrote:
> On Tue, May 2, 2017 at 10:00 AM, Markus Trippelsdorf
>  wrote:
>> On 2017.05.02 at 09:54 +0200, Sedat Dilek wrote:
>>> Hi,
>>> 
>>> I want to play with BFQ.
>>> 
>>> My base is block-next as of 28-Apr-2017.
> [...]
>>> Not sure if the attached patches make sense (right now).
>> 
>> No, it doesn't make sense at all.
> 
> Hmm, I looked at 4.11.0-v8r11 and 0001 has exactly what my 2 patches do 
> :-).
 
 BFQ started as a conventional scheduler. But because mq is the way of
 the future it was ported before it was accepted into mainline.
 
>>> 
>>> I am still playing and want to do my own experiences with BFQ.
>>> 
>>> Not sure if FIO is a good testcase-tool here.
>>> 
>> 
>> If you want to perform a thorough benchmarking of responsiveness and
>> latency for time-sensitive applications (such as video playing) as
>> well, then you may want to use S [1].  It's rather rustic; do ask if
>> you encounter any difficulty.
>> 
>> [1] https://github.com/Algodev-github/S
>> 
> 
> Sorry for the delay.

Don't worry, I'm replying late too ...

> Currently, I am switching from Ubuntu/precise 12.04 LTS (EOL) back to
> the Debian world.
> 
> The responsiveness is really bad when my mlocate cron job, a git pull
> on linux.git and Firefox run in parallel.

Thanks for reporting this issue.  I have a few considerations and
requests for information on it.

1) Two of the three sources of I/O you mention, namely the mlocate
update and the git pull, are doing writes.  As I have already pointed
out on a few occasions and in a few places, intense write workloads
trigger problems that an I/O scheduler alone cannot solve.  These
problems *can*, however, be solved with help from BFQ.  In particular,
I already have a prototype solution, but I haven't yet found support to
turn it into a possible production-level solution; that is, until a few
days ago, when I talked about this with Goldwyn Rodrigues (in CC). He
seems interested in having a look at this solution, and possibly
collaborating on it.

2) A web browser like Firefox can generate extremely varying
workloads; so, if you mentioned Firefox as one of the sources of I/O
in your unlucky situation, then it would be important to know what
Firefox was doing.

3) Even if BFQ cannot counteract problems occurring above its head, it
usually improves responsiveness even in heavy-write scenarios.  It
would then be interesting if you could compare responsiveness with the
other I/O schedulers (mq-deadline, Kyber) and with none too (make sure
that the I/O is really the same in all cases).


> This is with Linux v4.11.1-rc1 and BFQ patchset v4.11.0-v8r11.
> 
> My linux-config is attached.
> 
>>> So if MQ is the way forward, why isn't the Kconfig option called
>>> CONFIG_MQ_IOSCHED_BFQ, in line with CONFIG_MQ_IOSCHED_DEADLINE?
>>> 
>>> As we are talking about "*Storage* I/O schedulers", which of the MQ
>>> Kconfig options make sense when using MQ_DEADLINE and (MQ_)BFQ?
>>> 
>>> # egrep -i 'bfq|deadline|_mq|mq_|_mq_' /boot/config-4.11.0-1-bfq-amd64
>>> CONFIG_POSIX_MQUEUE=y
>>> CONFIG_POSIX_MQUEUE_SYSCTL=y
>>> CONFIG_BLK_WBT_MQ=y
>>> CONFIG_BLK_MQ_PCI=y
>>> CONFIG_BLK_MQ_VIRTIO=y
>>> CONFIG_IOSCHED_DEADLINE=y
>>> CONFIG_IOSCHED_BFQ=y
>>> CONFIG_BFQ_GROUP_IOSCHED=y
>>> # CONFIG_DEFAULT_DEADLINE is not set
>>> CONFIG_DEFAULT_BFQ=y
>>> CONFIG_DEFAULT_IOSCHED="bfq"
>>> CONFIG_MQ_IOSCHED_DEADLINE=y
>>> # CONFIG_NET_SCH_MQPRIO is not set
>>> CONFIG_SCSI_MQ_DEFAULT=y
>>> # CONFIG_DM_MQ_DEFAULT is not set
>>> 
>> 
>> The config for BFQ seems correct.  For the others, it depends on what
>> scheduler you want.  If useful for you, the other two MQ schedulers
>> are mq-deadline and kyber.
>> 
> 
> What about those two (Kconfig) patches which are in your current
> bfq-4.11.y patchset?
> 

I'm not sure I fully understand the purpose of the two patches you
propose (in your following emails).  The first patch seems to move BFQ
config options to a different position in Kconfig.iosched, but the
position of those items should be irrelevant.  Am I missing something?
The second patch seems to have to do with configuration bits of bfq
for blk, yet such a bfq version is not available in mainline.

In any case, for possible new submissions, you should inline your
patches.  For complete instructions on submitting patches, have a look
at Documentation/process/submitting-patches

Thanks,
Paolo


> Thanks Sedat.
>