Re: [PATCH 13/14] nvme: use the changed namespaces list log to clear ns data changed AENs

2018-06-04 Thread Christoph Hellwig
On Mon, Jun 04, 2018 at 01:59:09PM -0600, Keith Busch wrote:
> > Per section 5.2 we need to issue the corresponding log page to clear an
> > AEN, so for a namespace data changed AEN we need to read the changed
> > namespace list log.  And once we read that log anyway we might as well
> > use it to optimize the rescan.
> > 
> > Signed-off-by: Christoph Hellwig 
> 
> I'm a little concerned about this. Userspace might be reading the same
> log page. Since the contents of the page may change each time its read,
> it's possible the driver will see only a subset of changed namespaces
> at the point it gets to read the log page, missing a chance to
> revalidate others.

I agree with the concern, but I don't think it really is unique here.
We allow userspace to send all kinds of harmful commands, even queue
creation and deletion.

But let's assume we don't want to use the list due to this concern:
we'd still have to read the log page, as per the NVMe spec the only
thing clearing a pending AEN is reading the associated log page.
We'd then still need to do the full scan using Identify.  Is this what
we want?  If you think this is important for reliability we could
just read the log page to clear the AEN and ignore its contents.


Re: [PATCH 3/3] block: use bio_add_page in bio_iov_iter_get_pages

2018-06-04 Thread Ming Lei
On Tue, Jun 05, 2018 at 11:41:19AM +0800, Ming Lei wrote:
> On Mon, Jun 04, 2018 at 03:58:53PM +0200, Christoph Hellwig wrote:
> > Replace a nasty hack with a different nasty hack to prepare for multipage
> > bio_vecs.  By moving the temporary page array as far up as possible in
> > the space allocated for the bio_vec array we can iterate forward over it
> > and thus use bio_add_page.  Using bio_add_page means we'll be able to
> > merge physically contiguous pages once support for multipage bio_vecs is
> > merged.
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  block/bio.c | 45 +
> >  1 file changed, 21 insertions(+), 24 deletions(-)
> > 
> > diff --git a/block/bio.c b/block/bio.c
> > index d95fab72acb5..b09c133a1e24 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -871,6 +871,8 @@ int bio_add_page(struct bio *bio, struct page *page,
> >  }
> >  EXPORT_SYMBOL(bio_add_page);
> >  
> > +#define PAGE_PTRS_PER_BVEC (sizeof(struct bio_vec) / sizeof(struct page *))
> > +
> >  /**
> >   * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
> >   * @bio: bio to add pages to
> > @@ -882,38 +884,33 @@ EXPORT_SYMBOL(bio_add_page);
> >  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> >  {
> > unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
> > +   unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
> > struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
> > struct page **pages = (struct page **)bv;
> > -   size_t offset, diff;
> > -   ssize_t size;
> > +   ssize_t size, left;
> > +   unsigned len, i;
> > +   size_t offset;
> > +
> > +   /*
> > +* Move page array up in the allocated memory for the bio vecs as
> > +* far as possible so that we can start filling biovecs from the
> > +* beginning without overwriting the temporary page array.
> > +*/
> > +   BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
> > +   pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
> >  
> > size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
> > if (unlikely(size <= 0))
> > return size ? size : -EFAULT;
> > -   nr_pages = (size + offset + PAGE_SIZE - 1) / PAGE_SIZE;
> >  
> > -   /*
> > -* Deep magic below:  We need to walk the pinned pages backwards
> > -* because we are abusing the space allocated for the bio_vecs
> > -* for the page array.  Because the bio_vecs are larger than the
> > -* page pointers by definition this will always work.  But it also
> > -* means we can't use bio_add_page, so any changes to it's semantics
> > -* need to be reflected here as well.
> > -*/
> > -   bio->bi_iter.bi_size += size;
> > -   bio->bi_vcnt += nr_pages;
> > -
> > -   diff = (nr_pages * PAGE_SIZE - offset) - size;
> > -   while (nr_pages--) {
> > -   bv[nr_pages].bv_page = pages[nr_pages];
> > -   bv[nr_pages].bv_len = PAGE_SIZE;
> > -   bv[nr_pages].bv_offset = 0;
> > -   }
> > +   for (left = size, i = 0; left > 0; left -= len, i++) {
> > +   struct page *page = pages[i];
> >  
> > -   bv[0].bv_offset += offset;
> > -   bv[0].bv_len -= offset;
> > -   if (diff)
> > -   bv[bio->bi_vcnt - 1].bv_len -= diff;
> > +   len = min_t(size_t, PAGE_SIZE - offset, left);
> > +   if (WARN_ON_ONCE(bio_add_page(bio, page, len, offset) != len))
> > +   return -EINVAL;
> > +   offset = 0;
> > +   }
> 
> One invariant is that the page index 'i' is always <= bvec index

oops, the above "<=" should have been ">=".

Thanks,
Ming


Re: [PATCH 3/3] block: use bio_add_page in bio_iov_iter_get_pages

2018-06-04 Thread Ming Lei
On Mon, Jun 04, 2018 at 03:58:53PM +0200, Christoph Hellwig wrote:
> Replace a nasty hack with a different nasty hack to prepare for multipage
> bio_vecs.  By moving the temporary page array as far up as possible in
> the space allocated for the bio_vec array we can iterate forward over it
> and thus use bio_add_page.  Using bio_add_page means we'll be able to
> merge physically contiguous pages once support for multipage bio_vecs is
> merged.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/bio.c | 45 +
>  1 file changed, 21 insertions(+), 24 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index d95fab72acb5..b09c133a1e24 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -871,6 +871,8 @@ int bio_add_page(struct bio *bio, struct page *page,
>  }
>  EXPORT_SYMBOL(bio_add_page);
>  
> +#define PAGE_PTRS_PER_BVEC   (sizeof(struct bio_vec) / sizeof(struct page *))
> +
>  /**
>   * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
>   * @bio: bio to add pages to
> @@ -882,38 +884,33 @@ EXPORT_SYMBOL(bio_add_page);
>  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  {
>   unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
> + unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
>   struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
>   struct page **pages = (struct page **)bv;
> - size_t offset, diff;
> - ssize_t size;
> + ssize_t size, left;
> + unsigned len, i;
> + size_t offset;
> +
> + /*
> +  * Move page array up in the allocated memory for the bio vecs as
> +  * far as possible so that we can start filling biovecs from the
> +  * beginning without overwriting the temporary page array.
> +  */
> + BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
> + pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
>  
> size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
>   if (unlikely(size <= 0))
>   return size ? size : -EFAULT;
> - nr_pages = (size + offset + PAGE_SIZE - 1) / PAGE_SIZE;
>  
> - /*
> -  * Deep magic below:  We need to walk the pinned pages backwards
> -  * because we are abusing the space allocated for the bio_vecs
> -  * for the page array.  Because the bio_vecs are larger than the
> -  * page pointers by definition this will always work.  But it also
> -  * means we can't use bio_add_page, so any changes to it's semantics
> -  * need to be reflected here as well.
> -  */
> - bio->bi_iter.bi_size += size;
> - bio->bi_vcnt += nr_pages;
> -
> - diff = (nr_pages * PAGE_SIZE - offset) - size;
> - while (nr_pages--) {
> - bv[nr_pages].bv_page = pages[nr_pages];
> - bv[nr_pages].bv_len = PAGE_SIZE;
> - bv[nr_pages].bv_offset = 0;
> - }
> + for (left = size, i = 0; left > 0; left -= len, i++) {
> + struct page *page = pages[i];
>  
> - bv[0].bv_offset += offset;
> - bv[0].bv_len -= offset;
> - if (diff)
> - bv[bio->bi_vcnt - 1].bv_len -= diff;
> + len = min_t(size_t, PAGE_SIZE - offset, left);
> + if (WARN_ON_ONCE(bio_add_page(bio, page, len, offset) != len))
> + return -EINVAL;
> + offset = 0;
> + }

One invariant is that the page index 'i' is always <= bvec index
of the added page, and the two can only be same when adding the last
page.

So this approach is correct and looks smart & clean:

Reviewed-by: Ming Lei 

Thanks,
Ming


Re: [PATCH 2/3] block: bio_set_pages_dirty can't see NULL bv_page in a valid bio_vec

2018-06-04 Thread Ming Lei
On Mon, Jun 04, 2018 at 03:58:52PM +0200, Christoph Hellwig wrote:
> So don't bother handling it.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/bio.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 50941c1c9118..d95fab72acb5 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1578,10 +1578,8 @@ void bio_set_pages_dirty(struct bio *bio)
>   int i;
>  
>   bio_for_each_segment_all(bvec, bio, i) {
> - struct page *page = bvec->bv_page;
> -
> - if (page && !PageCompound(page))
> - set_page_dirty_lock(page);
> + if (!PageCompound(bvec->bv_page))
> + set_page_dirty_lock(bvec->bv_page);
>   }
>  }

Looks reasonable:

Reviewed-by: Ming Lei 


Thanks,
Ming


Re: [PATCH 1/3] block: simplify bio_check_pages_dirty

2018-06-04 Thread Ming Lei
On Mon, Jun 04, 2018 at 03:58:51PM +0200, Christoph Hellwig wrote:
> bio_check_pages_dirty currently violates the invariant that bv_page of
> a bio_vec inside bi_vcnt shouldn't be zero, and that is going to become
> really annoying with multipage biovecs.  Fortunately there isn't all
> that good a reason for it - once we decide to defer freeing the bio
> to a workqueue holding onto a few additional pages isn't really an
> issue anymore.  So just check if there is a clean page that needs
> dirtying in the first pass, and do a second pass to free them if there
> was none, while the cache is still hot.
> 
> Also use the chance to micro-optimize bio_dirty_fn a bit by not saving
> irq state - we know we are called from a workqueue.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/bio.c | 56 +---
>  1 file changed, 21 insertions(+), 35 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 53e0f0a1ed94..50941c1c9118 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1590,19 +1590,15 @@ static void bio_release_pages(struct bio *bio)
>   struct bio_vec *bvec;
>   int i;
>  
> - bio_for_each_segment_all(bvec, bio, i) {
> - struct page *page = bvec->bv_page;
> -
> - if (page)
> - put_page(page);
> - }
> + bio_for_each_segment_all(bvec, bio, i)
> + put_page(bvec->bv_page);
>  }
>  
>  /*
>   * bio_check_pages_dirty() will check that all the BIO's pages are still dirty.
>   * If they are, then fine.  If, however, some pages are clean then they must
>   * have been written out during the direct-IO read.  So we take another ref on
> - * the BIO and the offending pages and re-dirty the pages in process context.
> + * the BIO and re-dirty the pages in process context.
>   *
>   * It is expected that bio_check_pages_dirty() will wholly own the BIO from
>   * here on.  It will run one put_page() against each page and will run one
> @@ -1620,52 +1616,42 @@ static struct bio *bio_dirty_list;
>   */
>  static void bio_dirty_fn(struct work_struct *work)
>  {
> - unsigned long flags;
> - struct bio *bio;
> + struct bio *bio, *next;
>  
> - spin_lock_irqsave(&bio_dirty_lock, flags);
> - bio = bio_dirty_list;
> + spin_lock_irq(&bio_dirty_lock);
> + next = bio_dirty_list;
>   bio_dirty_list = NULL;
> - spin_unlock_irqrestore(&bio_dirty_lock, flags);
> + spin_unlock_irq(&bio_dirty_lock);
>  
> - while (bio) {
> - struct bio *next = bio->bi_private;
> + while ((bio = next) != NULL) {
> + next = bio->bi_private;
>  
>   bio_set_pages_dirty(bio);
>   bio_release_pages(bio);
>   bio_put(bio);
> - bio = next;
>   }
>  }
>  
>  void bio_check_pages_dirty(struct bio *bio)
>  {
>   struct bio_vec *bvec;
> - int nr_clean_pages = 0;
> + unsigned long flags;
>   int i;
>  
>   bio_for_each_segment_all(bvec, bio, i) {
> - struct page *page = bvec->bv_page;
> -
> - if (PageDirty(page) || PageCompound(page)) {
> - put_page(page);
> - bvec->bv_page = NULL;
> - } else {
> - nr_clean_pages++;
> - }
> + if (!PageDirty(bvec->bv_page) && !PageCompound(bvec->bv_page))
> + goto defer;
>   }
>  
> - if (nr_clean_pages) {
> - unsigned long flags;
> -
> - spin_lock_irqsave(&bio_dirty_lock, flags);
> - bio->bi_private = bio_dirty_list;
> - bio_dirty_list = bio;
> - spin_unlock_irqrestore(&bio_dirty_lock, flags);
> - schedule_work(&bio_dirty_work);
> - } else {
> - bio_put(bio);
> - }
> + bio_release_pages(bio);
> + bio_put(bio);
> + return;
> +defer:
> + spin_lock_irqsave(&bio_dirty_lock, flags);
> + bio->bi_private = bio_dirty_list;
> + bio_dirty_list = bio;
> + spin_unlock_irqrestore(&bio_dirty_lock, flags);
> + schedule_work(&bio_dirty_work);
>  }
>  
>  void generic_start_io_acct(struct request_queue *q, int rw,

It is good simplification, and the only effect may be some pages
which will be freed a bit late in partial clean case.

I have run fio randread on null_blk in a 512M ram fedora 27 cloud image
based VM, not see IOPS drop, so:

Tested-by: Ming Lei 
Reviewed-by: Ming Lei 

Thanks,
Ming


Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Linus Torvalds
On Mon, Jun 4, 2018 at 5:56 PM Kent Overstreet
 wrote:
>
> I like your patch for a less invasive version, but I did finish and test my
> version, which deletes more code :)

I certainly won't object to that.

Your patch looks fine, and looks like the right thing in the long run anyway.

Plus, it's apparently even tested. What a concept.

Regardless, I'll sleep better tonight that this will all be fixed one
way or the other.

Thanks,

Linus


Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Kent Overstreet
On Mon, Jun 04, 2018 at 05:42:04PM -0700, Linus Torvalds wrote:
> On Mon, Jun 4, 2018 at 12:04 PM Kent Overstreet
>  wrote:
> >
> > However, that's not correct as is because mddev_delayed_put() calls
> > kobject_put(), and the kobject isn't initialized when the mddev is first
> > allocated, it's initialized when the gendisk is allocated... that isn't hard to
> > fix but that's getting into real refactoring that I'll need to put actual work
> > into testing.
> 
> Well, it also removes the bioset_exit() calls entirely.

Yeah, I realized that when I went back to finish that patch
> 
> How about just the attached?
> 
> It simply does it as two different cases, and adds the bioset_exit()
> calls to mddev_delayed_delete().

Oh right, just taking advantage of the fact that only the queue_work() needs to
be under the spinlock, not the actual free in the other case.

I like your patch for a less invasive version, but I did finish and test my
version, which deletes more code :)

I've already gone to the trouble of coming up with a VM smoketest, so I can test
yours too... I don't really have a strong opinion on which patch should go in.

 drivers/md/md.c | 53 ++---
 1 file changed, 18 insertions(+), 35 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index fc692b7128..22203eba1e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -84,6 +84,8 @@ static void autostart_arrays(int part);
 static LIST_HEAD(pers_list);
 static DEFINE_SPINLOCK(pers_lock);
 
+static struct kobj_type md_ktype;
+
 struct md_cluster_operations *md_cluster_ops;
 EXPORT_SYMBOL(md_cluster_ops);
 struct module *md_cluster_mod;
@@ -510,11 +512,6 @@ static void mddev_delayed_delete(struct work_struct *ws);
 
 static void mddev_put(struct mddev *mddev)
 {
-   struct bio_set bs, sync_bs;
-
-   memset(&bs, 0, sizeof(bs));
-   memset(&sync_bs, 0, sizeof(sync_bs));
-
if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
return;
if (!mddev->raid_disks && list_empty(&mddev->disks) &&
@@ -522,30 +519,23 @@ static void mddev_put(struct mddev *mddev)
/* Array is not configured at all, and not held active,
 * so destroy it */
list_del_init(&mddev->all_mddevs);
-   bs = mddev->bio_set;
-   sync_bs = mddev->sync_set;
-   memset(&mddev->bio_set, 0, sizeof(mddev->bio_set));
-   memset(&mddev->sync_set, 0, sizeof(mddev->sync_set));
-   if (mddev->gendisk) {
-   /* We did a probe so need to clean up.  Call
-* queue_work inside the spinlock so that
-* flush_workqueue() after mddev_find will
-* succeed in waiting for the work to be done.
-*/
-   INIT_WORK(&mddev->del_work, mddev_delayed_delete);
-   queue_work(md_misc_wq, &mddev->del_work);
-   } else
-   kfree(mddev);
+
+   /*
+* Call queue_work inside the spinlock so that
+* flush_workqueue() after mddev_find will succeed in waiting
+* for the work to be done.
+*/
+   INIT_WORK(&mddev->del_work, mddev_delayed_delete);
+   queue_work(md_misc_wq, &mddev->del_work);
}
spin_unlock(&all_mddevs_lock);
-   bioset_exit(&bs);
-   bioset_exit(&sync_bs);
}
 
static void md_safemode_timeout(struct timer_list *t);
 
void mddev_init(struct mddev *mddev)
{
+   kobject_init(&mddev->kobj, &md_ktype);
mutex_init(&mddev->open_mutex);
mutex_init(&mddev->reconfig_mutex);
mutex_init(&mddev->bitmap_info.mutex);
@@ -5215,6 +5205,8 @@ static void md_free(struct kobject *ko)
put_disk(mddev->gendisk);
percpu_ref_exit(&mddev->writes_pending);
 
+   bioset_exit(&mddev->bio_set);
+   bioset_exit(&mddev->sync_set);
kfree(mddev);
}
 
@@ -5348,8 +5340,7 @@ static int md_alloc(dev_t dev, char *name)
mutex_lock(&mddev->open_mutex);
add_disk(disk);
 
-   error = kobject_init_and_add(&mddev->kobj, &md_ktype,
-                                &disk_to_dev(disk)->kobj, "%s", "md");
+   error = kobject_add(&mddev->kobj, &disk_to_dev(disk)->kobj, "%s", "md");
if (error) {
/* This isn't possible, but as kobject_init_and_add is marked
 * __must_check, we must do something with the result
@@ -5506,7 +5497,7 @@ int md_run(struct mddev *mddev)
if (!bioset_initialized(&mddev->sync_set)) {
err = bioset_init(&mddev->sync_set, BIO_POOL_SIZE, 0,
                  BIOSET_NEED_BVECS);
if (err)
-   goto abort;
+   return err;
}
 
spin_lock(&pers_lock);
@@ -5519,8 +5510,7 @@ int md_run(struct mddev *mddev)
else
pr_warn("md: personality for level %s is not loaded!\n",
mddev->clevel);
-   err = -EINVAL;
-   goto abort;

Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Linus Torvalds
On Mon, Jun 4, 2018 at 5:42 PM Linus Torvalds
 wrote:
>
> How about just the attached?

Note: it probably goes without saying that the patch was entirely
untested, but it does build, and it does get rid of the insane stack
use.

Linus


Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Linus Torvalds
On Mon, Jun 4, 2018 at 12:04 PM Kent Overstreet
 wrote:
>
> However, that's not correct as is because mddev_delayed_put() calls
> kobject_put(), and the kobject isn't initialized when the mddev is first
> allocated, it's initialized when the gendisk is allocated... that isn't hard to
> fix but that's getting into real refactoring that I'll need to put actual work
> into testing.

Well, it also removes the bioset_exit() calls entirely.

How about just the attached?

It simply does it as two different cases, and adds the bioset_exit()
calls to mddev_delayed_delete().

Hmm?

 Linus
 drivers/md/md.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index fc692b7128bb..6a2494065ab2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -510,11 +510,6 @@ static void mddev_delayed_delete(struct work_struct *ws);
 
 static void mddev_put(struct mddev *mddev)
 {
-	struct bio_set bs, sync_bs;
-
-	memset(&bs, 0, sizeof(bs));
-	memset(&sync_bs, 0, sizeof(sync_bs));
-
 	if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
 		return;
 	if (!mddev->raid_disks && list_empty(&mddev->disks) &&
@@ -522,10 +517,6 @@ static void mddev_put(struct mddev *mddev)
 		/* Array is not configured at all, and not held active,
 		 * so destroy it */
 		list_del_init(&mddev->all_mddevs);
-		bs = mddev->bio_set;
-		sync_bs = mddev->sync_set;
-		memset(&mddev->bio_set, 0, sizeof(mddev->bio_set));
-		memset(&mddev->sync_set, 0, sizeof(mddev->sync_set));
 		if (mddev->gendisk) {
 			/* We did a probe so need to clean up.  Call
 			 * queue_work inside the spinlock so that
@@ -534,12 +525,15 @@ static void mddev_put(struct mddev *mddev)
 			 */
 			INIT_WORK(&mddev->del_work, mddev_delayed_delete);
 			queue_work(md_misc_wq, &mddev->del_work);
-		} else
-			kfree(mddev);
+			spin_unlock(&all_mddevs_lock);
+			return;
+		}
 	}
 	spin_unlock(&all_mddevs_lock);
-	bioset_exit(&bs);
-	bioset_exit(&sync_bs);
+
+	bioset_exit(&mddev->bio_set);
+	bioset_exit(&mddev->sync_set);
+	kfree(mddev);
 }
 
 static void md_safemode_timeout(struct timer_list *t);
@@ -5234,6 +5228,9 @@ static void mddev_delayed_delete(struct work_struct *ws)
 {
 	struct mddev *mddev = container_of(ws, struct mddev, del_work);
 
+	bioset_exit(&mddev->bio_set);
+	bioset_exit(&mddev->sync_set);
+
 	sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
 	kobject_del(&mddev->kobj);
 	kobject_put(&mddev->kobj);


Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Kent Overstreet
On Tue, Jun 05, 2018 at 08:52:32AM +1000, NeilBrown wrote:
> On Mon, Jun 04 2018, Kent Overstreet wrote:
> 
> > On Tue, Jun 05, 2018 at 07:16:51AM +1000, NeilBrown wrote:
> >> I really should get back to BIOSET_NEED_RESCUER and see if I can discard
> >> it completely.
> >> Kent - the only remaining user is bcache, and the main reason I haven't
> >> removed it is that I cannot follow the code (or at least, I couldn't
> >> last time I tried).  Are you able to see if it is still needed?
> >
> > Oh man, rescuers make my head hurt. I remember you changed something so that
> > they weren't needed most of the time, but I can't remember what you did - can
> > you remind me what that was and whatever else you can remember offhand?
> 
> (and closures make my head hurt :-)

Fair enough :)

> There were two closely related changes.
> One is make sure that generic_make_request processed bios strictly
> in a depth-first order.  The other is to ensure that all (stacking)
> block device drivers never block in their make_request_fn waiting
> for something that they have themselves submitted.
> 
> Together these mean that a thread in generic_make_request will never
> block waiting for a bio that might be queued in current->bio_list.
> The bio chosen by generic_make_request will be a leaf and won't depend
> on anything else in the list, and the handler for that request won't
> block after submitting a lower-level request.
> 
> The  second property is the important one for drivers and the one we
> need to ensure that bcache follows.
> A common bad pattern is something like:
> 
>  while (too_big(bio)) {
> split = bio_split(bio, ...);
> process(split)
>  }
>  process(bio)
> 
> That needs to be more like
>   if (too_big(bio)) {
>  split = bio_split(bio,...);
>  generic_make_request(bio);
>  bio = split;
>   }
>   process(bio)
> 
> so that generic_make_request() gets to do that while-loop
> and can control the order in which bios are handled.
> 
> I cannot convince myself that bcache doesn't have something similar to
> that while loop (structured with closures).

Oh yeah, it's all coming back to me...

so, the problem isn't really closures, it's that we don't know where we have to
split until after we've already allocated all our state for the request and done
almost all our work for the request - also, the state we allocate is per
_incoming_ bio, not per split we submit.

So for the read side, it looks like
 - allocate a struct search (man, so much of the naming in bcache is crap, it's
   painful going back from bcachefs)
 - traverse the btree to the start of this request
 - iterate over extents, splitting and submitting for each extent the original
   bio overlaps with

so, converting it to your 2nd pattern would mean changing that code so that when
we walk the btree and find an extent we overlap with, if the extent doesn't
cover the whole bio, we split, submit the split, resubmit the original bio - and
then return and _restart the entire btree traversal_, for each split. ouch.

the write path is different, but the problem is roughly the same in that we
don't know where we're going to need to split until we're deep inside the write
path, allocating space for where the write will go - we split if the bucket
we're currently writing to doesn't have enough space. But again, we don't find
that out until well after allocating data structures for that write, taking
locks for space allocation, etc. etc...

So the approach of just resubmitting the rest of the bio just flat out won't
work, we've already allocated our data structures and they're tied to that
bio... it would require massive changes and deep surgery on how the entire
control flow works...

Also unconditionally doing this for every bio split (bailing out and restarting
the btree traversal/write path) would be pretty crappy performance wise, e.g.
when doing big sequential reads of data that was written all with random 4k
writes...

but...

the problem we're trying to avoid is just blocking while allocating memory,
while we've got bios blocked on current->bio_list, right?

So another approach would be to check, when we're allocating a bio split,
whether that allocation might deadlock - if so, do it with GFP_NOWAIT, and if
it fails - just punt the read or write request to a workqueue.

this is something the code can actually do the way it's structured now... making
use of closures :)

cache_lookup() calls bch_btree_map_keys() to walk the extents the request
covers, and restarts itself if cache_lookup_fn() returns -EAGAIN - this is
because the btree lookup might require reading in btree nodes, so we can't block
on a btree node read while running under generic_make_request().

So this (untested!) patch should be sufficient for the read side:

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index ae67f5fa80..9aed34d050 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -538,6 +538,14 @@ static int 

Re: [blktests PATCHv2] Fix block/011 to not use sysfs for device disabling

2018-06-04 Thread Omar Sandoval
On Mon, Jun 04, 2018 at 04:51:41PM -0600, Keith Busch wrote:
> The PCI sysfs interface may not be a dependable method for toggling the
> PCI device state to trigger the timeouts. This patch goes directly to
> the config space to make device failure occur.
> 
> Signed-off-by: Keith Busch 
> ---
> v1 -> v2:
> 
>   Toggling only PCI Command Register BME bit, rather than including MEM.

Thanks, Keith, applied.


Re: [blktests PATCHv2] Fix block/011 to not use sysfs for device disabling

2018-06-04 Thread Jens Axboe
On 6/4/18 4:51 PM, Keith Busch wrote:
> The PCI sysfs interface may not be a dependable method for toggling the
> PCI device state to trigger the timeouts. This patch goes directly to
> the config space to make device failure occur.

Reviewed-by: Jens Axboe 

-- 
Jens Axboe



Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread NeilBrown
On Mon, Jun 04 2018, Kent Overstreet wrote:

> On Tue, Jun 05, 2018 at 07:16:51AM +1000, NeilBrown wrote:
>> I really should get back to BIOSET_NEED_RESCUER and see if I can discard
>> it completely.
>> Kent - the only remaining user is bcache, and the main reason I haven't
>> removed it is that I cannot follow the code (or at least, I couldn't
>> last time I tried).  Are you able to see if it is still needed?
>
> Oh man, rescuers make my head hurt. I remember you changed something so that
> they weren't needed most of the time, but I can't remember what you did - can
> you remind me what that was and whatever else you can remember offhand?

(and closures make my head hurt :-)

There were two closely related changes.
One is make sure that generic_make_request processed bios strictly
in a depth-first order.  The other is to ensure that all (stacking)
block device drivers never block in their make_request_fn waiting
for something that they have themselves submitted.

Together these mean that a thread in generic_make_request will never
block waiting for a bio that might be queued in current->bio_list.
The bio chosen by generic_make_request will be a leaf and won't depend
on anything else in the list, and the handler for that request won't
block after submitting a lower-level request.

The  second property is the important one for drivers and the one we
need to ensure that bcache follows.
A common bad pattern is something like:

 while (too_big(bio)) {
split = bio_split(bio, ...);
process(split)
 }
 process(bio)

That needs to be more like
  if (too_big(bio)) {
 split = bio_split(bio,...);
 generic_make_request(bio);
 bio = split;
  }
  process(bio)

so that generic_make_request() gets to do that while-loop
and can control the order in which bios are handled.

I cannot convince myself that bcache doesn't have something similar to
that while loop (structured with closures).

Thanks,
NeilBrown




[blktests PATCHv2] Fix block/011 to not use sysfs for device disabling

2018-06-04 Thread Keith Busch
The PCI sysfs interface may not be a dependable method for toggling the
PCI device state to trigger the timeouts. This patch goes directly to
the config space to make device failure occur.

Signed-off-by: Keith Busch 
---
v1 -> v2:

  Toggling only PCI Command Register BME bit, rather than including MEM.

 tests/block/011 | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tests/block/011 b/tests/block/011
index 62e89f7..2461442 100755
--- a/tests/block/011
+++ b/tests/block/011
@@ -21,7 +21,7 @@ DESCRIPTION="disable PCI device while doing I/O"
 TIMED=1
 
 requires() {
-   _have_fio
+   _have_fio && _have_program setpci
 }
 
 device_requires() {
@@ -43,10 +43,11 @@ test_device() {
_run_fio_rand_io --filename="$TEST_DEV" --size="$size" \
--ignore_error=EIO,ENXIO,ENODEV &
 
+   # toggle PCI Command Register's Bus Master Enabling
while kill -0 $! 2>/dev/null; do
-   echo 0 > "/sys/bus/pci/devices/${pdev}/enable"
+   setpci -s "${pdev}" 4.w=0:4
sleep .2
-   echo 1 > "/sys/bus/pci/devices/${pdev}/enable"
+   setpci -s "${pdev}" 4.w=4:4
sleep .2
done
 
-- 
2.14.3



Re: [PATCH] Revert "block: Add warning for bi_next not NULL in bio_endio()"

2018-06-04 Thread Kent Overstreet
On Mon, Jun 04, 2018 at 08:59:39AM +, Bart Van Assche wrote:
> On Tue, 2018-05-22 at 22:16 -0400, Kent Overstreet wrote:
> > May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: fail to get serial
> > May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: using deprecated getuid callout
> > May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: /.../srp-test/bin/getuid_callout exited with 255
> 
> (back from traveling)
> 
> Hello Kent,
> 
> Please comment out the getuid_callout line from /etc/multipath.conf and try again.
> Changing the "/..." part into the path of the srp-test software should also help.
> That line is a left-over from the time when the NVMe initiator drivers supported
> dm-multipath. I will update the srp-test README.md to make this more clear.

Deleted that line and it still doesn't work (I also tried fixing that line so
the path was correct, but got the same output as what I sent you earlier). With
that line deleted, the output is different. test output still says SRP login
failed, syslog says:

Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: start up
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: read /etc/multipath.conf
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: path checkers start up
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: sda: fail to get serial
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: sdb: fail to get serial
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd0: HDIO_GETGEO failed with 25
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd0: failed to get udev uid: Invalid argument
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd1: HDIO_GETGEO failed with 25
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd1: failed to get udev uid: Invalid argument
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd10: HDIO_GETGEO failed with 25

etc.


Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Kent Overstreet
On Tue, Jun 05, 2018 at 07:16:51AM +1000, NeilBrown wrote:
> I really should get back to BIOSET_NEED_RESCUER and see if I can discard
> it completely.
> Kent - the only remaining user is bcache, and the main reason I haven't
> removed it is that I cannot follow the code (or at least, I couldn't
> last time I tried).  Are you able to see if it is still needed?

Oh man, rescuers make my head hurt. I remember you changed something so that
they weren't needed most of the time, but I can't remember what you did - can
you remind me what that was and whatever else you can remember offhand?


Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread NeilBrown
On Mon, Jun 04 2018, Jens Axboe wrote:

> On 6/4/18 9:51 AM, Linus Torvalds wrote:
>> On Sun, Jun 3, 2018 at 5:42 PM Jens Axboe  wrote:
>>>
>>>  drivers/md/md.c |  61 +--
>>>  drivers/md/md.h |   4 +-
>> 
>> So I've pulled this, but I get a new warning:
>> 
>>   drivers/md/md.c: In function ‘mddev_put’:
>>   drivers/md/md.c:543:1: warning: the frame size of 2112 bytes is
>> larger than 2048 bytes [-Wframe-larger-than=]
>> 
>> which seems to be due to commit afeee514ce7f ("md: convert to
>> bioset_init()/mempool_init()").
>> 
>> As of that commit, mddev_put() now allocates *two* "struct bio_set"
>> structures on the stack, and those really aren't small. Yes, part of
>> it is that I do my test-builds with all that debug stuff, but those
>> structures have several embedded mempool_t members, and they really
>> are pretty sizable.
>> 
>> And I _really_ do not think it's acceptable to have that kind of stack
>> usage in there. I'm not sure you can trigger mddev_put() in the IO
>> paths, but even without that I don't think it's acceptable.
>> 
>> Also, the code simply looks like complete and utter garbage. It does
>> 
>> bs = mddev->bio_set;
>> sync_bs = mddev->sync_set;
>> 
>> to _copy_ those structures, and then does bioset_exit() on them. WTF?
>> 
>> Why the hell doesn't it just do bioset_exit() on the originals instead,
>> before freeing the mddev?
>> 
>> I've pulled this, but this needs to be fixed. That is just broken
>> garbage right now. No way in hell is it acceptable to waste 2kB of
>> stack space on something idiotic like this.
>
> Work is already underway to get this fixed up. I agree that the
> excessive stack usage isn't great, but the code in that function
> is pretty miserable, as you say. That's not new with the patch,
> the conversion just carried that forward.
>
> CC'ing Neil to get his input on how best to clean that up, I'd
> be much more comfortable with that logic improved rather than
> just catering to the stack usage issue. Also include Arnd, as
> he had a test patch for it.

Cc:ing Shaohua as well, as he is the current maintainer, and authored
  
  Commit: 7184ef8bab0c ("MD: fix sleep in atomic")

which seems most relevant.
The justification given in that patch is that bioset_free() will take a
mutex.
This must have been in destroy_workqueue() when destroying
bs->rescue_workqueue.
md biosets don't have a work queue (BIOSET_NEED_RESCUER isn't set),
so these calls will no longer take a mutex.  So this commit
can now be reverted.  I think that will clean up this code suitably.

I really should get back to BIOSET_NEED_RESCUER and see if I can discard
it completely.
Kent - the only remaining user is bcache, and the main reason I haven't
removed it is that I cannot follow the code (or at least, I couldn't
last time I tried).  Are you able to see if it is still needed?

Thanks,
NeilBrown

>
> -- 
> Jens Axboe


signature.asc
Description: PGP signature


Re: [PATCH 13/14] nvme: use the changed namespaces list log to clear ns data changed AENs

2018-06-04 Thread Keith Busch
On Sat, May 26, 2018 at 12:27:34PM +0200, Christoph Hellwig wrote:
> Per section 5.2 we need to issue the corresponding log page to clear an
> AEN, so for a namespace data changed AEN we need to read the changed
> namespace list log.  And once we read that log anyway we might as well
> use it to optimize the rescan.
> 
> Signed-off-by: Christoph Hellwig 

I'm a little concerned about this. Userspace might be reading the same
log page. Since the contents of the page may change each time it's read,
it's possible the driver will see only a subset of changed namespaces
at the point it gets to read the log page, missing a chance to
revalidate others.


Re: [PATCH blktests] Fix block/011 to not use sysfs for device disabling

2018-06-04 Thread Keith Busch
On Tue, May 29, 2018 at 12:54:28PM -0700, Omar Sandoval wrote:
> What's the plan for this test? Do you have a v2 coming?

Sorry for the delay. I've been out on holiday, but I'm catching up
quickly and will send a v2 shortly.

Thanks,
Keith


Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Kent Overstreet
On Mon, Jun 04, 2018 at 12:25:54PM -0600, Jens Axboe wrote:
> On 6/4/18 12:22 PM, Linus Torvalds wrote:
> > On Mon, Jun 4, 2018 at 11:20 AM Tejun Heo  wrote:
> >>
> >>
> >> Looking at the code, the fundamental problem seems to be that it's
> >> weaving different parts of sync and async paths.  I don't understand
> >> why it'd punt the destruction of mddev but destroy biosets
> >> synchronously.  Can't it do sth like the following?
> > 
> > Yeah, that looks like the right thing to do.
> > 
> > Jens/Kent?
> 
> I agree with Tejun, we already discussed this offline.

I don't see any good reason for the exit path to have two or three variations,
depending on how you count. My preference would be for a patch that does this:


diff --git a/drivers/md/md.c b/drivers/md/md.c
index fc692b7128..f18d783785 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -510,36 +510,24 @@ static void mddev_delayed_delete(struct work_struct *ws);
 
 static void mddev_put(struct mddev *mddev)
 {
-	struct bio_set bs, sync_bs;
-
-	memset(&bs, 0, sizeof(bs));
-	memset(&sync_bs, 0, sizeof(sync_bs));
-
 	if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
 		return;
+
 	if (!mddev->raid_disks && list_empty(&mddev->disks) &&
 	    mddev->ctime == 0 && !mddev->hold_active) {
 		/* Array is not configured at all, and not held active,
 		 * so destroy it */
 		list_del_init(&mddev->all_mddevs);
-		bs = mddev->bio_set;
-		sync_bs = mddev->sync_set;
-		memset(&mddev->bio_set, 0, sizeof(mddev->bio_set));
-		memset(&mddev->sync_set, 0, sizeof(mddev->sync_set));
-		if (mddev->gendisk) {
-			/* We did a probe so need to clean up.  Call
-			 * queue_work inside the spinlock so that
-			 * flush_workqueue() after mddev_find will
-			 * succeed in waiting for the work to be done.
-			 */
-			INIT_WORK(&mddev->del_work, mddev_delayed_delete);
-			queue_work(md_misc_wq, &mddev->del_work);
-		} else
-			kfree(mddev);
+
+		/* We did a probe so need to clean up.  Call
+		 * queue_work inside the spinlock so that
+		 * flush_workqueue() after mddev_find will
+		 * succeed in waiting for the work to be done.
+		 */
+		INIT_WORK(&mddev->del_work, mddev_delayed_delete);
+		queue_work(md_misc_wq, &mddev->del_work);
 	}
 	spin_unlock(&all_mddevs_lock);
-	bioset_exit(&bs);
-	bioset_exit(&sync_bs);
 }
 
 static void md_safemode_timeout(struct timer_list *t);
 
 static void md_safemode_timeout(struct timer_list *t);

However, that's not correct as is because mddev_delayed_put() calls
kobject_put(), and the kobject isn't initialized when the mddev is first
allocated, it's initialized when the gendisk is allocated... that isn't hard to
fix but that's getting into real refactoring that I'll need to put actual work
into testing.

However, I have looked at where mddev_put() is called from and I think the stack
usage is fine for now - unless I've missed something it's not called from e.g.
under generic_make_request() anywhere, it's just called from sysfs code and
ioctls and things like that, where the stack is going to be pretty much empty.




Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Jens Axboe
On 6/4/18 12:22 PM, Linus Torvalds wrote:
> On Mon, Jun 4, 2018 at 11:20 AM Tejun Heo  wrote:
>>
>>
>> Looking at the code, the fundamental problem seems to be that it's
>> weaving different parts of sync and async paths.  I don't understand
>> why it'd punt the destruction of mddev but destroy biosets
>> synchronously.  Can't it do sth like the following?
> 
> Yeah, that looks like the right thing to do.
> 
> Jens/Kent?

I agree with Tejun, we already discussed this offline.

-- 
Jens Axboe



Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Linus Torvalds
On Mon, Jun 4, 2018 at 11:20 AM Tejun Heo  wrote:
>
>
> Looking at the code, the fundamental problem seems to be that it's
> weaving different parts of sync and async paths.  I don't understand
> why it'd punt the destruction of mddev but destroy biosets
> synchronously.  Can't it do sth like the following?

Yeah, that looks like the right thing to do.

Jens/Kent?

 Linus


Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Tejun Heo
Hey, Linus.

On Mon, Jun 04, 2018 at 09:24:28AM -0700, Linus Torvalds wrote:
> Tejun, the code in question is mddev_put() in drivers/md/md.c, and it
> basically does
> 
> INIT_WORK(&mddev->del_work, mddev_delayed_delete);
> queue_work(md_misc_wq, &mddev->del_work);
> 
> inside a spinlock, but then wants to do some stuff *outside* the
> spinlock before that mddev_delayed_delete() thing actually gets
> called.
> 
> Is there some way to "half-queue" the work - enough that a
> flush_workqueue() will wait for it, but still guaranteeing that it
> won't actually run until released?

Such interface doesn't exist, yet anyway.  Workqueue does have the
concept of delayed queueing to handle max concurrency limits and we
can probably piggyback on that to create an interface which queues and
delays the work item and another interface to undelay it.

That said, it feels like a super-niche feature for a weird use case.

> IOW, what I think that code really wants to do is perhaps something like
> 
> /* under the all_mddevs_lock spinlock */
> .. remove from all lists so that it can't be found ..
> 
> .. but add it to the md_misc_wq so that people will wait for it ..
> 
> INIT_WORK_LOCKED(&mddev->del_work, mddev_delayed_delete);
> queue_work(md_misc_wq, &mddev->del_work);
> ...
> 
> spin_unlock(&all_mddevs_lock);
> 
> /* mddev is still guaranteed live */
> bioset_exit(&mddev->bio_set);
> bioset_exit(&mddev->sync_set);
> 
> /* NOW we can release it */
> if (queued)
>         unlock_work(&mddev->del_work);
> else
>         kfree(mddev);
>  Linus

Looking at the code, the fundamental problem seems to be that it's
weaving different parts of sync and async paths.  I don't understand
why it'd punt the destruction of mddev but destroy biosets
synchronously.  Can't it do sth like the following?

lock;

if (need to be async) {
/* async destruction path */
INIT_WORK(...);
queue_work(...);/* the work does bioset_exits */
unlock;
return;
}

/* sync destruction path */
do whatever needs to be done under lock;
unlock;
bioset_exit(...);
bioset_exit(...);
kfree(mddev);
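
For illustration, the sync/async split sketched above can be modeled as a
standalone C toy. Everything here is invented for the sketch (obj, obj_put,
drain_workqueue_sim); the spinlock is elided and the workqueue is simulated
with a plain linked list drained by hand:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model: decide sync vs. async destruction once, under the (elided)
 * lock, and keep each path self-contained. */
struct obj {
	int refs;
	int needs_async;	/* stands in for "a gendisk was probed" */
	struct obj *next;	/* deferred-destruction list linkage */
};

static struct obj *deferred;	/* simulated workqueue */
static int freed;

static void destroy(struct obj *o)	/* does the bioset_exit()s + kfree() */
{
	freed++;
	free(o);
}

static void obj_put(struct obj *o)
{
	if (--o->refs)			/* "atomic_dec_and_lock" */
		return;
	/* --- under the lock --- */
	if (o->needs_async) {		/* async path: queue, unlock, return */
		o->next = deferred;
		deferred = o;
		return;
	}
	/* --- unlock here --- */
	destroy(o);			/* sync path: tear down inline */
}

static void drain_workqueue_sim(void)	/* the "work function" */
{
	while (deferred) {
		struct obj *o = deferred;

		deferred = o->next;
		destroy(o);
	}
}

static int demo(void)
{
	struct obj *a = calloc(1, sizeof(*a));
	struct obj *b = calloc(1, sizeof(*b));

	a->refs = 1; a->needs_async = 0;
	b->refs = 1; b->needs_async = 1;
	obj_put(a);			/* freed immediately */
	obj_put(b);			/* merely queued */
	if (freed != 1)
		return 0;
	drain_workqueue_sim();
	return freed == 2;
}
```

The point of the shape is that each path owns its whole teardown, so nothing
has to be copied out of the object before the lock is dropped.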

Thanks.

-- 
tejun


[PATCH] nbd: Consistently use request pointer in debug messages.

2018-06-04 Thread kvigor
From: Kevin Vigor 

Existing dev_dbg messages sometimes identify request using request
pointer, sometimes using nbd_cmd pointer. This makes it hard to
follow request flow. Consistently use request pointer instead.
---
 drivers/block/nbd.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 3ed1ef8..b5afa11 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -275,7 +275,7 @@ static void nbd_complete_rq(struct request *req)
 {
 	struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
 
-	dev_dbg(nbd_to_dev(cmd->nbd), "request %p: %s\n", cmd,
+	dev_dbg(nbd_to_dev(cmd->nbd), "request %p: %s\n", req,
 		cmd->status ? "failed" : "done");
 
 	blk_mq_end_request(req, cmd->status);
@@ -482,7 +482,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 	memcpy(request.handle, &tag, sizeof(tag));
 
 	dev_dbg(nbd_to_dev(nbd), "request %p: sending control (%s@%llu,%uB)\n",
-		cmd, nbdcmd_to_ascii(type),
+		req, nbdcmd_to_ascii(type),
 		(unsigned long long)blk_rq_pos(req) << 9, blk_rq_bytes(req));
 	result = sock_xmit(nbd, index, 1, &from,
 			(type == NBD_CMD_WRITE) ? MSG_MORE : 0, &sent);
@@ -518,7 +518,7 @@ static int nbd_send_cmd(struct nbd_device *nbd, struct nbd_cmd *cmd, int index)
 		int flags = is_last ? 0 : MSG_MORE;
 
 		dev_dbg(nbd_to_dev(nbd), "request %p: sending %d bytes data\n",
-			cmd, bvec.bv_len);
+			req, bvec.bv_len);
 		iov_iter_bvec(&from, ITER_BVEC | WRITE,
 				&bvec, 1, bvec.bv_len);
 		if (skip) {
@@ -610,7 +610,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 		return cmd;
 	}
 
-	dev_dbg(nbd_to_dev(nbd), "request %p: got reply\n", cmd);
+	dev_dbg(nbd_to_dev(nbd), "request %p: got reply\n", req);
 	if (rq_data_dir(req) != WRITE) {
 		struct req_iterator iter;
 		struct bio_vec bvec;
@@ -637,7 +637,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index)
 			return ERR_PTR(-EIO);
 		}
 		dev_dbg(nbd_to_dev(nbd), "request %p: got %d bytes data\n",
-			cmd, bvec.bv_len);
+			req, bvec.bv_len);
 	} else {
 		/* See the comment in nbd_queue_rq. */
-- 
2.7.4



Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Linus Torvalds
On Mon, Jun 4, 2018 at 8:54 AM Jens Axboe  wrote:
>
> On 6/4/18 9:51 AM, Linus Torvalds wrote:
> >
> > Why the hell doesn't it just do bioset_exit() on the originals instead,
> > before freeing the mddev?
>
> CC'ing Neil to get his input on how best to clean that up, I'd
> be much more comfortable with that logic improved rather than
> just catering to the stack usage issue. Also include Arnd, as
> he had a test patch for it.

Also adding Tejun, because the only reason for that odd sequencing
seems to be that

 - we want to queue the deletion work *inside* the spinlock, because
it actually has synchronization between the workqueue and the
'all_mddevs_lock' spinlock.

 - we want to bioset_exit() the fields *outside* the spinlock.

So what it does now is to copy those big 'struct bio_set' fields
because they might go away from under it.

Tejun, the code in question is mddev_put() in drivers/md/md.c, and it
basically does

INIT_WORK(&mddev->del_work, mddev_delayed_delete);
queue_work(md_misc_wq, &mddev->del_work);

inside a spinlock, but then wants to do some stuff *outside* the
spinlock before that mddev_delayed_delete() thing actually gets
called.

Is there some way to "half-queue" the work - enough that a
flush_workqueue() will wait for it, but still guaranteeing that it
won't actually run until released?

IOW, what I think that code really wants to do is perhaps something like

/* under the all_mddevs_lock spinlock */
.. remove from all lists so that it can't be found ..

.. but add it to the md_misc_wq so that people will wait for it ..

INIT_WORK_LOCKED(&mddev->del_work, mddev_delayed_delete);
queue_work(md_misc_wq, &mddev->del_work);
...

spin_unlock(&all_mddevs_lock);

/* mddev is still guaranteed live */
bioset_exit(&mddev->bio_set);
bioset_exit(&mddev->sync_set);

/* NOW we can release it */
if (queued)
        unlock_work(&mddev->del_work);
else
        kfree(mddev);

or something like that?

The above is *ENTIRELY* hand-wavy garbage - don't take it seriously
per se, consider it just pseudo-code for documentation reasons, not
serious in any other way.

 Linus


Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Jens Axboe
On 6/4/18 9:51 AM, Linus Torvalds wrote:
> On Sun, Jun 3, 2018 at 5:42 PM Jens Axboe  wrote:
>>
>>  drivers/md/md.c |  61 +--
>>  drivers/md/md.h |   4 +-
> 
> So I've pulled this, but I get a new warning:
> 
>   drivers/md/md.c: In function ‘mddev_put’:
>   drivers/md/md.c:543:1: warning: the frame size of 2112 bytes is
> larger than 2048 bytes [-Wframe-larger-than=]
> 
> which seems to be due to commit afeee514ce7f ("md: convert to
> bioset_init()/mempool_init()").
> 
> As of that commit, mddev_put() now allocates *two* "struct bio_set"
> structures on the stack, and those really aren't small. Yes, part of
> it is that I do my test-builds with all that debug stuff, but those
> structures have several embedded mempool_t members, and they really
> are pretty sizable.
> 
> And I _really_ do not think it's acceptable to have that kind of stack
> usage in there. I'm not sure you can trigger mddev_put() in the IO
> paths, but even without that I don't think it's acceptable.
> 
> Also, the code simply looks like complete and utter garbage. It does
> 
> bs = mddev->bio_set;
> sync_bs = mddev->sync_set;
> 
> to _copy_ those structures, and then does bioset_exit() on them. WTF?
> 
> Why the hell doesn't it just do bioset_exit() on the originals instead,
> before freeing the mddev?
> 
> I've pulled this, but this needs to be fixed. That is just broken
> garbage right now. No way in hell is it acceptable to waste 2kB of
> stack space on something idiotic like this.

Work is already underway to get this fixed up. I agree that the
excessive stack usage isn't great, but the code in that function
is pretty miserable, as you say. That's not new with the patch,
the conversion just carried that forward.

CC'ing Neil to get his input on how best to clean that up, I'd
be much more comfortable with that logic improved rather than
just catering to the stack usage issue. Also include Arnd, as
he had a test patch for it.

-- 
Jens Axboe



Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Linus Torvalds
On Sun, Jun 3, 2018 at 5:42 PM Jens Axboe  wrote:
>
>  drivers/md/md.c |  61 +--
>  drivers/md/md.h |   4 +-

So I've pulled this, but I get a new warning:

  drivers/md/md.c: In function ‘mddev_put’:
  drivers/md/md.c:543:1: warning: the frame size of 2112 bytes is
larger than 2048 bytes [-Wframe-larger-than=]

which seems to be due to commit afeee514ce7f ("md: convert to
bioset_init()/mempool_init()").

As of that commit, mddev_put() now allocates *two* "struct bio_set"
structures on the stack, and those really aren't small. Yes, part of
it is that I do my test-builds with all that debug stuff, but those
structures have several embedded mempool_t members, and they really
are pretty sizable.

And I _really_ do not think it's acceptable to have that kind of stack
usage in there. I'm not sure you can trigger mddev_put() in the IO
paths, but even without that I don't think it's acceptable.

Also, the code simply looks like complete and utter garbage. It does

bs = mddev->bio_set;
sync_bs = mddev->sync_set;

to _copy_ those structures, and then does bioset_exit() on them. WTF?

Why the hell doesn't it just do bioset_exit() on the originals instead,
before freeing the mddev?

I've pulled this, but this needs to be fixed. That is just broken
garbage right now. No way in hell is it acceptable to waste 2kB of
stack space on something idiotic like this.

 Linus


[PATCH 3/3] block: use bio_add_page in bio_iov_iter_get_pages

2018-06-04 Thread Christoph Hellwig
Replace a nasty hack with a different nasty hack to prepare for multipage
bio_vecs.  By moving the temporary page array as far up as possible in
the space allocated for the bio_vec array we can iterate forward over it
and thus use bio_add_page.  Using bio_add_page means we'll be able to
merge physically contiguous pages once support for multipage bio_vecs is
merged.

Signed-off-by: Christoph Hellwig 
---
 block/bio.c | 45 +
 1 file changed, 21 insertions(+), 24 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index d95fab72acb5..b09c133a1e24 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -871,6 +871,8 @@ int bio_add_page(struct bio *bio, struct page *page,
 }
 EXPORT_SYMBOL(bio_add_page);
 
+#define PAGE_PTRS_PER_BVEC (sizeof(struct bio_vec) / sizeof(struct page *))
+
 /**
  * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
  * @bio: bio to add pages to
@@ -882,38 +884,33 @@ EXPORT_SYMBOL(bio_add_page);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
+   unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
struct page **pages = (struct page **)bv;
-   size_t offset, diff;
-   ssize_t size;
+   ssize_t size, left;
+   unsigned len, i;
+   size_t offset;
+
+   /*
+* Move page array up in the allocated memory for the bio vecs as
+* far as possible so that we can start filling biovecs from the
+* beginning without overwriting the temporary page array.
+*/
+   BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
+   pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
if (unlikely(size <= 0))
return size ? size : -EFAULT;
-   nr_pages = (size + offset + PAGE_SIZE - 1) / PAGE_SIZE;
 
-   /*
-* Deep magic below:  We need to walk the pinned pages backwards
-* because we are abusing the space allocated for the bio_vecs
-* for the page array.  Because the bio_vecs are larger than the
-* page pointers by definition this will always work.  But it also
-* means we can't use bio_add_page, so any changes to it's semantics
-* need to be reflected here as well.
-*/
-   bio->bi_iter.bi_size += size;
-   bio->bi_vcnt += nr_pages;
-
-   diff = (nr_pages * PAGE_SIZE - offset) - size;
-   while (nr_pages--) {
-   bv[nr_pages].bv_page = pages[nr_pages];
-   bv[nr_pages].bv_len = PAGE_SIZE;
-   bv[nr_pages].bv_offset = 0;
-   }
+   for (left = size, i = 0; left > 0; left -= len, i++) {
+   struct page *page = pages[i];
 
-   bv[0].bv_offset += offset;
-   bv[0].bv_len -= offset;
-   if (diff)
-   bv[bio->bi_vcnt - 1].bv_len -= diff;
+   len = min_t(size_t, PAGE_SIZE - offset, left);
+   if (WARN_ON_ONCE(bio_add_page(bio, page, len, offset) != len))
+   return -EINVAL;
+   offset = 0;
+   }
 
iov_iter_advance(iter, size);
return 0;
-- 
2.14.2
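
As an aside, the pointer-aliasing trick the patch relies on can be
demonstrated outside the kernel. The following standalone C sketch is purely
illustrative (struct names and sizes are invented stand-ins, not the kernel's
bio_vec and page): it parks the temporary page-pointer array at the tail of
the bvec allocation and fills bvecs forward, reading each pointer just before
its bytes may be overwritten:

```c
#include <assert.h>
#include <string.h>

/* Invented stand-ins; only the size relationship matters: a bvec is at
 * least two page pointers wide, which is what makes the trick safe. */
struct fake_page { int id; };
struct fake_bvec { struct fake_page *page; unsigned int len, offset; };

#define PAGE_PTRS_PER_BVEC (sizeof(struct fake_bvec) / sizeof(struct fake_page *))

static int demo(void)
{
	struct fake_page p[4] = { {1}, {2}, {3}, {4} };
	struct fake_page *tmp[4] = { &p[0], &p[1], &p[2], &p[3] };
	struct fake_bvec bv[4];
	/* park the temporary page array as far up the bvec allocation as
	 * possible, exactly like the patch does */
	unsigned char *tail = (unsigned char *)bv +
		4 * (PAGE_PTRS_PER_BVEC - 1) * sizeof(struct fake_page *);
	int i;

	assert(PAGE_PTRS_PER_BVEC >= 2);
	memcpy(tail, tmp, sizeof(tmp));	/* stands in for iov_iter_get_pages() */

	for (i = 0; i < 4; i++) {	/* fill forward, like bio_add_page() */
		struct fake_page *pg;

		/* read the pointer before the bvec we fill can clobber it;
		 * the write cursor never overtakes the unread pointers */
		memcpy(&pg, tail + i * sizeof(pg), sizeof(pg));
		bv[i].page = pg;
		bv[i].len = 4096;
		bv[i].offset = 0;
	}
	return bv[0].page->id == 1 && bv[3].page->id == 4;
}
```

Because each bvec consumes at least two pointers' worth of space while each
step consumes exactly one page pointer, filling from the front can never
overwrite a pointer that has not been read yet.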



[PATCH 2/3] block: bio_set_pages_dirty can't see NULL bv_page in a valid bio_vec

2018-06-04 Thread Christoph Hellwig
So don't bother handling it.

Signed-off-by: Christoph Hellwig 
---
 block/bio.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 50941c1c9118..d95fab72acb5 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1578,10 +1578,8 @@ void bio_set_pages_dirty(struct bio *bio)
int i;
 
bio_for_each_segment_all(bvec, bio, i) {
-   struct page *page = bvec->bv_page;
-
-   if (page && !PageCompound(page))
-   set_page_dirty_lock(page);
+   if (!PageCompound(bvec->bv_page))
+   set_page_dirty_lock(bvec->bv_page);
}
 }
 
-- 
2.14.2



a few cleanups to prepare for multipage bio_vecs

2018-06-04 Thread Christoph Hellwig
Clean up some bio_vec abuses to prepare for saner handling of multipage
bio_vecs.


[PATCH 1/3] block: simplify bio_check_pages_dirty

2018-06-04 Thread Christoph Hellwig
bio_check_pages_dirty currently violates the invariant that bv_page of
a bio_vec inside bi_vcnt shouldn't be zero, and that is going to become
really annoying with multipage biovecs.  Fortunately there isn't any
good reason for it - once we decide to defer freeing the bio
to a workqueue, holding onto a few additional pages isn't really an
issue anymore.  So just check if there is a clean page that needs
dirtying in the first pass, and do a second pass to free them if there
was none, while the cache is still hot.

Also use the chance to micro-optimize bio_dirty_fn a bit by not saving
irq state - we know we are called from a workqueue.

Signed-off-by: Christoph Hellwig 
---
 block/bio.c | 56 +---
 1 file changed, 21 insertions(+), 35 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 53e0f0a1ed94..50941c1c9118 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1590,19 +1590,15 @@ static void bio_release_pages(struct bio *bio)
struct bio_vec *bvec;
int i;
 
-   bio_for_each_segment_all(bvec, bio, i) {
-   struct page *page = bvec->bv_page;
-
-   if (page)
-   put_page(page);
-   }
+   bio_for_each_segment_all(bvec, bio, i)
+   put_page(bvec->bv_page);
 }
 
 /*
  * bio_check_pages_dirty() will check that all the BIO's pages are still dirty.
  * If they are, then fine.  If, however, some pages are clean then they must
  * have been written out during the direct-IO read.  So we take another ref on
- * the BIO and the offending pages and re-dirty the pages in process context.
+ * the BIO and re-dirty the pages in process context.
  *
  * It is expected that bio_check_pages_dirty() will wholly own the BIO from
  * here on.  It will run one put_page() against each page and will run one
@@ -1620,52 +1616,42 @@ static struct bio *bio_dirty_list;
  */
 static void bio_dirty_fn(struct work_struct *work)
 {
-   unsigned long flags;
-   struct bio *bio;
+   struct bio *bio, *next;
 
-   spin_lock_irqsave(&bio_dirty_lock, flags);
-   bio = bio_dirty_list;
+   spin_lock_irq(&bio_dirty_lock);
+   next = bio_dirty_list;
bio_dirty_list = NULL;
-   spin_unlock_irqrestore(&bio_dirty_lock, flags);
+   spin_unlock_irq(&bio_dirty_lock);
 
-   while (bio) {
-   struct bio *next = bio->bi_private;
+   while ((bio = next) != NULL) {
+   next = bio->bi_private;
 
bio_set_pages_dirty(bio);
bio_release_pages(bio);
bio_put(bio);
-   bio = next;
}
 }
 
 void bio_check_pages_dirty(struct bio *bio)
 {
struct bio_vec *bvec;
-   int nr_clean_pages = 0;
+   unsigned long flags;
int i;
 
bio_for_each_segment_all(bvec, bio, i) {
-   struct page *page = bvec->bv_page;
-
-   if (PageDirty(page) || PageCompound(page)) {
-   put_page(page);
-   bvec->bv_page = NULL;
-   } else {
-   nr_clean_pages++;
-   }
+   if (!PageDirty(bvec->bv_page) && !PageCompound(bvec->bv_page))
+   goto defer;
}
 
-   if (nr_clean_pages) {
-   unsigned long flags;
-
-   spin_lock_irqsave(&bio_dirty_lock, flags);
-   bio->bi_private = bio_dirty_list;
-   bio_dirty_list = bio;
-   spin_unlock_irqrestore(&bio_dirty_lock, flags);
-   schedule_work(&bio_dirty_work);
-   } else {
-   bio_put(bio);
-   }
+   bio_release_pages(bio);
+   bio_put(bio);
+   return;
+defer:
+   spin_lock_irqsave(&bio_dirty_lock, flags);
+   bio->bi_private = bio_dirty_list;
+   bio_dirty_list = bio;
+   spin_unlock_irqrestore(&bio_dirty_lock, flags);
+   schedule_work(&bio_dirty_work);
 }
 
 void generic_start_io_acct(struct request_queue *q, int rw,
-- 
2.14.2
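
The decision logic of the rework can be restated as a tiny standalone helper,
purely illustrative (page state is reduced to a dirty flag; the PageCompound
check and the actual deferral list are omitted):

```c
#include <assert.h>

/* Returns 1 when the bio would be deferred to process context (at least
 * one page is clean and must be re-dirtied), 0 when everything can be
 * released inline while the cache is still hot. */
static int would_defer(const int *page_dirty, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (!page_dirty[i])
			return 1;	/* a clean page: defer the whole bio */
	return 0;			/* all dirty: put pages and bio now */
}

static int demo(void)
{
	int all_dirty[3] = { 1, 1, 1 };
	int one_clean[3] = { 1, 0, 1 };

	return would_defer(all_dirty, 3) == 0 &&
	       would_defer(one_clean, 3) == 1;
}
```

Note the all-or-nothing shape: unlike the old code, no per-page NULLing is
needed, which is what preserves the bv_page invariant.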



Re: [PATCH 00/24] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)

2018-06-04 Thread Danil Kipnis
Hi Doug,

thanks for the feedback. You read the cover letter correctly: our
transport library implements multipath (load balancing and failover)
on top of RDMA API. Its name "IBTRS" is slightly misleading in that
regard: it can sit on top of ROCE as well. The library allows for
"bundling" multiple rdma "paths" (source addr - destination addr pair)
into one "session". So our session consists of one or more paths and
each path under the hood consists of as many QPs (each connecting
source with destination) as there are CPUs on the client system. The
user load (in our case IBNBD is a block device and generates some
block requests) is load-balanced on a per-CPU basis.
I understand this is something very different from what smc-r is doing.
Am I right? Do you know what stage MP-RDMA development is currently at?

Best,

Danil Kipnis.

P.S. Sorry for the duplicate if any, first mail was returned cause of html.

On Thu, Feb 8, 2018 at 7:10 PM Bart Van Assche  wrote:
>
> On Thu, 2018-02-08 at 18:38 +0100, Danil Kipnis wrote:
> > thanks for the link to the article. To the best of my understanding,
> > the guys suggest to authenticate the devices first and only then
> > authenticate the users who use the devices in order to get access to a
> > corporate service. They also mention in the presentation the current
> > trend of moving corporate services into the cloud. But I think this is
> > not about the devices from which that cloud is build of. Isn't a cloud
> > first build out of devices connected via IB and then users (and their
> > devices) are provided access to the services of that cloud as a whole?
> > If a malicious user already plugged his device into an IB switch of a
> > cloud internal infrastructure, isn't it game over anyway? Can't he
> > just take the hard drives instead of mapping them?
>
> Hello Danil,
>
> It seems like we each have been focussing on different aspects of the article.
> The reason I referred to that article is because I read the following in
> that article: "Unlike the conventional perimeter security model, BeyondCorp
> doesn’t gate access to services and tools based on a user’s physical location
> or the originating network [ ... ] The zero trust architecture spells trouble
> for traditional attacks that rely on penetrating a tough perimeter to waltz
> freely within an open internal network." Suppose e.g. that an organization
> decides to use RoCE or iWARP for connectivity between block storage initiator
> systems and block storage target systems and that it has a single company-
> wide Ethernet network. If the target system does not restrict access based
> on initiator IP address then any penetrator would be able to access all the
> block devices exported by the target after a SoftRoCE or SoftiWARP initiator
> driver has been loaded. If the target system however restricts access based
> on the initiator IP address then that would make it harder for a penetrator
> to access the exported block storage devices. Instead of just penetrating the
> network access, IP address spoofing would have to be used or access would
> have to be obtained to a system that has been granted access to the target
> system.
>
> Thanks,
>
> Bart.
>
>


-- 
Danil Kipnis
Linux Kernel Developer


Re: [PATCH] Revert "block: Add warning for bi_next not NULL in bio_endio()"

2018-06-04 Thread Bart Van Assche
On Tue, 2018-05-22 at 22:16 -0400, Kent Overstreet wrote:
> May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: fail to get serial
> May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: using deprecated getuid callout
> May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: /.../srp-test/bin/getuid_callout exited with 255

(back from traveling)

Hello Kent,

Please comment out the getuid_callout line from /etc/multipath.conf and try again.
Changing the "/..." part into the path of the srp-test software should also help.
That line is a left-over from the time when the NVMe initiator drivers supported
dm-multipath. I will update the srp-test README.md to make this more clear.

Thanks,

Bart.




Re: [PATCH 12/12] nvme: limit warnings from nvme_identify_ns

2018-06-04 Thread Hannes Reinecke
On Wed, 30 May 2018 18:46:00 +0200
Christoph Hellwig  wrote:

> When rescanning namespaces after an AEN we will issue Identify
> Namespace comands to namespaces that have gone away, so don't warn
> for this specific case.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/nvme/host/core.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes



Re: [PATCH 10/12] nvme: mark nvme_queue_scan static

2018-06-04 Thread Hannes Reinecke
On Wed, 30 May 2018 18:45:58 +0200
Christoph Hellwig  wrote:

> And move it toward the top of the file to avoid a forward declaration.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/nvme/host/core.c | 19 +--
>  drivers/nvme/host/nvme.h |  1 -
>  2 files changed, 9 insertions(+), 11 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes



Re: [PATCH 11/12] nvme: use the changed namespaces list log to clear ns data changed AENs

2018-06-04 Thread Hannes Reinecke
On Wed, 30 May 2018 18:45:59 +0200
Christoph Hellwig  wrote:

> Per section 5.2 we need to issue the corresponding log page to clear
> an AEN, so for a namespace data changed AEN we need to read the
> changed namespace list log.  And once we read that log anyway we
> might as well use it to optimize the rescan.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/nvme/host/core.c | 51
>  drivers/nvme/host/nvme.h |  2 ++
> 2 files changed, 49 insertions(+), 4 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes



Re: [PATCH 08/12] nvmet: mask pending AERs

2018-06-04 Thread Hannes Reinecke
On Wed, 30 May 2018 18:45:56 +0200
Christoph Hellwig  wrote:

> Per section 5.2 of the NVMe 1.3 spec:
> 
>   "When the controller posts a completion queue entry for an
> outstanding Asynchronous Event Request command and thus reports an
> asynchronous event, subsequent events of that event type are
> automatically masked by the controller until the host clears that
> event. An event is cleared by reading the log page associated with
> that event using the Get Log Page command (see section 5.14)."
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/nvme/target/admin-cmd.c | 1 +
>  drivers/nvme/target/core.c  | 9 -
>  drivers/nvme/target/nvmet.h | 1 +
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 
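
[Editor's note: the masking behavior the spec quote describes boils down to a
per-event-type latch: once an event type has been reported, further events of
that type are suppressed until the host reads the associated log page. A
stand-alone sketch under assumed names — `struct ctrl`, `aen_report`, and
`aen_clear` are illustrative, not the nvmet functions the patch adds:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a controller's AEN bookkeeping. */
struct ctrl {
    uint32_t aen_masked;        /* one bit per async event type */
};

/*
 * Returns true if an event of @type may be reported now; if so, mask
 * further events of that type until the host clears it.
 */
static bool aen_report(struct ctrl *c, unsigned type)
{
    if (c->aen_masked & (1u << type))
        return false;           /* already reported, not yet cleared */
    c->aen_masked |= 1u << type;
    return true;
}

/*
 * Called from the Get Log Page handler for the log associated with
 * @type: reading the page clears the event per NVMe 1.3 section 5.2.
 */
static void aen_clear(struct ctrl *c, unsigned type)
{
    c->aen_masked &= ~(1u << type);
}
```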
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes



Re: [PATCH 06/12] nvmet: implement the changed namespaces log

2018-06-04 Thread Hannes Reinecke
On Wed, 30 May 2018 18:45:54 +0200
Christoph Hellwig  wrote:

> Just keep a per-controller buffer of changed namespaces and copy it
> out in the get log page implementation.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/nvme/target/admin-cmd.c | 26 +
>  drivers/nvme/target/core.c  | 50 +++--
>  drivers/nvme/target/nvmet.h |  3 ++
>  3 files changed, 70 insertions(+), 9 deletions(-)
> 
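
[Editor's note: the per-controller buffer has to cope with overflow, since the
changed-namespace-list log holds at most 1024 entries; when it would overflow,
the list collapses to the single 0xffffffff entry that tells the host to do a
full rescan. A sketch under assumed names — `struct ctrl` and `ns_changed` are
illustrative stand-ins, not the nvmet helpers the patch adds:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANGED_NS 1024

/* Hypothetical stand-in for the per-controller buffer. */
struct ctrl {
    uint32_t changed_ns[MAX_CHANGED_NS];
    size_t   nr_changed_ns;
};

/*
 * Record a changed NSID for the next changed-namespace-list log read.
 * Once the list would overflow, collapse it to the single 0xffffffff
 * entry defined by NVMe 1.3 section 5.14.1.4.
 */
static void ns_changed(struct ctrl *c, uint32_t nsid)
{
    if (c->nr_changed_ns == 1 && c->changed_ns[0] == 0xffffffffu)
        return;                 /* already overflowed */
    if (c->nr_changed_ns == MAX_CHANGED_NS) {
        c->changed_ns[0] = 0xffffffffu;
        c->nr_changed_ns = 1;
        return;
    }
    c->changed_ns[c->nr_changed_ns++] = nsid;
}
```

The Get Log Page handler then copies this buffer out and resets the count,
which is what clears the pending AEN from the host's point of view.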
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes



Re: [PATCH 05/12] nvmet: split log page implementation

2018-06-04 Thread Hannes Reinecke
On Wed, 30 May 2018 18:45:53 +0200
Christoph Hellwig  wrote:

> Remove the common code to allocate a buffer and copy it into the SGL.
> Instead the two no-op implementations just zero the SGL directly, and
> the smart log allocates a buffer on its own.  This prepares for the
> more elaborate ANA log page.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/nvme/target/admin-cmd.c | 99 -
>  1 file changed, 36 insertions(+), 63 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes



Re: [PATCH 04/12] nvmet: add a new nvmet_zero_sgl helper

2018-06-04 Thread Hannes Reinecke
On Wed, 30 May 2018 18:45:52 +0200
Christoph Hellwig  wrote:

> Zeroes the SGL in the payload.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/nvme/target/core.c  | 7 +++
>  drivers/nvme/target/nvmet.h | 1 +
>  2 files changed, 8 insertions(+)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes



Re: [PATCH 01/12] nvme.h: untangle AEN notice definitions

2018-06-04 Thread Hannes Reinecke
On Wed, 30 May 2018 18:45:49 +0200
Christoph Hellwig  wrote:

> Stop including the event type in the definitions for the notice type.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/nvme/host/core.c | 30 ++
>  include/linux/nvme.h |  8 ++--
>  2 files changed, 24 insertions(+), 14 deletions(-)
> 
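
[Editor's note: the untangling matters because the AEN completion result packs
several fields into one dword — per NVMe 1.3, bits 2:0 carry the event type,
bits 15:8 the event information (the notice type), and bits 23:16 the
associated log page identifier. With the notice values no longer or'ed
together with the event type, each field can be masked out independently.
A sketch of the decoding; macro names are illustrative:]

```c
#include <assert.h>
#include <stdint.h>

/*
 * AEN completion result layout (NVMe 1.3):
 *   bits  2:0   asynchronous event type
 *   bits 15:8   event information (e.g. notice type)
 *   bits 23:16  associated log page identifier
 */
#define AEN_TYPE(result)     ((result) & 0x7u)
#define AEN_INFO(result)     (((result) >> 8) & 0xffu)
#define AEN_LOG_PAGE(result) (((result) >> 16) & 0xffu)

enum {
    AER_TYPE_NOTICE       = 2,    /* event type */
    AER_NOTICE_NS_CHANGED = 0x00, /* notice type, defined on its own
                                     rather than or'ed with the type */
    LOG_CHANGED_NS        = 0x04, /* changed namespace list log page */
};
```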

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes



Re: [PATCH 02/12] nvme.h: add the changed namespace list log

2018-06-04 Thread Hannes Reinecke
On Wed, 30 May 2018 18:45:50 +0200
Christoph Hellwig  wrote:

> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  include/linux/nvme.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes