Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-02 Thread Ming Lei
On Fri, Nov 03, 2017 at 02:42:50AM +, Bart Van Assche wrote:
> On Fri, 2017-11-03 at 10:12 +0800, Ming Lei wrote:
> > [root@ibclient srp-test]# ./run_tests
> > modprobe: FATAL: Module target_core_mod is in use.
> 
> LIO must be unloaded before srp-test software is started.

Yeah, I can make this test kick off after running
'modprobe -fr ib_isert' first, but it looks like all tests fail without
any hang. Could you check whether that is the expected result?

https://pastebin.com/VXe66Jpg

Also, could you provide some output from debugfs when this hang happens,
such as:

dispatch/busy/.. under hctxN

-- 
Ming


Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-02 Thread Bart Van Assche
On Fri, 2017-11-03 at 10:12 +0800, Ming Lei wrote:
> [root@ibclient srp-test]# ./run_tests
> modprobe: FATAL: Module target_core_mod is in use.

LIO must be unloaded before srp-test software is started.

Bart.

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-02 Thread Ming Lei
Hi Laurence,

On Thu, Nov 02, 2017 at 09:16:04PM -0400, Laurence Oberman wrote:
> Hi Ming
> I have used Bart's tests on my test bed; they should run fine.
> Are you using my SRP setup?

Yeah, I am using your SRP setup.

Once the three directories are created, I see the new failure below;
both ib/srp and dm-mpath have been working well so far.

Laurence and Bart, do you have a recipe for getting the test
to run first?

[root@ibclient srp-test]# ./run_tests
modprobe: FATAL: Module target_core_mod is in use.
[root@ibclient srp-test]# lsmod | grep target
iscsi_target_mod  294912  1 ib_isert
target_core_mod   352256  2 iscsi_target_mod,ib_isert
[root@ibclient srp-test]# lsblk | head -n 10
NAME   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdy  65:128  0   3.9G  0 disk
└─36001405b2b5c6c24c084b6fa4d55da2f 253:16   0   3.9G  0 mpath
sdam 66:96   0   3.9G  0 disk
└─360014055f563ebf2abc4609a00a16794 253:11   0   3.9G  0 mpath
sdbb 67:80   0   3.9G  0 disk
└─36001405f58163b828714988bd2e88358 253:17   0   3.9G  0 mpath
sdf   8:80   0   3.9G  0 disk
└─36001405fb1402260c1346688f23d07e0 253:7    0   3.9G  0 mpath
sdau 66:224  0   3.9G  0 disk

Thanks,
Ming
> 
> Thanks
> Laurence
> 
> 
> On Nov 2, 2017 8:48 PM, "Bart Van Assche"  wrote:
> 
> > On Fri, 2017-11-03 at 08:15 +0800, Ming Lei wrote:
> > > On Thu, Nov 02, 2017 at 11:54:57PM +, Bart Van Assche wrote:
> > > > On Fri, 2017-11-03 at 07:48 +0800, Ming Lei wrote:
> > > > > Could you please share your srp-tests script? I may find an IB/SRP system
> > > > > to see if I can reproduce this issue and figure out a solution.
> > > >
> > > > Please have a look at https://github.com/bvanassche/srp-test.
> > >
> > > Could you provide me the exact command line for the reproduction?
> > >
> > > I simply run ./run_tests, and it looks like it doesn't work.
> > >
> > > [root@ srp-test]# ./run_tests
> > > df: ‘/home/ming/git/srp-test/mnt2’: No such file or directory
> > > df: ‘/home/ming/git/srp-test/mnt2’: No such file or directory
> > > Unmounting /home/ming/git/srp-test/mnt2 from
> > > umount: /home/ming/git/srp-test/mnt2: mountpoint not found
> > > df: ‘/home/ming/git/srp-test/mnt2’: No such file or directory
> > > Error: unmounting /home/ming/git/srp-test/mnt2 failed
> >
> > This means that I have to add the following to the README document:
> >
> > Run "mkdir mnt1 mnt2 mnt3" before starting the tests.
> >
> > Bart.



Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-02 Thread Bart Van Assche
On Fri, 2017-11-03 at 08:15 +0800, Ming Lei wrote:
> On Thu, Nov 02, 2017 at 11:54:57PM +, Bart Van Assche wrote:
> > On Fri, 2017-11-03 at 07:48 +0800, Ming Lei wrote:
> > > Could you please share your srp-tests script? I may find an IB/SRP system
> > > to see if I can reproduce this issue and figure out a solution.
> > 
> > Please have a look at https://github.com/bvanassche/srp-test.
> 
> Could you provide me the exact command line for the reproduction?
> 
> I simply run ./run_tests, and it looks like it doesn't work.
> 
> [root@ srp-test]# ./run_tests
> df: ‘/home/ming/git/srp-test/mnt2’: No such file or directory
> df: ‘/home/ming/git/srp-test/mnt2’: No such file or directory
> Unmounting /home/ming/git/srp-test/mnt2 from
> umount: /home/ming/git/srp-test/mnt2: mountpoint not found
> df: ‘/home/ming/git/srp-test/mnt2’: No such file or directory
> Error: unmounting /home/ming/git/srp-test/mnt2 failed

This means that I have to add the following to the README document:

Run "mkdir mnt1 mnt2 mnt3" before starting the tests.

Bart.

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-02 Thread Ming Lei
On Thu, Nov 02, 2017 at 11:54:57PM +, Bart Van Assche wrote:
> On Fri, 2017-11-03 at 07:48 +0800, Ming Lei wrote:
> > Could you please share your srp-tests script? I may find an IB/SRP system
> > to see if I can reproduce this issue and figure out a solution.
> 
> Please have a look at https://github.com/bvanassche/srp-test.

Could you provide me the exact command line for the reproduction?

I simply run ./run_tests, and it looks like it doesn't work.

[root@ srp-test]# ./run_tests
df: ‘/home/ming/git/srp-test/mnt2’: No such file or directory
df: ‘/home/ming/git/srp-test/mnt2’: No such file or directory
Unmounting /home/ming/git/srp-test/mnt2 from
umount: /home/ming/git/srp-test/mnt2: mountpoint not found
df: ‘/home/ming/git/srp-test/mnt2’: No such file or directory
Error: unmounting /home/ming/git/srp-test/mnt2 failed


Thanks,
Ming


Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-02 Thread Bart Van Assche
On Fri, 2017-11-03 at 07:48 +0800, Ming Lei wrote:
> Could you please share your srp-tests script? I may find an IB/SRP system
> to see if I can reproduce this issue and figure out a solution.

Please have a look at https://github.com/bvanassche/srp-test.

Bart.

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-02 Thread Ming Lei
On Thu, Nov 02, 2017 at 11:43:55PM +, Bart Van Assche wrote:
> On Fri, 2017-11-03 at 07:38 +0800, Ming Lei wrote:
> > On Thu, Nov 02, 2017 at 03:57:05PM +, Bart Van Assche wrote:
> > > On Wed, 2017-11-01 at 08:21 -0600, Jens Axboe wrote:
> > > > Fixed that up, and applied these two patches as well.
> > > 
> > > Hello Jens,
> > > 
> > > Recently I noticed that a test system sporadically hangs during boot (Dell
> > > PowerEdge R720 that boots from a hard disk connected to a MegaRAID SAS
> > > adapter) and also that srp-tests systematically hangs. Reverting the two
> > > patches from this series fixes both issues. I'm not sure there is another
> > > solution than reverting the two patches from this series.
> > 
> > Then we need to find the root cause, instead of resorting to this kind of
> > workaround as before.
> > 
> > For SCSI, the restart is always run from scsi_end_request(), and this
> > kind of restart from all hctx isn't necessary at all.
> 
> Yes, we need to find the root cause, but only after these two patches have
> either been reverted or been removed from the for-4.15 queue. Otherwise, if a
> bisect lands between these two patches and their eventual fix, that would be
> very frustrating for the person running the bisect.

Could you please share your srp-tests script? I may find an IB/SRP system
to see if I can reproduce this issue and figure out a solution.

-- 
Ming


Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-02 Thread Bart Van Assche
On Fri, 2017-11-03 at 07:38 +0800, Ming Lei wrote:
> On Thu, Nov 02, 2017 at 03:57:05PM +, Bart Van Assche wrote:
> > On Wed, 2017-11-01 at 08:21 -0600, Jens Axboe wrote:
> > > Fixed that up, and applied these two patches as well.
> > 
> > Hello Jens,
> > 
> > Recently I noticed that a test system sporadically hangs during boot (Dell
> > PowerEdge R720 that boots from a hard disk connected to a MegaRAID SAS
> > adapter) and also that srp-tests systematically hangs. Reverting the two
> > patches from this series fixes both issues. I'm not sure there is another
> > solution than reverting the two patches from this series.
> 
> Then we need to find the root cause, instead of resorting to this kind of
> workaround as before.
> 
> For SCSI, the restart is always run from scsi_end_request(), and this
> kind of restart from all hctx isn't necessary at all.

Yes, we need to find the root cause, but only after these two patches have
either been reverted or been removed from the for-4.15 queue. Otherwise, if a
bisect lands between these two patches and their eventual fix, that would be
very frustrating for the person running the bisect.

Bart.

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-02 Thread Ming Lei
On Thu, Nov 02, 2017 at 03:57:05PM +, Bart Van Assche wrote:
> On Wed, 2017-11-01 at 08:21 -0600, Jens Axboe wrote:
> > Fixed that up, and applied these two patches as well.
> 
> Hello Jens,
> 
> Recently I noticed that a test system sporadically hangs during boot (Dell
> PowerEdge R720 that boots from a hard disk connected to a MegaRAID SAS
> adapter) and also that srp-tests systematically hangs. Reverting the two
> patches from this series fixes both issues. I'm not sure there is another
> solution than reverting the two patches from this series.

Then we need to find the root cause, instead of resorting to this kind of
workaround as before.

For SCSI, the restart is always run from scsi_end_request(), and this
kind of restart from all hctx isn't necessary at all.
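To illustrate the point, here is a rough userspace sketch of the restart-from-completion pattern (a toy model, not the kernel code; all names are invented): a queue that runs out of device budget parks itself, and each completion frees budget and re-runs only that queue, so kicking every hctx is unnecessary.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of restart-from-completion: when a dispatch finds no budget
 * the queue marks itself restart-pending; the completion path (the role
 * scsi_end_request() plays in the real code) frees one budget unit and
 * re-runs just this queue. */
struct hw_queue {
    int budget;          /* device slots still available */
    bool restart_needed; /* set when a dispatch found no budget */
    int runs;            /* how often the queue was (re)run */
};

static bool dispatch(struct hw_queue *q)
{
    q->runs++;
    if (q->budget == 0) {
        q->restart_needed = true;   /* park until a completion arrives */
        return false;
    }
    q->budget--;
    return true;
}

static void complete_one(struct hw_queue *q)
{
    q->budget++;
    if (q->restart_needed) {
        q->restart_needed = false;
        dispatch(q);                /* restart only this queue */
    }
}
```

The point of the model: no broadcast restart across all queues is needed, because the completion knows exactly which queue was starved.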

> 
> Bart.
> 
> 
> BTW, the following appeared in the kernel log when I tried to run srp-tests
> against a kernel with the two patches from this series applied:
> 
> INFO: task kworker/19:1:209 blocked for more than 480 seconds.
>   Tainted: GW   4.14.0-rc7-dbg+ #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/19:1D0   209  2 0x8000
> Workqueue: srp_remove srp_remove_work [ib_srp]
> Call Trace:
>  __schedule+0x2fa/0xbb0
>  schedule+0x36/0x90
>  async_synchronize_cookie_domain+0x88/0x130
>  ? finish_wait+0x90/0x90
>  async_synchronize_full_domain+0x18/0x20
>  sd_remove+0x4d/0xc0 [sd_mod]
>  device_release_driver_internal+0x160/0x210
>  device_release_driver+0x12/0x20
>  bus_remove_device+0x100/0x180
>  device_del+0x1d8/0x340
>  __scsi_remove_device+0xfc/0x130
>  scsi_forget_host+0x25/0x70
>  scsi_remove_host+0x79/0x120
>  srp_remove_work+0x90/0x1d0 [ib_srp]
>  process_one_work+0x20a/0x660
>  worker_thread+0x3d/0x3b0
>  kthread+0x13a/0x150
>  ? process_one_work+0x660/0x660
>  ? kthread_create_on_node+0x40/0x40
>  ret_from_fork+0x27/0x40
> 
> Showing all locks held in the system:
> 1 lock held by khungtaskd/170:
>  #0:  (tasklist_lock){.+.+}, at: [] debug_show_all_locks+0x3d/0x1a0
> 4 locks held by kworker/19:1/209:
>  #0:  ("%s"("srp_remove")){+.+.}, at: [] process_one_work+0x195/0x660
>  #1:  ((>remove_work)){+.+.}, at: [] process_one_work+0x195/0x660
>  #2:  (>scan_mutex){+.+.}, at: [] scsi_remove_host+0x1f/0x120
>  #3:  (>mutex){}, at: [] device_release_driver_internal+0x39/0x210
> 2 locks held by kworker/u66:0/1927:
>  #0:  ("events_unbound"){+.+.}, at: [] process_one_work+0x195/0x660
>  #1:  ((>work)){+.+.}, at: [] process_one_work+0x195/0x660
> 2 locks held by kworker/5:0/2047:
>  #0:  ("kaluad"){+.+.}, at: [] process_one_work+0x195/0x660
>  #1:  ((&(>rtpg_work)->work)){+.+.}, at: [] process_one_work+0x195/0x660
> 
> =

[PATCH 5/5] nvme: also expose the namespace identification sysfs files for mpath nodes

2017-11-02 Thread Christoph Hellwig
We do this by adding a helper that returns the ns_head for a device that
can belong to either the per-controller or per-subsystem block device
nodes, and otherwise reusing all the existing code.
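For illustration, the dispatch trick the helper uses (telling the two node types apart by their fops pointer) can be sketched in plain userspace C. The types and names below are stand-ins, not the real nvme structures; only the fops-pointer comparison mirrors the patch.

```c
#include <assert.h>

/* Hypothetical userspace model of the dev_to_ns_head() idea: one sysfs
 * handler serves both per-controller and per-subsystem block nodes, and
 * the fops pointer identifies which kind of gendisk it was handed. */
struct ns_head { int ns_id; };

struct file_ops { int dummy; };
static const struct file_ops ctrl_fops;   /* per-controller node ops */
static const struct file_ops mpath_fops;  /* per-subsystem node ops */

struct ns { struct ns_head *head; };

struct disk {
    const struct file_ops *fops;
    void *private_data;  /* ns for controller nodes, ns_head for mpath nodes */
};

static struct ns_head *disk_to_ns_head(struct disk *disk)
{
    if (disk->fops == &ctrl_fops)
        return ((struct ns *)disk->private_data)->head;
    return disk->private_data;  /* mpath node stores the head directly */
}
```

Both node flavors then share one set of attribute-show functions, which is exactly what lets the patch delete the per-kind duplicates.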

Signed-off-by: Christoph Hellwig 
Reviewed-by: Keith Busch 
Reviewed-by: Sagi Grimberg 
Reviewed-by: Johannes Thumshirn 
---
 drivers/nvme/host/core.c  | 57 +++
 drivers/nvme/host/multipath.c |  6 +
 drivers/nvme/host/nvme.h  |  1 +
 3 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 09f0e826b426..9209b8bca64b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2258,12 +2258,22 @@ static ssize_t nvme_sysfs_rescan(struct device *dev,
 }
 static DEVICE_ATTR(rescan_controller, S_IWUSR, NULL, nvme_sysfs_rescan);
 
+static inline struct nvme_ns_head *dev_to_ns_head(struct device *dev)
+{
+   struct gendisk *disk = dev_to_disk(dev);
+
+   if (disk->fops == _fops)
+   return nvme_get_ns_from_dev(dev)->head;
+   else
+   return disk->private_data;
+}
+
 static ssize_t wwid_show(struct device *dev, struct device_attribute *attr,
-   char *buf)
+   char *buf)
 {
-   struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-   struct nvme_ns_ids *ids = >head->ids;
-   struct nvme_subsystem *subsys = ns->ctrl->subsys;
+   struct nvme_ns_head *head = dev_to_ns_head(dev);
+   struct nvme_ns_ids *ids = >ids;
+   struct nvme_subsystem *subsys = head->subsys;
int serial_len = sizeof(subsys->serial);
int model_len = sizeof(subsys->model);
 
@@ -2285,23 +2295,21 @@ static ssize_t wwid_show(struct device *dev, struct 
device_attribute *attr,
 
return sprintf(buf, "nvme.%04x-%*phN-%*phN-%08x\n", subsys->vendor_id,
serial_len, subsys->serial, model_len, subsys->model,
-   ns->head->ns_id);
+   head->ns_id);
 }
 static DEVICE_ATTR(wwid, S_IRUGO, wwid_show, NULL);
 
 static ssize_t nguid_show(struct device *dev, struct device_attribute *attr,
- char *buf)
+   char *buf)
 {
-   struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-   return sprintf(buf, "%pU\n", ns->head->ids.nguid);
+   return sprintf(buf, "%pU\n", dev_to_ns_head(dev)->ids.nguid);
 }
 static DEVICE_ATTR(nguid, S_IRUGO, nguid_show, NULL);
 
 static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
-   char *buf)
+   char *buf)
 {
-   struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-   struct nvme_ns_ids *ids = >head->ids;
+   struct nvme_ns_ids *ids = _to_ns_head(dev)->ids;
 
/* For backward compatibility expose the NGUID to userspace if
 * we have no UUID set
@@ -2316,22 +2324,20 @@ static ssize_t uuid_show(struct device *dev, struct 
device_attribute *attr,
 static DEVICE_ATTR(uuid, S_IRUGO, uuid_show, NULL);
 
 static ssize_t eui_show(struct device *dev, struct device_attribute *attr,
-   char *buf)
+   char *buf)
 {
-   struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-   return sprintf(buf, "%8phd\n", ns->head->ids.eui64);
+   return sprintf(buf, "%8phd\n", dev_to_ns_head(dev)->ids.eui64);
 }
 static DEVICE_ATTR(eui, S_IRUGO, eui_show, NULL);
 
 static ssize_t nsid_show(struct device *dev, struct device_attribute *attr,
-   char *buf)
+   char *buf)
 {
-   struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-   return sprintf(buf, "%d\n", ns->head->ns_id);
+   return sprintf(buf, "%d\n", dev_to_ns_head(dev)->ns_id);
 }
 static DEVICE_ATTR(nsid, S_IRUGO, nsid_show, NULL);
 
-static struct attribute *nvme_ns_attrs[] = {
+static struct attribute *nvme_ns_id_attrs[] = {
_attr_wwid.attr,
_attr_uuid.attr,
_attr_nguid.attr,
@@ -2340,12 +2346,11 @@ static struct attribute *nvme_ns_attrs[] = {
NULL,
 };
 
-static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
+static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
struct attribute *a, int n)
 {
struct device *dev = container_of(kobj, struct device, kobj);
-   struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-   struct nvme_ns_ids *ids = >head->ids;
+   struct nvme_ns_ids *ids = _to_ns_head(dev)->ids;
 
if (a == _attr_uuid.attr) {
if (uuid_is_null(>uuid) ||
@@ -2363,9 +2368,9 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject 
*kobj,
return a->mode;
 }
 
-static const struct attribute_group nvme_ns_attr_group = {
-   .attrs  = nvme_ns_attrs,
-   .is_visible = 

[PATCH 7/7] block: add a poll_fn callback to struct request_queue

2017-11-02 Thread Christoph Hellwig
That way we can also poll non-blk-mq queues.  Mostly needed for
the NVMe multipath code, but could also be useful elsewhere.
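As a rough userspace sketch (hypothetical types, not the kernel API), the indirection introduced here looks like this: the generic entry point only validates the cookie and the presence of a registered callback, then delegates to whatever poll_fn the queue owner installed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the poll_fn indirection: non-polling queues simply never
 * set poll_fn, so the generic blk_poll()-style entry point stays cheap. */
typedef unsigned int qc_t;
#define QC_INVALID ((qc_t)-1)

struct request_queue {
    bool (*poll_fn)(struct request_queue *q, qc_t cookie);
};

static bool qc_valid(qc_t c) { return c != QC_INVALID; }

static bool poll_queue(struct request_queue *q, qc_t cookie)
{
    if (!q->poll_fn || !qc_valid(cookie))
        return false;
    return q->poll_fn(q, cookie);  /* delegate to the owner's callback */
}

/* Example callback a blk-mq-style driver might register. */
static bool mq_poll(struct request_queue *q, qc_t cookie)
{
    (void)q;
    return cookie % 2 == 0;  /* pretend even cookies have completed */
}
```

This mirrors the shape of the patch: the blk-mq-specific checks move into the callback, and any queue type (such as the NVMe multipath node) can participate by registering its own poll_fn.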

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
---
 block/blk-core.c | 11 +++
 block/blk-mq.c   | 14 +-
 drivers/nvme/target/io-cmd.c |  2 +-
 fs/block_dev.c   |  4 ++--
 fs/direct-io.c   |  2 +-
 fs/iomap.c   |  2 +-
 include/linux/blkdev.h   |  4 +++-
 mm/page_io.c |  2 +-
 8 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 54fe673fb3ba..1cab621a552a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2319,6 +2319,17 @@ blk_qc_t submit_bio(struct bio *bio)
 }
 EXPORT_SYMBOL(submit_bio);
 
+bool blk_poll(struct request_queue *q, blk_qc_t cookie)
+{
+   if (!q->poll_fn || !blk_qc_t_valid(cookie))
+   return false;
+
+   if (current->plug)
+   blk_flush_plug_list(current->plug, false);
+   return q->poll_fn(q, cookie);
+}
+EXPORT_SYMBOL_GPL(blk_poll);
+
 /**
  * blk_cloned_rq_check_limits - Helper function to check a cloned request
  *  for new the queue limits
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7f01d69879d6..10c99cf6fd71 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,6 +37,7 @@
 #include "blk-wbt.h"
 #include "blk-mq-sched.h"
 
+static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
 static void blk_mq_poll_stats_start(struct request_queue *q);
 static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
 
@@ -2401,6 +2402,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct 
blk_mq_tag_set *set,
spin_lock_init(>requeue_lock);
 
blk_queue_make_request(q, blk_mq_make_request);
+   if (q->mq_ops->poll)
+   q->poll_fn = blk_mq_poll;
 
/*
 * Do this after blk_queue_make_request() overrides it...
@@ -2860,20 +2863,14 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, 
struct request *rq)
return false;
 }
 
-bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
+static bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 {
struct blk_mq_hw_ctx *hctx;
-   struct blk_plug *plug;
struct request *rq;
 
-   if (!q->mq_ops || !q->mq_ops->poll || !blk_qc_t_valid(cookie) ||
-   !test_bit(QUEUE_FLAG_POLL, >queue_flags))
+   if (!test_bit(QUEUE_FLAG_POLL, >queue_flags))
return false;
 
-   plug = current->plug;
-   if (plug)
-   blk_flush_plug_list(plug, false);
-
hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];
if (!blk_qc_t_is_internal(cookie))
rq = blk_mq_tag_to_rq(hctx->tags, blk_qc_t_to_tag(cookie));
@@ -2891,7 +2888,6 @@ bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
 
return __blk_mq_poll(hctx, rq);
 }
-EXPORT_SYMBOL_GPL(blk_mq_poll);
 
 static int __init blk_mq_init(void)
 {
diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c
index 0d4c23dc4532..db632818777d 100644
--- a/drivers/nvme/target/io-cmd.c
+++ b/drivers/nvme/target/io-cmd.c
@@ -94,7 +94,7 @@ static void nvmet_execute_rw(struct nvmet_req *req)
 
cookie = submit_bio(bio);
 
-   blk_mq_poll(bdev_get_queue(req->ns->bdev), cookie);
+   blk_poll(bdev_get_queue(req->ns->bdev), cookie);
 }
 
 static void nvmet_execute_flush(struct nvmet_req *req)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 93d088ffc05c..49a55246ba50 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -249,7 +249,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct 
iov_iter *iter,
if (!READ_ONCE(bio.bi_private))
break;
if (!(iocb->ki_flags & IOCB_HIPRI) ||
-   !blk_mq_poll(bdev_get_queue(bdev), qc))
+   !blk_poll(bdev_get_queue(bdev), qc))
io_schedule();
}
__set_current_state(TASK_RUNNING);
@@ -414,7 +414,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
*iter, int nr_pages)
break;
 
if (!(iocb->ki_flags & IOCB_HIPRI) ||
-   !blk_mq_poll(bdev_get_queue(bdev), qc))
+   !blk_poll(bdev_get_queue(bdev), qc))
io_schedule();
}
__set_current_state(TASK_RUNNING);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 62cf812ed0e5..d2bc339cb1e9 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -486,7 +486,7 @@ static struct bio *dio_await_one(struct dio *dio)
dio->waiter = current;
spin_unlock_irqrestore(>bio_lock, flags);
if (!(dio->iocb->ki_flags & IOCB_HIPRI) ||
-   !blk_mq_poll(dio->bio_disk->queue, dio->bio_cookie))
+   !blk_poll(dio->bio_disk->queue, dio->bio_cookie))
   

nvme multipath support V6

2017-11-02 Thread Christoph Hellwig
Hi all,

this series adds support for multipathing, that is, accessing nvme
namespaces through multiple controllers, to the nvme core driver.

It is a very thin and efficient implementation that relies on
close cooperation with other bits of the nvme driver, and few small
and simple block helpers.

Compared to dm-multipath the important differences are how management
of the paths is done, and how the I/O path works.

Management of the paths is fully integrated into the nvme driver,
for each newly found nvme controller we check if there are other
controllers that refer to the same subsystem, and if so we link them
up in the nvme driver.  Then for each namespace found we check if
the namespace id and identifiers match to check if we have multiple
controllers that refer to the same namespaces.  For now path
availability is based entirely on the controller status, which at
least for fabrics will be continuously updated based on the mandatory
keep alive timer.  Once the Asymmetric Namespace Access (ANA)
proposal passes in NVMe we will also get per-namespace states in
addition to that, but for now any details of that remain confidential
to NVMe members.

The I/O path is very different from the existing multipath drivers,
which is enabled by the fact that NVMe (unlike SCSI) does not support
partial completions - a controller will either complete a whole
command or not, but never only complete parts of it.  Because of that
there is no need to clone bios or requests - the I/O path simply
redirects the I/O to a suitable path.  For successful commands
multipath is not in the completion stack at all.  For failed commands
we decide if the error could be a path failure, and if yes remove
the bios from the request structure and requeue them before completing
the request.  All together this means there is no performance
degradation compared to normal nvme operation when using the multipath
device node (at least not until I find a dual ported DRAM backed
device :))
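The bio-stealing failover described above can be sketched as a toy userspace model (invented names; the real code uses blk_steal_bios and the request machinery). Because NVMe never partially completes a command, a failed request can donate its still-intact bios to a requeue list instead of the submit path cloning them up front.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a request owns a singly linked bio list; on a path failure
 * we detach every bio and push it onto a requeue list for resubmission
 * down another path.  No cloning is ever needed. */
struct bio { struct bio *next; };

struct request { struct bio *bio; };

struct requeue_list { struct bio *head; };

static void failover_req(struct request *rq, struct requeue_list *rl)
{
    while (rq->bio) {
        struct bio *b = rq->bio;
        rq->bio = b->next;     /* detach from the failed request */
        b->next = rl->head;    /* push for resubmission */
        rl->head = b;
    }
}
```

After failover the request carries no bios and can be completed, while the stolen bios are re-issued; for successful commands this path is never touched, which is why the fast path sees no overhead.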

A git tree is available at:

   git://git.infradead.org/users/hch/block.git nvme-mpath

gitweb:

   http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/nvme-mpath

Changes since V5:
  - dropped various prep patches merged into the nvme tree
  - removed a leftover debug printk
  - small cleanups to blk_steal_bio
  - add a sysfs attribute for hidden gendisks
  - don't complete cancelled requests if another path is available
  - use the instance index for device naming
  - make the multipath code optional at compile time
  - don't use the multipath node for single-controller subsystems
  - add a module_option to disable multipathing (temporarily)
   
Changes since V4:
  - add a refcount to release the device in struct nvme_subsystem
  - use the instance to name the nvme_subsystems in sysfs
  - remove a NULL check before nvme_put_ns_head
  - take a ns_head reference in ->open
  - improve open protection for GENHD_FL_HIDDEN
  - add poll support for the mpath device

Changes since V3:
  - new block layer support for hidden gendisks
  - a couple new patches to refactor device handling before the
actual multipath support
  - don't expose per-controller block device nodes
  - use /dev/nvmeXnZ as the device nodes for the whole subsystem.
  - expose subsystems in sysfs (Hannes Reinecke)
  - fix a subsystem leak when duplicate NQNs are found
  - fix up some names
  - don't clear current_path if freeing a different namespace

Changes since V2:
  - don't create duplicate subsystems on reset (Keith Busch)
  - free requests properly when failing over in I/O completion (Keith Busch)
  - new device names: /dev/nvm-sub%dn%d
  - expose the namespace identification sysfs files for the mpath nodes

Changes since V1:
  - introduce new nvme_ns_ids structure to clean up identifier handling
  - generic_make_request_fast is now named direct_make_request and calls
generic_make_request_checks
  - reset bi_disk on resubmission
  - create sysfs links between the existing nvme namespace block devices and
the new share mpath device
  - temporarily added the timeout patches from James, this should go into
nvme-4.14, though


[PATCH 4/5] nvme: implement multipath access to nvme subsystems

2017-11-02 Thread Christoph Hellwig
This patch adds native multipath support to the nvme driver.  For each
namespace we create only a single block device node, which can be used
to access that namespace through any of the controllers that refer to it.
The gendisk for each controller's path to the namespace still exists
inside the kernel, but is hidden from userspace.  The character device
nodes are still available on a per-controller basis.  A new link from
the sysfs directory for the subsystem makes it possible to find all
controllers for a given subsystem.

Currently we will always send I/O to the first available path; this will
be changed once the NVMe Asymmetric Namespace Access (ANA) TP is
ratified and implemented, at which point we will look at the ANA state
for each namespace.  Another possibility that was prototyped is to
use the path that is closest to the submitting NUMA node, which will be
mostly interesting for PCI, but might also be useful for RDMA or FC
transports in the future.  There is no plan to implement round robin
or I/O service time path selectors, as those are not scalable with
the performance rates provided by NVMe.

The multipath device will go away once all paths to it disappear,
any delay to keep it alive needs to be implemented at the controller
level.
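The "first available path" policy mentioned above can be sketched as follows (a userspace toy; the states and names are simplified stand-ins for the real controller states, not the driver's code):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of first-available-path selection: walk the sibling paths
 * to the same namespace and pick the first one whose controller is live. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_DEAD };

struct path {
    enum ctrl_state state;
    struct path *next;   /* sibling paths to the same namespace */
};

/* Return the first live path, or NULL when the namespace is unreachable. */
static struct path *find_path(struct path *head)
{
    for (struct path *p = head; p; p = p->next)
        if (p->state == CTRL_LIVE)
            return p;
    return NULL;
}
```

A NULL result corresponds to the case where I/O must be queued until a path comes back, and the selection cost stays O(number of paths) with no per-I/O bookkeeping, which is the scalability argument made against round-robin or service-time selectors.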

Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/Kconfig |   9 ++
 drivers/nvme/host/Makefile|   1 +
 drivers/nvme/host/core.c  | 133 +++---
 drivers/nvme/host/multipath.c | 255 ++
 drivers/nvme/host/nvme.h  |  57 ++
 5 files changed, 440 insertions(+), 15 deletions(-)
 create mode 100644 drivers/nvme/host/multipath.c

diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index 46d6cb1e03bd..45886800a1df 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -13,6 +13,15 @@ config BLK_DEV_NVME
  To compile this driver as a module, choose M here: the
  module will be called nvme.
 
+config NVME_MULTIPATH
+   bool "NVMe multipath support"
+   depends on NVME_CORE
+   ---help---
+  This option enables support for multipath access to NVMe
+  subsystems.  If this option is enabled only a single
+  /dev/nvmeXnY device will show up for each NVMe namespace,
+  even if it is accessible through multiple controllers.
+
 config NVME_FABRICS
tristate
 
diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
index cc0aacb4c8b4..b856f2f549cd 100644
--- a/drivers/nvme/host/Makefile
+++ b/drivers/nvme/host/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_NVME_RDMA) += nvme-rdma.o
 obj-$(CONFIG_NVME_FC)  += nvme-fc.o
 
 nvme-core-y:= core.o
+nvme-core-$(CONFIG_NVME_MULTIPATH) += multipath.o
 nvme-core-$(CONFIG_NVM)+= lightnvm.o
 
 nvme-y += pci.o
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7a9ea44814ed..09f0e826b426 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -177,17 +177,22 @@ static inline bool nvme_req_needs_retry(struct request 
*req)
return false;
if (nvme_req(req)->retries >= nvme_max_retries)
return false;
-   if (blk_queue_dying(req->q))
-   return false;
return true;
 }
 
 void nvme_complete_rq(struct request *req)
 {
if (unlikely(nvme_req(req)->status && nvme_req_needs_retry(req))) {
-   nvme_req(req)->retries++;
-   blk_mq_requeue_request(req, true);
-   return;
+   if (nvme_req_needs_failover(req)) {
+   nvme_failover_req(req);
+   return;
+   }
+
+   if (!blk_queue_dying(req->q)) {
+   nvme_req(req)->retries++;
+   blk_mq_requeue_request(req, true);
+   return;
+   }
}
 
blk_mq_end_request(req, nvme_error_status(req));
@@ -278,7 +283,8 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
ctrl->state = new_state;
 
spin_unlock_irqrestore(>lock, flags);
-
+   if (changed && ctrl->state == NVME_CTRL_LIVE)
+   nvme_kick_requeue_lists(ctrl);
return changed;
 }
 EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
@@ -288,6 +294,7 @@ static void nvme_free_ns_head(struct kref *ref)
struct nvme_ns_head *head =
container_of(ref, struct nvme_ns_head, ref);
 
+   nvme_mpath_remove_disk(head);
ida_simple_remove(>subsys->ns_ida, head->instance);
list_del_init(>entry);
cleanup_srcu_struct(>srcu);
@@ -1051,11 +1058,33 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct 
nvme_ns *ns,
return status;
 }
 
-static int nvme_ioctl(struct block_device *bdev, fmode_t mode,
-   unsigned int cmd, unsigned long arg)
+/*
+ * Issue ioctl requests on 

[PATCH 2/5] nvme: introduce a nvme_ns_ids structure

2017-11-02 Thread Christoph Hellwig
This allows us to manage the various unique namespace identifiers
together instead of needing various variables and arguments.
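As a userspace illustration of the idea (identifier sizes follow the NVMe spec: EUI-64 is 8 bytes, NGUID and UUID are 16; the struct and function names are simplified stand-ins for the driver's):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Bundle the three namespace identifiers so they can be filled, passed
 * around, and compared as a single unit instead of three loose buffers. */
struct ns_ids {
    unsigned char eui64[8];
    unsigned char nguid[16];
    unsigned char uuid[16];
};

static bool ns_ids_equal(const struct ns_ids *a, const struct ns_ids *b)
{
    return memcmp(a->eui64, b->eui64, sizeof(a->eui64)) == 0 &&
           memcmp(a->nguid, b->nguid, sizeof(a->nguid)) == 0 &&
           memcmp(a->uuid,  b->uuid,  sizeof(a->uuid)) == 0;
}
```

The equality helper is what later lets the multipath code decide whether two controllers expose the same underlying namespace.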

Signed-off-by: Christoph Hellwig 
Reviewed-by: Keith Busch 
Reviewed-by: Sagi Grimberg 
Reviewed-by: Hannes Reinecke 
---
 drivers/nvme/host/core.c | 69 +++-
 drivers/nvme/host/nvme.h | 14 +++---
 2 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f385c35ed2d9..dc0bbbd4ef46 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -788,7 +788,7 @@ static int nvme_identify_ctrl(struct nvme_ctrl *dev, struct 
nvme_id_ctrl **id)
 }
 
 static int nvme_identify_ns_descs(struct nvme_ctrl *ctrl, unsigned nsid,
-   u8 *eui64, u8 *nguid, uuid_t *uuid)
+   struct nvme_ns_ids *ids)
 {
struct nvme_command c = { };
int status;
@@ -824,7 +824,7 @@ static int nvme_identify_ns_descs(struct nvme_ctrl *ctrl, 
unsigned nsid,
goto free_data;
}
len = NVME_NIDT_EUI64_LEN;
-   memcpy(eui64, data + pos + sizeof(*cur), len);
+   memcpy(ids->eui64, data + pos + sizeof(*cur), len);
break;
case NVME_NIDT_NGUID:
if (cur->nidl != NVME_NIDT_NGUID_LEN) {
@@ -834,7 +834,7 @@ static int nvme_identify_ns_descs(struct nvme_ctrl *ctrl, 
unsigned nsid,
goto free_data;
}
len = NVME_NIDT_NGUID_LEN;
-   memcpy(nguid, data + pos + sizeof(*cur), len);
+   memcpy(ids->nguid, data + pos + sizeof(*cur), len);
break;
case NVME_NIDT_UUID:
if (cur->nidl != NVME_NIDT_UUID_LEN) {
@@ -844,7 +844,7 @@ static int nvme_identify_ns_descs(struct nvme_ctrl *ctrl, 
unsigned nsid,
goto free_data;
}
len = NVME_NIDT_UUID_LEN;
-   uuid_copy(uuid, data + pos + sizeof(*cur));
+   uuid_copy(&ids->uuid, data + pos + sizeof(*cur));
break;
default:
/* Skip unnkown types */
@@ -1146,22 +1146,31 @@ static void nvme_config_discard(struct nvme_ctrl *ctrl,
 }
 
 static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
-   struct nvme_id_ns *id, u8 *eui64, u8 *nguid, uuid_t *uuid)
+   struct nvme_id_ns *id, struct nvme_ns_ids *ids)
 {
+   memset(ids, 0, sizeof(*ids));
+
if (ctrl->vs >= NVME_VS(1, 1, 0))
-   memcpy(eui64, id->eui64, sizeof(id->eui64));
+   memcpy(ids->eui64, id->eui64, sizeof(id->eui64));
if (ctrl->vs >= NVME_VS(1, 2, 0))
-   memcpy(nguid, id->nguid, sizeof(id->nguid));
+   memcpy(ids->nguid, id->nguid, sizeof(id->nguid));
if (ctrl->vs >= NVME_VS(1, 3, 0)) {
 /* Don't treat error as fatal we potentially
  * already have a NGUID or EUI-64
  */
-   if (nvme_identify_ns_descs(ctrl, nsid, eui64, nguid, uuid))
+   if (nvme_identify_ns_descs(ctrl, nsid, ids))
dev_warn(ctrl->device,
 "%s: Identify Descriptors failed\n", __func__);
}
 }
 
+static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
+{
+   return uuid_equal(&a->uuid, &b->uuid) &&
+   memcmp(&a->nguid, &b->nguid, sizeof(a->nguid)) == 0 &&
+   memcmp(&a->eui64, &b->eui64, sizeof(a->eui64)) == 0;
+}
+
 static void nvme_update_disk_info(struct gendisk *disk,
struct nvme_ns *ns, struct nvme_id_ns *id)
 {
@@ -1217,8 +1226,7 @@ static int nvme_revalidate_disk(struct gendisk *disk)
struct nvme_ns *ns = disk->private_data;
struct nvme_ctrl *ctrl = ns->ctrl;
struct nvme_id_ns *id;
-   u8 eui64[8] = { 0 }, nguid[16] = { 0 };
-   uuid_t uuid = uuid_null;
+   struct nvme_ns_ids ids;
int ret = 0;
 
if (test_bit(NVME_NS_DEAD, &ns->flags)) {
@@ -1235,10 +1243,8 @@ static int nvme_revalidate_disk(struct gendisk *disk)
goto out;
}
 
-   nvme_report_ns_ids(ctrl, ns->ns_id, id, eui64, nguid, &uuid);
-   if (!uuid_equal(&ns->uuid, &uuid) ||
-   memcmp(&ns->nguid, &nguid, sizeof(ns->nguid)) ||
-   memcmp(&ns->eui, &eui64, sizeof(ns->eui))) {
+   nvme_report_ns_ids(ctrl, ns->ns_id, id, &ids);
+   if (!nvme_ns_ids_equal(&ns->ids, &ids)) {
dev_err(ctrl->device,
"identifiers changed for nsid %d\n", ns->ns_id);
ret = -ENODEV;
@@ -2142,18 +2148,19 @@ static ssize_t wwid_show(struct device *dev, struct 
device_attribute *attr,
   

[PATCH 3/5] nvme: track shared namespaces

2017-11-02 Thread Christoph Hellwig
Introduce a new struct nvme_ns_head that holds information about an actual
namespace, unlike struct nvme_ns, which only holds the per-controller
namespace information.  For private namespaces there is a 1:1 relation of
the two, but for shared namespaces this lets us discover all the paths to
it.  For now only the identifiers are moved to the new structure, but most
of the information in struct nvme_ns should eventually move over.

To allow lockless path lookup the list of nvme_ns structures per
nvme_ns_head is protected by SRCU, which requires freeing the nvme_ns
structure through call_srcu.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Keith Busch 
Reviewed-by: Javier González 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
---
 drivers/nvme/host/core.c | 220 +++
 drivers/nvme/host/lightnvm.c |  14 +--
 drivers/nvme/host/nvme.h |  26 -
 3 files changed, 210 insertions(+), 50 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index dc0bbbd4ef46..7a9ea44814ed 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -283,6 +283,22 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 }
 EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
 
+static void nvme_free_ns_head(struct kref *ref)
+{
+   struct nvme_ns_head *head =
+   container_of(ref, struct nvme_ns_head, ref);
+
+   ida_simple_remove(&head->subsys->ns_ida, head->instance);
+   list_del_init(&head->entry);
+   cleanup_srcu_struct(&head->srcu);
+   kfree(head);
+}
+
+static void nvme_put_ns_head(struct nvme_ns_head *head)
+{
+   kref_put(&head->ref, nvme_free_ns_head);
+}
+
 static void nvme_free_ns(struct kref *kref)
 {
struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
@@ -291,7 +307,7 @@ static void nvme_free_ns(struct kref *kref)
nvme_nvm_unregister(ns);
 
put_disk(ns->disk);
-   ida_simple_remove(&ns->ctrl->ns_ida, ns->instance);
+   nvme_put_ns_head(ns->head);
nvme_put_ctrl(ns->ctrl);
kfree(ns);
 }
@@ -427,7 +443,7 @@ static inline void nvme_setup_flush(struct nvme_ns *ns,
 {
memset(cmnd, 0, sizeof(*cmnd));
cmnd->common.opcode = nvme_cmd_flush;
-   cmnd->common.nsid = cpu_to_le32(ns->ns_id);
+   cmnd->common.nsid = cpu_to_le32(ns->head->ns_id);
 }
 
 static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
@@ -458,7 +474,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, 
struct request *req,
 
memset(cmnd, 0, sizeof(*cmnd));
cmnd->dsm.opcode = nvme_cmd_dsm;
-   cmnd->dsm.nsid = cpu_to_le32(ns->ns_id);
+   cmnd->dsm.nsid = cpu_to_le32(ns->head->ns_id);
cmnd->dsm.nr = cpu_to_le32(segments - 1);
cmnd->dsm.attributes = cpu_to_le32(NVME_DSMGMT_AD);
 
@@ -497,7 +513,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 
memset(cmnd, 0, sizeof(*cmnd));
cmnd->rw.opcode = (rq_data_dir(req) ? nvme_cmd_write : nvme_cmd_read);
-   cmnd->rw.nsid = cpu_to_le32(ns->ns_id);
+   cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
cmnd->rw.slba = cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));
cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
 
@@ -978,7 +994,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct 
nvme_user_io __user *uio)
memset(&c, 0, sizeof(c));
c.rw.opcode = io.opcode;
c.rw.flags = io.flags;
-   c.rw.nsid = cpu_to_le32(ns->ns_id);
+   c.rw.nsid = cpu_to_le32(ns->head->ns_id);
c.rw.slba = cpu_to_le64(io.slba);
c.rw.length = cpu_to_le16(io.nblocks);
c.rw.control = cpu_to_le16(io.control);
@@ -1043,7 +1059,7 @@ static int nvme_ioctl(struct block_device *bdev, fmode_t 
mode,
switch (cmd) {
case NVME_IOCTL_ID:
force_successful_syscall_return();
-   return ns->ns_id;
+   return ns->head->ns_id;
case NVME_IOCTL_ADMIN_CMD:
return nvme_user_cmd(ns->ctrl, NULL, (void __user *)arg);
case NVME_IOCTL_IO_CMD:
@@ -1164,6 +1180,13 @@ static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, 
unsigned int nsid,
}
 }
 
+static bool nvme_ns_ids_valid(struct nvme_ns_ids *ids)
+{
+   return !uuid_is_null(&ids->uuid) ||
+   memchr_inv(ids->nguid, 0, sizeof(ids->nguid)) ||
+   memchr_inv(ids->eui64, 0, sizeof(ids->eui64));
+}
+
 static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
 {
return uuid_equal(&a->uuid, &b->uuid) &&
@@ -1234,7 +1257,7 @@ static int nvme_revalidate_disk(struct gendisk *disk)
return -ENODEV;
}
 
-   id = nvme_identify_ns(ctrl, ns->ns_id);
+   id = nvme_identify_ns(ctrl, ns->head->ns_id);
if (!id)
return -ENODEV;
 
@@ -1243,10 +1266,10 @@ static int 

[PATCH 6/7] block: introduce GENHD_FL_HIDDEN

2017-11-02 Thread Christoph Hellwig
With this flag a driver can create a gendisk that can be used for I/O
submission inside the kernel, but which is not registered as a user-facing
block device.  This will be useful for the NVMe multipath
implementation.

Signed-off-by: Christoph Hellwig 
---
 block/genhd.c | 68 +--
 include/linux/genhd.h |  1 +
 2 files changed, 51 insertions(+), 18 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 1174d24e405e..835e907e6e01 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -585,6 +585,11 @@ static void register_disk(struct device *parent, struct 
gendisk *disk)
 */
pm_runtime_set_memalloc_noio(ddev, true);
 
+   if (disk->flags & GENHD_FL_HIDDEN) {
+   dev_set_uevent_suppress(ddev, 0);
+   return;
+   }
+
disk->part0.holder_dir = kobject_create_and_add("holders", &ddev->kobj);
disk->slave_dir = kobject_create_and_add("slaves", &ddev->kobj);
 
@@ -616,6 +621,11 @@ static void register_disk(struct device *parent, struct 
gendisk *disk)
while ((part = disk_part_iter_next(&piter)))
kobject_uevent(&part_to_dev(part)->kobj, KOBJ_ADD);
disk_part_iter_exit(&piter);
+
+   err = sysfs_create_link(&ddev->kobj,
+   &disk->queue->backing_dev_info->dev->kobj,
+   "bdi");
+   WARN_ON(err);
 }
 
 /**
@@ -630,7 +640,6 @@ static void register_disk(struct device *parent, struct 
gendisk *disk)
  */
 void device_add_disk(struct device *parent, struct gendisk *disk)
 {
-   struct backing_dev_info *bdi;
dev_t devt;
int retval;
 
@@ -639,7 +648,8 @@ void device_add_disk(struct device *parent, struct gendisk 
*disk)
 * parameters make sense.
 */
WARN_ON(disk->minors && !(disk->major || disk->first_minor));
-   WARN_ON(!disk->minors && !(disk->flags & GENHD_FL_EXT_DEVT));
+   WARN_ON(!disk->minors &&
+   !(disk->flags & (GENHD_FL_EXT_DEVT | GENHD_FL_HIDDEN)));
 
disk->flags |= GENHD_FL_UP;
 
@@ -648,18 +658,26 @@ void device_add_disk(struct device *parent, struct 
gendisk *disk)
WARN_ON(1);
return;
}
-   disk_to_dev(disk)->devt = devt;
disk->major = MAJOR(devt);
disk->first_minor = MINOR(devt);
 
disk_alloc_events(disk);
 
-   /* Register BDI before referencing it from bdev */
-   bdi = disk->queue->backing_dev_info;
-   bdi_register_owner(bdi, disk_to_dev(disk));
-
-   blk_register_region(disk_devt(disk), disk->minors, NULL,
-   exact_match, exact_lock, disk);
+   if (disk->flags & GENHD_FL_HIDDEN) {
+   /*
+* Don't let hidden disks show up in /proc/partitions,
+* and don't bother scanning for partitions either.
+*/
+   disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
+   disk->flags |= GENHD_FL_NO_PART_SCAN;
+   } else {
+   /* Register BDI before referencing it from bdev */
+   disk_to_dev(disk)->devt = devt;
+   bdi_register_owner(disk->queue->backing_dev_info,
+   disk_to_dev(disk));
+   blk_register_region(disk_devt(disk), disk->minors, NULL,
+   exact_match, exact_lock, disk);
+   }
register_disk(parent, disk);
blk_register_queue(disk);
 
@@ -669,10 +687,6 @@ void device_add_disk(struct device *parent, struct gendisk 
*disk)
 */
WARN_ON_ONCE(!blk_get_queue(disk->queue));
 
-   retval = sysfs_create_link(&disk_to_dev(disk)->kobj, &bdi->dev->kobj,
-  "bdi");
-   WARN_ON(retval);
-
disk_add_events(disk);
blk_integrity_add(disk);
 }
@@ -701,7 +715,8 @@ void del_gendisk(struct gendisk *disk)
set_capacity(disk, 0);
disk->flags &= ~GENHD_FL_UP;
 
-   sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
+   if (!(disk->flags & GENHD_FL_HIDDEN))
+   sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
if (disk->queue) {
/*
 * Unregister bdi before releasing device numbers (as they can
@@ -712,13 +727,15 @@ void del_gendisk(struct gendisk *disk)
} else {
WARN_ON(1);
}
-   blk_unregister_region(disk_devt(disk), disk->minors);
+
+   if (!(disk->flags & GENHD_FL_HIDDEN)) {
+   blk_unregister_region(disk_devt(disk), disk->minors);
+   kobject_put(disk->part0.holder_dir);
+   kobject_put(disk->slave_dir);
+   }
 
part_stat_set_all(&disk->part0, 0);
disk->part0.stamp = 0;
-
-   kobject_put(disk->part0.holder_dir);
-   kobject_put(disk->slave_dir);
if (!sysfs_deprecated)
sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
@@ -781,6 

[PATCH 1/5] nvme: track subsystems

2017-11-02 Thread Christoph Hellwig
This adds a new nvme_subsystem structure so that we can track multiple
controllers that belong to a single subsystem.  For now we only use it
to store the NQN, and to check that we don't have duplicate NQNs unless
the involved subsystems support multiple controllers.

Includes code originally from Hannes Reinecke to expose the subsystems
in sysfs.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
---
 drivers/nvme/host/core.c| 210 ++--
 drivers/nvme/host/fabrics.c |   4 +-
 drivers/nvme/host/nvme.h|  27 --
 3 files changed, 205 insertions(+), 36 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 66211617e7e8..f385c35ed2d9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -68,9 +68,14 @@ MODULE_PARM_DESC(streams, "turn on support for Streams write 
directives");
 struct workqueue_struct *nvme_wq;
 EXPORT_SYMBOL_GPL(nvme_wq);
 
+static DEFINE_IDA(nvme_subsystems_ida);
+static LIST_HEAD(nvme_subsystems);
+static DEFINE_MUTEX(nvme_subsystems_lock);
+
 static DEFINE_IDA(nvme_instance_ida);
 static dev_t nvme_chr_devt;
 static struct class *nvme_class;
+static struct class *nvme_subsys_class;
 
 static __le32 nvme_get_log_dw10(u8 lid, size_t size)
 {
@@ -1717,14 +1722,15 @@ static bool quirk_matches(const struct nvme_id_ctrl *id,
string_matches(id->fr, q->fr, sizeof(id->fr));
 }
 
-static void nvme_init_subnqn(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
+static void nvme_init_subnqn(struct nvme_subsystem *subsys, struct nvme_ctrl 
*ctrl,
+   struct nvme_id_ctrl *id)
 {
size_t nqnlen;
int off;
 
nqnlen = strnlen(id->subnqn, NVMF_NQN_SIZE);
if (nqnlen > 0 && nqnlen < NVMF_NQN_SIZE) {
-   strcpy(ctrl->subnqn, id->subnqn);
+   strcpy(subsys->subnqn, id->subnqn);
return;
}
 
@@ -1732,14 +1738,140 @@ static void nvme_init_subnqn(struct nvme_ctrl *ctrl, 
struct nvme_id_ctrl *id)
dev_warn(ctrl->device, "missing or invalid SUBNQN field.\n");
 
/* Generate a "fake" NQN per Figure 254 in NVMe 1.3 + ECN 001 */
-   off = snprintf(ctrl->subnqn, NVMF_NQN_SIZE,
+   off = snprintf(subsys->subnqn, NVMF_NQN_SIZE,
"nqn.2014.08.org.nvmexpress:%4x%4x",
le16_to_cpu(id->vid), le16_to_cpu(id->ssvid));
-   memcpy(ctrl->subnqn + off, id->sn, sizeof(id->sn));
+   memcpy(subsys->subnqn + off, id->sn, sizeof(id->sn));
off += sizeof(id->sn);
-   memcpy(ctrl->subnqn + off, id->mn, sizeof(id->mn));
+   memcpy(subsys->subnqn + off, id->mn, sizeof(id->mn));
off += sizeof(id->mn);
-   memset(ctrl->subnqn + off, 0, sizeof(ctrl->subnqn) - off);
+   memset(subsys->subnqn + off, 0, sizeof(subsys->subnqn) - off);
+}
+
+static void __nvme_release_subsystem(struct nvme_subsystem *subsys)
+{
+   ida_simple_remove(&nvme_subsystems_ida, subsys->instance);
+   kfree(subsys);
+}
+
+static void nvme_release_subsystem(struct device *dev)
+{
+   __nvme_release_subsystem(container_of(dev, struct nvme_subsystem, dev));
+}
+
+static void nvme_destroy_subsystem(struct kref *ref)
+{
+   struct nvme_subsystem *subsys =
+   container_of(ref, struct nvme_subsystem, ref);
+
+   mutex_lock(&nvme_subsystems_lock);
+   list_del(&subsys->entry);
+   mutex_unlock(&nvme_subsystems_lock);
+
+   device_del(&subsys->dev);
+   put_device(&subsys->dev);
+}
+
+static void nvme_put_subsystem(struct nvme_subsystem *subsys)
+{
+   kref_put(&subsys->ref, nvme_destroy_subsystem);
+}
+
+static struct nvme_subsystem *__nvme_find_get_subsystem(const char *subsysnqn)
+{
+   struct nvme_subsystem *subsys;
+
+   lockdep_assert_held(&nvme_subsystems_lock);
+
+   list_for_each_entry(subsys, &nvme_subsystems, entry) {
+   if (strcmp(subsys->subnqn, subsysnqn))
+   continue;
+   if (!kref_get_unless_zero(&subsys->ref))
+   continue;
+   return subsys;
+   }
+
+   return NULL;
+}
+
+static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
+{
+   struct nvme_subsystem *subsys, *found;
+   int ret;
+
+   subsys = kzalloc(sizeof(*subsys), GFP_KERNEL);
+   if (!subsys)
+   return -ENOMEM;
+   ret = ida_simple_get(&nvme_subsystems_ida, 0, 0, GFP_KERNEL);
+   if (ret < 0) {
+   kfree(subsys);
+   return ret;
+   }
+   subsys->instance = ret;
+   mutex_init(&subsys->lock);
+   kref_init(&subsys->ref);
+   INIT_LIST_HEAD(&subsys->ctrls);
+   nvme_init_subnqn(subsys, ctrl, id);
+   memcpy(subsys->serial, id->sn, sizeof(subsys->serial));
+   memcpy(subsys->model, id->mn, sizeof(subsys->model));
+   memcpy(subsys->firmware_rev, id->fr, sizeof(subsys->firmware_rev));
+   subsys->vendor_id = le16_to_cpu(id->vid);
+   subsys->cmic = id->cmic;
+
+

[PATCH 5/7] block: don't look at the struct device dev_t in disk_devt

2017-11-02 Thread Christoph Hellwig
The hidden gendisks introduced in the next patch need to keep the dev
field in their struct device empty so that udev won't try to create
block device nodes for them.  To support that, rewrite disk_devt to
look at the major and first_minor fields in the gendisk itself instead
of looking into the struct device.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Hannes Reinecke 
---
 block/genhd.c | 4 
 include/linux/genhd.h | 2 +-
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index dd305c65ffb0..1174d24e405e 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -649,10 +649,6 @@ void device_add_disk(struct device *parent, struct gendisk 
*disk)
return;
}
disk_to_dev(disk)->devt = devt;
-
-   /* ->major and ->first_minor aren't supposed to be
-* dereferenced from here on, but set them just in case.
-*/
disk->major = MAJOR(devt);
disk->first_minor = MINOR(devt);
 
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index ea652bfcd675..5c0ed5db33c2 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -234,7 +234,7 @@ static inline bool disk_part_scan_enabled(struct gendisk 
*disk)
 
 static inline dev_t disk_devt(struct gendisk *disk)
 {
-   return disk_to_dev(disk)->devt;
+   return MKDEV(disk->major, disk->first_minor);
 }
 
 static inline dev_t part_devt(struct hd_struct *part)
-- 
2.14.2



[PATCH 4/7] block: add a blk_steal_bios helper

2017-11-02 Thread Christoph Hellwig
This helper allows stealing the uncompleted bios from a request so
that they can be reissued on another path.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Sagi Grimberg 
Reviewed-by: Hannes Reinecke 
---
 block/blk-core.c   | 21 +
 include/linux/blkdev.h |  2 ++
 2 files changed, 23 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index b8c80f39f5fe..54fe673fb3ba 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2767,6 +2767,27 @@ struct request *blk_fetch_request(struct request_queue 
*q)
 }
 EXPORT_SYMBOL(blk_fetch_request);
 
+/*
+ * Steal bios from a request and add them to a bio list.
+ * The request must not have been partially completed before.
+ */
+void blk_steal_bios(struct bio_list *list, struct request *rq)
+{
+   if (rq->bio) {
+   if (list->tail)
+   list->tail->bi_next = rq->bio;
+   else
+   list->head = rq->bio;
+   list->tail = rq->biotail;
+
+   rq->bio = NULL;
+   rq->biotail = NULL;
+   }
+
+   rq->__data_len = 0;
+}
+EXPORT_SYMBOL_GPL(blk_steal_bios);
+
 /**
  * blk_update_request - Special helper function for request stacking drivers
  * @req:  the request being processed
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 780f01db5899..45c63764a14e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1110,6 +1110,8 @@ extern struct request *blk_peek_request(struct 
request_queue *q);
 extern void blk_start_request(struct request *rq);
 extern struct request *blk_fetch_request(struct request_queue *q);
 
+void blk_steal_bios(struct bio_list *list, struct request *rq);
+
 /*
  * Request completion related functions.
  *
-- 
2.14.2



[PATCH 2/7] block: add REQ_DRV bit

2017-11-02 Thread Christoph Hellwig
Set aside a bit in the request/bio flags for driver use.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Sagi Grimberg 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Johannes Thumshirn 
---
 include/linux/blk_types.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index acc2f3cdc2fc..7ec2ed097a8a 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -229,6 +229,9 @@ enum req_flag_bits {
/* command specific flags for REQ_OP_WRITE_ZEROES: */
__REQ_NOUNMAP,  /* do not free blocks when zeroing */
 
+   /* for driver use */
+   __REQ_DRV,
+
__REQ_NR_BITS,  /* stops here */
 };
 
@@ -249,6 +252,8 @@ enum req_flag_bits {
 
 #define REQ_NOUNMAP(1ULL << __REQ_NOUNMAP)
 
+#define REQ_DRV(1ULL << __REQ_DRV)
+
 #define REQ_FAILFAST_MASK \
(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
 
-- 
2.14.2



[PATCH 3/7] block: provide a direct_make_request helper

2017-11-02 Thread Christoph Hellwig
This helper allows reinserting a bio into a new queue without much
overhead, but requires all queue limits to be the same for the upper
and lower queues, and it does not provide any recursion prevention.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Sagi Grimberg 
Reviewed-by: Javier González 
Reviewed-by: Hannes Reinecke 
---
 block/blk-core.c   | 34 ++
 include/linux/blkdev.h |  1 +
 2 files changed, 35 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 14f7674fa0b1..b8c80f39f5fe 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2241,6 +2241,40 @@ blk_qc_t generic_make_request(struct bio *bio)
 }
 EXPORT_SYMBOL(generic_make_request);
 
+/**
+ * direct_make_request - hand a buffer directly to its device driver for I/O
+ * @bio:  The bio describing the location in memory and on the device.
+ *
+ * This function behaves like generic_make_request(), but does not protect
+ * against recursion.  Must only be used if the called driver is known
+ * to not call generic_make_request (or direct_make_request) again from
+ * its make_request function.  (Calling direct_make_request again from
+ * a workqueue is perfectly fine as that doesn't recurse).
+ */
+blk_qc_t direct_make_request(struct bio *bio)
+{
+   struct request_queue *q = bio->bi_disk->queue;
+   bool nowait = bio->bi_opf & REQ_NOWAIT;
+   blk_qc_t ret;
+
+   if (!generic_make_request_checks(bio))
+   return BLK_QC_T_NONE;
+
+   if (unlikely(blk_queue_enter(q, nowait))) {
+   if (nowait && !blk_queue_dying(q))
+   bio->bi_status = BLK_STS_AGAIN;
+   else
+   bio->bi_status = BLK_STS_IOERR;
+   bio_endio(bio);
+   return BLK_QC_T_NONE;
+   }
+
+   ret = q->make_request_fn(q, bio);
+   blk_queue_exit(q);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(direct_make_request);
+
 /**
  * submit_bio - submit a bio to the block device layer for I/O
  * @bio: The  bio which describes the I/O
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 02fa42d24b52..780f01db5899 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -936,6 +936,7 @@ do {
\
 extern int blk_register_queue(struct gendisk *disk);
 extern void blk_unregister_queue(struct gendisk *disk);
 extern blk_qc_t generic_make_request(struct bio *bio);
+extern blk_qc_t direct_make_request(struct bio *bio);
 extern void blk_rq_init(struct request_queue *q, struct request *rq);
 extern void blk_init_request_from_bio(struct request *req, struct bio *bio);
 extern void blk_put_request(struct request *);
-- 
2.14.2



block layer patches for nvme multipath support

2017-11-02 Thread Christoph Hellwig
Hi Jens,

these patches add the block layer helpers / tweaks for NVMe multipath
support.  Can you review them for inclusion?

There have been no functional changes to the versions posted with
previous nvme multipath patchset.


[PATCH 1/7] block: move REQ_NOWAIT

2017-11-02 Thread Christoph Hellwig
This flag should be before the operation-specific REQ_NOUNMAP bit.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Sagi Grimberg 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Johannes Thumshirn 
---
 include/linux/blk_types.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index a2d2aa709cef..acc2f3cdc2fc 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -224,11 +224,11 @@ enum req_flag_bits {
__REQ_PREFLUSH, /* request for cache flush */
__REQ_RAHEAD,   /* read ahead, can fail anytime */
__REQ_BACKGROUND,   /* background IO */
+   __REQ_NOWAIT,   /* Don't wait if request will block */
 
/* command specific flags for REQ_OP_WRITE_ZEROES: */
__REQ_NOUNMAP,  /* do not free blocks when zeroing */
 
-   __REQ_NOWAIT,   /* Don't wait if request will block */
__REQ_NR_BITS,  /* stops here */
 };
 
@@ -245,9 +245,9 @@ enum req_flag_bits {
 #define REQ_PREFLUSH   (1ULL << __REQ_PREFLUSH)
 #define REQ_RAHEAD (1ULL << __REQ_RAHEAD)
 #define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND)
+#define REQ_NOWAIT (1ULL << __REQ_NOWAIT)
 
 #define REQ_NOUNMAP(1ULL << __REQ_NOUNMAP)
-#define REQ_NOWAIT (1ULL << __REQ_NOWAIT)
 
 #define REQ_FAILFAST_MASK \
(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
-- 
2.14.2



[PATCH 5/6] nvme: set the chunk size before freezing the queue

2017-11-02 Thread Christoph Hellwig
We don't need a frozen queue to update the chunk_size, which is just a
hint, and moving it a little earlier will allow for some better code
reuse with the multipath code.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bfdbecdc307a..3758528be0ef 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1184,12 +1184,13 @@ static void __nvme_revalidate_disk(struct gendisk 
*disk, struct nvme_id_ns *id)
if (ctrl->nr_streams && ns->sws && ns->sgs)
stream_alignment = ns->sws * ns->sgs;
 
+   if (ns->noiob)
+   nvme_set_chunk_size(ns);
+
blk_mq_freeze_queue(disk->queue);
blk_integrity_unregister(disk);
 
blk_queue_logical_block_size(ns->queue, bs);
-   if (ns->noiob)
-   nvme_set_chunk_size(ns);
if (ns->ms && !ns->ext &&
(ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
nvme_init_integrity(disk, ns->ms, ns->pi_type);
-- 
2.14.2



[PATCH 6/6] nvme: split __nvme_revalidate_disk

2017-11-02 Thread Christoph Hellwig
Split out the code that applies the calculated values to a given disk/queue
into a new helper that can be reused by the multipath code.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/core.c | 49 +---
 1 file changed, 26 insertions(+), 23 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3758528be0ef..66211617e7e8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1157,12 +1157,34 @@ static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, 
unsigned int nsid,
}
 }
 
+static void nvme_update_disk_info(struct gendisk *disk,
+   struct nvme_ns *ns, struct nvme_id_ns *id)
+{
+   sector_t capacity = le64_to_cpup(&id->nsze) << (ns->lba_shift - 9);
+   unsigned stream_alignment = 0;
+
+   if (ns->ctrl->nr_streams && ns->sws && ns->sgs)
+   stream_alignment = ns->sws * ns->sgs;
+
+   blk_mq_freeze_queue(disk->queue);
+   blk_integrity_unregister(disk);
+
+   blk_queue_logical_block_size(disk->queue, 1 << ns->lba_shift);
+   if (ns->ms && !ns->ext &&
+   (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
+   nvme_init_integrity(disk, ns->ms, ns->pi_type);
+   if (ns->ms && !(ns->ms == 8 && ns->pi_type) && !blk_get_integrity(disk))
+   capacity = 0;
+   set_capacity(disk, capacity);
+
+   if (ns->ctrl->oncs & NVME_CTRL_ONCS_DSM)
+   nvme_config_discard(ns->ctrl, stream_alignment, disk->queue);
+   blk_mq_unfreeze_queue(disk->queue);
+}
+
 static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 {
struct nvme_ns *ns = disk->private_data;
-   struct nvme_ctrl *ctrl = ns->ctrl;
-   unsigned stream_alignment = 0;
-   u16 bs;
 
/*
 * If identify namespace failed, use default 512 byte block size so
@@ -1171,7 +1193,6 @@ static void __nvme_revalidate_disk(struct gendisk *disk, 
struct nvme_id_ns *id)
ns->lba_shift = id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ds;
if (ns->lba_shift == 0)
ns->lba_shift = 9;
-   bs = 1 << ns->lba_shift;
ns->noiob = le16_to_cpu(id->noiob);
ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
@@ -1181,27 +1202,9 @@ static void __nvme_revalidate_disk(struct gendisk *disk, 
struct nvme_id_ns *id)
else
ns->pi_type = 0;
 
-   if (ctrl->nr_streams && ns->sws && ns->sgs)
-   stream_alignment = ns->sws * ns->sgs;
-
if (ns->noiob)
nvme_set_chunk_size(ns);
-
-   blk_mq_freeze_queue(disk->queue);
-   blk_integrity_unregister(disk);
-
-   blk_queue_logical_block_size(ns->queue, bs);
-   if (ns->ms && !ns->ext &&
-   (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
-   nvme_init_integrity(disk, ns->ms, ns->pi_type);
-   if (ns->ms && !(ns->ms == 8 && ns->pi_type) && !blk_get_integrity(disk))
-   set_capacity(disk, 0);
-   else
-   set_capacity(disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
-
-   if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
-   nvme_config_discard(ctrl, stream_alignment, disk->queue);
-   blk_mq_unfreeze_queue(disk->queue);
+   nvme_update_disk_info(disk, ns, id);
 }
 
 static int nvme_revalidate_disk(struct gendisk *disk)
-- 
2.14.2



[PATCH 2/6] nvme: always unregister the integrity profile in __nvme_revalidate_disk

2017-11-02 Thread Christoph Hellwig
This is safe because the queue is always frozen when we revalidate, and
it simplifies both the existing code as well as the multipath
implementation.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/core.c | 40 ++--
 1 file changed, 10 insertions(+), 30 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1c2686d4eaf3..bc27f6603861 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1081,29 +1081,6 @@ static int nvme_getgeo(struct block_device *bdev, struct 
hd_geometry *geo)
 }
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
-static void nvme_prep_integrity(struct gendisk *disk, struct nvme_id_ns *id,
-   u16 bs)
-{
-   struct nvme_ns *ns = disk->private_data;
-   u16 old_ms = ns->ms;
-   u8 pi_type = 0;
-
-   ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
-   ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
-
-   /* PI implementation requires metadata equal t10 pi tuple size */
-   if (ns->ms == sizeof(struct t10_pi_tuple))
-   pi_type = id->dps & NVME_NS_DPS_PI_MASK;
-
-   if (blk_get_integrity(disk) &&
-   (ns->pi_type != pi_type || ns->ms != old_ms ||
-bs != queue_logical_block_size(disk->queue) ||
-(ns->ms && ns->ext)))
-   blk_integrity_unregister(disk);
-
-   ns->pi_type = pi_type;
-}
-
 static void nvme_init_integrity(struct nvme_ns *ns)
 {
struct blk_integrity integrity;
@@ -1130,10 +1107,6 @@ static void nvme_init_integrity(struct nvme_ns *ns)
blk_queue_max_integrity_segments(ns->queue, 1);
 }
 #else
-static void nvme_prep_integrity(struct gendisk *disk, struct nvme_id_ns *id,
-   u16 bs)
-{
-}
 static void nvme_init_integrity(struct nvme_ns *ns)
 {
 }
@@ -1202,15 +1175,22 @@ static void __nvme_revalidate_disk(struct gendisk 
*disk, struct nvme_id_ns *id)
ns->lba_shift = 9;
bs = 1 << ns->lba_shift;
ns->noiob = le16_to_cpu(id->noiob);
+   ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
+   ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
+   /* the PI implementation requires metadata equal t10 pi tuple size */
+   if (ns->ms == sizeof(struct t10_pi_tuple))
+   ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
+   else
+   ns->pi_type = 0;
 
blk_mq_freeze_queue(disk->queue);
+   blk_integrity_unregister(disk);
 
-   if (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
-   nvme_prep_integrity(disk, id, bs);
blk_queue_logical_block_size(ns->queue, bs);
if (ns->noiob)
nvme_set_chunk_size(ns);
-   if (ns->ms && !blk_get_integrity(disk) && !ns->ext)
+   if (ns->ms && !ns->ext &&
+   (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
nvme_init_integrity(ns);
if (ns->ms && !(ns->ms == 8 && ns->pi_type) && !blk_get_integrity(disk))
set_capacity(disk, 0);
-- 
2.14.2



[PATCH 1/6] nvme: move the dying queue check from cancel to completion

2017-11-02 Thread Christoph Hellwig
With multipath we don't want a hard DNR bit on a request that is cancelled
by a controller reset, but instead want to be able to retry it on another
path.  To achieve this don't always set the DNR bit when the queue is
dying in nvme_cancel_request, but defer that decision to
nvme_req_needs_retry.  Note that it applies to any command there and not
just cancelled commands, but once the queue is dying that is the right
thing to do anyway.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/core.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 07b9f4e1c283..1c2686d4eaf3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -172,6 +172,8 @@ static inline bool nvme_req_needs_retry(struct request *req)
return false;
if (nvme_req(req)->retries >= nvme_max_retries)
return false;
+   if (blk_queue_dying(req->q))
+   return false;
return true;
 }
 
@@ -189,18 +191,13 @@ EXPORT_SYMBOL_GPL(nvme_complete_rq);
 
 void nvme_cancel_request(struct request *req, void *data, bool reserved)
 {
-   int status;
-
if (!blk_mq_request_started(req))
return;
 
dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
"Cancelling I/O %d", req->tag);
 
-   status = NVME_SC_ABORT_REQ;
-   if (blk_queue_dying(req->q))
-   status |= NVME_SC_DNR;
-   nvme_req(req)->status = status;
+   nvme_req(req)->status = NVME_SC_ABORT_REQ;
blk_mq_complete_request(req);
 
 }
-- 
2.14.2



[PATCH 3/6] nvme: don't pass struct nvme_ns to nvme_init_integrity

2017-11-02 Thread Christoph Hellwig
To allow reusing this function for the multipath node.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/core.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bc27f6603861..8702e49c5c45 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1081,12 +1081,12 @@ static int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 }
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
-static void nvme_init_integrity(struct nvme_ns *ns)
+static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
 {
struct blk_integrity integrity;
 
	memset(&integrity, 0, sizeof(integrity));
-   switch (ns->pi_type) {
+   switch (pi_type) {
case NVME_NS_DPS_PI_TYPE3:
integrity.profile = _pi_type3_crc;
integrity.tag_size = sizeof(u16) + sizeof(u32);
@@ -1102,12 +1102,12 @@ static void nvme_init_integrity(struct nvme_ns *ns)
integrity.profile = NULL;
break;
}
-   integrity.tuple_size = ns->ms;
-   blk_integrity_register(ns->disk, &integrity);
-   blk_queue_max_integrity_segments(ns->queue, 1);
+   integrity.tuple_size = ms;
+   blk_integrity_register(disk, &integrity);
+   blk_queue_max_integrity_segments(disk->queue, 1);
 }
 #else
-static void nvme_init_integrity(struct nvme_ns *ns)
+static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
 {
 }
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
@@ -1191,7 +1191,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
nvme_set_chunk_size(ns);
if (ns->ms && !ns->ext &&
(ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
-   nvme_init_integrity(ns);
+   nvme_init_integrity(disk, ns->ms, ns->pi_type);
if (ns->ms && !(ns->ms == 8 && ns->pi_type) && !blk_get_integrity(disk))
set_capacity(disk, 0);
else
-- 
2.14.2



[PATCH 4/6] nvme: don't pass struct nvme_ns to nvme_config_discard

2017-11-02 Thread Christoph Hellwig
To allow reusing this function for the multipath node.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/core.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8702e49c5c45..bfdbecdc307a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1118,29 +1118,26 @@ static void nvme_set_chunk_size(struct nvme_ns *ns)
blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(chunk_size));
 }
 
-static void nvme_config_discard(struct nvme_ns *ns)
+static void nvme_config_discard(struct nvme_ctrl *ctrl,
+   unsigned stream_alignment, struct request_queue *queue)
 {
-   struct nvme_ctrl *ctrl = ns->ctrl;
-   u32 logical_block_size = queue_logical_block_size(ns->queue);
+   u32 size = queue_logical_block_size(queue);
+
+   if (stream_alignment)
+   size *= stream_alignment;
 
BUILD_BUG_ON(PAGE_SIZE / sizeof(struct nvme_dsm_range) <
NVME_DSM_MAX_RANGES);
 
-   if (ctrl->nr_streams && ns->sws && ns->sgs) {
-   unsigned int sz = logical_block_size * ns->sws * ns->sgs;
+   queue->limits.discard_alignment = size;
+   queue->limits.discard_granularity = size;
 
-   ns->queue->limits.discard_alignment = sz;
-   ns->queue->limits.discard_granularity = sz;
-   } else {
-   ns->queue->limits.discard_alignment = logical_block_size;
-   ns->queue->limits.discard_granularity = logical_block_size;
-   }
-   blk_queue_max_discard_sectors(ns->queue, UINT_MAX);
-   blk_queue_max_discard_segments(ns->queue, NVME_DSM_MAX_RANGES);
-   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
+   blk_queue_max_discard_sectors(queue, UINT_MAX);
+   blk_queue_max_discard_segments(queue, NVME_DSM_MAX_RANGES);
+   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, queue);
 
if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
-   blk_queue_max_write_zeroes_sectors(ns->queue, UINT_MAX);
+   blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
 }
 
 static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
@@ -1164,6 +1161,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 {
struct nvme_ns *ns = disk->private_data;
struct nvme_ctrl *ctrl = ns->ctrl;
+   unsigned stream_alignment = 0;
u16 bs;
 
/*
@@ -1183,6 +1181,9 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
else
ns->pi_type = 0;
 
+   if (ctrl->nr_streams && ns->sws && ns->sgs)
+   stream_alignment = ns->sws * ns->sgs;
+
blk_mq_freeze_queue(disk->queue);
blk_integrity_unregister(disk);
 
@@ -1198,7 +1199,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
	set_capacity(disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
 
if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
-   nvme_config_discard(ns);
+   nvme_config_discard(ctrl, stream_alignment, disk->queue);
blk_mq_unfreeze_queue(disk->queue);
 }
 
-- 
2.14.2



nvme cleanups in preparation for the multipath code

2017-11-02 Thread Christoph Hellwig
Hi all,

below are a couple cleanup patches to prepare for the next version
of the nvme multipath code.


Re: [PATCH 16/17] nvme: implement multipath access to nvme subsystems

2017-11-02 Thread Christoph Hellwig
On Mon, Oct 30, 2017 at 11:37:55AM +0800, Guan Junxiong wrote:
> > +   head->disk->flags = GENHD_FL_EXT_DEVT;
> > +   sprintf(head->disk->disk_name, "nvme%dn%d",
> > +   ctrl->subsys->instance, nsid);
> 
> Is it okay to use head->instance instead of nsid for the disk name nvme#n#?
> Because _nsid_ sets are sometimes not contiguous, the disk name is ugly in
> that case.

This actually was supposed to be the ns_head instance, that's why the
ns_ida moved to the subsystem.  I've fixed it up.


Re: [PATCH] bcache: add a comment in journal bucket reading

2017-11-02 Thread Michael Lyle
Tang--

On 11/01/2017 08:08 PM, tang.jun...@zte.com.cn wrote:
> From: Tang Junhui 
> 
> Journal buckets form a circular buffer; the buckets
> can be like YYYNNNYY, which means the first valid journal is in
> the 7th bucket and the latest valid journal is in the third bucket; in
> this case, if we do not try the zero index first, we
> may get a valid journal in the 7th bucket, then call
> find_next_bit(bitmap, ca->sb.njournal_buckets, l + 1) to get the
> first invalid bucket after the 7th bucket. Because all these
> buckets are valid, no bit is 1 in the bitmap, so the find_next_bit()
> function returns ca->sb.njournal_buckets (8). After
> that, bcache only reads the journal in the 7th and 8th buckets;
> the first to the third buckets are lost.
> 
> So, it is important to let developers know that we need to try
> the zero index first in the hash-search, and to avoid any breakage
> of this in future code modifications.
> 
> Signed-off-by: Tang Junhui 

Fixed a small whitespace error at end of line.

Reviewed-by: Michael Lyle 

Thank you for making this much more clear.

Mike


Re: [PATCH V2 0/2] block: remove unnecessary RESTART

2017-11-02 Thread Bart Van Assche
On Wed, 2017-11-01 at 08:21 -0600, Jens Axboe wrote:
> Fixed that up, and applied these two patches as well.

Hello Jens,

Recently I noticed that a test system sporadically hangs during boot (a Dell
PowerEdge R720 that boots from a hard disk connected to a MegaRAID SAS adapter)
and also that srp-tests hangs systematically. Reverting the two patches from
this series fixes both issues. I'm not sure there is any solution other than
reverting the two patches from this series.

Bart.


BTW, the following appeared in the kernel log when I tried to run srp-tests
against a kernel with the two patches from this series applied:

INFO: task kworker/19:1:209 blocked for more than 480 seconds.
  Tainted: GW   4.14.0-rc7-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/19:1D0   209  2 0x8000
Workqueue: srp_remove srp_remove_work [ib_srp]
Call Trace:
 __schedule+0x2fa/0xbb0
 schedule+0x36/0x90
 async_synchronize_cookie_domain+0x88/0x130
 ? finish_wait+0x90/0x90
 async_synchronize_full_domain+0x18/0x20
 sd_remove+0x4d/0xc0 [sd_mod]
 device_release_driver_internal+0x160/0x210
 device_release_driver+0x12/0x20
 bus_remove_device+0x100/0x180
 device_del+0x1d8/0x340
 __scsi_remove_device+0xfc/0x130
 scsi_forget_host+0x25/0x70
 scsi_remove_host+0x79/0x120
 srp_remove_work+0x90/0x1d0 [ib_srp]
 process_one_work+0x20a/0x660
 worker_thread+0x3d/0x3b0
 kthread+0x13a/0x150
 ? process_one_work+0x660/0x660
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x27/0x40

Showing all locks held in the system:
1 lock held by khungtaskd/170:
 #0:  (tasklist_lock){.+.+}, at: [] debug_show_all_locks+0x3d/0x1a0
4 locks held by kworker/19:1/209:
 #0:  ("%s"("srp_remove")){+.+.}, at: [] process_one_work+0x195/0x660
 #1:  ((>remove_work)){+.+.}, at: [] process_one_work+0x195/0x660
 #2:  (>scan_mutex){+.+.}, at: [] scsi_remove_host+0x1f/0x120
 #3:  (>mutex){}, at: [] device_release_driver_internal+0x39/0x210
2 locks held by kworker/u66:0/1927:
 #0:  ("events_unbound"){+.+.}, at: [] process_one_work+0x195/0x660
 #1:  ((>work)){+.+.}, at: [] process_one_work+0x195/0x660
2 locks held by kworker/5:0/2047:
 #0:  ("kaluad"){+.+.}, at: [] process_one_work+0x195/0x660
 #1:  ((&(>rtpg_work)->work)){+.+.}, at: [] process_one_work+0x195/0x660


=
 

[PATCH V3 0/7] blk-mq: don't allocate driver tag beforehand for flush rq

2017-11-02 Thread Ming Lei
Hi Jens,

This patchset avoids allocating a driver tag beforehand for the flush rq
in case of an I/O scheduler; the flush rq is then no longer treated
specially wrt. get/put driver tag, and the code gets cleaned up much:
for example, reorder_tags_to_front() is removed, and we don't need to
worry about request order in the dispatch list for avoiding I/O deadlock.

'dbench -t 30 -s 64' has been run on different devices (shared tag,
multi-queue, single queue, ...), and no issues are observed; even when
very low queue depth tests are run, dbench still works well.

Please consider it for V4.15, thanks!


V3:
- rebase on the latest for-next of block tree
- clean up commit log and comment
- include Jianchao's fix on put driver tag

V2:
- release driver tag before requeue
- warning on inserting a req with driver tag


Jianchao Wang (1):
  blk-mq: put the driver tag of nxt rq before first one is requeued

Ming Lei (6):
  blk-flush: don't run queue for requests of bypassing flush
  block: pass 'run_queue' to blk_mq_request_bypass_insert
  blk-flush: use blk_mq_request_bypass_insert()
  blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ
  blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h
  blk-mq: don't allocate driver tag beforehand for flush rq

 block/blk-core.c |  2 +-
 block/blk-flush.c| 37 ++--
 block/blk-mq-sched.c | 63 +-
 block/blk-mq.c   | 97 ++--
 block/blk-mq.h   | 35 ++-
 5 files changed, 97 insertions(+), 137 deletions(-)

-- 
2.9.5



[PATCH V3 1/7] blk-mq: put the driver tag of nxt rq before first one is requeued

2017-11-02 Thread Ming Lei
From: Jianchao Wang 

When freeing the driver tag of the next rq with an I/O scheduler
configured, we get the first entry of the list. However, at that
moment the failed rq has been requeued at the head of the list,
so the rq we get is the failed rq, not the next rq.
Free the driver tag of the next rq before the failed one is requeued
in the failure branch of the queue_rq callback; it is only needed
there.

Signed-off-by: Jianchao Wang 
---
 block/blk-mq.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e4d2490f4e7e..fe14b28760eb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1091,7 +1091,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
bool got_budget)
 {
struct blk_mq_hw_ctx *hctx;
-   struct request *rq;
+   struct request *rq, *nxt;
int errors, queued;
 
if (list_empty(list))
@@ -1153,14 +1153,20 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
if (list_empty(list))
bd.last = true;
else {
-   struct request *nxt;
-
nxt = list_first_entry(list, struct request, queuelist);
bd.last = !blk_mq_get_driver_tag(nxt, NULL, false);
}
 
	ret = q->mq_ops->queue_rq(hctx, &bd);
if (ret == BLK_STS_RESOURCE) {
+   /*
+    * If an I/O scheduler has been configured and we got a
+    * driver tag for the next request already, free it again.
+    */
+   if (!list_empty(list)) {
+   nxt = list_first_entry(list, struct request, queuelist);
+   blk_mq_put_driver_tag(nxt);
+   }
blk_mq_put_driver_tag_hctx(hctx, rq);
list_add(&rq->queuelist, list);
__blk_mq_requeue_request(rq);
@@ -1184,13 +1190,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 * that is where we will continue on next queue run.
 */
if (!list_empty(list)) {
-   /*
-* If an I/O scheduler has been configured and we got a driver
-* tag for the next request already, free it again.
-*/
-   rq = list_first_entry(list, struct request, queuelist);
-   blk_mq_put_driver_tag(rq);
-
spin_lock(&hctx->lock);
list_splice_init(list, &hctx->dispatch);
spin_unlock(&hctx->lock);
-- 
2.9.5



[PATCH V3 4/7] blk-flush: use blk_mq_request_bypass_insert()

2017-11-02 Thread Ming Lei
In the following patch, we will use RQF_FLUSH_SEQ to decide:

1) if the flag isn't set, the flush rq needs to be inserted via
blk_insert_flush()

2) otherwise, the flush rq needs to be dispatched directly since
it is in the flush machinery now.

So we use blk_mq_request_bypass_insert() for requests of bypassing
flush machinery, just like what the legacy path did.

Signed-off-by: Ming Lei 
---
 block/blk-flush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 81bd1a843043..a9773d2075ac 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -463,7 +463,7 @@ void blk_insert_flush(struct request *rq)
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
if (q->mq_ops)
-   blk_mq_sched_insert_request(rq, false, false, false, false);
+   blk_mq_request_bypass_insert(rq, false);
else
list_add_tail(>queuelist, >queue_head);
return;
-- 
2.9.5



[PATCH V3 2/7] blk-flush: don't run queue for requests of bypassing flush

2017-11-02 Thread Ming Lei
blk_insert_flush() should only insert the request, since a queue run always
follows it.

In case of bypassing flush, we don't need to run the queue because every
blk_insert_flush() is followed by one queue run.

Signed-off-by: Ming Lei 
---
 block/blk-flush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 4938bec8cfef..81bd1a843043 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -463,7 +463,7 @@ void blk_insert_flush(struct request *rq)
if ((policy & REQ_FSEQ_DATA) &&
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
if (q->mq_ops)
-   blk_mq_sched_insert_request(rq, false, true, false, false);
+   blk_mq_sched_insert_request(rq, false, false, false, false);
else
list_add_tail(>queuelist, >queue_head);
return;
-- 
2.9.5



[PATCH V3 5/7] blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ

2017-11-02 Thread Ming Lei
In case of an I/O scheduler we always pre-allocate one driver tag before
calling blk_insert_flush(), and the flush request will be marked as
RQF_FLUSH_SEQ once it is in the flush machinery.

So if RQF_FLUSH_SEQ isn't set, we call blk_insert_flush() to handle
the request, otherwise the flush request is dispatched to the ->dispatch
list directly.

This is a preparation patch for not pre-allocating a driver tag for the
flush rq, and for not treating the flush rq as a special case, just like
what the legacy path did.

Signed-off-by: Ming Lei 
---
 block/blk-mq-sched.c | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 7775f6b12fa9..32e4ea2d0a77 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -355,21 +355,23 @@ void blk_mq_sched_request_inserted(struct request *rq)
 EXPORT_SYMBOL_GPL(blk_mq_sched_request_inserted);
 
 static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
+  bool has_sched,
   struct request *rq)
 {
-   if (rq->tag == -1) {
+   /* dispatch flush rq directly */
+   if (rq->rq_flags & RQF_FLUSH_SEQ) {
+   spin_lock(&hctx->lock);
+   list_add(&rq->queuelist, &hctx->dispatch);
+   spin_unlock(&hctx->lock);
+   return true;
+   }
+
+   if (has_sched) {
rq->rq_flags |= RQF_SORTED;
-   return false;
+   WARN_ON(rq->tag != -1);
}
 
-   /*
-* If we already have a real request tag, send directly to
-* the dispatch list.
-*/
-   spin_lock(&hctx->lock);
-   list_add(&rq->queuelist, &hctx->dispatch);
-   spin_unlock(&hctx->lock);
-   return true;
+   return false;
 }
 
 /*
@@ -395,12 +397,13 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
struct blk_mq_ctx *ctx = rq->mq_ctx;
struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
 
-   if (rq->tag == -1 && op_is_flush(rq->cmd_flags)) {
+   /* flush rq in flush machinery need to be dispatched directly */
+   if (!(rq->rq_flags & RQF_FLUSH_SEQ) && op_is_flush(rq->cmd_flags)) {
blk_mq_sched_insert_flush(hctx, rq, can_block);
return;
}
 
-   if (e && blk_mq_sched_bypass_insert(hctx, rq))
+   if (blk_mq_sched_bypass_insert(hctx, !!e, rq))
goto run;
 
if (e && e->type->ops.mq.insert_requests) {
@@ -438,7 +441,7 @@ void blk_mq_sched_insert_requests(struct request_queue *q,
list_for_each_entry_safe(rq, next, list, queuelist) {
if (WARN_ON_ONCE(rq->tag != -1)) {
list_del_init(&rq->queuelist);
-   blk_mq_sched_bypass_insert(hctx, rq);
+   blk_mq_sched_bypass_insert(hctx, true, rq);
}
}
}
-- 
2.9.5



[PATCH V3 6/7] blk-mq: move blk_mq_put_driver_tag*() into blk-mq.h

2017-11-02 Thread Ming Lei
We need these helpers to put the driver tag of the flush rq, since the
following patch stops sharing the tag in the flush request sequence when
an I/O scheduler is applied.

Signed-off-by: Ming Lei 
---
 block/blk-mq.c | 32 
 block/blk-mq.h | 33 +
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5e2866b83305..bf62065a15ec 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -993,38 +993,6 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
return rq->tag != -1;
 }
 
-static void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
-   struct request *rq)
-{
-   blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
-   rq->tag = -1;
-
-   if (rq->rq_flags & RQF_MQ_INFLIGHT) {
-   rq->rq_flags &= ~RQF_MQ_INFLIGHT;
-   atomic_dec(&hctx->nr_active);
-   }
-}
-
-static void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
-  struct request *rq)
-{
-   if (rq->tag == -1 || rq->internal_tag == -1)
-   return;
-
-   __blk_mq_put_driver_tag(hctx, rq);
-}
-
-static void blk_mq_put_driver_tag(struct request *rq)
-{
-   struct blk_mq_hw_ctx *hctx;
-
-   if (rq->tag == -1 || rq->internal_tag == -1)
-   return;
-
-   hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
-   __blk_mq_put_driver_tag(hctx, rq);
-}
-
 /*
  * If we fail getting a driver tag because all the driver tags are already
  * assigned and on the dispatch list, BUT the first entry does not have a
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 713be5ae83ba..f55d39b0c6ed 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -2,6 +2,7 @@
 #define INT_BLK_MQ_H
 
 #include "blk-stat.h"
+#include "blk-mq-tag.h"
 
 struct blk_mq_tag_set;
 
@@ -157,4 +158,36 @@ static inline blk_status_t blk_mq_get_dispatch_budget(
return BLK_STS_OK;
 }
 
+static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
+  struct request *rq)
+{
+   blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
+   rq->tag = -1;
+
+   if (rq->rq_flags & RQF_MQ_INFLIGHT) {
+   rq->rq_flags &= ~RQF_MQ_INFLIGHT;
+   atomic_dec(&hctx->nr_active);
+   }
+}
+
+static inline void blk_mq_put_driver_tag_hctx(struct blk_mq_hw_ctx *hctx,
+  struct request *rq)
+{
+   if (rq->tag == -1 || rq->internal_tag == -1)
+   return;
+
+   __blk_mq_put_driver_tag(hctx, rq);
+}
+
+static inline void blk_mq_put_driver_tag(struct request *rq)
+{
+   struct blk_mq_hw_ctx *hctx;
+
+   if (rq->tag == -1 || rq->internal_tag == -1)
+   return;
+
+   hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu);
+   __blk_mq_put_driver_tag(hctx, rq);
+}
+
 #endif
-- 
2.9.5



[PATCH V3 7/7] blk-mq: don't allocate driver tag beforehand for flush rq

2017-11-02 Thread Ming Lei
The idea behind this is simple:

1) for the none scheduler, the driver tag has to be borrowed for the flush
rq, otherwise we may run out of tags and cause an IO hang. And get/put
driver tag is actually a noop there, so reordering tags isn't necessary
at all.

2) for a real I/O scheduler, we don't need to allocate a driver tag
beforehand for the flush rq; it works just fine to follow the way of
normal requests: allocate the driver tag for each rq just before calling
.queue_rq().

One visible change to drivers is that the driver tag isn't shared in the
flush request sequence; that won't be a problem, since we always did it
that way in the legacy path.

Then the flush rq doesn't need to be treated specially wrt. get/put driver
tag, and the code gets cleaned up much: for example, reorder_tags_to_front()
is removed, and we don't need to worry about request order in the dispatch
list for avoiding I/O deadlock.

Also we have to put the driver tag before requeueing.

Signed-off-by: Ming Lei 
---
 block/blk-flush.c| 35 ++-
 block/blk-mq-sched.c | 42 +-
 block/blk-mq.c   | 41 ++---
 3 files changed, 37 insertions(+), 81 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index a9773d2075ac..f17170675917 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -231,8 +231,13 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
	/* release the tag's ownership to the req cloned from */
	spin_lock_irqsave(&fq->mq_flush_lock, flags);
hctx = blk_mq_map_queue(q, flush_rq->mq_ctx->cpu);
-   blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
-   flush_rq->tag = -1;
+   if (!q->elevator) {
+   blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
+   flush_rq->tag = -1;
+   } else {
+   blk_mq_put_driver_tag_hctx(hctx, flush_rq);
+   flush_rq->internal_tag = -1;
+   }
}
 
	running = &fq->flush_queue[fq->flush_running_idx];
@@ -318,19 +323,26 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
blk_rq_init(q, flush_rq);
 
/*
-* Borrow tag from the first request since they can't
-* be in flight at the same time. And acquire the tag's
-* ownership for flush req.
+* In case of none scheduler, borrow tag from the first request
+* since they can't be in flight at the same time. And acquire
+* the tag's ownership for flush req.
+*
+* In case of IO scheduler, flush rq need to borrow scheduler tag
+* just for cheating put/get driver tag.
 */
if (q->mq_ops) {
struct blk_mq_hw_ctx *hctx;
 
flush_rq->mq_ctx = first_rq->mq_ctx;
-   flush_rq->tag = first_rq->tag;
-   fq->orig_rq = first_rq;
 
-   hctx = blk_mq_map_queue(q, first_rq->mq_ctx->cpu);
-   blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
+   if (!q->elevator) {
+   fq->orig_rq = first_rq;
+   flush_rq->tag = first_rq->tag;
+   hctx = blk_mq_map_queue(q, first_rq->mq_ctx->cpu);
+   blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
+   } else {
+   flush_rq->internal_tag = first_rq->internal_tag;
+   }
}
 
flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
@@ -394,6 +406,11 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
 
hctx = blk_mq_map_queue(q, ctx->cpu);
 
+   if (q->elevator) {
+   WARN_ON(rq->tag < 0);
+   blk_mq_put_driver_tag_hctx(hctx, rq);
+   }
+
/*
 * After populating an empty queue, kick it to avoid stall.  Read
 * the comment in flush_end_io().
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 32e4ea2d0a77..eb082f82b470 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -366,29 +366,12 @@ static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
return true;
}
 
-   if (has_sched) {
+   if (has_sched)
rq->rq_flags |= RQF_SORTED;
-   WARN_ON(rq->tag != -1);
-   }
 
return false;
 }
 
-/*
- * Add flush/fua to the queue. If we fail getting a driver tag, then
- * punt to the requeue list. Requeue will re-invoke us from a context
- * that's safe to block from.
- */
-static void blk_mq_sched_insert_flush(struct blk_mq_hw_ctx *hctx,
- struct request *rq, bool can_block)
-{
-   if (blk_mq_get_driver_tag(rq, &hctx, can_block)) {
-   blk_insert_flush(rq);
-   blk_mq_run_hw_queue(hctx, true);
-   } else
-   blk_mq_add_to_requeue_list(rq, false, true);
-}
-
 

Re: [PATCH] ide: Make ide_cdrom_prep_fs() initialize the sense buffer pointer

2017-11-02 Thread Bart Van Assche
On Thu, 2017-11-02 at 10:36 +0800, Hongxu Jia wrote:
> Apply this patch, and the test on my platform is passed.

Thank you for having tested this patch. Does this mean that you are OK
with adding the following to this patch: Tested-by: Hongxu Jia 
 ?

Bart.


Re: [PATCH] ide: Make ide_cdrom_prep_fs() initialize the sense buffer pointer

2017-11-02 Thread Jens Axboe
On 11/01/2017 04:31 PM, Bart Van Assche wrote:
> The changes introduced through commit 82ed4db499b8 assume that the
> sense buffer pointer in struct scsi_request is initialized for all
> requests - passthrough and filesystem requests. Hence make sure
> that that pointer is initialized for filesystem requests. Remove
> the memset() call that clears .cmd because the scsi_req_init()
> call in ide_initialize_rq() already initializes the .cmd.

I'll queue this up for 4.15, post the initial merge.

-- 
Jens Axboe



Re: [PATCH] skd: use ktime_get_real_seconds()

2017-11-02 Thread Jens Axboe
On 11/02/2017 05:42 AM, Arnd Bergmann wrote:
> Like many storage drivers, skd uses an unsigned 32-bit number for
> interchanging the current time with the firmware. This will overflow in
> y2106 and is otherwise safe.
> 
> However, the get_seconds() function is generally considered deprecated
> since the behavior is different between 32-bit and 64-bit architectures,
> and using it may indicate a bigger problem.
> 
> To annotate that we've thought about this, let's add a comment here
> and migrate to the ktime_get_real_seconds() function that consistently
> returns a 64-bit number.

Thanks Arnd, applied.

-- 
Jens Axboe



Re: [bug report after v4.5-rc1]block: When the scsi device has a timeout IO, the scsi device is stuck when it is deleted

2017-11-02 Thread Zouming (IT)
OK, thanks.

-----Original Message-----
From: chenxiang (M) 
Sent: 2017-11-02 20:34
To: Zouming (IT) ; linux-block@vger.kernel.org; 
ax...@fb.com
Cc: wangzhoumengjian 
Subject: Re: [bug report after v4.5-rc1]block: When the scsi device has a timeout 
IO, the scsi device is stuck when it is deleted

On 2017/11/2 20:16, Zouming (IT) wrote:
> 1.Reproduction steps:
> (1) send IO on the device /dev/sdx.
> (2) Simulate an IO loss
> (3) Use the command below to delete the scsi device before the IO times out
>echo 1 > /sys/class/sdx/device/delete
>
> 2.The stack of the delete thread is as follows:
> [] msleep+0x2f/0x40
> [] __blk_drain_queue+0xa4/0x170
> [] blk_cleanup_queue+0x13d/0x150
> [] __scsi_remove_device+0x4a/0xd0
> [] scsi_remove_device+0x26/0x40
> [] sdev_store_delete_callback+0x15/0x20
> [] sysfs_schedule_callback_work+0x14/0x60
> [] process_one_work+0x17a/0x440
> [] worker_thread+0x126/0x3c0
> [] kthread+0xcf/0xe0
> [] ret_from_fork+0x58/0x90
>
> 3.The reason is as follows:
>(1) When the scsi device is deleted, the blk_cleanup_queue function is
>  invoked to set the dying flag of the request_queue and wait for all
>  IO to come back.
>
>(2) When an IO times out, the timeout workqueue invokes the blk_timeout_work
> function to abort the IO, but it will not abort the IO because it calls the
> blk_queue_enter function, sees that the request_queue is dying, and returns
> directly without doing anything.

Hi Zouming,
You can test Bart's patch "[PATCH] block: Fix a race between
blk_cleanup_queue() and timeout handling" for this issue.
I think this patch can solve your issue.





Re: [bug report after v4.5-rc1]block: When the scsi device has a timeout IO, the scsi device is stuck when it is deleted

2017-11-02 Thread chenxiang (M)

On 2017/11/2 20:16, Zouming (IT) wrote:

1.Reproduction steps:
(1) send IO on the device /dev/sdx.
(2) Simulate an IO loss
(3) Use the command below to delete the scsi device before the IO times out
   echo 1 > /sys/class/sdx/device/delete

2.The stack of the delete thread is as follows:
[] msleep+0x2f/0x40
[] __blk_drain_queue+0xa4/0x170
[] blk_cleanup_queue+0x13d/0x150
[] __scsi_remove_device+0x4a/0xd0
[] scsi_remove_device+0x26/0x40
[] sdev_store_delete_callback+0x15/0x20
[] sysfs_schedule_callback_work+0x14/0x60
[] process_one_work+0x17a/0x440
[] worker_thread+0x126/0x3c0
[] kthread+0xcf/0xe0
[] ret_from_fork+0x58/0x90

3.The reason is as follows:
   (1) When the scsi device is deleted, the blk_cleanup_queue function is invoked
 to set the dying flag of the request_queue and wait for all IO to come back.

   (2) When an IO times out, the timeout workqueue invokes the blk_timeout_work
function to abort the IO, but it will not abort the IO because it calls the
blk_queue_enter function, sees that the request_queue is dying, and returns
directly without doing anything.


Hi Zouming,
You can test Bart's patch "[PATCH] block: Fix a race between
blk_cleanup_queue() and timeout handling" against this issue.

I think this patch can solve your issue.





[bug report after v4.5-rc1]block: When the scsi device has a timeout IO, the scsi device is stuck when it is deleted

2017-11-02 Thread Zouming (IT)
1. Steps to reproduce:
(1) Send IO on the device /dev/sdx.
(2) Simulate a lost IO.
(3) Use the command below to delete the scsi device before the IO times out:
      echo 1 > /sys/class/sdx/device/delete

2. The stack of the delete thread is as below:
[] msleep+0x2f/0x40 
[] __blk_drain_queue+0xa4/0x170 
[] blk_cleanup_queue+0x13d/0x150 
[] __scsi_remove_device+0x4a/0xd0 
[] scsi_remove_device+0x26/0x40 
[] sdev_store_delete_callback+0x15/0x20 
[] sysfs_schedule_callback_work+0x14/0x60 
[] process_one_work+0x17a/0x440 
[] worker_thread+0x126/0x3c0 
[] kthread+0xcf/0xe0 
[] ret_from_fork+0x58/0x90  

3. The reason is as below:
  (1) When the scsi device is deleted, the blk_cleanup_queue function is
        invoked to set the dying flag of the request_queue and wait for all IO
        to complete.

  (2) When an IO times out, the timeout workqueue invokes the blk_timeout_work
function to abort the IO,
       but the IO is never aborted, because blk_timeout_work calls the
       blk_queue_enter function, which sees that the request_queue is dying
       and returns directly without doing anything.



[PATCH] skd: use ktime_get_real_seconds()

2017-11-02 Thread Arnd Bergmann
Like many storage drivers, skd uses an unsigned 32-bit number for
interchanging the current time with the firmware. This will overflow in
y2106 and is otherwise safe.

However, the get_seconds() function is generally considered deprecated
since the behavior is different between 32-bit and 64-bit architectures,
and using it may indicate a bigger problem.

To annotate that we've thought about this, let's add a comment here
and migrate to the ktime_get_real_seconds() function that consistently
returns a 64-bit number.

Signed-off-by: Arnd Bergmann 
---
 drivers/block/skd_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c
index 64d0fc17c174..2819f23e8bf2 100644
--- a/drivers/block/skd_main.c
+++ b/drivers/block/skd_main.c
@@ -1967,7 +1967,8 @@ static void skd_isr_msg_from_dev(struct skd_device *skdev)
break;
 
case FIT_MTD_CMD_LOG_HOST_ID:
-   skdev->connect_time_stamp = get_seconds();
+   /* hardware interface overflows in y2106 */
+   skdev->connect_time_stamp = (u32)ktime_get_real_seconds();
data = skdev->connect_time_stamp & 0x;
mtd = FIT_MXD_CONS(FIT_MTD_CMD_LOG_TIME_STAMP_LO, 0, data);
SKD_WRITEL(skdev, mtd, FIT_MSG_TO_DEVICE);
-- 
2.9.0