Re: [LSF/MM TOPIC][LSF/MM ATTEND] OCSSDs - SMR, Hierarchical Interface, and Vector I/Os

2017-01-10 Thread Hannes Reinecke
On 01/11/2017 05:07 AM, Damien Le Moal wrote:
> 
> Matias,
> 
> On 1/10/17 22:06, Matias Bjorling wrote:
>> On 01/10/2017 05:24 AM, Theodore Ts'o wrote:
>>> This may be an area where if we can create the right framework, and
>>> fund some research work, we might be able to get some researchers and
>>> their graduate students interested in doing some work in figuring out
>>> what sort of divisions of responsibilities and hints back and forth
>>> between the storage device and host have the most benefit.
>>>
>>
>> That is a good idea. There is a couple of papers at FAST with
>> Open-Channel SSDs this year.  They look into the interface and various
>> ways to reduce latency fluctuations.
>>
>> One thing I've heard a couple of times is the feature to move the GC
>> read/write process into the firmware. Enabling the host to offload GC
>> data movement, while the keeping the host in control. Would this be
>> beneficial for SMR?
> 
> Host-aware SMR drives already have GC internally implemented (for cases
> when the host does not write sequentially). Host-managed drives do not.
> As for moving an application specific GC code into the device, well,
> code injection in the storage device is not for tomorrow, and likely not
> ever.
> 
> There are however other clever ways to reduce GC related host overhead
> with basic commands. For SCSI, these may be WRITE SCATTERED, EXTENDED
> COPY, and some others can greatly improve overhead over a simple
> read+write loop. A better approach to GC offload may not be a "GC"
> command, but something more generic for moving around LBAs internally
> within the device. That is, if existing commands are not satisfactory.
> 
Logical head depop rears its head again...

But yes, I think it's more sensible to have I/O functions which help GC
(like UNMAP) instead of influencing the GC itself.

Anyway. Given the length of this thread I guess this is a worthy topic
for LSF.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [LFS/MM TOPIC][LFS/MM ATTEND]: - Storage Stack and Driver Testing methodology.

2017-01-10 Thread Hannes Reinecke
On 01/10/2017 11:40 PM, Chaitanya Kulkarni wrote:
> Resending it at as a plain text.
> 
> From: Chaitanya Kulkarni
> Sent: Tuesday, January 10, 2017 2:37 PM
> To: lsf...@lists.linux-foundation.org
> Cc: linux-fsde...@vger.kernel.org; linux-block@vger.kernel.org; 
> linux-n...@lists.infradead.org; linux-s...@vger.kernel.org; 
> linux-...@vger.kernel.org
> Subject: [LFS/MM TOPIC][LFS/MM ATTEND]: - Storage Stack and Driver Testing 
> methodology.
>   
> 
> Hi Folks,
> 
> I would like to propose a general discussion on Storage stack and device 
> driver testing.
> 
> Purpose:-
> -
> The main objective of this discussion is to address the need for 
> a Unified Test Automation Framework which can be used by different subsystems
> in the kernel in order to improve the overall development and stability
> of the storage stack.
> 
> For Example:- 
> From my previous experience, I've worked on the NVMe driver testing last year 
> and we
> have developed simple unit test framework
>  (https://github.com/linux-nvme/nvme-cli/tree/master/tests). 
> In current implementation Upstream NVMe Driver supports following subsystems:-
> 1. PCI Host.
> 2. RDMA Target.
> 3. Fiber Channel Target (in progress).
> Today due to lack of centralized automated test framework NVMe Driver testing 
> is 
> scattered and performed using the combination of various utilities like 
> nvme-cli/tests, 
> nvmet-cli, shell scripts (git://git.infradead.org/nvme-fabrics.git 
> nvmf-selftests) etc.
> 
> In order to improve overall driver stability with various subsystems, it will 
> be beneficial
> to have a Unified Test Automation Framework (UTAF) which will centralize 
> overall
> testing. 
> 
> This topic will allow developers from various subsystem engage in the 
> discussion about 
> how to collaborate efficiently instead of having discussions on lengthy email 
> threads.
> 
> Participants:-
> --
> I'd like to invite developers from different subsystems to discuss an 
> approach towards 
> a unified testing methodology for storage stack and device drivers belongs to 
> different subsystems.
> 
> Topics for Discussion:-
> --
> As a part of discussion following are some of the key points which we can 
> focus on:-
> 1. What are the common components of the kernel used by the various 
> subsystems?
> 2. What are the potential target drivers which can benefit from this 
> approach? 
>   (e.g. NVMe, NVMe Over Fabric, Open Channel Solid State Drives etc.)
> 3. What are the desired features that can be implemented in this Framework?
>   (code coverage, unit tests, stress testings, regression, generating 
> Coccinelle reports etc.) 
> 4. Desirable Report generation mechanism?
> 5. Basic performance validation?
> 6. Whether QEMU can be used to emulate some of the H/W functionality to 
> create a test 
>   platform? (Optional subsystem specific)
> 
> Some background about myself I'm Chaitanya Kulkarni, I worked as a team lead 
> which was responsible for delivering scalable multiplatform Automated Test 
> Framework for device drivers testing at HGST. It's been used for more than 1 
> year on 
> Linux/Windows for unit testing/regression/performance validation of the NVMe 
> Linux and
> Windows driver successfully. I've also recently started contributing to the 
> 
> NVMe Host and NVMe over Fabrics Target driver.
> 
Oh, yes, please.
That's a discussion I'd like to have, too.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [LSF/MM TOPIC][LSF/MM ATTEND] OCSSDs - SMR, Hierarchical Interface, and Vector I/Os

2017-01-10 Thread Matias Bjorling
On 01/11/2017 05:07 AM, Damien Le Moal wrote:
> 
> Matias,
> 
> On 1/10/17 22:06, Matias Bjorling wrote:
>> On 01/10/2017 05:24 AM, Theodore Ts'o wrote:
>>> This may be an area where if we can create the right framework, and
>>> fund some research work, we might be able to get some researchers and
>>> their graduate students interested in doing some work in figuring out
>>> what sort of divisions of responsibilities and hints back and forth
>>> between the storage device and host have the most benefit.
>>>
>>
>> That is a good idea. There is a couple of papers at FAST with
>> Open-Channel SSDs this year.  They look into the interface and various
>> ways to reduce latency fluctuations.
>>
>> One thing I've heard a couple of times is the feature to move the GC
>> read/write process into the firmware. Enabling the host to offload GC
>> data movement, while the keeping the host in control. Would this be
>> beneficial for SMR?
> 
> Host-aware SMR drives already have GC internally implemented (for cases
> when the host does not write sequentially). Host-managed drives do not.
> As for moving an application specific GC code into the device, well,
> code injection in the storage device is not for tomorrow, and likely not
> ever.
> 
> There are however other clever ways to reduce GC related host overhead
> with basic commands. For SCSI, these may be WRITE SCATTERED, EXTENDED
> COPY, and some others can greatly improve overhead over a simple
> read+write loop. A better approach to GC offload may not be a "GC"
> command, but something more generic for moving around LBAs internally
> within the device. That is, if existing commands are not satisfactory.

Hi Damien,

You're right. I was thinking of something similar to scattered
read/write to move data from one place to another. There is no
sector-granularity mapping table maintained by the OCSSD, which leaves
the logic up to the host.

Let me know if you decide to kick of a standardized interface for code
injection. Such an interface is long overdue. ;)


--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] block: loose check on sg gap

2017-01-10 Thread Dexuan Cui
> From: Dexuan Cui
> Sent: Tuesday, December 20, 2016 11:41
> To: 'Jens Axboe' ; Ming Lei 
> Cc: Linux Kernel Mailing List ; linux-block
> ; Christoph Hellwig ;
> Vitaly Kuznetsov ; Keith Busch
> ; Hannes Reinecke ; Mike Christie
> ; Martin K. Petersen ;
> Toshi Kani ; Dan Williams ;
> Damien Le Moal 
> Subject: RE: [PATCH] block: loose check on sg gap
> 
> > From: Jens Axboe [mailto:ax...@fb.com]
> > Sent: Tuesday, December 20, 2016 10:31
> > To: Ming Lei 
> > Cc: Linux Kernel Mailing List ; linux-block
>  > bl...@vger.kernel.org>; Christoph Hellwig ; Dexuan
> Cui
> > ; Vitaly Kuznetsov ; Keith
> Busch
> > ; Hannes Reinecke ; Mike Christie
> > ; Martin K. Petersen
> ;
> > Toshi Kani ; Dan Williams
> ;
> > Damien Le Moal 
> > Subject: Re: [PATCH] block: loose check on sg gap
> >
> > On 12/19/2016 07:07 PM, Ming Lei wrote:
> > > On Sun, Dec 18, 2016 at 12:49 AM, Jens Axboe  wrote:
> > >> On 12/17/2016 03:49 AM, Ming Lei wrote:
> > >>> If the last bvec of the 1st bio and the 1st bvec of the next
> > >>> bio are contineous physically, and the latter can be merged
> > >>> to last segment of the 1st bio, we should think they don't
> > >>> violate sg gap(or virt boundary) limit.
> > >>>
> > >>> Both Vitaly and Dexuan reported lots of unmergeable small bios
> > >>> are observed when running mkfs on Hyper-V virtual storage, and
> > >>> performance becomes quite low, so this patch is figured out for
> > >>> fixing the performance issue.
> > >>>
> > >>> The same issue should exist on NVMe too sine it sets virt boundary
> too.
> > >>
> > >> It looks pretty reasonable to me. I'll queue it up for some testing,
> > >> changes like this always make me a little nervous.
> > >
> > > Understood.
> > >
> > > But given it is still in early stage of 4.10 cycle, seems fine to expose
> > > it now, and we should have enough time to fix it if there might be
> > > regressions.
> > >
> > > BTW, it passes my xfstest(ext4) over sata/NVMe.
> >
> > It's been fine here in testing, too. I'm not worried about performance
> > regressions, those we can always fix. Merging makes me worried about
> > corruption, and those regressions are much worse.
> >
> > Any reason we need to rush this? I'd be more comfortable pushing this to
> > 4.11, unless there are strong reasons this should make 4.10.
> >
> > --
> > Jens Axboe
> 
> Hi Jens,
> 
> As far as I know, the patch is important to popular Linux distros,
> e.g. at least Ubuntu 14.04.5, 16.x and RHEL 7.3, when they run on
> Hyper-V/Azure, because they can suffer from a pretty bad
> throughput/latency
> in some cases, e.g. mkfs.ext4 for a 100GB partition can take 8 minutes, but
> with the patch, it only takes 1 second.
> 
> -- Dexuan

Hi Ming, Jens,
Did you find any issue later when testing with the patch?

May I know if it's possible to have it in 4.10 considering the above impact?

Is it on some temporary branch of linux-block.git? Looks not.

Thanks,
-- Dexuan


Re: [LSF/MM TOPIC][LSF/MM ATTEND] OCSSDs - SMR, Hierarchical Interface, and Vector I/Os

2017-01-10 Thread Damien Le Moal

Matias,

On 1/10/17 22:06, Matias Bjorling wrote:
> On 01/10/2017 05:24 AM, Theodore Ts'o wrote:
>> This may be an area where if we can create the right framework, and
>> fund some research work, we might be able to get some researchers and
>> their graduate students interested in doing some work in figuring out
>> what sort of divisions of responsibilities and hints back and forth
>> between the storage device and host have the most benefit.
>>
> 
> That is a good idea. There is a couple of papers at FAST with
> Open-Channel SSDs this year.  They look into the interface and various
> ways to reduce latency fluctuations.
> 
> One thing I've heard a couple of times is the feature to move the GC
> read/write process into the firmware. Enabling the host to offload GC
> data movement, while the keeping the host in control. Would this be
> beneficial for SMR?

Host-aware SMR drives already have GC internally implemented (for cases
when the host does not write sequentially). Host-managed drives do not.
As for moving an application specific GC code into the device, well,
code injection in the storage device is not for tomorrow, and likely not
ever.

There are however other clever ways to reduce GC related host overhead
with basic commands. For SCSI, these may be WRITE SCATTERED, EXTENDED
COPY, and some others can greatly improve overhead over a simple
read+write loop. A better approach to GC offload may not be a "GC"
command, but something more generic for moving around LBAs internally
within the device. That is, if existing commands are not satisfactory.

Best.

-- 
Damien Le Moal, Ph.D.
Sr. Manager, System Software Research Group,
Western Digital Corporation
damien.lem...@wdc.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.wdc.com, www.hgst.com
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 13/15] scsi: allocate scsi_cmnd structures as part of struct request

2017-01-10 Thread Christoph Hellwig
Rely on the new block layer functionality to allocate additional driver
specific data behind struct request instead of implementing it in SCSI
itѕelf.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/hosts.c  |  20 +--
 drivers/scsi/scsi.c   | 319 --
 drivers/scsi/scsi_error.c |  17 ++-
 drivers/scsi/scsi_lib.c   | 122 --
 drivers/scsi/scsi_priv.h  |   8 +-
 include/scsi/scsi_host.h  |   3 -
 6 files changed, 95 insertions(+), 394 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 6d29c4a..831a1c8 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -230,19 +230,6 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,
}
}
 
-   /*
-* Note that we allocate the freelist even for the MQ case for now,
-* as we need a command set aside for scsi_reset_provider.  Having
-* the full host freelist and one command available for that is a
-* little heavy-handed, but avoids introducing a special allocator
-* just for this.  Eventually the structure of scsi_reset_provider
-* will need a major overhaul.
-*/
-   error = scsi_setup_command_freelist(shost);
-   if (error)
-   goto out_destroy_tags;
-
-
if (!shost->shost_gendev.parent)
shost->shost_gendev.parent = dev ? dev : _bus;
if (!dma_dev)
@@ -262,7 +249,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,
 
error = device_add(>shost_gendev);
if (error)
-   goto out_destroy_freelist;
+   goto out_disable_runtime_pm;
 
scsi_host_set_state(shost, SHOST_RUNNING);
get_device(shost->shost_gendev.parent);
@@ -312,13 +299,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, 
struct device *dev,
device_del(>shost_dev);
  out_del_gendev:
device_del(>shost_gendev);
- out_destroy_freelist:
+ out_disable_runtime_pm:
device_disable_async_suspend(>shost_gendev);
pm_runtime_disable(>shost_gendev);
pm_runtime_set_suspended(>shost_gendev);
pm_runtime_put_noidle(>shost_gendev);
-   scsi_destroy_command_freelist(shost);
- out_destroy_tags:
if (shost_use_blk_mq(shost))
scsi_mq_destroy_tags(shost);
  fail:
@@ -359,7 +344,6 @@ static void scsi_host_dev_release(struct device *dev)
kfree(dev_name(>shost_dev));
}
 
-   scsi_destroy_command_freelist(shost);
if (shost_use_blk_mq(shost)) {
if (shost->tag_set.tags)
scsi_mq_destroy_tags(shost);
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 2e24f31..3d8d215 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -98,163 +98,6 @@ EXPORT_SYMBOL(scsi_sd_probe_domain);
 ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain);
 EXPORT_SYMBOL(scsi_sd_pm_domain);
 
-struct scsi_host_cmd_pool {
-   struct kmem_cache   *cmd_slab;
-   unsigned intusers;
-   char*cmd_name;
-};
-
-static struct scsi_host_cmd_pool scsi_cmd_pool = {
-   .cmd_name   = "scsi_cmd_cache",
-};
-
-static DEFINE_MUTEX(host_cmd_pool_mutex);
-
-/**
- * scsi_host_free_command - internal function to release a command
- * @shost: host to free the command for
- * @cmd:   command to release
- *
- * the command must previously have been allocated by
- * scsi_host_alloc_command.
- */
-static void
-scsi_host_free_command(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
-{
-   struct scsi_host_cmd_pool *pool = shost->cmd_pool;
-
-   if (cmd->prot_sdb)
-   kmem_cache_free(scsi_sdb_cache, cmd->prot_sdb);
-   scsi_free_sense_buffer(shost, cmd->sense_buffer);
-   kmem_cache_free(pool->cmd_slab, cmd);
-}
-
-/**
- * scsi_host_alloc_command - internal function to allocate command
- * @shost: SCSI host whose pool to allocate from
- * @gfp_mask:  mask for the allocation
- *
- * Returns a fully allocated command with sense buffer and protection
- * data buffer (where applicable) or NULL on failure
- */
-static struct scsi_cmnd *
-scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t gfp_mask)
-{
-   struct scsi_host_cmd_pool *pool = shost->cmd_pool;
-   struct scsi_cmnd *cmd;
-
-   cmd = kmem_cache_zalloc(pool->cmd_slab, gfp_mask);
-   if (!cmd)
-   goto fail;
-
-   cmd->sense_buffer = scsi_alloc_sense_buffer(shost, gfp_mask,
-   NUMA_NO_NODE);
-   if (!cmd->sense_buffer)
-   goto fail_free_cmd;
-
-   if (scsi_host_get_prot(shost) >= SHOST_DIX_TYPE0_PROTECTION) {
-   cmd->prot_sdb = kmem_cache_zalloc(scsi_sdb_cache, gfp_mask);
-   if (!cmd->prot_sdb)
-   goto fail_free_sense;
-   }
-
-   return cmd;
-
-fail_free_sense:
-   scsi_free_sense_buffer(shost, cmd->sense_buffer);

[PATCH 03/15] block: simplify blk_init_allocated_queue

2017-01-10 Thread Christoph Hellwig
Return an errno value instead of the passed in queue so that the callers
don't have to keep track of two queues, and move the assignment of the
request_fn and lock to the caller as passing them as argument doesn't
simplify anything.  While we're at it also remove two pointless NULL
assignments, given that the request structure is zeroed on allocation.

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c   | 38 +++---
 drivers/md/dm-rq.c |  3 ++-
 include/linux/blkdev.h |  3 +--
 3 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 61ba08c..761cee8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -821,15 +821,19 @@ EXPORT_SYMBOL(blk_init_queue);
 struct request_queue *
 blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 {
-   struct request_queue *uninit_q, *q;
+   struct request_queue *q;
 
-   uninit_q = blk_alloc_queue_node(GFP_KERNEL, node_id);
-   if (!uninit_q)
+   q = blk_alloc_queue_node(GFP_KERNEL, node_id);
+   if (!q)
return NULL;
 
-   q = blk_init_allocated_queue(uninit_q, rfn, lock);
-   if (!q)
-   blk_cleanup_queue(uninit_q);
+   q->request_fn = rfn;
+   if (lock)
+   q->queue_lock = lock;
+   if (blk_init_allocated_queue(q) < 0) {
+   blk_cleanup_queue(q);
+   return NULL;
+   }
 
return q;
 }
@@ -837,30 +841,19 @@ EXPORT_SYMBOL(blk_init_queue_node);
 
 static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio);
 
-struct request_queue *
-blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
-spinlock_t *lock)
-{
-   if (!q)
-   return NULL;
 
+int blk_init_allocated_queue(struct request_queue *q)
+{
q->fq = blk_alloc_flush_queue(q, NUMA_NO_NODE, 0);
if (!q->fq)
-   return NULL;
+   return -ENOMEM;
 
if (blk_init_rl(>root_rl, q, GFP_KERNEL))
goto fail;
 
INIT_WORK(>timeout_work, blk_timeout_work);
-   q->request_fn   = rfn;
-   q->prep_rq_fn   = NULL;
-   q->unprep_rq_fn = NULL;
q->queue_flags  |= QUEUE_FLAG_DEFAULT;
 
-   /* Override internal queue lock with supplied lock pointer */
-   if (lock)
-   q->queue_lock   = lock;
-
/*
 * This also sets hw/phys segments, boundary and size
 */
@@ -878,13 +871,12 @@ blk_init_allocated_queue(struct request_queue *q, 
request_fn_proc *rfn,
}
 
mutex_unlock(>sysfs_lock);
-
-   return q;
+   return 0;
 
 fail:
blk_free_flush_queue(q->fq);
wbt_exit(q);
-   return NULL;
+   return -ENOMEM;
 }
 EXPORT_SYMBOL(blk_init_allocated_queue);
 
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 9d7275f..93f6e9f 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -823,7 +823,8 @@ static void dm_old_request_fn(struct request_queue *q)
 int dm_old_init_request_queue(struct mapped_device *md)
 {
/* Fully initialize the queue */
-   if (!blk_init_allocated_queue(md->queue, dm_old_request_fn, NULL))
+   md->queue->request_fn = dm_old_request_fn;
+   if (blk_init_allocated_queue(md->queue) < 0)
return -EINVAL;
 
/* disable dm_old_request_fn's merge heuristic by default */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8369564..9a6f3ab 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1116,8 +1116,7 @@ extern void blk_unprep_request(struct request *);
 extern struct request_queue *blk_init_queue_node(request_fn_proc *rfn,
spinlock_t *lock, int node_id);
 extern struct request_queue *blk_init_queue(request_fn_proc *, spinlock_t *);
-extern struct request_queue *blk_init_allocated_queue(struct request_queue *,
- request_fn_proc *, 
spinlock_t *);
+extern int blk_init_allocated_queue(struct request_queue *);
 extern void blk_cleanup_queue(struct request_queue *);
 extern void blk_queue_make_request(struct request_queue *, make_request_fn *);
 extern void blk_queue_bounce_limit(struct request_queue *, u64);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/15] dm: remove incomple BLOCK_PC support

2017-01-10 Thread Christoph Hellwig
DM tries to copy a few fields around for BLOCK_PC requests, but given
that no dm-target ever wires up scsi_cmd_ioctl BLOCK_PC can't actually
be sent to dm.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-rq.c | 16 
 1 file changed, 16 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 93f6e9f..3f12916 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -270,19 +270,6 @@ static void dm_end_request(struct request *clone, int 
error)
struct mapped_device *md = tio->md;
struct request *rq = tio->orig;
 
-   if (rq->cmd_type == REQ_TYPE_BLOCK_PC) {
-   rq->errors = clone->errors;
-   rq->resid_len = clone->resid_len;
-
-   if (rq->sense)
-   /*
-* We are using the sense buffer of the original
-* request.
-* So setting the length of the sense data is enough.
-*/
-   rq->sense_len = clone->sense_len;
-   }
-
free_rq_clone(clone);
rq_end_stats(md, rq);
if (!rq->q->mq_ops)
@@ -511,9 +498,6 @@ static int setup_clone(struct request *clone, struct 
request *rq,
if (r)
return r;
 
-   clone->cmd = rq->cmd;
-   clone->cmd_len = rq->cmd_len;
-   clone->sense = rq->sense;
clone->end_io = end_clone_request;
clone->end_io_data = tio;
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/15] block: allow specifying size for extra command data

2017-01-10 Thread Christoph Hellwig
This mirrors the blk-mq capabilities to allocate extra drivers-specific
data behind struct request by setting a cmd_size field, as well as having
a constructor / destructor for it.

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c   | 59 --
 block/blk-flush.c  |  5 ++---
 block/blk-sysfs.c  |  7 --
 include/linux/blkdev.h |  7 ++
 4 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 761cee8..7190bc1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -604,17 +604,41 @@ void blk_cleanup_queue(struct request_queue *q)
 EXPORT_SYMBOL(blk_cleanup_queue);
 
 /* Allocate memory local to the request queue */
-static void *alloc_request_struct(gfp_t gfp_mask, void *data)
+static void *alloc_request_simple(gfp_t gfp_mask, void *data)
 {
-   int nid = (int)(long)data;
-   return kmem_cache_alloc_node(request_cachep, gfp_mask, nid);
+   struct request_queue *q = data;
+
+   return kmem_cache_alloc_node(request_cachep, gfp_mask, q->node);
 }
 
-static void free_request_struct(void *element, void *unused)
+static void free_request_simple(void *element, void *data)
 {
kmem_cache_free(request_cachep, element);
 }
 
+static void *alloc_request_size(gfp_t gfp_mask, void *data)
+{
+   struct request_queue *q = data;
+   struct request *rq;
+
+   rq = kmalloc_node(sizeof(struct request) + q->cmd_size, gfp_mask,
+   q->node);
+   if (rq && q->init_rq_fn && q->init_rq_fn(q, rq, gfp_mask) < 0) {
+   kfree(rq);
+   rq = NULL;
+   }
+   return rq;
+}
+
+static void free_request_size(void *element, void *data)
+{
+   struct request_queue *q = data;
+
+   if (q->exit_rq_fn)
+   q->exit_rq_fn(q, element);
+   kfree(element);
+}
+
 int blk_init_rl(struct request_list *rl, struct request_queue *q,
gfp_t gfp_mask)
 {
@@ -627,10 +651,15 @@ int blk_init_rl(struct request_list *rl, struct 
request_queue *q,
init_waitqueue_head(>wait[BLK_RW_SYNC]);
init_waitqueue_head(>wait[BLK_RW_ASYNC]);
 
-   rl->rq_pool = mempool_create_node(BLKDEV_MIN_RQ, alloc_request_struct,
- free_request_struct,
- (void *)(long)q->node, gfp_mask,
- q->node);
+   if (q->cmd_size) {
+   rl->rq_pool = mempool_create_node(BLKDEV_MIN_RQ,
+   alloc_request_size, free_request_size,
+   q, gfp_mask, q->node);
+   } else {
+   rl->rq_pool = mempool_create_node(BLKDEV_MIN_RQ,
+   alloc_request_simple, free_request_simple,
+   q, gfp_mask, q->node);
+   }
if (!rl->rq_pool)
return -ENOMEM;
 
@@ -844,12 +873,15 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, 
struct bio *bio);
 
 int blk_init_allocated_queue(struct request_queue *q)
 {
-   q->fq = blk_alloc_flush_queue(q, NUMA_NO_NODE, 0);
+   q->fq = blk_alloc_flush_queue(q, NUMA_NO_NODE, q->cmd_size);
if (!q->fq)
return -ENOMEM;
 
+   if (q->init_rq_fn && q->init_rq_fn(q, q->fq->flush_rq, GFP_KERNEL))
+   goto out_free_flush_queue;
+
if (blk_init_rl(>root_rl, q, GFP_KERNEL))
-   goto fail;
+   goto out_exit_flush_rq;
 
INIT_WORK(>timeout_work, blk_timeout_work);
q->queue_flags  |= QUEUE_FLAG_DEFAULT;
@@ -867,13 +899,16 @@ int blk_init_allocated_queue(struct request_queue *q)
/* init elevator */
if (elevator_init(q, NULL)) {
mutex_unlock(>sysfs_lock);
-   goto fail;
+   goto out_exit_flush_rq;
}
 
mutex_unlock(>sysfs_lock);
return 0;
 
-fail:
+out_exit_flush_rq:
+   if (q->exit_rq_fn)
+   q->exit_rq_fn(q, q->fq->flush_rq);
+out_free_flush_queue:
blk_free_flush_queue(q->fq);
wbt_exit(q);
return -ENOMEM;
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 20b7c7a..f390aea 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -545,11 +545,10 @@ struct blk_flush_queue *blk_alloc_flush_queue(struct 
request_queue *q,
if (!fq)
goto fail;
 
-   if (q->mq_ops) {
+   if (q->mq_ops)
spin_lock_init(>mq_flush_lock);
-   rq_sz = round_up(rq_sz + cmd_size, cache_line_size());
-   }
 
+   rq_sz = round_up(rq_sz + cmd_size, cache_line_size());
fq->flush_rq = kzalloc_node(rq_sz, GFP_KERNEL, node);
if (!fq->flush_rq)
goto fail_rq;
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 1dbce05..894f773 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -814,10 +814,13 @@ static void blk_release_queue(struct kobject 

RFC: split scsi passthrough fields out of struct request

2017-01-10 Thread Christoph Hellwig
Hi all,

this series splits the support for SCSI passthrough commands from the
main struct request used all over the block layer into a separate
scsi_request structure that drivers that want to support SCSI passthough
need to embedded as the first thing into their request-private data,
similar to how we handle NVMe passthrough commands.

To support this I've added support for that the private data after
request structure to the legacy request path instead, so that it can
be treated the same way as the blk-mq path.  Compare to the current
scsi_cmnd allocator that actually is a major simplification.

There is one major show-stopper for the series as-is:  Request-based
device mapper currently allocate the request structures in the stacking
driver without a knowledge of the queue it's going to be submitte on
for the legacy request path.  It managed to avoid that issue for the
blk-mq path, but that currently can't be used on legacy request devices
for reasons I don't fully understand.  We'll need to figure out how
sort this out, but maybe that's a good opportunity to drop one or two
of the three different dm-mpath I/O paths? :)

Note that the first two patches are 4.10 material and have already
been submitted independently.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/15] virtio_blk: avoid DMA to stack for the sense buffer

2017-01-10 Thread Christoph Hellwig
Most users of BLOCK_PC requests allocate the sense buffer on the stack,
so to avoid DMA to the stack copy them to a field in the heap allocated
virtblk_req structure.  Without that any attempt at SCSI passthrough I/O,
including the SG_IO ioctl from userspace will crash the kernel.  Note that
this includes running tools like hdparm even when the host does not have
SCSI passthrough enabled.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/virtio_blk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 5545a67..3c3b8f6 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -56,6 +56,7 @@ struct virtblk_req {
struct virtio_blk_outhdr out_hdr;
struct virtio_scsi_inhdr in_hdr;
u8 status;
+   u8 sense[SCSI_SENSE_BUFFERSIZE];
struct scatterlist sg[];
 };
 
@@ -102,7 +103,8 @@ static int __virtblk_add_req(struct virtqueue *vq,
}
 
if (type == cpu_to_virtio32(vq->vdev, VIRTIO_BLK_T_SCSI_CMD)) {
-   sg_init_one(, vbr->req->sense, SCSI_SENSE_BUFFERSIZE);
+   memcpy(vbr->sense, vbr->req->sense, SCSI_SENSE_BUFFERSIZE);
+   sg_init_one(, vbr->sense, SCSI_SENSE_BUFFERSIZE);
sgs[num_out + num_in++] = 
sg_init_one(, >in_hdr, sizeof(vbr->in_hdr));
sgs[num_out + num_in++] = 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 12/15] scsi: remove __scsi_alloc_queue

2017-01-10 Thread Christoph Hellwig
Instead do an internal export of __scsi_init_queue for the transport
classes that export BSG nodes.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/scsi_lib.c | 19 ---
 drivers/scsi/scsi_transport_fc.c|  6 --
 drivers/scsi/scsi_transport_iscsi.c |  3 ++-
 include/scsi/scsi_host.h|  2 --
 include/scsi/scsi_transport.h   |  2 ++
 5 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e5ebb3a..898af1a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2082,7 +2082,7 @@ static u64 scsi_calculate_bounce_limit(struct Scsi_Host 
*shost)
return bounce_limit;
 }
 
-static void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
+void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 {
struct device *dev = shost->dma_dev;
 
@@ -2117,28 +2117,17 @@ static void __scsi_init_queue(struct Scsi_Host *shost, 
struct request_queue *q)
 */
blk_queue_dma_alignment(q, 0x03);
 }
-
-struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
-request_fn_proc *request_fn)
-{
-   struct request_queue *q;
-
-   q = blk_init_queue(request_fn, NULL);
-   if (!q)
-   return NULL;
-   __scsi_init_queue(shost, q);
-   return q;
-}
-EXPORT_SYMBOL(__scsi_alloc_queue);
+EXPORT_SYMBOL_GPL(__scsi_init_queue);
 
 struct request_queue *scsi_alloc_queue(struct scsi_device *sdev)
 {
struct request_queue *q;
 
-   q = __scsi_alloc_queue(sdev->host, scsi_request_fn);
+   q = blk_init_queue(scsi_request_fn, NULL);
if (!q)
return NULL;
 
+   __scsi_init_queue(sdev->host, q);
blk_queue_prep_rq(q, scsi_prep_fn);
blk_queue_unprep_rq(q, scsi_unprep_fn);
blk_queue_softirq_done(q, scsi_softirq_done);
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 03577bd..afcedec 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3776,7 +3776,7 @@ fc_bsg_hostadd(struct Scsi_Host *shost, struct 
fc_host_attrs *fc_host)
snprintf(bsg_name, sizeof(bsg_name),
 "fc_host%d", shost->host_no);
 
-   q = __scsi_alloc_queue(shost, bsg_request_fn);
+   q = blk_init_queue(bsg_request_fn, NULL);
if (!q) {
dev_err(dev,
"fc_host%d: bsg interface failed to initialize - no 
request queue\n",
@@ -3784,6 +3784,7 @@ fc_bsg_hostadd(struct Scsi_Host *shost, struct 
fc_host_attrs *fc_host)
return -ENOMEM;
}
 
+   __scsi_init_queue(shost, q);
err = bsg_setup_queue(dev, q, bsg_name, fc_bsg_dispatch,
 i->f->dd_bsg_size);
if (err) {
@@ -3831,12 +3832,13 @@ fc_bsg_rportadd(struct Scsi_Host *shost, struct 
fc_rport *rport)
if (!i->f->bsg_request)
return -ENOTSUPP;
 
-   q = __scsi_alloc_queue(shost, bsg_request_fn);
+   q = blk_init_queue(bsg_request_fn, NULL);
if (!q) {
dev_err(dev, "bsg interface failed to initialize - no request 
queue\n");
return -ENOMEM;
}
 
+   __scsi_init_queue(shost, q);
err = bsg_setup_queue(dev, q, NULL, fc_bsg_dispatch, i->f->dd_bsg_size);
if (err) {
dev_err(dev, "failed to setup bsg queue\n");
diff --git a/drivers/scsi/scsi_transport_iscsi.c 
b/drivers/scsi/scsi_transport_iscsi.c
index 42bca61..04ebe6e 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1544,10 +1544,11 @@ iscsi_bsg_host_add(struct Scsi_Host *shost, struct 
iscsi_cls_host *ihost)
 
snprintf(bsg_name, sizeof(bsg_name), "iscsi_host%d", shost->host_no);
 
-   q = __scsi_alloc_queue(shost, bsg_request_fn);
+   q = blk_init_queue(bsg_request_fn, NULL);
if (!q)
return -ENOMEM;
 
+   __scsi_init_queue(shost, q);
ret = bsg_setup_queue(dev, q, bsg_name, iscsi_bsg_host_dispatch, 0);
if (ret) {
shost_printk(KERN_ERR, shost, "bsg interface failed to "
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 36680f1..f4964d7 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -826,8 +826,6 @@ extern void scsi_block_requests(struct Scsi_Host *);
 
 struct class_container;
 
-extern struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
-   void (*) (struct request_queue 
*));
 /*
  * These two functions are used to allocate and free a pseudo device
  * which will connect to the host adapter itself rather than any
diff --git a/include/scsi/scsi_transport.h b/include/scsi/scsi_transport.h
index 8129239..b6e07b5 100644
--- a/include/scsi/scsi_transport.h
+++ b/include/scsi/scsi_transport.h
@@ -119,4 +119,6 @@ 

[PATCH 06/15] scsi_dh_rdac: switch to scsi_execute_req_flags()

2017-01-10 Thread Christoph Hellwig
From: Hannes Reinecke 

Switch to scsi_execute_req_flags() and scsi_get_vpd_page() instead of
open-coding it.  Using scsi_execute_req_flags() will set REQ_QUIET and
REQ_PREEMPT, but this is okay as we're evaluating the errors anyway and
should be able to send the command even if the device is quiesced.

Signed-off-by: Hannes Reinecke 
Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/device_handler/scsi_dh_rdac.c | 174 +
 1 file changed, 51 insertions(+), 123 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_rdac.c 
b/drivers/scsi/device_handler/scsi_dh_rdac.c
index 00d9c32..b64eaae 100644
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -205,7 +205,6 @@ struct rdac_dh_data {
 #define RDAC_NON_PREFERRED 1
charpreferred;
 
-   unsigned char   sense[SCSI_SENSE_BUFFERSIZE];
union   {
struct c2_inquiry c2;
struct c4_inquiry c4;
@@ -262,40 +261,12 @@ do { \
sdev_printk(KERN_INFO, sdev, RDAC_NAME ": " f "\n", ## arg); \
 } while (0);
 
-static struct request *get_rdac_req(struct scsi_device *sdev,
-   void *buffer, unsigned buflen, int rw)
+static unsigned int rdac_failover_get(struct rdac_controller *ctlr,
+ struct list_head *list,
+ unsigned char *cdb)
 {
-   struct request *rq;
-   struct request_queue *q = sdev->request_queue;
-
-   rq = blk_get_request(q, rw, GFP_NOIO);
-
-   if (IS_ERR(rq)) {
-   sdev_printk(KERN_INFO, sdev,
-   "get_rdac_req: blk_get_request failed.\n");
-   return NULL;
-   }
-   blk_rq_set_block_pc(rq);
-
-   if (buflen && blk_rq_map_kern(q, rq, buffer, buflen, GFP_NOIO)) {
-   blk_put_request(rq);
-   sdev_printk(KERN_INFO, sdev,
-   "get_rdac_req: blk_rq_map_kern failed.\n");
-   return NULL;
-   }
-
-   rq->cmd_flags |= REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
-REQ_FAILFAST_DRIVER;
-   rq->retries = RDAC_RETRIES;
-   rq->timeout = RDAC_TIMEOUT;
-
-   return rq;
-}
-
-static struct request *rdac_failover_get(struct scsi_device *sdev,
-   struct rdac_dh_data *h, struct list_head *list)
-{
-   struct request *rq;
+   struct scsi_device *sdev = ctlr->ms_sdev;
+   struct rdac_dh_data *h = sdev->handler_data;
struct rdac_mode_common *common;
unsigned data_size;
struct rdac_queue_data *qdata;
@@ -332,27 +303,17 @@ static struct request *rdac_failover_get(struct 
scsi_device *sdev,
lun_table[qdata->h->lun] = 0x81;
}
 
-   /* get request for block layer packet command */
-   rq = get_rdac_req(sdev, >ctlr->mode_select, data_size, WRITE);
-   if (!rq)
-   return NULL;
-
/* Prepare the command. */
if (h->ctlr->use_ms10) {
-   rq->cmd[0] = MODE_SELECT_10;
-   rq->cmd[7] = data_size >> 8;
-   rq->cmd[8] = data_size & 0xff;
+   cdb[0] = MODE_SELECT_10;
+   cdb[7] = data_size >> 8;
+   cdb[8] = data_size & 0xff;
} else {
-   rq->cmd[0] = MODE_SELECT;
-   rq->cmd[4] = data_size;
+   cdb[0] = MODE_SELECT;
+   cdb[4] = data_size;
}
-   rq->cmd_len = COMMAND_SIZE(rq->cmd[0]);
-
-   rq->sense = h->sense;
-   memset(rq->sense, 0, SCSI_SENSE_BUFFERSIZE);
-   rq->sense_len = 0;
 
-   return rq;
+   return data_size;
 }
 
 static void release_controller(struct kref *kref)
@@ -400,46 +361,14 @@ static struct rdac_controller *get_controller(int index, 
char *array_name,
return ctlr;
 }
 
-static int submit_inquiry(struct scsi_device *sdev, int page_code,
- unsigned int len, struct rdac_dh_data *h)
-{
-   struct request *rq;
-   struct request_queue *q = sdev->request_queue;
-   int err = SCSI_DH_RES_TEMP_UNAVAIL;
-
-   rq = get_rdac_req(sdev, >inq, len, READ);
-   if (!rq)
-   goto done;
-
-   /* Prepare the command. */
-   rq->cmd[0] = INQUIRY;
-   rq->cmd[1] = 1;
-   rq->cmd[2] = page_code;
-   rq->cmd[4] = len;
-   rq->cmd_len = COMMAND_SIZE(INQUIRY);
-
-   rq->sense = h->sense;
-   memset(rq->sense, 0, SCSI_SENSE_BUFFERSIZE);
-   rq->sense_len = 0;
-
-   err = blk_execute_rq(q, NULL, rq, 1);
-   if (err == -EIO)
-   err = SCSI_DH_IO;
-
-   blk_put_request(rq);
-done:
-   return err;
-}
-
 static int get_lun_info(struct scsi_device *sdev, struct rdac_dh_data *h,
char *array_name, u8 *array_id)
 {
-   int err, i;
-   struct c8_inquiry *inqp;
+   

[PATCH 14/15] block/bsg: move queue creation into bsg_setup_queue

2017-01-10 Thread Christoph Hellwig
Simply the boilerplate code needed for bsg nodes a bit.

Signed-off-by: Christoph Hellwig 
---
 block/bsg-lib.c | 21 +++--
 drivers/scsi/scsi_transport_fc.c| 36 
 drivers/scsi/scsi_transport_iscsi.c | 15 ---
 include/linux/bsg-lib.h |  5 ++---
 4 files changed, 25 insertions(+), 52 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index 9d652a9..c74acf4 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -177,7 +177,7 @@ static int bsg_create_job(struct device *dev, struct 
request *req)
  *
  * Drivers/subsys should pass this to the queue init function.
  */
-void bsg_request_fn(struct request_queue *q)
+static void bsg_request_fn(struct request_queue *q)
__releases(q->queue_lock)
__acquires(q->queue_lock)
 {
@@ -214,24 +214,24 @@ void bsg_request_fn(struct request_queue *q)
put_device(dev);
spin_lock_irq(q->queue_lock);
 }
-EXPORT_SYMBOL_GPL(bsg_request_fn);
 
 /**
  * bsg_setup_queue - Create and add the bsg hooks so we can receive requests
  * @dev: device to attach bsg device to
- * @q: request queue setup by caller
  * @name: device to give bsg device
  * @job_fn: bsg job handler
  * @dd_job_size: size of LLD data needed for each job
- *
- * The caller should have setup the reuqest queue with bsg_request_fn
- * as the request_fn.
  */
-int bsg_setup_queue(struct device *dev, struct request_queue *q,
-   char *name, bsg_job_fn *job_fn, int dd_job_size)
+struct request_queue *bsg_setup_queue(struct device *dev, char *name,
+   bsg_job_fn *job_fn, int dd_job_size)
 {
+   struct request_queue *q;
int ret;
 
+   q = blk_init_queue(bsg_request_fn, NULL);
+   if (!q)
+   return ERR_PTR(-ENOMEM);
+
q->queuedata = dev;
q->bsg_job_size = dd_job_size;
q->bsg_job_fn = job_fn;
@@ -243,9 +243,10 @@ int bsg_setup_queue(struct device *dev, struct 
request_queue *q,
if (ret) {
printk(KERN_ERR "%s: bsg interface failed to "
   "initialize - register queue\n", dev->kobj.name);
-   return ret;
+   blk_cleanup_queue(q);
+   return ERR_PTR(ret);
}
 
-   return 0;
+   return q;
 }
 EXPORT_SYMBOL_GPL(bsg_setup_queue);
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index afcedec..13dcb9b 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3765,7 +3765,6 @@ fc_bsg_hostadd(struct Scsi_Host *shost, struct 
fc_host_attrs *fc_host)
struct device *dev = >shost_gendev;
struct fc_internal *i = to_fc_internal(shost->transportt);
struct request_queue *q;
-   int err;
char bsg_name[20];
 
fc_host->rqst_q = NULL;
@@ -3776,24 +3775,14 @@ fc_bsg_hostadd(struct Scsi_Host *shost, struct 
fc_host_attrs *fc_host)
snprintf(bsg_name, sizeof(bsg_name),
 "fc_host%d", shost->host_no);
 
-   q = blk_init_queue(bsg_request_fn, NULL);
-   if (!q) {
-   dev_err(dev,
-   "fc_host%d: bsg interface failed to initialize - no 
request queue\n",
-   shost->host_no);
-   return -ENOMEM;
-   }
-
-   __scsi_init_queue(shost, q);
-   err = bsg_setup_queue(dev, q, bsg_name, fc_bsg_dispatch,
-i->f->dd_bsg_size);
-   if (err) {
+   q = bsg_setup_queue(dev, bsg_name, fc_bsg_dispatch, i->f->dd_bsg_size);
+   if (IS_ERR(q)) {
dev_err(dev,
"fc_host%d: bsg interface failed to initialize - setup 
queue\n",
shost->host_no);
-   blk_cleanup_queue(q);
-   return err;
+   return PTR_ERR(q);
}
+   __scsi_init_queue(shost, q);
blk_queue_rq_timed_out(q, fc_bsg_job_timeout);
blk_queue_rq_timeout(q, FC_DEFAULT_BSG_TIMEOUT);
fc_host->rqst_q = q;
@@ -3825,27 +3814,18 @@ fc_bsg_rportadd(struct Scsi_Host *shost, struct 
fc_rport *rport)
struct device *dev = >dev;
struct fc_internal *i = to_fc_internal(shost->transportt);
struct request_queue *q;
-   int err;
 
rport->rqst_q = NULL;
 
if (!i->f->bsg_request)
return -ENOTSUPP;
 
-   q = blk_init_queue(bsg_request_fn, NULL);
-   if (!q) {
-   dev_err(dev, "bsg interface failed to initialize - no request 
queue\n");
-   return -ENOMEM;
-   }
-
-   __scsi_init_queue(shost, q);
-   err = bsg_setup_queue(dev, q, NULL, fc_bsg_dispatch, i->f->dd_bsg_size);
-   if (err) {
+   q = bsg_setup_queue(dev, NULL, fc_bsg_dispatch, i->f->dd_bsg_size);
+   if (IS_ERR(q)) {
dev_err(dev, "failed to setup bsg queue\n");
-   blk_cleanup_queue(q);
-   return err;
+ 

[PATCH 02/15] nvme-rdma: fix nvme_rdma_queue_is_ready

2017-01-10 Thread Christoph Hellwig
Now that we don't abuse the cmd field in struct request for nvme command
passthrough this function needs to be converted to the proper accessor
as well.

Fixes: d49187e97e ("nvme: introduce struct nvme_request")
Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f587af3..34e5648 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1422,7 +1422,7 @@ static inline bool nvme_rdma_queue_is_ready(struct 
nvme_rdma_queue *queue,
struct request *rq)
 {
if (unlikely(!test_bit(NVME_RDMA_Q_LIVE, >flags))) {
-   struct nvme_command *cmd = (struct nvme_command *)rq->cmd;
+   struct nvme_command *cmd = nvme_req(rq)->cmd;
 
if (rq->cmd_type != REQ_TYPE_DRV_PRIV ||
cmd->common.opcode != nvme_fabrics_command ||
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/15] scsi: remove scsi_cmd_dma_pool

2017-01-10 Thread Christoph Hellwig
There is no need for GFP_DMA allocations of the scsi_cmnd structures
themselves, all that might be DMAed to or from is the actual payload,
or the sense buffers.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/scsi.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 469aa0f..2e24f31 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -102,17 +102,10 @@ struct scsi_host_cmd_pool {
struct kmem_cache   *cmd_slab;
unsigned intusers;
char*cmd_name;
-   unsigned intslab_flags;
 };
 
 static struct scsi_host_cmd_pool scsi_cmd_pool = {
.cmd_name   = "scsi_cmd_cache",
-   .slab_flags = SLAB_HWCACHE_ALIGN,
-};
-
-static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {
-   .cmd_name   = "scsi_cmd_cache(DMA)",
-   .slab_flags = SLAB_HWCACHE_ALIGN|SLAB_CACHE_DMA,
 };
 
 static DEFINE_MUTEX(host_cmd_pool_mutex);
@@ -290,8 +283,6 @@ scsi_find_host_cmd_pool(struct Scsi_Host *shost)
 {
if (shost->hostt->cmd_size)
return shost->hostt->cmd_pool;
-   if (shost->unchecked_isa_dma)
-   return _cmd_dma_pool;
return _cmd_pool;
 }
 
@@ -318,10 +309,6 @@ scsi_alloc_host_cmd_pool(struct Scsi_Host *shost)
return NULL;
}
 
-   pool->slab_flags = SLAB_HWCACHE_ALIGN;
-   if (shost->unchecked_isa_dma)
-   pool->slab_flags |= SLAB_CACHE_DMA;
-
if (hostt->cmd_size)
hostt->cmd_pool = pool;
 
@@ -349,7 +336,7 @@ scsi_get_host_cmd_pool(struct Scsi_Host *shost)
 
if (!pool->users) {
pool->cmd_slab = kmem_cache_create(pool->cmd_name, cmd_size, 0,
-  pool->slab_flags, NULL);
+  SLAB_HWCACHE_ALIGN, NULL);
if (!pool->cmd_slab)
goto out_free_pool;
}
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 10/15] scsi: respect unchecked_isa_dma for blk-mq

2017-01-10 Thread Christoph Hellwig
Currently blk-mq always allocates the sense buffer using normal GFP_KERNEL
allocation.  Refactor the cmd pool code to split the cmd and sense allocation
and share the code to allocate the sense buffers as well as the sense buffer
slab caches between the legacy and blk-mq path.

Note that this switches to lazy allocation of the sense slab caches - the
slab caches (not the actual allocations) won't be destroy until the scsi
module is unloaded instead of keeping track of hosts using them.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/hosts.c |  4 
 drivers/scsi/scsi.c  | 24 ---
 drivers/scsi/scsi_lib.c  | 62 +---
 drivers/scsi/scsi_priv.h |  5 
 4 files changed, 73 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 258a3f9..6d29c4a 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -213,6 +213,10 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,
goto fail;
}
 
+   error = scsi_init_sense_cache(shost);
+   if (error)
+   goto fail;
+
if (shost_use_blk_mq(shost)) {
error = scsi_mq_setup_tags(shost);
if (error)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 0f93892..469aa0f 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -100,22 +100,18 @@ EXPORT_SYMBOL(scsi_sd_pm_domain);
 
 struct scsi_host_cmd_pool {
struct kmem_cache   *cmd_slab;
-   struct kmem_cache   *sense_slab;
unsigned intusers;
char*cmd_name;
-   char*sense_name;
unsigned intslab_flags;
 };
 
 static struct scsi_host_cmd_pool scsi_cmd_pool = {
.cmd_name   = "scsi_cmd_cache",
-   .sense_name = "scsi_sense_cache",
.slab_flags = SLAB_HWCACHE_ALIGN,
 };
 
 static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {
.cmd_name   = "scsi_cmd_cache(DMA)",
-   .sense_name = "scsi_sense_cache(DMA)",
.slab_flags = SLAB_HWCACHE_ALIGN|SLAB_CACHE_DMA,
 };
 
@@ -136,7 +132,7 @@ scsi_host_free_command(struct Scsi_Host *shost, struct 
scsi_cmnd *cmd)
 
if (cmd->prot_sdb)
kmem_cache_free(scsi_sdb_cache, cmd->prot_sdb);
-   kmem_cache_free(pool->sense_slab, cmd->sense_buffer);
+   scsi_free_sense_buffer(shost, cmd->sense_buffer);
kmem_cache_free(pool->cmd_slab, cmd);
 }
 
@@ -158,7 +154,8 @@ scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t 
gfp_mask)
if (!cmd)
goto fail;
 
-   cmd->sense_buffer = kmem_cache_alloc(pool->sense_slab, gfp_mask);
+   cmd->sense_buffer = scsi_alloc_sense_buffer(shost, gfp_mask,
+   NUMA_NO_NODE);
if (!cmd->sense_buffer)
goto fail_free_cmd;
 
@@ -171,7 +168,7 @@ scsi_host_alloc_command(struct Scsi_Host *shost, gfp_t 
gfp_mask)
return cmd;
 
 fail_free_sense:
-   kmem_cache_free(pool->sense_slab, cmd->sense_buffer);
+   scsi_free_sense_buffer(shost, cmd->sense_buffer);
 fail_free_cmd:
kmem_cache_free(pool->cmd_slab, cmd);
 fail:
@@ -301,7 +298,6 @@ scsi_find_host_cmd_pool(struct Scsi_Host *shost)
 static void
 scsi_free_host_cmd_pool(struct scsi_host_cmd_pool *pool)
 {
-   kfree(pool->sense_name);
kfree(pool->cmd_name);
kfree(pool);
 }
@@ -317,8 +313,7 @@ scsi_alloc_host_cmd_pool(struct Scsi_Host *shost)
return NULL;
 
pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->proc_name);
-   pool->sense_name = kasprintf(GFP_KERNEL, "%s_sense", hostt->proc_name);
-   if (!pool->cmd_name || !pool->sense_name) {
+   if (!pool->cmd_name) {
scsi_free_host_cmd_pool(pool);
return NULL;
}
@@ -357,12 +352,6 @@ scsi_get_host_cmd_pool(struct Scsi_Host *shost)
   pool->slab_flags, NULL);
if (!pool->cmd_slab)
goto out_free_pool;
-
-   pool->sense_slab = kmem_cache_create(pool->sense_name,
-SCSI_SENSE_BUFFERSIZE, 0,
-pool->slab_flags, NULL);
-   if (!pool->sense_slab)
-   goto out_free_slab;
}
 
pool->users++;
@@ -371,8 +360,6 @@ scsi_get_host_cmd_pool(struct Scsi_Host *shost)
mutex_unlock(_cmd_pool_mutex);
return retval;
 
-out_free_slab:
-   kmem_cache_destroy(pool->cmd_slab);
 out_free_pool:
if (hostt->cmd_size) {
scsi_free_host_cmd_pool(pool);
@@ -398,7 +385,6 @@ static void scsi_put_host_cmd_pool(struct Scsi_Host *shost)
 
if (!--pool->users) {
kmem_cache_destroy(pool->cmd_slab);
-   kmem_cache_destroy(pool->sense_slab);

Re: [LSF/MM TOPIC][LSF/MM ATTEND] OCSSDs - SMR, Hierarchical Interface, and Vector I/Os

2017-01-10 Thread Matias Bjorling
On 01/10/2017 05:24 AM, Theodore Ts'o wrote:
> This may be an area where if we can create the right framework, and
> fund some research work, we might be able to get some researchers and
> their graduate students interested in doing some work in figuring out
> what sort of divisions of responsibilities and hints back and forth
> between the storage device and host have the most benefit.
> 

That is a good idea. There is a couple of papers at FAST with
Open-Channel SSDs this year.  They look into the interface and various
ways to reduce latency fluctuations.

One thing I've heard a couple of times is the feature to move the GC
read/write process into the firmware. Enabling the host to offload GC
data movement, while the keeping the host in control. Would this be
beneficial for SMR?

-Matias
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[LSF/MM TOPIC] [LSF/MM ATTEND] FS Management Interfaces

2017-01-10 Thread Steven Whitehouse

Hi,

This is a request both to attend LSF/MM and also a topic proposal. The 
last couple of years I've proposed a topic of the block/fs interface. 
I'm happy to do that again, if there is consensus that this would be 
useful, however this time I thought that I'd propose something a bit 
different.


I had originally thought about calling the proposal "kernel/userland 
interface", however that seemed a bit vague and management interfaces 
seems like a better title since it is I hope a bit clearer of the kind 
of thing that I'm thinking about in this case.


There are a number of possible sub-topics and I hope that I might find a 
few more before LSF too. One is that of space management (we have 
statfs, but currently no notifications for thresholds crossed etc., so 
everything is polled. That is ok sometimes, but statfs can be expensive 
in the case of distributed filesystems, if 100% accurate. We could just 
have ENOSPC notifications for 100% full, or something more generic), 
another is state transitions (is the fs running normally, or has it gone 
read only/withdrawn/etc due to I/O errors?) and a further topic would be 
working towards a common interface for fs statistics (at the moment each 
fs defines their own interface). One potential implementation, at least 
for the first two sub-topics, would be to use something along the lines 
of the quota netlink interface, but since few ideas survive first 
contact with the community at large, I'm throwing this out for further 
discussion and feedback on whether this approach is considered the right 
way to go.


Assuming the topic is accepted, my intention would be to gather together 
some additional sub-topics relating to fs management to go along with 
those I mentioned above, and I'd be very interested to hear of any other 
issues that could be usefully added to the list for discussion.


My interest in other topics is fairly wide... I'm, as usual, interested 
in all filesystem related topics and a good number of block device and 
mm topics too. Anything relating to vfs, xfs, ext*, btrfs, gfs2, 
overlayfs, NFS/CIFS, and technologies such as copy-offload, DAX, 
reflink, RDMA, NVMe(F), etc.,


Steve.

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] block: Don't register a registered bdi device

2017-01-10 Thread Yijing Wang
This issue is same as Sreekanth Reddy's report bug,
link:https://patchwork.kernel.org/patch/9394471/
so I quoted the calltrace Sreekanth Reddy's report.

Observing below kernel panic while creating second raid disk
on LSI SAS3008 HBA card.
[  +0.55] [ cut here ]
[  +0.07] WARNING: CPU: 2 PID: 281 at fs/sysfs/dir.c:31 
sysfs_warn_dup+0x62/0x80
[  +0.02] sysfs: cannot create duplicate filename 
'/devices/virtual/bdi/8:32'
[  +0.01] Modules linked in: mptctl mptbase xt_CHECKSUM iptable_mangle 
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat  nf_conntrack tun bridge stp 
llc ebtable_filter ebtables ip6table_filter ip6_tables intel_rapl sb_edac 
edac_core x86_pkg_temp_pclmul joydev ghash_clmulni_intel iTCO_wdt ipmi_ssif 
mei_me pcspkr mei iTCO_vendor_support ipmi_si i2c_i801 lpc_ich mfd_corema 
acpi_pad wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc 
sunrpc xfs libcrc32c ast i2c_algo_bit drm_kore raid_class nvme_core 
scsi_transport_sas dca
[  +0.67] CPU: 2 PID: 281 Comm: kworker/u49:5 Not tainted 4.9.0-rc2 #1
[  +0.02] Hardware name: Supermicro SYS-2028U-TNRT+/X10DRU-i+, BIOS 1.1 
07/22/2015
[  +0.05] Workqueue: events_unbound async_run_entry_fn
[  +0.04] Call Trace:
[  +0.09]  [] dump_stack+0x63/0x85
[  +0.05]  [] __warn+0xcb/0xf0
[  +0.04]  [] warn_slowpath_fmt+0x5f/0x80
[  +0.06]  [] ? kernfs_path_from_node+0x4f/0x60
[  +0.02]  [] sysfs_warn_dup+0x62/0x80
[  +0.02]  [] sysfs_create_dir_ns+0x77/0x90
[  +0.04]  [] kobject_add_internal+0x99/0x330
[  +0.03]  [] ? vsnprintf+0x35b/0x4c0
[  +0.03]  [] kobject_add+0x75/0xd0
[  +0.06]  [] ? device_private_init+0x23/0x70
[  +0.07]  [] ? mutex_lock+0x12/0x30
[  +0.03]  [] device_add+0x119/0x670
[  +0.04]  [] device_create_groups_vargs+0xe0/0xf0
[  +0.03]  [] device_create_vargs+0x1c/0x20
[  +0.06]  [] bdi_register+0x8c/0x180
[  +0.03]  [] bdi_register_owner+0x36/0x60
[  +0.06]  [] device_add_disk+0x168/0x480
[  +0.05]  [] ? update_autosuspend+0x51/0x60
[  +0.05]  [] sd_probe_async+0x110/0x1c0
[  +0.02]  [] async_run_entry_fn+0x39/0x140
[  +0.03]  [] process_one_work+0x15f/0x430
[  +0.02]  [] worker_thread+0x4e/0x490
[  +0.02]  [] ? process_one_work+0x430/0x430
[  +0.03]  [] kthread+0xd9/0xf0
[  +0.03]  [] ? kthread_park+0x60/0x60
[  +0.03]  [] ret_from_fork+0x25/0x30
[  +0.02] [ cut here ]
[  +0.04] WARNING: CPU: 2 PID: 281 at lib/kobject.c:240 
kobject_add_internal+0x2bd/0x330
[  +0.01] kobject_add_internal failed for 8:32 with -EEXIST, don't try to 
register things with the same name in the same
[  +0.01] Modules linked in: mptctl mptbase xt_CHECKSUM iptable_mangle 
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat  nf_conntrack tun bridge stp 
llc ebtable_filter ebtables ip6table_filter ip6_tables intel_rapl sb_edac 
edac_core x86_pkg_temp_pclmul joydev ghash_clmulni_intel iTCO_wdt ipmi_ssif 
mei_me pcspkr mei iTCO_vendor_support ipmi_si i2c_i801 lpc_ich mfd_corema 
acpi_pad wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc 
sunrpc xfs libcrc32c ast i2c_algo_bit drm_kore raid_class nvme_core 
scsi_transport_sas dca
[  +0.43] CPU: 2 PID: 281 Comm: kworker/u49:5 Tainted: GW   
4.9.0-rc2 #1
[  +0.01] Hardware name: Supermicro SYS-2028U-TNRT+/X10DRU-i+, BIOS 1.1 
07/22/2015
[  +0.02] Workqueue: events_unbound async_run_entry_fn
[  +0.03] Call Trace:
[  +0.03]  [] dump_stack+0x63/0x85
[  +0.03]  [] __warn+0xcb/0xf0
[  +0.04]  [] warn_slowpath_fmt+0x5f/0x80
[  +0.02]  [] ? sysfs_warn_dup+0x6a/0x80
[  +0.03]  [] kobject_add_internal+0x2bd/0x330
[  +0.03]  [] ? vsnprintf+0x35b/0x4c0
[  +0.03]  [] kobject_add+0x75/0xd0
[  +0.03]  [] ? device_private_init+0x23/0x70
[  +0.04]  [] ? mutex_lock+0x12/0x30
[  +0.02]  [] device_add+0x119/0x670
[  +0.04]  [] device_create_groups_vargs+0xe0/0xf0
[  +0.03]  [] device_create_vargs+0x1c/0x20
[  +0.03]  [] bdi_register+0x8c/0x180
[  +0.03]  [] bdi_register_owner+0x36/0x60
[  +0.04]  [] device_add_disk+0x168/0x480
[  +0.03]  [] ? update_autosuspend+0x51/0x60
[  +0.02]  [] sd_probe_async+0x110/0x1c0
[  +0.02]  [] async_run_entry_fn+0x39/0x140
[  +0.02]  [] process_one_work+0x15f/0x430
[  +0.02]  [] worker_thread+0x4e/0x490
[  +0.02]  [] ? process_one_work+0x430/0x430
[  +0.03]  [] kthread+0xd9/0xf0
[  +0.03]  [] ? kthread_park+0x60/0x60
[  +0.03]  [] ret_from_fork+0x25/0x30
[  +0.000949] BUG: unable to handle kernel
[  +0.005263] NULL pointer dereference
[  +0.002853] IP: [] sysfs_do_create_link_sd.isra.2+0x34/0xb0
[  +0.008584] PGD 0

[  +0.006115] Oops:  [#1] SMP
[  +0.004531] Modules linked in: mptctl mptbase xt_CHECKSUM iptable_mangle 
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat  nf_conntrack tun bridge stp 
llc ebtable_filter ebtables ip6table_filter ip6_tables