[PATCH v2] block: avoid incorrect bdi_unregiter call

2016-12-01 Thread Masayoshi Mizuma
bdi_unregister() should be called after bdi_register() is called,
so we should check whether WB_registered flag is set.

For example of the situation, error path in device driver may call
blk_cleanup_queue() before the driver calls bdi_register().

Signed-off-by: Masayoshi Mizuma 
---
 mm/backing-dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8fde443..f8b07d4 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -853,6 +853,9 @@ static void bdi_remove_from_list(struct backing_dev_info 
*bdi)
 
 void bdi_unregister(struct backing_dev_info *bdi)
 {
+   if (!test_bit(WB_registered, >wb.state))
+   return;
+
/* make sure nobody finds us on the bdi_list anymore */
bdi_remove_from_list(bdi);
wb_shutdown(>wb);
-- 
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block: avoid incorrect bdi_unregiter call in blk_cleanup_queue

2016-12-01 Thread Masayoshi Mizuma
Hi Jens,

On Thu, 1 Dec 2016 08:04:33 -0700 Jens Axboe wrote:
> On 12/01/2016 01:45 AM, Masayoshi Mizuma wrote:
>> blk_cleanup_queue() should call bdi_unregister() only if WB_registered
>> is set to bdi->wb.state. Because, blk_cleanup_queue() may be called
>> before bdi_register() is called, for example, error path.
> 
> Should this logic be in bdi_unregister() instead?

Thank you for pointed it out! I think your suggestion is better.
I will rewrite the patch.

- Masayoshi Mizuma

> 
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: blk-mq: preparation of the ground for BFQ inclusion

2016-12-01 Thread Jens Axboe
On 12/01/2016 02:03 PM, Jens Axboe wrote:
> On 11/14/2016 03:42 PM, Jens Axboe wrote:
>> On 11/14/2016 03:13 PM, Paolo Valente wrote:
>>> Hi Jens,
>>> have you had time to look into the first extensions/changes required?
>>> Any time plan?
>>
>> I was busy last week on the writeback and on the polling. I should have
>> something end this week.
> 
> It's coming together, running some testing and fixing some issues. But I
> now have unmodified legacy IO schedulers running under blk-mq, for
> single queue devices. Scheduler queue depth is independent of the device
> queue depth.

Here is the branch, btw:

http://git.kernel.dk/cgit/linux-block/log/?h=for-4.11/blk-mq-legacy-sched

if you want to look at it or test it. It's not in a clean state, I
wanted to get this reasonably working before cleaning it up. Should be
ready for testing, I've successfully run various IO tests on deadline
and CFQ with null_blk loaded with queue_mode=2 submit_queues=1
use_sched=1.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: blk-mq: preparation of the ground for BFQ inclusion

2016-12-01 Thread Jens Axboe
On 11/14/2016 03:42 PM, Jens Axboe wrote:
> On 11/14/2016 03:13 PM, Paolo Valente wrote:
>> Hi Jens,
>> have you had time to look into the first extensions/changes required?
>> Any time plan?
> 
> I was busy last week on the writeback and on the polling. I should have
> something end this week.

It's coming together, running some testing and fixing some issues. But I
now have unmodified legacy IO schedulers running under blk-mq, for
single queue devices. Scheduler queue depth is independent of the device
queue depth.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block: Factor out req_set_nomerge

2016-12-01 Thread Ritesh Harjani



On 12/1/2016 9:08 PM, Jens Axboe wrote:

On 12/01/2016 03:27 AM, Ritesh Harjani wrote:

Factor out common code for setting REQ_NOMERGE
flag which is being used out at certain places
and make it a helper instead (req_set_nomerge).


I applied this, but I got rid of the inline. Let the compiler do it's
job, if anything this should be marked out-of-line, since it's not a hot
path.

Beware that your email triggers the gmail spam filter, since it's
failing the authenticity check. You might want to look into that.


Thanks for informing. I will take a look.

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] block: Add Sed-opal library

2016-12-01 Thread Keith Busch
On Thu, Dec 01, 2016 at 10:53:43AM -0700, Scott Bauer wrote:
> > Maybe.  I need to look at the TCG spec again (oh my good, what a fucking
> > mess), but if I remember the context if it is the whole nvme controller
> > and not just a namespace, so a block_device might be the wrong context.
> > Then again we can always go from the block_device to the controller
> > fairly easily.  So instead of adding the security operation to the
> > block_device_operations which we don't really need for now maybe we
> > should add a security_conext to the block device so that we can avoid
> > all the lookup code?
> 
> I spent some time this morning reading through the numerous specs/documents,
> with a lot of coffee.
> 
> Specifically in:
> https://www.trustedcomputinggroup.org/wp-content/uploads/TCG_SWG_SIIS_Version_1_02_Revision_1_00_20111230.pdf
> 
> 5.5.2
> Namespace
> 
> A target that has multiple Namespaces MAY have  multiple TPers. Each TPer
> SHALL be associated with a different Namespace. Every Namespace on a device
> is not required to have a TPer, but Namespaces that support the TCG Core
> specification commands and functionality SHALL have a TPer. A TPer SHALL only
> be associated with exactly one Namespace. A Namespace MAY have no TPer.
> 
> From reading that it seems we will probably have to keep it at the block 
> layer,
> since its possible to have a valid "Locking range 1" on n1 and a "Locking 
> range 1"
> on n2.

Thanks for tracking that down! Specifically for NVMe, security
send/recieve requires NSID, so it is a little more difficult to get to
that if we're not using the abstracton that contains the namespace.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] block: Add Sed-opal library

2016-12-01 Thread Scott Bauer
On Thu, Dec 01, 2016 at 02:04:56AM -0800, Christoph Hellwig wrote:
> On Wed, Nov 30, 2016 at 07:50:07PM -0500, Keith Busch wrote:
> > I think we should get rid of the "majmin" stuff
> 
> Absolutely agreed.
> 
> >
> > and directly use
> > block_device. Then if we add the security send/receive operations to the
> > block_device_operations, that will simplify chaining the security request
> > to the driver without needing to thread the driver's requested callback
> > and data the way you have to here since all the necessary information
> > is encapsulated in the block_device.
> 
> Maybe.  I need to look at the TCG spec again (oh my good, what a fucking
> mess), but if I remember the context if it is the whole nvme controller
> and not just a namespace, so a block_device might be the wrong context.
> Then again we can always go from the block_device to the controller
> fairly easily.  So instead of adding the security operation to the
> block_device_operations which we don't really need for now maybe we
> should add a security_conext to the block device so that we can avoid
> all the lookup code?

I spent some time this morning reading through the numerous specs/documents,
with a lot of coffee.

Specifically in:
https://www.trustedcomputinggroup.org/wp-content/uploads/TCG_SWG_SIIS_Version_1_02_Revision_1_00_20111230.pdf

5.5.2
Namespace

A target that has multiple Namespaces MAY have  multiple TPers. Each TPer
SHALL be associated with a different Namespace. Every Namespace on a device
is not required to have a TPer, but Namespaces that support the TCG Core
specification commands and functionality SHALL have a TPer. A TPer SHALL only
be associated with exactly one Namespace. A Namespace MAY have no TPer.

>From reading that it seems we will probably have to keep it at the block layer,
since its possible to have a valid "Locking range 1" on n1 and a "Locking range 
1"
on n2.


[snip]
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] blk-mq: Fix failed allocation path when mapping queues

2016-12-01 Thread Gabriel Krisman Bertazi
Gabriel Krisman Bertazi  writes:

> My one concern about this patch is if remapping an arbitrary queue to
> hctx_0 could result in outstanding requests getting submitted to the
> wrong hctx.  I couldn't observe this happening during tests, but I'm not
> entirely sure it'll never happen.  I believe the queue will be empty if
> we are trying to allocate tags for it, unless it was using another alive
> hctx queue and for some reason got reassigned to this new hctx.  is this
> possible at all?

I no longer believe this case to be an issue for accepting this patch,
since if a queue enters this path and get's remapped to hctx_0, it means
the ctx would already be getting remaped to a new queue anyway, and it
should use the requests that were just mapped for that queue.  So, for a
correctness standpoint it doesn't matter which queue we are remaping,
the failed hctx or hctx_0, no request would get redirected to the wrong
queue with this change.

Jens, do you think this set is good for merging in 4.10?  It fixes some
issues for us on low memory conditions.

-- 
Gabriel Krisman Bertazi

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block: Factor out req_set_nomerge

2016-12-01 Thread Jens Axboe
On 12/01/2016 03:27 AM, Ritesh Harjani wrote:
> Factor out common code for setting REQ_NOMERGE
> flag which is being used out at certain places
> and make it a helper instead (req_set_nomerge).

I applied this, but I got rid of the inline. Let the compiler do it's
job, if anything this should be marked out-of-line, since it's not a hot
path.

Beware that your email triggers the gmail spam filter, since it's
failing the authenticity check. You might want to look into that.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] block: Check partition alignment on zoned block devices

2016-12-01 Thread Jens Axboe
On 12/01/2016 12:00 AM, Damien Le Moal wrote:
> Both blkdev_report_zones and blkdev_reset_zones can operate on a partition of
> a zoned block device. However, the first and last zones reported for a
> partition make sense only if the partition start sector and size are aligned
> on the device zone size. The same applies for zone reset. Resetting the first
> or the last zone of a partition straddling zones may impact neighboring
> partitions. Finally, if a partition start sector is not at the beginning of a
> sequential zone, it will be impossible to write to the first sectors of the
> partition on a host-managed device.
> Avoid all these problems and incoherencies by ignoring partitions that are not
> zone aligned.
> 
> Note: Even with CONFIG_BLK_DEV_ZONED disabled, bdev_is_zoned() will report the
> correct disk zoning type (host-aware, host-managed or none) but
> bdev_zone_size() will always return 0 for zoned block devices (i.e. the zone
> size is unknown). So test this as a way to ensure that a zoned block device is
> being handled as such. As a result, for a host-aware devices, unaligned zone
> partitions will be accepted with CONFIG_BLK_DEV_ZONED disabled. That is, the
> disk will be treated as a regular block device (as it should). If zoned block
> device support is enabled, only aligned partitions will be accepted.

Thanks Damien, applied.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] block: Factor out req_set_nomerge

2016-12-01 Thread Ritesh Harjani
Factor out common code for setting REQ_NOMERGE
flag which is being used out at certain places
and make it a helper instead (req_set_nomerge).

Signed-off-by: Ritesh Harjani 
---
 block/blk-merge.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2642e5f..4549fec 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -492,6 +492,14 @@ int blk_rq_map_sg(struct request_queue *q, struct request 
*rq,
 }
 EXPORT_SYMBOL(blk_rq_map_sg);
 
+static inline void req_set_nomerge(struct request_queue *q,
+  struct request *req)
+{
+   req->cmd_flags |= REQ_NOMERGE;
+   if (req == q->last_merge)
+   q->last_merge = NULL;
+}
+
 static inline int ll_new_hw_segment(struct request_queue *q,
struct request *req,
struct bio *bio)
@@ -512,9 +520,7 @@ static inline int ll_new_hw_segment(struct request_queue *q,
return 1;
 
 no_merge:
-   req->cmd_flags |= REQ_NOMERGE;
-   if (req == q->last_merge)
-   q->last_merge = NULL;
+   req_set_nomerge(q, req);
return 0;
 }
 
@@ -528,9 +534,7 @@ int ll_back_merge_fn(struct request_queue *q, struct 
request *req,
return 0;
if (blk_rq_sectors(req) + bio_sectors(bio) >
blk_rq_get_max_sectors(req, blk_rq_pos(req))) {
-   req->cmd_flags |= REQ_NOMERGE;
-   if (req == q->last_merge)
-   q->last_merge = NULL;
+   req_set_nomerge(q, req);
return 0;
}
if (!bio_flagged(req->biotail, BIO_SEG_VALID))
@@ -552,9 +556,7 @@ int ll_front_merge_fn(struct request_queue *q, struct 
request *req,
return 0;
if (blk_rq_sectors(req) + bio_sectors(bio) >
blk_rq_get_max_sectors(req, bio->bi_iter.bi_sector)) {
-   req->cmd_flags |= REQ_NOMERGE;
-   if (req == q->last_merge)
-   q->last_merge = NULL;
+   req_set_nomerge(q, req);
return 0;
}
if (!bio_flagged(bio, BIO_SEG_VALID))
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
a Linux Foundation Collaborative Project.

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] block: protect iterate_bdevs() against concurrent close

2016-12-01 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] block: Add Sed-opal library

2016-12-01 Thread Christoph Hellwig
On Wed, Nov 30, 2016 at 07:50:07PM -0500, Keith Busch wrote:
> I think we should get rid of the "majmin" stuff

Absolutely agreed.

>
> and directly use
> block_device. Then if we add the security send/receive operations to the
> block_device_operations, that will simplify chaining the security request
> to the driver without needing to thread the driver's requested callback
> and data the way you have to here since all the necessary information
> is encapsulated in the block_device.

Maybe.  I need to look at the TCG spec again (oh my good, what a fucking
mess), but if I remember the context if it is the whole nvme controller
and not just a namespace, so a block_device might be the wrong context.
Then again we can always go from the block_device to the controller
fairly easily.  So instead of adding the security operation to the
block_device_operations which we don't really need for now maybe we
should add a security_conext to the block device so that we can avoid
all the lookup code?

> We shouldn't need to be allocating an 'opal_dev' for every range. The
> range-specific parts should be in a different structure that the opal_dev
> can have a list of. That will simpify the unlock from suspend a bit.

Agreed.

> I can appreciate how compact this is, but this is a little harder to
> read IMO, and it works only because you were so careful in setting up
> the array. I think expanding the ioctl into a switch will be easier to
> follow, and has a more tolerent coding convention for future additions.

Agreed.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 5/5] nvmet: add support for the Write Zeroes command

2016-12-01 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 4/5] nvme: add support for the Write Zeroes command

2016-12-01 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 3/5] nvme.h: add Write Zeroes definitions

2016-12-01 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 1/5] block: add async variant of blkdev_issue_zeroout

2016-12-01 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 

[although normally a cover letter for the series would be nice, like
the last time around]
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] block: avoid incorrect bdi_unregiter call in blk_cleanup_queue

2016-12-01 Thread Masayoshi Mizuma
blk_cleanup_queue() should call bdi_unregister() only if WB_registered
is set to bdi->wb.state. Because, blk_cleanup_queue() may be called
before bdi_register() is called, for example, error path.

Signed-off-by: Masayoshi Mizuma 
---
 block/blk-core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c07..c1ab17e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -595,7 +595,8 @@ void blk_cleanup_queue(struct request_queue *q)
q->queue_lock = >__queue_lock;
spin_unlock_irq(lock);
 
-   bdi_unregister(>backing_dev_info);
+   if (test_bit(WB_registered, >backing_dev_info.wb.state))
+   bdi_unregister(>backing_dev_info);
 
/* @q is and will stay empty, shutdown and put */
blk_put_queue(q);
-- 
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html