[RFC 2/3] cfq: add cfq_find_async_wb_req

2016-08-16 Thread Daeho Jeong
Implement a function that finds the asynchronous writeback request
covering a specified sector, removes it from the queue, and returns
it to the caller.

Signed-off-by: Daeho Jeong 
---
 block/cfq-iosched.c  |   29 +
 block/elevator.c |   24 
 include/linux/elevator.h |3 +++
 3 files changed, 56 insertions(+)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 4a34978..69355e2 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -2524,6 +2524,32 @@ static void cfq_remove_request(struct request *rq)
 	}
 }
 
+#ifdef CONFIG_BOOST_URGENT_ASYNC_WB
+static struct request *
+cfq_find_async_wb_req(struct request_queue *q, sector_t sector)
+{
+	struct cfq_data *cfqd = q->elevator->elevator_data;
+	struct cfq_queue *cfqq;
+	struct request *found_req = NULL;
+	int i;
+
+	for (i = 0; i < IOPRIO_BE_NR; i++) {
+		cfqq = cfqd->root_group->async_cfqq[1][i];
+		if (cfqq) {
+			if (cfqq->queued[0])
+				found_req = elv_rb_find_incl(&cfqq->sort_list,
+							     sector);
+			if (found_req) {
+				cfq_remove_request(found_req);
+				return found_req;
+			}
+		}
+	}
+
+	return NULL;
+}
+#endif
+
 static int cfq_merge(struct request_queue *q, struct request **req,
 		     struct bio *bio)
 {
@@ -4735,6 +4761,9 @@ static struct elevator_type iosched_cfq = {
 	.elevator_add_req_fn =		cfq_insert_request,
 	.elevator_activate_req_fn =	cfq_activate_request,
 	.elevator_deactivate_req_fn =	cfq_deactivate_request,
+#ifdef CONFIG_BOOST_URGENT_ASYNC_WB
+	.elevator_find_async_wb_req_fn = cfq_find_async_wb_req,
+#endif
 	.elevator_completed_req_fn =	cfq_completed_request,
 	.elevator_former_req_fn =	elv_rb_former_request,
 	.elevator_latter_req_fn =	elv_rb_latter_request,
diff --git a/block/elevator.c b/block/elevator.c
index e4081ce..d34267a 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -343,6 +343,30 @@ struct request *elv_rb_find(struct rb_root *root, sector_t sector)
 }
 EXPORT_SYMBOL(elv_rb_find);
 
+#ifdef CONFIG_BOOST_URGENT_ASYNC_WB
+struct request *elv_rb_find_incl(struct rb_root *root, sector_t sector)
+{
+	struct rb_node *n = root->rb_node;
+	struct request *rq;
+
+	while (n) {
+		rq = rb_entry(n, struct request, rb_node);
+
+		if (sector < blk_rq_pos(rq))
+			n = n->rb_left;
+		else if (sector > blk_rq_pos(rq)) {
+			if (sector < blk_rq_pos(rq) + blk_rq_sectors(rq))
+				return rq;
+			n = n->rb_right;
+		} else
+			return rq;
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL(elv_rb_find_incl);
+#endif
+
 /*
  * Insert rq into dispatch queue of q.  Queue lock must be held on
  * entry.  rq is sort instead into the dispatch queue. To be used by
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 08ce155..efc202a 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -183,6 +183,9 @@ extern struct request *elv_rb_latter_request(struct request_queue *, struct requ
 extern void elv_rb_add(struct rb_root *, struct request *);
 extern void elv_rb_del(struct rb_root *, struct request *);
 extern struct request *elv_rb_find(struct rb_root *, sector_t);
+#ifdef CONFIG_BOOST_URGENT_ASYNC_WB
+extern struct request *elv_rb_find_incl(struct rb_root *, sector_t);
+#endif
 
 /*
  * Return values from elevator merger
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
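
For readers following the lookup in elv_rb_find_incl() above, here is a
minimal user-space sketch of the same idea. It assumes the queued requests
do not overlap; with overlapping ranges a request that starts earlier could
be missed once the walk turns right. struct req_node, req_insert() and
req_find_incl() are hypothetical stand-ins, and a plain unbalanced binary
search tree replaces the kernel rbtree.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long sector_t;

/* Hypothetical stand-in for struct request: covers [pos, pos + nr_sectors). */
struct req_node {
	sector_t pos;
	unsigned int nr_sectors;
	struct req_node *left, *right;
};

/* Insert keyed by start sector (unbalanced BST; the kernel uses an rbtree). */
static struct req_node *req_insert(struct req_node *root, struct req_node *n)
{
	struct req_node **link = &root;

	assert(n->nr_sectors > 0);
	n->left = n->right = NULL;
	while (*link)
		link = (n->pos < (*link)->pos) ? &(*link)->left
					       : &(*link)->right;
	*link = n;
	return root;
}

/*
 * Return the node whose range contains 'sector', or NULL.
 * Assumes ranges in the tree do not overlap.
 */
static struct req_node *req_find_incl(struct req_node *root, sector_t sector)
{
	struct req_node *n = root;

	while (n) {
		if (sector < n->pos)
			n = n->left;
		else if (sector < n->pos + n->nr_sectors)
			return n;	/* start <= sector < end */
		else
			n = n->right;
	}
	return NULL;
}
```

Unlike an exact-match lookup on the start sector, this also hits when the
sector falls in the middle of a request, which is what the caller needs
when it only knows which block a waiting page maps to.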


[RFC 3/3] ext4: tag asynchronous writeback io

2016-08-16 Thread Daeho Jeong
Tag the page with PG_asyncwb and PG_plugged, and the bio with
BIO_ASYNC_WB, when submitting asynchronous writeback I/O, in order to
mark which pages are flushed as asynchronous writeback I/O and which
ones stay in the plug list.

Signed-off-by: Daeho Jeong 
---
 fs/ext4/page-io.c |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 2a01df9..5912e59 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -370,6 +370,10 @@ static int io_submit_init_bio(struct ext4_io_submit *io,
 	bio->bi_private = ext4_get_io_end(io->io_end);
 	io->io_bio = bio;
 	io->io_next_block = bh->b_blocknr;
+#ifdef CONFIG_BOOST_URGENT_ASYNC_WB
+	if (io->io_wbc->sync_mode == WB_SYNC_NONE)
+		bio->bi_flags |= (1 << BIO_ASYNC_WB);
+#endif
 	return 0;
 }
 
@@ -416,6 +420,13 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
 	BUG_ON(!PageLocked(page));
 	BUG_ON(PageWriteback(page));
 
+#ifdef CONFIG_BOOST_URGENT_ASYNC_WB
+	if (wbc->sync_mode == WB_SYNC_NONE) {
+		SetPagePlugged(page);
+		SetPageAsyncWB(page);
+	}
+#endif
+
 	if (keep_towrite)
 		set_page_writeback_keepwrite(page);
 	else
-- 
1.7.9.5



Re: [PATCH] block: Fix secure erase

2016-08-16 Thread Christoph Hellwig
On Tue, Aug 16, 2016 at 10:20:25AM +0300, Adrian Hunter wrote:
> On 15/08/16 21:14, Christoph Hellwig wrote:
> > On Mon, Aug 15, 2016 at 11:43:12AM -0500, Shaun Tancheff wrote:
> >> Hmm ... Since REQ_SECURE implied REQ_DISCARD doesn't this
> >> mean that we should include REQ_OP_SECURE_ERASE checking
> >> wherever REQ_OP_DISCARD is being checked now in drivers/scsi/sd.c ?
> >>
> >> (It's only in 3 spots so it's a quickie patch)
> > 
> > SCSI doesn't support secure erase operations.  Only MMC really
> > supports it, plus the usual cargo culting in Xen blkfront that's
> > probably never been tested..
> > 
> 
> I left SCSI out because support does not exist at the moment.
> However there is UFS which is seen as the replacement for eMMC.
> And there is a patch to add support for BLKSECDISCARD:
> 
>   http://marc.info/?l=linux-scsi=146953519016056
> 
> So SCSI will need updating if that is to go in.

That patch is complete crap and if anyone thinks they'd get shit like
that in they are on the same crack that apparently the authors of the
UFS spec are on.

If you want secure discard supported in UFS, get a command for it into
SBC instead of bypassing the command set.


Re: [BUG] Deadlock in blk_mq_register_disk error path

2016-08-16 Thread Jinpu Wang
On Mon, Aug 15, 2016 at 6:22 PM, Bart Van Assche wrote:
> On 08/15/2016 09:01 AM, Jinpu Wang wrote:
>>
>> It's more likely you hit another bug; my colleague Roman fixed that:
>>
>> http://www.spinics.net/lists/linux-block/msg04552.html
>
>
> Hello Jinpu,
>
> Interesting. However, I see that he wrote the following: "Firstly this wrong
> sequence raises two kernel warnings: 1st. WARNING at
> lib/percpu-refcount.c:309 percpu_ref_kill_and_confirm called more than once
> 2nd. WARNING at lib/percpu-refcount.c:331". I haven't seen any of these
> kernel warnings ...
>
> Thanks,
>
> Bart.
>

The warnings happen only from time to time, but your hung tasks are
similar to ours.
We injected a delay to make the problem easier to reproduce.


-- 
Mit freundlichen Grüßen,
Best Regards,

Jack Wang

Linux Kernel Developer Storage
ProfitBricks GmbH  The IaaS-Company.

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin
Tel: +49 30 5770083-42
Fax: +49 30 5770085-98
Email: jinpu.w...@profitbricks.com
URL: http://www.profitbricks.de

Sitz der Gesellschaft: Berlin.
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B.
Geschäftsführer: Andreas Gauger, Achim Weiss.


[PATCH V2] block: Fix secure erase

2016-08-16 Thread Adrian Hunter
Commit 288dab8a35a0 ("block: add a separate operation type for secure
erase") split REQ_OP_SECURE_ERASE from REQ_OP_DISCARD without considering
all the places REQ_OP_DISCARD was being used to mean either. Fix those.

Signed-off-by: Adrian Hunter 
Fixes: 288dab8a35a0 ("block: add a separate operation type for secure erase")
---


Changes in V2:
In elv_dispatch_sort() don't allow requests with different ops to pass
one another.


 block/bio.c  | 21 +++--
 block/blk-merge.c| 33 +++--
 block/elevator.c |  2 +-
 drivers/mmc/card/block.c |  1 +
 drivers/mmc/card/queue.c |  3 ++-
 drivers/mmc/card/queue.h |  4 +++-
 include/linux/bio.h  | 10 --
 include/linux/blkdev.h   |  6 --
 kernel/trace/blktrace.c  |  2 +-
 9 files changed, 50 insertions(+), 32 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index f39477538fef..aa7354088008 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -667,18 +667,19 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
-	if (bio_op(bio) == REQ_OP_DISCARD)
-		goto integrity_clone;
-
-	if (bio_op(bio) == REQ_OP_WRITE_SAME) {
+	switch (bio_op(bio)) {
+	case REQ_OP_DISCARD:
+	case REQ_OP_SECURE_ERASE:
+		break;
+	case REQ_OP_WRITE_SAME:
 		bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0];
-		goto integrity_clone;
+		break;
+	default:
+		bio_for_each_segment(bv, bio_src, iter)
+			bio->bi_io_vec[bio->bi_vcnt++] = bv;
+		break;
 	}
 
-	bio_for_each_segment(bv, bio_src, iter)
-		bio->bi_io_vec[bio->bi_vcnt++] = bv;
-
-integrity_clone:
 	if (bio_integrity(bio_src)) {
 		int ret;
 
@@ -1788,7 +1789,7 @@ struct bio *bio_split(struct bio *bio, int sectors,
 	 * Discards need a mutable bio_vec to accommodate the payload
 	 * required by the DSM TRIM and UNMAP commands.
 	 */
-	if (bio_op(bio) == REQ_OP_DISCARD)
+	if (bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_SECURE_ERASE)
 		split = bio_clone_bioset(bio, gfp, bs);
 	else
 		split = bio_clone_fast(bio, gfp, bs);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 3eec75a9e91d..72627e3cf91e 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -172,12 +172,18 @@ void blk_queue_split(struct request_queue *q, struct bio **bio,
 	struct bio *split, *res;
 	unsigned nsegs;
 
-	if (bio_op(*bio) == REQ_OP_DISCARD)
+	switch (bio_op(*bio)) {
+	case REQ_OP_DISCARD:
+	case REQ_OP_SECURE_ERASE:
 		split = blk_bio_discard_split(q, *bio, bs, &nsegs);
-	else if (bio_op(*bio) == REQ_OP_WRITE_SAME)
+		break;
+	case REQ_OP_WRITE_SAME:
 		split = blk_bio_write_same_split(q, *bio, bs, &nsegs);
-	else
+		break;
+	default:
 		split = blk_bio_segment_split(q, *bio, q->bio_split, &nsegs);
+		break;
+	}
 
 	/* physical segments can be figured out during splitting */
 	res = split ? split : *bio;
@@ -213,7 +219,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	 * This should probably be returning 0, but blk_add_request_payload()
 	 * (Christoph)
 	 */
-	if (bio_op(bio) == REQ_OP_DISCARD)
+	if (bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_SECURE_ERASE)
 		return 1;
 
 	if (bio_op(bio) == REQ_OP_WRITE_SAME)
@@ -385,7 +391,9 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
 	nsegs = 0;
 	cluster = blk_queue_cluster(q);
 
-	if (bio_op(bio) == REQ_OP_DISCARD) {
+	switch (bio_op(bio)) {
+	case REQ_OP_DISCARD:
+	case REQ_OP_SECURE_ERASE:
 		/*
 		 * This is a hack - drivers should be neither modifying the
 		 * biovec, nor relying on bi_vcnt - but because of
@@ -393,19 +401,16 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
 		 * a payload we need to set up here (thank you Christoph) and
 		 * bi_vcnt is really the only way of telling if we need to.
 		 */
-
-		if (bio->bi_vcnt)
-			goto single_segment;
-
-		return 0;
-	}
-
-	if (bio_op(bio) == REQ_OP_WRITE_SAME) {
-single_segment:
+		if (!bio->bi_vcnt)
+			return 0;
+		/* Fall through */
+	case REQ_OP_WRITE_SAME:
 		*sg = sglist;
 		bvec = bio_iovec(bio);
 		sg_set_page(*sg, bvec.bv_page, bvec.bv_len, bvec.bv_offset);
 		return 1;
+   
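
The pattern the patch applies in bio_clone_bioset() can be shown in
isolation: every place that used to test for REQ_OP_DISCARD alone now
groups REQ_OP_SECURE_ERASE into the same switch case. The sketch below is
user-space C and only illustrative. The enum values, the payload_kind type
and payload_kind_for() are made-up names, not kernel API.

```c
#include <assert.h>

/* Illustrative op codes; names echo the patch, values are made up. */
enum req_op_sketch {
	OP_READ,
	OP_WRITE,
	OP_DISCARD,
	OP_SECURE_ERASE,
	OP_WRITE_SAME,
};

/* How much of the source bio's vector a clone would copy. */
enum payload_kind {
	PAYLOAD_NONE,		/* nothing copied */
	PAYLOAD_SINGLE,		/* only the first bio_vec copied */
	PAYLOAD_SEGMENTS,	/* every segment copied */
};

static enum payload_kind payload_kind_for(enum req_op_sketch op)
{
	switch (op) {
	case OP_DISCARD:
	case OP_SECURE_ERASE:	/* must take the same path as discard */
		return PAYLOAD_NONE;
	case OP_WRITE_SAME:
		return PAYLOAD_SINGLE;
	default:
		return PAYLOAD_SEGMENTS;
	}
}
```

Writing the checks as a switch keeps the two ops side by side, so a later
op with discard-like semantics only needs one more case label per site.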

Re: [PATCH] block: Fix secure erase

2016-08-16 Thread Adrian Hunter
On 15/08/16 21:14, Christoph Hellwig wrote:
> On Mon, Aug 15, 2016 at 11:43:12AM -0500, Shaun Tancheff wrote:
>> Hmm ... Since REQ_SECURE implied REQ_DISCARD doesn't this
>> mean that we should include REQ_OP_SECURE_ERASE checking
>> wherever REQ_OP_DISCARD is being checked now in drivers/scsi/sd.c ?
>>
>> (It's only in 3 spots so it's a quickie patch)
> 
> SCSI doesn't support secure erase operations.  Only MMC really
> supports it, plus the usual cargo culting in Xen blkfront that's
> probably never been tested..
> 

I left SCSI out because support does not exist at the moment.
However there is UFS which is seen as the replacement for eMMC.
And there is a patch to add support for BLKSECDISCARD:

http://marc.info/?l=linux-scsi=146953519016056

So SCSI will need updating if that is to go in.



Re: [PATCH v6 0/2] Block layer support ZAC/ZBC commands

2016-08-16 Thread Shaun Tancheff
On Tue, Aug 9, 2016 at 11:38 PM, Damien Le Moal  wrote:
> Shaun,
>
> On 8/10/16 12:58, Shaun Tancheff wrote:
>>
>> On Tue, Aug 9, 2016 at 3:09 AM, Damien Le Moal wrote:
>>>
>>>> On Aug 9, 2016, at 15:47, Hannes Reinecke wrote:
>>
>>
>> [trim]
>>
>>>>> Since disk type == 0 for everything that isn't HM so I would prefer the
>>>>> sysfs 'zoned' file just report if the drive is HA or HM.
>>>>>
>>>> Okay. So let's put in the 'zoned' attribute the device type:
>>>> 'host-managed', 'host-aware', or 'device managed'.
>>>
>>>
>>> I hacked your patches and simply put a "0" or "1" in the sysfs zoned file.
>>> Any drive that has ZBC/ZAC command support gets a "1", "0" for everything
>>> else. This means that drive managed models are not exposed as zoned block
>>> devices. For HM vs HA differentiation, an application can look at the
>>> device type file since it is already present.
>>>
>>> We could indeed set the "zoned" file to the device type, but HA drives and
>>> regular drives will both have "0" in it, so no differentiation possible.
>>> The other choice could be the "zoned" bits defined by ZBC, but these
>>> do not define a value for host managed drives, and the drive managed value
>>> being not "0" could be confusing too. So I settled for a simple 0/1 boolean.
>>
>>
>> This seems good to me.
>
>
> Another option I forgot is for the "zoned" file to indicate the total number
> of zones of the device, and 0 for a non zoned regular block device. That
> would work as well.

Clearly either is sufficient.

> [...]
>>>
>>> Done: I hacked Shaun's ioctl code and added finish zone too. The
>>> difference with Shaun's initial code is that the ioctls are propagated down
>>> to the driver (__blkdev_driver_ioctl -> sd_ioctl) so that there is no need
>>> for BIO request definitions for the zone operations. So a lot less code added.
>>
>>
>> The purpose of the BIO flags is not to enable the ioctls so much as
>> the other way round. Creating BIO op's is to enable issuing ZBC
>> commands from device mapper targets and file systems without some
>> heinous ioctl hacks.
>> Making the resulting block layer interfaces available via ioctls is just a
>> reasonable way to exercise the code ... or that was my intent.
>
>
> Yes, I understood your code. However, since (or if) we keep the zone
> information in the RB-tree cache, there is no need for the report zone
> operation BIO interface. Same for reset write pointer by keeping the mapping
> to discard. blk_lookup_zone can be used in kernel as a report zone BIO
> replacement and works as well for the report zone ioctl implementation. For
> reset, there is blkdev_issue_discard in kernel, and the reset zone ioctl
> becomes equivalent to BLKDISCARD ioctl. These are simple. Open, close and
> finish zone remains. For these, adding the BIO interface seemed an overkill.
> Hence my choice of propagating the ioctl to the driver.
> This is debatable of course, and adding an in-kernel interface is not hard:
> we can implement blk_open_zone, blk_close_zone and blk_finish_zone using
> __blkdev_driver_ioctl. That looks clean to me.

Uh. I would call that "heinous" ioctl hacks myself. Kernel -> User API
-> Kernel is not really a good design IMO.

> Overall, my concern with the BIO based interface for the ZBC commands is
> that it adds one flag for each command, which is not really the philosophy
> of the interface and potentially opens the door for more such
> implementations in the future with new standards and new commands coming up.
> Clearly that is not a sustainable path. So I think that a more specific
> interface for these zone operations is a better choice. That is consistent
> with what happens with the tons of ATA and SCSI commands not actually doing
> data I/Os (mode sense, log pages, SMART, etc). All these do not use BIOs and
> are processed as request REQ_TYPE_BLOCK_PC.

Part of the reason for following on Mike Christie's bio op/flags cleanup was to
make these ops. The advantage of adding them as ops is that only
1 extra bit is needed (not 4 or 5 bits for flags). The other reason for
promoting them into the block layer as commands is that it seems to me
to make sense that these abstractions could be passed through
a DM layer and be handled by a file system.
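
To make the bit-count argument concrete, here is a small sketch of the
arithmetic. The counts in the usage note are hypothetical, not taken from
the kernel: they assume 8 existing op values and 5 new zone commands. The
helper names are made up for illustration.

```c
#include <assert.h>

/* Bits needed to encode n distinct op values as an enumerated field. */
static unsigned int bits_for_ops(unsigned int n)
{
	unsigned int bits = 0;

	assert(n >= 1 && n <= (1u << 31));
	while ((1u << bits) < n)
		bits++;
	return bits;
}

/* Extra field width when 'added' commands become new op values. */
static unsigned int extra_bits_as_ops(unsigned int existing,
				      unsigned int added)
{
	return bits_for_ops(existing + added) - bits_for_ops(existing);
}

/* Extra width when each added command gets its own flag bit. */
static unsigned int extra_bits_as_flags(unsigned int added)
{
	return added;
}
```

With the assumed counts, extra_bits_as_ops(8, 5) is 1 because 13 values
fit in 4 bits where 8 fit in 3, while extra_bits_as_flags(5) is 5.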

>>> The ioctls do not mimic exactly the ZBC standard. For instance, there is no
>>> reporting options for report zones, nor is the "all" bit supported for open,
>>> close or finish zone commands. But the information provided on zones is
>>> complete and maps to the standard definitions.
>>
>>
>> For the reporting options I have planned to reuse the stream_id in
>> struct bio when that is formalized. There are certainly other places in
>> struct bio to stuff a few extra bits ...

Sorry I was confused here. I was under the impression you were talking
about one of my patches when you seem to have been talking about