Re: [PATCH 27/27] aio: add support for pre-mapped user IO buffers

2018-11-30 Thread Jens Axboe
On 11/30/18 3:04 PM, Jeff Moyer wrote: > Jens Axboe writes: > A limit of 4M is imposed as the largest buffer we currently support. There's nothing preventing us from going larger, but we need some cap, and 4M seemed like it would definitely be big enough. >>> >>> Doesn't this mean

Re: [PATCH 24/27] block: implement bio helper to add iter kvec pages to bio

2018-11-30 Thread Jens Axboe
On 11/30/18 2:34 PM, Jens Axboe wrote: > On 11/30/18 2:25 PM, Al Viro wrote: >> On Fri, Nov 30, 2018 at 02:16:38PM -0700, Jens Axboe wrote: > Would this make you happy: > > if (!is_vmalloc_addr(kv->iov_base)) > page = virt_to_page(kv->iov_base); > else >

Re: [PATCH 27/27] aio: add support for pre-mapped user IO buffers

2018-11-30 Thread Jeff Moyer
Jens Axboe writes: >>> A limit of 4M is imposed as the largest buffer we currently support. >>> There's nothing preventing us from going larger, but we need some cap, >>> and 4M seemed like it would definitely be big enough. >> >> Doesn't this mean that a user can pin a bunch of memory?

Re: [PATCH 27/27] aio: add support for pre-mapped user IO buffers

2018-11-30 Thread Jens Axboe
On 11/30/18 2:44 PM, Jeff Moyer wrote: > Hi, Jens, > > Jens Axboe writes: > >> If we have fixed user buffers, we can map them into the kernel when we >> setup the io_context. That avoids the need to do get_user_pages() for >> each and every IO. >> >> To utilize this feature, the application

Re: [PATCH 27/27] aio: add support for pre-mapped user IO buffers

2018-11-30 Thread Jeff Moyer
Hi, Jens, Jens Axboe writes: > If we have fixed user buffers, we can map them into the kernel when we > setup the io_context. That avoids the need to do get_user_pages() for > each and every IO. > > To utilize this feature, the application must set both > IOCTX_FLAG_USERIOCB, to provide iocb's

Re: [PATCH 1/2] sbitmap: ammortize cost of clearing bits

2018-11-30 Thread Omar Sandoval
On Fri, Nov 30, 2018 at 01:10:47PM -0700, Jens Axboe wrote: > On 11/30/18 1:03 PM, Omar Sandoval wrote: > > On Fri, Nov 30, 2018 at 09:01:17AM -0700, Jens Axboe wrote: > >> sbitmap maintains a set of words that we use to set and clear bits, with > >> each bit representing a tag for blk-mq. Even

Re: [PATCH 2/2] sbitmap: optimize wakeup check

2018-11-30 Thread Omar Sandoval
On Fri, Nov 30, 2018 at 09:01:18AM -0700, Jens Axboe wrote: > Even if we have no waiters on any of the sbitmap_queue wait states, we > still have to loop every entry to check. We do this for every IO, so > the cost adds up. > > Shift a bit of the cost to the slow path, when we actually have

Re: [PATCH 24/27] block: implement bio helper to add iter kvec pages to bio

2018-11-30 Thread Jens Axboe
On 11/30/18 2:25 PM, Al Viro wrote: > On Fri, Nov 30, 2018 at 02:16:38PM -0700, Jens Axboe wrote: Would this make you happy: if (!is_vmalloc_addr(kv->iov_base)) page = virt_to_page(kv->iov_base); else page = vmalloc_to_page(kv->iov_base); >>> >>> Free

Re: [PATCH 24/27] block: implement bio helper to add iter kvec pages to bio

2018-11-30 Thread Al Viro
On Fri, Nov 30, 2018 at 02:16:38PM -0700, Jens Axboe wrote: > >> Would this make you happy: > >> > >> if (!is_vmalloc_addr(kv->iov_base)) > >> page = virt_to_page(kv->iov_base); > >> else > >> page = vmalloc_to_page(kv->iov_base); > > > > Free advice: don't ever let Linus see

Re: [PATCH 24/27] block: implement bio helper to add iter kvec pages to bio

2018-11-30 Thread Jens Axboe
On 11/30/18 2:11 PM, Al Viro wrote: > On Fri, Nov 30, 2018 at 01:32:21PM -0700, Jens Axboe wrote: >> On 11/30/18 1:15 PM, Jens Axboe wrote: >>> On 11/30/18 12:21 PM, Al Viro wrote: On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote: > For an ITER_KVEC, we can just iterate the iov

[PATCH v2] blk-mq: don't call ktime_get_ns() if we don't need it

2018-11-30 Thread Jens Axboe
We only need the request fields and the end_io time if we have stats enabled, or if we have a scheduler attached as those may use it for completion time stats. Signed-off-by: Jens Axboe --- v2: add helper, use it in both spots. also clear ->start_time_ns so merging doesn't read garbage.
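
The idea in this posting can be sketched in userspace C: only read the clock when something will consume the timestamp, and otherwise clear the field so later readers do not see garbage. This is a minimal sketch, not the patch itself. The field name start_time_ns comes from the changelog above; struct queue_cfg, need_time_stamp(), now_ns() and request_init_time() are hypothetical names, and timespec_get() stands in for ktime_get_ns().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct request {
    uint64_t start_time_ns;
};

struct queue_cfg {
    bool stats_enabled;  /* someone is collecting IO stats */
    bool has_scheduler;  /* an IO scheduler may use completion times */
};

/* Helper: is there any consumer for the timestamp? */
static bool need_time_stamp(const struct queue_cfg *q)
{
    return q->stats_enabled || q->has_scheduler;
}

/* Userspace stand-in for ktime_get_ns(); uses the C11 wall clock. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int base = timespec_get(&ts, TIME_UTC);

    assert(base == TIME_UTC);
    (void)base;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Read the clock only when needed, otherwise clear the field. */
static void request_init_time(struct request *rq, const struct queue_cfg *q)
{
    rq->start_time_ns = need_time_stamp(q) ? now_ns() : 0;
}
```

With both flags off, request_init_time() leaves start_time_ns at 0 and never touches the clock; with either flag on, it records a timestamp.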

Re: [PATCH 24/27] block: implement bio helper to add iter kvec pages to bio

2018-11-30 Thread Al Viro
On Fri, Nov 30, 2018 at 01:32:21PM -0700, Jens Axboe wrote: > On 11/30/18 1:15 PM, Jens Axboe wrote: > > On 11/30/18 12:21 PM, Al Viro wrote: > >> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote: > >>> For an ITER_KVEC, we can just iterate the iov and add the pages > >>> to the bio

Re: [PATCH] blk-mq: don't call ktime_get_ns() if we don't need it

2018-11-30 Thread Jens Axboe
On 11/30/18 10:29 AM, Jens Axboe wrote: > On 11/30/18 10:27 AM, Christoph Hellwig wrote: >> On Fri, Nov 30, 2018 at 08:56:25AM -0700, Jens Axboe wrote: >>> We only need the request fields and the end_io time if we have stats >>> enabled, or if we have a scheduler attached as those may use it for

Re: [PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-11-30 Thread Keith Busch
On Fri, Nov 30, 2018 at 01:36:09PM -0700, Jens Axboe wrote: > On 11/30/18 1:26 PM, Keith Busch wrote: > > A driver may wish to iterate every tagged request, not just ones that > > satisfy blk_mq_request_started(). The intended use is so a driver may > > terminate entered requests on quiesced

Re: [PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-11-30 Thread Jens Axboe
On 11/30/18 1:26 PM, Keith Busch wrote: > A driver may wish to iterate every tagged request, not just ones that > satisfy blk_mq_request_started(). The intended use is so a driver may > terminate entered requests on quiesced queues. How about we just move the started check into the handler passed

Re: [PATCH 24/27] block: implement bio helper to add iter kvec pages to bio

2018-11-30 Thread Jens Axboe
On 11/30/18 1:15 PM, Jens Axboe wrote: > On 11/30/18 12:21 PM, Al Viro wrote: >> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote: >>> For an ITER_KVEC, we can just iterate the iov and add the pages >>> to the bio directly. >> >>> + page = virt_to_page(kv->iov_base); >>> +

[PATCH 2/2] nvme: Remove queue flushing hack

2018-11-30 Thread Keith Busch
The nvme driver checked the queue state on every IO so the path could drain requests. The code however declares "We shold not need to do this", so let's not do it. Instead, use blk-mq's tag iterator to terminate entered requests on dying queues so the IO path doesn't have to deal with these

[PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-11-30 Thread Keith Busch
A driver may wish to iterate every tagged request, not just ones that satisfy blk_mq_request_started(). The intended use is so a driver may terminate entered requests on quiesced queues. Signed-off-by: Keith Busch --- block/blk-mq-tag.c | 41 +++--

Re: [PATCH 24/27] block: implement bio helper to add iter kvec pages to bio

2018-11-30 Thread Jens Axboe
On 11/30/18 12:21 PM, Al Viro wrote: > On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote: >> For an ITER_KVEC, we can just iterate the iov and add the pages >> to the bio directly. > >> +page = virt_to_page(kv->iov_base); >> +size = bio_add_page(bio, page,

Re: [PATCH 26/27] iov_iter: add import_kvec()

2018-11-30 Thread Jens Axboe
On 11/30/18 12:17 PM, Al Viro wrote: > On Fri, Nov 30, 2018 at 09:56:45AM -0700, Jens Axboe wrote: >> This explicitly sets up an ITER_KVEC from an iovec with kernel ranges >> mapped. > >> +int import_kvec(int type, const struct kvec *kvecs, unsigned nr_segs, >> +size_t bytes, struct

Re: [PATCH 1/2] sbitmap: ammortize cost of clearing bits

2018-11-30 Thread Jens Axboe
On 11/30/18 1:03 PM, Omar Sandoval wrote: > On Fri, Nov 30, 2018 at 09:01:17AM -0700, Jens Axboe wrote: >> sbitmap maintains a set of words that we use to set and clear bits, with >> each bit representing a tag for blk-mq. Even though we spread the bits >> out and maintain a hint cache, one

Re: [PATCH 1/2] sbitmap: ammortize cost of clearing bits

2018-11-30 Thread Omar Sandoval
On Fri, Nov 30, 2018 at 09:01:17AM -0700, Jens Axboe wrote: > sbitmap maintains a set of words that we use to set and clear bits, with > each bit representing a tag for blk-mq. Even though we spread the bits > out and maintain a hint cache, one particular bit allocated will end up > being cleared

Re: [PATCH 24/27] block: implement bio helper to add iter kvec pages to bio

2018-11-30 Thread Al Viro
On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote: > For an ITER_KVEC, we can just iterate the iov and add the pages > to the bio directly. > + page = virt_to_page(kv->iov_base); > + size = bio_add_page(bio, page, kv->iov_len, > +

Re: [PATCH 26/27] iov_iter: add import_kvec()

2018-11-30 Thread Al Viro
On Fri, Nov 30, 2018 at 09:56:45AM -0700, Jens Axboe wrote: > This explicitly sets up an ITER_KVEC from an iovec with kernel ranges > mapped. > +int import_kvec(int type, const struct kvec *kvecs, unsigned nr_segs, > + size_t bytes, struct iov_iter *iter) > +{ > + const struct

Re: [PATCH] block: fix single range discard merge

2018-11-30 Thread Bart Van Assche
On Fri, 2018-11-30 at 10:20 -0700, Jens Axboe wrote: > On 11/30/18 10:18 AM, Bart Van Assche wrote: > > On Sat, 2018-12-01 at 00:38 +0800, Ming Lei wrote: > > > Fixes: 445251d0f4d329a ("blk-mq: fix discard merge with scheduler > > > attached") > > > > Since this patch fixes a bug introduced in

Re: [PATCH] blk-mq: don't call ktime_get_ns() if we don't need it

2018-11-30 Thread Jens Axboe
On 11/30/18 10:27 AM, Christoph Hellwig wrote: > On Fri, Nov 30, 2018 at 08:56:25AM -0700, Jens Axboe wrote: >> We only need the request fields and the end_io time if we have stats >> enabled, or if we have a scheduler attached as those may use it for >> completion time stats. >> >> Signed-off-by:

Re: [PATCH] block: fix single range discard merge

2018-11-30 Thread Christoph Hellwig
Looks good, Reviewed-by: Christoph Hellwig

Re: [PATCH] blk-mq: don't call ktime_get_ns() if we don't need it

2018-11-30 Thread Christoph Hellwig
On Fri, Nov 30, 2018 at 08:56:25AM -0700, Jens Axboe wrote: > We only need the request fields and the end_io time if we have stats > enabled, or if we have a scheduler attached as those may use it for > completion time stats. > > Signed-off-by: Jens Axboe > --- > block/blk-mq.c | 13

Re: [PATCH 01/27] aio: fix failure to put the file pointer

2018-11-30 Thread Bart Van Assche
On Fri, 2018-11-30 at 10:08 -0700, Jens Axboe wrote: > On 11/30/18 10:07 AM, Bart Van Assche wrote: > > On Fri, 2018-11-30 at 09:56 -0700, Jens Axboe wrote: > > > If the ioprio capability check fails, we return without putting > > > the file pointer. > > > > > > Fixes: d9a08a9e616b ("fs: Add aio

Re: [PATCH] block: fix single range discard merge

2018-11-30 Thread Jens Axboe
On 11/30/18 10:18 AM, Bart Van Assche wrote: > On Sat, 2018-12-01 at 00:38 +0800, Ming Lei wrote: >> Fixes: 445251d0f4d329a ("blk-mq: fix discard merge with scheduler attached") > > Since this patch fixes a bug introduced in kernel v4.16, does it need > a "Cc: stable" tag? Like the other one,

Re: [PATCH] block: fix single range discard merge

2018-11-30 Thread Bart Van Assche
On Sat, 2018-12-01 at 00:38 +0800, Ming Lei wrote: > Fixes: 445251d0f4d329a ("blk-mq: fix discard merge with scheduler attached") Since this patch fixes a bug introduced in kernel v4.16, does it need a "Cc: stable" tag? Thanks, Bart.

Re: [PATCH 05/27] block: ensure that async polled IO is marked REQ_NOWAIT

2018-11-30 Thread Jens Axboe
On 11/30/18 10:12 AM, Bart Van Assche wrote: > On Fri, 2018-11-30 at 09:56 -0700, Jens Axboe wrote: >> We can't wait for polled events to complete, as they may require active >> polling from whoever submitted it. If that is the same task that is >> submitting new IO, we could deadlock waiting for

Re: [PATCH 02/27] aio: clear IOCB_HIPRI

2018-11-30 Thread Christoph Hellwig
I think we'll need to queue this up for 4.21 ASAP independent of the rest, given that with separate poll queues userspace could otherwise submit I/O that will never get polled for anywhere.

Re: [PATCH 02/27] aio: clear IOCB_HIPRI

2018-11-30 Thread Jens Axboe
On 11/30/18 10:13 AM, Christoph Hellwig wrote: > I think we'll need to queue this up for 4.21 ASAP independent of the > rest, given that with separate poll queues userspace could otherwise > submit I/O that will never get polled for anywhere. Probably a good idea, I can just move it to my 4.21

Re: [PATCH 05/27] block: ensure that async polled IO is marked REQ_NOWAIT

2018-11-30 Thread Bart Van Assche
On Fri, 2018-11-30 at 09:56 -0700, Jens Axboe wrote: > We can't wait for polled events to complete, as they may require active > polling from whoever submitted it. If that is the same task that is > submitting new IO, we could deadlock waiting for IO to complete that > this task is supposed to be

Re: [GIT PULL] nvme fixes for 4.20

2018-11-30 Thread Jens Axboe
On 11/30/18 9:26 AM, Christoph Hellwig wrote: > On Fri, Nov 30, 2018 at 08:26:24AM -0700, Jens Axboe wrote: >> On 11/30/18 8:24 AM, Christoph Hellwig wrote: >>> Various fixlets all over, including throwing in a 'default y' for the >>> multipath code, given that we want people to actually enable it

Re: [PATCH] block: fix single range discard merge

2018-11-30 Thread Jens Axboe
On 11/30/18 9:38 AM, Ming Lei wrote: > There are actually two kinds of discard merge: > > - one is the normal discard merge, just like normal read/write request, > and call it single-range discard > > - another is the multi-range discard, queue_max_discard_segments(rq->q) > 1 > > For the former

Re: [PATCH 01/27] aio: fix failure to put the file pointer

2018-11-30 Thread Jens Axboe
On 11/30/18 10:07 AM, Bart Van Assche wrote: > On Fri, 2018-11-30 at 09:56 -0700, Jens Axboe wrote: >> If the ioprio capability check fails, we return without putting >> the file pointer. >> >> Fixes: d9a08a9e616b ("fs: Add aio iopriority support") >> Reviewed-by: Johannes Thumshirn >>

Re: [PATCH 01/27] aio: fix failure to put the file pointer

2018-11-30 Thread Bart Van Assche
On Fri, 2018-11-30 at 09:56 -0700, Jens Axboe wrote: > If the ioprio capability check fails, we return without putting > the file pointer. > > Fixes: d9a08a9e616b ("fs: Add aio iopriority support") > Reviewed-by: Johannes Thumshirn > Reviewed-by: Christoph Hellwig > Signed-off-by: Jens Axboe >

[PATCH 25/27] fs: add support for mapping an ITER_KVEC for O_DIRECT

2018-11-30 Thread Jens Axboe
This adds support for sync/async O_DIRECT to make a kvec type iter for bdev access, as well as iomap. Signed-off-by: Jens Axboe --- fs/block_dev.c | 16 fs/iomap.c | 5 - 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c

[PATCH 15/27] aio: add io_setup2() system call

2018-11-30 Thread Jens Axboe
This is just like io_setup(), except add a flags argument to let the caller control/define some of the io_context behavior. Outside of that, we pass in an iocb array for future use. Signed-off-by: Jens Axboe --- arch/x86/entry/syscalls/syscall_64.tbl | 1 + fs/aio.c

[PATCH 16/27] aio: add support for having user mapped iocbs

2018-11-30 Thread Jens Axboe
For io_submit(), we have to first copy each pointer to an iocb, then copy the iocb. The latter is 64 bytes in size, and that's a lot of copying for a single IO. Add support for setting IOCTX_FLAG_USERIOCB through the new io_setup2() system call, which allows the iocbs to reside in userspace. If

[PATCH 23/27] block: add BIO_HOLD_PAGES flag

2018-11-30 Thread Jens Axboe
For user mapped IO, we do get_user_pages() upfront, and then do a put_page() on each page at end_io time to release the page reference. In preparation for having permanently mapped pages, add a BIO_HOLD_PAGES flag that tells us not to release the pages, the caller will do that. Signed-off-by:

[PATCH 24/27] block: implement bio helper to add iter kvec pages to bio

2018-11-30 Thread Jens Axboe
For an ITER_KVEC, we can just iterate the iov and add the pages to the bio directly. Signed-off-by: Jens Axboe --- block/bio.c | 30 ++ include/linux/bio.h | 1 + 2 files changed, 31 insertions(+) diff --git a/block/bio.c b/block/bio.c index

[PATCH 18/27] aio: add submission side request cache

2018-11-30 Thread Jens Axboe
We have to add each submitted polled request to the io_context poll_submitted list, which means we have to grab the poll_lock. We already use the block plug to batch submissions if we're doing a batch of IO submissions, extend that to cover the poll requests internally as well. Signed-off-by:

[PATCH 10/27] aio: don't zero entire aio_kiocb aio_get_req()

2018-11-30 Thread Jens Axboe
It's 192 bytes, fairly substantial. Most items don't need to be cleared, especially not upfront. Clear the ones we do need to clear, and leave the other ones for setup when the iocb is prepared and submitted. Signed-off-by: Jens Axboe --- fs/aio.c | 19 --- 1 file changed, 12

[PATCH 21/27] aio: split iocb init from allocation

2018-11-30 Thread Jens Axboe
Signed-off-by: Jens Axboe --- fs/aio.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 291bbc62b2a8..341eb1b19319 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1088,6 +1088,16 @@ static bool get_reqs_available(struct kioctx *ctx)

[PATCH 22/27] aio: batch aio_kiocb allocation

2018-11-30 Thread Jens Axboe
Similarly to how we use the state->ios_left to know how many references to get to a file, we can use it to allocate the aio_kiocb's we need in bulk. Signed-off-by: Jens Axboe --- fs/aio.c | 42 +- 1 file changed, 37 insertions(+), 5 deletions(-) diff

[PATCH 17/27] aio: support for IO polling

2018-11-30 Thread Jens Axboe
Add polled variants of PREAD/PREADV and PWRITE/PWRITEV. These act like their non-polled counterparts, except we expect to poll for completion of them. The polling happens at io_getevents() time, and works just like non-polled IO. To set up an io_context for polled IO, the application must call

[PATCH 20/27] aio: use fget/fput_many() for file references

2018-11-30 Thread Jens Axboe
On the submission side, add file reference batching to the aio_submit_state. We get as many references as the number of iocbs we are submitting, and drop unused ones if we end up switching files. The assumption here is that we're usually only dealing with one fd, and if there are multiple,
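
The batching described here can be illustrated with a small userspace model: take one reference per pending iocb with a single atomic add, hand them out one by one, and return the unused ones when the file changes. This is a sketch under stated assumptions, not the kernel code. struct file_obj, struct submit_state and all function names are hypothetical, and the only behavior modeled is the part the changelog states (get as many references as iocbs being submitted, drop unused ones on a file switch).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted file. */
struct file_obj {
    atomic_long refs;
};

/* Take n references with one atomic operation. */
static void file_get_many(struct file_obj *f, long n)
{
    atomic_fetch_add(&f->refs, n);
}

/* Drop n references with one atomic operation. */
static void file_put_many(struct file_obj *f, long n)
{
    long old = atomic_fetch_sub(&f->refs, n);

    assert(old >= n);
    (void)old;
}

/* Hypothetical per-submission state. */
struct submit_state {
    struct file_obj *file;  /* file the cached references belong to */
    long cached_refs;       /* references taken but not yet handed out */
    long ios_left;          /* iocbs still to be submitted in this batch */
};

/* Hand out one reference to f for the next iocb. */
static void state_get_file(struct submit_state *s, struct file_obj *f)
{
    assert(s->ios_left > 0);
    if (s->file != f) {
        /* Switching files: give back what we did not use. */
        if (s->file && s->cached_refs)
            file_put_many(s->file, s->cached_refs);
        s->file = f;
        s->cached_refs = s->ios_left;
        file_get_many(f, s->cached_refs);
    }
    s->cached_refs--;
    s->ios_left--;
}

/* End of the batch: return any references never handed out. */
static void state_finish(struct submit_state *s)
{
    if (s->file && s->cached_refs)
        file_put_many(s->file, s->cached_refs);
    s->file = NULL;
    s->cached_refs = 0;
}
```

In the common single-file case this costs one atomic add for the whole batch instead of one per iocb; the price is an extra atomic subtract whenever the batch switches files.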

[PATCH 14/27] aio: abstract out io_event filler helper

2018-11-30 Thread Jens Axboe
Signed-off-by: Jens Axboe --- fs/aio.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index ba5758c854e8..12859ea1cb64 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1057,6 +1057,15 @@ static inline void iocb_put(struct aio_kiocb *iocb)

[PATCH 19/27] fs: add fget_many() and fput_many()

2018-11-30 Thread Jens Axboe
Some use cases repeatedly get and put references to the same file, but the only exposed interface is doing these one at a time. As each of these entails an atomic inc or dec on a shared structure, that cost can add up. Add fget_many(), which works just like fget(), except it takes an argument

[PATCH 12/27] aio: use iocb_put() instead of open coding it

2018-11-30 Thread Jens Axboe
Replace the percpu_ref_put() + kmem_cache_free() with a call to iocb_put() instead. Signed-off-by: Jens Axboe --- fs/aio.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 533cb7b1112f..e8457f9486e3 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1878,10

[PATCH 13/27] aio: split out iocb copy from io_submit_one()

2018-11-30 Thread Jens Axboe
In preparation of handing in iocbs in a different fashion as well. Also make it clear that the iocb being passed in isn't modified, by marking it const throughout. Signed-off-by: Jens Axboe --- fs/aio.c | 68 +++- 1 file changed, 38

[PATCH 27/27] aio: add support for pre-mapped user IO buffers

2018-11-30 Thread Jens Axboe
If we have fixed user buffers, we can map them into the kernel when we setup the io_context. That avoids the need to do get_user_pages() for each and every IO. To utilize this feature, the application must set both IOCTX_FLAG_USERIOCB, to provide iocb's in userspace, and then

[PATCH 26/27] iov_iter: add import_kvec()

2018-11-30 Thread Jens Axboe
This explicitly sets up an ITER_KVEC from an iovec with kernel ranges mapped. Signed-off-by: Jens Axboe --- include/linux/uio.h | 3 +++ lib/iov_iter.c | 35 ++- 2 files changed, 29 insertions(+), 9 deletions(-) diff --git a/include/linux/uio.h

[PATCH 09/27] aio: separate out ring reservation from req allocation

2018-11-30 Thread Jens Axboe
This is in preparation for certain types of IO not needing a ring reservation. Signed-off-by: Jens Axboe --- fs/aio.c | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index cf0de61743e8..eaceb40e6cf5 100644 --- a/fs/aio.c

[PATCH 03/27] fs: add an iopoll method to struct file_operations

2018-11-30 Thread Jens Axboe
From: Christoph Hellwig This new method is used to explicitly poll for I/O completion for an iocb. It must be called for any iocb submitted asynchronously (that is with a non-null ki_complete) which has the IOCB_HIPRI flag set. The method is assisted by a new ki_cookie field in struct iocb to

[PATCH 06/27] iomap: wire up the iopoll method

2018-11-30 Thread Jens Axboe
From: Christoph Hellwig Store the request queue the last bio was submitted to in the iocb private data in addition to the cookie so that we find the right block device. Also refactor the common direct I/O bio submission code into a nice little helper. Signed-off-by: Christoph Hellwig

[PATCH 11/27] aio: only use blk plugs for > 2 depth submissions

2018-11-30 Thread Jens Axboe
Plugging is meant to optimize submission of a string of IOs. If we don't have more than 2 being submitted, don't bother setting up a plug. Signed-off-by: Jens Axboe --- fs/aio.c | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index
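
A toy model of the threshold, assuming only what the changelog says (a plug is set up when more than 2 IOs are being submitted). PLUG_THRESHOLD, should_plug() and submit_batch() are hypothetical names; the function merely counts how many times the backend would be hit, with a plugged batch flushed once at the end.

```c
#include <assert.h>
#include <stdbool.h>

#define PLUG_THRESHOLD 2  /* from the changelog: plug only for more than 2 */

/* Decide whether batching is worth the setup cost. */
static bool should_plug(long nr)
{
    return nr > PLUG_THRESHOLD;
}

/*
 * Returns the number of dispatch operations for a batch of nr IOs.
 * Unplugged IOs are dispatched one by one; plugged IOs are held back
 * and flushed together in a single dispatch.
 */
static long submit_batch(long nr)
{
    bool plugged = should_plug(nr);
    long pending = 0;
    long dispatches = 0;

    assert(nr >= 0);
    for (long i = 0; i < nr; i++) {
        if (plugged)
            pending++;      /* held under the plug */
        else
            dispatches++;   /* sent immediately */
    }
    if (plugged && pending)
        dispatches++;       /* one flush when the plug is released */
    return dispatches;
}
```

For batches of 1 or 2 the model dispatches each IO directly, which is the case where plug setup would be pure overhead.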

[PATCH 08/27] aio: use assigned completion handler

2018-11-30 Thread Jens Axboe
We know this is a read/write request, but in preparation for having different kinds of those, ensure that we call the assigned handler instead of assuming it's aio_complete_rq(). Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe --- fs/aio.c | 2 +- 1 file changed, 1 insertion(+), 1

[PATCH 05/27] block: ensure that async polled IO is marked REQ_NOWAIT

2018-11-30 Thread Jens Axboe
We can't wait for polled events to complete, as they may require active polling from whoever submitted it. If that is the same task that is submitting new IO, we could deadlock waiting for IO to complete that this task is supposed to be completing itself. Signed-off-by: Jens Axboe ---

[PATCH 07/27] iomap: ensure that async polled IO is marked REQ_NOWAIT

2018-11-30 Thread Jens Axboe
We can't wait for polled events to complete, as they may require active polling from whoever submitted it. If that is the same task that is submitting new IO, we could deadlock waiting for IO to complete that this task is supposed to be completing itself. Signed-off-by: Jens Axboe ---

[PATCHSET v4] Support for polled aio

2018-11-30 Thread Jens Axboe
For the grand introduction to this feature, see my original posting here: https://lore.kernel.org/linux-block/20181117235317.7366-1-ax...@kernel.dk/ and refer to the previous postings of this patchset for whatever features were added there. Outside of "just" supporting polled IO, it also adds

[PATCH 01/27] aio: fix failure to put the file pointer

2018-11-30 Thread Jens Axboe
If the ioprio capability check fails, we return without putting the file pointer. Fixes: d9a08a9e616b ("fs: Add aio iopriority support") Reviewed-by: Johannes Thumshirn Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe --- fs/aio.c | 1 + 1 file changed, 1 insertion(+) diff --git

[PATCH 04/27] block: wire up block device iopoll method

2018-11-30 Thread Jens Axboe
From: Christoph Hellwig Just call blk_poll on the iocb cookie, we can derive the block device from the inode trivially. Reviewed-by: Johannes Thumshirn Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe --- fs/block_dev.c | 10 ++ 1 file changed, 10 insertions(+) diff --git

[PATCH 02/27] aio: clear IOCB_HIPRI

2018-11-30 Thread Jens Axboe
From: Christoph Hellwig No one is going to poll for aio (yet), so we must clear the HIPRI flag, as we would otherwise send it down the poll queues, where no one will be polling for completions. Signed-off-by: Christoph Hellwig IOCB_HIPRI, not RWF_HIPRI. Reviewed-by: Johannes Thumshirn

[PATCH] block: fix single range discard merge

2018-11-30 Thread Ming Lei
There are actually two kinds of discard merge: - one is the normal discard merge, just like normal read/write request, and call it single-range discard - another is the multi-range discard, queue_max_discard_segments(rq->q) > 1 For the former case, queue_max_discard_segments(rq->q) is 1, and we

Re: [GIT PULL] nvme fixes for 4.20

2018-11-30 Thread Christoph Hellwig
On Fri, Nov 30, 2018 at 08:26:24AM -0700, Jens Axboe wrote: > On 11/30/18 8:24 AM, Christoph Hellwig wrote: > > Various fixlets all over, including throwing in a 'default y' for the > > multipath code, given that we want people to actually enable it for full > > functionality. > > Why enable it

[PATCH 1/2] sbitmap: ammortize cost of clearing bits

2018-11-30 Thread Jens Axboe
sbitmap maintains a set of words that we use to set and clear bits, with each bit representing a tag for blk-mq. Even though we spread the bits out and maintain a hint cache, one particular bit allocated will end up being cleared in the exact same spot. This introduces batched clearing of bits.
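
Batched clearing can be sketched for a single word in userspace C: a free only sets a bit in a separate "cleared" mask, and the allocator folds that mask back into the allocation word in one step when it runs out of free bits. This is a simplified illustration, not the sbitmap code. struct tag_word and the tag_* functions are hypothetical, and the sketch deliberately ignores the race between the two steps of the fold, which a real implementation has to close.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single-word model of a tag map. */
struct tag_word {
    atomic_ulong word;     /* bits currently considered allocated */
    atomic_ulong cleared;  /* bits freed but not yet folded back */
};

/* Free a tag: touch only the cleared mask, never the allocation word. */
static void tag_free(struct tag_word *w, unsigned bit)
{
    atomic_fetch_or(&w->cleared, 1UL << bit);
}

/* Fold all deferred frees back into the allocation word in one batch. */
static bool tag_fold_cleared(struct tag_word *w)
{
    unsigned long mask = atomic_exchange(&w->cleared, 0);

    if (!mask)
        return false;
    atomic_fetch_and(&w->word, ~mask);
    return true;
}

/* Allocate the lowest free bit below nbits, or return -1 if none. */
static int tag_alloc(struct tag_word *w, unsigned nbits)
{
    assert(nbits <= sizeof(unsigned long) * CHAR_BIT);
    for (;;) {
        unsigned long cur = atomic_load(&w->word);

        for (unsigned bit = 0; bit < nbits; bit++) {
            unsigned long m = 1UL << bit;

            if (cur & m)
                continue;
            /* Claim the bit; succeed only if it was still clear. */
            if (!(atomic_fetch_or(&w->word, m) & m))
                return (int)bit;
        }
        /* Word looks full: retry only if deferred frees were folded in. */
        if (!tag_fold_cleared(w))
            return -1;
    }
}
```

The point of the split is that frees and allocations stop writing to the same word on every IO; the allocation word is only rewritten by the freeing side's bits when a fold happens.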

[PATCH 2/2] sbitmap: optimize wakeup check

2018-11-30 Thread Jens Axboe
Even if we have no waiters on any of the sbitmap_queue wait states, we still have to loop every entry to check. We do this for every IO, so the cost adds up. Shift a bit of the cost to the slow path, when we actually have waiters. Wrap prepare_to_wait_exclusive() and finish_wait(), so we can
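
One way to shift the cost to the slow path, sketched here as an assumption rather than as the patch's actual mechanism, is to keep a count of active waiters and skip the scan of the wait states when that count is zero. All names below (struct tag_queue, ws_active, the tq_* functions, NR_WAIT_QUEUES) are hypothetical, and the scans counter exists only to make the effect observable.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_WAIT_QUEUES 8  /* hypothetical number of wait states */

struct wait_state {
    int nr_waiters;
};

struct tag_queue {
    struct wait_state ws[NR_WAIT_QUEUES];
    atomic_int ws_active;  /* total waiters across all wait states */
    long scans;            /* instrumentation: full scans performed */
};

static void tq_init(struct tag_queue *q)
{
    for (int i = 0; i < NR_WAIT_QUEUES; i++)
        q->ws[i].nr_waiters = 0;
    atomic_init(&q->ws_active, 0);
    q->scans = 0;
}

/* Slow path: a task is about to sleep on wait state idx. */
static void tq_add_waiter(struct tag_queue *q, unsigned idx)
{
    assert(idx < NR_WAIT_QUEUES);
    q->ws[idx].nr_waiters++;
    atomic_fetch_add(&q->ws_active, 1);
}

/* Slow path: the task has finished waiting. */
static void tq_del_waiter(struct tag_queue *q, unsigned idx)
{
    assert(idx < NR_WAIT_QUEUES && q->ws[idx].nr_waiters > 0);
    q->ws[idx].nr_waiters--;
    atomic_fetch_sub(&q->ws_active, 1);
}

/* Per-IO path: return a wait state index with waiters, or -1. */
static int tq_find_waiter(struct tag_queue *q)
{
    if (atomic_load(&q->ws_active) == 0)
        return -1;  /* fast path: nobody is waiting, skip the loop */

    q->scans++;
    for (int i = 0; i < NR_WAIT_QUEUES; i++) {
        if (q->ws[i].nr_waiters > 0)
            return i;
    }
    return -1;
}
```

The counter is updated only by tasks that are about to sleep or have just woken, so the extra atomic traffic lands on the path that is already slow.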

[PATCHSET v4] sbitmap optimizations

2018-11-30 Thread Jens Axboe
This version tests out solid, and we're still seeing the same improvements. Changes: - Lock map index for the move. This eliminates the race completely, since it's now not possible to find ->cleared == 0 while swap of bits is in progress. The previous version was fine for users that

[PATCH] blk-mq: don't call ktime_get_ns() if we don't need it

2018-11-30 Thread Jens Axboe
We only need the request fields and the end_io time if we have stats enabled, or if we have a scheduler attached as those may use it for completion time stats. Signed-off-by: Jens Axboe --- block/blk-mq.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git

Re: [PATCH 01/13] block: move queues types to the block layer

2018-11-30 Thread Jens Axboe
On 11/30/18 1:00 AM, Christoph Hellwig wrote: > On Thu, Nov 29, 2018 at 01:19:14PM -0700, Keith Busch wrote: >> On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote: >>> +enum hctx_type { >>> + HCTX_TYPE_DEFAULT, /* all I/O not otherwise accounted for */ >>> + HCTX_TYPE_READ,

Re: [GIT PULL] nvme fixes for 4.20

2018-11-30 Thread Keith Busch
On Fri, Nov 30, 2018 at 08:26:24AM -0700, Jens Axboe wrote: > On 11/30/18 8:24 AM, Christoph Hellwig wrote: > > Various fixlets all over, including throwing in a 'default y' for the > > multipath code, given that we want people to actually enable it for full > > functionality. > > Why enable it

Re: [GIT PULL] nvme fixes for 4.20

2018-11-30 Thread Jens Axboe
On 11/30/18 8:24 AM, Christoph Hellwig wrote: > Various fixlets all over, including throwing in a 'default y' for the > multipath code, given that we want people to actually enable it for full > functionality. Why enable it by default? 99.9% of users aren't going to care. That seems like an odd

[GIT PULL] nvme fixes for 4.20

2018-11-30 Thread Christoph Hellwig
Various fixlets all over, including throwing in a 'default y' for the multipath code, given that we want people to actually enable it for full functionality. The following changes since commit 14b04063cc994effc86f976625bf8f806d8d44cb: Merge branch 'nvme-4.20' of git://git.infradead.org/nvme

Re: [PATCH 01/13] block: move queues types to the block layer

2018-11-30 Thread Christoph Hellwig
On Fri, Nov 30, 2018 at 03:20:51PM +, Jens Axboe wrote: > Thanks - are you going to post a v3? Would like to get this staged. Yes, will do. Either late tonight or over the weekend.

Re: [PATCH 01/13] block: move queues types to the block layer

2018-11-30 Thread Jens Axboe
On 11/30/18 12:56 AM, Christoph Hellwig wrote: > On Thu, Nov 29, 2018 at 07:50:09PM +, Jens Axboe wrote: >>> in our post-spectre world. Also having too many queue type is just >>> going to create confusion, so I'd rather manage them centrally. >>> >>> Note that the queue type naming and

Re: [PATCH] block: avoid extra bio reference for async O_DIRECT

2018-11-30 Thread Jens Axboe
On 11/30/18 1:38 AM, Christoph Hellwig wrote: > I like the idea, but I don't think it is correct as-is. We can't just > move the blkdev_bio_dirty_release for the last bio into the caller, as > the completion order that decrements the refcount might have different > ordering. Ugh yeah, not sure

Re: [PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues

2018-11-30 Thread Keith Busch
On Fri, Nov 30, 2018 at 12:08:09AM -0800, Christoph Hellwig wrote: > On Thu, Nov 29, 2018 at 01:36:32PM -0700, Keith Busch wrote: > > On Thu, Nov 29, 2018 at 08:13:04PM +0100, Christoph Hellwig wrote: > > > + > > > + /* handle any remaining CQEs */ > > > + if (opcode ==

Re: [PATCH 01/13] block: move queues types to the block layer

2018-11-30 Thread Keith Busch
On Fri, Nov 30, 2018 at 12:00:13AM -0800, Christoph Hellwig wrote: > On Thu, Nov 29, 2018 at 01:19:14PM -0700, Keith Busch wrote: > > On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote: > > > +enum hctx_type { > > > + HCTX_TYPE_DEFAULT, /* all I/O not otherwise accounted for */

Re: [PATCH v5 0/5] lightnvm: Flexible metadata

2018-11-30 Thread Matias Bjørling
On 11/30/2018 12:43 PM, Igor Konopko wrote: This series of patches extends how pblk can store L2P sector metadata. After this set of changes any size of NVMe metadata is supported in pblk. There is also support for the case without NVMe metadata. Changes v4 --> v5: -rebase on top of

Re: [PATCH v5 0/5] lightnvm: Flexible metadata

2018-11-30 Thread Hans Holmberg
I just started a regression test on this patch set that'll run over the weekend. I'll add a tested-by if everything checks out. All the best, Hans On Fri, Nov 30, 2018 at 12:49 PM Igor Konopko wrote: > > This series of patches extends how pblk can > store L2P sector metadata. After this

[PATCH v5 3/5] lightnvm: Flexible DMA pool entry size

2018-11-30 Thread Igor Konopko
Currently the whole of lightnvm and pblk uses a single DMA pool, for which the entry size is always equal to PAGE_SIZE. The PPA list always needs 8B*64, so there is only 56B*64 of space for OOB meta. Since NVMe OOB meta can be bigger, such as 128B, this solution is not robust. This patch adds the possibility to

[PATCH v5 2/5] lightnvm: pblk: Helpers for OOB metadata

2018-11-30 Thread Igor Konopko
Currently pblk assumes that the size of OOB metadata on the drive is always equal to the size of the pblk_sec_meta struct. This commit adds helpers that will allow handling different sizes of OOB metadata on the drive in the future. Still, after this patch only OOB metadata equal to 16 bytes is supported.

[PATCH v5 4/5] lightnvm: Disable interleaved metadata

2018-11-30 Thread Igor Konopko
Currently pblk and lightnvm only check the size of OOB metadata and do not care whether this metadata is located in a separate buffer or is interleaved with data in a single buffer. In reality only the first scenario is supported, whereas the second mode will break pblk functionality during any IO

[PATCH v5 5/5] lightnvm: pblk: Support for packed metadata

2018-11-30 Thread Igor Konopko
In the current pblk implementation, the L2P mapping for lines that are not closed is always stored only in OOB metadata and recovered from it. Such a solution does not provide data integrity when drives do not have such OOB metadata space. The goal of this patch is to add support for so-called packed

[PATCH v5 0/5] lightnvm: Flexible metadata

2018-11-30 Thread Igor Konopko
This series of patches extends how pblk can store L2P sector metadata. After this set of changes any size of NVMe metadata is supported in pblk. There is also support for the case without NVMe metadata. Changes v4 --> v5: -rebase on top of ocssd/for-4.21/core Changes v3 --> v4: -rename

[PATCH v5 1/5] lightnvm: pblk: Move lba list to partial read context

2018-11-30 Thread Igor Konopko
Currently DMA allocated memory is reused on partial read for lba_list_mem and lba_list_media arrays. In preparation for dynamic DMA pool sizes we need to move these arrays into pblk_pr_ctx structures. Reviewed-by: Javier González Signed-off-by: Igor Konopko --- drivers/lightnvm/pblk-read.c | 20

Re: [PATCH v4 0/5] lightnvm: Flexible metadata

2018-11-30 Thread Matias Bjørling
On 11/29/2018 08:16 AM, Igor Konopko wrote: This series of patches extends how pblk can store L2P sector metadata. After this set of changes any size of NVMe metadata is supported in pblk. There is also support for the case without NVMe metadata. Changes v3 --> v4: -rename

Re: [PATCH] block: avoid extra bio reference for async O_DIRECT

2018-11-30 Thread Christoph Hellwig
I like the idea, but I don't think it is correct as-is. We can't just move the blkdev_bio_dirty_release for the last bio into the caller, as the completion order that decrements the refcount might have different ordering. Something like the simpler patch below might achieve your goal, though:

Re: [PATCH 08/13] nvme-pci: remove the CQ lock for interrupt driven queues

2018-11-30 Thread Christoph Hellwig
On Thu, Nov 29, 2018 at 02:08:40PM -0700, Keith Busch wrote: > On Thu, Nov 29, 2018 at 08:13:05PM +0100, Christoph Hellwig wrote: > > @@ -1050,12 +1051,16 @@ static irqreturn_t nvme_irq(int irq, void *data) > > irqreturn_t ret = IRQ_NONE; > > u16 start, end; > > > > -

Re: [PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues

2018-11-30 Thread Christoph Hellwig
On Thu, Nov 29, 2018 at 01:36:32PM -0700, Keith Busch wrote: > On Thu, Nov 29, 2018 at 08:13:04PM +0100, Christoph Hellwig wrote: > > This is the last place outside of nvme_irq that handles CQEs from > > interrupt context, and thus is in the way of removing the cq_lock for > > normal queues, and

Re: [PATCH 01/13] block: move queues types to the block layer

2018-11-30 Thread Christoph Hellwig
On Thu, Nov 29, 2018 at 01:19:14PM -0700, Keith Busch wrote: > On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote: > > +enum hctx_type { > > + HCTX_TYPE_DEFAULT, /* all I/O not otherwise accounted for */ > > + HCTX_TYPE_READ, /* just for READ I/O */ > > +