[PATCH 21/26] block: add BIO_HOLD_PAGES flag

2018-12-07 Thread Jens Axboe
For user mapped IO, we do get_user_pages() upfront, and then do a put_page() on each page at end_io time to release the page reference. In preparation for having permanently mapped pages, add a BIO_HOLD_PAGES flag that tells us not to release the pages, the caller will do that. Signed-off-by:

[PATCH 20/26] aio: batch aio_kiocb allocation

2018-12-07 Thread Jens Axboe
Similarly to how we use the state->ios_left to know how many references to get to a file, we can use it to allocate the aio_kiocb's we need in bulk. Signed-off-by: Jens Axboe --- fs/aio.c | 47 +++ 1 file changed, 39 insertions(+), 8 deletions(-)

[PATCH 26/26] aio: add support for submission/completion rings

2018-12-07 Thread Jens Axboe
Experimental support for submitting and completing IO through rings shared between the application and kernel. The submission rings are struct iocb, like we would submit through io_submit(), and the completion rings are struct io_event, like we would pass in (and copy back) from io_getevents().

[PATCH 25/26] aio: split old ring complete out from aio_complete()

2018-12-07 Thread Jens Axboe
Signed-off-by: Jens Axboe --- fs/aio.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index fd323b3ba499..de48faeab0fd 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1218,12 +1218,9 @@ static void aio_fill_event(struct io_event *ev, struct

[PATCH 17/26] fs: add fget_many() and fput_many()

2018-12-07 Thread Jens Axboe
Some use cases repeatedly get and put references to the same file, but the only exposed interface is doing these one at a time. As each of these entails an atomic inc or dec on a shared structure, that cost can add up. Add fget_many(), which works just like fget(), except it takes an argument

[PATCH 18/26] aio: use fget/fput_many() for file references

2018-12-07 Thread Jens Axboe
On the submission side, add file reference batching to the aio_submit_state. We get as many references as the number of iocbs we are submitting, and drop unused ones if we end up switching files. The assumption here is that we're usually only dealing with one fd, and if there are multiple,

[PATCH 24/26] aio: add support for pre-mapped user IO buffers

2018-12-07 Thread Jens Axboe
If we have fixed user buffers, we can map them into the kernel when we setup the io_context. That avoids the need to do get_user_pages() for each and every IO. To utilize this feature, the application must set both IOCTX_FLAG_USERIOCB, to provide iocb's in userspace, and then

[PATCH 19/26] aio: split iocb init from allocation

2018-12-07 Thread Jens Axboe
In preparation for having pre-allocated requests that we then just need to initialize before use. Signed-off-by: Jens Axboe --- fs/aio.c | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 3c07cc9cb11a..51c7159f09bf 100644 ---

[PATCH 23/26] fs: add support for mapping an ITER_BVEC for O_DIRECT

2018-12-07 Thread Jens Axboe
This adds support for sync/async O_DIRECT to make a bvec type iter for bdev access, as well as iomap. Signed-off-by: Jens Axboe --- fs/block_dev.c | 16 fs/iomap.c | 10 +++--- 2 files changed, 19 insertions(+), 7 deletions(-) diff --git a/fs/block_dev.c

[PATCH 22/26] block: implement bio helper to add iter bvec pages to bio

2018-12-07 Thread Jens Axboe
For an ITER_BVEC, we can just iterate the iov and add the pages to the bio directly. Signed-off-by: Jens Axboe --- block/bio.c | 27 +++ include/linux/bio.h | 1 + 2 files changed, 28 insertions(+) diff --git a/block/bio.c b/block/bio.c index

[PATCH 16/26] aio: add submission side request cache

2018-12-07 Thread Jens Axboe
We have to add each submitted polled request to the io_context poll_submitted list, which means we have to grab the poll_lock. We already use the block plug to batch submissions if we're doing a batch of IO submissions, extend that to cover the poll requests internally as well. Signed-off-by:

[PATCH 12/26] aio: abstract out io_event filler helper

2018-12-07 Thread Jens Axboe
Signed-off-by: Jens Axboe --- fs/aio.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 06c8bcc72496..173f1f79dc8f 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1063,6 +1063,15 @@ static inline void iocb_put(struct aio_kiocb *iocb)

[PATCH 14/26] aio: add support for having user mapped iocbs

2018-12-07 Thread Jens Axboe
For io_submit(), we have to first copy each pointer to an iocb, then copy the iocb. The latter is 64 bytes in size, and that's a lot of copying for a single IO. Add support for setting IOCTX_FLAG_USERIOCB through the new io_setup2() system call, which allows the iocbs to reside in userspace. If

[PATCH 11/26] aio: split out iocb copy from io_submit_one()

2018-12-07 Thread Jens Axboe
In preparation of handing in iocbs in a different fashion as well. Also make it clear that the iocb being passed in isn't modified, by marking it const throughout. Signed-off-by: Jens Axboe --- fs/aio.c | 68 +++- 1 file changed, 38

[PATCH 08/26] aio: don't zero entire aio_kiocb aio_get_req()

2018-12-07 Thread Jens Axboe
It's 192 bytes, fairly substantial. Most items don't need to be cleared, especially not upfront. Clear the ones we do need to clear, and leave the other ones for setup when the iocb is prepared and submitted. Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe --- fs/aio.c | 9 +++--

[PATCH 13/26] aio: add io_setup2() system call

2018-12-07 Thread Jens Axboe
This is just like io_setup(), except add a flags argument to let the caller control/define some of the io_context behavior. Outside of the flags, we add an iocb array and two user pointers for future use. Signed-off-by: Jens Axboe --- arch/x86/entry/syscalls/syscall_64.tbl | 1 + fs/aio.c

[PATCH 15/26] aio: support for IO polling

2018-12-07 Thread Jens Axboe
Add polled variants of PREAD/PREADV and PWRITE/PWRITEV. These act like their non-polled counterparts, except we expect to poll for completion of them. The polling happens at io_getevents() time, and works just like non-polled IO. To set up an io_context for polled IO, the application must call

[PATCH 07/26] aio: separate out ring reservation from req allocation

2018-12-07 Thread Jens Axboe
From: Christoph Hellwig This is in preparation for certain types of IO not needing a ring reservation. Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe --- fs/aio.c | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/fs/aio.c

[PATCH 05/26] iomap: wire up the iopoll method

2018-12-07 Thread Jens Axboe
From: Christoph Hellwig Store the request queue the last bio was submitted to in the iocb private data in addition to the cookie so that we find the right block device. Also refactor the common direct I/O bio submission code into a nice little helper. Signed-off-by: Christoph Hellwig

[PATCH 09/26] aio: only use blk plugs for > 2 depth submissions

2018-12-07 Thread Jens Axboe
Plugging is meant to optimize submission of a string of IOs, if we don't have more than 2 being submitted, don't bother setting up a plug. Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe --- fs/aio.c | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git

[PATCH 10/26] aio: use iocb_put() instead of open coding it

2018-12-07 Thread Jens Axboe
Replace the percpu_ref_put() + kmem_cache_free() with a call to iocb_put() instead. Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe --- fs/aio.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index ed6c3914477a..cf93b92bfb1e 100644 ---

[PATCH 04/26] block: use REQ_HIPRI_ASYNC for non-sync polled IO

2018-12-07 Thread Jens Axboe
Tell the block layer if it's a sync or async polled request, so it can do the right thing. Signed-off-by: Jens Axboe --- fs/block_dev.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 6de8d35f6e41..b8f574615792 100644 ---

[PATCH 06/26] aio: use assigned completion handler

2018-12-07 Thread Jens Axboe
We know this is a read/write request, but in preparation for having different kinds of those, ensure that we call the assigned handler instead of assuming it's aio_complete_rq(). Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe --- fs/aio.c | 2 +- 1 file changed, 1 insertion(+), 1

[PATCH 01/26] fs: add an iopoll method to struct file_operations

2018-12-07 Thread Jens Axboe
From: Christoph Hellwig This new method is used to explicitly poll for I/O completion for an iocb. It must be called for any iocb submitted asynchronously (that is with a non-null ki_complete) which has the IOCB_HIPRI flag set. The method is assisted by a new ki_cookie field in struct iocb to

[PATCHSET v6] Support for polled aio (and more)

2018-12-07 Thread Jens Axboe
For the grand introduction to this feature, see my original posting here: https://lore.kernel.org/linux-block/20181117235317.7366-1-ax...@kernel.dk/ and refer to the previous postings of this patchset for whatever features were added there. Particularly v4 has some performance results:

[PATCH 02/26] block: add REQ_HIPRI_ASYNC

2018-12-07 Thread Jens Axboe
For the upcoming async polled IO, we can't sleep allocating requests. If we do, then we introduce a deadlock where the submitter already has async polled IO in-flight, but can't wait for them to complete since polled requests must be actively found and reaped. Signed-off-by: Jens Axboe ---

[PATCH 03/26] block: wire up block device iopoll method

2018-12-07 Thread Jens Axboe
From: Christoph Hellwig Just call blk_poll on the iocb cookie, we can derive the block device from the inode trivially. Reviewed-by: Johannes Thumshirn Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe --- fs/block_dev.c | 10 ++ 1 file changed, 10 insertions(+) diff --git

Re: [PATCHSET v5] Support for polled aio

2018-12-07 Thread Jens Axboe
On 12/7/18 3:00 PM, Jens Axboe wrote: > On 12/7/18 2:59 PM, Jens Axboe wrote: >> On 12/7/18 2:58 PM, Jens Axboe wrote: >>> On 12/7/18 12:35 PM, Jens Axboe wrote: On 12/7/18 12:34 PM, Jeff Moyer wrote: > Jens Axboe writes: > >> BTW, quick guess is that it doesn't work so well with

Re: [PATCHSET v5] Support for polled aio

2018-12-07 Thread Jens Axboe
On 12/7/18 2:59 PM, Jens Axboe wrote: > On 12/7/18 2:58 PM, Jens Axboe wrote: >> On 12/7/18 12:35 PM, Jens Axboe wrote: >>> On 12/7/18 12:34 PM, Jeff Moyer wrote: Jens Axboe writes: > BTW, quick guess is that it doesn't work so well with fixed buffers, as > that > hasn't

Re: [PATCHSET v5] Support for polled aio

2018-12-07 Thread Jens Axboe
On 12/7/18 2:58 PM, Jens Axboe wrote: > On 12/7/18 12:35 PM, Jens Axboe wrote: >> On 12/7/18 12:34 PM, Jeff Moyer wrote: >>> Jens Axboe writes: >>> BTW, quick guess is that it doesn't work so well with fixed buffers, as that hasn't been tested. You could try and remove

Re: [PATCHSET v5] Support for polled aio

2018-12-07 Thread Jens Axboe
On 12/7/18 12:35 PM, Jens Axboe wrote: > On 12/7/18 12:34 PM, Jeff Moyer wrote: >> Jens Axboe writes: >> >>> BTW, quick guess is that it doesn't work so well with fixed buffers, as that >>> hasn't been tested. You could try and remove IOCTX_FLAG_FIXEDBUFS from the >>> test program and see if that

Re: [GIT PULL] first batch of nvme updates for 4.21

2018-12-07 Thread Jens Axboe
On 12/7/18 12:20 PM, Christoph Hellwig wrote: > Hi Jens, > > please pull this first batch of nvme updates for Linux 4.21. > > Highlights: > - target support for persistent discovery controllers (Jay Sternberg) > - target optimizations to use non-blocking reads (Chaitanya Kulkarni) > - host

Re: [PATCHSET v5] Support for polled aio

2018-12-07 Thread Jens Axboe
On 12/7/18 12:34 PM, Jeff Moyer wrote: > Jens Axboe writes: > >> BTW, quick guess is that it doesn't work so well with fixed buffers, as that >> hasn't been tested. You could try and remove IOCTX_FLAG_FIXEDBUFS from the >> test program and see if that works. > > That results in a NULL pointer

Re: [PATCHSET v5] Support for polled aio

2018-12-07 Thread Jeff Moyer
Jens Axboe writes: > BTW, quick guess is that it doesn't work so well with fixed buffers, as that > hasn't been tested. You could try and remove IOCTX_FLAG_FIXEDBUFS from the > test program and see if that works. That results in a NULL pointer dereference. I'll stick to block device testing

[GIT PULL] first batch of nvme updates for 4.21

2018-12-07 Thread Christoph Hellwig
Hi Jens, please pull this first batch of nvme updates for Linux 4.21. Highlights: - target support for persistent discovery controllers (Jay Sternberg) - target optimizations to use non-blocking reads (Chaitanya Kulkarni) - host side support for the Enhanced Command Retry TP (Keith Busch) -

Re: [PATCHSET v5] Support for polled aio

2018-12-07 Thread Jens Axboe
On 12/7/18 11:52 AM, Jens Axboe wrote: > On 12/7/18 11:48 AM, Jeff Moyer wrote: >> Hi, Jens, >> >> Jens Axboe writes: >> >>> You can also find the patches in my aio-poll branch: >>> >>> http://git.kernel.dk/cgit/linux-block/log/?h=aio-poll >>> >>> or by cloning: >>> >>>

Re: [GIT PULL] Block fixes for 4.20-rc6

2018-12-07 Thread pr-tracker-bot
The pull request you sent on Fri, 7 Dec 2018 10:12:12 -0700: > git://git.kernel.dk/linux-block.git tags/for-linus-20181207 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/0b43a299794ee9dba2dc1b0f0290b1acab9d445d Thank you! -- Deet-doot-dot, I am a bot. ht

Re: [PATCHSET v5] Support for polled aio

2018-12-07 Thread Jens Axboe
On 12/7/18 11:48 AM, Jeff Moyer wrote: > Hi, Jens, > > Jens Axboe writes: > >> You can also find the patches in my aio-poll branch: >> >> http://git.kernel.dk/cgit/linux-block/log/?h=aio-poll >> >> or by cloning: >> >> git://git.kernel.dk/linux-block aio-poll > > I made an xfs file system on a

Re: [PATCHSET v5] Support for polled aio

2018-12-07 Thread Jeff Moyer
Hi, Jens, Jens Axboe writes: > You can also find the patches in my aio-poll branch: > > http://git.kernel.dk/cgit/linux-block/log/?h=aio-poll > > or by cloning: > > git://git.kernel.dk/linux-block aio-poll I made an xfs file system on a partition of an nvme device. I created a 1 GB file on

Re: [PATCH V2] blk-mq: re-build queue map in case of kdump kernel

2018-12-07 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

[GIT PULL] Block fixes for 4.20-rc6

2018-12-07 Thread Jens Axboe
, and a regression fix for BFQ from this merge window. The BFQ fix looks bigger than it is, it's 90% comment updates. Please pull! git://git.kernel.dk/linux-block.git tags/for-linus-20181207 Israel Rukshin (1): nvmet-rdma: fix response

Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list

2018-12-07 Thread Jens Axboe
On 12/7/18 9:41 AM, Bart Van Assche wrote: > On Fri, 2018-12-07 at 09:35 -0700, Jens Axboe wrote: >> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c >> index 29bfe8017a2d..9e5bda8800f8 100644 >> --- a/block/blk-mq-sched.c >> +++ b/block/blk-mq-sched.c >> @@ -377,6 +377,16 @@ void

Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list

2018-12-07 Thread Bart Van Assche
On Fri, 2018-12-07 at 09:35 -0700, Jens Axboe wrote: > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index 29bfe8017a2d..9e5bda8800f8 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -377,6 +377,16 @@ void blk_mq_sched_insert_request(struct request *rq, > bool

Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list

2018-12-07 Thread Jens Axboe
On 12/7/18 9:24 AM, Jens Axboe wrote: > On 12/7/18 9:19 AM, Bart Van Assche wrote: >> On Thu, 2018-12-06 at 22:17 -0700, Jens Axboe wrote: >>> Instead of making special cases for what we can direct issue, and now >>> having to deal with DM solving the livelock while still retaining a BUSY >>>

Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list

2018-12-07 Thread Jens Axboe
On 12/7/18 9:19 AM, Bart Van Assche wrote: > On Thu, 2018-12-06 at 22:17 -0700, Jens Axboe wrote: >> Instead of making special cases for what we can direct issue, and now >> having to deal with DM solving the livelock while still retaining a BUSY >> condition feedback loop, always just add a

Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list

2018-12-07 Thread Bart Van Assche
On Thu, 2018-12-06 at 22:17 -0700, Jens Axboe wrote: > Instead of making special cases for what we can direct issue, and now > having to deal with DM solving the livelock while still retaining a BUSY > condition feedback loop, always just add a request that has been through > ->queue_rq() to the

Re: [GIT PULL] nvme fixes for 4.20

2018-12-07 Thread Jens Axboe
On 12/7/18 8:38 AM, Christoph Hellwig wrote: > The following changes since commit ba7aeae5539c7a74cf07a2bc61281a93c50e: > > block, bfq: fix decrement of num_active_groups (2018-12-07 07:40:07 -0700) > > are available in the Git repository at: > > git://git.infradead.org/nvme.git

[GIT PULL] nvme fixes for 4.20

2018-12-07 Thread Christoph Hellwig
The following changes since commit ba7aeae5539c7a74cf07a2bc61281a93c50e: block, bfq: fix decrement of num_active_groups (2018-12-07 07:40:07 -0700) are available in the Git repository at: git://git.infradead.org/nvme.git nvme-4.20 for you to fetch changes up to

Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list

2018-12-07 Thread Mike Snitzer
On Fri, Dec 07 2018 at 12:17am -0500, Jens Axboe wrote: > After the direct dispatch corruption fix, we permanently disallow direct > dispatch of non read/write requests. This works fine off the normal IO > path, as they will be retried like any other failed direct dispatch > request. But for the

Re: [PATCH 2/2] lightnvm: pblk: Ensure that bio is not freed on recovery

2018-12-07 Thread Javier Gonzalez
> On 7 Dec 2018, at 13.03, Matias Bjørling wrote: > > On 12/07/2018 10:12 AM, Javier Gonzalez wrote: >>> On 6 Dec 2018, at 16.45, Igor Konopko wrote: >>> >>> When we are using PBLK with 0 sized metadata during recovery >>> process we need to reference a last page of bio. Currently >>> KASAN

Re: [PATCH 2/2] lightnvm: pblk: Ensure that bio is not freed on recovery

2018-12-07 Thread Matias Bjørling
On 12/07/2018 10:12 AM, Javier Gonzalez wrote: On 6 Dec 2018, at 16.45, Igor Konopko wrote: When we are using PBLK with 0 sized metadata during recovery process we need to reference a last page of bio. Currently KASAN reports use-after-free in that case, since bio is freed on IO completion.

Re: [PATCH v2 2/2] lightnvm: pblk: Ensure that bio is not freed on recovery

2018-12-07 Thread Matias Bjørling
On 12/07/2018 09:25 AM, Igor Konopko wrote: When we are using PBLK with 0 sized metadata during recovery process we need to reference a last page of bio. Currently KASAN reports use-after-free in that case, since bio is freed on IO completion. This patch adds additional bio reference to ensure,

Re: [PATCH v2 1/2] lightnvm: pblk: Do not overwrite ppa list with meta list

2018-12-07 Thread Matias Bjørling
On 12/07/2018 09:25 AM, Igor Konopko wrote: Currently when using PBLK with 0 sized metadata both ppa list and meta list point to the same memory since pblk_dma_meta_size() returns 0 in that case. This commit fixes that issue by ensuring that pblk_dma_meta_size() always returns space equal to

Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list

2018-12-07 Thread Ming Lei
On Thu, Dec 06, 2018 at 10:17:44PM -0700, Jens Axboe wrote: > After the direct dispatch corruption fix, we permanently disallow direct > dispatch of non read/write requests. This works fine off the normal IO > path, as they will be retried like any other failed direct dispatch > request. But for

Re: [PATCH] blk-mq: fix corruption with direct issue

2018-12-07 Thread Ming Lei
On Fri, Dec 07, 2018 at 11:44:39AM +0800, Ming Lei wrote: > On Thu, Dec 06, 2018 at 09:46:42PM -0500, Theodore Y. Ts'o wrote: > > On Wed, Dec 05, 2018 at 11:03:01AM +0800, Ming Lei wrote: > > > > > > But at that time, there isn't io scheduler for MQ, so in theory the > > > issue should be there

Re: [PATCH 2/2] lightnvm: pblk: Ensure that bio is not freed on recovery

2018-12-07 Thread Javier Gonzalez
> On 6 Dec 2018, at 16.45, Igor Konopko wrote: > > When we are using PBLK with 0 sized metadata during recovery > process we need to reference a last page of bio. Currently > KASAN reports use-after-free in that case, since bio is > freed on IO completion. > > This patch adds additional bio

[PATCH v2 1/2] lightnvm: pblk: Do not overwrite ppa list with meta list

2018-12-07 Thread Igor Konopko
Currently when using PBLK with 0 sized metadata both ppa list and meta list point to the same memory since pblk_dma_meta_size() returns 0 in that case. This commit fixes that issue by ensuring that pblk_dma_meta_size() always returns space equal to sizeof(struct pblk_sec_meta) and thus ppa list

[PATCH v2 2/2] lightnvm: pblk: Ensure that bio is not freed on recovery

2018-12-07 Thread Igor Konopko
When we are using PBLK with 0 sized metadata during recovery process we need to reference a last page of bio. Currently KASAN reports use-after-free in that case, since bio is freed on IO completion. This patch adds additional bio reference to ensure that we can still use bio memory after IO

Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list

2018-12-07 Thread Ming Lei
On Thu, Dec 06, 2018 at 10:17:44PM -0700, Jens Axboe wrote: > After the direct dispatch corruption fix, we permanently disallow direct > dispatch of non read/write requests. This works fine off the normal IO > path, as they will be retried like any other failed direct dispatch > request. But for