For user mapped IO, we do get_user_pages() upfront, and then do a
put_page() on each page at end_io time to release the page reference. In
preparation for having permanently mapped pages, add a BIO_HOLD_PAGES
flag that tells us not to release the pages, the caller will do that.
Signed-off-by: Jens Axboe
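The hold-pages idea above can be sketched in plain userspace C. Everything below (fake_bio, the BIO_HOLD_PAGES bit value, page_put) is an illustrative stand-in for the kernel's types, not the real definitions:

```c
/* Userspace sketch of the BIO_HOLD_PAGES idea: at end_io time, skip
 * the per-page put when the flag says the caller owns the page
 * references. All names are simplified stand-ins. */
#include <assert.h>

#define BIO_HOLD_PAGES (1u << 0)    /* hypothetical flag bit */

struct fake_page { int refcount; };
struct fake_bio {
    unsigned flags;
    struct fake_page *pages[4];
    int nr_pages;
};

/* put_page() analogue: drop one reference */
static void page_put(struct fake_page *p) { p->refcount--; }

/* end_io: drop page references unless the caller holds them */
static void bio_end_io(struct fake_bio *bio)
{
    if (bio->flags & BIO_HOLD_PAGES)
        return;                     /* caller will release the pages */
    for (int i = 0; i < bio->nr_pages; i++)
        page_put(bio->pages[i]);
}
```

With permanently mapped pages, submission would set the flag once and the pages outlive any single bio.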
Similarly to how we use the state->ios_left to know how many references
to get to a file, we can use it to allocate the aio_kiocb's we need in
bulk.
Signed-off-by: Jens Axboe
---
fs/aio.c | 47 +++
1 file changed, 39 insertions(+), 8 deletions(-)
diff
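The bulk-allocation idea can be sketched as follows; submit_state and get_req are illustrative names, and the batch size of 32 is an arbitrary assumption, not the kernel's value:

```c
/* Userspace sketch of bulk request allocation: instead of one
 * allocator call per iocb, grab up to ios_left objects in one go
 * and hand them out from a per-submission cache. */
#include <assert.h>
#include <stdlib.h>

struct aio_req { int in_use; };

struct submit_state {
    struct aio_req *reqs[32];   /* batch of pre-allocated requests */
    int free_reqs;              /* how many are still unused */
    int ios_left;               /* iocbs remaining in this submit */
};

static struct aio_req *get_req(struct submit_state *s)
{
    if (!s->free_reqs) {
        /* one "expensive" bulk operation for the whole batch */
        int n = s->ios_left < 32 ? s->ios_left : 32;
        for (int i = 0; i < n; i++)
            s->reqs[i] = calloc(1, sizeof(struct aio_req));
        s->free_reqs = n;
    }
    s->ios_left--;
    return s->reqs[--s->free_reqs];
}
```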
Experimental support for submitting and completing IO through rings
shared between the application and kernel.
The submission rings are struct iocb, like we would submit through
io_submit(), and the completion rings are struct io_event, like we
would pass in (and copy back) from io_getevents().
A
Signed-off-by: Jens Axboe
---
fs/aio.c | 17 -
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index fd323b3ba499..de48faeab0fd 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1218,12 +1218,9 @@ static void aio_fill_event(struct io_event *ev, struct
ai
Some use cases repeatedly get and put references to the same file, but
the only exposed interface does these one at a time. As each of
these entails an atomic inc or dec on a shared structure, that cost can
add up.
Add fget_many(), which works just like fget(), except it takes an
argument fo
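The batched reference counting can be modeled with C11 atomics; fake_file and the file_get_many/file_put_many helpers below are illustrative, not the actual fget_many() signature:

```c
/* Sketch of the fget_many() idea: take N references on a file with
 * a single atomic add instead of N separate increments. */
#include <assert.h>
#include <stdatomic.h>

struct fake_file { atomic_int count; };

/* fget() analogue: one reference */
static void file_get(struct fake_file *f)
{
    atomic_fetch_add(&f->count, 1);
}

/* fget_many() analogue: refs references in one atomic op */
static void file_get_many(struct fake_file *f, int refs)
{
    atomic_fetch_add(&f->count, refs);
}

/* drop unused references, e.g. when switching files mid-batch */
static void file_put_many(struct fake_file *f, int refs)
{
    atomic_fetch_sub(&f->count, refs);
}
```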
On the submission side, add file reference batching to the
aio_submit_state. We get as many references as the number of iocbs we
are submitting, and drop unused ones if we end up switching files. The
assumption here is that we're usually only dealing with one fd, and if
there are multiple, hopefully
If we have fixed user buffers, we can map them into the kernel when we
setup the io_context. That avoids the need to do get_user_pages() for
each and every IO.
To utilize this feature, the application must set both
IOCTX_FLAG_USERIOCB, to provide iocb's in userspace, and then
IOCTX_FLAG_FIXEDBUFS.
In preparation for having pre-allocated requests that we then just
need to initialize before use.
Signed-off-by: Jens Axboe
---
fs/aio.c | 17 +++--
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 3c07cc9cb11a..51c7159f09bf 100644
--- a/fs/a
This adds support for sync/async O_DIRECT to make a bvec type iter
for bdev access, as well as iomap.
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 16
fs/iomap.c | 10 +++---
2 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.
For an ITER_BVEC, we can just iterate the iov and add the pages
to the bio directly.
Signed-off-by: Jens Axboe
---
block/bio.c | 27 +++
include/linux/bio.h | 1 +
2 files changed, 28 insertions(+)
diff --git a/block/bio.c b/block/bio.c
index 3e45e5650265..8158e
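The direct page-add loop can be sketched like this; the structures below are simplified stand-ins for the kernel's struct bio_vec and struct bio, not the real layouts:

```c
/* Sketch of adding ITER_BVEC pages straight into a bio: the pages
 * are already pinned in the bvec, so we append them without any
 * get_user_pages() step. */
#include <assert.h>
#include <stddef.h>

struct fake_page { int id; };
struct bio_vec {
    struct fake_page *bv_page;
    unsigned bv_len;
    unsigned bv_offset;
};
struct fake_bio { struct bio_vec vecs[8]; int vcnt; };

/* bio_add_page() analogue: returns bytes added, 0 if full */
static unsigned bio_add_page(struct fake_bio *bio, struct fake_page *pg,
                             unsigned len, unsigned off)
{
    if (bio->vcnt >= 8)
        return 0;
    bio->vecs[bio->vcnt++] = (struct bio_vec){ pg, len, off };
    return len;
}

/* Iterate the iov's bvec array and add each page directly */
static int bio_iov_bvec_add_pages(struct fake_bio *bio,
                                  const struct bio_vec *bv, int nr)
{
    for (int i = 0; i < nr; i++)
        if (!bio_add_page(bio, bv[i].bv_page, bv[i].bv_len,
                          bv[i].bv_offset))
            return -1;
    return 0;
}
```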
For io_submit(), we have to first copy each pointer to an iocb, then
copy the iocb. The latter is 64 bytes in size, and that's a lot of
copying for a single IO.
Add support for setting IOCTX_FLAG_USERIOCB through the new io_setup2()
system call, which allows the iocbs to reside in userspace. If th
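The copy-cost difference can be illustrated in userspace C; user_iocb and the fetch_* helpers are hypothetical names, with memcpy() standing in for the user-copy step:

```c
/* Sketch of the copy cost that user-resident iocbs avoid: the
 * classic path copies a user pointer, then the 64-byte iocb it
 * points to; with a mapped iocb array the kernel just indexes it. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct user_iocb { uint64_t data; char pad[56]; };  /* 64 bytes */

/* io_submit()-style: two copies per submission */
static struct user_iocb fetch_indirect(struct user_iocb **upp, int i)
{
    struct user_iocb *p;
    struct user_iocb cb;
    memcpy(&p, &upp[i], sizeof(p));     /* copy the pointer */
    memcpy(&cb, p, sizeof(cb));         /* copy the 64-byte iocb */
    return cb;
}

/* IOCTX_FLAG_USERIOCB-style: the array is mapped, just index it */
static const struct user_iocb *fetch_mapped(const struct user_iocb *arr,
                                            int i)
{
    return &arr[i];
}
```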
In preparation for handing in iocbs in a different fashion as well. Also
make it clear that the iocb being passed in isn't modified, by marking
it const throughout.
Signed-off-by: Jens Axboe
---
fs/aio.c | 68 +++-
1 file changed, 38 insertions(
We have to add each submitted polled request to the io_context
poll_submitted list, which means we have to grab the poll_lock. We
already use the block plug to batch submissions if we're doing a batch
of IO submissions, extend that to cover the poll requests internally as
well.
Signed-off-by: Jens Axboe
Signed-off-by: Jens Axboe
---
fs/aio.c | 14 ++
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 06c8bcc72496..173f1f79dc8f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1063,6 +1063,15 @@ static inline void iocb_put(struct aio_kiocb *iocb)
}
It's 192 bytes, fairly substantial. Most items don't need to be cleared,
especially not upfront. Clear the ones we do need to clear, and leave
the other ones for setup when the iocb is prepared and submitted.
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/aio.c | 9 +++--
1
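The partial-clear idea can be sketched with an illustrative struct layout (the field names and sizes below are assumptions, not the real struct aio_kiocb):

```c
/* Sketch of clearing only the fields that must start zeroed,
 * instead of memset()ing the whole ~192-byte request upfront. */
#include <assert.h>
#include <string.h>

struct fake_kiocb {
    void *ki_filp;          /* read before prep: must start NULL */
    int   ki_flags;         /* read before prep: must start 0 */
    long  ki_res;           /* written at completion, no clear needed */
    char  scratch[176];     /* initialized during prep, no clear needed */
};

/* Full clear: touches every byte of the struct */
static void init_full(struct fake_kiocb *req)
{
    memset(req, 0, sizeof(*req));
}

/* Partial clear: only the fields inspected before prep */
static void init_partial(struct fake_kiocb *req)
{
    req->ki_filp = NULL;
    req->ki_flags = 0;
}
```

init_full() is what a plain zeroing allocation gives you; init_partial() leaves the cold fields to be set when the iocb is actually prepared.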
This is just like io_setup(), except add a flags argument to let the
caller control/define some of the io_context behavior.
Outside of the flags, we add an iocb array and two user pointers for
future use.
Signed-off-by: Jens Axboe
---
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
fs/aio.c
From: Christoph Hellwig
This is in preparation for certain types of IO not needing a ring
reservation.
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/aio.c | 30 +-
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/fs/aio.c b/fs/a
From: Christoph Hellwig
Store the request queue the last bio was submitted to in the iocb
private data in addition to the cookie so that we find the right block
device. Also refactor the common direct I/O bio submission code into a
nice little helper.
Signed-off-by: Christoph Hellwig
Modified
Add polled variants of PREAD/PREADV and PWRITE/PWRITEV. These act
like their non-polled counterparts, except we expect to poll for
completion of them. The polling happens at io_getevent() time, and
works just like non-polled IO.
To setup an io_context for polled IO, the application must call
io_se
Plugging is meant to optimize submission of a string of IOs, if we don't
have more than 2 being submitted, don't bother setting up a plug.
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/aio.c | 18 ++
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git
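The threshold gate might look like this; AIO_PLUG_THRESHOLD and state_start are illustrative names standing in for the patch's actual identifiers:

```c
/* Sketch of the plug threshold: only set up a submission plug when
 * more than two IOs are being submitted, since plugging a tiny
 * batch just adds setup/teardown overhead. */
#include <assert.h>
#include <stdbool.h>

#define AIO_PLUG_THRESHOLD 2

struct submit_state {
    bool plugged;       /* blk_start_plug() analogue was invoked */
    int ios_left;
};

static void state_start(struct submit_state *s, int nr_ios)
{
    s->ios_left = nr_ios;
    s->plugged = nr_ios > AIO_PLUG_THRESHOLD;
}
```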
Replace the percpu_ref_put() + kmem_cache_free() with a call to
iocb_put() instead.
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/aio.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index ed6c3914477a..cf93b92bfb1e 100644
--- a/fs/ai
Tell the block layer if it's a sync or async polled request, so it can
do the right thing.
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6de8d35f6e41..b8f574615792 100644
--- a/fs/blo
We know this is a read/write request, but in preparation for
having different kinds of those, ensure that we call the assigned
handler instead of assuming it's aio_complete_rq().
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/aio.c | 2 +-
1 file changed, 1 insertion(+), 1 dele
From: Christoph Hellwig
This new method is used to explicitly poll for I/O completion for an
iocb. It must be called for any iocb submitted asynchronously (that
is with a non-null ki_complete) which has the IOCB_HIPRI flag set.
The method is assisted by a new ki_cookie field in struct iocb to
For the grand introduction to this feature, see my original posting
here:
https://lore.kernel.org/linux-block/20181117235317.7366-1-ax...@kernel.dk/
and refer to the previous postings of this patchset for whatever
features were added there. Particularly v4 has some performance results:
https://l
For the upcoming async polled IO, we can't sleep allocating requests.
If we do, then we introduce a deadlock where the submitter already
has async polled IO in-flight, but can't wait for it to complete
since polled requests must be actively found and reaped.
Signed-off-by: Jens Axboe
---
include
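The non-sleeping allocation rule can be sketched as a try-style allocator; req_pool and try_get_req are illustrative names, with the empty-pool failure playing the role of a GFP_NOWAIT-style allocation:

```c
/* Sketch of why polled submission must not sleep in the allocator:
 * when nothing is free, fail immediately instead of blocking, so
 * the submitter can go reap completed polled IOs and retry. */
#include <assert.h>

struct req_pool { int free; };

/* Returns 1 on success; returns 0 instead of sleeping when the
 * pool is empty. Sleeping here would deadlock: the requests that
 * would refill the pool only complete if we poll for them. */
static int try_get_req(struct req_pool *pool)
{
    if (pool->free == 0)
        return 0;
    pool->free--;
    return 1;
}

/* completion path returns the request to the pool */
static void put_req(struct req_pool *pool) { pool->free++; }
```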
From: Christoph Hellwig
Just call blk_poll on the iocb cookie, we can derive the block device
from the inode trivially.
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 10 ++
1 file changed, 10 insertions(+)
diff --git
On 12/7/18 3:00 PM, Jens Axboe wrote:
> On 12/7/18 2:59 PM, Jens Axboe wrote:
>> On 12/7/18 2:58 PM, Jens Axboe wrote:
>>> On 12/7/18 12:35 PM, Jens Axboe wrote:
On 12/7/18 12:34 PM, Jeff Moyer wrote:
> Jens Axboe writes:
>
>> BTW, quick guess is that it doesn't work so well with
On 12/7/18 2:59 PM, Jens Axboe wrote:
> On 12/7/18 2:58 PM, Jens Axboe wrote:
>> On 12/7/18 12:35 PM, Jens Axboe wrote:
>>> On 12/7/18 12:34 PM, Jeff Moyer wrote:
Jens Axboe writes:
> BTW, quick guess is that it doesn't work so well with fixed buffers, as
> that
> hasn't bee
On 12/7/18 2:58 PM, Jens Axboe wrote:
> On 12/7/18 12:35 PM, Jens Axboe wrote:
>> On 12/7/18 12:34 PM, Jeff Moyer wrote:
>>> Jens Axboe writes:
>>>
BTW, quick guess is that it doesn't work so well with fixed buffers, as
that
hasn't been tested. You could try and remove IOCTX_FLAG_F
On 12/7/18 12:35 PM, Jens Axboe wrote:
> On 12/7/18 12:34 PM, Jeff Moyer wrote:
>> Jens Axboe writes:
>>
>>> BTW, quick guess is that it doesn't work so well with fixed buffers, as that
>>> hasn't been tested. You could try and remove IOCTX_FLAG_FIXEDBUFS from the
>>> test program and see if that
On Fri, Dec 07 2018 at 2:43pm -0500,
Christoph Hellwig wrote:
> On Thu, Dec 06, 2018 at 05:15:07PM -0500, Mike Snitzer wrote:
> > From: Eric Wheeler
> >
> > Since dm-crypt queues writes (and sometimes reads) to a different kernel
> > thread (workqueue), the bios will dispatch from tasks with d
On 12/7/18 12:20 PM, Christoph Hellwig wrote:
> Hi Jens,
>
> please pull this first batch of nvme updates for Linux 4.21.
>
> Highlights:
> - target support for persistent discovery controllers (Jay Sternberg)
> - target optimizations to use non-blocking reads (Chaitanya Kulkarni)
> - host sid
On Thu, Dec 06, 2018 at 05:15:07PM -0500, Mike Snitzer wrote:
> From: Eric Wheeler
>
> Since dm-crypt queues writes (and sometimes reads) to a different kernel
> thread (workqueue), the bios will dispatch from tasks with different
> io_context->ioprio settings than the submitting task, thus givin
On 12/7/18 12:34 PM, Jeff Moyer wrote:
> Jens Axboe writes:
>
>> BTW, quick guess is that it doesn't work so well with fixed buffers, as that
>> hasn't been tested. You could try and remove IOCTX_FLAG_FIXEDBUFS from the
>> test program and see if that works.
>
> That results in a NULL pointer de
Jens Axboe writes:
> BTW, quick guess is that it doesn't work so well with fixed buffers, as that
> hasn't been tested. You could try and remove IOCTX_FLAG_FIXEDBUFS from the
> test program and see if that works.
That results in a NULL pointer dereference. I'll stick to block device
testing for
Hi Jens,
please pull this first batch of nvme updates for Linux 4.21.
Highlights:
- target support for persistent discovery controllers (Jay Sternberg)
- target optimizations to use non-blocking reads (Chaitanya Kulkarni)
- host side support for the Enhanced Command Retry TP (Keith Busch)
- h
On 12/7/18 11:52 AM, Jens Axboe wrote:
> On 12/7/18 11:48 AM, Jeff Moyer wrote:
>> Hi, Jens,
>>
>> Jens Axboe writes:
>>
>>> You can also find the patches in my aio-poll branch:
>>>
>>> http://git.kernel.dk/cgit/linux-block/log/?h=aio-poll
>>>
>>> or by cloning:
>>>
>>> git://git.kernel.dk/linux-b
The pull request you sent on Fri, 7 Dec 2018 10:12:12 -0700:
> git://git.kernel.dk/linux-block.git tags/for-linus-20181207
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/0b43a299794ee9dba2dc1b0f0290b1acab9d445d
Thank you!
--
Deet-doot-dot, I am a bot.
ht
On 12/7/18 11:48 AM, Jeff Moyer wrote:
> Hi, Jens,
>
> Jens Axboe writes:
>
>> You can also find the patches in my aio-poll branch:
>>
>> http://git.kernel.dk/cgit/linux-block/log/?h=aio-poll
>>
>> or by cloning:
>>
>> git://git.kernel.dk/linux-block aio-poll
>
> I made an xfs file system on a
Hi, Jens,
Jens Axboe writes:
> You can also find the patches in my aio-poll branch:
>
> http://git.kernel.dk/cgit/linux-block/log/?h=aio-poll
>
> or by cloning:
>
> git://git.kernel.dk/linux-block aio-poll
I made an xfs file system on a partition of an nvme device. I created a
1 GB file on tha
Reviewed-by: Sagi Grimberg
o small fixes, and a
regression fix for BFQ from this merge window. The BFQ fix looks bigger
than it is, it's 90% comment updates.
Please pull!
git://git.kernel.dk/linux-block.git tags/for-linus-20181207
Israel Rukshin (1):
On 12/7/18 9:41 AM, Bart Van Assche wrote:
> On Fri, 2018-12-07 at 09:35 -0700, Jens Axboe wrote:
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index 29bfe8017a2d..9e5bda8800f8 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -377,6 +377,16 @@ void blk_mq_sc
On Fri, 2018-12-07 at 09:35 -0700, Jens Axboe wrote:
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 29bfe8017a2d..9e5bda8800f8 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -377,6 +377,16 @@ void blk_mq_sched_insert_request(struct request *rq,
> bool at_
On 12/7/18 9:24 AM, Jens Axboe wrote:
> On 12/7/18 9:19 AM, Bart Van Assche wrote:
>> On Thu, 2018-12-06 at 22:17 -0700, Jens Axboe wrote:
>>> Instead of making special cases for what we can direct issue, and now
>>> having to deal with DM solving the livelock while still retaining a BUSY
>>> condi
On 12/7/18 9:19 AM, Bart Van Assche wrote:
> On Thu, 2018-12-06 at 22:17 -0700, Jens Axboe wrote:
>> Instead of making special cases for what we can direct issue, and now
>> having to deal with DM solving the livelock while still retaining a BUSY
>> condition feedback loop, always just add a reques
On Thu, 2018-12-06 at 22:17 -0700, Jens Axboe wrote:
> Instead of making special cases for what we can direct issue, and now
> having to deal with DM solving the livelock while still retaining a BUSY
> condition feedback loop, always just add a request that has been through
> ->queue_rq() to the ha
On 12/07/2018 11:15 PM, Paul Durrant wrote:
>> -Original Message-
>> From: Dongli Zhang [mailto:dongli.zh...@oracle.com]
>> Sent: 07 December 2018 15:10
>> To: Paul Durrant ; linux-ker...@vger.kernel.org;
>> xen-de...@lists.xenproject.org; linux-block@vger.kernel.org
>> Cc: ax...@kernel.
On 12/7/18 8:38 AM, Christoph Hellwig wrote:
> The following changes since commit ba7aeae5539c7a74cf07a2bc61281a93c50e:
>
> block, bfq: fix decrement of num_active_groups (2018-12-07 07:40:07 -0700)
>
> are available in the Git repository at:
>
> git://git.infradead.org/nvme.git nvme-4.2
The following changes since commit ba7aeae5539c7a74cf07a2bc61281a93c50e:
block, bfq: fix decrement of num_active_groups (2018-12-07 07:40:07 -0700)
are available in the Git repository at:
git://git.infradead.org/nvme.git nvme-4.20
for you to fetch changes up to d7dcdf9d4e15189ecfda24cc8
> -Original Message-
> From: Dongli Zhang [mailto:dongli.zh...@oracle.com]
> Sent: 07 December 2018 15:10
> To: Paul Durrant ; linux-ker...@vger.kernel.org;
> xen-de...@lists.xenproject.org; linux-block@vger.kernel.org
> Cc: ax...@kernel.dk; Roger Pau Monne ;
> konrad.w...@oracle.com
> Subj
On Fri, Dec 07 2018 at 12:17am -0500,
Jens Axboe wrote:
> After the direct dispatch corruption fix, we permanently disallow direct
> dispatch of non read/write requests. This works fine off the normal IO
> path, as they will be retried like any other failed direct dispatch
> request. But for the
Hi Paul,
On 12/07/2018 05:39 PM, Paul Durrant wrote:
>> -Original Message-
>> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
>> Of Dongli Zhang
>> Sent: 07 December 2018 04:18
>> To: linux-ker...@vger.kernel.org; xen-de...@lists.xenproject.org; linux-
>> bl...@vg
On 12/7/18 3:01 AM, Paolo Valente wrote:
>
>
>> Il giorno 7 dic 2018, alle ore 03:23, Jens Axboe ha
>> scritto:
>>
>> On 12/6/18 11:18 AM, Paolo Valente wrote:
>>> Hi Jens,
>>> the first patch in this series fixes an error in the decrementing of
>>> the counter of the number of groups with pend
> On 7 Dec 2018, at 13.03, Matias Bjørling wrote:
>
> On 12/07/2018 10:12 AM, Javier Gonzalez wrote:
>>> On 6 Dec 2018, at 16.45, Igor Konopko wrote:
>>>
>>> When we are using PBLK with 0 sized metadata during recovery
>>> process we need to reference a last page of bio. Currently
>>> KASAN re
On 12/07/2018 10:12 AM, Javier Gonzalez wrote:
On 6 Dec 2018, at 16.45, Igor Konopko wrote:
When we are using PBLK with 0 sized metadata during the recovery
process, we need to reference the last page of a bio. Currently
KASAN reports a use-after-free in that case, since the bio is
freed on IO completion.
On 12/07/2018 09:25 AM, Igor Konopko wrote:
When we are using PBLK with 0 sized metadata during the recovery
process, we need to reference the last page of a bio. Currently
KASAN reports a use-after-free in that case, since the bio is
freed on IO completion.
This patch adds an additional bio reference to ensure t
On 12/07/2018 09:25 AM, Igor Konopko wrote:
Currently when using PBLK with 0 sized metadata, both the ppa list
and the meta list point to the same memory, since pblk_dma_meta_size()
returns 0 in that case.
This commit fixes that issue by ensuring that pblk_dma_meta_size()
always returns space equal to size
On Fri 07-12-18 11:24:47, Alexander Lochmann wrote:
> Am 05.12.18 um 16:32 schrieb Jan Kara:
> >
> > Thinking more about this I'm not sure if this is actually the right
> > solution. Because for example the write(2) can set S_NOSEC flag wrongly
> > when it would race with chmod adding SUID bit. So
On Thu, Dec 06, 2018 at 10:17:44PM -0700, Jens Axboe wrote:
> After the direct dispatch corruption fix, we permanently disallow direct
> dispatch of non read/write requests. This works fine off the normal IO
> path, as they will be retried like any other failed direct dispatch
> request. But for th
Am 05.12.18 um 16:32 schrieb Jan Kara:
>
> Thinking more about this I'm not sure if this is actually the right
> solution. Because for example the write(2) can set S_NOSEC flag wrongly
> when it would race with chmod adding SUID bit. So probably we rather need
> to acquire i_rwsem in blkdev_write_
> -Original Message-
> From: Ming Lei [mailto:ming@redhat.com]
> Sent: Friday, December 7, 2018 3:50 PM
> To: Kashyap Desai
> Cc: Bart Van Assche; linux-block; Jens Axboe; linux-scsi; Suganath Prabu
> Subramani; Sreekanth Reddy; Sathya Prakash Veerichetty
> Subject: Re: +AFs-PATCH+AF0-
On Thu, Dec 06, 2018 at 11:15:13AM +0530, Kashyap Desai wrote:
> >
> > If the 'tag' passed to scsi_host_find_tag() is valid, I think there
> > shouldn't have such issue.
> >
> > If you want to find outstanding IOs, maybe you can try
> > blk_mq_queue_tag_busy_iter()
> > or blk_mq_tagset_busy_iter(),
> Il giorno 7 dic 2018, alle ore 03:23, Jens Axboe ha scritto:
>
> On 12/6/18 11:18 AM, Paolo Valente wrote:
>> Hi Jens,
>> the first patch in this series fixes an error in the decrementing of
>> the counter of the number of groups with pending I/O. This wrong
>> decrement caused loss of throu
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
> Of Dongli Zhang
> Sent: 07 December 2018 04:18
> To: linux-ker...@vger.kernel.org; xen-de...@lists.xenproject.org; linux-
> bl...@vger.kernel.org
> Cc: ax...@kernel.dk; Roger Pau Monne ;
> kon
On Fri, Dec 07, 2018 at 11:44:39AM +0800, Ming Lei wrote:
> On Thu, Dec 06, 2018 at 09:46:42PM -0500, Theodore Y. Ts'o wrote:
> > On Wed, Dec 05, 2018 at 11:03:01AM +0800, Ming Lei wrote:
> > >
> > > But at that time, there isn't io scheduler for MQ, so in theory the
> > > issue should be there si
> On 6 Dec 2018, at 16.45, Igor Konopko wrote:
>
> When we are using PBLK with 0 sized metadata during recovery
> process we need to reference a last page of bio. Currently
> KASAN reports use-after-free in that case, since bio is
> freed on IO completion.
>
> This patch adds addtional bio refe
Currently when using PBLK with 0 sized metadata, both the ppa list
and the meta list point to the same memory, since pblk_dma_meta_size()
returns 0 in that case.
This commit fixes that issue by ensuring that pblk_dma_meta_size()
always returns space equal to sizeof(struct pblk_sec_meta) and thus
the ppa list and
When we are using PBLK with 0 sized metadata during the recovery
process, we need to reference the last page of a bio. Currently
KASAN reports a use-after-free in that case, since the bio is
freed on IO completion.
This patch adds an additional bio reference to ensure that we
can still use the bio memory after IO completion.
On Thu, Dec 06, 2018 at 10:17:44PM -0700, Jens Axboe wrote:
> After the direct dispatch corruption fix, we permanently disallow direct
> dispatch of non read/write requests. This works fine off the normal IO
> path, as they will be retried like any other failed direct dispatch
> request. But for th