On 11/30/18 3:04 PM, Jeff Moyer wrote:
> Jens Axboe writes:
>
>>>> A limit of 4M is imposed as the largest buffer we currently support.
>>>> There's nothing preventing us from going larger, but we need some cap,
>>>> and 4M seemed like it would definitely be big enough.
>>>
>>> Doesn't this mean
On 11/30/18 2:34 PM, Jens Axboe wrote:
> On 11/30/18 2:25 PM, Al Viro wrote:
>> On Fri, Nov 30, 2018 at 02:16:38PM -0700, Jens Axboe wrote:
> Would this make you happy:
>
> if (!is_vmalloc_addr(kv->iov_base))
>         page = virt_to_page(kv->iov_base);
> else
>         page = vmalloc_to_page(kv->iov_base);
Jens Axboe writes:
>>> A limit of 4M is imposed as the largest buffer we currently support.
>>> There's nothing preventing us from going larger, but we need some cap,
>>> and 4M seemed like it would definitely be big enough.
>>
>> Doesn't this mean that a user can pin a bunch of memory?
On 11/30/18 2:44 PM, Jeff Moyer wrote:
> Hi, Jens,
>
> Jens Axboe writes:
>
>> If we have fixed user buffers, we can map them into the kernel when we
>> setup the io_context. That avoids the need to do get_user_pages() for
>> each and every IO.
>>
>> To utilize this feature, the application
Hi, Jens,
Jens Axboe writes:
> If we have fixed user buffers, we can map them into the kernel when we
> setup the io_context. That avoids the need to do get_user_pages() for
> each and every IO.
>
> To utilize this feature, the application must set both
> IOCTX_FLAG_USERIOCB, to provide iocb's
On Fri, Nov 30, 2018 at 01:10:47PM -0700, Jens Axboe wrote:
> On 11/30/18 1:03 PM, Omar Sandoval wrote:
> > On Fri, Nov 30, 2018 at 09:01:17AM -0700, Jens Axboe wrote:
> >> sbitmap maintains a set of words that we use to set and clear bits, with
> >> each bit representing a tag for blk-mq. Even
On Fri, Nov 30, 2018 at 09:01:18AM -0700, Jens Axboe wrote:
> Even if we have no waiters on any of the sbitmap_queue wait states, we
> still have to loop every entry to check. We do this for every IO, so
> the cost adds up.
>
> Shift a bit of the cost to the slow path, when we actually have
On 11/30/18 2:25 PM, Al Viro wrote:
> On Fri, Nov 30, 2018 at 02:16:38PM -0700, Jens Axboe wrote:
Would this make you happy:
if (!is_vmalloc_addr(kv->iov_base))
        page = virt_to_page(kv->iov_base);
else
        page = vmalloc_to_page(kv->iov_base);
>>>
>>> Free
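For reference, the pattern under discussion amounts to a small helper; a minimal sketch (the helper name here is hypothetical, not from the patch):

    /*
     * Resolve a kernel virtual address to its backing struct page.
     * virt_to_page() is only valid for the linear mapping; vmalloc
     * ranges need vmalloc_to_page() to walk the page tables.
     */
    static struct page *kaddr_to_page(const void *addr)
    {
            if (is_vmalloc_addr(addr))
                    return vmalloc_to_page(addr);
            return virt_to_page(addr);
    }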
On Fri, Nov 30, 2018 at 02:16:38PM -0700, Jens Axboe wrote:
> >> Would this make you happy:
> >>
> >> if (!is_vmalloc_addr(kv->iov_base))
> >>         page = virt_to_page(kv->iov_base);
> >> else
> >>         page = vmalloc_to_page(kv->iov_base);
> >
> > Free advice: don't ever let Linus see
On 11/30/18 2:11 PM, Al Viro wrote:
> On Fri, Nov 30, 2018 at 01:32:21PM -0700, Jens Axboe wrote:
>> On 11/30/18 1:15 PM, Jens Axboe wrote:
>>> On 11/30/18 12:21 PM, Al Viro wrote:
On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote:
> For an ITER_KVEC, we can just iterate the iov
We only need the request fields and the end_io time if we have
stats enabled, or if we have a scheduler attached as those may
use it for completion time stats.
Signed-off-by: Jens Axboe
---
v2: add helper, use it in both spots. also clear ->start_time_ns
so merging doesn't read garbage.
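A sketch of what the helper mentioned in the v2 note could look like, based purely on the description above (the name and exact conditions are assumptions):

    /* Only take a timestamp when someone will consume it: either I/O
     * stats are enabled for this request, or a scheduler is attached
     * that may use it for completion time stats. */
    static inline bool blk_mq_need_time_stamp(struct request *rq)
    {
            return (rq->rq_flags & RQF_IO_STAT) || rq->q->elevator;
    }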
On Fri, Nov 30, 2018 at 01:32:21PM -0700, Jens Axboe wrote:
> On 11/30/18 1:15 PM, Jens Axboe wrote:
> > On 11/30/18 12:21 PM, Al Viro wrote:
> >> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote:
> >>> For an ITER_KVEC, we can just iterate the iov and add the pages
> >>> to the bio
On 11/30/18 10:29 AM, Jens Axboe wrote:
> On 11/30/18 10:27 AM, Christoph Hellwig wrote:
>> On Fri, Nov 30, 2018 at 08:56:25AM -0700, Jens Axboe wrote:
>>> We only need the request fields and the end_io time if we have stats
>>> enabled, or if we have a scheduler attached as those may use it for
On Fri, Nov 30, 2018 at 01:36:09PM -0700, Jens Axboe wrote:
> On 11/30/18 1:26 PM, Keith Busch wrote:
> > A driver may wish to iterate every tagged request, not just ones that
> > satisfy blk_mq_request_started(). The intended use is so a driver may
> > terminate entered requests on quiesced
On 11/30/18 1:26 PM, Keith Busch wrote:
> A driver may wish to iterate every tagged request, not just ones that
> satisfy blk_mq_request_started(). The intended use is so a driver may
> terminate entered requests on quiesced queues.
How about we just move the started check into the handler passed
On 11/30/18 1:15 PM, Jens Axboe wrote:
> On 11/30/18 12:21 PM, Al Viro wrote:
>> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote:
>>> For an ITER_KVEC, we can just iterate the iov and add the pages
>>> to the bio directly.
>>
>>> + page = virt_to_page(kv->iov_base);
>>> +
The nvme driver checked the queue state on every IO so the path could
drain requests. The code, however, declares "We shold not need to do this",
so let's not do it. Instead, use blk-mq's tag iterator to terminate
entered requests on dying queues so the IO path doesn't have to deal
with these
A driver may wish to iterate every tagged request, not just ones that
satisfy blk_mq_request_started(). The intended use is so a driver may
terminate entered requests on quiesced queues.
Signed-off-by: Keith Busch
---
block/blk-mq-tag.c | 41 +++--
On 11/30/18 12:21 PM, Al Viro wrote:
> On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote:
>> For an ITER_KVEC, we can just iterate the iov and add the pages
>> to the bio directly.
>
>> +        page = virt_to_page(kv->iov_base);
>> +        size = bio_add_page(bio, page,
On 11/30/18 12:17 PM, Al Viro wrote:
> On Fri, Nov 30, 2018 at 09:56:45AM -0700, Jens Axboe wrote:
>> This explicitly sets up an ITER_KVEC from an iovec with kernel ranges
>> mapped.
>
>> +int import_kvec(int type, const struct kvec *kvecs, unsigned nr_segs,
>> +                size_t bytes, struct
On 11/30/18 1:03 PM, Omar Sandoval wrote:
> On Fri, Nov 30, 2018 at 09:01:17AM -0700, Jens Axboe wrote:
>> sbitmap maintains a set of words that we use to set and clear bits, with
>> each bit representing a tag for blk-mq. Even though we spread the bits
>> out and maintain a hint cache, one
On Fri, Nov 30, 2018 at 09:01:17AM -0700, Jens Axboe wrote:
> sbitmap maintains a set of words that we use to set and clear bits, with
> each bit representing a tag for blk-mq. Even though we spread the bits
> out and maintain a hint cache, one particular bit allocated will end up
> being cleared
On Fri, Nov 30, 2018 at 09:56:43AM -0700, Jens Axboe wrote:
> For an ITER_KVEC, we can just iterate the iov and add the pages
> to the bio directly.
> + page = virt_to_page(kv->iov_base);
> + size = bio_add_page(bio, page, kv->iov_len,
> +
On Fri, Nov 30, 2018 at 09:56:45AM -0700, Jens Axboe wrote:
> This explicitly sets up an ITER_KVEC from an iovec with kernel ranges
> mapped.
> +int import_kvec(int type, const struct kvec *kvecs, unsigned nr_segs,
> + size_t bytes, struct iov_iter *iter)
> +{
> + const struct
On Fri, 2018-11-30 at 10:20 -0700, Jens Axboe wrote:
> On 11/30/18 10:18 AM, Bart Van Assche wrote:
> > On Sat, 2018-12-01 at 00:38 +0800, Ming Lei wrote:
> > > Fixes: 445251d0f4d329a ("blk-mq: fix discard merge with scheduler
> > > attached")
> >
> > Since this patch fixes a bug introduced in
On 11/30/18 10:27 AM, Christoph Hellwig wrote:
> On Fri, Nov 30, 2018 at 08:56:25AM -0700, Jens Axboe wrote:
>> We only need the request fields and the end_io time if we have stats
>> enabled, or if we have a scheduler attached as those may use it for
>> completion time stats.
>>
>> Signed-off-by:
Looks good,
Reviewed-by: Christoph Hellwig
On Fri, Nov 30, 2018 at 08:56:25AM -0700, Jens Axboe wrote:
> We only need the request fields and the end_io time if we have stats
> enabled, or if we have a scheduler attached as those may use it for
> completion time stats.
>
> Signed-off-by: Jens Axboe
> ---
> block/blk-mq.c | 13
On Fri, 2018-11-30 at 10:08 -0700, Jens Axboe wrote:
> On 11/30/18 10:07 AM, Bart Van Assche wrote:
> > On Fri, 2018-11-30 at 09:56 -0700, Jens Axboe wrote:
> > > If the ioprio capability check fails, we return without putting
> > > the file pointer.
> > >
> > > Fixes: d9a08a9e616b ("fs: Add aio
On 11/30/18 10:18 AM, Bart Van Assche wrote:
> On Sat, 2018-12-01 at 00:38 +0800, Ming Lei wrote:
>> Fixes: 445251d0f4d329a ("blk-mq: fix discard merge with scheduler attached")
>
> Since this patch fixes a bug introduced in kernel v4.16, does it need
> a "Cc: stable" tag?
Like the other one,
On Sat, 2018-12-01 at 00:38 +0800, Ming Lei wrote:
> Fixes: 445251d0f4d329a ("blk-mq: fix discard merge with scheduler attached")
Since this patch fixes a bug introduced in kernel v4.16, does it need
a "Cc: stable" tag?
Thanks,
Bart.
On 11/30/18 10:12 AM, Bart Van Assche wrote:
> On Fri, 2018-11-30 at 09:56 -0700, Jens Axboe wrote:
>> We can't wait for polled events to complete, as they may require active
>> polling from whoever submitted it. If that is the same task that is
>> submitting new IO, we could deadlock waiting for
I think we'll need to queue this up for 4.21 ASAP independent of the
rest, given that with separate poll queues userspace could otherwise
submit I/O that will never get polled for anywhere.
On 11/30/18 10:13 AM, Christoph Hellwig wrote:
> I think we'll need to queue this up for 4.21 ASAP independent of the
> rest, given that with separate poll queues userspace could otherwise
> submit I/O that will never get polled for anywhere.
Probably a good idea, I can just move it to my 4.21
On Fri, 2018-11-30 at 09:56 -0700, Jens Axboe wrote:
> We can't wait for polled events to complete, as they may require active
> polling from whoever submitted it. If that is the same task that is
> submitting new IO, we could deadlock waiting for IO to complete that
> this task is supposed to be
On 11/30/18 9:26 AM, Christoph Hellwig wrote:
> On Fri, Nov 30, 2018 at 08:26:24AM -0700, Jens Axboe wrote:
>> On 11/30/18 8:24 AM, Christoph Hellwig wrote:
>>> Various fixlets all over, including throwing in a 'default y' for the
>>> multipath code, given that we want people to actually enable it
On 11/30/18 9:38 AM, Ming Lei wrote:
> There are actually two kinds of discard merge:
>
> - one is the normal discard merge, just like a normal read/write request;
> call it single-range discard
>
> - another is the multi-range discard, where queue_max_discard_segments(rq->q) > 1
>
> For the former
On 11/30/18 10:07 AM, Bart Van Assche wrote:
> On Fri, 2018-11-30 at 09:56 -0700, Jens Axboe wrote:
>> If the ioprio capability check fails, we return without putting
>> the file pointer.
>>
>> Fixes: d9a08a9e616b ("fs: Add aio iopriority support")
>> Reviewed-by: Johannes Thumshirn
>>
On Fri, 2018-11-30 at 09:56 -0700, Jens Axboe wrote:
> If the ioprio capability check fails, we return without putting
> the file pointer.
>
> Fixes: d9a08a9e616b ("fs: Add aio iopriority support")
> Reviewed-by: Johannes Thumshirn
> Reviewed-by: Christoph Hellwig
> Signed-off-by: Jens Axboe
>
This adds sync/async O_DIRECT support for making a kvec type iter
for bdev access, as well as for iomap.
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 16
fs/iomap.c | 5 -
2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
This is just like io_setup(), except it adds a flags argument to let the
caller control/define some of the io_context behavior. Outside of that,
we pass in an iocb array for future use.
Signed-off-by: Jens Axboe
---
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
fs/aio.c
For io_submit(), we have to first copy each pointer to an iocb, then
copy the iocb. The latter is 64 bytes in size, and that's a lot of
copying for a single IO.
Add support for setting IOCTX_FLAG_USERIOCB through the new io_setup2()
system call, which allows the iocbs to reside in userspace. If
For user mapped IO, we do get_user_pages() upfront, and then do a
put_page() on each page at end_io time to release the page reference. In
preparation for having permanently mapped pages, add a BIO_HOLD_PAGES
flag that tells us not to release the pages; the caller will do that.
Signed-off-by:
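A sketch of how the completion side could honor such a flag (assuming a bio_release_pages()-style helper; simplified, not the posted patch):

    static void bio_release_pages(struct bio *bio)
    {
            struct bio_vec *bvec;
            int i;

            /* pages belong to a long-lived mapping; the caller, not
             * the bio completion path, will release them */
            if (bio_flagged(bio, BIO_HOLD_PAGES))
                    return;

            bio_for_each_segment_all(bvec, bio, i)
                    put_page(bvec->bv_page);
    }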
For an ITER_KVEC, we can just iterate the iov and add the pages
to the bio directly.
Signed-off-by: Jens Axboe
---
block/bio.c | 30 ++
include/linux/bio.h | 1 +
2 files changed, 31 insertions(+)
diff --git a/block/bio.c b/block/bio.c
index
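The loop the description implies could look roughly like this (a simplified sketch that assumes each kvec segment fits within a single page; the function name is illustrative):

    static int bio_iov_kvec_add_pages(struct bio *bio, struct iov_iter *iter)
    {
            const struct kvec *kv = iter->kvec;
            struct page *page;
            size_t size;

            while (iov_iter_count(iter)) {
                    /* kernel ranges need no get_user_pages() */
                    if (is_vmalloc_addr(kv->iov_base))
                            page = vmalloc_to_page(kv->iov_base);
                    else
                            page = virt_to_page(kv->iov_base);

                    size = bio_add_page(bio, page, kv->iov_len,
                                        offset_in_page(kv->iov_base));
                    if (size != kv->iov_len)
                            return -EINVAL;
                    iov_iter_advance(iter, size);
                    kv++;
            }
            return 0;
    }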
We have to add each submitted polled request to the io_context
poll_submitted list, which means we have to grab the poll_lock. We
already use the block plug to batch submissions if we're doing a batch
of IO submissions; extend that to cover the poll requests internally as
well.
Signed-off-by:
The aio_kiocb is 192 bytes, fairly substantial. Most items don't need to be cleared,
especially not upfront. Clear the ones we do need to clear, and leave
the other ones for setup when the iocb is prepared and submitted.
Signed-off-by: Jens Axboe
---
fs/aio.c | 19 ---
1 file changed, 12
Signed-off-by: Jens Axboe
---
fs/aio.c | 20
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 291bbc62b2a8..341eb1b19319 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1088,6 +1088,16 @@ static bool get_reqs_available(struct kioctx *ctx)
Similarly to how we use the state->ios_left to know how many references
to get to a file, we can use it to allocate the aio_kiocb's we need in
bulk.
Signed-off-by: Jens Axboe
---
fs/aio.c | 42 +-
1 file changed, 37 insertions(+), 5 deletions(-)
diff
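In sketch form, the bulk allocation could be built on kmem_cache_alloc_bulk(); the state fields and function name here are assumptions based on the description:

    /* grab up to ios_left aio_kiocbs with one slab operation instead
     * of one kmem_cache_alloc() per submitted iocb */
    static bool aio_get_reqs_bulk(struct aio_submit_state *state)
    {
            int n = min_t(int, state->ios_left, ARRAY_SIZE(state->reqs));

            n = kmem_cache_alloc_bulk(kiocb_cachep, GFP_KERNEL, n,
                                      (void **)state->reqs);
            state->free_reqs = n;
            return n > 0;
    }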
Add polled variants of PREAD/PREADV and PWRITE/PWRITEV. These act
like their non-polled counterparts, except we expect to poll for
completion of them. The polling happens at io_getevent() time, and
works just like non-polled IO.
To setup an io_context for polled IO, the application must call
On the submission side, add file reference batching to the
aio_submit_state. We get as many references as the number of iocbs we
are submitting, and drop unused ones if we end up switching files. The
assumption here is that we're usually only dealing with one fd, and if
there are multiple,
Signed-off-by: Jens Axboe
---
fs/aio.c | 14 ++
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index ba5758c854e8..12859ea1cb64 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1057,6 +1057,15 @@ static inline void iocb_put(struct aio_kiocb *iocb)
Some use cases repeatedly get and put references to the same file, but
the only exposed interface is doing these one at a time. As each of
these entail an atomic inc or dec on a shared structure, that cost can
add up.
Add fget_many(), which works just like fget(), except it takes an
argument
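In sketch form, the described interface pair (the refcount detail is an assumption about the implementation):

    /* like fget(fd), but takes 'refs' references in one go, turning N
     * atomic increments on the file's refcount into a single add */
    struct file *fget_many(unsigned int fd, unsigned int refs);

    /* the drop side then balances the same count */
    void fput_many(struct file *file, unsigned int refs);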
Replace the percpu_ref_put() + kmem_cache_free() with a call to
iocb_put() instead.
Signed-off-by: Jens Axboe
---
fs/aio.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 533cb7b1112f..e8457f9486e3 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1878,10
In preparation for handing in iocbs in a different fashion as well. Also
make it clear that the iocb being passed in isn't modified, by marking
it const throughout.
Signed-off-by: Jens Axboe
---
fs/aio.c | 68 +++-
1 file changed, 38
If we have fixed user buffers, we can map them into the kernel when we
setup the io_context. That avoids the need to do get_user_pages() for
each and every IO.
To utilize this feature, the application must set both
IOCTX_FLAG_USERIOCB, to provide iocb's in userspace, and then
This explicitly sets up an ITER_KVEC from an iovec with kernel ranges
mapped.
Signed-off-by: Jens Axboe
---
include/linux/uio.h | 3 +++
lib/iov_iter.c | 35 ++-
2 files changed, 29 insertions(+), 9 deletions(-)
diff --git a/include/linux/uio.h
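Given the posted signature, a minimal sketch of the body (assumed; a real version would also validate the segments):

    int import_kvec(int type, const struct kvec *kvecs, unsigned nr_segs,
                    size_t bytes, struct iov_iter *iter)
    {
            /* kernel ranges are already mapped: no copying or pinning,
             * just point the iterator at them */
            iov_iter_kvec(iter, type, kvecs, nr_segs, bytes);
            return 0;
    }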
This is in preparation for certain types of IO not needing a ring
reservation.
Signed-off-by: Jens Axboe
---
fs/aio.c | 30 +-
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index cf0de61743e8..eaceb40e6cf5 100644
--- a/fs/aio.c
From: Christoph Hellwig
This new method is used to explicitly poll for I/O completion for an
iocb. It must be called for any iocb submitted asynchronously (that
is with a non-null ki_complete) which has the IOCB_HIPRI flag set.
The method is assisted by a new ki_cookie field in struct iocb to
From: Christoph Hellwig
Store the request queue the last bio was submitted to in the iocb
private data in addition to the cookie so that we find the right block
device. Also refactor the common direct I/O bio submission code into a
nice little helper.
Signed-off-by: Christoph Hellwig
Plugging is meant to optimize submission of a string of IOs; if we don't
have more than 2 being submitted, don't bother setting up a plug.
Signed-off-by: Jens Axboe
---
fs/aio.c | 18 ++
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index
We know this is a read/write request, but in preparation for
having different kinds of those, ensure that we call the assigned
handler instead of assuming it's aio_complete_rq().
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/aio.c | 2 +-
1 file changed, 1 insertion(+), 1
We can't wait for polled events to complete, as they may require active
polling from whoever submitted it. If that is the same task that is
submitting new IO, we could deadlock waiting for IO to complete that
this task is supposed to be completing itself.
Signed-off-by: Jens Axboe
---
For the grand introduction to this feature, see my original posting
here:
https://lore.kernel.org/linux-block/20181117235317.7366-1-ax...@kernel.dk/
and refer to the previous postings of this patchset for whatever
features were added there.
Outside of "just" supporting polled IO, it also adds
If the ioprio capability check fails, we return without putting
the file pointer.
Fixes: d9a08a9e616b ("fs: Add aio iopriority support")
Reviewed-by: Johannes Thumshirn
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/aio.c | 1 +
1 file changed, 1 insertion(+)
diff --git
From: Christoph Hellwig
Just call blk_poll on the iocb cookie; we can derive the block device
from the inode trivially.
Reviewed-by: Johannes Thumshirn
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 10 ++
1 file changed, 10 insertions(+)
diff --git
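Per the description, the hook boils down to something like this sketch (the exact shape is assumed):

    static int blkdev_iopoll(struct kiocb *kiocb, bool wait)
    {
            /* the block device is derivable from the backing inode */
            struct block_device *bdev = I_BDEV(kiocb->ki_filp->f_mapping->host);
            struct request_queue *q = bdev_get_queue(bdev);

            return blk_poll(q, READ_ONCE(kiocb->ki_cookie), wait);
    }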
From: Christoph Hellwig
No one is going to poll for aio (yet), so we must clear the HIPRI
flag, as we would otherwise send it down the poll queues, where no
one will be polling for completions.
Signed-off-by: Christoph Hellwig
IOCB_HIPRI, not RWF_HIPRI.
Reviewed-by: Johannes Thumshirn
There are actually two kinds of discard merge:
- one is the normal discard merge, just like a normal read/write request;
call it single-range discard
- another is the multi-range discard, where queue_max_discard_segments(rq->q) > 1
For the former case, queue_max_discard_segments(rq->q) is 1, and we
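In code terms, the distinction can be expressed roughly as (a sketch; the helper name is assumed):

    static bool blk_discard_mergable(struct request *req)
    {
            if (req_op(req) == REQ_OP_DISCARD &&
                queue_max_discard_segments(req->q) > 1)
                    return true;    /* multi-range discard */
            return false;           /* single-range: merges like normal R/W */
    }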
On Fri, Nov 30, 2018 at 08:26:24AM -0700, Jens Axboe wrote:
> On 11/30/18 8:24 AM, Christoph Hellwig wrote:
> > Various fixlets all over, including throwing in a 'default y' for the
> > multipath code, given that we want people to actually enable it for full
> > functionality.
>
> Why enable it
sbitmap maintains a set of words that we use to set and clear bits, with
each bit representing a tag for blk-mq. Even though we spread the bits
out and maintain a hint cache, one particular bit allocated will end up
being cleared in the exact same spot.
This introduces batched clearing of bits.
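The batching can be sketched as follows (a simplified shape that assumes a per-word ->cleared companion mask; not the exact patch):

    /* freeing a tag: set the bit in the word's ->cleared mask instead
     * of clearing ->word directly, keeping writers off the hot
     * allocation cacheline */
    static void sbitmap_deferred_clear_bit(struct sbitmap *sb, unsigned int bitnr)
    {
            unsigned long *addr = &sb->map[SB_NR_TO_INDEX(sb, bitnr)].cleared;

            set_bit(SB_NR_TO_BIT(sb, bitnr), addr);
    }

    /* the allocation path folds ->cleared back into ->word, under the
     * map lock, only when the word looks exhausted */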
Even if we have no waiters on any of the sbitmap_queue wait states, we
still have to loop every entry to check. We do this for every IO, so
the cost adds up.
Shift a bit of the cost to the slow path, when we actually have waiters.
Wrap prepare_to_wait_exclusive() and finish_wait(), so we can
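One way to sketch the wrapping (assumed shape: an sbq_wait that embeds a wait_queue_entry, plus an atomic count of registered waiters the completion side can check cheaply):

    static inline void sbitmap_prepare_to_wait(struct sbitmap_queue *sbq,
                                               struct sbq_wait_state *ws,
                                               struct sbq_wait *sbq_wait,
                                               int state)
    {
            atomic_inc(&sbq->ws_active);
            prepare_to_wait_exclusive(&ws->wait, &sbq_wait->wait, state);
    }

    /* completions can then bail out without scanning any wait queues */
    static void sbq_wake(struct sbitmap_queue *sbq)
    {
            if (!atomic_read(&sbq->ws_active))
                    return;
            /* ... slow path: scan sbq->ws[] and wake waiters ... */
    }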
This version tests out solid, and we're still seeing the same
improvements.
Changes:
- Lock map index for the move. This eliminates the race completely,
since it's now not possible to find ->cleared == 0 while swap of
bits is in progress. The previous version was fine for users that
We only need the request fields and the end_io time if we have stats
enabled, or if we have a scheduler attached as those may use it for
completion time stats.
Signed-off-by: Jens Axboe
---
block/blk-mq.c | 13 ++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git
On 11/30/18 1:00 AM, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018 at 01:19:14PM -0700, Keith Busch wrote:
>> On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote:
>>> +enum hctx_type {
>>> + HCTX_TYPE_DEFAULT, /* all I/O not otherwise accounted for */
>>> + HCTX_TYPE_READ,
On 11/30/18 8:24 AM, Christoph Hellwig wrote:
> Various fixlets all over, including throwing in a 'default y' for the
> multipath code, given that we want people to actually enable it for full
> functionality.
Why enable it by default? 99.9% of users aren't going to care. That
seems like an odd
Various fixlets all over, including throwing in a 'default y' for the
multipath code, given that we want people to actually enable it for full
functionality.
The following changes since commit 14b04063cc994effc86f976625bf8f806d8d44cb:
Merge branch 'nvme-4.20' of git://git.infradead.org/nvme
On Fri, Nov 30, 2018 at 03:20:51PM +0000, Jens Axboe wrote:
> Thanks - are you going to post a v3? Would like to get this staged.
Yes, will do. Either late tonight or over the weekend.
On 11/30/18 12:56 AM, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018 at 07:50:09PM +, Jens Axboe wrote:
>>> in our post-spectre world. Also having too many queue type is just
>>> going to create confusion, so I'd rather manage them centrally.
>>>
>>> Note that the queue type naming and
On 11/30/18 1:38 AM, Christoph Hellwig wrote:
> I like the idea, but I don't think it is correct as-is. We can't just
> move the blkdev_bio_dirty_release for the last bio into the caller, as
> the completion order that decrements the refcount might have different
> ordering.
Ugh yeah, not sure
On Fri, Nov 30, 2018 at 12:08:09AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018 at 01:36:32PM -0700, Keith Busch wrote:
> > On Thu, Nov 29, 2018 at 08:13:04PM +0100, Christoph Hellwig wrote:
> > > +
> > > + /* handle any remaining CQEs */
> > > + if (opcode ==
On Fri, Nov 30, 2018 at 12:00:13AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018 at 01:19:14PM -0700, Keith Busch wrote:
> > On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote:
> > > +enum hctx_type {
> > > + HCTX_TYPE_DEFAULT, /* all I/O not otherwise accounted for */
On 11/30/2018 12:43 PM, Igor Konopko wrote:
> This series of patches extends the way pblk can
> store L2P sector metadata. After this set of changes
> any size of NVMe metadata is supported in pblk.
> There is also support for the case without NVMe metadata.
> Changes v4 --> v5:
> -rebase on top of
I just started a regression test on this patch set that'll run over
the weekend. I'll add a tested-by if everything checks out.
All the best,
Hans
On Fri, Nov 30, 2018 at 12:49 PM Igor Konopko wrote:
>
> This series of patches extends the way pblk can
> store L2P sector metadata. After this
Currently lightnvm and pblk use a single DMA pool,
whose entry size is always equal to PAGE_SIZE.
The PPA list always needs 8B*64, so there is only 56B*64
space left for OOB meta. Since NVMe OOB meta can be bigger,
such as 128B, this solution is not robust.
This patch adds the possibility to
Currently pblk assumes that the size of OOB metadata on the drive is always
equal to the size of the pblk_sec_meta struct. This commit adds helpers which
will allow handling different sizes of OOB metadata on the drive in the future.
Still, after this patch only OOB metadata equal to 16 bytes is supported.
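A sketch of the kind of helper this describes (the field name is assumed): index the OOB buffer by the drive's actual per-sector metadata size rather than by sizeof(struct pblk_sec_meta):

    static inline struct pblk_sec_meta *pblk_get_meta(struct pblk *pblk,
                                                      void *meta, int index)
    {
            return meta + pblk->oob_meta_size * index;
    }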
Currently pblk and lightnvm only check the size
of OOB metadata and do not care whether this meta
is located in a separate buffer or is interleaved with
data in a single buffer.
In reality only the first scenario is supported; the
second mode will break pblk functionality during any
IO
In the current pblk implementation, the L2P mapping for not-yet-closed lines
is always stored only in OOB metadata and recovered from it.
Such a solution does not provide data integrity when drives do
not have such an OOB metadata space.
The goal of this patch is to add support for so-called packed
This series of patches extends the way pblk can
store L2P sector metadata. After this set of changes
any size of NVMe metadata is supported in pblk.
There is also support for the case without NVMe metadata.
Changes v4 --> v5:
-rebase on top of ocssd/for-4.21/core
Changes v3 --> v4:
-rename
Currently DMA allocated memory is reused on partial read
for the lba_list_mem and lba_list_media arrays. In preparation
for dynamic DMA pool sizes we need to move these arrays
into the pblk_pr_ctx structure.
Reviewed-by: Javier González
Signed-off-by: Igor Konopko
---
drivers/lightnvm/pblk-read.c | 20
On 11/29/2018 08:16 AM, Igor Konopko wrote:
> This series of patches extends the way pblk can
> store L2P sector metadata. After this set of changes
> any size of NVMe metadata is supported in pblk.
> There is also support for the case without NVMe metadata.
> Changes v3 --> v4:
> -rename
I like the idea, but I don't think it is correct as-is. We can't just
move the blkdev_bio_dirty_release for the last bio into the caller, as
the completion order that decrements the refcount might have different
ordering.
Something like the simpler patch below might achieve your goal, though:
On Thu, Nov 29, 2018 at 02:08:40PM -0700, Keith Busch wrote:
> On Thu, Nov 29, 2018 at 08:13:05PM +0100, Christoph Hellwig wrote:
> > @@ -1050,12 +1051,16 @@ static irqreturn_t nvme_irq(int irq, void *data)
> > irqreturn_t ret = IRQ_NONE;
> > u16 start, end;
> >
> > -
On Thu, Nov 29, 2018 at 01:36:32PM -0700, Keith Busch wrote:
> On Thu, Nov 29, 2018 at 08:13:04PM +0100, Christoph Hellwig wrote:
> > This is the last place outside of nvme_irq that handles CQEs from
> > interrupt context, and thus is in the way of removing the cq_lock for
> > normal queues, and
On Thu, Nov 29, 2018 at 01:19:14PM -0700, Keith Busch wrote:
> On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote:
> > +enum hctx_type {
> > + HCTX_TYPE_DEFAULT, /* all I/O not otherwise accounted for */
> > + HCTX_TYPE_READ, /* just for READ I/O */
> > +