On Mon, 2018-07-30 at 20:37 +0800, Ming Lei wrote:
> On Wed, Jul 25, 2018 at 11:15:09PM +0200, Martin Wilck wrote:
> >
> > +/**
> > + * bio_iov_iter_get_pages - pin user or kernel pages and add them
> > to a bio
> > + * @bio: bio to add pages to
> > + * @i
_IO() path, the call to do_blockdev_direct_IO() has been
replaced by __blkdev_direct_IO() in 542ff7bf18c6.
do_blockdev_direct_IO() takes care not to leave async bios in flight for
partial writes if an error occurs. Implement the same semantics for
__blkdev_direct_IO().
Signed-off-by: Martin Wilck
Fixes:
Reviewed-by: Jan Kara
Reviewed-by: Christoph Hellwig
Signed-off-by: Martin Wilck
---
block/bio.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 67eff5e..489a430 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -912,16 +912,16 @@
Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for
simplified bdev direct-io")
Reviewed-by: Ming Lei
Signed-off-by: Martin Wilck
---
fs/block_dev.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 0dd87a
"short read", and __generic_file_write_iter()
falls back to buffered writes, which may lead to data corruption.
Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for
simplified bdev direct-io")
Signed-off-by: Martin Wilck
---
block/bio.c | 35 ---
ace in bio exhausted (Jan)
- 3/3: add comments
Martin Wilck (3):
block: bio_iov_iter_get_pages: fix size of last iovec
blkdev: __blkdev_direct_IO_simple: fix leak in error case
block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs
> > This avoids a race condition the driver would otherwise have if these
> > groups need to be created with sysfs_add_groups().
> >
> > Signed-off-by: Martin Wilck
> > Signed-off-by: Hannes Reinecke
>
> This should be From: Martin, shouldn't it?
I'm fine w
On Sat, 2018-07-21 at 00:16 +0800, Ming Lei wrote:
> On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote:
> > bio_iov_iter_get_pages() only adds pages for the next non-zero
> > segment from the iov_iter to the bio. Some callers prefer to
> > obtain as many pages as w
On Fri, 2018-07-20 at 17:11 +0200, Christoph Hellwig wrote:
> On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote:
> > bio_iov_iter_get_pages() only adds pages for the next non-zero
> > segment from the iov_iter to the bio. Some callers prefer to
> > obtain as ma
Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for
simplified bdev direct-io")
Reviewed-by: Ming Lei
Signed-off-by: Martin Wilck
---
fs/block_dev.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 0dd87a
, __generic_file_write_iter() falls
back to buffered writes, which has been observed to cause data
corruption in certain workloads.
Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for
simplified bdev direct-io")
Signed-off-by: Martin Wilck
---
fs/block_dev.c | 9 +++
bio_iov_iter_get_pages() only adds pages for the next non-zero
segment from the iov_iter to the bio. Some callers prefer to
obtain as many pages as would fit into the bio, with proper
rollback in case of failure. Add bio_iov_iter_get_all_pages()
for this purpose.
Signed-off-by: Martin Wilck
Reviewed-by: Jan Kara
Reviewed-by: Christoph Hellwig
Signed-off-by: Martin Wilck
---
block/bio.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 67eff5e..489a430 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -912,16 +912,16 @@
(3/4, 4/4).
- 3/4: add a new helper to retrieve as many pages as possible (Ming)
- 3/4: put pages in case of error (Ming)
Martin Wilck (4):
block: bio_iov_iter_get_pages: fix size of last iovec
blkdev: __blkdev_direct_IO_simple: fix leak in error case
block: add bio_iov_iter_g
On Fri, 2018-07-20 at 10:41 +0800, Ming Lei wrote:
> On Thu, Jul 19, 2018 at 11:01:58PM +0200, Martin Wilck wrote:
> > When bio_iov_iter_get_pages() is called
> > from __blkdev_direct_IO_simple(),
> > we already know that the content of the input iov_iter fits into a
>
Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev
direct-io")
Signed-off-by: Martin Wilck
---
fs/block_dev.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 0dd87aa..aba2541 100644
The first patch was at v2 already in the previous
submission.
Changes wrt v1:
- 1/3: minor formatting change (Christoph)
- 2/3: split off the leak fix (Ming)
- 3/3: give up if bio_iov_iter_get_pages() returns an error (Jan)
- 3/3: warn if space in bio exhausted (Jan)
- 3/3: add comments
Marti
invocation, the correct index is (nr_pages - 1).
v2: improved readability following suggestions from Ming Lei.
v3: followed a formatting suggestion from Christoph Hellwig.
Fixes: 2cefe4dbaadf ("block: add bio_iov_iter_get_pages()")
Signed-off-by: Martin Wilck
Reviewed-by: Hannes Reinecke
Revie
l bio worth of IO for simplified bdev
direct-io")
Signed-off-by: Martin Wilck
---
fs/block_dev.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index aba2541..561c34e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -222
On Thu, 2018-07-19 at 17:15 +0200, Jan Kara wrote:
> On Thu 19-07-18 14:23:53, Martin Wilck wrote:
> > On Thu, 2018-07-19 at 12:45 +0200, Jan Kara wrote:
> > > Secondly, I don't think it is good to discard error from
> > > bio_iov_iter_get_pages() here
It depends on the segment structure of the input iov_iter. If the input
iovec has just a single segment, it fills the bio in a single call.
With multiple segments, it just returns the page(s) of the first
segment. The point of my patch is to make no difference between single-
segment and multi-segment IOs.
Rega
bio_iov_iter_get_pages() returns less
data than possible. The function just takes one iteration step at a
time, and thus iterating over it until the desired number of pages is
obtained, and collecting the resulting pages in a single bio, isn't
"fragile", it's just the natural thing to do.
Martin
On Thu, 2018-07-19 at 17:21 +0200, Jan Kara wrote:
> On Thu 19-07-18 20:20:51, Ming Lei wrote:
> > On Thu, Jul 19, 2018 at 01:56:16PM +0200, Jan Kara wrote:
> > > On Thu 19-07-18 19:04:46, Ming Lei wrote:
> > > > On Thu, Jul 19, 2018 at 11:39:
On Thu, 2018-07-19 at 13:56 +0200, Jan Kara wrote:
> On Thu 19-07-18 19:04:46, Ming Lei wrote:
> > On Thu, Jul 19, 2018 at 11:39:18AM +0200, Martin Wilck wrote:
> > > bio_iov_iter_get_pages() returns only pages for a single non-
> > > empty
> > > segment of
On Thu, 2018-07-19 at 12:45 +0200, Jan Kara wrote:
> On Thu 19-07-18 11:39:18, Martin Wilck wrote:
> > bio_iov_iter_get_pages() returns only pages for a single non-empty
> > segment of the input iov_iter's iovec. This may be much less than
> > the number
> > of page
and the customer's
tests. It'd be possible to detect this situation and merge bi_io_vec slots
that refer to the same page, but I prefer to keep it simple for now.
Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev
direct-io")
Signed-off-by: Martin Wilc
se.
Looking forward to your opinions and suggestions.
Regards
Martin
Martin Wilck (2):
block: bio_iov_iter_get_pages: fix size of last iovec
blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio
block/bio.c    | 18 ++++++++----------
fs/block_dev.c | 8 +++-
2 files changed,
invocation, the correct index is (nr_pages - 1).
V2: improved readability following suggestions from Ming Lei.
Fixes: 2cefe4dbaadf ("block: add bio_iov_iter_get_pages()")
Signed-off-by: Martin Wilck
---
block/bio.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff
On Wed, 2018-07-18 at 10:48 +0800, Ming Lei wrote:
> On Wed, Jul 18, 2018 at 02:07:28AM +0200, Martin Wilck wrote:
> >
> > From b75adc856119346e02126cf8975755300f2d9b7f Mon Sep 17 00:00:00
> > 2001
> > From: Martin Wilck
> > Date: Wed, 18 Jul 2018 01:56:37
On Mon, 2018-07-16 at 19:45 +0800, Ming Lei wrote:
> On Sat, Jul 14, 2018 at 6:29 AM, Martin Wilck
> wrote:
> > Hi Ming & Jens,
> >
> > On Fri, 2018-07-13 at 12:54 -0600, Jens Axboe wrote:
> > > On 7/12/18 5:29 PM, Ming Lei wrote:
> > > >
&
On Fri, 2018-07-13 at 14:52 -0600, Jens Axboe wrote:
> On 7/13/18 2:48 PM, Martin Wilck wrote:
> > >
> > > > However, so far I've only identified a minor problem, see below
> > > > -
> > > > it
> > > > doesn't explain the data cor
one test at a time. We're currently trying
to find out more about the IO sizes at which the problem occurs. As you
noted, currently we know no more than that it happens somewhere between
16k and 1M.
Martin
--
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
On Fri, 2018-07-13 at 12:50 -0600, Jens Axboe wrote:
> On 7/13/18 12:00 PM, Jens Axboe wrote:
> > On 7/13/18 10:56 AM, Martin Wilck wrote:
> > > On Thu, 2018-07-12 at 10:42 -0600, Jens Axboe wrote:
> > > >
> > > > Hence the patch I sent is wrong, t
On Fri, 2018-07-13 at 12:00 -0600, Jens Axboe wrote:
> On 7/13/18 10:56 AM, Martin Wilck wrote:
> > On Thu, 2018-07-12 at 10:42 -0600, Jens Axboe wrote:
> > >
> > > Hence the patch I sent is wrong, the code actually looks fine.
> > > Which
> > >
much simpler than
__blkdev_direct_IO(), and it seems to be broken in a subtle way.
However, so far I've only identified a minor problem, see below - it
doesn't explain the data corruption we're seeing.
Martin
ly
> more
> pages. __blkdev_direct_IO_simple() only does one bio, so it has to
> fit
> within that one bio. __blkdev_direct_IO() will loop just fine and
> will finish any size, BIO_MAX_PAGES at the time.
Right. Hannes, I think we (at least I) have been misled by
looking at outdated code. The key point which I missed is that
__blkdev_direct_IO() is called with min(nr_pages, BIO_MAX_PAGES), and
advances beyond that later in the loop.
> Hence the patch I sent is wrong, the code actually looks fine. Which
> means we're back to trying to figure out what is going on here. It'd
> be great with a test case...
Unfortunately we have no reproducer just yet. Only the customer can
reproduce it. The scenario is a data base running on a KVM virtual
machine on top of a virtio-scsi volume backed by a multipath map, with
cache='none' in qemu.
My latest thinking is that if, for whatever reason I don't yet
understand, blkdev_direct_IO() resulted in a short write,
__generic_file_write_iter() would fall back to buffered writing, which
might be a possible explanation for the data corruption we observe.
That's just speculation at the current stage.
Regards
Martin
if (!nr_pages)
> return 0;
> - if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_PAGES)
> + if (is_sync_kiocb(iocb))
> return __blkdev_direct_IO_simple(iocb, iter,
> nr_pages);
>
> - return __blkdev_direct_IO(iocb, iter, min(nr_pages,
> BIO_MAX_PAGES));
> + return __blkdev_direct_IO(iocb, iter, nr_pages);
> }
>
> static __init int blkdev_init(void)
>
get the point why this would
be wrong in general - what's the difference to e.g. calling "mdadm -I"?
Regards
Martin
On Fri, 2017-09-29 at 16:59 -0600, Keith Busch wrote:
> On Thu, Sep 28, 2017 at 09:36:36PM +0200, Martin Wilck wrote:
> > In the NVME subsystem, we're seeing a race condition with udev
> > where
> > device_add_disk() is called (which triggers an "add" uevent),
On Sun, 2017-10-01 at 10:00 +0200, Christoph Hellwig wrote:
> While this looks okay-ish to me I really don't want people confused
> with three variants of add_disk, we really need to consolidate
> our helpers there a bit..
>
Can you give me a hint what you'd like to see?
Martin
the device is incomplete, in particular, device
WWIDs may not be determined correctly.
To fix this, this patch introduces a new function
device_add_disk_with_groups(), which takes a list of attribute groups
and adds them to the device before sending out uevents.
Signed-off-by: Martin Wilck <mw
By using device_add_disk_with_groups(), we can avoid the race
condition with udev rule processing, because no udev event will
be triggered before all attributes are available.
Signed-off-by: Martin Wilck <mwi...@suse.com>
---
drivers/nvme/host/core.c | 12 +++-
1 file chan
LETE;
> +done:
> if (pgpath) {
> - ps = &pgpath->pg->ps;
> + struct path_selector *ps = &pgpath->pg->ps;
> +
> if (ps->type->end_io)
> ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes);
> }
>
> - return r;
> + return error;
> }
>
> /*