Re: [PATCH v5 3/3] block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs

2018-08-22 Thread Martin Wilck
On Mon, 2018-07-30 at 20:37 +0800, Ming Lei wrote: > On Wed, Jul 25, 2018 at 11:15:09PM +0200, Martin Wilck wrote: > > > > +/** > > + * bio_iov_iter_get_pages - pin user or kernel pages and add them > > to a bio > > + * @bio: bio to add pages to > > + * @i

[RFC PATCH] blkdev: __blkdev_direct_IO: collect async writes in error case

2018-07-25 Thread Martin Wilck
_IO() path, the call to do_blockdev_direct_IO() has been replaced by __blkdev_direct_IO() in 542ff7bf18c6. do_blockdev_direct_IO() takes care not to leave async bios in flight for partial writes if an error occurs. Implement the same semantics for __blkdev_direct_IO(). Signed-off-by: Martin Wilck Fixes:

[PATCH v5 1/3] block: bio_iov_iter_get_pages: fix size of last iovec

2018-07-25 Thread Martin Wilck
an Kara Reviewed-by: Christoph Hellwig Signed-off-by: Martin Wilck --- block/bio.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/block/bio.c b/block/bio.c index 67eff5e..489a430 100644 --- a/block/bio.c +++ b/block/bio.c @@ -912,16 +912,16 @@

[PATCH v5 2/3] blkdev: __blkdev_direct_IO_simple: fix leak in error case

2018-07-25 Thread Martin Wilck
Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io") Reviewed-by: Ming Lei Signed-off-by: Martin Wilck --- fs/block_dev.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 0dd87a

[PATCH v5 3/3] block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs

2018-07-25 Thread Martin Wilck
quot;short read", and __generic_file_write_iter() falls back to buffered writes, which may lead to data corruption. Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io") Signed-off-by: Martin Wilck --- block/bio.c | 35 ---

[PATCH v5 0/3] Fix silent data corruption in blkdev_direct_IO()

2018-07-25 Thread Martin Wilck
ace in bio exhausted (Jan) - 3/3: add comments Martin Wilck (3): block: bio_iov_iter_get_pages: fix size of last iovec blkdev: __blkdev_direct_IO_simple: fix leak in error case block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs

Re: [PATCH 2/5] block: genhd: add device_add_disk_with_groups

2018-07-25 Thread Martin Wilck
> > This avoids race condition the driver would otherwise have if these > > groups need to be created with sysfs_add_groups(). > > > > Signed-off-by: Martin Wilck > > Signed-off-by: Hannes Reinecke > > This should be From: Martin, shouldn't it? I'm fine w

Re: [PATCH v4 3/4] block: add bio_iov_iter_get_all_pages() helper

2018-07-20 Thread Martin Wilck
On Sat, 2018-07-21 at 00:16 +0800, Ming Lei wrote: > On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote: > > bio_iov_iter_get_pages() only adds pages for the next non-zero > > segment from the iov_iter to the bio. Some callers prefer to > > obtain as many pages as w

Re: [PATCH v4 3/4] block: add bio_iov_iter_get_all_pages() helper

2018-07-20 Thread Martin Wilck
On Fri, 2018-07-20 at 17:11 +0200, Christoph Hellwig wrote: > On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote: > > bio_iov_iter_get_pages() only adds pages for the next non-zero > > segment from the iov_iter to the bio. Some callers prefer to > > obtain as ma

[PATCH v4 2/4] blkdev: __blkdev_direct_IO_simple: fix leak in error case

2018-07-20 Thread Martin Wilck
Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io") Reviewed-by: Ming Lei Signed-off-by: Martin Wilck --- fs/block_dev.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 0dd87a

[PATCH v4 4/4] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio

2018-07-20 Thread Martin Wilck
, __generic_file_write_iter() falls back to buffered writes, which has been observed to cause data corruption in certain workloads. Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io") Signed-off-by: Martin Wilck --- fs/block_dev.c | 9 +++

[PATCH v4 3/4] block: add bio_iov_iter_get_all_pages() helper

2018-07-20 Thread Martin Wilck
bio_iov_iter_get_pages() only adds pages for the next non-zero segment from the iov_iter to the bio. Some callers prefer to obtain as many pages as would fit into the bio, with proper rollback in case of failure. Add bio_iov_iter_get_all_pages() for this purpose. Signed-off-by: Martin Wilck

[PATCH v4 1/4] block: bio_iov_iter_get_pages: fix size of last iovec

2018-07-20 Thread Martin Wilck
an Kara Reviewed-by: Christoph Hellwig Signed-off-by: Martin Wilck --- block/bio.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/block/bio.c b/block/bio.c index 67eff5e..489a430 100644 --- a/block/bio.c +++ b/block/bio.c @@ -912,16 +912,16 @@

[PATCH v4 0/4] Fix silent data corruption in blkdev_direct_IO()

2018-07-20 Thread Martin Wilck
(3/4, 4/4). - 3/4: add a new helper to retrieve as many pages as possible (Ming) - 3/4: put pages in case of error (Ming) Martin Wilck (4): block: bio_iov_iter_get_pages: fix size of last iovec blkdev: __blkdev_direct_IO_simple: fix leak in error case block: add bio_iov_iter_g

Re: [PATCH v3 3/3] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio

2018-07-20 Thread Martin Wilck
On Fri, 2018-07-20 at 10:41 +0800, Ming Lei wrote: > On Thu, Jul 19, 2018 at 11:01:58PM +0200, Martin Wilck wrote: > > When bio_iov_iter_get_pages() is called > > from __blkdev_direct_IO_simple(), > > we already know that the content of the input iov_iter fits into a >

[PATCH v3 2/3] blkdev: __blkdev_direct_IO_simple: fix leak in error case

2018-07-19 Thread Martin Wilck
Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io") Signed-off-by: Martin Wilck --- fs/block_dev.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 0dd87aa..aba2541 10064

[PATCH v3 0/3] Fix silent data corruption in blkdev_direct_IO()

2018-07-19 Thread Martin Wilck
irst patch was at v2 already in the previous submission. Changes wrt v1: - 1/3: minor formatting change (Christoph) - 2/3: split off the leak fix (Ming) - 3/3: give up if bio_iov_iter_get_pages() returns an error (Jan) - 3/3: warn if space in bio exhausted (Jan) - 3/3: add comments Marti

[PATCH v3 1/3] block: bio_iov_iter_get_pages: fix size of last iovec

2018-07-19 Thread Martin Wilck
invocation, the correct index is (nr_pages - 1). v2: improved readability following suggestions from Ming Lei. v3: followed a formatting suggestion from Christoph Hellwig. Fixes: 2cefe4dbaadf ("block: add bio_iov_iter_get_pages()") Signed-off-by: Martin Wilck Reviewed-by: Hannes Reinecke Revie

[PATCH v3 3/3] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio

2018-07-19 Thread Martin Wilck
l bio worth of IO for simplified bdev direct-io") Signed-off-by: Martin Wilck --- fs/block_dev.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/fs/block_dev.c b/fs/block_dev.c index aba2541..561c34e 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -222

Re: [PATCH 2/2] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio

2018-07-19 Thread Martin Wilck
On Thu, 2018-07-19 at 17:15 +0200, Jan Kara wrote: > On Thu 19-07-18 14:23:53, Martin Wilck wrote: > > On Thu, 2018-07-19 at 12:45 +0200, Jan Kara wrote: > > > Secondly, I don't think it is good to discard error from > > > bio_iov_iter_get_pages() here

Re: [PATCH 2/2] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio

2018-07-19 Thread Martin Wilck
ends on the segment structure of the input iov_iter. If the input iovec has just a single segment, it fills the bio in a single call. With multiple segments, it just returns the page(s) of the first segment. The point of my patch is to make no difference between single- segment and multi-segment IOs. Rega

Re: [PATCH 2/2] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio

2018-07-19 Thread Martin Wilck
ges() returns less data than possible. The function just takes one iteration step at a time, and thus iterating over it until the desired number of pages is obtained, and collecting the resulting pages in a single bio, isn't "fragile", it's just the natural thing to do. Martin -- Dr. Ma

Re: [PATCH 2/2] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio

2018-07-19 Thread Martin Wilck
On Thu, 2018-07-19 at 17:21 +0200, Jan Kara wrote: > On Thu 19-07-18 20:20:51, Ming Lei wrote: > > On Thu, Jul 19, 2018 at 01:56:16PM +0200, Jan Kara wrote: > > > On Thu 19-07-18 19:04:46, Ming Lei wrote: > > > > On Thu, Jul 19, 2018 at 11:39:

Re: [PATCH 2/2] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio

2018-07-19 Thread Martin Wilck
On Thu, 2018-07-19 at 13:56 +0200, Jan Kara wrote: > On Thu 19-07-18 19:04:46, Ming Lei wrote: > > On Thu, Jul 19, 2018 at 11:39:18AM +0200, Martin Wilck wrote: > > > bio_iov_iter_get_pages() returns only pages for a single non- > > > empty > > > segment of

Re: [PATCH 2/2] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio

2018-07-19 Thread Martin Wilck
On Thu, 2018-07-19 at 12:45 +0200, Jan Kara wrote: > On Thu 19-07-18 11:39:18, Martin Wilck wrote: > > bio_iov_iter_get_pages() returns only pages for a single non-empty > > segment of the input iov_iter's iovec. This may be much less than > > the number > > of page

[PATCH 2/2] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio

2018-07-19 Thread Martin Wilck
and the customer's tests. It'd be possible to detect this situation and merge bi_io_vec slots that refer to the same page, but I prefer to keep it simple for now. Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io") Signed-off-by: Martin Wilc

[PATCH 0/2] Fix silent data corruption in blkdev_direct_IO()

2018-07-19 Thread Martin Wilck
se. Looking forward to your opinions and suggestions. Regards Martin Martin Wilck (2): block: bio_iov_iter_get_pages: fix size of last iovec blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio block/bio.c| 18 -- fs/block_dev.c | 8 +++- 2 files changed,

[PATCH 1/2] block: bio_iov_iter_get_pages: fix size of last iovec

2018-07-19 Thread Martin Wilck
invocation, the correct index is (nr_pages - 1). V2: improved readability following suggestions from Ming Lei. Fixes: 2cefe4dbaadf ("block: add bio_iov_iter_get_pages()") Signed-off-by: Martin Wilck --- block/bio.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff

Re: Silent data corruption in blkdev_direct_IO()

2018-07-18 Thread Martin Wilck
On Wed, 2018-07-18 at 10:48 +0800, Ming Lei wrote: > On Wed, Jul 18, 2018 at 02:07:28AM +0200, Martin Wilck wrote: > > > > From b75adc856119346e02126cf8975755300f2d9b7f Mon Sep 17 00:00:00 > > 2001 > > From: Martin Wilck > > Date: Wed, 18 Jul 2018 01:56:37

Re: Silent data corruption in blkdev_direct_IO()

2018-07-17 Thread Martin Wilck
On Mon, 2018-07-16 at 19:45 +0800, Ming Lei wrote: > On Sat, Jul 14, 2018 at 6:29 AM, Martin Wilck > wrote: > > Hi Ming & Jens, > > > > On Fri, 2018-07-13 at 12:54 -0600, Jens Axboe wrote: > > > On 7/12/18 5:29 PM, Ming Lei wrote: > > > > &

Re: Silent data corruption in blkdev_direct_IO()

2018-07-16 Thread Martin Wilck
On Fri, 2018-07-13 at 14:52 -0600, Jens Axboe wrote: > On 7/13/18 2:48 PM, Martin Wilck wrote: > > > > > > > However, so far I've only identified a minor problem, see below > > > > - > > > > it > > > > doesn't explain the data cor

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Martin Wilck
one test at a time. We're currently trying to find out more about the IO sizes at which the problem occurs. As you noted, currently we know no more than that it happens somewhere between 16k and 1M. Martin -- Dr. Martin Wilck , Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg)

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Martin Wilck
On Fri, 2018-07-13 at 12:50 -0600, Jens Axboe wrote: > On 7/13/18 12:00 PM, Jens Axboe wrote: > > On 7/13/18 10:56 AM, Martin Wilck wrote: > > > On Thu, 2018-07-12 at 10:42 -0600, Jens Axboe wrote: > > > > > > > > Hence the patch I sent is wrong, t

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Martin Wilck
On Fri, 2018-07-13 at 12:00 -0600, Jens Axboe wrote: > On 7/13/18 10:56 AM, Martin Wilck wrote: > > On Thu, 2018-07-12 at 10:42 -0600, Jens Axboe wrote: > > > > > > Hence the patch I sent is wrong, the code actually looks fine. > > > Which > > >

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Martin Wilck
much simpler thatn __blkdev_direct_IO(), and it seems to be broken in a subtle way. However, so far I've only identified a minor problem, see below - it doesn't explain the data corruption we're seeing. Martin -- Dr. Martin Wilck , Tel. +49 (0)911 74053 2107 SUSELinux GmbH, GF: Felix Imendörffer, J

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Martin Wilck
ly > more > pages. __blkdev_direct_IO_simple() only does one bio, so it has to > fit > within that one bio. __blkdev_direct_IO() will loop just fine and > will finish any size, BIO_MAX_PAGES at the time. Right. Hannes, I think we (at least myself) have been irritated by looking at outdated code. The key point which I missed is that __blkdev_direct_IO() is called with min(nr_pages, BIO_MAX_PAGES), and advances beyond that later in the loop. > Hence the patch I sent is wrong, the code actually looks fine. Which > means we're back to trying to figure out what is going on here. It'd > be great with a test case... Unfortunately we have no reproducer just yet. Only the customer can reproduce it. The scenario is a data base running on a KVM virtual machine on top of a virtio-scsi volume backed by a multipath map, with cache='none' in qemu. My late thinking is that if, for whatever reason I don't yet understand, blkdev_direct_IO() resulted in a short write, __generic_file_write_iter() would fall back to buffered writing, which might be a possible explanation for the data corruption we observe. That's just speculation at the current stage. Regards Martin -- Dr. Martin Wilck , Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg)

Re: Silent data corruption in blkdev_direct_IO()

2018-07-12 Thread Martin Wilck
if (!nr_pages) > return 0; > - if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_PAGES) > + if (is_sync_kiocb(iocb)) > return __blkdev_direct_IO_simple(iocb, iter, > nr_pages); > > - return __blkdev_direct_IO(iocb, iter, min(nr_pages, > BIO_MAX_PAGES)); > + return __blkdev_direct_IO(iocb, iter, nr_pages); > } > > static __init int blkdev_init(void) > -- Dr. Martin Wilck , Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg)

Re: [PATCH] dm-zoned-tools: add zoned disk udev rules for scheduler / dmsetup

2018-06-15 Thread Martin Wilck
get the point why this would be wrong in general - what's the difference to e.g. calling "mdadm -I"? Regards Martin -- Dr. Martin Wilck , Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg)

Re: [PATCH 1/2] block: genhd: add device_add_disk_with_groups

2017-10-04 Thread Martin Wilck
On Fri, 2017-09-29 at 16:59 -0600, Keith Busch wrote: > On Thu, Sep 28, 2017 at 09:36:36PM +0200, Martin Wilck wrote: > > In the NVME subsystem, we're seeing a race condition with udev > > where > > device_add_disk() is called (which triggers an "add" uevent),

Re: [PATCH 1/2] block: genhd: add device_add_disk_with_groups

2017-10-04 Thread Martin Wilck
On Sun, 2017-10-01 at 10:00 +0200, Christoph Hellwig wrote: > While this looks okay-ish to me I really don't want people confused > with three variants of add_disk, we really need to consolidate > our helpers there a bit.. > Can you give me a hint what you'd like to see? Martin --

[PATCH 1/2] block: genhd: add device_add_disk_with_groups

2017-09-28 Thread Martin Wilck
the device is incomplete, in particular, device WWIDs may not be determined correctly. To fix this, this patch introduces a new function device_add_disk_with_groups(), which takes a list of attribute groups and adds them to the device before sending out uevents. Signed-off-by: Martin Wilck <mw

[PATCH 2/2] nvme: use device_add_disk_with_groups()

2017-09-28 Thread Martin Wilck
By using device_add_disk_with_groups(), we can avoid the race condition with udev rule processing, because no udev event will be triggered before all attributes are available. Signed-off-by: Martin Wilck <mwi...@suse.com> --- drivers/nvme/host/core.c | 12 +++- 1 file chan

Re: [dm-devel] [PATCH 08/15] dm mpath: merge do_end_io_bio into multipath_end_io_bio

2017-05-22 Thread Martin Wilck
LETE; > +done: >   if (pgpath) { > - ps = >pg->ps; > + struct path_selector *ps = >pg->ps; > + >   if (ps->type->end_io) >   ps->type->end_io(ps, >path, mpio- > >nr_bytes); >   } >   > - return r; > + return error; >  } >   >  /* -- Dr. Martin Wilck <mwi...@suse.com>, Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg)