Re: [PATCH v5 0/3] Fix silent data corruption in blkdev_direct_IO()

2018-07-26 Thread Jens Axboe
On 7/25/18 2:15 PM, Martin Wilck wrote: > Hello Jens, Ming, Jan, and all others, > > the following patches have been verified by a customer to fix a silent data > corruption which he has been seeing since "72ecad2 block: support a full bio > worth of IO for simplified bdev direct-io". > > The

[PATCH v5 0/3] Fix silent data corruption in blkdev_direct_IO()

2018-07-25 Thread Martin Wilck
Hello Jens, Ming, Jan, and all others, the following patches have been verified by a customer to fix a silent data corruption which he has been seeing since "72ecad2 block: support a full bio worth of IO for simplified bdev direct-io". The patches are based on our observation that the corruption

[PATCH v4 0/4] Fix silent data corruption in blkdev_direct_IO()

2018-07-20 Thread Martin Wilck
Hello Jens, Ming, Jan, and all others, the following patches have been verified by a customer to fix a silent data corruption which he has been seeing since "72ecad2 block: support a full bio worth of IO for simplified bdev direct-io". The patches are based on our observation that the corruption

[PATCH v3 0/3] Fix silent data corruption in blkdev_direct_IO()

2018-07-19 Thread Martin Wilck
Hello Jens, Ming, Jan, and all others, the following patches have been verified by a customer to fix a silent data corruption which he has been seeing since "72ecad2 block: support a full bio worth of IO for simplified bdev direct-io". The patches are based on our observation that the corruption

Re: [PATCH 0/2] Fix silent data corruption in blkdev_direct_IO()

2018-07-19 Thread Christoph Hellwig
Martin, please NEVER send a patch series as a reply to a previous thread. That makes it complete hell to find in the inbox.

Re: [PATCH 0/2] Fix silent data corruption in blkdev_direct_IO()

2018-07-19 Thread Hannes Reinecke
On 07/19/2018 11:39 AM, Martin Wilck wrote: > Hello Jens, Ming, Jan, and all others, > > the following patches have been verified by a customer to fix a silent data > corruption which he has been seeing since "72ecad2 block: support a full bio > worth of IO for simplified bdev direct-io". > >

[PATCH 0/2] Fix silent data corruption in blkdev_direct_IO()

2018-07-19 Thread Martin Wilck
Hello Jens, Ming, Jan, and all others, the following patches have been verified by a customer to fix a silent data corruption which he has been seeing since "72ecad2 block: support a full bio worth of IO for simplified bdev direct-io". The patches are based on our observation that the corruption

Re: Silent data corruption in blkdev_direct_IO()

2018-07-18 Thread Jan Kara
On Wed 18-07-18 13:40:07, Jan Kara wrote: > On Wed 18-07-18 11:20:15, Johannes Thumshirn wrote: > > On Wed, Jul 18, 2018 at 03:54:46PM +0800, Ming Lei wrote: > > > Please go ahead and take care of it since you have the test cases. > > > > Speaking of which, do we already know how it is triggered

Re: Silent data corruption in blkdev_direct_IO()

2018-07-18 Thread Jan Kara
On Wed 18-07-18 11:20:15, Johannes Thumshirn wrote: > On Wed, Jul 18, 2018 at 03:54:46PM +0800, Ming Lei wrote: > > Please go ahead and take care of it since you have the test cases. > > Speaking of which, do we already know how it is triggered and can we > cook up a blktests testcase for it?

Re: Silent data corruption in blkdev_direct_IO()

2018-07-18 Thread Johannes Thumshirn
On Wed, Jul 18, 2018 at 03:54:46PM +0800, Ming Lei wrote: > Please go ahead and take care of it since you have the test cases. Speaking of which, do we already know how it is triggered and can we cook up a blktests testcase for it? This would be more than helpful for all parties. Thanks,

Re: Silent data corruption in blkdev_direct_IO()

2018-07-18 Thread Ming Lei
On Wed, Jul 18, 2018 at 09:32:12AM +0200, Martin Wilck wrote: > On Wed, 2018-07-18 at 10:48 +0800, Ming Lei wrote: > > On Wed, Jul 18, 2018 at 02:07:28AM +0200, Martin Wilck wrote: > > > > > > From b75adc856119346e02126cf8975755300f2d9b7f Mon Sep 17 00:00:00 > > > 2001 > > > From: Martin Wilck >

Re: Silent data corruption in blkdev_direct_IO()

2018-07-18 Thread Martin Wilck
On Wed, 2018-07-18 at 10:48 +0800, Ming Lei wrote: > On Wed, Jul 18, 2018 at 02:07:28AM +0200, Martin Wilck wrote: > > > > From b75adc856119346e02126cf8975755300f2d9b7f Mon Sep 17 00:00:00 > > 2001 > > From: Martin Wilck > > Date: Wed, 18 Jul 2018 01:56:37 +0200 > > Subject: [PATCH] block:

Re: Silent data corruption in blkdev_direct_IO()

2018-07-17 Thread Ming Lei
On Wed, Jul 18, 2018 at 02:07:28AM +0200, Martin Wilck wrote: > On Mon, 2018-07-16 at 19:45 +0800, Ming Lei wrote: > > On Sat, Jul 14, 2018 at 6:29 AM, Martin Wilck > > wrote: > > > Hi Ming & Jens, > > > > > > On Fri, 2018-07-13 at 12:54 -0600, Jens Axboe wrote: > > > > On 7/12/18 5:29 PM, Ming

Re: Silent data corruption in blkdev_direct_IO()

2018-07-17 Thread Martin Wilck
On Mon, 2018-07-16 at 19:45 +0800, Ming Lei wrote: > On Sat, Jul 14, 2018 at 6:29 AM, Martin Wilck > wrote: > > Hi Ming & Jens, > > > > On Fri, 2018-07-13 at 12:54 -0600, Jens Axboe wrote: > > > On 7/12/18 5:29 PM, Ming Lei wrote: > > > > > > > > Maybe you can try the following patch from

Re: Silent data corruption in blkdev_direct_IO()

2018-07-16 Thread Martin Wilck
On Fri, 2018-07-13 at 14:52 -0600, Jens Axboe wrote: > On 7/13/18 2:48 PM, Martin Wilck wrote: > > > > > > > However, so far I've only identified a minor problem, see below - it doesn't explain the data corruption we're seeing. > > > > > > What would help is trying to

Re: Silent data corruption in blkdev_direct_IO()

2018-07-16 Thread Ming Lei
On Sat, Jul 14, 2018 at 6:29 AM, Martin Wilck wrote: > Hi Ming & Jens, > > On Fri, 2018-07-13 at 12:54 -0600, Jens Axboe wrote: >> On 7/12/18 5:29 PM, Ming Lei wrote: >> > >> > Maybe you can try the following patch from Christoph to see if it >> > makes a >> > difference: >> > >> >

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Martin Wilck
Hi Ming & Jens, On Fri, 2018-07-13 at 12:54 -0600, Jens Axboe wrote: > On 7/12/18 5:29 PM, Ming Lei wrote: > > > > Maybe you can try the following patch from Christoph to see if it > > makes a > > difference: > > > > https://marc.info/?l=linux-kernel&m=153013977816825&w=2 > > That's not a bad

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Martin Wilck
On Fri, 2018-07-13 at 12:50 -0600, Jens Axboe wrote: > On 7/13/18 12:00 PM, Jens Axboe wrote: > > On 7/13/18 10:56 AM, Martin Wilck wrote: > > > On Thu, 2018-07-12 at 10:42 -0600, Jens Axboe wrote: > > > > > > > > Hence the patch I sent is wrong, the code actually looks fine. > > > > Which > > >

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Jens Axboe
On 7/13/18 2:48 PM, Martin Wilck wrote: >> For all you know, the bug could be elsewhere and >> we're just going to be hitting it differently some other way. The >> head-in-the-sand approach is rarely a win long term. >> >> It's saving an allocation per IO, that's definitely measurable on >> the

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Martin Wilck
On Fri, 2018-07-13 at 12:00 -0600, Jens Axboe wrote: > On 7/13/18 10:56 AM, Martin Wilck wrote: > > On Thu, 2018-07-12 at 10:42 -0600, Jens Axboe wrote: > > > > > > Hence the patch I sent is wrong, the code actually looks fine. > > > Which > > > means we're back to trying to figure out what is

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Jens Axboe
On 7/12/18 5:29 PM, Ming Lei wrote: > On Thu, Jul 12, 2018 at 10:36 PM, Hannes Reinecke wrote: >> Hi Jens, Christoph, >> >> we're currently hunting down a silent data corruption occurring due to >> commit 72ecad22d9f1 ("block: support a full bio worth of IO for >> simplified bdev direct-io"). >>

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Jens Axboe
On 7/13/18 12:00 PM, Jens Axboe wrote: > On 7/13/18 10:56 AM, Martin Wilck wrote: >> On Thu, 2018-07-12 at 10:42 -0600, Jens Axboe wrote: >>> >>> Hence the patch I sent is wrong, the code actually looks fine. Which >>> means we're back to trying to figure out what is going on here. It'd >>> be

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Jens Axboe
On 7/13/18 10:56 AM, Martin Wilck wrote: > On Thu, 2018-07-12 at 10:42 -0600, Jens Axboe wrote: >> >> Hence the patch I sent is wrong, the code actually looks fine. Which >> means we're back to trying to figure out what is going on here. It'd >> be great with a test case... > > We don't have an

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Martin Wilck
On Thu, 2018-07-12 at 10:42 -0600, Jens Axboe wrote: > > Hence the patch I sent is wrong, the code actually looks fine. Which > means we're back to trying to figure out what is going on here. It'd > be great with a test case... We don't have an easy test case yet. But the customer has confirmed

Re: Silent data corruption in blkdev_direct_IO()

2018-07-13 Thread Martin Wilck
On Thu, 2018-07-12 at 10:42 -0600, Jens Axboe wrote: > On 7/12/18 10:20 AM, Jens Axboe wrote: > > On 7/12/18 10:14 AM, Hannes Reinecke wrote: > > > On 07/12/2018 05:08 PM, Jens Axboe wrote: > > > > On 7/12/18 8:36 AM, Hannes Reinecke wrote: > > > > > Hi Jens, Christoph, > > > > > > > > > > we're

Re: Silent data corruption in blkdev_direct_IO()

2018-07-12 Thread Ming Lei
On Thu, Jul 12, 2018 at 10:36 PM, Hannes Reinecke wrote: > Hi Jens, Christoph, > > we're currently hunting down a silent data corruption occurring due to > commit 72ecad22d9f1 ("block: support a full bio worth of IO for > simplified bdev direct-io"). > > While the whole thing is still hazy on the

Re: Silent data corruption in blkdev_direct_IO()

2018-07-12 Thread Jens Axboe
On 7/12/18 10:20 AM, Jens Axboe wrote: > On 7/12/18 10:14 AM, Hannes Reinecke wrote: >> On 07/12/2018 05:08 PM, Jens Axboe wrote: >>> On 7/12/18 8:36 AM, Hannes Reinecke wrote: Hi Jens, Christoph, we're currently hunting down a silent data corruption occurring due to commit

Re: Silent data corruption in blkdev_direct_IO()

2018-07-12 Thread Jens Axboe
On 7/12/18 10:14 AM, Hannes Reinecke wrote: > On 07/12/2018 05:08 PM, Jens Axboe wrote: >> On 7/12/18 8:36 AM, Hannes Reinecke wrote: >>> Hi Jens, Christoph, >>> >>> we're currently hunting down a silent data corruption occurring due to >>> commit 72ecad22d9f1 ("block: support a full bio worth of

Re: Silent data corruption in blkdev_direct_IO()

2018-07-12 Thread Hannes Reinecke
On 07/12/2018 05:08 PM, Jens Axboe wrote: On 7/12/18 8:36 AM, Hannes Reinecke wrote: Hi Jens, Christoph, we're currently hunting down a silent data corruption occurring due to commit 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io"). While the whole thing is

Re: Silent data corruption in blkdev_direct_IO()

2018-07-12 Thread Martin Wilck
On Thu, 2018-07-12 at 09:08 -0600, Jens Axboe wrote: > On 7/12/18 8:36 AM, Hannes Reinecke wrote: > > Hi Jens, Christoph, > > > > we're currently hunting down a silent data corruption occurring due > > to > > commit 72ecad22d9f1 ("block: support a full bio worth of IO for > > simplified bdev

Re: Silent data corruption in blkdev_direct_IO()

2018-07-12 Thread Jens Axboe
On 7/12/18 8:36 AM, Hannes Reinecke wrote: > Hi Jens, Christoph, > > we're currently hunting down a silent data corruption occurring due to > commit 72ecad22d9f1 ("block: support a full bio worth of IO for > simplified bdev direct-io"). > > While the whole thing is still hazy on the details, the

Silent data corruption in blkdev_direct_IO()

2018-07-12 Thread Hannes Reinecke
Hi Jens, Christoph, we're currently hunting down a silent data corruption occurring due to commit 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io"). While the whole thing is still hazy on the details, the one thing we've found is that reverting that patch fixes