Re: [PATCH V2] block: correctly fallback for zeroout

2016-06-15 Thread Sitsofe Wheeler
On Tue, Jun 07, 2016 at 07:58:25AM -0700, Shaohua Li wrote:
>
> I didn't follow. io_err is only and always set when ret == 0. io_err is
> meaningless if ret != 0, because that means the disk doesn't support discard and
> we don't dispatch discard IO. Why should we initialize io_err to 0?

My mistake - I confused what !ret would mean.

Unfortunately the V2 patch no longer cleanly applies to the latest
kernel (db06d759d6cf903aeda8c107fd3abd366dd80200) so I can't easily
test it there.

-- 
Sitsofe | http://sucs.org/~sits/


Re: [PATCH V2] block: correctly fallback for zeroout

2016-06-15 Thread Sitsofe Wheeler
On Tue, Jun 14, 2016 at 10:14:50PM -0400, Martin K. Petersen wrote:
> > "Christoph" == Christoph Hellwig  writes:
> 
> Christoph> And I'd much prefer to get this right now.  It's not like
> Christoph> this is recently introduced behavior.
> 
> Unfortunately there are quite a few callers of blkdev_issue_discard()
> these days. Some of them ignore the return value but not all of
> them. I'm concerned about causing all sorts of breakage if we suddenly
> start returning errors in various places in the stable trees.

This is true. We have problematic behaviour in stable kernels today so
there needs to be a "least intrusive" workaround which changes the
behaviour as little as possible for those. I would say that means
maintaining the current -EOPNOTSUPP behaviour in those kernels
regardless of what goes into master.

-- 
Sitsofe | http://sucs.org/~sits/


Re: [PATCH V2] block: correctly fallback for zeroout

2016-06-14 Thread Mike Snitzer
On Tue, Jun 14 2016 at 10:30pm -0400,
Martin K. Petersen  wrote:

> > "Mike" == Mike Snitzer  writes:
> 
> Mike,
> 
> Mike> so long story short: making this change to remove this so-called
> Mike> "stupid behaviour" will require code like
> Mike> drivers/md/dm-thin.c:issue_discard() to check the return from
> Mike> __blkdev_issue_discard() and if it is -EOPNOTSUPP then it should
> Mike> return 0.
> 
> Yes, please.
> 
> The original -EOPNOTSUPP equals success is a remnant from the days where
> discards were only a hint. And sadly that policy got encoded in the
> actual interface instead of being left up to the caller.
> 
> Now the world has moved on. And reliable zeroout behavior, the SCSI
> target drivers and other kernel users need an interface that tells them
> exactly what happened at the bottom of the stack so they in turn can
> provide a deterministic result (including partial block zeroing) to
> their clients.
> 
> It's imperative that this gets fixed up. And instead of perpetuating a
> weird interface that returns success on failure, let's fix DM and the
> callers that actually check the return of blkdev_issue_discard() so they
> do the right thing.
> 
> I really don't understand why you are objecting so much to this. It's a
> trivial change that may not directly benefit DM but it helps everybody
> else. And it cleans up a library call that's confusing, error prone and
> goes against the very grain of how all our kernel interfaces work in
> general.

I've been consistently objecting to changing the blkdev_issue_discard()
interface.  Fixing the async __blkdev_issue_discard() to offer
unfiltered return values is perfectly fine by me.

But the ship has sailed on the blkdev_issue_discard() interface.


Re: [PATCH V2] block: correctly fallback for zeroout

2016-06-14 Thread Martin K. Petersen
> "Mike" == Mike Snitzer  writes:

Mike,

Mike> so long story short: making this change to remove this so-called
Mike> "stupid behaviour" will require code like
Mike> drivers/md/dm-thin.c:issue_discard() to check the return from
Mike> __blkdev_issue_discard() and if it is -EOPNOTSUPP then it should
Mike> return 0.

Yes, please.

The original -EOPNOTSUPP equals success is a remnant from the days where
discards were only a hint. And sadly that policy got encoded in the
actual interface instead of being left up to the caller.

Now the world has moved on. And reliable zeroout behavior, the SCSI
target drivers and other kernel users need an interface that tells them
exactly what happened at the bottom of the stack so they in turn can
provide a deterministic result (including partial block zeroing) to
their clients.

It's imperative that this gets fixed up. And instead of perpetuating a
weird interface that returns success on failure, let's fix DM and the
callers that actually check the return of blkdev_issue_discard() so they
do the right thing.

I really don't understand why you are objecting so much to this. It's a
trivial change that may not directly benefit DM but it helps everybody
else. And it cleans up a library call that's confusing, error prone and
goes against the very grain of how all our kernel interfaces work in
general.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH V2] block: correctly fallback for zeroout

2016-06-14 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph> We can move the sanity checks out.  Or even better get rid of
Christoph> the stupid behavior of ignoring the late -EOPNOTSUPP in this
Christoph> low level helper and instead leaving it to the caller(s) that
Christoph> care.

It definitely should be a caller decision whether to ignore the return
value or not.

>> I am OK with your patch as a stable fix but this really needs to be
>> fixed up properly.

Christoph> And I'd much prefer to get this right now.  It's not like
Christoph> this is recently introduced behavior.

Unfortunately there are quite a few callers of blkdev_issue_discard()
these days. Some of them ignore the return value but not all of
them. I'm concerned about causing all sorts of breakage if we suddenly
start returning errors in various places in the stable trees.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH V2] block: correctly fallback for zeroout

2016-06-14 Thread Mike Snitzer
On Mon, Jun 13 2016 at  4:20am -0400,
Christoph Hellwig  wrote:

> On Fri, Jun 10, 2016 at 09:49:44PM -0400, Martin K. Petersen wrote:
> > >> What does the extra io_err buy us? Just have this function return an
> > >> error. And then in blkdev_issue_discard if you get -EOPNOTSUPP you
> > >> special case it there.
> > 
> > Shaohua> The __blkdev_issue_discard returns -EOPNOTSUPP if disk doesn't
> > Shaohua> support discard.  In that case, blkdev_issue_discard doesn't
> > Shaohua> return 0. blkdev_issue_discard only returns 0 if IO error is
> > Shaohua> -EOPNOTSUPP.
> > 
> > Oh, I see. The sanity checks are now in __blkdev_issue_discard() so
> > there is no way to distinguish between -EOPNOTSUPP and the other
> > -EOPNOTSUPP. *sigh*
> 
> We can move the sanity checks out.  Or even better get rid of the
> stupid behavior of ignoring the late -EOPNOTSUPP in this low level
> helper and instead leaving it to the caller(s) that care.

I'm not onboard with blkdev_issue_discard() no longer masking the late
return of -EOPNOTSUPP.

I'd be fine with moving the early -EOPNOTSUPP checks and the masking of
late -EOPNOTSUPP out to blkdev_issue_discard().  But to be clear,
the masking of late -EOPNOTSUPP return is there for stacking drivers
like MD and DM.  So long as the upper level ioctl code, filesystems, etc.
make use of blkdev_issue_discard() then they'll still get the benefit
of that masking.

drivers/md/dm-thin.c is now using the new async __blkdev_issue_discard()
and it'll only ever do so to a device it knows supports discards -- BUT
it could be that the DM thin-pool's data device is itself a stacked
device that doesn't uniformly support discards throughout its entire
logical address space.  So it could issue a discard to a portion of the
stacked data device that will return -EOPNOTSUPP.. so long story short:
making this change to remove this so-called "stupid behaviour" will
require code like drivers/md/dm-thin.c:issue_discard() to check the
return from __blkdev_issue_discard() and if it is -EOPNOTSUPP then it
should return 0.

> So far the DM test suite seems to be the only one that does.

The device-mapper-test-suite was only ever relying on
blkdev_issue_discard()'s early return of -EOPNOTSUPP.

> > I am OK with your patch as a stable fix but this really needs to be
> > fixed up properly.
> 
> And I'd much prefer to get this right now.  It's not like this is
> recently introduced behavior.

We need to sequence the fixes such that stable kernels get the zeroout
fallback fixed.  Right?  Not sure if that is a goal of shli's though..

In 4.7-rc, where you introduced __blkdev_issue_discard and I made
dm-thin.c consume it, I'm fine with seeing __blkdev_issue_discard stop
masking -EOPNOTSUPP... but at the same time that change is made
dm-thin.c would need to be fixed (in the same commit as the interface
change).  Though I'm now missing what lifting the -EOPNOTSUPP behavior
into blkdev_issue_discard() buys us... maybe purity of the new async
__blkdev_issue_discard()?

Mike
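
The caller-side masking described in this message could look roughly
like the following userspace sketch. It assumes an unfiltered
lower-level helper; discard_fn, stacked_issue_discard(), partial_lower()
and failing_lower() are hypothetical names for illustration and are not
taken from dm-thin.c or blk-lib.c.

```c
#include <errno.h>

/* Hypothetical stand-in for an unfiltered low-level discard helper. */
typedef int (*discard_fn)(unsigned long sector, unsigned long nr_sects);

/*
 * Policy for a stacking driver that treats discard as a hint:
 * -EOPNOTSUPP from the lower device is not an error for this caller,
 * every other error is passed through unchanged.
 */
int stacked_issue_discard(discard_fn lower, unsigned long sector,
			  unsigned long nr_sects)
{
	int ret = lower(sector, nr_sects);

	if (ret == -EOPNOTSUPP)
		return 0;
	return ret;
}

/* Model lower device: only the first 1024 sectors support discard. */
int partial_lower(unsigned long sector, unsigned long nr_sects)
{
	if (sector + nr_sects > 1024)
		return -EOPNOTSUPP;
	return 0;
}

/* Model lower device that always fails with a real I/O error. */
int failing_lower(unsigned long sector, unsigned long nr_sects)
{
	(void)sector;
	(void)nr_sects;
	return -EIO;
}
```

The design point is that the masking lives in the one caller that wants
it, while other callers of the same lower helper still see -EOPNOTSUPP.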


Re: [PATCH V2] block: correctly fallback for zeroout

2016-06-13 Thread Christoph Hellwig
On Fri, Jun 10, 2016 at 09:49:44PM -0400, Martin K. Petersen wrote:
> >> What does the extra io_err buy us? Just have this function return an
> >> error. And then in blkdev_issue_discard if you get -EOPNOTSUPP you
> >> special case it there.
> 
> Shaohua> The __blkdev_issue_discard returns -EOPNOTSUPP if disk doesn't
> Shaohua> support discard.  In that case, blkdev_issue_discard doesn't
> Shaohua> return 0. blkdev_issue_discard only returns 0 if IO error is
> Shaohua> -EOPNOTSUPP.
> 
> Oh, I see. The sanity checks are now in __blkdev_issue_discard() so
> there is no way to distinguish between -EOPNOTSUPP and the other
> -EOPNOTSUPP. *sigh*

We can move the sanity checks out.  Or even better get rid of the
stupid behavior of ignoring the late -EOPNOTSUPP in this low level
helper and instead leaving it to the caller(s) that care.  So far
the DM test suite seems to be the only one that does.

> I am OK with your patch as a stable fix but this really needs to be
> fixed up properly.

And I'd much prefer to get this right now.  It's not like this is
recently introduced behavior.


Re: [PATCH V2] block: correctly fallback for zeroout

2016-06-10 Thread Martin K. Petersen
> "Shaohua" == Shaohua Li  writes:

Shaohua,

>> What does the extra io_err buy us? Just have this function return an
>> error. And then in blkdev_issue_discard if you get -EOPNOTSUPP you
>> special case it there.

Shaohua> The __blkdev_issue_discard returns -EOPNOTSUPP if disk doesn't
Shaohua> support discard.  In that case, blkdev_issue_discard doesn't
Shaohua> return 0. blkdev_issue_discard only returns 0 if IO error is
Shaohua> -EOPNOTSUPP.

Oh, I see. The sanity checks are now in __blkdev_issue_discard() so
there is no way to distinguish between -EOPNOTSUPP and the other
-EOPNOTSUPP. *sigh*

I am OK with your patch as a stable fix but this really needs to be
fixed up properly.

-- 
Martin K. Petersen  Oracle Linux Engineering
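
The two kinds of -EOPNOTSUPP discussed here, one from the up-front
sanity checks and one from the completed I/O, can be modelled in
userspace C as below. This is a sketch under assumptions: struct
fake_queue and the model_* functions are hypothetical, and the early
check is reduced to a single flag. The return value carries the early
error and the out-parameter carries the late one, mirroring the shape
of the posted patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a request queue; not a kernel structure. */
struct fake_queue {
	bool advertises_discard;	/* false: early -EOPNOTSUPP */
	int io_result;			/* result of the submitted I/O */
};

/* Return value: early (sanity check) error. *io_err: late (I/O) error. */
int model_do_discard(const struct fake_queue *q, int *io_err)
{
	if (!q->advertises_discard)
		return -EOPNOTSUPP;

	*io_err = q->io_result;
	return 0;
}

/* Legacy policy: surface the early error, mask a late -EOPNOTSUPP. */
int model_issue_discard(const struct fake_queue *q)
{
	int io_err = 0;
	int ret = model_do_discard(q, &io_err);

	if (!ret && io_err != -EOPNOTSUPP)
		ret = io_err;
	return ret;
}

/* Strict policy for a zeroout-style caller: both channels must be clean. */
bool model_discard_really_done(const struct fake_queue *q)
{
	int io_err = 0;

	return model_do_discard(q, &io_err) == 0 && io_err == 0;
}
```

A single return value cannot express both cases at once, which is why
the two policies above need the extra channel or, alternatively, need
the sanity checks moved out of the low-level helper.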


Re: [PATCH V2] block: correctly fallback for zeroout

2016-06-09 Thread Shaohua Li
On Thu, Jun 09, 2016 at 10:04:08PM -0400, Martin K. Petersen wrote:
> > "Shaohua" == Shaohua Li  writes:
> 
> Shaohua,
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 23d7f30..a3a26c8 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -84,6 +84,28 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  }
>  EXPORT_SYMBOL(__blkdev_issue_discard);
>  
> +static int do_blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> +		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags,
> +		int *io_err)
> +{
> +	int type = REQ_WRITE | REQ_DISCARD;
> +	struct bio *bio = NULL;
> +	struct blk_plug plug;
> +	int ret;
> +
> +	if (flags & BLKDEV_DISCARD_SECURE)
> +		type |= REQ_SECURE;
> +
> +	blk_start_plug(&plug);
> +	ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, type,
> +			&bio);
> +	if (!ret && bio)
> +		*io_err = submit_bio_wait(type, bio);
> +	blk_finish_plug(&plug);
> +
> +	return ret;
> +}
> +
> 
> What does the extra io_err buy us? Just have this function return an
> error. And then in blkdev_issue_discard if you get -EOPNOTSUPP you
> special case it there.

__blkdev_issue_discard returns -EOPNOTSUPP if the disk doesn't support discard.
In that case, blkdev_issue_discard doesn't return 0. blkdev_issue_discard only
returns 0 if the IO error is -EOPNOTSUPP. Please see bbd848e0fade51ae51da.

Thanks,
Shaohua


Re: [PATCH V2] block: correctly fallback for zeroout

2016-06-09 Thread Martin K. Petersen
> "Shaohua" == Shaohua Li  writes:

Shaohua,

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 23d7f30..a3a26c8 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -84,6 +84,28 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 }
 EXPORT_SYMBOL(__blkdev_issue_discard);
 
+static int do_blkdev_issue_discard(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags,
+		int *io_err)
+{
+	int type = REQ_WRITE | REQ_DISCARD;
+	struct bio *bio = NULL;
+	struct blk_plug plug;
+	int ret;
+
+	if (flags & BLKDEV_DISCARD_SECURE)
+		type |= REQ_SECURE;
+
+	blk_start_plug(&plug);
+	ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, type,
+			&bio);
+	if (!ret && bio)
+		*io_err = submit_bio_wait(type, bio);
+	blk_finish_plug(&plug);
+
+	return ret;
+}
+

What does the extra io_err buy us? Just have this function return an
error. And then in blkdev_issue_discard if you get -EOPNOTSUPP you
special case it there.

 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:  blockdev to issue discard for
@@ -98,23 +120,12 @@ EXPORT_SYMBOL(__blkdev_issue_discard);
 int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
 {
-	int type = REQ_WRITE | REQ_DISCARD;
-	struct bio *bio = NULL;
-	struct blk_plug plug;
-	int ret;
+	int ret, io_err;
 
-	if (flags & BLKDEV_DISCARD_SECURE)
-		type |= REQ_SECURE;
-
-	blk_start_plug(&plug);
-	ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, type,
-			&bio);
-	if (!ret && bio) {
-		ret = submit_bio_wait(type, bio);
-		if (ret == -EOPNOTSUPP)
-			ret = 0;
-	}
-	blk_finish_plug(&plug);
+	ret = do_blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
+			flags, &io_err);
+	if (!ret && io_err != -EOPNOTSUPP)
+		ret = io_err;
 
 	return ret;
 }
@@ -167,7 +178,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 
 	if (bio)
 		ret = submit_bio_wait(REQ_WRITE | REQ_WRITE_SAME, bio);
-	return ret != -EOPNOTSUPP ? ret : 0;
+	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_write_same);
 
@@ -236,9 +247,11 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
+	int io_err = 0;
 
 	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
-	    blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
+	    (do_blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0,
+			&io_err) == 0 && io_err == 0))
 		return 0;
 
 	if (bdev_write_same(bdev) &&

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH V2] block: correctly fallback for zeroout

2016-06-07 Thread Shaohua Li
On Tue, Jun 07, 2016 at 05:50:49AM +0100, Sitsofe Wheeler wrote:
> On Mon, Jun 06, 2016 at 03:33:58PM -0700, Shaohua Li wrote:
> > blkdev_issue_zeroout tries discard/writesame first; if they fail, zeroout
> > falls back to a regular write. The problem is that discard/writesame don't
> > return an error for -EOPNOTSUPP, so zeroout can't fall back and leaves the
> > disk data unchanged. zeroout should have guaranteed zero-fill
> > behavior.
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=118581
> > 
> > V2: move the return value policy to blkdev_issue_discard and
> > delete the policy for blkdev_issue_write_same (Martin)
> > 
> > Cc: Sitsofe Wheeler 
> > Cc: Mike Snitzer 
> > Cc: Jens Axboe 
> > Cc: Martin K. Petersen 
> > Signed-off-by: Shaohua Li 
> > ---
> >  block/blk-lib.c | 49 +++--
> >  1 file changed, 31 insertions(+), 18 deletions(-)
> > 
> > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > index 23d7f30..a3a26c8 100644
> > --- a/block/blk-lib.c
> > +++ b/block/blk-lib.c
> > @@ -84,6 +84,28 @@ int __blkdev_issue_discard(struct block_device *bdev, 
> > sector_t sector,
> >  }
> >  EXPORT_SYMBOL(__blkdev_issue_discard);
> >  
> > +static int do_blkdev_issue_discard(struct block_device *bdev, sector_t 
> > sector,
> > +   sector_t nr_sects, gfp_t gfp_mask, unsigned long flags,
> > +   int *io_err)
> > +{
> > +   int type = REQ_WRITE | REQ_DISCARD;
> > +   struct bio *bio = NULL;
> > +   struct blk_plug plug;
> > +   int ret;
> > +
> > +   if (flags & BLKDEV_DISCARD_SECURE)
> > +   type |= REQ_SECURE;
> > +
> > +   blk_start_plug(&plug);
> > +   ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, type,
> > +   &bio);
> > +   if (!ret && bio)
> > +   *io_err = submit_bio_wait(type, bio);
> > +   blk_finish_plug(&plug);
> > +
> > +   return ret;
> > +}
> > +
> >  /**
> >   * blkdev_issue_discard - queue a discard
> >   * @bdev:  blockdev to issue discard for
> > @@ -98,23 +120,12 @@ EXPORT_SYMBOL(__blkdev_issue_discard);
> >  int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> > sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
> >  {
> > -   int type = REQ_WRITE | REQ_DISCARD;
> > -   struct bio *bio = NULL;
> > -   struct blk_plug plug;
> > -   int ret;
> > +   int ret, io_err;
> >  
> > -   if (flags & BLKDEV_DISCARD_SECURE)
> > -   type |= REQ_SECURE;
> > -
> > -   blk_start_plug(&plug);
> > -   ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, type,
> > -   &bio);
> > -   if (!ret && bio) {
> > -   ret = submit_bio_wait(type, bio);
> > -   if (ret == -EOPNOTSUPP)
> > -   ret = 0;
> > -   }
> > -   blk_finish_plug(&plug);
> > +   ret = do_blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
> > +   flags, &io_err);
> > +   if (!ret && io_err != -EOPNOTSUPP)
> > +   ret = io_err;
> 
> Because io_err is always consulted if ret is not true shouldn't it be
> explicitly initialized to 0 before the call to do_blkdev_issue_discard
> (as do_blkdev_issue_discard will only set io_err if bio returned true)?
> 
> Perhaps there's an argument that do_blkdev_issue_discard should always
> set io_err on all its paths rather than just on errors in case the
> caller hasn't initialized it - is there an existing kernel pattern for
> this)?

I didn't follow. io_err is only and always set when ret == 0. io_err is
meaningless if ret != 0, because that means the disk doesn't support discard and
we don't dispatch discard IO. Why should we initialize io_err to 0?


Re: [PATCH V2] block: correctly fallback for zeroout

2016-06-06 Thread Sitsofe Wheeler
On Mon, Jun 06, 2016 at 03:33:58PM -0700, Shaohua Li wrote:
> blkdev_issue_zeroout try discard/writesame first, if they fail, zeroout
> fallback to regular write. The problem is discard/writesame doesn't
> return error for -EOPNOTSUPP, then zeroout can't do fallback and leave
> disk data not changed. zeroout should have guaranteed zero-fill
> behavior.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=118581
> 
> V2: move the return value policy to blkdev_issue_discard and
> delete the policy for blkdev_issue_write_same (Martin)
> 
> Cc: Sitsofe Wheeler 
> Cc: Mike Snitzer 
> Cc: Jens Axboe 
> Cc: Martin K. Petersen 
> Signed-off-by: Shaohua Li 
> ---
>  block/blk-lib.c | 49 +++--
>  1 file changed, 31 insertions(+), 18 deletions(-)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 23d7f30..a3a26c8 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -84,6 +84,28 @@ int __blkdev_issue_discard(struct block_device *bdev, 
> sector_t sector,
>  }
>  EXPORT_SYMBOL(__blkdev_issue_discard);
>  
> +static int do_blkdev_issue_discard(struct block_device *bdev, sector_t 
> sector,
> + sector_t nr_sects, gfp_t gfp_mask, unsigned long flags,
> + int *io_err)
> +{
> + int type = REQ_WRITE | REQ_DISCARD;
> + struct bio *bio = NULL;
> + struct blk_plug plug;
> + int ret;
> +
> + if (flags & BLKDEV_DISCARD_SECURE)
> + type |= REQ_SECURE;
> +
> + blk_start_plug(&plug);
> + ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, type,
> + &bio);
> + if (!ret && bio)
> + *io_err = submit_bio_wait(type, bio);
> + blk_finish_plug(&plug);
> +
> + return ret;
> +}
> +
>  /**
>   * blkdev_issue_discard - queue a discard
>   * @bdev:blockdev to issue discard for
> @@ -98,23 +120,12 @@ EXPORT_SYMBOL(__blkdev_issue_discard);
>  int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>   sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
>  {
> - int type = REQ_WRITE | REQ_DISCARD;
> - struct bio *bio = NULL;
> - struct blk_plug plug;
> - int ret;
> + int ret, io_err;
>  
> - if (flags & BLKDEV_DISCARD_SECURE)
> - type |= REQ_SECURE;
> -
> - blk_start_plug(&plug);
> - ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, type,
> - &bio);
> - if (!ret && bio) {
> - ret = submit_bio_wait(type, bio);
> - if (ret == -EOPNOTSUPP)
> - ret = 0;
> - }
> - blk_finish_plug(&plug);
> + ret = do_blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
> + flags, &io_err);
> + if (!ret && io_err != -EOPNOTSUPP)
> + ret = io_err;

Because io_err is always consulted if ret is not true, shouldn't it be
explicitly initialized to 0 before the call to do_blkdev_issue_discard
(as do_blkdev_issue_discard will only set io_err if bio returned true)?

Perhaps there's an argument that do_blkdev_issue_discard should always
set io_err on all its paths rather than just on errors in case the
caller hasn't initialized it (is there an existing kernel pattern for
this?).

>  
>   return ret;
>  }
> @@ -167,7 +178,7 @@ int blkdev_issue_write_same(struct block_device *bdev, 
> sector_t sector,
>  
>   if (bio)
>   ret = submit_bio_wait(REQ_WRITE | REQ_WRITE_SAME, bio);
> - return ret != -EOPNOTSUPP ? ret : 0;
> + return ret;
>  }
>  EXPORT_SYMBOL(blkdev_issue_write_same);
>  
> @@ -236,9 +247,11 @@ int blkdev_issue_zeroout(struct block_device *bdev, 
> sector_t sector,
>sector_t nr_sects, gfp_t gfp_mask, bool discard)
>  {
>   struct request_queue *q = bdev_get_queue(bdev);
> + int io_err = 0;
>  
>   if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
> - blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
> + (do_blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0,
> +  &io_err) == 0 && io_err == 0))
>   return 0;
>  
>   if (bdev_write_same(bdev) &&
> -- 
> 2.8.0.rc2

-- 
Sitsofe | http://sucs.org/~sits/
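
For illustration, the alternative raised above (the helper defining io_err on
every path, so that callers need no initializer) can be modelled in plain
userspace C. This is only a sketch under stated assumptions: the names
model_do_discard and model_issue_discard and their parameters are hypothetical
stand-ins, not kernel functions, and nothing here shows that the
"ret == 0 but no bio" path is actually reachable in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the helper in the patch.
 *   setup_ret: stands in for the __blkdev_issue_discard() return value
 *   have_bio:  non-zero if a bio was built and would be submitted
 *   io_result: what the simulated submit_bio_wait() would return
 * Unlike the posted patch, *io_err is written on every path.
 */
int model_do_discard(int setup_ret, int have_bio, int io_result, int *io_err)
{
	assert(io_err != NULL);

	*io_err = 0;			/* defined even when no I/O is issued */
	if (!setup_ret && have_bio)
		*io_err = io_result;	/* I/O status reported separately */

	return setup_ret;
}

/*
 * Model of the wrapper's return-value policy as written in the patch:
 * an -EOPNOTSUPP from the I/O itself is hidden from the caller, any
 * other I/O error is returned.
 */
int model_issue_discard(int setup_ret, int have_bio, int io_result)
{
	int ret, io_err;		/* io_err deliberately uninitialized */

	ret = model_do_discard(setup_ret, have_bio, io_result, &io_err);
	if (!ret && io_err != -EOPNOTSUPP)
		ret = io_err;

	return ret;
}
```

With this shape, model_issue_discard(0, 0, ...) reads a well-defined io_err of
0 even though no I/O was simulated; the cost is one extra store in the helper.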


[PATCH V2] block: correctly fallback for zeroout

2016-06-06 Thread Shaohua Li
blkdev_issue_zeroout tries discard/writesame first; if they fail, zeroout
falls back to a regular write. The problem is that discard/writesame don't
return an error for -EOPNOTSUPP, so zeroout can't fall back and leaves the
disk data unchanged. zeroout should guarantee zero-fill behavior.

https://bugzilla.kernel.org/show_bug.cgi?id=118581

V2: move the return value policy to blkdev_issue_discard and
delete the policy for blkdev_issue_write_same (Martin)

Cc: Sitsofe Wheeler 
Cc: Mike Snitzer 
Cc: Jens Axboe 
Cc: Martin K. Petersen 
Signed-off-by: Shaohua Li 
---
 block/blk-lib.c | 49 +++--
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 23d7f30..a3a26c8 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -84,6 +84,28 @@ int __blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
 }
 EXPORT_SYMBOL(__blkdev_issue_discard);
 
+static int do_blkdev_issue_discard(struct block_device *bdev, sector_t sector,
+   sector_t nr_sects, gfp_t gfp_mask, unsigned long flags,
+   int *io_err)
+{
+   int type = REQ_WRITE | REQ_DISCARD;
+   struct bio *bio = NULL;
+   struct blk_plug plug;
+   int ret;
+
+   if (flags & BLKDEV_DISCARD_SECURE)
+   type |= REQ_SECURE;
+
+   blk_start_plug(&plug);
+   ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, type,
+   &bio);
+   if (!ret && bio)
+   *io_err = submit_bio_wait(type, bio);
+   blk_finish_plug(&plug);
+
+   return ret;
+}
+
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:  blockdev to issue discard for
@@ -98,23 +120,12 @@ EXPORT_SYMBOL(__blkdev_issue_discard);
 int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
 {
-   int type = REQ_WRITE | REQ_DISCARD;
-   struct bio *bio = NULL;
-   struct blk_plug plug;
-   int ret;
+   int ret, io_err;
 
-   if (flags & BLKDEV_DISCARD_SECURE)
-   type |= REQ_SECURE;
-
-   blk_start_plug(&plug);
-   ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, type,
-   &bio);
-   if (!ret && bio) {
-   ret = submit_bio_wait(type, bio);
-   if (ret == -EOPNOTSUPP)
-   ret = 0;
-   }
-   blk_finish_plug(&plug);
+   ret = do_blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
+   flags, &io_err);
+   if (!ret && io_err != -EOPNOTSUPP)
+   ret = io_err;
 
return ret;
 }
@@ -167,7 +178,7 @@ int blkdev_issue_write_same(struct block_device *bdev, 
sector_t sector,
 
if (bio)
ret = submit_bio_wait(REQ_WRITE | REQ_WRITE_SAME, bio);
-   return ret != -EOPNOTSUPP ? ret : 0;
+   return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_write_same);
 
@@ -236,9 +247,11 @@ int blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
 sector_t nr_sects, gfp_t gfp_mask, bool discard)
 {
struct request_queue *q = bdev_get_queue(bdev);
+   int io_err = 0;
 
if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
-   blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
+   (do_blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0,
+&io_err) == 0 && io_err == 0))
return 0;
 
if (bdev_write_same(bdev) &&
-- 
2.8.0.rc2
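
The fallback order the commit message describes (discard, then write same,
then a regular write) can be illustrated with a small userspace model. This is
a rough sketch, not kernel code: struct fake_dev and the fake_* functions are
hypothetical, and the simulated results stand in for whatever the real I/O
paths would return. The point it shows is that a cheaper method only ends the
chain when it reports success, so an -EOPNOTSUPP that is passed through (rather
than turned into 0) lets the next method run.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a block device; not a kernel structure. */
struct fake_dev {
	bool discard_zeroes;	/* discard is advertised and zeroes data */
	int discard_result;	/* simulated discard I/O status */
	bool write_same;	/* WRITE SAME is advertised */
	int write_same_result;	/* simulated WRITE SAME I/O status */
	unsigned char data[64];
};

/* Simulated discard: only zeroes the data when the I/O "succeeds". */
int fake_discard(struct fake_dev *d)
{
	if (d->discard_result == 0)
		memset(d->data, 0, sizeof(d->data));
	return d->discard_result;
}

/* Simulated WRITE SAME with a zero payload; errors are passed through. */
int fake_write_same(struct fake_dev *d)
{
	if (d->write_same_result == 0)
		memset(d->data, 0, sizeof(d->data));
	return d->write_same_result;
}

/* Try the cheap methods first; fall back to writing zeroes explicitly. */
int fake_zeroout(struct fake_dev *d)
{
	assert(d != NULL);

	if (d->discard_zeroes && fake_discard(d) == 0)
		return 0;
	if (d->write_same && fake_write_same(d) == 0)
		return 0;

	/* Regular write path: modelled here as always succeeding. */
	memset(d->data, 0, sizeof(d->data));
	return 0;
}

/* Helper for checking the zero-fill guarantee. */
bool fake_is_zeroed(const struct fake_dev *d)
{
	size_t i;

	for (i = 0; i < sizeof(d->data); i++)
		if (d->data[i] != 0)
			return false;
	return true;
}
```

If fake_discard or fake_write_same mapped -EOPNOTSUPP to 0, as the pre-patch
wrappers did, fake_zeroout would return early with the data untouched, which is
the behaviour the bug report describes.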


