Re: [Drbd-dev] [PATCH 28/42] drbd: switch to proc_create_single

2018-05-18 Thread Lars Ellenberg
On Wed, May 16, 2018 at 11:43:32AM +0200, Christoph Hellwig wrote:
> And stop messing with try_module_get on THIS_MODULE, which doesn't make
> any sense here.

The idea was to increase module count on /proc/drbd access.

If someone holds /proc/drbd open, previously rmmod would
"succeed" in starting the unload, but then block in remove_proc_entry(),
leading to a situation where lsmod no longer shows drbd,
but /proc/drbd is still there (though no longer accessible).

I'd rather have rmmod fail up front in this case.
And try_module_get() seemed most appropriate.
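To illustrate why failing up front is nicer, here is a user-space sketch (a simplified model written for illustration, not the kernel's actual implementation) of the refcount semantics that try_module_get() gives the /proc open path:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the semantics argued for above: each open of
 * /proc/drbd takes a module reference, so rmmod fails up front instead of
 * starting the unload and then blocking in remove_proc_entry(). */
struct module_ref {
    int  refcount;   /* current openers of /proc/drbd */
    bool unloading;  /* set once module unload has begun */
};

/* like try_module_get(THIS_MODULE) in the proc open handler */
static bool proc_open(struct module_ref *m)
{
    if (m->unloading)
        return false;        /* module going away: refuse new opens */
    m->refcount++;
    return true;
}

/* like module_put(THIS_MODULE) in the proc release handler */
static void proc_release(struct module_ref *m)
{
    m->refcount--;
}

/* rmmod: refuse immediately while someone still holds the file open */
static bool try_unload(struct module_ref *m)
{
    if (m->refcount > 0)
        return false;        /* fail up front, nothing half-unloaded */
    m->unloading = true;
    return true;
}
```

With the reference held, the unload is refused outright; without it, rmmod "succeeds" and then wedges, which is the situation described above.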

Lars



Re: [Drbd-dev] [PATCH 23/27] drbd: make intelligent use of blkdev_issue_zeroout

2018-01-16 Thread Lars Ellenberg
On Mon, Jan 15, 2018 at 10:07:38AM -0500, Mike Snitzer wrote:
> > See also:
> > https://www.redhat.com/archives/dm-devel/2017-March/msg00213.html
> > https://www.redhat.com/archives/dm-devel/2017-March/msg00226.html
> 
> Right, now that you mention it, it is starting to ring a bell (especially
> after I read your 2nd dm-devel archive URL above).

> > In tree, either dm-thin learns to do REQ_OP_WRITE_ZEROES "properly",
> > so the result in this scenario is what we expect:
> > 
> >   _: unprovisioned, not allocated, returns zero on read anyways
> >   *: provisioned, some arbitrary data
> >   0: explicitly zeroed:
> > 
> >   granularity:  |----|----|----|----|----|----|
> >   before:       |____|****|____|****|****|____|
> >   to-be-zeroed:         ^-------------^
> >   after:        |____|**00|____|____|00**|____|
> > 
> > (leave unallocated blocks alone,
> >  de-allocate full blocks just like with discard,
> >  explicitly zero unaligned head and tail)
> 
> "de-allocate full blocks just like with discard" is an interesting take
> what it means for dm-thin to handle REQ_OP_WRITE_ZEROES "properly".
> 
> > Or DRBD will have to resurrect that reinvented zeroout again,
> > with exactly those semantics. I did reinvent it for a reason ;)
> 
> Yeah, I now recall dropping that line of development because it
> became "hard" (or at least harder than originally thought).
> 
> Don't people use REQ_OP_WRITE_ZEROES to initialize a portion of the
> disk?  E.g. zeroing superblocks, metadata areas, or whatever?
> 
> If we just discarded the logical extent and then a user did a partial
> write to the block, areas that a user might expect to be zeroed wouldn't
> be (at least in the case of dm-thinp if "skip_block_zeroing" is
> enabled).


Oh-kay.
So "immediately after" such an operation
("zero-out head and tail and de-alloc full blocks")
a read to that area would return all zeros, as expected.

But once you do a partial write to one of those
de-allocated blocks (with skip_block_zeroing enabled,
which it likely is, for "performance"),
arbitrary old garbage data "magically" springs into existence
on LBAs that just before read back as zeros.

Would that not break a lot of other things
(any read-modify-write of "upper layers")?
Would that not even be a serious "information leak"
(old garbage of other completely unrelated LVs leaking into this one)?

But thank you for that, I'm starting to see the problem ;-)

> No, dm-thinp doesn't have an easy way to mark an allocated block as
> containing zeroes (without actually zeroing).  I toyed with adding that
> but then realized that even if we had it it'd still require block
> zeroing be enabled.  But block zeroing is done at allocation time.  So
> we'd need to interpret the "this block is zeroes" flag to mean "on first
> write or read to this block it needs to first zero it".  Fugly to say
> the least...


Maybe have a "known zeroed block" pool and allocate only from there,
"lazily zero" unallocated blocks in the background and add them to that
known-zero pool, and fall back to zero-on-alloc if the pool is depleted?

Easier said than done, I know.
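A toy model of that suggestion (purely illustrative; the names and structure here are mine, not dm-thin's): a background worker feeds a known-zero pool, and allocation falls back to synchronous zeroing only when that pool is empty.

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS 8
#define BLKSZ   16

static unsigned char blocks[NBLOCKS][BLKSZ];
static int zero_pool[NBLOCKS], zero_top;   /* blocks known to contain zeros */
static int free_list[NBLOCKS], free_top;   /* freed blocks with stale data  */
static int sync_zeroings;                  /* how often the fallback fired  */

static void free_block(int b) { free_list[free_top++] = b; }

/* background "lazy zero" worker: move one freed block into the zero pool */
static int lazy_zero_one(void)
{
    if (free_top == 0)
        return 0;
    int b = free_list[--free_top];
    memset(blocks[b], 0, BLKSZ);
    zero_pool[zero_top++] = b;
    return 1;
}

static int alloc_block(void)
{
    if (zero_top > 0)
        return zero_pool[--zero_top];  /* fast path: already zeroed */
    if (free_top > 0) {                /* fallback: zero-on-alloc */
        int b = free_list[--free_top];
        memset(blocks[b], 0, BLKSZ);
        sync_zeroings++;
        return b;
    }
    return -1;                         /* nothing left to hand out */
}
```

As long as the background worker keeps up, allocation never pays the zeroing cost on the write path; only when the pool runs dry does it degrade to zero-on-alloc.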

> But sadly, in general, this is a low priority for me, so you might do
> well to reintroduce your drbd workaround.. sorry about that :(

No problem.
I'll put that back in, and document that we strongly recommend
NOT to use skip_block_zeroing in those setups.

Thanks,

Lars



Re: [Drbd-dev] [PATCH 23/27] drbd: make intelligent use of blkdev_issue_zeroout

2018-01-15 Thread Lars Ellenberg
On Sat, Jan 13, 2018 at 12:46:40AM +, Eric Wheeler wrote:
> Hello All,
> 
> We just noticed that discards to DRBD devices backed by dm-thin devices 
> are fully allocating the thin blocks.
> 
> This behavior does not exist before 
> ee472d83 block: add a flags argument to (__)blkdev_issue_zeroout
> 
> The problem exists somewhere between
> [working] c20cfc27 block: stop using blkdev_issue_write_same for zeroing
>   and
> [broken]  45c21793 drbd: implement REQ_OP_WRITE_ZEROES
> 
> Note that c20cfc27 works as expected, but 45c21793 discards blocks 
> being zeroed on the dm-thin backing device. All commits between those two 
> produce the following error:
> 
> blkdiscard: /dev/drbd/by-res/test: BLKDISCARD ioctl failed: Input/output error
> 
> Also note that issuing a blkdiscard to the backing device directly 
> discards as you would expect. This is just a problem when sending discards 
> through DRBD.
> 
> Is there an easy way to solve this in the short term, even if the ultimate 
> fix is more involved?

> On Wed, 5 Apr 2017, Christoph Hellwig wrote:
> 

commit 0dbed96a3cc9786bc4814dab98a7218753bde934
Author: Christoph Hellwig <h...@lst.de>
Date:   Wed Apr 5 19:21:21 2017 +0200

drbd: make intelligent use of blkdev_issue_zeroout

> > drbd always wants its discard wire operations to zero the blocks, so
> > use blkdev_issue_zeroout with the BLKDEV_ZERO_UNMAP flag instead of
> > reinventing it poorly.

> > -/*
> > - * We *may* ignore the discard-zeroes-data setting, if so configured.
> > - *
> > - * Assumption is that it "discard_zeroes_data=0" is only because the backend
> > - * may ignore partial unaligned discards.
> > - *
> > - * LVM/DM thin as of at least
> > - *   LVM version: 2.02.115(2)-RHEL7 (2015-01-28)
> > - *   Library version: 1.02.93-RHEL7 (2015-01-28)
> > - *   Driver version:  4.29.0
> > - * still behaves this way.
> > - *
> > - * For unaligned (wrt. alignment and granularity) or too small discards,
> > - * we zero-out the initial (and/or) trailing unaligned partial chunks,
> > - * but discard all the aligned full chunks.
> > - *
> > - * At least for LVM/DM thin, the result is effectively "discard_zeroes_data=1".
> > - */
> > -int drbd_issue_discard_or_zero_out(struct drbd_device *device, sector_t start, unsigned int nr_sectors, bool discard)


As I understood it,
blkdev_issue_zeroout() was supposed to "always try to unmap"
(deprovision) the relevant region, and zero out any unaligned
head or tail, just like my workaround above was doing.

And device mapper thin was "about to" learn this, "soon",
or maybe block core would do the equivalent of my workaround
described above.

But then it did not.

See also:
https://www.redhat.com/archives/dm-devel/2017-March/msg00213.html
https://www.redhat.com/archives/dm-devel/2017-March/msg00226.html

I then did not follow this closely enough anymore,
and missed that with a recent enough kernel,
discard on DRBD on dm-thin would fully allocate.

In our out-of-tree module, we had to keep the older code for
compat reasons anyway. I will just re-enable our zeroout
workaround there again.

In tree, either dm-thin learns to do REQ_OP_WRITE_ZEROES "properly",
so the result in this scenario is what we expect:

  _: unprovisioned, not allocated, returns zero on read anyways
  *: provisioned, some arbitrary data
  0: explicitly zeroed:

  granularity:  |----|----|----|----|----|----|
  before:       |____|****|____|****|****|____|
  to-be-zeroed:         ^-------------^
  after:        |____|**00|____|____|00**|____|

(leave unallocated blocks alone,
 de-allocate full blocks just like with discard,
 explicitly zero unaligned head and tail)

Or DRBD will have to resurrect that reinvented zeroout again,
with exactly those semantics. I did reinvent it for a reason ;)
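For what it's worth, the head/middle/tail arithmetic behind those semantics can be sketched like this (a hypothetical helper for illustration, not the actual drbd_issue_discard_or_zero_out() code); everything is in sectors:

```c
#include <assert.h>

/* Given a request [start, start+len) and a discard granularity `gran`,
 * split it into an unaligned head and tail that must be explicitly zeroed,
 * and an aligned middle that can be de-allocated (discarded), so that the
 * whole range reads back as zeros afterwards. */
struct zeroout_split {
    unsigned long head;   /* sectors to zero before the aligned region */
    unsigned long middle; /* aligned sectors that can be discarded */
    unsigned long tail;   /* sectors to zero after the aligned region */
};

static struct zeroout_split split_zeroout(unsigned long start,
                                          unsigned long len,
                                          unsigned long gran)
{
    struct zeroout_split s = { 0, 0, 0 };
    unsigned long end = start + len;
    unsigned long aligned_start = (start + gran - 1) / gran * gran;
    unsigned long aligned_end = end / gran * gran;

    if (aligned_start >= aligned_end) {
        s.head = len;          /* no full chunk covered: zero everything */
        return s;
    }
    s.head = aligned_start - start;
    s.middle = aligned_end - aligned_start;
    s.tail = end - aligned_end;
    return s;
}
```

The middle region is always a whole number of chunks and chunk-aligned, which is exactly what lets a thin target drop it from the mapping tree like a discard.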

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT


Re: [Drbd-dev] [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES

2017-03-30 Thread Lars Ellenberg
On Thu, Mar 30, 2017 at 01:44:09PM +0200, Christoph Hellwig wrote:
> On Thu, Mar 30, 2017 at 12:06:41PM +0200, Lars Ellenberg wrote:
> > On Thu, Mar 23, 2017 at 10:33:40AM -0400, Christoph Hellwig wrote:
> > > It seems like DRBD assumes its on the wire TRIM request always zeroes 
> > > data.
> > > Use that fact to implement REQ_OP_WRITE_ZEROES.
> > > 
> > > XXX: will need a careful audit from the drbd team!
> > 
> > Thanks, this one looks ok to me.
> 
> So the DRBD protocol requires the TRIM operation to always zero?

"users" (both as in submitting entities, and people using DRBD)
expect that DRBD guarantees replicas to be identical after whatever
operations have been completed by all replicas.

Which means that for trim/discard/unmap, we can only expose that to
upper layers (or use it for internal purposes) if the operation has
a well defined, and on all backends identical, result.

Short answer: Yes.

> > The real question for me is, will the previous one (21/23)
> > return != 0 (some EOPNOTSUPP or else) to DRBD in more situations than
> > what we have now?
> 
> No, blkdev_issue_zeroout should never return -EOPNOTSUPP.
> 
> > Will it make an fstrim cause thinly provisioned
> > devices to suddenly be fully allocated?
> 
> Not for SCSI devices.  Yes for dm-thinp until it implements
> REQ_OP_WRITE_ZEROES, which will hopefully be soon.

"good enough for me" ;-)

Thanks,

Lars



Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES

2017-03-30 Thread Lars Ellenberg
On Thu, Mar 23, 2017 at 10:33:40AM -0400, Christoph Hellwig wrote:
> It seems like DRBD assumes its on the wire TRIM request always zeroes data.
> Use that fact to implement REQ_OP_WRITE_ZEROES.
> 
> XXX: will need a careful audit from the drbd team!

Thanks, this one looks ok to me.

The real question for me is, will the previous one (21/23)
return != 0 (some EOPNOTSUPP or else) to DRBD in more situations than
what we have now?  Will it make an fstrim cause thinly provisioned
devices to suddenly be fully allocated?
Or does it unmap "the same" as what we have now?
Especially on top of dm-thin, but also on top of any other device.
That's something that is not really "obvious" to me yet.

Cheers,
Lars




Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload

2017-03-23 Thread Lars Ellenberg
On Thu, Mar 23, 2017 at 01:02:22PM -0400, Mike Snitzer wrote:
> On Thu, Mar 23 2017 at 11:54am -0400,
> Lars Ellenberg <lars.ellenb...@linbit.com> wrote:
> 
> > On Thu, Mar 23, 2017 at 10:33:18AM -0400, Christoph Hellwig wrote:
> > > This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
> > > supported by the block layer, and switches existing implementations
> > > of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
> > > removes incorrect discard_zeroes_data, and also switches WRITE SAME
> > > based zeroing in SCSI to this new method.
> > > 
> > > I've done testing with ATA, SCSI and NVMe setups, but there are
> > > a few things that will need more attention:
> > > 
> > 
> > >  - The DRBD code in this area was very odd,
> > 
> > DRBD wants all replicas to give back identical data.
> > If what comes back after a discard is "undefined",
> > we cannot really use that.
> > 
> > We used to "stack" discard only if our local backend claimed
> > "discard_zeroes_data". We replicate that IO request to the peer
> > as discard, and if the peer cannot do discards itself, or has
> > discard_zeroes_data == 0, the peer will use zeroout instead.
> > 
> > One use-case for this is the device mapper "thin provisioning".
> > At the time I wrote those "odd" hacks, dm thin targets
> > would set discard_zeroes_data=0, NOT change discard granularity,
> > but only actually discard (drop from the tree) whole "chunks",
> > leaving partial start/end chunks in the mapping tree unchanged.
> > 
> > The logic of "only stack discard, if backend discard_zeroes_data"

That is DRBD's logic I just explained above.
And the "backend" (to DRBD) in that sentence is thin, not the
"real" hardware below thin, which may not even support discard.

> > would mean that we would not be able to accept and pass down discards
> > to dm-thin targets. But with data on dm-thin, you would really like
> > to do the occasional fstrim.
> 
> Are you sure you aren't thinking of MD raid?

Yes, I am sure.

> To this day, dm-thin.c has: ti->discard_zeroes_data_unsupported = true

That is exactly what I was saying.

Thin does not claim to zero data on discard.  Which is ok, and correct,
because it only punches holes for full chunks (or whatever you call
them), and leaves the rest in the mapping tree as is.

And that behaviour would prevent DRBD from exposing discards if
configured on top of thin. (see above)

But thin *could* easily guarantee zeroing, by simply punching holes
where it can, and zeroing out the not fully-aligned partial start and
end of the range.

Which is what I added as an option between DRBD and whatever is below,
with the use-case of dm-thin in mind.

And that made it possible for DRBD to
 a) expose "discard" to upper layers, even though we would usually only
    do that if the DRBD Primary sits on top of a device that guarantees
    that discard zeroes data,
 b) still use discards on a Secondary, without falling back to zero-out,
    which would unexpectedly fully allocate, instead of trim, a thinly
    provisioned device-mapper target.


Thanks,

Lars



Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload

2017-03-23 Thread Lars Ellenberg
On Thu, Mar 23, 2017 at 10:33:18AM -0400, Christoph Hellwig wrote:
> This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
> supported by the block layer, and switches existing implementations
> of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
> removes incorrect discard_zeroes_data, and also switches WRITE SAME
> based zeroing in SCSI to this new method.
> 
> I've done testing with ATA, SCSI and NVMe setups, but there are
> a few things that will need more attention:
> 

>  - The DRBD code in this area was very odd,

DRBD wants all replicas to give back identical data.
If what comes back after a discard is "undefined",
we cannot really use that.

We used to "stack" discard only if our local backend claimed
"discard_zeroes_data". We replicate that IO request to the peer
as discard, and if the peer cannot do discards itself, or has
discard_zeroes_data == 0, the peer will use zeroout instead.

One use-case for this is the device mapper "thin provisioning".
At the time I wrote those "odd" hacks, dm thin targets
would set discard_zeroes_data=0, NOT change discard granularity,
but only actually discard (drop from the tree) whole "chunks",
leaving partial start/end chunks in the mapping tree unchanged.

The logic of "only stack discard, if backend discard_zeroes_data"
would mean that we would not be able to accept and pass down discards
to dm-thin targets. But with data on dm-thin, you would really like
to do the occasional fstrim.

Also, IO backends on the peers do not have to have the same
characteristics.  You could have the DRBD Primary on some SSD,
and the Secondary on some thin-pool LV,
scheduling thin snapshots in intervals or on demand.

With the logic of "use zero-out instead", fstrim would cause it to
fully allocate what was supposed to be thinly provisioned :-(

So what I did there was optionally tell DRBD that
"discard_zeroes_data == 0" on that peer would actually mean
"discard_zeroes_data == 1,
 IF you zero-out the partial chunks of this granularity yourself".

And implemented this "discard aligned chunks of that granularity,
and zero-out partial start/end chunks, if any".

And then claim to upper layers that, yes, discard_zeroes_data=1,
in that case, if so configured, even if our backend (dm-thin)
would say discard_zeroes_data=0.

Does that make sense?  Can we still do that?  Has something like that
been done in block core or device mapper meanwhile?


> and will need an audit from the maintainers.

Will need to make some time for review and testing.

Thanks,

Lars


Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)

2014-06-25 Thread Lars Ellenberg
On Tue, Jun 24, 2014 at 07:11:47PM -0400, Martin K. Petersen wrote:
> >>>>> "Lars" == Lars Ellenberg <lars.ellenb...@linbit.com> writes:
>
> Lars> We are receiving (from network) and submitting (to lower level IO
> Lars> stack) in the same context and would like the submit to be async.
>
> Lars> Do you intend to provide an asynchronous interface?
>
> I guess we can look into that if there is a need.
>
> Do different clients share that context? I.e. does a synchronous discard
> block other clients from accessing the drbd server?

Uhm, it's not exactly like that, really.

Because of the way we do some internal bookkeeping,
we announce a max discard size of 4 MiB.
So if some user on the active (Primary) DRBD
does large discards, you will end up submitting
lots of bios, and these are async.

Bios are the entry point to DRBD.
So DRBD ships these discard-bios over to the peer,
which then right now submits them as bios, again async.
So we do some pipelining, may have a number of discard bios in flight,
and effectively the latency will be increased by something in the order
of the network rtt.

If we now have to use the synchronous interface on the peer
for each discard bio, there is no longer any pipelining,
and the overall latency of a single user level discard
(that ends up doing many discard bios) will noticeably increase.

Also, since the receiver is blocked in submit,
we cannot meanwhile interleave other, normal BIOs,
so a larger discard will block all write (and depending on configuration
and current state, also read) within that DRBD resource (which again may
be one or more DRBD minor devices or volumes).

I don't have real-life numbers on how much that may hurt.
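But the back-of-the-envelope model behind the concern is simple (made-up numbers, not measurements):

```c
#include <assert.h>

/* Rough latency model for the pipelining argument above: with many discard
 * bios in flight, the whole batch pays roughly one network RTT plus the
 * per-bio service times; forced through a synchronous interface, every bio
 * waits for its own RTT. */
static double pipelined_us(int n, double rtt_us, double svc_us)
{
    return rtt_us + n * svc_us;    /* one RTT amortized over the batch */
}

static double synchronous_us(int n, double rtt_us, double svc_us)
{
    return n * (rtt_us + svc_us);  /* each bio serialized behind its RTT */
}
```

With, say, a 1 GiB discard split into 256 bios of 4 MiB each, a 200 us RTT, and a 20 us per-bio service time, the synchronous variant comes out roughly an order of magnitude slower in this model.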

Similar for the WRITE_SAME interface (which we do not properly support
on the DRBD protocol level yet -- backward compatibility concerns -- but
intend to support soon).

If we only have a synchronous interface,
we will probably have to either add some async wrapper,
or defer such submissions to worker threads.
I'd prefer to have an async submit path.

Lars

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)

2014-06-24 Thread Lars Ellenberg
On Mon, Jun 23, 2014 at 03:37:03PM -0400, Martin K. Petersen wrote:
  Lars == Lars Ellenberg lars.ellenb...@linbit.com writes:
 
 Lars,
 
 Thanks for fixing this. 
 
 I'd still like to see you use the lib call instead like you do for
 zeroout. I have some patches in the pipeline for multi-range discard
 support and things are going to break for drbd if you manually roll
 bios.

Okay, thanks for the heads up.
I think we just did this so we would not
have to use a synchronous interface there.

We are receiving (from network) and submitting (to lower level IO stack)
in the same context and would like the submit to be async.

Do you intend to provide an asynchronous interface?

Lars



Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)

2014-06-23 Thread Lars Ellenberg
On Sat, Jun 21, 2014 at 07:48:22PM +0200, Stefan Priebe wrote:
> Hi Lars,
> Am 20.06.2014 20:29, schrieb Lars Ellenberg:
>> On Fri, Jun 20, 2014 at 12:49:39PM -0400, Martin K. Petersen wrote:
>>> >>>>> "Lars" == Lars Ellenberg <lars.ellenb...@linbit.com> writes:
>>>
>>> Lars,
>>>
>>> Lars> Any bio allocated that will be passed down with REQ_DISCARD has to
>>> Lars> be allocated with nr_iovecs = 1 (at least), even though it must
>>> Lars> not contain any bio_vec payload.
>>>
>>> True. Although the correct answer is: Any discard request must be issued
>>> by blkdev_issue_discard(). That's the interface.
>>>
>>> The hacks we do to carry the information inside the bio constitute an
>>> internal interface that is subject to change (it is just about to,
>>> actually).
>>>
>>> Lars> Though DRBD in 3.10 is not supposed to accept discard requests.
>>> Lars> So I'm not sure how it manages to pass them down?
>
> you're absolutely right - a colleague installed drbd 8.4.4 as a
> module. I didn't know that. Sorry.

That is (again) incorrect/incomplete.

Your original post:
> while using vanilla 3.10.44 with drbd on top of a md raid1.
...
> CPU: 0 PID: 636 Comm: md124_raid1 Tainted: G O 3.10.41+76-ph #1
> Modules linked in: ... drbd ...

So it's not vanilla, it's not 3.10.44, and it's not 3.10.41 either,
and it's not even a clean external module.

But it's something based on 3.10.41,
where you added your own patches or backports,
and now you complain to the upstream maintainers that it explodes,
without telling them that it is modified code.

> So your attached patch will fix it?

No.
For the out-of-tree module it is fixed.
You just need to upgrade.

This is for the 3.16-rc1 and later in-tree DRBD,
where this fix apparently slipped through when preparing the pull request.

It has not even been in a released mainline kernel yet.

But thanks anyways for reporting it,
it may have ended up unnoticed in 3.16.

Lars



Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)

2014-06-20 Thread Lars Ellenberg
On Thu, Jun 19, 2014 at 11:08:22PM -0400, Martin K. Petersen wrote:
> >>>>> "Stefan" == Stefan Priebe - Profihost AG <s.pri...@profihost.ag> writes:
>
> Stefan> Hi, while using vanilla 3.10.44 with drbd on top of a md raid1.
>
> Stefan> I'm pretty often hitting the following kernel bug.
>
> Stefan> [8128105c] blk_add_request_payload+0xc/0x90
>
> That's really messed up. This means we received a request with no bio.

No.
That means you received a bio that has been allocated with
bio_alloc(..., nr_iovecs = 0);

thus bio->bi_io_vec is NULL,
but blk_add_request_payload() insists on using it anyway.

Even though it also requires that bio->bi_vcnt == 0
(because it then explicitly sets it to 1).

This is some subtlety with discard requests that has bitten some
stacking drivers now.

Any bio allocated that will be passed down with REQ_DISCARD
has to be allocated with nr_iovecs = 1 (at least),
even though it must not contain any bio_vec payload.

Though DRBD in 3.10 is not supposed to accept discard requests.
So I'm not sure how it manages to pass them down?

Lars



Re: sd_setup_discard_cmnd: BUG: unable to handle kernel NULL pointer dereference at (null)

2014-06-20 Thread Lars Ellenberg
On Fri, Jun 20, 2014 at 12:49:39PM -0400, Martin K. Petersen wrote:
> >>>>> "Lars" == Lars Ellenberg <lars.ellenb...@linbit.com> writes:
>
> Lars,
>
> Lars> Any bio allocated that will be passed down with REQ_DISCARD has to
> Lars> be allocated with nr_iovecs = 1 (at least), even though it must
> Lars> not contain any bio_vec payload.
>
> True. Although the correct answer is: Any discard request must be issued
> by blkdev_issue_discard(). That's the interface.
>
> The hacks we do to carry the information inside the bio constitute an
> internal interface that is subject to change (it is just about to,
> actually).
>
> Lars> Though DRBD in 3.10 is not supposed to accept discard requests.
> Lars> So I'm not sure how it manages to pass them down?
>
> drbd_receiver.c:
>
> static unsigned long wire_flags_to_bio(struct drbd_conf *mdev, u32 dpf)
> {
>         return  (dpf & DP_RW_SYNC ? REQ_SYNC : 0) |
>                 (dpf & DP_FUA ? REQ_FUA : 0) |
>                 (dpf & DP_FLUSH ? REQ_FLUSH : 0) |
>                 (dpf & DP_DISCARD ? REQ_DISCARD : 0);
> }
>
> [...]
>
> /* mirrored write */
> static int receive_Data(struct drbd_tconn *tconn, struct packet_info *pi)
> {
> [...]
>         dp_flags = be32_to_cpu(p->dp_flags);
>         rw |= wire_flags_to_bio(mdev, dp_flags);
> [...]
>
> That's pretty busticated. I suggest you simply remove REQ_DISCARD from
> that helper for now.
>
> It's also a good idea to disable discard and write same on the client
> side when you set up the request queue:
>
>         blk_queue_max_discard_sectors(q, 0);
>         blk_queue_max_write_same_sectors(q, 0);

Our main development still happens out of tree,
trying to stay compatible with a large range of kernel versions.

Linux upstream DRBD is supposed to handle discards correctly
(even though it does not use the proper interface, blkdev_issue_discard).

But it does not, because one fix apparently slipped through
when preparing the pull request.

So linux upstream needs:
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index b6c8aaf..5b17ec8 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1337,8 +1337,11 @@ int drbd_submit_peer_request(struct drbd_device *device,
return 0;
}
 
+   /* Discards don't have any payload.
+* But the scsi layer still expects a bio_vec it can use internally,
+* see sd_setup_discard_cmnd() and blk_add_request_payload(). */
 	if (peer_req->flags & EE_IS_TRIM)
-		nr_pages = 0; /* discards don't have any payload. */
+		nr_pages = 1;
 
/* In most cases, we will only need one bio.  But in case the lower
 * level restrictions happen to be different at this offset on this

I'll prepare a proper patch with commit message later.

linux upstream DRBD also does blk_queue_max_write_same_sectors(q, 0)
and blk_queue_max_discard_sectors(q, DRBD_MAX_DISCARD_SECTORS)

---
For linux 3.10, things are different.

DRBD in linux 3.10 does not set QUEUE_FLAG_DISCARD,
and does not announce discard capabilities in any other way,
even though it already contains some preparation steps
(those pieces your grep-fu managed to find above...)

DRBD does a handshake, and if there is no discard capability announced,
the peer is supposed to never send discards (and stop announcing them
on his side), even if the peer's DRBD version already supports
and announces discard capabilities.

So I'm still not really seeing how discard requests would be issued
by that version of DRBD.
The local submit path should not allow them (no QUEUE_FLAG_DISCARD set)
and the remote submit path should not allow them either,
for the same reason, and because the DRBD handshake does not allow them.

So my current guess would be that Stefan prepared a 3.10.44
+ upstream DRBD, but unfortunately not upstream enough?

Stefan, please give more details how to trigger this,
with which exact DRBD versions on the peers, and what action.

Lars