> On 8 Feb 2018, at 16:28, Eric Blake <ebl...@redhat.com> wrote: > > On 02/08/2018 07:23 AM, Edgar Kaziakhmedov wrote: >> Upstream NBD protocol implementation supports an efficient zero out >> mechanism over the wire, along with the ability to check whether a >> client allows using a hole. >> Accordingly, since PWRITE_ZERO doesn't involve any payload on the wire, >> increase a maximum size of the PWRITE_ZERO request up to 1Gb (aligned). >> Moreover, such change will decrease the number of PWRITE_ZERO NBD commands >> in comparison with the current 32M limit. The benefits of >> the larger constraint can be examined in a block mirroring over NBD. > > We've got a potential problem. Unless you have out-of-band communication of > the maximum NBD_CMD_WRITE_ZEROES sizing (or if the NBD protocol is enhanced > to advertise that as an additional piece of block size information during > NBD_OPT_GO), then a client CANNOT assume that the server will accept a > request this large. We MIGHT get lucky if all existing servers that accept > WRITE_ZEROES requests either act on large requests or reply with EINVAL but > do not outright drop the connection (which is different from servers that DO > outright drop the connection for an NBD_CMD_WRITE larger than 32M). But I > don't know if that's how all servers behave, so sending a too-large > WRITE_ZEROES request may have the unintended consequence of killing the > connection. > > I'm adding the NBD list; perhaps before accepting this into qemu, I should > revive my earlier attempt at codifying an NBD_OPT_GO info advertisement for > maximum trim/zero sizing, which would let clients have a guarantee that their > choice of sizing won't cause unexpected failures.
A couple of comments: 1. The length field is only 32 bits, so no writes more than 0xffffffff in length are going to work anyway :-) 2. I'm not sure the situation is as bad as you make out Eric. I think you've forgotten the NBD_OPT_INFO work and the conversation around that we had where we determined that servers not supporting NBD_OPT_INFO were already meant to support 'unlimited' size writes. Per the spec: "If block size constraints have not been advertised or agreed on externally, then a client SHOULD assume a default minimum block size of 1, a preferred block size of 2^12 (4,096), and a maximum block size of the smaller of the export size or 0xffffffff (effectively unlimited)." I read these to apply to all uses of 'length', but even if one argues it doesn't apply to NBD_CMD_WRITE_ZEROES because it doesn't have a payload, I think the rebuttal is that a server which supports NBD_CMD_WRITE of a given length must also support NBD_CMD_WRITE_ZEROES of that length. So I think a reasonable logic for Qemu would be to try NBD_CMD_INFO and find the maximum write size, and if that's unsupported use 0xffffffff (capping at export size, or export size minus write offset). -- Alex Bligh