> On 8 Feb 2018, at 16:28, Eric Blake <ebl...@redhat.com> wrote:
> 
> On 02/08/2018 07:23 AM, Edgar Kaziakhmedov wrote:
>> Upstream NBD protocol implementation supports an efficient zero out
>> mechanism over the wire, along with the ability to check whether a
>> client allows using a hole.
>> Accordingly, since PWRITE_ZERO doesn't involve any payload on the wire,
>> increase a maximum size of the PWRITE_ZERO request up to 1Gb (aligned).
>> Moreover, such change will decrease the number of PWRITE_ZERO NBD commands
>> in comparison with the current 32M limit. The benefits of
>> the larger constraint can be examined in a block mirroring over NBD.
> 
> We've got a potential problem.  Unless you have out-of-band communication of 
> the maximum NBD_CMD_WRITE_ZEROES sizing (or if the NBD protocol is enhanced 
> to advertise that as an additional piece of block size information during 
> NBD_OPT_GO), then a client CANNOT assume that the server will accept a 
> request this large.  We MIGHT get lucky if all existing servers that accept 
> WRITE_ZEROES requests either act on large requests or reply with EINVAL but 
> do not outright drop the connection (which is different from servers that DO 
> outright drop the connection for an NBD_CMD_WRITE larger than 32M).  But I 
> don't know if that's how all servers behave, so sending a too-large 
> WRITE_ZEROES request may have the unintended consequence of killing the 
> connection.
> 
> I'm adding the NBD list; perhaps before accepting this into qemu, I should 
> revive my earlier attempt at codifying an NBD_OPT_GO info advertisement for 
> maximum trim/zero sizing, which would let clients have a guarantee that their 
> choice of sizing won't cause unexpected failures.

A couple of comments:

1. The length field is only 32 bits, so no writes more than 0xffffffff in 
length are going to work anyway :-)

2. I'm not sure the situation is as bad as you make out Eric. I think you've 
forgotten the NBD_OPT_INFO work and the conversation around that we had where 
we determined that servers not supporting NBD_OPT_INFO were already meant to 
support 'unlimited' size writes. Per the spec:

"If block size constraints have not been advertised or agreed on externally, 
then a client SHOULD assume a default minimum block size of 1, a preferred 
block size of 2^12 (4,096), and a maximum block size of the smaller of the 
export size or 0xffffffff (effectively unlimited)."

I read these to apply to all uses of 'length', but even if one argues it 
doesn't apply to NBD_CMD_WRITE_ZEROES because it doesn't have a payload, I 
think the rebuttal is that a server which supports NBD_CMD_WRITE of a given 
length must also support NBD_CMD_WRITE_ZEROES of that length.

So I think a reasonable logic for Qemu would be to try NBD_CMD_INFO and find 
the maximum write size, and if that's unsupported use 0xffffffff (capping at 
export size, or export size minus write offset).

-- 
Alex Bligh





Reply via email to