On 02/08/2018 06:55 PM, Eric Blake wrote:
On 02/08/2018 09:28 AM, Edgar Kaziakhmedov wrote:
We've got a potential problem. Unless you have out-of-band
communication of the maximum NBD_CMD_WRITE_ZEROES sizing (or unless
the NBD protocol is enhanced to advertise that as an additional piece
of block size information during NBD_OPT_GO), a client CANNOT
assume that the server will accept a request this large. We MIGHT
get lucky if all existing servers that accept WRITE_ZEROES requests
either act on large requests or reply with EINVAL but do not
outright drop the connection (which is different from servers that
DO outright drop the connection for an NBD_CMD_WRITE larger than
32M). But I don't know if that's how all servers behave, so sending
a too-large WRITE_ZEROES request may have the unintended consequence
of killing the connection.
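
To make the concern concrete, here is a minimal sketch of what a
cautious client could do when it knows nothing about the server's
limit: split one large zeroing range into pieces no bigger than a
conservative cap. The names split_zero_request and MAX_SAFE_REQUEST
are hypothetical and not taken from qemu or any NBD implementation,
and reusing the 32M figure mentioned above for NBD_CMD_WRITE as the
cap for WRITE_ZEROES is an assumption, not something the protocol
guarantees.

```python
# Hypothetical helper; not part of qemu or the NBD reference code.
# Assumed conservative cap: the 32M figure the thread mentions for
# NBD_CMD_WRITE, reused here for WRITE_ZEROES as an assumption.
MAX_SAFE_REQUEST = 32 * 1024 * 1024


def split_zero_request(offset, length, max_len=MAX_SAFE_REQUEST):
    """Yield (offset, length) pieces, each at most max_len bytes."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    end = offset + length
    while offset < end:
        # The last piece may be shorter than max_len.
        piece = min(max_len, end - offset)
        yield offset, piece
        offset += piece
```

Each yielded pair would then be sent as its own WRITE_ZEROES request,
trading a few extra round trips for not having to guess what the
server tolerates.
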
Actually, I don't see why current NBD servers wouldn't accept such
large requests, since most servers should apply some optimization
that avoids literally filling the range with zeroes.
Just because a server CAN optimize doesn't mean that it is REQUIRED to
optimize. You cannot make assumptions that a server will be happy
with a larger request, merely because less data was sent over the
wire, because the server may still have to allocate memory locally to
perform the request.
As for block mirroring over NBD, it works fine with the QEMU server
implementation, and isn't that the main application?
Yes, qemu-to-qemu interoperating as efficiently as possible is nice;
but I'm worried about qemu-to-other interoperating as well. The point
of a public specification is to avoid one-way silos, so that you CAN
mix-and-match a server from one implementation with a client from
another, rather than being forced to use qemu as the server when qemu
is the client. Note that portability can include hand-shaking to fall
back to the lowest common denominator, rather than requiring both sides
to always understand all extensions; but the important part is that
neither party should make assumptions about the other side without
using the spec as their guide.
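
As a rough illustration of falling back to the lowest common
denominator, the sketch below picks the request size a client would
actually use. It assumes a hypothetical server_advertised value, i.e.
a maximum WRITE_ZEROES size learned during the handshake; the
protocol as discussed here does not advertise such a value today, so
that parameter, the function name effective_zero_limit and the
DEFAULT_MAX fallback are all assumptions made for illustration.

```python
# Hypothetical sketch; the advertised limit is NOT an existing NBD field.
DEFAULT_MAX = 32 * 1024 * 1024  # assumed fallback when nothing is advertised


def effective_zero_limit(client_wanted, server_advertised=None,
                         default=DEFAULT_MAX):
    """Return the largest WRITE_ZEROES size both sides are known to accept."""
    if server_advertised is None:
        # The server said nothing: do not assume more than the default.
        return min(client_wanted, default)
    # Both sides stated a size: use the smaller of the two.
    return min(client_wanted, server_advertised)
```

With no advertised value the client stays at the default; with one,
it never exceeds what the server claimed to accept.
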
So in that case the maximum WRITE_ZEROES chunk size has to be
negotiated before any requests are sent, if such an option is
available in mainline NBD.