from:"Eric Blake"

Re: [Nbd] [Qemu-block] NBD structured reads vs. block size

2018-08-01 Thread Eric Blake


On 08/01/2018 09:13 AM, Richard W.M. Jones wrote:

On Wed, Aug 01, 2018 at 04:49:21PM +0300, Nir Soffer wrote:

All this is an issue only when the underlying file is a file using raw
format, right?


It's an issue for any setup where the server and client disagree on the 
minimum block size, and where the file size is not aligned to one (or 
both) of the two endpoints' notion of minimum block size.




NBD should only really be used to send raw format data over the wire.
While it is possible to serve a qcow2 file over NBD as qcow2, it's not
really how NBD is intended to be used.


Well, if I ever get time to implement my proposed NBD_CMD_RESIZE 
support, then exporting qcow2 over NBD as qcow2 will be a lot more 
feasible than it currently is, but that's a side discussion.





Since NBD is meant for block devices, and block devices cannot have
partial blocks, would it be simpler to round down unaligned file size,  and
treat possible unaligned range at the end of the file as area that can not
be accessed via NBD?


NBD is not just for block devices, but yes, block devices are much less 
likely to have a reported size that is not aligned to the minimum block 
size.  So yes, the issue I'm describing (and in particular, qemu's bug 
of splitting NBD_CMD_READ and NBD_CMD_BLOCK_STATUS on non-sector 
boundaries) tends to occur mainly when using unaligned files, and not 
block devices, as the basis of the export that the NBD server is exposing.




For nbdkit I would like to preserve the ability to serve arbitrary-
sized files.  nbdkit allows people to write plugins, and we cannot be
sure that every plugin everyone writes will always use sensible block
sizes.

In a forthcoming nbdkit release you can add the "truncate" filter on
top of any plugin to round up the size to a multiple of any power of 2
(or round down if you really want).

https://www.redhat.com/archives/libguestfs/2018-August/msg6.html

Rich.



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] NBD structured reads vs. block size

2018-08-01 Thread Eric Blake

Rich Jones pointed me to questionable behavior in qemu's NBD server 
implementation today: qemu advertises a minimum block size of 512 to any 
client that promises to honor block sizes, but when serving up a raw 
file that is not aligned to a sector boundary, attempting to read that 
final portion of the file results in a structured read with two chunks, 
the first for the data up to the end of the actual file, and the second 
reporting a hole for the rest of the sector. If a client is promising to 
obey block sizes on its requests, it seems odd that the server is 
allowed to send a result that is not also aligned to block sizes.


Right now, the NBD spec says that when structured replies are in use, 
then for a structured read:


The server MAY split the reply into any number of content chunks;
each chunk MUST describe at least one byte, although to minimize
overhead, the server SHOULD use chunks with lengths and offsets as
an integer multiple of 512 bytes, where possible (the first and
last chunk of an unaligned read being the most obvious places for
an exception).

I'm wondering if we should tighten that to require that the server 
partition the reply chunks to be aligned to the advertised minimum block 
size (at which point, qemu should either advertise 1 instead of 512 as 
the minimum size when serving up an unaligned file, or else qemu should 
just send the final partial sector as a single data chunk rather than 
trying to report the last few bytes as a hole).


For comparison, on block status, we require:

   The server SHOULD use descriptor
lengths that are an integer multiple of 512 bytes where possible
(the first and last descriptor of an unaligned query being the
most obvious places for an exception), and MUST use descriptor
lengths that are an integer multiple of any advertised minimum
block size.

And qemu as a client currently hangs up on any server that violates that 
requirement on block status (that is, when qemu as the server tries to 
send a block status that was not aligned to the advertised block size, 
qemu as the client flags it as an invalid server - which means qemu as 
server is currently broken).  So I'm thinking we should copy that 
requirement onto servers for reads as well.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH] Further tidy-up on block status

2018-05-03 Thread Eric Blake

Reviving this discussion, as it still seems useful to incorporate now 
that BLOCK_STATUS is only recently part of the standard.


On 12/14/2016 11:09 AM, Wouter Verhelst wrote:


One thing I've been thinking about that we might want to add:

There may be cases where a server, in performing the required calls to
be able to handle a BLOCK_STATUS request, will end up with more
information than the client asked; e.g., if the client asked information
in the base:allocation context on an extent at offset X of length Y,
then the server might conceivably do an lseek(SEEK_DATA) and/or
lseek(SEEK_HOLE). The result of that call might be that the file offset
will now point to a location Z, where Z > (X+Y).

Currently, our spec says "the sum of the *length* fields MUST not be
greater than the overall *length* of request". This means that in
essense, the server then has to throw away the information it has on the
range Z - (X + Y). In case a client was interested in that information,
that seems like a waste. I would therefore like to remove that
requirement, by rephrasing that "sum of the *length* fields" thing into
something along the following lines:

   In case the server returns N extents, the sum of the *length* fields
   of the first N-1 extents MUST NOT be greater than the overall *length*
   of the request. The final extent MAY exceed the length of the request
   if the server has that information anyway as a side effect of looking
   up the required information and wishes to share it.

This would then result in the fact that the "length" field in the
BLOCK_STATUS command would be little more than a hint, since we're
saying that a server can return more data than requested (if it's
available anyway) and less data than requested (if it would be too
resource-intensive to provide all the information). Not sure whether all
that makes much sense anymore, but hey.

In addition, the combination of a server providing more information than
requested with a "REQ_ONE" flag and a length field of zero could be an
interesting way to enumerate a whole export, too. Essentially, we could
define that as a client saying "I'm interested in what the size of the
extent at offset X is, and what its properties are".

Thoughts?



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH 1/1] nbd: increase maximum size of the PWRITE_ZERO request

2018-02-08 Thread Eric Blake


On 02/08/2018 07:23 AM, Edgar Kaziakhmedov wrote:

Upstream NBD protocol implementation supports an efficient zero out
mechanism over the wire, along with the ability to check whether a
client allows using a hole.

Accordingly, since PWRITE_ZERO doesn't involve any payload on the wire,
increase a maximum size of the PWRITE_ZERO request up to 1Gb (aligned).
Moreover, such change will decrease the number of PWRITE_ZERO NBD commands
in comparison with the current 32M limit. The benefits of
the larger constraint can be examined in a block mirroring over NBD.


We've got a potential problem.  Unless you have out-of-band 
communication of the maximum NBD_CMD_WRITE_ZEROES sizing (or if the NBD 
protocol is enhanced to advertise that as an additional piece of block 
size information during NBD_OPT_GO), then a client CANNOT assume that 
the server will accept a request this large.  We MIGHT get lucky if all 
existing servers that accept WRITE_ZEROES requests either act on large 
requests or reply with EINVAL but do not outright drop the connection 
(which is different from servers that DO outright drop the connection 
for an NBD_CMD_WRITE larger than 32M).  But I don't know if that's how 
all servers behave, so sending a too-large WRITE_ZEROES request may have 
the unintended consequence of killing the connection.


I'm adding the NBD list; perhaps before accepting this into qemu, I 
should revive my earlier attempt at codifying an NBD_OPT_GO info 
advertisement for maximum trim/zero sizing, which would let clients have 
a guarantee that their choice of sizing won't cause unexpected failures.




Signed-off-by: Edgar Kaziakhmedov <edgar.kaziakhme...@virtuozzo.com>
---
  block/nbd.c | 2 +-
  include/block/nbd.h | 3 +++
  2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/nbd.c b/block/nbd.c
index 94220f6d14..3641d9244e 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -477,7 +477,7 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error 
**errp)
  uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block);

  bs->bl.max_pdiscard = max;
-bs->bl.max_pwrite_zeroes = max;
+bs->bl.max_pwrite_zeroes = NBD_MAX_PWRITE_ZERO_SIZE;


max_discard should get the same treatment, whatever we decide (yes, it 
is feasible that a server may have different limits for NBD_CMD_TRIM 
than for NBD_CMD_WRITE_ZEROES, but in general, it's probably easier if 
both commands share theh same limit as both have very similar 
implementations).



  bs->bl.max_transfer = max;

  if (s->info.opt_block &&
diff --git a/include/block/nbd.h b/include/block/nbd.h
index ee74ec391a..e2f18e2332 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -182,6 +182,9 @@ enum {
  /* Maximum size of a single READ/WRITE data buffer */
  #define NBD_MAX_BUFFER_SIZE (32 * 1024 * 1024)

+/* Maximum size of a single PWRITE_ZERO request 1Gb */
+#define NBD_MAX_PWRITE_ZERO_SIZE (1024 * 1024 * 1024)
+
  /* Maximum size of an export name. The NBD spec requires 256 and
   * suggests that servers support up to 4096, but we stick to only the
   * required size so that we can stack-allocate the names, and because
--
2.11.0




--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] nbd structured reply

2017-10-05 Thread Eric Blake

On 10/05/2017 06:30 AM, Vladimir Sementsov-Ogievskiy wrote:
> 21.09.2017 15:18, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> I'm about this:
>>
>> "A server SHOULD try to minimize the number of chunks sent in a reply,
>> but MUST NOT mark a chunk as final if there is still a possibility of
>> detecting an error before transmission of that chunk completes"
>>
>> What do we mean by "possibility"? Formally, such possibility exists
>> always, so, we'll never mark a chunk as final.
>>
> 
> One more question:
> 
> for |NBD_REPLY_TYPE_ERROR and ||NBD_REPLY_TYPE_ERROR_OFFSET, why do we
> need message_length field? why not to calc it as chunk.lenght - 4 for
> ||NBD_REPLY_TYPE_ERROR and chunk.lenght - 12 for
> ||NBD_REPLY_TYPE_ERROR_OFFSET?

For consistency.  If _all_ NBD_REPLY_TYPE_ERROR* message have a
message_length field, then it is easier to write a generic handler that
knows how to deal with an unknown error, no matter what command the
error is sent in response to.  Ideally, a server should never send an
error message that the client is not expecting, but having a robust
protocol that lets clients deal with bad servers is worth the redundancy
caused by being consistent, and we are more likely to add additional
error modes to existing commands than we are to add more success modes.

> 
> For example, with NBD_REPLY_TYPE_OFFSET_DATA variable data length is
> calculated, not specified separately.

That's because non-error types don't have the same consistency concerns;
if we want to introduce a new success response, we'll probably introduce
it via a new command, rather than as a reply to an existing command.
Furthermore, while error replies are likely to have a free-form text
error description, success replies tend to not need it.  The layout of
the error types is designed to make it easy to grab the free-form error
message from a known location for display to the user, even if the
client has no idea what the rest of the error means, as that may be a
useful debugging aid.

> 
> What is the reason for server to send NBD_REPLY_TYPE_ERROR with
> message_lenght < chunk.lenght - 4?

In all likelihood, all well-written servers will never send garbage
bytes (possible only when setting chunk.length larger than
message_length + sizeof(documented fields)).  But we wrote the spec to
be conservative, in case we want to add a later defined field that
earlier clients will still gracefully ignore, rather than strict
(allowing inequality, instead of requiring exact lengths, lets a client
skip over what it considers garbage bytes rather than dropping the
connection because a too-new server tried to send useful information in
those bytes).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] Moving this list to lists.debian.org?

2017-07-25 Thread Eric Blake

On 07/25/2017 08:27 AM, Wouter Verhelst wrote:
> Is there anyone on this list who would object to that? If so, are there
> any suggestions for better alternatives?

No opposition to the move here, but DO make sure it is well-publicized
(you may want to make an NBD release near the same time, if only to
include the updated list address in any released documentation).

Hmm, just noticing that 'nbd-client--version' gives bad output, and
'nbd-client --help' doesn't mention anything about where to find online
resources/bug-reporting address (compare that to 'ls --help | tail' for
a program that gives a bit more useful output).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH] Swap options and option name

2017-04-26 Thread Eric Blake

[adding some cc's]

On 04/21/2017 02:17 AM, Wouter Verhelst wrote:

> Side note though, while checking my OPT_INFO implementation, I
> encountered a problem. I ran your qemu-nbd like so:
> 
> ./qemu-nbd -x foo -D "test image" -t -p 18909 test.img
> 
> and then ran my nbd-client against it several times. At one point, I got
> a SIGPIPE, because of:
> 
> qemu-nbd: io/channel.c:306: qio_channel_yield: Assertion 
> `!ioc->read_coroutine' failed.

This may be an issue that we are already aware of on qemu; I also know
that Paolo has patches lined up that rework control flow in qemu-nbd to
be a bit more efficient in lock handling that may clear it up.  But if
it's something we haven't seen yet, I appreciate you reporting the test
case.

> 
> this was during a debugging session, so there have been a few times where
> I hit "k" in gdb, so the client was killed and the socket aborted. I'm
> not sure whether that's related, but it could be -- and I thought you'd
> want to know this.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH] Swap options and option name

2017-04-17 Thread Eric Blake

On 04/16/2017 04:42 AM, Wouter Verhelst wrote:
> Hi Eric,
> 
> Side note:
> 
> On Fri, Apr 14, 2017 at 01:03:00PM -0500, Eric Blake wrote:
>> I'm so glad my patches for NBD_OPT_GO didn't make it into qemu 2.9 (due
>> to be released later this month).
> 
> Can you tell me where these are? I'd like to check my code against yours.

Last posted here to the qemu list (prior to your definition swap):

https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg04528.html

Maintained here, although I will be doing non-fast-forward rebases from
time to time before posting back to the qemu list (but the swap HAS been
made if you pull from there today):

http://repo.or.cz/qemu/ericb.git/shortlog/refs/heads/nbd
git fetch git://repo.or.cz/qemu/ericb.git nbd

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH] Swap options and option name

2017-04-14 Thread Eric Blake

On 04/14/2017 07:46 AM, Wouter Verhelst wrote:
> When trying to reply to a client with certain information, it's useful
> if we don't need to read the whole reply but can handle it by reading
> one option, handling it, reading the next one, etc etc. This requires
> that the server is told which export to deal with before getting the
> option names, however.
> 
> As specified, that isn't the case, so this would require the server to
> read in the whole request before it can start processing it.
> 
> Swap options around to make handling the INFO/GO messages easier to do.

I'm so glad my patches for NBD_OPT_GO didn't make it into qemu 2.9 (due
to be released later this month). It's pretty easy for me to fix them to
accommodate this new order, and they should make it into qemu 2.10.

> 
> Signed-off-by: Wouter Verhelst <w...@uter.be>
> ---
>  doc/proto.md | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index 92df086..8712626 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -919,11 +919,11 @@ of the newstyle negotiation.
>  
>  Data (both commands):
>  
> -- 16 bits, number of information requests
> -- 16 bits x n - list of `NBD_INFO` information requests
>  - 32 bits, length of name (unsigned); MUST be no larger than the
>option data length - 6
>  - String: name of the export
> +- 16 bits, number of information requests
> +- 16 bits x n - list of `NBD_INFO` information requests
>  

I don't see any drawbacks to this change, and your rationale for
covering the block name first makes sense.  ACK.

>  The client MAY list one or more items of specific information it
>  is seeking in the list of information requests, or it MAY specify
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH] doc: Fix some minor issues

2017-02-16 Thread Eric Blake

Fix a grammar nit and typo, and resolve a conflict between two
extensions picking the same command number (BLOCK_STATUS is the
older proposal, so it keeps command 7, RESIZE gets command 8).

Signed-off-by: Eric Blake <ebl...@redhat.com>
---

Counterpart patch for the extension-resize branch

 doc/proto.md | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index a4deace..90a073a 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -287,7 +287,7 @@ order, except that:

 * All write commands (that includes `NBD_CMD_WRITE`,
   `NBD_CMD_WRITE_ZEROES` and `NBD_CMD_TRIM`) that the server
-  completes (i.e. replies to) prior to processing to a
+  completes (i.e. replies to) prior to processing a
   `NBD_CMD_FLUSH` MUST be written to non-volatile
   storage prior to replying to that `NBD_CMD_FLUSH`. This
   paragraph only applies if `NBD_FLAG_SEND_FLUSH` is set within
@@ -990,7 +990,12 @@ The following request types exist:
 including one or more sectors beyond the size of the device. It SHOULD
 return `EPERM` if it receives a write zeroes request on a read-only export.

-* `NBD_CMD_RESIZE` (7)
+* `NBD_CMD_BLOCK_STATUS` (7)
+
+Defined by the experimental `BLOCK_STATUS`
+
[extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md).
+
+* `NBD_CMD_RESIZE` (8)

 A resize request. The server should make the necessary changes by
 either allocating more space or releasing space that is no longer
-- 
2.9.3


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH] doc: Fix some minor issues

2017-02-16 Thread Eric Blake

Fix a grammar nit, and resolve a conflict between two extensions
picking the same command number (BLOCK_STATUS is the older proposal,
so it keeps command 7, RESIZE gets command 8).

Signed-off-by: Eric Blake <ebl...@redhat.com>
---

To be applied on the master branch. The extension-resize branch will
also need patching.

 doc/proto.md | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 39391a5..79117b5 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -287,7 +287,7 @@ order, except that:

 * All write commands (that includes `NBD_CMD_WRITE`,
   `NBD_CMD_WRITE_ZEROES` and `NBD_CMD_TRIM`) that the server
-  completes (i.e. replies to) prior to processing to a
+  completes (i.e. replies to) prior to processing a
   `NBD_CMD_FLUSH` MUST be written to non-volatile
   storage prior to replying to that `NBD_CMD_FLUSH`. This
   paragraph only applies if `NBD_FLAG_SEND_FLUSH` is set within
@@ -991,7 +991,12 @@ The following request types exist:
 including one or more sectors beyond the size of the device. It SHOULD
 return `EPERM` if it receives a write zeroes request on a read-only export.

-* `NBD_CMD_RESIZE` (7)
+* `NBD_CMD_BLOCK_STATUS` (7)
+
+Defined by the experimental `BLOCK_STATUS`
+
[extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md).
+
+* `NBD_CMD_RESIZE` (8)

 Defined by the experimental `RESIZE`
 
[extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-resize/doc/proto.md).
-- 
2.9.3


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH] build: Fix build with older gcc

2017-02-16 Thread Eric Blake

On 12/19/2016 04:15 PM, Eric Blake wrote:
> gcc 4.4.7 (hello RHEL 6) complains about redefinition of typedefs,
> as in:
> 
> gcc -std=gnu99 -DHAVE_CONFIG_H -I.  -DSYSCONFDIR='"/usr/local/etc"'  -g -O2  
> -g -O2 -MT nbd_client-buffer.o -MD -MP -MF .deps/nbd_client-buffer.Tpo -c -o 
> nbd_client-buffer.o `test -f 'buffer.c' || echo './'`buffer.c
> buffer.c:39: error: redefinition of typedef ‘buffer_t’
> buffer.h:31: note: previous declaration of ‘buffer_t’ was here
> 
> The .c file only has to declare the struct being typedef'd, not
> repeat the typedef declaration.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>
> ---
>  buffer.c| 4 ++--
>  crypto-gnutls.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)

Ping; this problem is still present.

> 
> diff --git a/buffer.c b/buffer.c
> index e08efd8..491ff87 100644
> --- a/buffer.c
> +++ b/buffer.c
> @@ -28,7 +28,7 @@ OTHER DEALINGS IN THE SOFTWARE.
> 
>  #include "buffer.h"
> 
> -typedef struct buffer
> +struct buffer
>  {
>char *buf;
>ssize_t size;
> @@ -36,7 +36,7 @@ typedef struct buffer
>ssize_t ridx;
>ssize_t widx;
>int empty;
> -} buffer_t;
> +};
> 
>  /* the buffer is organised internally as follows:
>   *
> diff --git a/crypto-gnutls.c b/crypto-gnutls.c
> index d885250..9ce394d 100644
> --- a/crypto-gnutls.c
> +++ b/crypto-gnutls.c
> @@ -53,7 +53,7 @@ OTHER DEALINGS IN THE SOFTWARE.
> 
>  #define PRIORITY "NORMAL:-VERS-TLS-ALL:+VERS-TLS1.2"
> 
> -typedef struct tlssession
> +struct tlssession
>  {
>gnutls_certificate_credentials_t creds;
>gnutls_session_t session;
> @@ -62,7 +62,7 @@ typedef struct tlssession
>int (*erroutfn) (void *opaque, const char *format, va_list ap);
>int debug;
>void *opaque;
> -} tlssession_t;
> +};
> 
>  #define BUF_SIZE 65536
>  #define BUF_HWM ((BUF_SIZE*3)/4)
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH 4/4] nbd: add a nbd-control interface

2017-01-25 Thread Eric Blake

On 01/25/2017 03:30 PM, Greg KH wrote:
>> Given (because of NBD_DO_IT) we need an ioctl anyway, and we have
>> an ioctl that isn't going to go away, it would seem better if possible
>> to stick with ioctls, and not introduce either a dependency
>> on netlink (which would presumably bloat static binaries that
>> are used early in the boot process). Personally I'd have thought
>> adding a new NBD ioctl (or extending an existing one) would be
>> less entropy than adding a new char device.
> 
> Why can't you just do this on any existing nbd block device with an
> ioctl to it?  No need to have to do it on an out-of-band char device
> node, right?

How do you get an fd to existing nbd block device?  Your intent is to
use an ioctl to request creating/opening a new nbd device that no one
else is using; opening an existing device in order to send that ioctl
may have negative ramifications to the actual user of that existing
device, if not permissions issues that prevent the open from even
happening.  Having a separate control fd makes it obvious that you are
asking for a new nbd device, and don't want to stomp on any existing
devices.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-01-23 Thread Eric Blake

On 01/22/2017 05:43 AM, Wouter Verhelst wrote:
> 
> Having given this some more thought, I'm not entirely sure anymore that
> an active resize from the NBD layer is necessarily a layering violation.
> There might be other cases where this is useful, and e.g., the Linux
> kernel also supports active resizes of block devices in some cases
> (e.g., LVM). We'll also need to define what happens in case the resize
> operation isn't possible for some reason (e.g., the size increase asks
> for more space than the server's storage has available).
> 
> I suggest the following semantics:
> 
> - Add a transmission flag to indicate resize is possible for
>   transmission phase (NBD_FLAG_SEND_RESIZE, flag 10). I don't see any
>   need for an NBD_OPT_RESIZE or some such; just the flag should suffice.

Except that in previous threads, you've argued that burning per-export
flags is rather expensive if it is instead soemthing that an NBD_OPT
exchange could enable.  On the other hand, I could definitely see it
being a per-export setting (not all exports have the ability to be
resized), so it doesn't hurt my feelings to burn a per-export flag on
this bit of information.

> - NBD_CMD_RESIZE (command 7): resize the export. The "offset" field
>   contains the new size, "length" is reserved (why not the other way
>   around? offset is 64 bits, length is 32)
> - resize command can fail with:
>   EPERM: server configuration does not allow this resize, or
>   (optionally) other clients are using the same export and the request
>   was for a smaller size.
>   EIO: I/O error while trying to resize the device
>   ENOSPC: new export size requires more space than is available
>   ESHUTDOWN: server is shutting down
> 
> I've added an "extension-resize" branch with the above. If that works
> for you, then run with it. If not, just implement what you want and send
> updates as you go along.

You have at least one problem in there:

+If the resize was successful, clients MAY now issue other requests
+up to the new size as requested in the request header. If the new
+size is larger than it was before the request, requests beyond the
+new size but not beyond the old one MUST fail with ENOSPC.

You probably want "if the new size is _smaller_ than it was before", as
it is not possible to have requests beyond the new size but not beyond
the old if the new size is larger than the old.

I'm still thinking that allowing the client to query the current size is
useful.  Over the weekend, I was thinking of SEEK_SET/SEEK_END semantics
(SEEK_CUR doesn't really make sense, since we don't maintain a current
offset), and wondering if we could improve the command as follows:

If invoked without flags, it operates with SEEK_SET semantics (the
offset is the new requested size); if invoked with
NBD_CMD_FLAG_RELATIVE, it operates with SEEK_END semantics (the offset
is added to the existing size, and can be treated as a signed 64-bit
negative number to shrink).  Either way, on success, the command replies
with a structured reply containing the new 64-bit total size of the
disk.  This would require the reply to be a structured reply, and the
semantics of NBD_CMD_FLAG_RELATIVE with an offset of 0 becomes a way to
probe the existing disk size (thus enabling the server to resize without
client request, and perhaps used with some out-of-band means for the
client to know that it should query the server for an updated size).
Also note that the server's reply of the current size may be slightly
different than what was requested by the client (that is, we should
allow the server to round the client's request up to an appropriate
granularity) - we should probably require that the server can only round
up (not down).

Thoughts?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-01-12 Thread Eric Blake

On 01/12/2017 11:57 AM, Bob Chen wrote:
> There might be a time window between the NBD server's resize and the
> client's `re-read size` request. Is it safe?

For resize larger, it seems that it would be safe for the server to just
remember the last size it has advertised to the client.  As long as the
client doesn't request read/write to any offset beyond the
last-advertised size (either the size the server gave at handshake, or
the size that the server reported when the client last used the new
're-read size' command), there's no difference in behavior (and
well-behaved clients fall in this group); if the client DOES try to
access out-of-bounds with respect to the last known size, the server
SHOULD reject the access, but MAY serve it instead.  A client that is
unaware of the 're-read size' command will never learn that the server
is now offering a larger image.

For resize smaller, things are a lot trickier - how do you block access
to a portion of a file that the client still thinks exist, but which the
server has truncated away?  Maybe the easy way out is to state that the
server MUST NOT resize smaller.

> 
> What about an active `resize` request from the client? Considering some NBD
> servers might have the capability to do instant resizing, not applying to
> LVM or host block device, of course...

And perhaps the 're-read size' command can serve a dual purpose.  Since
all NBD commands already include space for size and offset parameters,
we could define that if the client passes 0/0 for size and offset, it
wants to read the server's current size; if it passes 0/non-zero, it is
asking the server to resize to the new non-zero size (and the server's
success or error response tells whether the resize happened).

Should I spend time turning this idea into a more formal spec, along the
lines of other NBD extension proposals?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] How to online resize qemu disk with nbd protocol?

2017-01-12 Thread Eric Blake

On 01/12/2017 04:22 AM, Bob Chen wrote:
> Hi,
> 
> 
> My qemu runs on a 3rd party distributed block storage, and the disk backend
> protocol is nbd.
> 
> I notices that there are differences between default qcow2 local disk and
> my nbd disk, in terms of resizing the disk on the fly.
> 
> Local qcow2 disk could work no matter using qemu-img resize 

No. Don't EVER use 'qemu-img resize' while a qemu process is actively
serving a VM guest.  Two concurrent modifiers to a qcow2 file is
unsupported, and is highly likely to corrupt your guest's view of the
disk, if not the entire qcow2 file.

> or qemu monitor
> 'block_resize',

Yes, that is the only supported way to do an online resize.

> but the nbd disk seemed to fail to detect the backend size
> change(had resized the disk on EBS at first). It said "this feature or
> command is not currently supported".

That's because the NBD protocol lacks a resize command.  You'd have to
first get that proposed as an NBD extension before qemu could support it.

> 
> Is that possible to hack qemu nbd code, making it the same way as resizing
> local qcow2 disk? I have the interface to resize EBS disk at backend.

Anything is possible in open source with enough time and patches, but
the place to tackle this is to first propose an extension to the NBD
protocol (I've added the NBD list in cc).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH] build: Fix build with older gcc

2016-12-19 Thread Eric Blake

On 12/19/2016 04:15 PM, Eric Blake wrote:
> gcc 4.4.7 (hello RHEL 6) complains about redefinition of typedefs,
> as in:
> 
> gcc -std=gnu99 -DHAVE_CONFIG_H -I.  -DSYSCONFDIR='"/usr/local/etc"'  -g -O2  
> -g -O2 -MT nbd_client-buffer.o -MD -MP -MF .deps/nbd_client-buffer.Tpo -c -o 
> nbd_client-buffer.o `test -f 'buffer.c' || echo './'`buffer.c
> buffer.c:39: error: redefinition of typedef ‘buffer_t’
> buffer.h:31: note: previous declaration of ‘buffer_t’ was here
> 
> The .c file only has to declare the struct being typedef'd, not
> repeat the typedef declaration.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>
> ---
>  buffer.c| 4 ++--
>  crypto-gnutls.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)

With this patch, I was able to run 'make check' on my RHEL 6 machine,
with its older gnutls-devel-2.12.23-17.el6.x86_64.  There were two test
failures, both related to the recent STARTTLS additions, but I don't
have enough time or tls experience to try and debug them today. It may
just be a bug in the testsuite, not prepared to handle the difference
between older and modern gnutls.

./tls
TLS handshake failed: A TLS packet with unexpected length was received.

** (process:23575): WARNING **: Could not open socket: Could not read
size: Connection reset by peer

** (process:23575): WARNING **: Could not read size: Connection reset by
peer

** (process:23575): WARNING **: Could not run test: Could not read size:
Connection reset by peer
FAIL: tls
./tlshuge
TLS handshake failed: A TLS packet with unexpected length was received.

** (process:23615): WARNING **: Could not open socket: Could not read
size: Resource temporarily unavailable

** (process:23615): WARNING **: Could not read size: Resource
temporarily unavailable

** (process:23615): WARNING **: Could not run test: Could not read size:
Resource temporarily unavailable
FAIL: tlshuge
./tlswrongcert
TLS handshake failed: A TLS packet with unexpected length was received.

** (process:23645): WARNING **: Could not open socket: Could not read
size: Resource temporarily unavailable

** (process:23645): WARNING **: Could not run test: Could not read size:
Resource temporarily unavailable
XFAIL: tlswrongcert

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/intel___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH] build: Fix build with older gcc

2016-12-19 Thread Eric Blake

gcc 4.4.7 (hello RHEL 6) complains about redefinition of typedefs,
as in:

gcc -std=gnu99 -DHAVE_CONFIG_H -I.  -DSYSCONFDIR='"/usr/local/etc"'  -g -O2  -g 
-O2 -MT nbd_client-buffer.o -MD -MP -MF .deps/nbd_client-buffer.Tpo -c -o 
nbd_client-buffer.o `test -f 'buffer.c' || echo './'`buffer.c
buffer.c:39: error: redefinition of typedef ‘buffer_t’
buffer.h:31: note: previous declaration of ‘buffer_t’ was here

The .c file only has to declare the struct being typedef'd, not
repeat the typedef declaration.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---
 buffer.c| 4 ++--
 crypto-gnutls.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/buffer.c b/buffer.c
index e08efd8..491ff87 100644
--- a/buffer.c
+++ b/buffer.c
@@ -28,7 +28,7 @@ OTHER DEALINGS IN THE SOFTWARE.

 #include "buffer.h"

-typedef struct buffer
+struct buffer
 {
   char *buf;
   ssize_t size;
@@ -36,7 +36,7 @@ typedef struct buffer
   ssize_t ridx;
   ssize_t widx;
   int empty;
-} buffer_t;
+};

 /* the buffer is organised internally as follows:
  *
diff --git a/crypto-gnutls.c b/crypto-gnutls.c
index d885250..9ce394d 100644
--- a/crypto-gnutls.c
+++ b/crypto-gnutls.c
@@ -53,7 +53,7 @@ OTHER DEALINGS IN THE SOFTWARE.

 #define PRIORITY "NORMAL:-VERS-TLS-ALL:+VERS-TLS1.2"

-typedef struct tlssession
+struct tlssession
 {
   gnutls_certificate_credentials_t creds;
   gnutls_session_t session;
@@ -62,7 +62,7 @@ typedef struct tlssession
   int (*erroutfn) (void *opaque, const char *format, va_list ap);
   int debug;
   void *opaque;
-} tlssession_t;
+};

 #define BUF_SIZE 65536
 #define BUF_HWM ((BUF_SIZE*3)/4)
-- 
2.9.3


--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/intel
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] Proposal to merge WRITE_ZEROES extension into master

2016-12-19 Thread Eric Blake

On 12/19/2016 03:41 PM, Wouter Verhelst wrote:
> On Thu, Dec 15, 2016 at 12:09:42PM +0100, Wouter Verhelst wrote:
>> Eric's patch makes -d work again, but disables maxconnections in that
>> case. I'll need to fix it up; hopefully that'll happen sometime next
>> week, because otherwise I'll be too late to make the Debian freeze for
>> stretch, and I'd really like STARTTLS to be in there.
> 
> I'm an idiot :-)
> 
> When we say "don't fork", that means we cannot handle more than one
> connection, because we can't fork, and we do fork-per-child.
> 
> I'll apply Eric's patch, it's the correct thing to do.

I'm not sure if I broke the ACL handling in my patch, though, so you'll
want to double-check that aspect of it (I basically hardcoded -d to mean
that the ACL check would always succeed).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/intel___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH 1/2] nbd-server: Improve command line documentation

2016-12-15 Thread Eric Blake

Mention -d in --help output, to match that it is listed in the
man page.  Update the man page examples and the --help output
to make it obvious that our preferred usage nowadays will omit
the port/file/size arguments, since the config file is more
powerful.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---
 man/nbd-server.1.in.sgml | 8 +++-
 nbd-server.c | 9 ++---
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/man/nbd-server.1.in.sgml b/man/nbd-server.1.in.sgml
index 72a840b..d1d4301 100644
--- a/man/nbd-server.1.in.sgml
+++ b/man/nbd-server.1.in.sgml
@@ -269,7 +269,13 @@ manpage.1: manpage.sgml
 Some examples of nbd-server usage:
 
   
-   To export a file /export/nbd/exp-bl-dev on port 2000:
+   The simplest usage, when everything is specified in the
+ configuration file (preferred):
+   nbd-server -C /path/to/config
+  
+  
+   To export a file /export/nbd/exp-bl-dev on port 2000,
+ using the older command line syntax:
nbd-server 2000 /export/nbd/exp-bl-dev
   
   
diff --git a/nbd-server.c b/nbd-server.c
index 76b515d..698e86e 100644
--- a/nbd-server.c
+++ b/nbd-server.c
@@ -439,11 +439,12 @@ static inline void spliceit(int fd_in, loff_t *off_in, 
int fd_out,
  */
 void usage() {
printf("This is nbd-server version " VERSION "\n");
-   printf("Usage: [ip:|ip6@]port file_to_export [size][kKmM] [-l 
authorize_file] [-r] [-m] [-c] [-C configuration file] [-p PID file name] [-o 
section name] [-M max connections] [-V]\n"
+   printf("Usage: [[ip:|ip6@]port file_to_export [size][kKmM]] [-l 
authorize_file] [-r] [-m] [-c] [-C configuration file] [-p PID file name] [-o 
section name] [-M max connections] [-d] [-V]\n"
   "\t-r|--read-only\t\tread only\n"
   "\t-m|--multi-file\t\tmultiple file\n"
   "\t-c|--copy-on-write\tcopy on write\n"
   "\t-C|--config-file\tspecify an alternate configuration file\n"
+  "\t-d|--dont-fork\t\trun in foreground rather than daemon, for 
debugging\n"
   "\t-l|--authorize-file\tfile with list of hosts that are allowed 
to\n\t\t\t\tconnect.\n"
   "\t-p|--pid-file\t\tspecify a filename to write our PID to\n"
   "\t-o|--output-config\toutput a config file section for what 
you\n\t\t\t\tspecified on the command line, with the\n\t\t\t\tspecified section 
name\n"
@@ -451,8 +452,10 @@ void usage() {
   "\t-V|--version\toutput the version and exit\n\n"
   "\tif port is set to 0, stdin is used (for running from 
inetd).\n"
   "\tif file_to_export contains '%%s', it is substituted with the 
IP\n"
-  "\t\taddress of the machine trying to connect\n" 
-  "\tif ip is set, it contains the local IP address on which we're 
listening.\n\tif not, the server will listen on all local IP addresses\n");
+  "\t\taddress of the machine trying to connect\n"
+  "\tif ip is set, it contains the local IP address on which we're 
listening.\n\tif not, the server will listen on all local IP addresses.\n"
+  "\n\tNote that port, file_to_export, and size are deprecated in 
favor of the\n\tconfiguration file specified by -C.\n"
+   );
printf("Using configuration file %s\n", CFILE);
 }

-- 
2.9.3


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH 2/2] nbd-server: Make command-line -r apply to all exports

2016-12-15 Thread Eric Blake

Now that we prefer the config file for everything, many command
line options don't make much sense any more.  But one common
tweak is a decision on whether to fire up nbd-server with all
exports as read-only.  Repurpose the -r command line option to
imply that all exports are now readonly, rather than just the
ones mentioned as readonly in the config, which makes it easier
to test a server setup without the worry of accidental writes
while also avoiding the need to tweak the config file just to
turn lots of readonly markers on and then back off again.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---
 man/nbd-server.1.in.sgml | 5 +++--
 man/nbd-server.5.in.sgml | 2 +-
 nbd-server.c | 7 ++-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/man/nbd-server.1.in.sgml b/man/nbd-server.1.in.sgml
index d1d4301..16b6411 100644
--- a/man/nbd-server.1.in.sgml
+++ b/man/nbd-server.1.in.sgml
@@ -159,9 +159,10 @@ manpage.1: manpage.sgml
   
-r

- Export the file read-only. If a client tries to write
+ Export all files as read-only. If a client tries to write
to a read-only exported file, it will receive an error, but
-   the connection will stay up.
+   the connection will stay up. This option overrides the
+   configuration file per-export readonly settings.

   
   
diff --git a/man/nbd-server.5.in.sgml b/man/nbd-server.5.in.sgml
index e838bcf..03ded44 100644
--- a/man/nbd-server.5.in.sgml
+++ b/man/nbd-server.5.in.sgml
@@ -684,7 +684,7 @@ manpage.1: manpage.sgml
Use of this option in conjunction with
copyonwrite is possible, but silly.
  
- Corresponds to the -r option on the
+ Implied if the -r option is used on the
  command line.

   
diff --git a/nbd-server.c b/nbd-server.c
index 698e86e..cb341f6 100644
--- a/nbd-server.c
+++ b/nbd-server.c
@@ -164,6 +164,7 @@ int dontfork = 0;
 #define F_OLDSTYLE 1 /**< Allow oldstyle (port-based) exports */
 #define F_LIST 2 /**< Allow clients to list the exports on a server */
 #define F_NO_ZEROES 4/**< Do not send zeros to client */
+#define F_ALL_READONLY 8  /**< Force all exports to be readonly */
 // also accepts F_FORCEDTLS (which is 16384)
 GHashTable *children;
 char pidfname[256]; /**< name of our PID file */
@@ -440,7 +441,7 @@ static inline void spliceit(int fd_in, loff_t *off_in, int 
fd_out,
 void usage() {
printf("This is nbd-server version " VERSION "\n");
printf("Usage: [[ip:|ip6@]port file_to_export [size][kKmM]] [-l 
authorize_file] [-r] [-m] [-c] [-C configuration file] [-p PID file name] [-o 
section name] [-M max connections] [-d] [-V]\n"
-  "\t-r|--read-only\t\tread only\n"
+  "\t-r|--read-only\t\tmake all exports be read only\n"
   "\t-m|--multi-file\t\tmultiple file\n"
   "\t-c|--copy-on-write\tcopy on write\n"
   "\t-C|--config-file\tspecify an alternate configuration file\n"
@@ -577,6 +578,7 @@ SERVER* cmdline(int argc, char *argv[], struct generic_conf 
*genconf) {
}
break;
case 'r':
+   genconf->flags |= F_ALL_READONLY;
serve->flags |= F_READONLY;
break;
case 'm':
@@ -945,6 +947,9 @@ GArray* parse_cfile(gchar* f, struct generic_conf *const 
genconf, bool expect_ge
/* Don't append values for the [generic] group */
if(i>0 || !expect_generic) {
s.servename = groups[i];
+   if (genconftmp.flags & F_ALL_READONLY) {
+   s.flags |= F_READONLY;
+   }

g_array_append_val(retval, s);
}
-- 
2.9.3


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH 0/2] Repurpose nbd-server -r

2016-12-15 Thread Eric Blake

As discussed in another thread earlier today, I found myself
wanting to quickly turn read-only on and off on all exports
while testing interoperability between qemu client and nbd-server,
without having to re-edit the config file all the time.

The first patch is probably worth having no matter what (well,
after any tweaks it gets based on review comments); the second
is more of my RFC on whether the idea even makes sense to have
the command line flags override all config file export sections
at once.

Eric Blake (2):
  nbd-server: Improve command line documentation
  nbd-server: Make command-line -r apply to all exports

 man/nbd-server.1.in.sgml | 13 ++---
 man/nbd-server.5.in.sgml |  2 +-
 nbd-server.c | 16 
 3 files changed, 23 insertions(+), 8 deletions(-)

-- 
2.9.3


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH] build: Allow CFLAGS override during make

2016-12-15 Thread Eric Blake

On 12/15/2016 10:53 AM, Wouter Verhelst wrote:
> On Wed, Dec 14, 2016 at 04:19:53PM -0600, Eric Blake wrote:
>> Automake recommends the use of $(CFLAGS), not @CFLAGS@, because
>> that allows a user to override CFLAGS at make time (with
>> 'make CFLAGS=-g', for example) rather than being hard-coded to
>> the CFLAGS in use during configure time.  Use the preferred
>> substitution style for ALL variables, not just CFLAGS.
> 
> Actually, we want to use AM_CFLAGS, which works with older versions of
> automake, too, IIRC. Given that, I don't think there's much point in 
> converting
> all the @FOO_CFLAGS@ etc to $() syntax.

Whether it is $(CFLAGS) or $(AM_CFLAGS) is less important, the point
remains that automake (including older versions all the way back to the
automake present on RHEL 5) documents that $(name) is preferred over
@name@ substitutions in Makefile.am.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH] nbd-server: Kill dead mainloop()

2016-12-15 Thread Eric Blake

Unused since commit 6c2d8511.  Be the chainsaw mentioned in the comment :)

Signed-off-by: Eric Blake <ebl...@redhat.com>
---

Applies to the master branch; will cause a (trivial) merge conflict with
the extensions-write-zeroes branch.


 nbd-server.c | 173 ---
 1 file changed, 173 deletions(-)

diff --git a/nbd-server.c b/nbd-server.c
index 3dcfadd..fec397b 100644
--- a/nbd-server.c
+++ b/nbd-server.c
@@ -2019,179 +2019,6 @@ static int mainloop_threaded(CLIENT* client) {
writeit(client->transactionlogfd, , sizeof(reply)); }
 /** error macro. */
 #define ERROR(client,reply,errcode) { reply.error = nbd_errno(errcode); 
SEND(client,reply); reply.error = 0; }
-/**
- * Serve a file to a single client.
- *
- * @todo This beast needs to be split up in many tiny little manageable
- * pieces. Preferably with a chainsaw.
- *
- * @param client The client we're going to serve to.
- * @return when the client disconnects
- **/
-int mainloop(CLIENT *client) {
-   struct nbd_request request;
-   struct nbd_reply reply;
-   gboolean go_on=TRUE;
-#ifdef DODBG
-   int i = 0;
-#endif
-   send_export_info(client);
-   DEBUG("Entering request loop!\n");
-   reply.magic = htonl(NBD_REPLY_MAGIC);
-   reply.error = 0;
-   while (go_on) {
-   char buf[BUFSIZE];
-   char* p;
-   size_t len;
-   size_t currlen;
-   size_t writelen;
-   uint16_t command;
-#ifdef DODBG
-   i++;
-   printf("%d: ", i);
-#endif
-   socket_read(client, , sizeof(request));
-   if (client->transactionlogfd != -1)
-   writeit(client->transactionlogfd, , 
sizeof(request));
-
-   request.from = ntohll(request.from);
-   request.type = ntohl(request.type);
-   command = request.type & NBD_CMD_MASK_COMMAND;
-   len = ntohl(request.len);
-
-   DEBUG("%s from %llu (%llu) len %u, ", getcommandname(command),
-   (unsigned long long)request.from,
-   (unsigned long long)request.from / 512, len);
-
-   if (request.magic != htonl(NBD_REQUEST_MAGIC))
-   err("Not enough magic.");
-
-   memcpy(reply.handle, request.handle, sizeof(reply.handle));
-
-   if ((command==NBD_CMD_WRITE) || (command==NBD_CMD_READ) ||
-   (command==NBD_CMD_TRIM)) {
-   if (request.from + len < request.from) { // 64 bit 
overflow!!
-   DEBUG("[Number too large!]");
-   ERROR(client, reply, EINVAL);
-   continue;
-   }
-
-   if (((off_t)request.from + len) > client->exportsize) {
-   DEBUG("[RANGE!]");
-   ERROR(client, reply, (command==NBD_CMD_WRITE) ? 
ENOSPC : EINVAL);
-   continue;
-   }
-
-   currlen = len;
-   if (currlen > BUFSIZE - sizeof(struct nbd_reply)) {
-   currlen = BUFSIZE - sizeof(struct nbd_reply);
-   if(!logged_oversized) {
-   msg(LOG_DEBUG, "oversized request (this 
is not a problem)");
-   logged_oversized = true;
-   }
-   }
-   }
-
-   switch (command) {
-
-   case NBD_CMD_DISC:
-   msg(LOG_INFO, "Disconnect request received.");
-   if (client->server->flags & F_COPYONWRITE) { 
-   if (client->difmap) g_free(client->difmap) ;
-   close(client->difffile);
-   unlink(client->difffilename);
-   free(client->difffilename);
-   }
-   go_on=FALSE;
-   continue;
-
-   case NBD_CMD_WRITE:
-   DEBUG("wr: net->buf, ");
-   while(len > 0) {
-   socket_read(client, buf, currlen);
-   DEBUG("buf->exp, ");
-   if ((client->server->flags & F_READONLY) ||
-   (client->server->flags & F_AUTOREADONLY)) {
-   DEBUG("[WRITE to READONLY!]");
-   ERROR(client, reply, EPERM);
-   consume(clien

[Nbd] [PATCH] write-zeroes: Fix doc typo prior to mainline merge

2016-12-15 Thread Eric Blake

Signed-off-by: Eric Blake <ebl...@redhat.com>
---

This is the only thing I noticed broken in extensions-write-zeroes
when comparing it to mainline, so I think we're ready to merge.

(Well, this, and the dead mainloop() code is a pointless merge, but
I'll post that patch separately).

 doc/proto.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/proto.md b/doc/proto.md
index afe71fc..445778f 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -286,7 +286,7 @@ The server MAY process commands out of order, and MAY reply 
out of
 order, except that:

 * All write commands (that includes `NBD_CMD_WRITE`,
-  `NBD_WRITE_ZEROES` and `NBD_CMD_TRIM`) that the server
+  `NBD_CMD_WRITE_ZEROES` and `NBD_CMD_TRIM`) that the server
   completes (i.e. replies to) prior to processing to a
   `NBD_CMD_FLUSH` MUST be written to non-volatile
   storage prior to replying to that `NBD_CMD_FLUSH`. This
-- 
2.9.3


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] suspicious errno handling

2016-12-15 Thread Eric Blake

Noticed this while reviewing the extensions-write-zeroes branch: we
probably ought to audit errno usage, as I spotted several places that
use an undefined value of errno.  For example:

if(expwrite(req->from, pkg->data, req->len, client, fua)) {
DEBUG("Write failed: %m");
rep.error = nbd_errno(errno);

This is assuming that expwrite() left a sane errno value, and that
DEBUG() didn't clobber it.  But a quick look at DEBUG() shows that it
(can) call printf() (which can clobber errno), and a close look at
expwrite() shows that there are error paths where errno is indeterminate
(for example, expwrite() can return -1 if rawread_fully() fails, but
rawread_fully() can call close() in between read() and returning a
value, where close() can clobber errno).

It may be smarter to have functions return an actual error value (or
negative error value) instead of -1 on failure, rather than relying on
errno remaining unchanged through all the interim code.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] Proposal to merge WRITE_ZEROES extension into master

2016-12-15 Thread Eric Blake

On 12/14/2016 11:03 PM, Alex Bligh wrote:
> Eric,
> 
>> On 14 Dec 2016, at 21:03, Eric Blake <ebl...@redhat.com> wrote:
>>
>> Okay, I've cloned gondbserver sources, but I've never compiled a Go
>> project before.  How do I get an executable server that I can then point
>> the qemu 2.8 client at, to test interoperability?
> 
> go build
> 
> should give you a binary in the current directory.

Apparently, it takes a lot more setup than I have lying around:

$ pwd
/home/eblake/gonbdserver/nbd
$ go build
rbd.go:13:2: cannot find package "github.com/ceph/go-ceph/rados" in any of:
/usr/src/github.com/ceph/go-ceph/rados (from $GOROOT)
($GOPATH not set)
rbd.go:14:2: cannot find package "github.com/ceph/go-ceph/rbd" in any of:
/usr/src/github.com/ceph/go-ceph/rbd (from $GOROOT)
($GOPATH not set)
config.go:6:2: cannot find package "github.com/sevlyar/go-daemon" in any of:
/usr/src/github.com/sevlyar/go-daemon (from $GOROOT)
($GOPATH not set)
aiofile.go:11:2: cannot find package "github.com/traetox/goaio" in any of:
/usr/src/github.com/traetox/goaio (from $GOROOT)
($GOPATH not set)
aiofile.go:12:2: cannot find package "golang.org/x/net/context" in any of:
/usr/src/golang.org/x/net/context (from $GOROOT)
($GOPATH not set)
config.go:8:2: cannot find package "gopkg.in/yaml.v2" in any of:
/usr/src/gopkg.in/yaml.v2 (from $GOROOT)
($GOPATH not set)


> 
> (though I'd presumed you'd test qemu against the reference server
> implementation, which is possibly easier)

Yes, I did that as well, but comparing more than one implementation
can't hurt, if it's easy.

> 
>> Are you available on a particular IRC chat channel, so I can coordinate
>> some of this testing without clogging the mailing list?  For example,
>> I'm eblake on freenode's #kvm channel.
> 
> Sure. I'm normally alexbligh or alexbligh1 on freenode or OFTC (where I'm
> on #qemu). However I'm travelling Thurs through lands of little
> connectivity. I'll try to remember to leave the IRC client running when
> I'm back in the land of connectivity.

Thanks, I'll catch you sometime when we're both available, as me getting
my go environment up and running is probably less relevant to the list.
But no rush.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] Proposal to merge WRITE_ZEROES extension into master

2016-12-14 Thread Eric Blake

On 12/14/2016 12:55 PM, Alex Bligh wrote:
> 
>> On 14 Dec 2016, at 18:51, Eric Blake <ebl...@redhat.com> wrote:
>>
>> It's working well in qemu 2.8 without needing tweaks to the
>> documentation.  Should we try and do some cross-implementation testing
>> today, before doing the actual merge?
> 
> If you have a minute to do so, that would be great.

With qemu-io as the client and nbd-server as the server (a merge of the
master branch, extensions-write-zeroes-branch, and the patch below), I
was successfully able to validate that nbd-server correctly advertises
the bit when asked to, that it handles NBD_CMD_WRITE_ZEROES requests at
any alignment, and that it is okay with any combination of the flags
NBD_CMD_FLAG_FUA and NBD_CMD_FLAG_NO_HOLE, insofar as the client can
then re-read correct all-zero data. There's an oddity with the error
reported when attempting to write to a read-only export: NBD_CMD_WRITE
fails with EPERM, but NBD_CMD_WRITE_ZEROES fails with EINVAL.  That's
probably a bug in nbd-server, based on the recommendations of the spec,
but one which qemu-io handled just fine as a client.

I tested with this nbd-server config file, and the mentioned command lines:

# Use with:
# ./nbd-server -C local -d
# ./qemu-io -f raw -t none nbd://localhost:/file2
[generic]
  port = 
  allowlist = true
[file2]
  exportname = /home/eblake/qemu/file2
  flush = true
  fua = true
  trim = true
# uncomment as needed
#  readonly = true

[It would be nice if '/nbd-server -C file -d -r' would force read-only
mode on ALL exports, regardless of the readonly settings in the
individual export settings of the config file - but I suppose that's a
patch for another day]

A quick read of nbd-server.c shows that it never tries to punch holes
(as permitted when NBD_CMD_FLAG_NO_HOLE is omitted), so the behavior is
semantically correct even if not optimal when sparse files are supported.

qemu-io as a client correctly refuses to send NBD_CMD_WRITE_ZEROES if
the server doesn't advertise support, and does not allow me to easily
attempt to try a write beyond EOF (which means I was unable to test the
server response in the face of an ill-behaved client that sends the
command when not advertised, or which sends bad length/offset).  I could
tweak the code to qemu's client to allow me to test these situations, if
you think further testing of ill-behaved clients is worthwhile, but I'm
at least glad that the positive testing was successful.

As promised, here's the patches I used, above the merge of the master
and extensions-write-zeroes branches (most of the merge conflicts were
in doc/proto.md, I didn't closely review the resolution to those merges,
but we'll have to do it correctly when we actually make the merge):

diff --git c/nbd-server.c w/nbd-server.c
index 9d031c3..312091a 100644
--- c/nbd-server.c
+++ w/nbd-server.c
@@ -2252,7 +2252,7 @@ int mainloop(CLIENT *client) {
ERROR(client, reply, errno);
continue;
}
-   SEND(client->net, reply);
+   SEND(client, reply);
continue;

default:
@@ -2690,11 +2690,15 @@ handle_modern_connection(GArray *const servers,
const int sock, struct generic_c
 msg(LOG_ERR, "Modern initial negotiation failed");
 goto handler_err;
 }
-   len = strlen(client->server->servename);
-   writeit(commsocket, , sizeof len);
-   writeit(commsocket, client->server->servename, len);
-   readit(commsocket, , 1);
-   close(commsocket);
+   if (dontfork) {
+   acl = 'Y';
+   } else {
+   len = strlen(client->server->servename);
+   writeit(commsocket, , sizeof len);
+   writeit(commsocket, client->server->servename, len);
+   readit(commsocket, , 1);
+   close(commsocket);
+   }

switch(acl) {
case 'N':

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] did 'nbd-server -d' accidentally break?

2016-12-14 Thread Eric Blake

I don't see how it can be reliable:

static void
handle_modern_connection(GArray *const servers, const int sock, struct
generic_conf *genconf)
{
...
int commsocket;
...
if (!dontfork) {
pid = spawn_child();
...
}
...
client = negotiate(net, servers, genconf);
if (!client) {
msg(LOG_ERR, "Modern initial negotiation failed");
goto handler_err;
}
len = strlen(client->server->servename);
writeit(commsocket, , sizeof len);
writeit(commsocket, client->server->servename, len);
readit(commsocket, , 1);
close(commsocket);

Since -d turns dontfork on, it means commsocket is uninitialized(!). If
you're lucky, it starts life as 0, and if you are in an interactive
environment (where stdin happens to be a tty), then the output goes
somewhere rather than causing an error, and then you have to type
something other than N or X to get the readit() to succeed.

Looks like commit 7e901617 is the culprit, and maybe the solution is to
just skip the commsocket stuff when -d is active (should -d also imply
maxconnections of 1?).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] Proposal to merge WRITE_ZEROES extension into master

2016-12-14 Thread Eric Blake

On 12/14/2016 12:55 PM, Alex Bligh wrote:
> 
>> On 14 Dec 2016, at 18:51, Eric Blake <ebl...@redhat.com> wrote:
>>
>> It's working well in qemu 2.8 without needing tweaks to the
>> documentation.  Should we try and do some cross-implementation testing
>> today, before doing the actual merge?
> 
> If you have a minute to do so, that would be great.

Okay, I've cloned gondbserver sources, but I've never compiled a Go
project before.  How do I get an executable server that I can then point
the qemu 2.8 client at, to test interoperability?

Are you available on a particular IRC chat channel, so I can coordinate
some of this testing without clogging the mailing list?  For example,
I'm eblake on freenode's #kvm channel.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] Proposal to merge WRITE_ZEROES extension into master

2016-12-14 Thread Eric Blake

On 12/14/2016 12:52 PM, Wouter Verhelst wrote:
> On Wed, Dec 14, 2016 at 06:20:57PM +, Alex Bligh wrote:
>> Wouter,
>>> Did the implementation of WRITE_ZEROES encounter any issues with the spec
>>> that we might want to fix up? Or did things work out as expected?
>>
>> It was fine in nbdserver, and also fine in gonbdserver. Admittedly I didn't
>> have a 'real' client to test it against.
> 
> Oh, I missed that part.
> 
> Is there a client anywhere that is more than a test thing? If not, then 
> perhaps
> hold off on merging for now until that exists too.

Qemu 2.8 (currently at -rc3, not officially released yet but should be
soon, but none of the pending patches between -rc3 and the final release
will affect nbd behavior) has both a client and a server implementation,
and I'm also set up to turn on tracing in either the client or the
server in order to trace what the other side sends over the wire.  An
arbitrary client is a bit harder to test if you don't know how to
provoke it into sending write zero commands, but qemu-io serves as a
nice testbed that lets the qemu client send arbitrary commands to any
server.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] Proposal to merge WRITE_ZEROES extension into master

2016-12-14 Thread Eric Blake

On 12/14/2016 12:20 PM, Alex Bligh wrote:
> Wouter,
> 
>> On 14 Dec 2016, at 18:14, Wouter Verhelst <w...@uter.be> wrote:
>>
>> Go ahead, as far as I'm concerned.
> 
> Great. I'll do that in a bit unless anyone else shouts with a compelling
> reason why not to do so. It will be a first test of an 'extension-' merge.
> 
>> Did the implementation of WRITE_ZEROES encounter any issues with the spec 
>> that
>> we might want to fix up? Or did things work out as expected?
> 
> It was fine in nbdserver, and also fine in gonbdserver. Admittedly
> I didn't have a 'real' client to test it against.

It's working well in qemu 2.8 without needing tweaks to the
documentation.  Should we try and do some cross-implementation testing
today, before doing the actual merge?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PATCH] Further tidy-up on block status

2016-12-14 Thread Eric Blake

On 12/14/2016 09:08 AM, Alex Bligh wrote:
> (NB: I've already applied this and pushed it)
> 
> * Change NBD_OPT_LIST_METADATA etc. to explicitly send a list of queries
>   and add a count of queries so we can extend the command later (rather than
>   rely on the length of option)
> 
> * For NBD_OPT_LIST_METADATA make absence of any query (as opposed to zero
>   length query) list all contexts, as absence of any query is now simple.
> 
> * Move definition of namespaces in the document to somewhere more appopriate.
> 
> * Various other minor changes as discussed on the mailing list
> 

> +Metadata contexts are identified by their names. The name MUST
> +consist of a namespace, followed by a colon, followed by a leaf-name.
> +The namespace and the leaf-name must each consist entirely of
> +printable non-whitespace UTF-8 characters other than colons,
> +and be non-empty. The entire name (namespace, colon and leaf-name)
> +MUST NOT exceed 255 bytes (and therefore botht he namespace and

s/botht he/both the/


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension

2016-12-14 Thread Eric Blake

On 12/14/2016 02:22 AM, Wouter Verhelst wrote:
> Hi Eric,
> 
> On Tue, Dec 13, 2016 at 04:36:08PM -0600, Eric Blake wrote:
>> On 12/13/2016 06:18 AM, Wouter Verhelst wrote:
>>> On Tue, Dec 13, 2016 at 08:38:12AM +0100, Kevin Wolf wrote:
>>>> Am 12.12.2016 um 19:12 hat Wouter Verhelst geschrieben:
>>>>> I'm not opposed to this proposal, per se, but there seems to be some
>>>>> disagreement (by Kevin, for instance) on whether this extension is at
>>>>> all useful.
>>>>
>>>> FWIW, I'm not opposed to adding the flag. I don't think it can hurt, but
>>>> I wanted to ask the question so that we don't end up adding a feature
>>>> that noone actually uses. Ultimately it's your call as a maintainer
>>>> anyway how conservative you want to be with spec additions.
>>>
>>> I'm not opposed either, but I do agree with you that we shouldn't add
>>> such a feature if it doesn't end up getting used. Especially so if it
>>> burns a flag in the (16-bit) "transmission flags" field, where space is
>>> at a premium.
>>
>> No, it is NOT a "transmission flag", as it is a per-export property
>> (where we currently have 64 bits).
> 
> That may be what you meant, but the patch you sent uses a flag in the
> transmission flags field.
> 
> If you meant to use something else, you'll have to clarify what your
> patch should have been like ;-)

/me goes back and re-reads spec - I shouldn't reply to mails purely off
of memory...

Okay, I'm crossing terms here.  "Transmission flags" ARE the per-export
flags - and are sent in response to NBD_OPT_EXPORT_NAME or NBD_OPT_INFO
or NBD_OPT_GO.  And you are right that we only have 16 bits in the
current spec, but that they can differ between exports (case in point:
NBD_FLAG_READ_ONLY).  So my proposal of bit 10 as NBD_FLAG_INIT_ZEROES
is potentially in the right place - it is a per-export property (a
server may support multiple named exports for the client to choose from,
of which only a subset are known to be all zero content at the time of
export).  But your argument that 16 bits is at a premium may be worth
considering an alternative representation for the same information.

Possibility 1: when we run out of our current 16 transmission flags,
we'll need a way to extend the protocol to support more than that.  To
do that, we'd probably want a new Handshake flag
NBD_FLAG_LARGE_TRANSMISSION_FLAGS sent by the server, and which must be
answered by the client sending a new Client flag
NBD_FLAG_C_LARGE_TRANSMISSION_FLAGS, so that both sides know that all
subsequent places in the protocol that send transmission flags will now
send 64 bits instead of 16 bits (old-style negotiation cannot benefit
from this, and is permanently stuck at 16 bits, but we discourage
old-style negotiation).  We'd still want to prioritize which bits land
in the first 16 positions, for maximum compatibility with older servers
or clients (where the higher bit positions are used for data that is
more suited for optimizations, rather than controlling correct usage) -
thus NBD_FLAG_INIT_ZEROES would serve as an example to live in the
extended positions.

Possibility 2: Leverage the extension-info branch: Add a new
NBD_INFO_INIT_ZERO information datum.  NBD_BLOCK_INFO already exists to
let server and client exchange as much per-export information as they
want during handshake without burning any new transmission flags, and
with a specification that gracefully ignores unknown requests from the
client and unknown advertisements from the server.  There's the drawback
that the server can't advertise the known-zero-init optimization to
clients that don't use NBD_OPT_GO, but that's not too bad (especially if
qemu is the first client with code to actually make use of the
optimization, since I am already trying to get qemu to be one of the
first adopters of NBD_OPT_GO).

We'll probably have to eventually use something along the lines of
possibility 1 for more transmission flags for some other reason down the
road (since we have lately been adding one flag per new command/command
group as we add extensions), but I agree that it is a bit heavy for now.
 So it sounds like I should rework the proposal along the lines of
possibility 2, which also implies that we should get extension-info
ready for promotion to current.  And that means that my current proposal
to the extension-write-zeroes branch is probably not ideal, but it
reopens the question of whether extension-write-zeroes as it currently
stands is ready for promotion to stable now that qemu 2.8 implements it.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
---

Re: [Nbd] [Qemu-devel] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension

2016-12-13 Thread Eric Blake

On 12/13/2016 06:18 AM, Wouter Verhelst wrote:
> On Tue, Dec 13, 2016 at 08:38:12AM +0100, Kevin Wolf wrote:
>> Am 12.12.2016 um 19:12 hat Wouter Verhelst geschrieben:
>>> I'm not opposed to this proposal, per se, but there seems to be some
>>> disagreement (by Kevin, for instance) on whether this extension is at
>>> all useful.
>>
>> FWIW, I'm not opposed to adding the flag. I don't think it can hurt, but
>> I wanted to ask the question so that we don't end up adding a feature
>> that noone actually uses. Ultimately it's your call as a maintainer
>> anyway how conservative you want to be with spec additions.
> 
> I'm not opposed either, but I do agree with you that we shouldn't add
> such a feature if it doesn't end up getting used. Especially so if it
> burns a flag in the (16-bit) "transmission flags" field, where space is
> at a premium.

No, it is NOT a "transmission flag", as it is a per-export property
(where we currently have 64 bits).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH v4] doc: Add NBD_CMD_BLOCK_STATUS extension

2016-12-12 Thread Eric Blake

On 12/12/2016 02:26 PM, Wouter Verhelst wrote:

>>>  
>>>  This section describes the value and meaning of constants (other than
>>> @@ -768,8 +814,6 @@ The field has the following format:
>>>to that command to the client. In the absense of this flag, clients
>>
>> drive-by typo fix applicable to the main branch:
>> s/absense/absence/
> 
> Isn't that a matter of en_US vs en_GB?

Not in this instance - both countries use absence.  (If it helps, I'm US
but my wife is UK, so I generally have a good feel for which words have
different spellings across the pond)

>>> then the server MUST send all the metadata
>>> +  contexts it knows about. If specified, this query string MUST
>>> +  start with a name that uniquely identifies a server
>>> +  implementation; e.g., the reference implementation that
>>> +  accompanies this document would support query strings starting
>>> +  with 'nbd-server:'
>>
>> 'nbd-server:' or 'base:' ?  [oh, I see more on this below]
> 
> Indeed. It might certainly make sense to clarify that a bit, or I could
> just decide that I take the "base:" context, and that everyone else can
> register their own names ;->

I'd be fine with the reference implementation taking 'base:' :)

>>> +
>>> +Even if the client did not use the `NBD_CMD_FLAG_REQ_ONE` flag in
>>> +its request, the server MAY return less descriptors in the reply
>>> +than would be required to fully specify the whole range of requested
>>> +information to the client, if the number of descriptors would be
>>> +over 16 otherwise and looking up the information would be too
>>> +resource-intensive for the server.
>>
>> Do we still want to require servers to always send 16 extents (when not
>> limited to exactly 1), or is it better to just state that as long as the
>> server sends at least one extent (so that the client can make progress),
>> then the server can shorten the reply if it is resource-intensive to
>> provide details over the entire client request?
> 
> Hrm, I wanted to drop that (I did drop another reference to that thing,
> I though), but apparently I forgot.

It looks like elsewhere you make the point that there is always at least
one extent on success, and I think that's sufficient (whether a server
stops at 1, stops at 16, or always provides as much as the client
requests, is then a quality-of-implementation decision on the server; a
client already has to be prepared for a short answer, and should also be
prepared to cache a long answer rather than repeating a query that is
redundant; and a server should tolerate a client that doesn't cache things).

> There were actually more things that I defined multiple times, and I
> thought I'd dropped them all, but apparently not so. I'll fix that up
> ASAP.
> 
> (sorry about the "noise" in that I didn't double-check everything. I'll
> do better next time, honest ;-)

That's okay.  The inter-branch diff is indeed the best way to review the
current state of the changes in relation to the master branch, rather
than trying to follow one patch at a time.  I'm grateful that you've
stepped in to try and nail down some of the wordings, and doing a qemu
proof-of-concept implementation is indeed on my plate of things to
tackle soon (well, structured replies first...)

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH v4] doc: Add NBD_CMD_BLOCK_STATUS extension

2016-12-12 Thread Eric Blake

structured fields at the end.
>were sent earlier in the structured reply, the server SHOULD NOT
>send multiple distinct offsets that lie within the bounds of a
>single content chunk.  Valid as a reply to `NBD_CMD_READ`,
> -  `NBD_CMD_WRITE`, and `NBD_CMD_TRIM`.
> +  `NBD_CMD_WRITE`, `NBD_CMD_TRIM`, and `NBD_CMD_BLOCK_STATUS`.
>  
>The payload is structured as:
>  
> @@ -1259,6 +1448,79 @@ The following request types exist:
>  
>  Defined by the experimental `WRITE_ZEROES` 
> [extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-write-zeroes/doc/proto.md).
>  
> +* `NBD_CMD_BLOCK_STATUS` (7)
> +
> +A block status query request. Length and offset define the range of
> +interest. Clients MUST NOT use this request unless metadata
> +contexts have been negotiated, which in turn requires the client to
> +first negotiate structured replies. For a successful return, the
> +server MUST use a structured reply, containing at least one chunk of
> +type `NBD_REPLY_TYPE_BLOCK_STATUS`.
> +
> +The list of block status descriptors within the
> +`NBD_REPLY_TYPE_BLOCK_STATUS` chunk represent consecutive portions
> +of the file starting from specified *offset*, and the sum of the
> +*length* fields of each descriptor MUST not be greater than the
> +overall *length* of the request. This means that the server MAY
> +return less data than required. However the server MUST return at
> +least one status descriptor.  The server SHOULD use different
> +*status* values between consecutive descriptors, and SHOULD use
> +descriptor lengths that are an integer multiple of 512 bytes where
> +possible (the first and last descriptor of an unaligned query being
> +the most obvious places for an exception). The status flags are
> +intentionally defined so that a server MAY always safely report a
> +status of 0 for any block, although the server SHOULD return
> +additional status values when they can be easily detected.
> +
> +If an error occurs, the server SHOULD set the appropriate error
> +code in the error field of either a simple reply or an error
> +chunk.  However, if the error does not involve invalid usage (such
> +as a request beyond the bounds of the file), a server MAY reply
> +with a single block status descriptor with *length* matching the
> +requested length, and *status* of 0 rather than reporting the
> +error.
> +
> +Upon receiving an `NBD_CMD_BLOCK_STATUS` command, the server MUST
> +return the status of the device, where the status field of each
> +descriptor is determined by the following bits (all combinations of
> +these bits are possible):
> +
> +  - `NBD_STATE_HOLE` (bit 0): if set, the block represents a hole
> +(and future writes to that area may cause fragmentation or
> +encounter an `ENOSPC` error); if clear, the block is allocated
> +or the server could not otherwise determine its status.  Note
> +that the use of `NBD_CMD_TRIM` is related to this status, but
> +that the server MAY report a hole even where trim has not been
> +requested, and also that a server MAY report metadata even
> +where a trim has been requested. Additionally, clients should be
> +aware that servers may have no information on the storage
> +availability of an export; e.g., an export may be stored on a
> +sparsely-populated storage device itself, even if it doesn't
> +appear to be the case using regular system calls. As such, it is
> +not an error for a server to report an `ENOSPC` error on a
> +region of the file where the `base:allocation` context has
> +`NBD_STATE_HOLE` clear (although servers SHOULD attempt to avoid
> +this situation).
> +  - `NBD_STATE_ZERO` (bit 1): if set, the block contents read as
> +all zeroes; if clear, the block contents are not known.  Note
> +that the use of `NBD_CMD_WRITE_ZEROES` is related to this
> +status, but that the server MAY report zeroes even where write
> +zeroes has not been requested, and also that a server MAY
> +report unknown content even where write zeroes has been
> +requested.

This is the second time you've defined these bits, should one of the
places be a cross-reference rather than repeated text?

> +
> +It is not an error for a server to report that a region of the
> +export has both `NBD_STATE_HOLE` set and `NBD_STATE_ZERO` clear. The
> +contents of such an area is undefined, and may not be stable;
> +clients who are aware of the existence of such a region SHOULD NOT
> +read it.
> +
> +A client MAY terminate the connection if it detects that the server has
> +sent an invalid chunk (such as lengths in the
> +`NBD_REPLY_TYPE_BLOCK_STATUS` not summing up to the requested length).
> +The server SHOULD return `EINVAL` if it receives a `BLOCK_STATUS`
> +request including one or more sectors beyond the size of the device.
> +
>  * Other requests
>  
>  Some third-party implementations may require additional protocol
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extension

2016-12-08 Thread Eric Blake

On 12/08/2016 08:40 AM, Alex Bligh wrote:

>>> + metadata context is the basic "exists at all" metadata context.
>>>
>>> Disagree. You're saying that if a server supports metadata contexts
>>> at all, it must support this one.
>>
>> No, I'm trying to say that this metadata context exposes whether the
>> *block* exists at all (i.e., it exports NBD_STATE_HOLE). I should
>> probably clarify that wording then, if you misunderstood it in that way.
> 
> Ah. Perhaps 'exists at all' itself is misleading. 'Occupies storage
> space'. Or 'is not a hole'?
> 
>> No server MUST implement it (although we might want to make it a SHOULD)
> 
> Don't see why it should even be a 'SHOULD' to be honest. An nbd
> server cooked up for a specific purpose, or with a backend that
> can't provide that (or where there is never a hole) shouldn't
> be criticised.
> 
>>> I think the last sentence is probably meant to read something like:
>>>
>>> If a server supports the "BASE:allocation" metadata context, then
>>> writing to an extent which has `NBD_STATE_HOLE` clear MUST NOT fail
>>> with ENOSPC.
>>
>> No, it can't.
>>
>> Other metadata contexts may change the semantics to the extent that if
>> the block is allocated at the BASE:allocation context, it would still
>> need to space on another plane. In that case, writing to an area which
>> is not STATE_HOLE might still fail.

Not just that, but it is ALWAYS permissible to report NBD_STATE_HOLE as
clear (just not always optimal) - to allow servers that can't determine
sparseness information, but DO know how to communicate extents to the
client.  Yes, it is boring to communicate a single extent that the
entire file is data, and clients can't optimize their usage in that
case, but it should always be considered semantically correct to do so,
since the presence and knowledge of holes is merely an optimization
opportunity, not a data correctness issue.


>>> Why 512 bytes as opposed to 'minimum block size' (or is it because
>>> that is also an experimental extension)?
>>
>> Yes, and this extension does not depend on that one. Hence why it is
>> also a SHOULD and not a MUST.
> 
> OK. As a separate discussion I think we should talk about promoting
> WRITE_ZEROES and INFO. The former has a reference implementation,
> I think Eric did a qemu implementation, and I did a gonbdserver
> implementation. The latter I believe lacks a reference implementation.

Yes, I still have reference qemu patches for INFO; they did not make it
into qemu 2.8 (while WRITE_ZEROES did), but should make it into 2.9.

I also hope to get structured reads into qemu 2.9, but that's a bigger
task, as I don't have reference patches complete yet.  On the other
hand, since BLOCK_STATUS depends on structured reply, I have all the
more reason to complete it soon.

>>> +A client SHOULD NOT read from an area that has both `NBD_STATE_HOLE`
>>> +set and `NBD_STATE_ZERO` clear.
>>>
>>> That makes no sense, as normal data has both these bits clear! This
>>> also implies that to comply with this SHOULD, a client needs to
>>> request block status before any read, which is ridiculous. This
>>> should be dropped.
>>
>> No, it should not, although it may need rewording. It clarifies that
>> having STATE_HOLE set (i.e., there's no valid data in the given range)
>> and STATE_ZERO clear (i.e., we don't assert that it would read as
>> all-zeroes) is not an invalid thing for a server to set. The spec here
>> clarifies what a client should do with that information if it gets it
>> (i.e., "don't read it, it doesn't contain anything interesting").
> 
> That's fair enough until the last bit in brackets. Rather than saying
> a client SHOULD NOT read it, it should simply say that a read on
> such areas will succeed but the data read is undefined (and may
> not be stable).

We should use similar wording to whatever we already say about what a
client would see when reading data cleared by NBD_CMD_TRIM.  After all,
the status of STATE_HOLE set and STATE_ZERO clear is what you logically
get when TRIM cannot guarantee reads-as-zero.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-block] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension

2016-12-07 Thread Eric Blake

On 12/07/2016 04:44 AM, Kevin Wolf wrote:
>> Just because the NBD spec describes the bit does NOT require that
>> servers HAVE to set the bit on all images that are all zeroes.  It is
>> perfectly compliant if the server never advertises the bit.
> 
> True, but if no server exists that would actually make use of the
> feature, it's kind of useless to include it in the spec.

No server, or no client?  I think qemu can be a client fairly easily: if
the server advertises the bit, then the client can set
.bdrv_has_zero_init() and avoid rewriting any sector of a file that is
already known to be zero.

> 
> I think we should have concrete use cases in mind when extending the
> spec, and explain them in the commit message. Just "maybe this could be
> useful for someone sometime" isn't a good enough justification if you
> ask me.

But I think we DO have a concrete case already in mind, both of a client
being able to take advantage of the information if the bit is set, and
of servers that are able to set the bit because they can either
efficiently determine that the file is all-zeroes, or can give the user
a flag to advertise the bit.

> 
> qemu doesn't really know whether an image has been written to since it
> has been created.

It's not so much whether the image has been written to, as much as
easily proving that the image reads as all zeroes.

> The interesting case is probably where the image is
> created externally with qemu-img before it's exported either with
> qemu-nbd or the builtin server, and then we use it as a mirror target.
> 
> Even in the rare cases where both image creation and the NBD server are
> in the same process, bdrv_create() doesn't work on a BlockDriverState,
> but just on a filename. So even then you would have to do hacks like
> remembering file names between create and the first open or something
> like that.

Or add in a driver-specific callback that checks if a file is provably
all-zeroes; for the raw file driver, check if lseek(SEEK_DATA) finds
nothing, for the qcow2 driver, check for no backing files, and no L1
table entries.

>> Another option on the NBD server side is to create a server option -
>> when firing up a server to serve a particular file as an export, the
>> user can explicitly tell the server to advertise the bit because the
>> user has side knowledge that the file was just created (and then the
>> burden of misbehavior is on the user if they mistakenly request the
>> advertisement when it is not true).
> 
> Maybe that's the only practical approach.

But it's still a viable approach, and therefore this bit definition in
the NBD protocol is worthwhile.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH] doc: Propose NBD_FLAG_INIT_ZEROES extension

2016-12-05 Thread Eric Blake

While not directly related to NBD_CMD_WRITE_ZEROES, the qemu
team discovered that it is useful if a server can advertise
whether an export is in a known-all-zeroes state at the time
the client connects.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---
 doc/proto.md | 5 +
 1 file changed, 5 insertions(+)

This replaces the following qemu patch attempt:
https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00357.html
which tried to add NBD_CMD_HAS_ZERO_INIT with poor semantics. The
semantics in this proposal should be much better.

Patch is to the merge of the master branch and the
extension-write-zeroes branch.  By the way, qemu 2.8 is due
to be released "real soon now", and implements NBD_CMD_WRITE_ZEROES,
so maybe it is time to consider promoting the extension-write-zeroes
branch into master.

diff --git a/doc/proto.md b/doc/proto.md
index afe71fc..7e4ec7f 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -697,6 +697,11 @@ The field has the following format:
   the export.
 - bit 9, `NBD_FLAG_SEND_BLOCK_STATUS`: defined by the experimental
   `BLOCK_STATUS` 
[extension](https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md).
+- bit 10, `NBD_FLAG_INIT_ZEROES`: Indicates that the server guarantees
+  that at the time transmission phase begins, all offsets within the
+  export read as zero bytes.  Clients MAY use this information to
+  avoid writing to sections of the export that should still read as
+  zero after the client is done writing.

 Clients SHOULD ignore unknown flags.

-- 
2.9.3


--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH] nbd: use dev_err_ratelimited in io path

2016-12-05 Thread Eric Blake

On 12/05/2016 03:20 PM, Josef Bacik wrote:
> While doing stress tests we noticed that we'd get a lot of dmesg spam if
> we suddenly disconnected teh nbd device out of band.  Rate limite the

s/teh/the/
s/limite/limit/

> messages in the io path in order to deal with this.
> 
> Signed-off-by: Josef Bacik <jba...@fb.com>
> ---
>  drivers/block/nbd.c | 23 ---
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH] build: Ignore copied file during in-tree build

2016-10-20 Thread Eric Blake

Commit b885246 creates a symlink to work around an automake weakness,
but forgot to ignore the link when doing an in-tree build.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---

Hmm. I guess I was so busy testing VPATH builds that I forgot to
do one final test of an in-tree build :)

 .gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitignore b/.gitignore
index b005b11..0f22910 100644
--- a/.gitignore
+++ b/.gitignore
@@ -49,3 +49,4 @@ ltmain.sh
 .libs/
 *.la
 ar-lib
+tests/run/cliserv.c
-- 
2.7.4


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH v2 5/6] nbd: Test recent bug fixes

2016-10-20 Thread Eric Blake

On 10/20/2016 02:31 AM, Wouter Verhelst wrote:
>> Trickier said than done.  There is no 'testflags' parameter passed all
>> the way into setup_connection_common(), so it would require quite a bit
>> of additional framework to decide whether to do normal connection
>> (return a socket ready for further reads) or a short connection (return
>> a socket still in the middle of negotiation so we can test further
>> options).  But I'll play with the idea.
> 
> Change the CONECTION_TYPE enum, adding a CONNECTION_TYPE_WRONGOPT (or some
> such) to it? That enum was meant as that "framework" ;-)
> 

my v3 changed the TEST_ enum instead, but I'll see if I can redo it to
trigger off of a CONNECTION_TYPE_ value.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH] maint: Let emacs know our preferred style

2016-10-20 Thread Eric Blake

Teach emacs that this entire source tree uses Linux style
indentation (8-space indents, but with hard TAB character
instead of spaces).  Without this hint, the default emacs
C style tries to use 2-space indentation, making it a pain
to edit correctly when automatic formatting is enabled.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---
 .dir-locals.el | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 .dir-locals.el

diff --git a/.dir-locals.el b/.dir-locals.el
new file mode 100644
index 000..046e004
--- /dev/null
+++ b/.dir-locals.el
@@ -0,0 +1,2 @@
+((c-mode . ((c-file-style . "linux")
+   (indent-tabs-mode . t
-- 
2.7.4


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] ideal emacs settings?

2016-10-20 Thread Eric Blake

On 10/20/2016 03:55 AM, Alex Bligh wrote:
> 
>> On 19 Oct 2016, at 23:07, Eric Blake <ebl...@redhat.com> wrote:
>>
>> I noticed that nbd source code prefers the use of hard tab instead of
>> spaces for indentation, and emacs does not play nicely with that by
>> default. Does anyone have the proper emacs configuration for easier
>> editing of this codebase that doesn't butcher existing style? And if so,
>> can we add it within a .dir-locals.el file to make it apply to fresh git
>> checkouts?
>>
> 
> From memory you want to set the 'linux' C mode.

Yep, that appears to do it.  Patch coming up.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] ideal emacs settings?

2016-10-20 Thread Eric Blake

On 10/20/2016 02:38 AM, Wouter Verhelst wrote:
> On Wed, Oct 19, 2016 at 05:07:11PM -0500, Eric Blake wrote:
>> I noticed that nbd source code prefers the use of hard tab instead of
>> spaces for indentation, and emacs does not play nicely with that by
>> default. Does anyone have the proper emacs configuration for easier
>> editing of this codebase that doesn't butcher existing style? And if so,
>> can we add it within a .dir-locals.el file to make it apply to fresh git
>> checkouts?
> 
> I don't insist on tabs. If you prefer spaces, we can move to spaces; I'll just
> have to add "set expandtab" to my .vimrc ;-)

I don't mind tabs, and I'd rather not reindent the whole code-base. But
what I do mind is that emacs defaults to thinking that code should use
2-space indents (whether it uses tabs or spaces for every 8 spaces is a
separate setting); while nbd uses 8-space indents (with 8 spaces written
as tab).  Emacs can be taught to live with that, and I was just seeing
if someone knew the magic incantation to do so; and if so, if we can
then automate it by adding a file to the repository.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH v3 5/6] tests: Cover recent bug fixes

2016-10-19 Thread Eric Blake

Add a new test 'handshake' that intentionally provokes a server error
during option negotiation, before falling back to NBD_OPT_ABORT to
end negotiation, to prove that the server is correctly allowing a
client to fall back to known options; thus covering two recent bug
fixes for a server sending the wrong length in an error reply, and
for a server not reading enough data when replying to an unknown
command.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---

As requested by Wouter, split this off into an independent test,
rather than impacting the startup of all other tests.

 tests/run/Makefile.am |   4 +-
 tests/run/nbd-tester-client.c | 122 +-
 tests/run/simple_test |  13 +
 3 files changed, 126 insertions(+), 13 deletions(-)

diff --git a/tests/run/Makefile.am b/tests/run/Makefile.am
index c9cfa8f..4ba7e57 100644
--- a/tests/run/Makefile.am
+++ b/tests/run/Makefile.am
@@ -1,5 +1,6 @@
 TESTS_ENVIRONMENT=$(srcdir)/simple_test
-TESTS = cfg1 cfgmulti cfgnew cfgsize write flush integrity dirconfig list 
rowrite tree rotree unix integrityhuge
+TESTS = cfg1 cfgmulti cfgnew cfgsize write flush integrity dirconfig list \
+   rowrite tree rotree unix integrityhuge handshake
 check_PROGRAMS = nbd-tester-client
 nbd_tester_client_SOURCES = nbd-tester-client.c $(top_srcdir)/cliserv.h 
$(top_srcdir)/netdb-compat.h $(top_srcdir)/cliserv.c
 nbd_tester_client_CFLAGS = @CFLAGS@ @GLIB_CFLAGS@
@@ -20,3 +21,4 @@ rowrite:
 tree:
 rotree:
 unix:
+handshake:
diff --git a/tests/run/nbd-tester-client.c b/tests/run/nbd-tester-client.c
index 28db8ee..56e1f46 100644
--- a/tests/run/nbd-tester-client.c
+++ b/tests/run/nbd-tester-client.c
@@ -272,6 +272,7 @@ int writebuffer(int fd, struct chunklist *l)
 #define TEST_WRITE (1<<0)
 #define TEST_FLUSH (1<<1)
 #define TEST_EXPECT_ERROR (1<<2)
+#define TEST_HANDSHAKE (1<<3)

 int timeval_subtract(struct timeval *result, struct timeval *x,
 struct timeval *y)
@@ -348,7 +349,7 @@ static inline int write_all(int f, void *buf, size_t len)
 #define WRITE_ALL_ERR_RT(f, buf, len, whereto, rval, errmsg...) 
if((write_all(f, buf, len))<=0) { snprintf(errstr, errstr_len, ##errmsg); 
retval = rval; goto whereto; }

 int setup_connection_common(int sock, char *name, CONNECTION_TYPE ctype,
-   int *serverflags)
+   int *serverflags, int testflags)
 {
char buf[256];
u64 tmp64;
@@ -403,6 +404,14 @@ int setup_connection_common(int sock, char *name, 
CONNECTION_TYPE ctype,
negotiationflags = htonl(negotiationflags);
WRITE_ALL_ERRCHK(sock, , sizeof(negotiationflags), err,
 "Could not write reserved field: %s", strerror(errno));
+   if (testflags & TEST_HANDSHAKE) {
+   /* Server must support newstyle for this test */
+   if (!(handshakeflags & NBD_FLAG_FIXED_NEWSTYLE)) {
+   strncpy(errstr, "server does not support handshake", 
errstr_len);
+   goto err;
+   }
+   goto end;
+   }
/* magic */
tmp64 = htonll(opts_magic);
WRITE_ALL_ERRCHK(sock, , sizeof(tmp64), err,
@@ -435,7 +444,7 @@ end:
 }

 int setup_unix_connection(gchar * unixsock, gchar * name, CONNECTION_TYPE 
ctype,
- int *serverflags)
+ int *serverflags, int testflags)
 {
struct sockaddr_un addr;
int sock;
@@ -457,7 +466,7 @@ int setup_unix_connection(gchar * unixsock, gchar * name, 
CONNECTION_TYPE ctype,
strncpy(errstr, strerror(errno), errstr_len);
goto err_open;
}
-   sock = setup_connection_common(sock, name, ctype, serverflags);
+   sock = setup_connection_common(sock, name, ctype, serverflags, 
testflags);
goto end;
 err_open:
close(sock);
@@ -468,7 +477,7 @@ end:
 }

 int setup_inet_connection(gchar * hostname, int port, gchar * name,
- CONNECTION_TYPE ctype, int *serverflags)
+ CONNECTION_TYPE ctype, int *serverflags, int 
testflags)
 {
int sock;
struct hostent *host;
@@ -493,7 +502,7 @@ int setup_inet_connection(gchar * hostname, int port, gchar 
* name,
strncpy(errstr, strerror(errno), errstr_len);
goto err_open;
}
-   sock = setup_connection_common(sock, name, ctype, serverflags);
+   sock = setup_connection_common(sock, name, ctype, serverflags, 
testflags);
goto end;
 err_open:
close(sock);
@@ -504,14 +513,14 @@ end:
 }

 int setup_connection(gchar * hostname, gchar * unixsock, int port, gchar * 
name,
-CONNECTION_TYPE ctype, int *serverflags)
+CONNECTION_TYPE ctype, int *serverflags, int testflags)
 {
if (hostname != NULL) {

[Nbd] ideal emacs settings?

2016-10-19 Thread Eric Blake

I noticed that nbd source code prefers the use of hard tab instead of
spaces for indentation, and emacs does not play nicely with that by
default. Does anyone have the proper emacs configuration for easier
editing of this codebase that doesn't butcher existing style? And if so,
can we add it within a .dir-locals.el file to make it apply to fresh git
checkouts?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH v2 5/6] nbd: Test recent bug fixes

2016-10-19 Thread Eric Blake

On 10/18/2016 06:07 AM, Wouter Verhelst wrote:
> On Mon, Oct 17, 2016 at 03:23:39PM -0500, Eric Blake wrote:
>> Add a test of intentionally provoking a server error during option
>> negotiation to prove that a client can still fall back to known
>> options; thus covering two recent bug fixes for a server sending
>> the wrong length in an error reply, and for a server not reading
>> enough data when replying to an unknown command.
> 
> While this is a good idea, it might be better to do this conditionally,
> and add another test to TESTS which triggers it. That way, if we break
> it again in the future, the reason why will be more obvious.

Trickier said than done.  There is no 'testflags' parameter passed all
the way into setup_connection_common(), so it would require quite a bit
of additional framework to decide whether to do normal connection
(return a socket ready for further reads) or a short connection (return
a socket still in the middle of negotiation so we can test further
options).  But I'll play with the idea.

> 
> (we can then, after successfully negotiating, send NBD_OPT_ABORT and not
> do anything...)

That part makes sense - sticking the handshake negotiation testing in a
dedicated test, where there is no subsequent I/O, is indeed a nice
separation of tasks when trying to debug any test failures.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH v3 1/2] build: Distribute netdb-compat.h without relying on tests

2016-10-19 Thread Eric Blake

nbd-server depends on netdb-compat.h; however, we were only
including it in the tarball as a side effect of it also being
used by the testsuite.  Make the dependency explicit.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---
 Makefile.am | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile.am b/Makefile.am
index 32774e3..36d35b6 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -6,7 +6,8 @@ noinst_LTLIBRARIES = libnbdsrv.la libcliserv.la
 libcliserv_la_SOURCES = cliserv.h cliserv.c
 libcliserv_la_CFLAGS = @CFLAGS@
 nbd_client_SOURCES = nbd-client.c cliserv.h
-nbd_server_SOURCES = nbd-server.c cliserv.h lfs.h nbd.h nbdsrv.h backend.h
+nbd_server_SOURCES = nbd-server.c cliserv.h lfs.h nbd.h nbdsrv.h backend.h \
+   netdb-compat.h
 nbd_trdump_SOURCES = nbd-trdump.c cliserv.h nbd.h
 nbd_server_CFLAGS = @CFLAGS@ @GLIB_CFLAGS@ @GnuTLS_CFLAGS@
 nbd_trdump_CFLAGS = @CFLAGS@ @GLIB_CFLAGS@
-- 
2.7.4


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH v3 2/2] build: Silence autogen.sh warnings

2016-10-19 Thread Eric Blake

Starting from a fresh git checkout, running ./autogen.sh gives a
couple of warnings on my Fedora 24 build tools, one from libtool:

libtoolize: Consider adding '-I support' to ACLOCAL_AMFLAGS in Makefile.am.

and one from automake:

tests/run/Makefile.am:4: warning: source file '$(top_srcdir)/cliserv.c' is in a 
subdirectory,
tests/run/Makefile.am:4: but option 'subdir-objects' is disabled
automake: warning: possible forward-incompatibility.
automake: At least a source file is in a subdirectory, but the 'subdir-objects'
automake: automake option hasn't been enabled.  For now, the corresponding 
output
automake: object file(s) will be placed in the top-level directory.  However,
automake: this behaviour will change in future Automake versions: they will
automake: unconditionally cause object files to be placed in the same 
subdirectory
automake: of the corresponding sources.
automake: You are advised to start using 'subdir-objects' option throughout your
automake: project, to avoid future incompatibilities.

Following the advice almost works, except that automake 1.15 still
has a nasty bug (https://debbugs.gnu.org/cgi/bugreport.cgi?bug=13928)
where use of $(foo) in a _SOURCES variable coupled with subdir-objects
creates a directory with a literal name $(foo) rather than the intended
name.  And while open-coding it (using ../../ instead of $(top_srcdir)/)
works around the problem of bad naming in automake 1.15, it fails for
automake 1.11 (hello CentOS 6), due to the order in which .deps files
are erased vs. included in makefiles.  The solution that works across
all automake versions is to only stick files in _SOURCES that do not
live outside of the subtree.  Note that we only need to copy cliserv.c
into the tests/run directory; automakes handling of dependencies will
still rebuild against .h file changes even without listing the .h files
or copying them locally.

I also noticed that the build was already leaving behind an untracked
manpage.log file, in addition to the new .dirstamp witness file created
by our new use of subdir-objects.

This patch has been tested with 'make distcheck' across multiple
automake and libtool versions, ranging from CentOS 6 vintage to current
git toolchains.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---
 .gitignore| 2 ++
 Makefile.am   | 1 +
 configure.ac  | 2 +-
 tests/run/Makefile.am | 7 ++-
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/.gitignore b/.gitignore
index b8163e0..b005b11 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 .deps
+.dirstamp
 Makefile
 autom4te.cache
 autoscan.log
@@ -38,6 +39,7 @@ install-sh
 configure
 man/*.sh
 man/*.sh.in
+man/manpage.log
 make-integrityhuge
 nbd-trdump
 missing
diff --git a/Makefile.am b/Makefile.am
index 36d35b6..34f4697 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -1,3 +1,4 @@
+ACLOCAL_AMFLAGS = -I support
 SUBDIRS = . man doc tests systemd gznbd
 bin_PROGRAMS = nbd-server nbd-trdump
 sbin_PROGRAMS = @NBD_CLIENT_NAME@
diff --git a/configure.ac b/configure.ac
index 83e4f91..ce225a6 100644
--- a/configure.ac
+++ b/configure.ac
@@ -11,7 +11,7 @@ m4_define([serial_tests], [
   awk '{split ($NF,a,"."); if (a[1] == 1 && a[2] >= 12) { print 
"serial-tests" }}'
   ])
 ])
-AM_INIT_AUTOMAKE(foreign dist-xz serial_tests)
+AM_INIT_AUTOMAKE(foreign dist-xz serial_tests subdir-objects)
 AM_MAINTAINER_MODE([enable])
 AC_CONFIG_MACRO_DIR([support])
 LT_INIT
diff --git a/tests/run/Makefile.am b/tests/run/Makefile.am
index c9cfa8f..23b01e3 100644
--- a/tests/run/Makefile.am
+++ b/tests/run/Makefile.am
@@ -1,7 +1,12 @@
 TESTS_ENVIRONMENT=$(srcdir)/simple_test
 TESTS = cfg1 cfgmulti cfgnew cfgsize write flush integrity dirconfig list 
rowrite tree rotree unix integrityhuge
 check_PROGRAMS = nbd-tester-client
-nbd_tester_client_SOURCES = nbd-tester-client.c $(top_srcdir)/cliserv.h 
$(top_srcdir)/netdb-compat.h $(top_srcdir)/cliserv.c
+## Various Automake versions don't play nice with files in parent
+## directories, so instead work with a local copy
+cliserv.c:
+   rm -f cliserv.c
+   ln -s $(top_srcdir)/cliserv.c cliserv.c
+nbd_tester_client_SOURCES = nbd-tester-client.c cliserv.c
 nbd_tester_client_CFLAGS = @CFLAGS@ @GLIB_CFLAGS@
 nbd_tester_client_CPPFLAGS = -I$(top_srcdir)
 nbd_tester_client_LDADD = @GLIB_LIBS@
-- 
2.7.4


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH v3 0/2] silence build warnings

2016-10-19 Thread Eric Blake

These two patches replace 1/6 in my previous posting, while
2-6 of that series remain identical (and can even be applied
independently prior to this respin, other than the review
comment on 5/6 that still needs addressing).

Eric Blake (2):
  build: Distribute netdb-compat.h without relying on tests
  build: Silence autogen.sh warnings

 .gitignore| 2 ++
 Makefile.am   | 4 +++-
 configure.ac  | 2 +-
 tests/run/Makefile.am | 7 ++-
 4 files changed, 12 insertions(+), 3 deletions(-)

-- 
2.7.4


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH v2 1/6] build: Silence autogen.sh warnings

2016-10-18 Thread Eric Blake

On 10/18/2016 06:01 AM, Wouter Verhelst wrote:

> 
> The reason for using $(top_srcdir) is so that VPATH builds don't go
> bollocks. Did you do a "make distcheck" with this?

VPATH build using automake 1.15 worked just fine (as in:
$ ./autogen.sh
$ mkdir ../nbd-build
$ cd ../nbd-build
$ ../nbd/configure
$ make distcheck
)

> 
> Also, did you check that this doesn't break on older automake versions?

I ended up checking out commit 485cf84 (the last commit prior to your
TLS work started), and tested that on RHEL 6 (automake 1.11.1), 'make
distcheck' worked on that patch.  Then I did a cherry-pick of just this
patch on top.  The cherry-pick wasn't quite clean (there was a context
conflict in tests/run/Makefile.am); but the resolution was obvious.
Sadly, your prediction was right; 'make distcheck' complained:

...
make[2]: Leaving directory
`/home/dummy/nbd-build/nbd-3.14-11-gf814c28/_build/man'
Making distclean in .
make[2]: Entering directory
`/home/dummy/nbd-build/nbd-3.14-11-gf814c28/_build'
Makefile:475: .deps/libcliserv_la-cliserv.Plo: No such file or directory

Of course, the hard-fail during configure regarding TLS prevents testing
this patch on top of current nbd.git in that environment, but I'm still
working on seeing if I can make TLS support optional and default to off
if the system libraries are too old.

> (not critical, but I just recently cherry-picked two commits from Josef
> to allow building on CentOS 6, would be silly if we lose that again...)

I'll make sure it works before sending v3 of this patch.  However, the
other patches in this series should still be usable as-is without this
one going in.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH v2 1/6] build: Silence autogen.sh warnings

2016-10-18 Thread Eric Blake

On 10/18/2016 09:12 AM, Eric Blake wrote:
> On 10/18/2016 06:01 AM, Wouter Verhelst wrote:
> 
>>> Following the advice almost works, except that automake 1.15 still
>>> has a nasty bug (https://debbugs.gnu.org/cgi/bugreport.cgi?bug=13928)
>>> where use of $(foo) in a _SOURCES variable coupled with subdir-objects
>>> creates a directory with a literal name $(foo) rather than the intended
>>> name.  But as long as we only care about $(srcdir) (or parent
>>> directories), the solution is to just not use $(srcdir) in _SOURCES,
>>> and instead open-code the traversal to the desired files.
>>
>> The reason for using $(top_srcdir) is so that VPATH builds don't go
>> bollocks. Did you do a "make distcheck" with this?
> 
> Not yet; will do and report back.

Initial testing: broken before my patches:

checking for GnuTLS... configure: error: Package requirements (gnutls >=
3.3.0) were not met:

Requested 'gnutls >= 3.3.0' but version of GnuTLS is 2.8.5

You'll want to make the new TLS code conditional, and merely disable TLS
rather than fail the entire build if you don't have a new enough library
to rely on.  I might be able to develop that patch too...

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH v2 1/6] build: Silence autogen.sh warnings

2016-10-18 Thread Eric Blake

On 10/18/2016 06:01 AM, Wouter Verhelst wrote:

>> Following the advice almost works, except that automake 1.15 still
>> has a nasty bug (https://debbugs.gnu.org/cgi/bugreport.cgi?bug=13928)
>> where use of $(foo) in a _SOURCES variable coupled with subdir-objects
>> creates a directory with a literal name $(foo) rather than the intended
>> name.  But as long as we only care about $(srcdir) (or parent
>> directories), the solution is to just not use $(srcdir) in _SOURCES,
>> and instead open-code the traversal to the desired files.
> 
> The reason for using $(top_srcdir) is so that VPATH builds don't go
> bollocks. Did you do a "make distcheck" with this?

Not yet; will do and report back.

> 
> Also, did you check that this doesn't break on older automake versions?
> (not critical, but I just recently cherry-picked two commits from Josef
> to allow building on CentOS 6, would be silly if we lose that again...)

And that will be part of my testing.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH v2 1/6] build: Silence autogen.sh warnings

2016-10-17 Thread Eric Blake

Starting from a fresh git checkout, running ./autogen.sh gives a
couple of warnings on my Fedora 24 build tools, one from libtool:

libtoolize: Consider adding '-I support' to ACLOCAL_AMFLAGS in Makefile.am.

and one from automake:

tests/run/Makefile.am:4: warning: source file '$(top_srcdir)/cliserv.c' is in a 
subdirectory,
tests/run/Makefile.am:4: but option 'subdir-objects' is disabled
automake: warning: possible forward-incompatibility.
automake: At least a source file is in a subdirectory, but the 'subdir-objects'
automake: automake option hasn't been enabled.  For now, the corresponding 
output
automake: object file(s) will be placed in the top-level directory.  However,
automake: this behaviour will change in future Automake versions: they will
automake: unconditionally cause object files to be placed in the same 
subdirectory
automake: of the corresponding sources.
automake: You are advised to start using 'subdir-objects' option throughout your
automake: project, to avoid future incompatibilities.

Following the advice almost works, except that automake 1.15 still
has a nasty bug (https://debbugs.gnu.org/cgi/bugreport.cgi?bug=13928)
where use of $(foo) in a _SOURCES variable coupled with subdir-objects
creates a directory with a literal name $(foo) rather than the intended
name.  But as long as we only care about $(srcdir) (or parent
directories), the solution is to just not use $(srcdir) in _SOURCES,
and instead open-code the traversal to the desired files.

I also noticed that the build was already leaving behind an untracked
manpage.log file, in addition to the new .dirstamp witness file created
by our new use of subdir-objects.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---
 .gitignore| 2 ++
 Makefile.am   | 1 +
 configure.ac  | 2 +-
 tests/run/Makefile.am | 2 +-
 4 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/.gitignore b/.gitignore
index b8163e0..b005b11 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 .deps
+.dirstamp
 Makefile
 autom4te.cache
 autoscan.log
@@ -38,6 +39,7 @@ install-sh
 configure
 man/*.sh
 man/*.sh.in
+man/manpage.log
 make-integrityhuge
 nbd-trdump
 missing
diff --git a/Makefile.am b/Makefile.am
index 32774e3..c1740d6 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -1,3 +1,4 @@
+ACLOCAL_AMFLAGS = -I support
 SUBDIRS = . man doc tests systemd gznbd
 bin_PROGRAMS = nbd-server nbd-trdump
 sbin_PROGRAMS = @NBD_CLIENT_NAME@
diff --git a/configure.ac b/configure.ac
index 83e4f91..ce225a6 100644
--- a/configure.ac
+++ b/configure.ac
@@ -11,7 +11,7 @@ m4_define([serial_tests], [
   awk '{split ($NF,a,"."); if (a[1] == 1 && a[2] >= 12) { print 
"serial-tests" }}'
   ])
 ])
-AM_INIT_AUTOMAKE(foreign dist-xz serial_tests)
+AM_INIT_AUTOMAKE(foreign dist-xz serial_tests subdir-objects)
 AM_MAINTAINER_MODE([enable])
 AC_CONFIG_MACRO_DIR([support])
 LT_INIT
diff --git a/tests/run/Makefile.am b/tests/run/Makefile.am
index c9cfa8f..b790c3f 100644
--- a/tests/run/Makefile.am
+++ b/tests/run/Makefile.am
@@ -1,7 +1,7 @@
 TESTS_ENVIRONMENT=$(srcdir)/simple_test
 TESTS = cfg1 cfgmulti cfgnew cfgsize write flush integrity dirconfig list 
rowrite tree rotree unix integrityhuge
 check_PROGRAMS = nbd-tester-client
-nbd_tester_client_SOURCES = nbd-tester-client.c $(top_srcdir)/cliserv.h 
$(top_srcdir)/netdb-compat.h $(top_srcdir)/cliserv.c
+nbd_tester_client_SOURCES = nbd-tester-client.c ../../cliserv.h 
../../netdb-compat.h ../../cliserv.c
 nbd_tester_client_CFLAGS = @CFLAGS@ @GLIB_CFLAGS@
 nbd_tester_client_CPPFLAGS = -I$(top_srcdir)
 nbd_tester_client_LDADD = @GLIB_LIBS@
-- 
2.7.4


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH v2 2/6] server: Fix botched strlen computation of error message

2016-10-17 Thread Eric Blake

Commit 3b80382 tried to make it easy for the server to send an
error message whose length was determined by strlen(), but ended
up sending a length of UINT32_MAX, causing clients to either
hang up (reply too large) or wait for nearly 4G of data that was
never coming.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---
 nbd-server.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/nbd-server.c b/nbd-server.c
index 25c335b..d0c6fa6 100644
--- a/nbd-server.c
+++ b/nbd-server.c
@@ -1321,11 +1321,12 @@ static void send_reply(CLIENT* client, uint32_t opt, 
uint32_t reply_type, ssize_
htonl(reply_type),
htonl(datasize),
};
+   if(datasize < 0) {
+   datasize = strlen((char*)data);
+   header.datasize = htonl(datasize);
+   }
socket_write(client, , sizeof(header));
if(datasize != 0) {
-   if(datasize < 0) {
-   datasize = strlen((char*)data);
-   }
socket_write(client, data, datasize);
}
 }
-- 
2.7.4


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH v2 3/6] server: Swap argument order in consume()

2016-10-17 Thread Eric Blake

The signature of consume() threw me off.  Good design says that if
you are going to have paired parameters (buf and bufsize), you
generally want them adjacent, not separated by an unrelated parameter
(len).  Move len to be first, adjusting all callers.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---
 nbd-server.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/nbd-server.c b/nbd-server.c
index d0c6fa6..c93a9d8 100644
--- a/nbd-server.c
+++ b/nbd-server.c
@@ -359,12 +359,12 @@ static void socket_read(CLIENT* client, void *buf, size_t 
len) {
 /**
  * Consume data from a socket that we don't want
  *
- * @param f a file descriptor
- * @param buf a buffer
+ * @param c the client to read from
  * @param len the number of bytes to consume
+ * @param buf a buffer
  * @param bufsiz the size of the buffer
  **/
-static inline void consume(CLIENT* c, void * buf, size_t len, size_t bufsiz) {
+static inline void consume(CLIENT* c, size_t len, void * buf, size_t bufsiz) {
size_t curlen;
while (len>0) {
curlen = (len>bufsiz)?bufsiz:len;
@@ -1853,14 +1853,14 @@ int mainloop(CLIENT *client) {
(client->server->flags & F_AUTOREADONLY)) {
DEBUG("[WRITE to READONLY!]");
ERROR(client, reply, EPERM);
-   consume(client, buf, len-currlen, 
BUFSIZE);
+   consume(client, len-currlen, buf, 
BUFSIZE);
continue;
}
if (expwrite(request.from, buf, currlen, client,
 request.type & NBD_CMD_FLAG_FUA)) {
DEBUG("Write failed: %m" );
ERROR(client, reply, errno);
-   consume(client, buf, len-currlen, 
BUFSIZE);
+   consume(client, len-currlen, buf, 
BUFSIZE);
continue;
}
len -= currlen;
-- 
2.7.4


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH v2 6/6] server: Read client's TLS length data before next option

2016-10-17 Thread Eric Blake

Any client attempting to probe support for a new option, such as
NBD_OPT_STARTTLS or NBD_OPT_GO, with plans to do a graceful
fallback to older methods, will fail in its attempt if the server
does not consume the length field and potential payload of the
unrecognized (or rejected) option, because the server will then
be reading out of sync and not see the client's magic for the
next option.  While it is true that sane clients are unlikely to
send more than one NBD_OPT_STARTTLS, and thus never trigger some
of the paths in this patch, it is still better to be robust for
all clients.

Furthermore, even if the server requires TLS, and rejects all but
NBD_OPT_STARTTLS as the first valid option, it should still honor
NBD_OPT_ABORT.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---
 nbd-server.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/nbd-server.c b/nbd-server.c
index 4b0692d..d5b2013 100644
--- a/nbd-server.c
+++ b/nbd-server.c
@@ -1507,6 +1507,11 @@ CLIENT* negotiate(int net, GArray* servers, const gchar* 
tlsdir) {
// so must do hard close
goto hard_close;
}
+   if(opt == NBD_OPT_ABORT) {
+   // handled below
+   break;
+   }
+   consume_len(client);
send_reply(client, opt, NBD_REP_ERR_TLS_REQD, -1, "TLS 
is required on this server");
continue;
}
@@ -1529,11 +1534,14 @@ CLIENT* negotiate(int net, GArray* servers, const 
gchar* tlsdir) {
break;
case NBD_OPT_STARTTLS:
if(client->tls_session != NULL) {
+   consume_len(client);
send_reply(client, opt, NBD_REP_ERR_INVALID, 
-1, "Invalid STARTTLS request: TLS has already been negotiated!");
continue;
}
if(tlsdir == NULL) {
+   consume_len(client);
send_reply(client, opt, NBD_REP_ERR_POLICY, -1, 
"TLS not allowed on this server");
+   continue;
}
if(handle_starttls(client, opt, servers, cflags, 
tlsdir) == NULL) {
// can't recover from failed TLS negotiation.
-- 
2.7.4


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH v2 4/6] server: Read client's unknown option length before next option

2016-10-17 Thread Eric Blake

Any client attempting to probe support for a new option, such as
NBD_OPT_STARTTLS or NBD_OPT_GO, with plans to do a graceful
fallback to older methods, will fail in its attempt if the server
does not consume the length field and potential payload of the
unrecognized (or rejected) option, because the server will then
be reading out of sync and not see the client's magic for the
next option.  This bug has been latent in the reference
server since commit 626c2a3 in 2012, even though it is EXACTLY
the bug that NBD_FLAG_FIXED_NEWSTYLE was designed to prevent.

The only reason it has been buggy for so long is that it has
taken us this long to finally want to implement clients that use
a new option.

This patch fixes only the portion of the server that has been
previously released, to make backports easier. The new code for
handling TLS also needs fixing, in the next patch.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---
 nbd-server.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/nbd-server.c b/nbd-server.c
index c93a9d8..4b0692d 100644
--- a/nbd-server.c
+++ b/nbd-server.c
@@ -373,6 +373,21 @@ static inline void consume(CLIENT* c, size_t len, void * 
buf, size_t bufsiz) {
}
 }

+/**
+ * Consume a length field and corresponding payload that we don't want
+ *
+ * @param c the client to read from
+ **/
+static inline void consume_len(CLIENT* c) {
+   uint32_t len;
+   char buf[1024];
+
+   socket_read(c, , sizeof(len));
+   len = ntohl(len);
+   consume(c, len, buf, sizeof(buf));
+}
+
+
 static void socket_write(CLIENT* client, void *buf, size_t len) {
g_assert(client->socket_write != NULL);
client->socket_write(client, buf, len);
@@ -1525,6 +1540,7 @@ CLIENT* negotiate(int net, GArray* servers, const gchar* 
tlsdir) {
goto hard_close;
}
default:
+   consume_len(client);
send_reply(client, opt, NBD_REP_ERR_UNSUP, -1, "The 
given option is unknown to this server implementation");
break;
}
-- 
2.7.4


--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH] build: Silence autogen.sh warnings

2016-10-17 Thread Eric Blake

On 10/14/2016 03:19 PM, Eric Blake wrote:
> Starting from a fresh git checkout, running ./autogen.sh gives a
> couple of warnings on my Fedora 24 build tools, one from libtool:
> 
> libtoolize: Consider adding '-I support' to ACLOCAL_AMFLAGS in Makefile.am.
> 
> and one from automake:
> 
> tests/run/Makefile.am:4: warning: source file '$(top_srcdir)/cliserv.c' is in 
> a subdirectory,
> tests/run/Makefile.am:4: but option 'subdir-objects' is disabled
> automake: warning: possible forward-incompatibility.
> automake: At least a source file is in a subdirectory, but the 
> 'subdir-objects'
> automake: automake option hasn't been enabled.  For now, the corresponding 
> output
> automake: object file(s) will be placed in the top-level directory.  However,
> automake: this behaviour will change in future Automake versions: they will
> automake: unconditionally cause object files to be placed in the same 
> subdirectory
> automake: of the corresponding sources.
> automake: You are advised to start using 'subdir-objects' option throughout 
> your
> automake: project, to avoid future incompatibilities.
> 
> Following the advice doesn't break anything, so let's do it.

Except my patch did break something. NACK; this will need v2.

> 
> I also noticed that the build leaves behind an untracked file.
> 
> Signed-off-by: Eric Blake <ebl...@redhat.com>
> ---
>  .gitignore   | 1 +
>  Makefile.am  | 1 +
>  configure.ac | 2 +-
>  3 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/.gitignore b/.gitignore
> index b8163e0..a13897a 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -38,6 +38,7 @@ install-sh
>  configure
>  man/*.sh
>  man/*.sh.in
> +man/manpage.log
>  make-integrityhuge
>  nbd-trdump
>  missing
> diff --git a/Makefile.am b/Makefile.am
> index 32774e3..c1740d6 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -1,3 +1,4 @@
> +ACLOCAL_AMFLAGS = -I support
>  SUBDIRS = . man doc tests systemd gznbd
>  bin_PROGRAMS = nbd-server nbd-trdump
>  sbin_PROGRAMS = @NBD_CLIENT_NAME@
> diff --git a/configure.ac b/configure.ac
> index 83e4f91..ce225a6 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -11,7 +11,7 @@ m4_define([serial_tests], [
>awk '{split ($NF,a,"."); if (a[1] == 1 && a[2] >= 12) { print 
> "serial-tests" }}'
>])
>  ])
> -AM_INIT_AUTOMAKE(foreign dist-xz serial_tests)
> +AM_INIT_AUTOMAKE(foreign dist-xz serial_tests subdir-objects)
>  AM_MAINTAINER_MODE([enable])
>  AC_CONFIG_MACRO_DIR([support])
>  LT_INIT
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] build failure

2016-10-17 Thread Eric Blake

On 10/15/2016 01:46 PM, Wouter Verhelst wrote:
> On Fri, Oct 14, 2016 at 03:11:55PM -0500, Eric Blake wrote:
>> On 10/14/2016 01:32 PM, Wouter Verhelst wrote:
>>> On Thu, Oct 13, 2016 at 05:33:18PM -0500, Eric Blake wrote:
>>>> I'm getting this failure when trying to build NBD, as part of
>>>> 'autoreconf -vfi':
>>>>
>>>> configure.ac:248: error: required file 'systemd/n...@.service.sh.in' not
>>>> found
>>>
>>> That's why we have autogen.sh ;-)
>>
>> Okay, that got me further, but there's still warnings about improper
>> automake usage, and then still a fatal build failure during make:
>>
>> make[3]: Entering directory '/home/eblake/nbd/tests/run'
>> Makefile:403: ../../.deps/nbd_tester_client-cliserv.Po: No such file or
>> directory
>> make[3]: *** No rule to make target
>> '../../.deps/nbd_tester_client-cliserv.Po'.  Stop.
> 
> That shouldn't happen, unless you've done something terribly wrong.

Oh, I see what happened. My attempted patch to silence automake warnings
is not quite right, because now I get:
$ find -name nbd_tester_client\*
./tests/run/$(top_srcdir)/.deps/nbd_tester_client-cliserv.Po
./tests/run/.deps/nbd_tester_client-nbd-tester-client.Po

Whoops - that literal '$(top_srcdir)' directory name is a (known)
automake issue, so I'll have to fix my patch.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] TLS implementation in reference nbd-server

2016-10-14 Thread Eric Blake

On 10/14/2016 01:18 PM, Wouter Verhelst wrote:

>>> If you want to check it out, just run nbd-server from git master.
>>> Feedback (and/or review) welcome :-)
>>
>> I'm happy to have a detailed look at this later (and indeed
>> do some interoperability testing - I'll see if I can dig out
>> the qemu-img command line I used to test gonbdserver),

I've also got some experience creating qemu-nbd command lines for
interoperability; here's one with new enough qemu:

$  qemu-nbd --object tls-creds-x509,id=tls0,endpoint=server,\
dir=/path/to/qemutls --tls-creds tls0 --exportname default

where /path/to/qemutls contains the necessary certificate files.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH] server: Read client's length data before next option

2016-10-14 Thread Eric Blake

On 10/14/2016 04:02 PM, Eric Blake wrote:

>  /**
>   * Consume data from a socket that we don't want
>   *
> - * @param f a file descriptor
> + * @param c the client data stream
>   * @param buf a buffer
>   * @param len the number of bytes to consume
>   * @param bufsiz the size of the buffer
> @@ -373,6 +373,21 @@ static inline void consume(CLIENT* c, void * buf, size_t 
> len, size_t bufsiz) {

This signature threw me off.  Good design says that if you are going to
have paired parameters (buf and bufsize), you generally want them
adjacent, not separated by an unrelated parameter (len).  Shall I submit
the obvious patch to swap parameter order and update all callers?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH] server: Read client's length data before next option

2016-10-14 Thread Eric Blake

On 10/14/2016 04:02 PM, Eric Blake wrote:
> Any client attempting to probe support for a new option, such as
> NBD_OPT_STARTTLS or NBD_OPT_GO, with plans to do a graceful
> fallback to older methods, will fail in its attempt if the server
> does not ignore the length field and potential payload of the
> unrecognized (or rejected) option, because the server will then
> be reading out of sync and not see the client's magic for the
> next option.  This bug has been latent in the reference
> server since commit 626c2a3 in 2012, even though it is EXACTLY
> the bug that NBD_FLAG_FIXED_NEWSTYLE was designed to prevent.

Given that we have 4 years of buggy servers that will fail to react
correctly to NBD_OPT_GO and friends, is it worth enhancing the docs to
suggest that a robust client should be prepared for (buggy) servers that
mistakenly hang up after sending an NBD_OPT_ERR_UNSUP, and try
reconnecting to the server while avoiding the command that failed on the
previous run?  Eventually, buggy servers will disappear, so we can't
mandate that clients take that extra step, but since it is a very common
problem at the present, it might help client implementations to be aware
of it.  Then again, most client implementors read this list, so
documenting it in the protocol may be overkill.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] build failure

2016-10-14 Thread Eric Blake

On 10/14/2016 01:32 PM, Wouter Verhelst wrote:
> On Thu, Oct 13, 2016 at 05:33:18PM -0500, Eric Blake wrote:
>> I'm getting this failure when trying to build NBD, as part of
>> 'autoreconf -vfi':
>>
>> configure.ac:248: error: required file 'systemd/n...@.service.sh.in' not
>> found
> 
> That's why we have autogen.sh ;-)

Okay, that got me further, but there's still warnings about improper
automake usage, and then still a fatal build failure during make:

make[3]: Entering directory '/home/eblake/nbd/tests/run'
Makefile:403: ../../.deps/nbd_tester_client-cliserv.Po: No such file or
directory
make[3]: *** No rule to make target
'../../.deps/nbd_tester_client-cliserv.Po'.  Stop.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCHv3] Docs: improve description of disconnection methods

2016-10-14 Thread Eric Blake

On 04/14/2016 03:12 PM, Alex Bligh wrote:

Sorry for reviving an old thread, but just a heads-up to implementors:

> Improve the documentation as per the mailing list discussion.
> Here's what we decided (broadly).
> 
> * One side MAY drop the connection if the other end violates a
>   MUST condition.
...
> ---

> +
> +On a server shutdown, the server SHOULD wait for inflight
> +requests to be serviced prior to initiating a hard disconnect.
> +A server MAY speed this process up by issuing error replies.
> +The error value issued in respect of these requests and
> +any subsequently received requests SHOULD be `ESHUTDOWN`.
> +
> +If the client receives an `ESHUTDOWN` error it MUST initiate
> +a soft disconnect.
> +

> @@ -960,6 +1057,7 @@ The following error values are defined:
>  * `ENOSPC` (28), No space left on device.
>  * `EOVERFLOW` (75), Value too large; SHOULD NOT be sent outside of the
>experimental `STRUCTURED_REPLY` extension; see below.
> +* `ESHUTDOWN` (108), Server has been shut down.
>  

Note that ESHUTDOWN, although present on both Linux and BSD, is not
specified by POSIX, and is therefore not present on all platforms.  This
should not matter to the protocol itself (we've hard-coded that an error
value of 108 MEANS that the server is trying to shut down and will
reject all further transmission requests from the client, regardless of
whether either the server or the client actually has an ESHUTDOWN error,
and regardless of whether a platform's ESHUTDOWN has the value of 108).
But it CAN make for some interesting compilation scenarios if you are
trying to port code to more than one platform, such as mingw.

I don't think we have to change anything in the spec, so much as me
posting this email just as a heads-up in case anyone else hits the
problem (I just hit it in qemu while trying to add ESHUTDOWN support).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH] proto: add 'shift' extension.

2016-09-26 Thread Eric Blake

On 09/26/2016 07:51 AM, Paolo Bonzini wrote:

>> +- bit 3, `NBD_CMD_FLAG_SHIFT`; This flag is valid for `NBD_CMD_TRIM` and
>> +  `NBD_CMD_WRITE_ZEROES`. If this flag is set the server shifts request
>> +  *length* and *offset* left by N bits, where N is defined by 
>> `NBD_OPT_SHIFT`
>> +  option or is assumed to be 16 bits by default if `NBD_OPT_SHIFT` option is
>> +  not specified. If after shift `(offset + length)` exceeds disk size, 
>> length
>> +  should be reduced to `( - offset)`. However, `(offset + 
>> length)`
>> +  must not exceed disk size by more than `(1 << N) - 1`.
>>  
>>   Request types
>>  
>>
> 
> This is very ad hoc.  Can we instead have a block size common to all
> commands?  Block devices in practice have one, in fact that's why
> they're called block devices...

The INFO extensions are supposed to be a way for the server to
communicate block sizes to the guest (note, it is one-way communication;
the guest does NOT get to pick an arbitrary size, but relies on the
server reporting it) - but I don't know if that sizing information is
useful enough for the task at hand.  As it was, the INFO extension had a
proposal idea a while back about advertising a different size for TRIM
and WRITE_ZEROES than what is preferred for WRITE and READ, so having a
single size that works for shifted commands that operate on blocks
instead of bytes may not even be feasible if there is no single block
size to settle on.

But having a universal command flag that says "this command requests
offset and length in units of blocks instead of bytes", where blocks is
an up-front sizing settled on during handshake via INFO extension, and
NOT something that the client can change at-will during a live session,
may be useful.  Or it may be the source of arithmetic overflow exploits
in poor implementations, or of denial-of-service with used with a READ
or WRITE to send more than 2G of data in a single command.  In other
words, I don't yet see a compelling use case for being able to request
shifted sizes.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH] proto: add 'shift' extension.

2016-09-26 Thread Eric Blake

On 09/26/2016 07:46 AM, Vladimir Sementsov-Ogievskiy wrote:
> This extension allows big requests for TRIM and WRITE_ZEROES through
> special 'shift' parameter, which means that offset and length should be
> shifted left by several bits.
> 
> This is needed for efficient clearing large regions of the disk (up to
> the whole disk).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
> ---
> 
> Here mentioned WRITE_ZEROES command which is only an experemental
> extension for now.
> 
> To dicuss:

> +- `NBD_OPT_SHIFT` (10)
> +
> +Defines shift used to calculate request offset and length if
> +`NBD_CMD_FLAG_SHIFT` is set.
> +
> +Data:
> +
> +- 32 bits, shift (unsigned); Must not be larger than 32.

Uggh. You're making the protocol stateful (the server has to remember
what the client has previously requested via NBD_CMD_FLAG_SHIFT, rather
than having ALL information it needs immediately available in the
current NBD_CMD_WRITE_ZEROES).

I'd much rather support a single flag that says to zero the entire disk
than to introduce stateful variable-amount shifting.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] write_zeroes/trim on the whole disk

2016-09-23 Thread Eric Blake

On 09/23/2016 01:32 PM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> There is a following problem. When we need to write_zeroes or trim the
> whole disk, we have to do it iteratively, because of 32-bit restriction
> on request length.
> For example, current implementation of mirror (see mirror_dirty_init())
> do this by chunks of 2147418112 bytes (with default granularity of
> 65536). So, to zero 16tb disk we will make 8192 requests instead of one.
> 
> Incremental zeroing of 1tb qcow2 takes > 80 seconds for me (see below).
> This means ~20 minutes for copying empty 16tb qcow2 disk which is
> obviously a waste of time.
> 
> We see the following solutions for nbd:
> ||
> 1. Add command NBD_MAKE_EMPTY, with flag, saying what should be done:
> trim or write_zeroes.

Presumably spelled NBD_CMD_MAKE_EMPTY.

> 2. Add flag NBD_CMD_FLAG_WHOLE for commands NBD_TRIM and
> NBD_WRITE_ZEROES, which will say (with zeroed offset and lenght of the
> request), that the whole disk should be discarded/zeroed.

Both of these are possible.  As it is, NBD_CMD_WRITE_ZEROES is not even
formally part of the NBD spec yet, although NBD_CMD_TRIM is (I'm still
sitting on my qemu proof-of-concept patches for WRITE_ZEROES, and need
to resubmit them now that the qemu 2.8 development window is open).
Either way, the server would have to advertise if the new command and/or
new flags to existing commands are available for a whole-disk trim/zero,
before a client could use it, and clients must be prepared to fall back
to incremental approaches otherwise.

My preference would be a new flag to the existing commands, with
explicit documentation that 0 offset and 0 length must be used with that
flag, when requesting a full-device wipe.

> 3. Increase length field of the request to 64bit.

No; that won't work.  It would be a fundamental change to the NBD
protocol, and require both new servers and new clients to talk a
different wire protocol with different size length parameters.

> 
> As soon as we have some way to empty disk  in nbd, we can use
> qcow2_make_empty, to trim the whole disk (and something similar should
> be done for zeroing).
> 
> What do you think about this all, and which way has a chance to get into
> nbd proto?

It's not necessarily obvious that the ability to bulk-trim or bulk-zero
a device should be fundamentally faster than doing it incrementally in
2G chunks; but I concede that there may indeed be scenarios such as
qemu's qcow2 file where that is true.  So it does sound like a useful
option and/or command to be proposed for addition to the NBD protocol,
from that point of view.

As with other extensions to NBD, the best way is to write up a proposal
for how the documentation should change, submit that as patches to the
nbd list, and accompany it with a proof-of-concept implementation
(qemu's nbd server and nbd client work well), so that we can iron out
the details of the documentation before making it a formal part of the
spec.  It's important to remember that such a proposal should still be
optional (a server need not implement the new mode, and a client should
be prepared to fall back to other means if the server does not support a
whole-device action).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Eric Blake

On 09/15/2016 11:27 AM, Wouter Verhelst wrote:
> On Thu, Sep 15, 2016 at 05:08:21PM +0100, Alex Bligh wrote:
>> Wouter,
>>
>>> The server can always refuse to allow multiple connections.
>>
>> Sure, but it would be neater to warn the client of that at negotiation
>> stage (it would only be one flag, e.g.  'multiple connections
>> unsafe').
> 
> I suppose that's not a bad idea.

Seems like it may need to be a per-export flag, rather than a global
flag (as a given server may be able to serve multiple types of files,
where the safety depends on the type of file being served).


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [RESEND][PATCH 0/5] nbd improvements

2016-09-15 Thread Eric Blake

On 09/15/2016 06:09 AM, Alex Bligh wrote:
> 
> I also wonder whether any servers that can do caching per
> connection will always share a consistent cache between 
> connections. The one I'm worried about in particular here
> is qemu-nbd - Eric Blake CC'd.
> 

I doubt that qemu-nbd would ever want to support the situation with more
than one client connection writing to the same image at the same time;
the implications of sorting out data consistency between multiple
writers is rather complex and not worth coding into qemu.  So I think
qemu would probably prefer to just prohibit the multiple writer
situation.  And while multiple readers with no writer should be fine,
I'm not even sure if multiple readers plus one writer can always be made
to appear sane (if there is no coordination between the different
connections, on an image where the writer changes AA to BA then flushes
then changes to BB, it is still feasible that a reader could see AB
(pre-flush state of the first sector, post-flush changes to the second
sector, even though the writer never flushed that particular content to
disk).

But Paolo Bonzini (cc'd) may have more insight on qemu's NBD server and
what it supports (or forbids) in the way of multiple clients to a single
server.

> A more general point is that with multiple queues requests
> may be processed in a different order even by those servers that
> currently process the requests in strict order, or in something
> similar to strict order. The server is permitted by the spec
> (save as mandated by NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA) to
> process commands out of order anyway, but I suspect this has
> to date been little tested.

qemu-nbd is definitely capable of serving reads and writes out-of-order
to a single connection client; but that's different than the case with
multiple connections.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server

2016-07-19 Thread Eric Blake

On 07/19/2016 09:34 PM, Fam Zheng wrote:
> On Tue, 07/19 17:45, Paolo Bonzini wrote:
>>
>>
>> On 19/07/2016 17:28, Eric Blake wrote:
>>>> If I'm reading the NBD proto.md correctly, this is not enough if
>>>> NBD_CMD_FLAG_NO_HOLE is specified. We probably need to use a zeroed buffer 
>>>> with
>>>> blk_pwrite, or pass a new flag (BDRV_RED_NO_HOLE) to blk_pwrite_zeroes to
>>>> enforce the bdrv_driver_pwritev() branch in bdrv_co_do_pwrite_zeroes().
>>
>> I agree with Eric's interpretation.  It's a bit weird to have the
>> direction inverted, but I'm not sure I see the ambiguity.  Can you explain?
> 
> Write zeroes _means_ "punch hole" on a raw file.
> 
> In block/raw-posix.c:handle_aiocb_write_zeroes():
>> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>> if (s->has_discard && s->has_fallocate) {
>> int ret = do_fallocate(s->fd,
>>FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>>aiocb->aio_offset, aiocb->aio_nbytes);
>> if (ret == 0) {
>> ret = do_fallocate(s->fd, 0, aiocb->aio_offset, 
>> aiocb->aio_nbytes);

That is just implementation: punch a hole, BUT THEN reallocate it back,
so that in the end, the file is still not sparse in that region.  Or am
I reading it wrong?

But the implementation under the hood is not visible to the guest - as
long as the end result is that a guest requesting NO_HOLE ends up with a
non-sparse file, and the data reads back as all 0, the client doesn't
care whether the zeros were written byte-by-byte or sped up by punching
a hole then reallocating.

> 
> And unmap is translated to "punch hole", too.
> 
> In block/raw-posix.c:handle_aiocb_discard():
>> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>> ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>>aiocb->aio_offset, aiocb->aio_nbytes);
>> #endif

No, this call is different - it punches a hole, then stops.  There is no
followup do_fallocate(,0,,) to reallocate, so the file remains sparse.

> 
> So I agree that NBD_CMD_FLAG_NO_HOLE is a poorly named flag, because there is
> always going to be a hole event if it's set.

If we are punching holes even when BDRV_REQ_MAY_UNMAP is not set, that
seems like we have a bug in qemu (unless we are immediately then
reallocating so that there is no resulting hole).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports.http://sdm.link/zohodev2dev___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server

2016-07-19 Thread Eric Blake

[adding nbd list]

On 07/19/2016 12:21 AM, Fam Zheng wrote:
> On Mon, 07/18 22:08, Eric Blake wrote:
>> Upstream NBD protocol recently added the ability to efficiently
>> write zeroes without having to send the zeroes over the wire,
>> along with a flag to control whether the client wants a hole.
>>
>> Signed-off-by: Eric Blake <ebl...@redhat.com>
>>
>> ---

>> @@ -1235,6 +1242,37 @@ static void nbd_trip(void *opaque)
>>  }
>>  break;
>>
>> +case NBD_CMD_WRITE_ZEROES:
>> +TRACE("Request type is WRITE_ZEROES");
>> +
>> +if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
>> +TRACE("Server is read-only, return error");
>> +reply.error = EROFS;
>> +goto error_reply;
>> +}
>> +
>> +TRACE("Writing to device");
>> +
>> +flags = 0;
>> +if (request.flags & NBD_CMD_FLAG_FUA) {
>> +flags |= BDRV_REQ_FUA;
>> +}
>> +if (!(request.flags & NBD_CMD_FLAG_NO_HOLE)) {
>> +flags |= BDRV_REQ_MAY_UNMAP;
> 
> If I'm reading the NBD proto.md correctly, this is not enough if
> NBD_CMD_FLAG_NO_HOLE is specified. We probably need to use a zeroed buffer 
> with
> blk_pwrite, or pass a new flag (BDRV_RED_NO_HOLE) to blk_pwrite_zeroes to
> enforce the bdrv_driver_pwritev() branch in bdrv_co_do_pwrite_zeroes().

If that's how you read it, then my proposal to proto.md needs updating.
 I specifically wrote the proposal to be as close as possible to the
existing qemu semantics, except that we negated the sense of the bit
because we wanted to allow the bit value of 0 to allow the server the
most flexibility in performing optimizations.  The code here (and in
14/14 on the client side) is merely catering to the fact that the bit
has opposite sense in the two projects.

That is, the rules in qemu are:

MAY_UNMAP == 0 : must write zeroes
MAY_UNMAP == 1 : may optimize if supported (where reads will see 0), but
must write zeroes if not

while the rules in NBD are:

FLAG_NO_HOLE == 1 : must write zeroes
FLAG_NO_HOLE == 0 : may optimize if supported (where reads will see 0),
but must write zeroes if not

Or another way of putting it: in qemu, the ability to punch holes was
added after the fact (default of no holes being 0 due to back-compat),
where prior to its addition full allocation was the only option, and
most callers have to worry about passing MAY_UNMAP when they care about
optimal use of storage; while in NBD we want to allow the server the
freedom to have optimal usage of storage by default, but need a way to
specifically ask for full allocation.

If you think the NBD flag is poorly named, we have not yet committed to
the NBD_CMD_WRITE_EXTENSIONS documentation yet, and are free to patch
proto.md to choose a better name and/or wording to better describe what
we actually mean on the NBD side of things.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
What NetFlow Analyzer can do for you? Monitors network bandwidth and traffic
patterns at an interface-level. Reveals which users, apps, and protocols are 
consuming the most bandwidth. Provides multi-vendor support for NetFlow, 
J-Flow, sFlow and other flows. Make informed decisions using capacity planning
reports.http://sdm.link/zohodev2dev___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH v4 14/14] nbd: Implement NBD_CMD_WRITE_ZEROES on client

2016-06-27 Thread Eric Blake

On 06/27/2016 06:13 AM, Paolo Bonzini wrote:
> 
> 
> On 26/06/2016 00:15, Eric Blake wrote:
>> diff --git a/block/nbd.c b/block/nbd.c
>> index 8d57220..049d1bd 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -357,6 +357,7 @@ static int nbd_co_flush(BlockDriverState *bs)
>>  static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
>>  {
>>  bs->bl.max_pdiscard = NBD_MAX_BUFFER_SIZE;
>> +bs->bl.max_pwrite_zeroes = NBD_MAX_BUFFER_SIZE;
> 
> I have probably asked before---is there any reason for these to be
> limited, since the commands have no payload?

Here's the last time it was brought up on the nbd-general list [1].  We
have the potential BLOCK_SIZE handshake negotiation extension, where I
was proposing that the server can advertise its actual limits (rather
than the client having to guess them or rely on out-of-band information)
- and I was proposing that NBD_CMD_TRIM and NBD_CMD_WRITE_ZEROES should
be permitted to advertise additional limits that are larger than the
NBD_CMD_WRITE limit, precisely because they don't carry a payload and
can therefore be more efficient if done in bulk.

[1] https://sourceforge.net/p/nbd/mailman/message/35081223/

But at the time of that thread, there was concern expressed whether
adding and additional NBD_INFO for each NBD_CMD limit would scale well,
or whether we need a different approach, and I haven't revisited the
thread since that comment.  At any rate, I have BLOCK_SIZE patches ready
for qemu, once the WRITE_ZERO patches land, and where it should be easy
to make this limit runtime-settable to a larger value from actual server
limits, if we can decide how BLOCK_SIZE should advertise such a limit.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Attend Shape: An AT Tech Expo July 15-16. Meet us at AT Park in San
Francisco, CA to explore cutting-edge tech and listen to tech luminaries
present their vision of the future. This family event has something for
everyone, including kids. Get more information and register today.
http://sdm.link/attshape___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH 3/3]nbd: make nbd device wait for its users

2016-06-24 Thread Eric Blake

On 06/24/2016 04:09 AM, Pranay Kr. Srivastava wrote:
> When a timeout occurs or a recv fails, then
> instead of abruplty killing nbd block device

s/abruplty/abruptly/

> wait for it's users to finish.

s/it's/its/

> 
> This is more required when filesystem(s) like
> ext2 or ext3 don't expect their buffer heads to
> disappear while the filesystem is mounted.
> 
> Each open of a nbd device is refcounted, while
> the userland program [nbd-client] doing the
> NBD_DO_IT ioctl would now wait for any other users
> of this device before invalidating the nbd device.
> 
> A timedout or a disconnected device, if in use, can't
> be used until it has been resetted. The resetting happens

s/resetted/reset/

> when all tasks having this bdev open closes this bdev.
> 
> Signed-off-by: Pranay Kr. Srivastava <pran...@gmail.com>
> ---
>  drivers/block/nbd.c | 124 
> ----
>  1 file changed, 96 insertions(+), 28 deletions(-)
> 


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Attend Shape: An AT Tech Expo July 15-16. Meet us at AT Park in San
Francisco, CA to explore cutting-edge tech and listen to tech luminaries
present their vision of the future. This family event has something for
everyone, including kids. Get more information and register today.
http://sdm.link/attshape___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH 1/2] nbd: make nbd device wait for its users

2016-06-24 Thread Eric Blake

On 06/24/2016 03:29 AM, Markus Pargmann wrote:
> From: "Pranay Kr. Srivastava" <pran...@gmail.com>
> 
> When a timeout occurs or a recv fails, then instead of abruplty killing

s/abruplty/abruptly/

> nbd block device wait for it's users to finish.

s/it's/its/ (remember, "it's" is only usable where "it is" would also be
appropriate)

> 
> This is more required when filesystem(s) like ext2 or ext3 don't expect
> their buffer heads to disappear while the filesystem is mounted.
> 
> Each open is counted. The blockdevice is kept open until the last user
> closes the block device. This offers the possibility as well to open a
> new socket to be used while the filesystems are mounted.
> 
> Signed-off-by: Pranay Kr. Srivastava <pran...@gmail.com>
> 
> [mpa: Keep the blockdevice open until all users left]
> Signed-off-by: Markus Pargmann <m...@pengutronix.de>



-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Attend Shape: An AT Tech Expo July 15-16. Meet us at AT Park in San
Francisco, CA to explore cutting-edge tech and listen to tech luminaries
present their vision of the future. This family event has something for
everyone, including kids. Get more information and register today.
http://sdm.link/attshape___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PULL 23/28] nbd: always query export list in fixed new style protocol

2016-05-17 Thread Eric Blake

On 05/17/2016 10:39 AM, Richard W.M. Jones wrote:
> On Tue, May 17, 2016 at 09:59:02AM -0600, Eric Blake wrote:
>> On 05/17/2016 09:52 AM, Eric Blake wrote:
>>>>> Perhaps nbdkit should implement NBD_OPT_LIST returning just "" (the
>>>>> default name) as its only list entry?
>>>>
>>>> Or "default".
>>>
>>> As I read the protocol, I don't see "default" as a permissible name of
>>> the default export, just "".
>>>
>>> Also, we currently state that NBD_OPT_LIST has zero or more
>>> NBD_REP_SERVER replies, which means that it is valid for the command to
>>> succeed with NBD_REP_ACK _without_ advertising any exports at all
>>> (rather annoying in that it tells you nothing about whether
>>> NBD_OPT_EXPORT_NAME/NBD_OPT_GO will succeed anyways).  Should we reword
>>> that to require that if NBD_REP_ACK is sent, then at least one
>>> NBD_REP_SERVER reply was given (where "" is the obvious name, but at the
>>> same time where "" is not mandatory if some other name is given)?
>>
>> Okay, I just downloaded nbdkit code and checked; nbdkit is sending 0
>> entries and NBD_REP_ACK (for success), and NOT an error code; so it is
>> implying that there are NO known export names but that the command was
>> recognized.
> 
> Bleah you are right.  I misread the code I wrote :-(
> 
> We can add a empty string list entry there easily enough.

Will adding the empty string break qemu 2.5 connections?  qemu 2.6  is
picky in that if the name it is requesting is not in the advertised
list, it gives up; but if there is NO advertised list, it assumes the
name will just work.  Since nbdkit has the latter behavior, I'm thinking
that nbdkit returning NBD_REP_ERR_UNSUP instead of NBD_REP_ACK is a
smarter course of action.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PULL 23/28] nbd: always query export list in fixed new style protocol

2016-05-17 Thread Eric Blake

;;
>> or an explicit statement that if a server rejects NBD_OPT_LIST, then the
>> client SHOULD assume that any name will work for
>> NBD_OPT_EXPORT_NAME/NBD_OPT_GO.
> 
> I think it's simpler than that. Servers claiming to implement fixed
> newstyle in my view need to implement all the options unless the options
> are marked as optional. I admit (now) those could be clarified.

But to date, I think ALL of the options (except NBD_OPT_EXPORT_NAME)
_are_ optional.  In fact, I'm arguing that per Option 2, we WANT
NBD_OPT_LIST to be optional for servers that don't care about names.



-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PULL 23/28] nbd: always query export list in fixed new style protocol

2016-05-17 Thread Eric Blake

On 05/17/2016 09:58 AM, Richard W.M. Jones wrote:
> On Tue, May 17, 2016 at 09:52:30AM -0600, Eric Blake wrote:
>> so it might be nicer to
>> make a change to the protocol document that instead permits current
>> nbdkit behavior and puts the burden on clients to interoperate when
>> NBD_OPT_LIST is not supported.
> 
> The purpose of nbdkit is to be a server for qemu, to be a replacement
> for qemu-nbd in cases where proprietary code cannot be combined with
> qemu for copyright/licensing/legal reasons.  So we aim to make sure we
> can interoperate with qemu.  No need to change any standards for
> nbdkit!  Clarifying standards documents is OK though.

I also noticed that nbdkit forcefully rejects a client that sends more
than 32 NBD_OPT_ commands, even though this is perfectly valid behavior
on the part of the client.  Maybe the protocol should document a
(higher) limit on minimum number of options a client can expect to be
serviced before the server dropping the connection because the client is
wasting the server's time.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PULL 23/28] nbd: always query export list in fixed new style protocol

2016-05-17 Thread Eric Blake

On 05/17/2016 09:52 AM, Eric Blake wrote:
>>> Perhaps nbdkit should implement NBD_OPT_LIST returning just "" (the
>>> default name) as its only list entry?
>>
>> Or "default".
> 
> As I read the protocol, I don't see "default" as a permissible name of
> the default export, just "".
> 
> Also, we currently state that NBD_OPT_LIST has zero or more
> NBD_REP_SERVER replies, which means that it is valid for the command to
> succeed with NBD_REP_ACK _without_ advertising any exports at all
> (rather annoying in that it tells you nothing about whether
> NBD_OPT_EXPORT_NAME/NBD_OPT_GO will succeed anyways).  Should we reword
> that to require that if NBD_REP_ACK is sent, then at least one
> NBD_REP_SERVER reply was given (where "" is the obvious name, but at the
> same time where "" is not mandatory if some other name is given)?

Okay, I just downloaded nbdkit code and checked; nbdkit is sending 0
entries and NBD_REP_ACK (for success), and NOT an error code; so it is
implying that there are NO known export names but that the command was
recognized.

>>
>> My interpretation of NBD_OPT_LIST failing would be 'this server
>> doesn't have anything it can export'.
> 
> Indeed, and that's why qemu as a client is currently dropping the
> connection with nbdkit.

Except that nbdkit is NOT failing NBD_OPT_LIST, just merely listing 0
replies.

>  But I would also make that interpretation for
> NBD_OPT_LIST succeeding with NBD_REP_ACK with 0 NBD_REP_SERVER replies -

which is the nbdkit behavior

> so maybe it is worth a note in the protocol how to detect servers that
> are exporting exactly one volume and don't care what name you pass, then
> tweaking either nbdkit, qemu, or both to comply to that added protocol
> wording.


-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PULL 23/28] nbd: always query export list in fixed new style protocol

2016-05-17 Thread Eric Blake

ssue here. It explains what needs to be
> implemented by the server, and that includes NBD_OPT_LIST. Very
> happy to add some clarity, but I'm not sure where it's needed.

I've hinted at it above - either an explicit statement that servers
cannot reject NBD_OPT_LIST with NBD_REP_UNSUP, and that if they have no
other exports, then the SHOULD include an NBD_REP_SERVER with name "";
or an explicit statement that if a server rejects NBD_OPT_LIST, then the
client SHOULD assume that any name will work for
NBD_OPT_EXPORT_NAME/NBD_OPT_GO.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PULL 23/28] nbd: always query export list in fixed new style protocol

2016-05-17 Thread Eric Blake

[adding nbd-list]

On 05/17/2016 03:53 AM, Richard W.M. Jones wrote:
> On Tue, Feb 16, 2016 at 05:34:41PM +0100, Paolo Bonzini wrote:
>> From: "Daniel P. Berrange" <berra...@redhat.com>
>>
>> With the new style protocol, the NBD client will currenetly
>> send NBD_OPT_EXPORT_NAME as the first (and indeed only)
>> option it wants. The problem is that the NBD protocol spec
>> does not allow for returning an error message with the
>> NBD_OPT_EXPORT_NAME option. So if the server mandates use
>> of TLS, the client will simply see an immediate connection
>> close after issuing NBD_OPT_EXPORT_NAME which is not user
>> friendly.
>>

> I just bisected qemu 2.6 to find out where it breaks interop with
> nbdkit, and it is this commit.
> 
> nbdkit implements the newstyle protocol, but doesn't implement export
> names (yet, maybe in future).  It processes the NBD_OPT_EXPORT_NAME
> option, but ignores the export name and carries on regardless.
> 
> nbdkit's implemention of NBD_OPT_LIST returns an error, because there
> is no such thing as a list of export names supported (in effect nbdkit
> allows any export name).

Perhaps nbdkit should implement NBD_OPT_LIST returning just "" (the
default name) as its only list entry?

And at some point, nbdkit should probably implement NBD_OPT_GO (which is
a nicer replacement to NBD_OPT_EXPORT_NAME; I've got patches pending for
qemu to implement that).

In fact, if NBD_OPT_GO is supported, then my patches to qemu won't even
try NBD_OPT_LIST.

> 
> Therefore I don't believe the assumption made here -- that you can
> list export names and choose them on the client side -- is a valid
> one.

Qemu is not choosing an export name, so much as proving whether any
OTHER errors can be detected (such as export name not present, or TLS
required).  It _should_ be gracefully ignoring NBD_OPT_LIST errors if
such errors don't definitively prove that the export will not succeed,
but I guess this is not happening.  I'll see if I can tweak the qemu
logic to be more forgiving of nbdkit's current behavior.

> 
> Naturally the protocol document
> (https://github.com/yoe/nbd/blob/master/doc/proto.md) isn't clear on
> this case.

You're right that we may also want to tweak the NBD protocol to make
this interoperability point obvious.

> 
> To test qemu against nbdkit you can do this in the nbdkit sources:
> 
>   make
>   make check TESTS=test-newstyle \
> LIBGUESTFS_HV=/path/to/qemu/x86_64-softmmu/qemu-system-x86_64 \
> LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1
> 
> Rich.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH] doc: Clarify maximum size limits

2016-05-11 Thread Eric Blake

Requiring a server to support 32M as its maximum block size
may be a bit large for some setups; rewrite things to make it
obvious that a server may advertise a maximum as small as 1M,
but that without an advertisement, there are existing clients
(hello, kernel NBD module) that expect 32M to work.

Add some clarity that servers SHOULD NOT treat anything
smaller than 32M as a denial of service (and in contrast to
the recent patch for a 16k limit during option haggling),
and that denial-of-service limits may differ according to
command.

Also add a way for servers to advertise on a per-command basis
when it is willing to accept larger limits because there is no
payload involved; the two obvious commands are NBD_CMD_TRIM and
NBD_CMD_WRITE_ZEROES, which now have NBD_INFO_TRIM_SIZE and
NBD_INFO_ZERO_SIZE.  These new limits are listed separately
from NBD_INFO_BLOCK_SIZE since both commands are optional and
need not be implemented by a minimal server; but letting the
server advertise them makes sense since at least scsi devices
natively have three different size constraints for read/write,
unmap, and writesame.  A server MAY honor larger trim requests
even without advertising, but a client cannot rely on that
working unless the additional sizes are advertised.

The full text for NBD_INFO_ZERO_SIZE will be written later once
either that extension (or this one) is promoted to stable; it can
copy the pattern of NBD_INFO_TRIM_SIZE, except that where
CMD_TRIM can round the request (and only actually trim a subset
of the client request), CMD_WRITE_ZEROES must write actual zeroes
over the entire request (but if it can punch holes, only a
subset of the client request need result in holes).

Signed-off-by: Eric Blake <ebl...@redhat.com>
---

For the extension-info branch, and builds on the earlier
patch sent for the master branch about handshake limits

 doc/proto.md | 92 +++-
 1 file changed, 79 insertions(+), 13 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 5692be5..bf29b3a 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -686,7 +686,9 @@ of 2^9 (512), a preferred block size of 2^16 (65,536), and 
a maximum
 block size of 2^25 (33,554,432)).  Notwithstanding any maximum block
 size advertised, either the server or the client MAY initiate a hard
 disconnect if the size of a request or a reply is large enough to be
-deemed a denial of service attack.
+deemed a denial of service attack.  However, for maximum portability,
+any length less than 2^25 (33,554,432) bytes SHOULD NOT be considered
+a denial of service attack.

 The minimum block size represents the smallest addressable length and
 alignment within the export, although writing to an area that small
@@ -710,29 +712,47 @@ preferred block size for that export.  The server MAY 
advertise an
 export size that is not an integer multiple of the preferred block
 size.

-The maximum block size represents the maximum length that the server
-is willing to handle in one request.  If advertised, it MUST be either
+The maximum block size represents the maximum payload length that the
+server is willing to handle in one request (whether the payload is
+from the client as in `NBD_CMD_WRITE` or from the server as in
+`NBD_CMD_READ`).  If advertised, it SHOULD be either
 an integer multiple of the minimum block size or the value 0x
 for no inherent limit, MUST be at least as large as the smaller of the
-preferred block size or export size, and SHOULD be at least 2^25
-(33,554,432) if the export is that large, but MAY be something other
+preferred block size or export size, and SHOULD be at least 2^20
+(1,048,576) if the export is that large, but MAY be something other
 than a power of 2.  For convenience, the server MAY advertise a
 maximum block size that is larger than the export size, although in
 that case, the client MUST treat the export size as the effective
 maximum block size (as further constrained by a non-zero offset).

+Some commands (such as `NBD_CMD_TRIM`) allow the client to specify
+*length* but do not involve a payload; for these commands, the client
+and server may negotiate additional information about larger size
+limits for those commands (such as `NBD_INFO_TRIM_SIZE`).  For these
+commands, a client SHOULD NOT use a *length* larger than the
+negotiated maximum block size from `NBD_INFO_BLOCK_SIZE` unless the
+additional information was successfully negotiated.
+
 Where a transmission request can have a non-zero *offset* and/or
 *length* (such as `NBD_CMD_READ`, `NBD_CMD_WRITE`, or `NBD_CMD_TRIM`),
 the client MUST ensure that *offset* and *length* are integer
 multiples of any advertised minimum block size, and SHOULD use integer
 multiples of any advertised preferred block size where possible.  For
 those requests, the client MUST NOT use a *length* larger than any
-advertised maximum block size or which, when added to *offset*, would
-exceed the export size.  The server

Re: [Nbd] [PATCH] doc: Document maximum message length during option haggling

2016-05-11 Thread Eric Blake

On 05/11/2016 01:00 PM, Alex Bligh wrote:
> Eric,
> 
> I'm not convinced this fixes anything.
> 
> You're now saying you MAY treat anything over 16k as a
> denial of service attack. But in the existing text, you
> MAY do that anyway - you don't have to treat all lengths
> alike for all commands.

Yes, but I'm about to post another patch for block sizes, where it is
recommended that the denial of service length during transmission phase
is 32M (since that matches the current kernel implementation, as well as
the qemu implementation) - anything smaller than reduces portability.
Calling out a MUCH smaller limit of 16k during handshake phase in
contrast to transmission phase, and acknowledging that we have bytes in
transmission phase that will always be 0 without some other extension to
state otherwise, seems worthwhile.  If nothing else, writing down our
minimum size at which denial of service may occur will make sure that
our future additions consider those lengths, and give clients something
to be aware of even when servers don't advertise limits.

Currently, qemu's denial of service limit is set to 32M for all
messages, including option haggling, which is much larger than the 16k
I'm proposing here.  But wading through 32M of garbage from a client is
a lot to ask of a server that wants to get to transmission phase as soon
as possible, and if we take this patch, I will be updating qemu to have
two different maximum message sizes (one for options, the other for
transmission).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH] doc: Document maximum message length during option haggling

2016-05-11 Thread Eric Blake

In the transmission phase, large lengths make sense for
NBD_CMD_READ/WRITE, both because the client really does
want to access large blocks of data, and because requests
can be serviced out of order so that a large request need
not hold up a subsequent small request.

But these arguments do not apply during option haggling.
First, we are NOT handling arbitrary sizes at the client's
whim, where economies of scale matter, but instead dealing
with known sizes based on the specifications of the
individual options and replies (our worst case is currently
NBD_REP_SERVER, which can contain up to 8196 bytes based on
our hard maximum of 4096 bytes per string).  Second, the
option requests are synchronous, so one side cannot
send another message until it has first processed all the
data from the previous message.  Even if the request or
reply is unknown, forcing the remote side to read to the
end of a huge unknown message before it can continue may be
a waste of time compared to the remote side considering the
sender to be compromised and just disconnecting.

Document a restriction that we intend all future defined
option requests and replies to fit with 16k, and that either
client or server may initiate a hard disconnect on a size
larger than that.

Since this means that the upper two bytes of the length
field should always be 0, we may want to repurpose those bytes
in a future version of the protocol; for example, if we find
ourselves needing flags along with option requests, we could
break the 32-bit length field into a 16-bit flags and 16-bit
length (similar to how we broke the 32-bit request type into
16-bit flag and 16-bit request for the transmission phase).
But doing so would cause any older recipient to treat non-zero
flags as a rather large length, so document that such a change
in the protocol should also be accompanied by a new global
flag and client response to make it clear that both sides agree
on the new interpretation of those two bytes.  Also note that
we could introduce flags by breaking up the 32-bit option type
field rather than the length field (although then we'd have to
be careful about whether an NBD_REP message could have different
flags than those requested by the client).

Signed-off-by: Eric Blake <ebl...@redhat.com>
---
 doc/proto.md | 28 +---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 2956746..0daed51 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -137,9 +137,18 @@ C: 32 bits, option
 C: 32 bits, length of option data (unsigned)  
 C: any data needed for the chosen option, of length as specified above.  

-The presence of the option length in every option allows the server
-to skip any options presented by the client that it does not
-understand.
+The presence of the option length in every option allows the server to
+skip any options presented by the client that it does not understand.
+Although *length* is a 32-bit field, none of the options defined in
+this standard include more than one string (which is capped at 4096
+bytes); and extensions should likewise limit the amount of data the
+client can send for a single valid option.  Therefore, the server MAY
+treat any option request with *length* larger than 16,384 as a denial
+of service attack, and initiate a hard disconnect.  If needed, a
+future version of the standard might introduce a global flag and
+client response flag that would allow the client and server to replace
+the 32-bit length field with a pair of 16-bit flags and 16-bit length
+fields.

 If the value of the option field is `NBD_OPT_EXPORT_NAME` and the server
 is willing to allow the export, the server replies with information
@@ -192,6 +201,19 @@ S: 32 bits, length of the reply. This MAY be zero for some 
replies, in
 S: any data as required by the reply (e.g., an export name in the case
of `NBD_REP_SERVER`)  

+Although *length* is a 32-bit field, none of the option replies
+defined in this standard include more than two strings (which are
+capped at 4096 bytes each); and extensions should likewise limit the
+amount of data the server can send for a single option reply.
+Therefore, the client MAY treat any option reply with *length* larger
+than 16,384 as a corrupt server, and initiate a hard disconnect (this
+limit applies only to a single option reply, and not to the cumulative
+length of a series of replies to a single option request).  If needed,
+a future version of the standard might introduce a global flag and
+client response flag that would allow the client and server to replace
+the 32-bit length field with a pair of 16-bit flags and 16-bit length
+fields.
+
 The client MUST NOT send any option until it has received a final
 reply to any option it has sent (note that some options e.g.
 `NBD_OPT_LIST` have multiple replies, and the final reply is
-- 
2.5.5


--
Mobile security can be en

Re: [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-11 Thread Eric Blake

On 05/11/2016 03:38 AM, Paolo Bonzini wrote:
> 
> 
> On 10/05/2016 18:23, Alex Bligh wrote:
>>> Is there a use case where you'd want to split up a single big TRIM request
>>> in smaller ones (as in some hardware would not support it or something)?
>>> Even then, it looks like this splitting up would be hardware dependant and
>>> better implemented in block device drivers.
>>
>> Part of the point of the block size extension is to push such limits to the
>> client.
>>
>> I could make up use cases (e.g. that performing a multi-gigabyte trim in
>> a single threaded server will effectively block all other I/O), but the
>> main reason in my book is orthogonality, and the fact the client needs
>> to do some breaking up anyway.
> 
> That's why SCSI for example has a limit to UNMAP and WRITE SAME requests
> (actually it has three limits: number of ranges per unmap, which in NBD
> and in SCSI WRITE SAME is 1; number of blocks per unmap descriptor;
> number of blocks per WRITE SAME operation).  These limits however are a
> different one than the read/write limit, because you want to support
> systems where TRIM is much faster than regular I/O (and possibly even
> O(1) if trimming something that is already trimmed).

Then I think I will propose a doc patch to the extension-info branch
that adds new INFO items for NBD_INFO_TRIM_SIZE and
NBD_INFO_WRITE_ZERO_SIZE (if requested by client and replied by server,
then this can be larger than the normal maximum size in
NBD_INFO_BLOCK_SIZE; if either side doesn't request the info, then
assume any maximum in NBD_INFO_BLOCK_SIZE applies, otherwise UINT32_MAX;
and the two infos are separate items because NBD_FLAG_SEND_TRIM and
NBD_FLAG_SEND_WRITE_ZEROES are orthogonally optional).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] Fwd: [PATCH v4 02/18] nbd: fix checkpatch trailing space warning.

2016-05-11 Thread Eric Blake

On 05/11/2016 03:38 AM, Pranay Srivastava wrote:

> 
> The series contained some checkpatch changes so I had included you as well.
> 
>> know why you are sending them to me), but I know I do not accept patches
>> without any changelog text at all in them, as that's just lazy.
> 
> That should be per patch or can appear in a cover letter for the patches?

Per patch.  However, if it were me, I would not have split into quite so
many patches.  The mantra is one patch per one fix, but I think it is
reasonable to state that "silence all checkpatch warnings" counts as one
fix, rather than 16 separate fixes.  If you DO consolidate the
checkpatch changes into a single patch, then the commit message body
should call out a bulleted list of all the changes you are making, as
well as a justification why it is worth churning the entire file rather
than just making smaller checkpatch fixes in just the areas that your
other patches touch.

> 
> Actually I've made more patches in this series after I had sent the
> earlier ones,
> but the earlier ones are not changed at all. It's only the addition of
> newer patches
> to the series.

The cover letter is a great place to point out how v4 differs from v3,
but also to point out which patches are unchanged from v3, to save
reviewer's time.  So if all you did was add new patches, a cover-letter
mention of which patches remain unchanged might be helpful.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Eric Blake

On 05/10/2016 10:33 AM, Quentin Casasnovas wrote:

> Looks like there's an easier way:
> 
>  $ qemu-img create -f qcow2 foo.qcow2 10G
>  $ qemu-nbd --discard=on -c /dev/nbd0 foo.qcow2
>  $ mkfs.ext4 /dev/nbd0
>  mke2fs 1.42.13 (17-May-2015)
>  Discarding device blocks: failed - Input/output error

strace on mkfs.ext4 shows:
...
open("/dev/nbd1", O_RDWR|O_EXCL)= 3
stat("/dev/nbd1", {st_mode=S_IFBLK|0660, st_rdev=makedev(43, 1), ...}) = 0
ioctl(3, BLKDISCARDZEROES, [0]) = 0
ioctl(3, BLKROGET, [0]) = 0
uname({sysname="Linux", nodename="red", ...}) = 0
ioctl(3, BLKDISCARD, [0, 100])  = 0
write(1, "Discarding device blocks: ", 26) = 26
write(1, "   4096/2621440", 15   4096/2621440) = 15
write(1, "\10\10\10\10\10\10\10\10\10\10\10\1) = 15
ioctl(3, BLKDISCARD, [100, 8000]) = -1 EIO (Input/output error)
write(1, "   ", 15   ) = 15
write(1, "\10\10\10\10\10\10\10\10\10\10\10\1) = 15
write(1, "failed - ", 9failed - )= 9

so it does indeed look like the smaller request [0, 0x100] succeeded
while the larger [0x100, 0x8000] failed with EIO.  But I
instrumented my qemu-nbd server to output all of the incoming requests
it was receiving from the kernel, and I never saw a mention of
NBD_CMD_TRIM being received.  Maybe it's dependent on what version of
the kernel and/or NBD kernel module you are running?

I also tried fallocate(1) on /dev/nbd1, as in:
 # strace fallocate -o 0 -l 64k -p /dev/nbd1
but it looks like the kernel doesn't yet wire up fallocate(2) to forward
to the appropriate device ioctls and/or NBD_CMD_TRIM, where strace says:

open("/dev/nbd1", O_RDWR)   = 3
fallocate(3, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 0, 65536) = -1
ENODEV (No such device)

The ENODEV is a rather unusual choice of error.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Eric Blake

On 05/10/2016 09:41 AM, Alex Bligh wrote:
> 
> On 10 May 2016, at 16:29, Eric Blake <ebl...@redhat.com> wrote:
> 
>> So the kernel is currently one of the clients that does NOT honor block
>> sizes, and as such, servers should be prepared for ANY size up to
>> UINT_MAX (other than DoS handling).
> 
> Interesting followup question:
> 
> If the kernel does not fragment TRIM requests at all (in the
> same way it fragments read and write requests), I suspect
> something bad may happen with TRIM requests over 2^31
> in size (particularly over 2^32 in size), as the length
> field in nbd only has 32 bits.
> 
> Whether it supports block size constraints or not, it is
> going to need to do *some* breaking up of requests.

Does anyone have an easy way to cause the kernel to request a trim
operation that large on a > 4G export?  I'm not familiar enough with
EXT4 operation to know what file system operations you can run to
ultimately indirectly create a file system trim operation that large.
But maybe there is something simpler - does the kernel let you use the
fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Eric Blake

On 05/10/2016 09:08 AM, Alex Bligh wrote:
> Eric,
> 
>> Hmm. The current wording of the experimental block size additions does
>> NOT allow the client to send a NBD_CMD_TRIM with a size larger than the
>> maximum NBD_CMD_WRITE:
>> https://github.com/yoe/nbd/blob/extension-info/doc/proto.md#block-size-constraints
> 
> Correct
> 
>> Maybe we should revisit that in the spec, and/or advertise yet another
>> block size (since the maximum size for a trim and/or write_zeroes
>> request may indeed be different than the maximum size for a read/write).
> 
> I think it's up to the server to either handle large requests, or
> for the client to break these up.

But the question at hand here is whether we should permit servers to
advertise multiple maximum block sizes (one for read/write, another one
for trim/write_zero, or even two [at least qemu tracks a separate
maximum trim vs. write_zero sizing in its generic block layer]), or
merely stick with the current wording that requires clients that honor
maximum block size to obey the same maximum for ALL commands, regardless
of amount of data sent over the wire.

> 
> The core problem here is that the kernel (and, ahem, most servers) are
> ignorant of the block size extension, and need to guess how to break
> things up. In my view the client (kernel in this case) should
> be breaking the trim requests up into whatever size it uses as the
> maximum size write requests. But then it would have to know about block
> sizes which are in (another) experimental extension.

Correct - no one has yet patched the kernel to honor block sizes
advertised through what is currently an experimental extension.  (We
have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
minimum block size, but I haven't audited whether the kernel actually
guarantees that all client requests are sent aligned to the value passed
that way - but we have nothing to set the maximum size, and are at the
mercy of however the kernel currently decides to split large requests).
 So the kernel is currently one of the clients that does NOT honor block
sizes, and as such, servers should be prepared for ANY size up to
UINT_MAX (other than DoS handling).  My question above only applies to
clients that use the experimental block size extensions.

> 
> I've nothing against you fixing it qemu's server (after all I don't
> think there is anything to /prohibit/ a server working on something
> larger than the maximum block size), and ...
> 
>> But since the kernel is the one sending the large length request, and
>> since you are right that this is not a denial-of-service in the amount
>> of data being sent in a single NBD message,
> 
> ... currently there isn't actually a maximum block size advertised (that
> being in another experimental addition), so this is just DoS prevention,
> which qemu is free to do how it wishes.

Okay, sounds like that part is uncontroversial, and it is indeed worth
improving qemu's behavior.

> 
> What surprises me is that a release kernel is using experimental
> NBD extensions; there's no guarantee these won't change. Or does
> fstrim work some other way?

No extension in play.  The kernel is obeying NBD_FLAG_SEND_TRIM, which
is in the normative standard, and unrelated to the INFO/BLOCK_SIZE
extensions.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE

2016-05-10 Thread Eric Blake

[adding nbd-devel, qemu-block]

On 05/06/2016 02:45 AM, Quentin Casasnovas wrote:
> When running fstrim on a filesystem mounted through qemu-nbd with
> --discard=on, fstrim would fail with I/O errors:
> 
>   $ fstrim /k/spl/ice/
>   fstrim: /k/spl/ice/: FITRIM ioctl failed: Input/output error
> 
> and qemu-nbd was spitting these:
> 
>   nbd.c:nbd_co_receive_request():L1232: len (94621696) is larger than max len 
> (33554432)
> 

> 
> The length of the request seems huge but this is really just the filesystem
> telling the block device driver that "this length should be trimmed", and,
> unlike for a NBD_CMD_READ or NBD_CMD_WRITE, we'll not try to read/write
> that amount of data from/to the NBD socket.  It is thus safe to remove the
> length check for a NBD_CMD_TRIM.
> 
> I've confirmed this with both the protocol documentation at:
> 
>  https://github.com/yoe/nbd/blob/master/doc/proto.md

Hmm. The current wording of the experimental block size additions does
NOT allow the client to send a NBD_CMD_TRIM with a size larger than the
maximum NBD_CMD_WRITE:
https://github.com/yoe/nbd/blob/extension-info/doc/proto.md#block-size-constraints

Maybe we should revisit that in the spec, and/or advertise yet another
block size (since the maximum size for a trim and/or write_zeroes
request may indeed be different than the maximum size for a read/write).

But since the kernel is the one sending the large length request, and
since you are right that this is not a denial-of-service in the amount
of data being sent in a single NBD message, I definitely agree that qemu
would be wise as a quality-of-implementation to allow the larger size,
for maximum interoperability, even if it exceeds advertised limits (that
is, when no limits are advertised, we should handle everything possible
if it is not so large as to be construed a denial-of-service, and
NBD_CMD_TRIM is not large; and when limits ARE advertised, a client that
violates limits is out of spec but we can still be liberal and respond
successfully to such a client rather than having to outright reject it).
 So I think this patch is headed in the right direction.

> 
> and looking at the kernel side implementation of the nbd device
> (drivers/block/nbd.c) where it only sends the request header with no data
> for a NBD_CMD_TRIM.
> 
> With this fix in, I am now able to run fstrim on my qcow2 images and keep
> them small (or at least keep their size proportional to the amount of data
> present on them).
> 
> Signed-off-by: Quentin Casasnovas <quentin.casasno...@oracle.com>
> CC: Paolo Bonzini <pbonz...@redhat.com>
> CC: <qemu-de...@nongnu.org>
> CC: <qemu-sta...@nongnu.org>
> CC: <qemu-triv...@nongnu.org>

This is NOT trivial material and should not go in through that tree.
However, I concur that it qualifies for a backport on a stable branch.

> ---
>  nbd.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/nbd.c b/nbd.c
> index b3d9654..e733669 100644
> --- a/nbd.c
> +++ b/nbd.c
> @@ -1209,6 +1209,11 @@ static ssize_t nbd_co_send_reply(NBDRequest *req, 
> struct nbd_reply *reply,
>  return rc;
>  }
>  
> +static bool nbd_should_check_request_size(const struct nbd_request *request)
> +{
> +return (request->type & NBD_CMD_MASK_COMMAND) != NBD_CMD_TRIM;
> +}
> +
>  static ssize_t nbd_co_receive_request(NBDRequest *req, struct nbd_request 
> *request)
>  {
>  NBDClient *client = req->client;
> @@ -1227,7 +1232,8 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, 
> struct nbd_request *reque
>  goto out;
>  }
>  
> -if (request->len > NBD_MAX_BUFFER_SIZE) {
> +if (nbd_should_check_request_size(request) &&
> +request->len > NBD_MAX_BUFFER_SIZE) {

I'd rather sort out the implications of this on the NBD protocol before
taking anything into qemu.  We've got time on our hand, so let's use it
to get this right.  (That, and I have several pending patches that
conflict with this as part of adding WRITE_ZEROES and INFO_BLOCK_SIZE
support, where it may be easier to resubmit this fix on top of my
pending patches).

>  LOG("len (%u) is larger than max len (%u)",
>  request->len, NBD_MAX_BUFFER_SIZE);
>  rc = -EINVAL;
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCHv5 1/7] Add GnuTLS infrastructure

2016-05-05 Thread Eric Blake

On 05/05/2016 04:48 AM, Alex Bligh wrote:
>>> Indentation I think you will find is totally right, because it
>>> was run through GNU's 'indent' program. Unless something really odd
>>> has happened!
>>

> So the issue that the GNU coding standard requires use of hard tabs
> and these don't work well when transmitted through mail (particularly
> in patches). The actual formatting is 'correct' (in the sense of
> conforming to what 'indent' produces).

These days, GNU Coding Standards state that hard tabs are optional, but
if present they represent 8 spaces.  Many GNU projects use a space-only
policy (GNU coreutils for example).  So if I have any say, I'd recommend
that you re-tune your indent settings for space-only on any file that
you reindent to keep in the GNU style.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [Qemu-devel] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?

2016-05-02 Thread Eric Blake

On 03/31/2016 02:34 PM, Eric Blake wrote:
> On 03/31/2016 02:17 PM, Alex Bligh wrote:
>> OK so I actually went and researched what my answer was last time I
>> was asked ( :-) ):
>>
>> Here was my conclusion last time after trawling through lkml
>> on the subject:
>>
>> From https://sourceforge.net/p/nbd/mailman/message/27569820/
>>

>>> Meanwhile, it sounds like FUA is valid on read, write, AND flush
>>> (because the kernel supports all three),
>>
>> Do you have a pointer to what FUA means on kernel reads? Does it
> 
> No clue. I'm not a kernel expert, and was assuming that you knew more
> about it than me.
> 
>> mean "force unit access for the read" or does it mean "flush any
>> write for that block first"? The first is subtly different if the
>> file is remote and being accessed by multiple people (e.g. NFS, Ceph etc.)
> 
> I would lean to the latter - FUA on a read seems like it is most useful
> if it means "guarantee that no one else can read something older than
> what I read", and NOT "give me possibly stale data because I accessed
> the underlying storage rather than paying attention to in-flight writes
> that would change what I read".  In other words, I think you should
> ALWAYS prefer data from in-flight writes over going to backing storage,
> but USUALLY don't need the overhead of waiting for those writes to
> complete; FUA slows down your read, but gives you better data assurance.

SCSI defines FUA on both reads and writes.  Reading
http://www.seagate.com/staticfiles/support/disc/manuals/scsi/100293068a.pdf
under READ(10), I see:

"The force unit access (FUA) and force unit access non-volatile cache
(FUA_NV) bits are defined in table 87.
...
The device server shall read the logical blocks from the medium. If a
cache contains a more
recent version of a logical block, the device server shall write the
logical block to the medium
before reading it.
"

Whether the kernel exposes this aspect of SCSI FUA bit through its bio
interface (and if not, whether it should) are a different matter, but it
sounds like since at least SCSI has a definition for FUA on read, maybe
NBD should (someday) likewise add a definition for it.  At least for
now, we left the door open for it, by at least requiring that servers
not reject the flag, although there may be some discovery issues to
figure out before a client can reasonably know that its FUA-on-read
requests will actually be honored.

> 
>>
>>> even if we aren't quite sure
>>> what to document of those flags.  And that means qemu is correct, and
>>> the NBD protocol has a bug.  Since you contributed the FUA flag, is that
>>> something you can try to improve?
>>
>> Yeah. My mess so I should clean it up. I think FUA should be valid
>> on essentially everything.
>>
>> I think I might wait until structured replies is in though!
> 
> It's also tricky because we just barely documented that servers SHOULD
> reject invalid flags with EINVAL; and that clients MUST NOT send FUA on
> commands where it is not documented; I don't know if we have an adequate
> discovery system in place to learn _which_ commands support FUA,
> especially if you are proposing that we expand the scope of FUA to be
> valid alongside a TRIM request.
> 
> It doesn't have to be solved today, though, so I'm fine if you wait for
> structured replies first.
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCHv6] Remove NBD_OPT_BLOCK_SIZE; add specific requests to NBD_OPT_INFO

2016-04-29 Thread Eric Blake

On 04/29/2016 09:25 AM, Alex Bligh wrote:

>> The server can always do a minimum block size of 1 (it already has code
>> to do a read-modify-write, when needed), so it will never disconnect a
>> client that uses NBD_OPT_EXPORT_NAME, nor will such a client ever get an
>> EINVAL for an unaligned read.  However, there are some files that are
>> more efficient with a block size of 1 (anything on the file system) than
>> others (an actual block device, where 512 or 4096 is more typical).  So
>> on a per-export basis, the server will prefer to advertise a minimum
>> block size that matches the type of file it is serving, where possible.
>> It works out to these four cases:
>>

> That seems perfectly legal to me, and indeed was what I was first
> planning on doing.
> 
> Then I thought of an alternative strategy. This is in essence to
> always reply with a minimum block size of 1 (after all, you *can*
> support it), and return a preferred block size of the block
> size of the back-end. This strategy assumes the client will
> respect the preferred block size at least 'most' of the time,
> and also that you don't have to open the back-end in a different
> manner to support block sizes smaller than the native blocksize
> (e.g. you are not doing O_DIRECT type things).

Except that I want the preferred size to be larger than 4096 - if I'm
serving a qcow2 file, I want the preferred size to be the cluster size
(often 64k, can be as small as 512 or as big as 2M depending on the
qcow2 file, though), because there ARE efficiency gains once you get to
the cluster level, and things like TRIM or WRITE_ZERO on a qcow2 file
are constrained to the cluster size.

> 
> FWIW gonbdserver just has a 'GetGeometry' call which returns
> the block sizes the backend would like. These are mildly
> sanitized (and overrideable in the config file), and then
> passed through.

Config file overrides count as out-of-band coordination :)  And to some
extent, the choice of preferred block size isn't as much of a sticking
point as the choice of the minimum block size.  And a choice of whether
to use O_DIRECT might also be, to some extent, and out-of-band
configuration.  At any rate, I'm glad we agree that my reading of the
spec vs. planned implementation sounds reasonable - it means the spec is
on the right track.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCHv6] Remove NBD_OPT_BLOCK_SIZE; add specific requests to NBD_OPT_INFO

2016-04-29 Thread Eric Blake

On 04/29/2016 08:29 AM, Wouter Verhelst wrote:

>> You can't send NBD_REP_ERR_BLOCK_SIZE_REQD in response to an NBD_OPT_INFO
>> if it's asked for NBD_INFO_BLOCK_SIZE.
>>
>> If it has not asked for NBD_INFO_BLOCK_SIZE it is legitimate to error
>> the NBD_OPT_INFO with NBD_REP_ERR_BLOCK_SIZE_REQD so that the client knows
>> that if sends an NBD_OPT_GO with the same parameters it would get that
>> error, and hence it should either ask for block size constraints or
>> give up.
> 
> Oh, right. I hadn't considered sending an ERR_BLOCK_SIZE_REQD on an INFO
> request without a request for block sizes (after all, it's just an
> information request, the fact that you don't need information on
> everything doesn't mean you'll break things), but I suppose it makes
> sense to do that.

More importantly, if we _don't_ fail the NBD_OPT_INFO, then we _can't_
fail the NBD_OPT_GO. Nothing says that NBD_OPT_GO has to have the same
failure and/or information as a failed NBD_OPT_INFO with the same
parameters, nor that NBD_OPT_GO must not succeed if NBD_OPT_INFO failed;
but we DO want to make sure that if NBD_OPT_GO is going to fail, then
NBD_OPT_INFO should also fail in the sanest way possible.

> 
> It might make sense for such a server to still send all the information
> that the client *did* ask for in the reply, it would just send an error
> along as well to signal that more is going to be needed, but I don't
> suppose that's critical.

Yes, a good server should basically reply with everything that it
normally would on success, and then just switch NBD_REP_ACK to
NBD_REP_ERR_BLOCK_SIZE_REQD as the last message.

Here's what I'm planning on doing in my next qemu spin:

The server can always do a minimum block size of 1 (it already has code
to do a read-modify-write, when needed), so it will never disconnect a
client that uses NBD_OPT_EXPORT_NAME, nor will such a client ever get an
EINVAL for an unaligned read.  However, there are some files that are
more efficient with a block size of 1 (anything on the file system) than
others (an actual block device, where 512 or 4096 is more typical).  So
on a per-export basis, the server will prefer to advertise a minimum
block size that matches the type of file it is serving, where possible.
 It works out to these four cases:

If the client calls NBD_OPT_INFO without NBD_INFO_BLOCK_SIZE, the server
will reply with all information it has, and advertise the block size of
the actual file. If the block size is 1, the server will conclude with
NBD_REP_ACK; if the block size is > 1, the server will conclude with
NBD_REP_ERR_BLOCK_SIZE_REQD.

If the client calls NBD_OPT_INFO with NBD_INFO_BLOCK_SIZE, the server
will reply with all information it has, and advertise the block size of
the actual file.  It will then conclude with NBD_REP_ACK.

If the client calls NBD_OPT_GO without NBD_INFO_BLOCK_SIZE, the server
will reply with all information it has, except that the minimum block
size will be 1, then conclude with NBD_REP_ACK.  That way, the client
cannot send an unaligned request, and the server doesn't have to worry
about reporting EINVAL.  Note that for a file with native minimum block
size > 1, this is a success reply even when the corresponding
NBD_OPT_INFO with the same parameters would have failed, and with
different information - but the way we worded this, _it is okay_.  We
have no requirement on NBD_OPT_GO due to a failed NBD_OPT_INFO, only due
to a successful one.

If the client calls NBD_OPT_GO with NBD_INFO_BLOCK_SIZE, the server will
reply with all information it has, including correct minimum size, and
conclude with NBD_REP_ACK.  If the client then sends an unaligned
request, it was the client that violated the protocol, so all bets are
now off (the server will happen to honor the request, rather than
disconnect or fail with the suggested EINVAL, but the client was
out-of-spec so it shouldn't be relying on any particular server
behavior, whether success or a particular error).

Let me know if any of the above reasoning is wrong, or if we should try
harder to document this in the spec to make it clear to server
implementors how they can be portable to the largest number of clients.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

[Nbd] [PATCH] doc: Restore a lost sentence about NBD_REP_ERR_BLOCK_SIZE_REQD

2016-04-28 Thread Eric Blake

The previous patch accidentally dropped the suggestion that a
server replying with NBD_REP_ERR_BLOCK_SIZE_REQD should first
send NBD_REP_INFO with NBD_INFO_BLOCK_SIZE, so that the client
doesn't have to do yet another round trip NBD_OPT_INFO to learn
what sizes the server plans to enforce.

Signed-off-by: Eric Blake <ebl...@redhat.com>
---

extension-info branch

 doc/proto.md | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 1a0f1cf..da65df0 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -949,12 +949,13 @@ of the newstyle negotiation.
 - `NBD_REP_ERR_BLOCK_SIZE_REQD`: The server requires the client to
   request block size constraints using `NBD_INFO_BLOCK_SIZE` prior
   to entering transmission phase, because the server will be using
-  non-default block sizes constraints. The server MUST NOT send this
-  error if block size constraints were requested with
-  `NBD_INFO_BLOCK_SIZE` with the `NBD_OPT_INFO` or `NBD_OPT_GO`
-  request. The server SHOULD NOT send this error if it is using
-  default block size constraints or block size constraints
-  negotiated out of band.
+  non-default block sizes constraints. The server SHOULD first
+  send at least an `NBD_INFO_BLOCK_SIZE` information reply before
+  giving this error.  The server MUST NOT send this error if block
+  size constraints were requested with `NBD_INFO_BLOCK_SIZE` with
+  the `NBD_OPT_INFO` or `NBD_OPT_GO` request. The server SHOULD
+  NOT send this error if it is using default block size
+  constraints or block size constraints negotiated out of band.

 Additionally, if TLS has not been initiated, the server MAY reply
 with `NBD_REP_ERR_TLS_REQD` (instead of `NBD_REP_ERR_UNKNOWN`) to
-- 
2.5.5


--
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCHv6] Remove NBD_OPT_BLOCK_SIZE; add specific requests to NBD_OPT_INFO

2016-04-28 Thread Eric Blake

On 04/28/2016 10:54 AM, Wouter Verhelst wrote:
> [...]
>> +  request block size constraints using `NBD_INFO_BLOCK_SIZE` prior
>> +  to entering transmission phase, because the server will be using
>> +  non-default block sizes constraints. The server MUST NOT send this
>> +  error if block size constraints were requested with
>> +  `NBD_INFO_BLOCK_SIZE` with the `NBD_OPT_INFO` or `NBD_OPT_GO`
> 
> only NBD_OPT_GO here?

This one is correct - if the client requests NBD_INFO_BLOCK_SIZE, then
the server MUST NOT fail with NBD_REP_ERR_BLOCK_SIZE_REQD, regardless of
whether it was NBD_OPT_INFO or NBD_OPT_GO.  Thus,
NBD_REP_ERR_BLOCK_SIZE_REQD can only be returned by servers in cases
where the client used NBD_OPT_INFO or NBD_OPT_GO, but did not request
NBD_INFO_BLOCK_SIZE.

I also think we may want to add wording that when this particular error
is returned, the server MUST first send an NBD_INFO_BLOCK_SIZE
advertisement before the error, so that the client can have the block
size information available without needing another round trip, in order
for the client to make its decision on whether to include
NBD_INFO_BLOCK_SIZE in its next request for NBD_OPT_GO or to give up now.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature
--
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCHv6] Remove NBD_OPT_BLOCK_SIZE; add specific requests to NBD_OPT_INFO

2016-04-28 Thread Eric Blake

On 04/28/2016 02:58 AM, Alex Bligh wrote:
> Allow a client to requests specific bits of information using
> NBD_OPT_INFO and thereby signal to the server that it supports them.
> This makes it easier for the server to know whether the information
> it is providing will be used / respected.
> 
> Now a server can determine whether a client supports block sizes by
> the fact it uses NBD_OPT_INFO or NBD_OPT_GO with an NBD_INFO_BLOCK_SIZE,
> request as these are part of the same extension, so there is no
> need for NBD_OPT_BLOCK_SIZE.
> 
> Signed-off-by: Alex Bligh <a...@alex.org.uk>
> ---
>  doc/proto.md | 140 
> ++-
>  1 file changed, 80 insertions(+), 60 deletions(-)

> +++ b/doc/proto.md
> @@ -636,34 +636,53 @@ This functionality has not yet been implemented by the 
> reference
>  implementation, but was implemented by qemu and subsequently
>  by other users, so has been moved out of the "experimental" section.
>  
> -## Block sizes
> +## Block size constraints
>  
>  During transmission phase, several operations are constrained by the
>  export size sent by the final `NBD_OPT_EXPORT_NAME` or `NBD_OPT_GO`,
> -as well as by three block sizes defined here (minimum, preferred, and
> -maximum).  If a client can honor server block sizes (as set out in

My fault for introducing it...

> -`NBD_OPT_BLOCK_SIZE`), it SHOULD announce this
> -during the handshake phase, and SHOULD use `NBD_OPT_GO` rather than
> -`NBD_OPT_EXPORT_NAME`.  A server SHOULD advertise the block size
> -contraints during handshake phase via `NBD_OPT_INFO`.  A server
> -and client MAY agree on block sizes via out of band means.
> -
> -If block sizes have not been advertised or agreed on externally, then
> -a client SHOULD assume a default minimum block size of 1, a preferred
> +as well as by three block size constraints defined here (minimum,
> +preferred, and maximum).
> +
> +If a client can honor server block size constraints (as set out below

...but while you are touching this, s/honor/honour/ so that the document
consistently uses UK spellings.

> @@ -1077,13 +1093,17 @@ during option haggling in the fixed newstyle 
> negotiation.
>  
>  * `NBD_INFO_BLOCK_SIZE` (3)
>  
> -  Represents the server's advertised block sizes; see the "Block
> -  sizes" section for more details on what these values represent,
> -  and on constraints on their values.  The server MAY send this
> -  info whether or not the client has negotiated
> -  `NBD_OPT_BLOCK_SIZE`, and SHOULD send this info if it intends to
> -  enforce block sizes other than the defaults. The *length* MUST
> -  be 14, and the reply payload is interpreted as:
> +  Represents the server's advertised block size constraints; see the
> +  "Block size constraints" section for more details on what these
> +  values represent, and on constraints on their values.  The server
> +  MUST send this info if it is requested and it intends to enforce
> +  block size constraints other than the defaults. After
> +  sending this information in response to an `NBD_OPT_GO`,

s/`NBD_OPT_GO`/`NBD_OPT_GO` where the client requested
`NBD_INFO_BLOCK_SIZE`/

> the server
> +  can legitimately assume that any client that continues the session
> +  will support the block size constraints supplied (note that this
> +  assumption cannot be made solely on the basis of an `NBD_OPT_INFO`
> +  with an `NBD_INFO_BLOCK_SIZE` request).

s/request)/request, nor if the client did not request
`NBD_INFO_BLOCK_SIZE`)/

> The *length* MUST be 14,
> +  and the reply payload is interpreted as:
>  
>- 16 bits, `NBD_INFO_BLOCK_SIZE`  
>- 32 bits, minimum block size  
>

Otherwise I think this is ready.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

Re: [Nbd] [PATCH/RFCv4] Remove NBD_OPT_BLOCK_SIZE; add specific requests to NBD_OPT_INFO

2016-04-26 Thread Eric Blake

On 04/26/2016 01:36 PM, Alex Bligh wrote:
> Allow a client to requests specific bits of information using
> NBD_OPT_INFO and thereby signal to the server that it supports them.
> This makes it easier for the server to know whether the information
> it is providing will be used / respected.
> 
> Now a server can determine whether a client supports block sizes by
> the fact it uses NBD_OPT_INFO or NBD_OPT_GO with an NBD_INFO_BLOCK_SIZE,
> request as these are part of the same extension, so there is no
> need for NBD_OPT_BLOCK_SIZE.
> 
> Signed-off-by: Alex Bligh <a...@alex.org.uk>
> ---
>  doc/proto.md | 103 
> +++
>  1 file changed, 54 insertions(+), 49 deletions(-)
> 

> @@ -883,8 +897,24 @@ of the newstyle negotiation.
>  
>  Data (both commands):
>  
> -- String: name of the export (as with `NBD_OPT_EXPORT_NAME`, the length
> -  comes from the option header).
> +- 16 bits, number of information requests
> +- 16 bits x n - list of `NBD_INFO` information requests
> +- 32 bits, length of name (unsigned); MUST be no larger than the
> +  option data length - 6

Could be 16 bits, no larger than 'length - 4', since the maximum name is
4096 bytes.

> +- String: name of the export
> +
> +The client MAY list one or more items of specific information it
> +is seeking in the list of information requests, or it MAY specify
> +an empty list. The server MUST ignore any information requests
> +it does not understand. The server MAY ignore information requests
> +that it does not wish to supply for policy reasons (other than
> +`NBD_INFO_EXPORT`). Equally the client MAY refuse to negotiate
> +if not supplied information it has requested. The server MAY send
> +information requests back which are not explicitly requested, but
> +the server MUST NOT assume that such information requests are
> +understood and respected by the client unless the client explicitly
> +asked for them. The client MUST ignore information replies it
> +does not understand.

Should we state that clients SHOULD NOT request a particular info item
more than once in its request list?

Do we need to state anything about whether a client may include
NBD_INFO_EXPORT in its request list (since that one is mandatory even
without being requested)?  On the other hand, I don't see what it would
add other than verbosity.

Looking better, and I'm still feeling comfortable with the amount of
tweaks to my qemu patches to implement this change if we go with it.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z___
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general

1 2 3 >

1 - 100 of 264 matches

Mail list logo