date:20160406

Re: [Qemu-block] [PATCH] xen: piix reuse pci generic class init function

2016-04-06 Thread Michael S. Tsirkin

On Wed, Apr 06, 2016 at 04:56:12PM -0700, Stefano Stabellini wrote:
> On Sun, 3 Apr 2016, Michael S. Tsirkin wrote:
> > piix3_ide_xen_class_init is identical to piix3_ide_class_init
> > except it's buggy as it does not set exit and does not disable
> > hotplug properly.
> > 
> > Switch to the generic one.
> > 
> > Reviewed-by: Stefano Stabellini 
> > Signed-off-by: Michael S. Tsirkin 
> 
> Hey John,
> 
> are you going to take the patch or do you want me to handle it?
> 
> Cheers,
> 
> Stefano

it's in my tree already.

> 
> >  hw/ide/piix.c | 14 +-
> >  1 file changed, 1 insertion(+), 13 deletions(-)
> > 
> > diff --git a/hw/ide/piix.c b/hw/ide/piix.c
> > index df46147..0a4cbcb 100644
> > --- a/hw/ide/piix.c
> > +++ b/hw/ide/piix.c
> > @@ -258,22 +258,10 @@ static const TypeInfo piix3_ide_info = {
> >  .class_init= piix3_ide_class_init,
> >  };
> >  
> > -static void piix3_ide_xen_class_init(ObjectClass *klass, void *data)
> > -{
> > -DeviceClass *dc = DEVICE_CLASS(klass);
> > -PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > -
> > -k->realize = pci_piix_ide_realize;
> > -k->vendor_id = PCI_VENDOR_ID_INTEL;
> > -k->device_id = PCI_DEVICE_ID_INTEL_82371SB_1;
> > -k->class_id = PCI_CLASS_STORAGE_IDE;
> > -set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
> > -}
> > -
> >  static const TypeInfo piix3_ide_xen_info = {
> >  .name  = "piix3-ide-xen",
> >  .parent= TYPE_PCI_IDE,
> > -.class_init= piix3_ide_xen_class_init,
> > +.class_init= piix3_ide_class_init,
> >  };
> >  
> >  static void piix4_ide_class_init(ObjectClass *klass, void *data)
> > -- 
> > MST
> >

Re: [Qemu-block] [PATCH] xen: piix reuse pci generic class init function

2016-04-06 Thread Stefano Stabellini

On Sun, 3 Apr 2016, Michael S. Tsirkin wrote:
> piix3_ide_xen_class_init is identical to piix3_ide_class_init
> except it's buggy as it does not set exit and does not disable
> hotplug properly.
> 
> Switch to the generic one.
> 
> Reviewed-by: Stefano Stabellini 
> Signed-off-by: Michael S. Tsirkin 

Hey John,

are you going to take the patch or do you want me to handle it?

Cheers,

Stefano


>  hw/ide/piix.c | 14 +-
>  1 file changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
> index df46147..0a4cbcb 100644
> --- a/hw/ide/piix.c
> +++ b/hw/ide/piix.c
> @@ -258,22 +258,10 @@ static const TypeInfo piix3_ide_info = {
>  .class_init= piix3_ide_class_init,
>  };
>  
> -static void piix3_ide_xen_class_init(ObjectClass *klass, void *data)
> -{
> -DeviceClass *dc = DEVICE_CLASS(klass);
> -PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> -
> -k->realize = pci_piix_ide_realize;
> -k->vendor_id = PCI_VENDOR_ID_INTEL;
> -k->device_id = PCI_DEVICE_ID_INTEL_82371SB_1;
> -k->class_id = PCI_CLASS_STORAGE_IDE;
> -set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
> -}
> -
>  static const TypeInfo piix3_ide_xen_info = {
>  .name  = "piix3-ide-xen",
>  .parent= TYPE_PCI_IDE,
> -.class_init= piix3_ide_xen_class_init,
> +.class_init= piix3_ide_class_init,
>  };
>  
>  static void piix4_ide_class_init(ObjectClass *klass, void *data)
> -- 
> MST
>

Re: [Qemu-block] [PATCH for-2.6] block: Forbid I/O throttling on nodes with multiple parents for 2.6

2016-04-06 Thread Alberto Garcia

On Mon 04 Apr 2016 05:26:02 PM CEST, Kevin Wolf wrote:
> As the patches to move I/O throttling to BlockBackend didn't make it in
> time for the 2.6 release, but the release adds new ways of configuring
> VMs whose behaviour would change once the move is done, we need to
> outlaw such configurations temporarily.
>
> The problem exists whenever a BDS has more users than just its BB, for
> example it is used as a backing file for another node. (This wasn't
> possible in 2.5 yet as we introduced node references to specify a
> backing file only recently.) In these cases, the throttling would
> apply to these other users now, but after moving throttling to the
> BlockBackend the other users wouldn't be throttled any more.
>
> This patch both prevents making new references to a throttled node as
> well as using monitor commands to throttle a node with multiple parents.
>
> Compared to 2.5 this changes behaviour in some corner cases where
> references were allowed before, like bs->file or Quorum children. It
> seems reasonable to assume that users didn't use I/O throttling on such
> low level nodes. With the upcoming move of throttling into BlockBackend,
> such configurations won't be possible anyway.
>
> Signed-off-by: Kevin Wolf 

Reviewed-by: Alberto Garcia 

Berto

[Qemu-block] [PATCH v3 14/14] iotests: Add test for NBD's blockdev-add interface

2016-04-06 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/147 | 194 +
 tests/qemu-iotests/147.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 200 insertions(+)
 create mode 100755 tests/qemu-iotests/147
 create mode 100644 tests/qemu-iotests/147.out

diff --git a/tests/qemu-iotests/147 b/tests/qemu-iotests/147
new file mode 100755
index 000..f31de69
--- /dev/null
+++ b/tests/qemu-iotests/147
@@ -0,0 +1,194 @@
+#!/usr/bin/env python
+#
+# Test case for the QMP 'change' command and all other associated
+# commands
+#
+# Copyright (C) 2016 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import socket
+import stat
+import time
+import iotests
+from iotests import cachemode, imgfmt, qemu_img, qemu_nbd
+
+NBD_PORT = 10811
+
+test_img = os.path.join(iotests.test_dir, 'test.img')
+
+class NBDBlockdevAddBase(iotests.QMPTestCase):
+def blockdev_add_options(self, address, export=None):
+options = { 'id': 'drive0',
+'driver': 'raw',
+'file': {
+'driver': 'nbd',
+'address': address
+} }
+if export is not None:
+options['file']['export'] = export
+return options
+
+def client_test(self, filename, address, export=None):
+bao = self.blockdev_add_options(address, export)
+result = self.vm.qmp('blockdev-add', options=bao)
+self.assert_qmp(result, 'return', {})
+
+result = self.vm.qmp('query-block')
+self.assert_qmp(result, 'return[0]/inserted/image/filename', filename)
+
+result = self.vm.qmp('x-blockdev-del', id='drive0')
+self.assert_qmp(result, 'return', {})
+
+
+class QemuNBD(NBDBlockdevAddBase):
+def setUp(self):
+qemu_img('create', '-f', iotests.imgfmt, test_img, '64k')
+self.vm = iotests.VM()
+self.vm.launch()
+self.qemu_nbd = None
+
+def tearDown(self):
+self.vm.shutdown()
+if self.qemu_nbd is not None:
+self.qemu_nbd.wait()
+os.remove(test_img)
+
+def _server_up(self, *args):
+self.qemu_nbd = qemu_nbd('-f', imgfmt, test_img, *args)
+time.sleep(1)
+
+def test_inet(self):
+self._server_up('-p', str(NBD_PORT))
+address = { 'type': 'inet',
+'data': {
+'host': 'localhost',
+'port': str(NBD_PORT)
+} }
+self.client_test('nbd://localhost:%i' % NBD_PORT, address)
+
+def test_unix(self):
+socket = os.path.join(iotests.test_dir, 'qemu-nbd.socket')
+self._server_up('-k', socket)
+address = { 'type': 'unix',
+'data': { 'path': socket } }
+self.client_test('nbd+unix://?socket=' + socket, address)
+os.remove(socket)
+
+
+class BuiltinNBD(NBDBlockdevAddBase):
+def setUp(self):
+qemu_img('create', '-f', iotests.imgfmt, test_img, '64k')
+self.vm = iotests.VM()
+self.vm.launch()
+self.server = iotests.VM('.server')
+self.server.add_drive_raw('if=none,id=nbd-export,' +
+  'file=%s,' % test_img +
+  'format=%s,' % imgfmt +
+  'cache=%s' % cachemode)
+self.server.launch()
+
+def tearDown(self):
+self.vm.shutdown()
+self.server.shutdown()
+os.remove(test_img)
+
+def _server_up(self, address):
+result = self.server.qmp('nbd-server-start', addr=address)
+self.assert_qmp(result, 'return', {})
+
+result = self.server.qmp('nbd-server-add', device='nbd-export')
+self.assert_qmp(result, 'return', {})
+
+def _server_down(self):
+result = self.server.qmp('nbd-server-stop')
+self.assert_qmp(result, 'return', {})
+
+def test_inet(self):
+address = { 'type': 'inet',
+'data': {
+'host': 'localhost',
+'port': str(NBD_PORT)
+} }
+self._server_up(address)
+self.client_test('nbd://localhost:%i/nbd-export' % NBD_PORT,
+ address, 'nbd-export')
+self._server_down()
+
+def test_inet6(self):
+

[Qemu-block] [PATCH v3 12/14] iotests.py: Allow concurrent qemu instances

2016-04-06 Thread Max Reitz

By adding an optional suffix to the files used for communication with a
VM, we can launch multiple VM instances concurrently.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b3c00dd..ef7e52f 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -157,10 +157,13 @@ def event_match(event, match=None):
 class VM(object):
 '''A QEMU VM'''
 
-def __init__(self):
-self._monitor_path = os.path.join(test_dir, 'qemu-mon.%d' % 
os.getpid())
-self._qemu_log_path = os.path.join(test_dir, 'qemu-log.%d' % 
os.getpid())
-self._qtest_path = os.path.join(test_dir, 'qemu-qtest.%d' % 
os.getpid())
+def __init__(self, path_suffix=''):
+self._monitor_path = os.path.join(test_dir, 'qemu-mon%s.%d' %
+(path_suffix, os.getpid()))
+self._qemu_log_path = os.path.join(test_dir, 'qemu-log%s.%d' %
+ (path_suffix, 
os.getpid()))
+self._qtest_path = os.path.join(test_dir, 'qemu-qtest%s.%d' %
+  (path_suffix, os.getpid()))
 self._args = qemu_args + ['-chardev',
  'socket,id=mon,path=' + self._monitor_path,
  '-mon', 'chardev=mon,mode=control',
-- 
2.8.0

Re: [Qemu-block] [PATCH v3 0/7] bdrv_flush_io_queue removal, shared LinuxAioState

2016-04-06 Thread Kevin Wolf

Am 06.04.2016 um 20:28 hat Paolo Bonzini geschrieben:
> 
> 
> On 06/04/2016 20:19, Kevin Wolf wrote:
> >> > 
> >> > Great, I'll send v4 as soon as possible for inclusion in the block-next
> >> > branch.
> > Are you still working on v4 or did I miss it? All of the block-next
> > candidate series are waiting to be rebased on this one, and moving parts
> > at the bottom of a longish dependency chain are kind of nasty.
> 
> Yes, I am waiting for Fam's bdrv_co_drain patch, which is this one
> depends on.

Oh, I wasn't aware of that one. Thanks!

Kevin

[Qemu-block] [PATCH v3 13/14] socket_scm_helper: Accept fd directly

2016-04-06 Thread Max Reitz

This gives us more freedom about the fd that is passed to qemu, allowing
us to e.g. pass sockets.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/socket_scm_helper.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/tests/qemu-iotests/socket_scm_helper.c 
b/tests/qemu-iotests/socket_scm_helper.c
index 80cadf4..eb76d31 100644
--- a/tests/qemu-iotests/socket_scm_helper.c
+++ b/tests/qemu-iotests/socket_scm_helper.c
@@ -60,7 +60,7 @@ static int send_fd(int fd, int fd_to_send)
 }
 
 /* Convert string to fd number. */
-static int get_fd_num(const char *fd_str)
+static int get_fd_num(const char *fd_str, bool silent)
 {
 int sock;
 char *err;
@@ -68,12 +68,16 @@ static int get_fd_num(const char *fd_str)
 errno = 0;
 sock = strtol(fd_str, , 10);
 if (errno) {
-fprintf(stderr, "Failed in strtol for socket fd, reason: %s\n",
-strerror(errno));
+if (!silent) {
+fprintf(stderr, "Failed in strtol for socket fd, reason: %s\n",
+strerror(errno));
+}
 return -1;
 }
 if (!*fd_str || *err || sock < 0) {
-fprintf(stderr, "bad numerical value for socket fd '%s'\n", fd_str);
+if (!silent) {
+fprintf(stderr, "bad numerical value for socket fd '%s'\n", 
fd_str);
+}
 return -1;
 }
 
@@ -104,18 +108,21 @@ int main(int argc, char **argv, char **envp)
 }
 
 
-sock = get_fd_num(argv[1]);
+sock = get_fd_num(argv[1], false);
 if (sock < 0) {
 return EXIT_FAILURE;
 }
 
-/* Now only open a file in readonly mode for test purpose. If more precise
-   control is needed, use python script in file operation, which is
-   supposed to fork and exec this program. */
-fd = open(argv[2], O_RDONLY);
+fd = get_fd_num(argv[2], true);
 if (fd < 0) {
-fprintf(stderr, "Failed to open file '%s'\n", argv[2]);
-return EXIT_FAILURE;
+/* Now only open a file in readonly mode for test purpose. If more
+   precise control is needed, use python script in file operation, 
which
+   is supposed to fork and exec this program. */
+fd = open(argv[2], O_RDONLY);
+if (fd < 0) {
+fprintf(stderr, "Failed to open file '%s'\n", argv[2]);
+return EXIT_FAILURE;
+}
 }
 
 ret = send_fd(sock, fd);
-- 
2.8.0

[Qemu-block] [PATCH v3 11/14] iotests.py: Add qemu_nbd function

2016-04-06 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 8499e1b..b3c00dd 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -28,7 +28,7 @@ import qmp
 import qtest
 import struct
 
-__all__ = ['imgfmt', 'imgproto', 'test_dir' 'qemu_img', 'qemu_io',
+__all__ = ['imgfmt', 'imgproto', 'test_dir' 'qemu_img', 'qemu_io', 'qemu_nbd',
'VM', 'QMPTestCase', 'notrun', 'main', 'verify_image_format',
'verify_platform', 'filter_test_dir', 'filter_win32',
'filter_qemu_io', 'filter_chown', 'log']
@@ -43,6 +43,10 @@ qemu_io_args = [os.environ.get('QEMU_IO_PROG', 'qemu-io')]
 if os.environ.get('QEMU_IO_OPTIONS'):
 qemu_io_args += os.environ['QEMU_IO_OPTIONS'].strip().split(' ')
 
+qemu_nbd_args = [os.environ.get('QEMU_NBD_PROG', 'qemu-nbd')]
+if os.environ.get('QEMU_NBD_OPTIONS'):
+qemu_nbd_args += os.environ['QEMU_NBD_OPTIONS'].strip().split(' ')
+
 qemu_args = [os.environ.get('QEMU_PROG', 'qemu')]
 if os.environ.get('QEMU_OPTIONS'):
 qemu_args += os.environ['QEMU_OPTIONS'].strip().split(' ')
@@ -91,6 +95,11 @@ def qemu_io(*args):
 sys.stderr.write('qemu-io received signal %i: %s\n' % (-exitcode, ' 
'.join(args)))
 return subp.communicate()[0]
 
+def qemu_nbd(*args):
+'''Run qemu-nbd in the background'''
+subp = subprocess.Popen(qemu_nbd_args + list(args))
+return subp
+
 def compare_images(img1, img2):
 '''Return True if two image files are identical'''
 return qemu_img('compare', '-f', imgfmt,
-- 
2.8.0

[Qemu-block] [PATCH v3 08/14] block/nbd: Accept SocketAddress

2016-04-06 Thread Max Reitz

Add a new option "address" to the NBD block driver which accepts a
SocketAddress.

"path", "host" and "port" are still supported as legacy options and are
mapped to their corresponding SocketAddress representation.

Signed-off-by: Max Reitz 
---
 block/nbd.c   | 97 ++-
 tests/qemu-iotests/051.out|  4 +-
 tests/qemu-iotests/051.pc.out |  4 +-
 3 files changed, 64 insertions(+), 41 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 3adf302..9ae66ba 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -32,6 +32,8 @@
 #include "qemu/uri.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
+#include "qapi-visit.h"
+#include "qapi/qmp-input-visitor.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qint.h"
@@ -128,7 +130,9 @@ static bool nbd_has_filename_options_conflict(QDict 
*options, Error **errp)
 if (!strcmp(e->key, "host")
 || !strcmp(e->key, "port")
 || !strcmp(e->key, "path")
-|| !strcmp(e->key, "export"))
+|| !strcmp(e->key, "export")
+|| !strcmp(e->key, "address")
+|| !strncmp(e->key, "address.", 8))
 {
 error_setg(errp, "Option '%s' cannot be used with a file name",
e->key);
@@ -202,46 +206,66 @@ out:
 g_free(file);
 }
 
+static bool nbd_process_legacy_socket_options(QDict *options, Error **errp)
+{
+if (qdict_haskey(options, "path") && qdict_haskey(options, "host")) {
+error_setg(errp, "path and host may not be used at the same time");
+return false;
+} else if (qdict_haskey(options, "path")) {
+if (qdict_haskey(options, "port")) {
+error_setg(errp, "port may not be used without host");
+return false;
+}
+
+qdict_put(options, "address.type", qstring_from_str("unix"));
+qdict_change_key(options, "path", "address.data.path");
+} else if (qdict_haskey(options, "host")) {
+qdict_put(options, "address.type", qstring_from_str("inet"));
+qdict_change_key(options, "host", "address.data.host");
+if (!qdict_change_key(options, "port", "address.data.port")) {
+qdict_put(options, "address.data.port",
+  qstring_from_str(stringify(NBD_DEFAULT_PORT)));
+}
+}
+
+return true;
+}
+
 static SocketAddress *nbd_config(BDRVNBDState *s, QDict *options, char 
**export,
  Error **errp)
 {
-SocketAddress *saddr;
+SocketAddress *saddr = NULL;
+QDict *addr = NULL;
+QObject *crumpled_addr;
+QmpInputVisitor *iv;
+Error *local_err = NULL;
 
-if (qdict_haskey(options, "path") == qdict_haskey(options, "host")) {
-if (qdict_haskey(options, "path")) {
-error_setg(errp, "path and host may not be used at the same time");
-} else {
-error_setg(errp, "one of path and host must be specified");
-}
-return NULL;
+if (!nbd_process_legacy_socket_options(options, errp)) {
+goto fail;
 }
-if (qdict_haskey(options, "port") && !qdict_haskey(options, "host")) {
-error_setg(errp, "port may not be used without host");
-return NULL;
+
+qdict_extract_subqdict(options, , "address.");
+if (!qdict_size(addr)) {
+error_setg(errp, "NBD server address missing");
+goto fail;
 }
 
-saddr = g_new0(SocketAddress, 1);
+crumpled_addr = qdict_crumple(addr, true, errp);
+if (!crumpled_addr) {
+goto fail;
+}
 
-if (qdict_haskey(options, "path")) {
-UnixSocketAddress *q_unix;
-saddr->type = SOCKET_ADDRESS_KIND_UNIX;
-q_unix = saddr->u.q_unix.data = g_new0(UnixSocketAddress, 1);
-q_unix->path = g_strdup(qdict_get_str(options, "path"));
-qdict_del(options, "path");
-} else {
-InetSocketAddress *inet;
-saddr->type = SOCKET_ADDRESS_KIND_INET;
-inet = saddr->u.inet.data = g_new0(InetSocketAddress, 1);
-inet->host = g_strdup(qdict_get_str(options, "host"));
-if (!qdict_get_try_str(options, "port")) {
-inet->port = g_strdup_printf("%d", NBD_DEFAULT_PORT);
-} else {
-inet->port = g_strdup(qdict_get_str(options, "port"));
-}
-qdict_del(options, "host");
-qdict_del(options, "port");
+iv = qmp_input_visitor_new(crumpled_addr);
+visit_type_SocketAddress(qmp_input_get_visitor(iv), NULL, ,
+ _err);
+qmp_input_visitor_cleanup(iv);
+if (local_err) {
+error_propagate(errp, local_err);
+goto fail;
 }
 
+/* TODO: Detect superfluous (unused) options under in addr */
+
 s->client.is_unix = saddr->type == SOCKET_ADDRESS_KIND_UNIX;
 
 *export = g_strdup(qdict_get_try_str(options, "export"));
@@ -249,7 +273,12 @@ static SocketAddress *nbd_config(BDRVNBDState *s, QDict 
*options,

[Qemu-block] [PATCH v3 10/14] qapi: Allow blockdev-add for NBD

2016-04-06 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 qapi/block-core.json | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1d09079..6e38ff0 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1632,13 +1632,14 @@
 # Drivers that are supported in block device operations.
 #
 # @host_device, @host_cdrom: Since 2.1
+# @nbd: Since 2.6
 #
 # Since: 2.0
 ##
 { 'enum': 'BlockdevDriver',
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
 'dmg', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
-'http', 'https', 'luks', 'null-aio', 'null-co', 'parallels',
+'http', 'https', 'luks', 'nbd', 'null-aio', 'null-co', 'parallels',
 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp', 'vdi', 'vhdx',
 'vmdk', 'vpc', 'vvfat' ] }
 
@@ -2032,6 +2033,24 @@
 '*read-pattern': 'QuorumReadPattern' } }
 
 ##
+# @BlockdevOptionsNbd
+#
+# Driver specific block device options for NBD.
+#
+# @address: NBD server address
+#
+# @export:  #optional export name
+#
+# @tls-creds:   #optional TLS credentials ID
+#
+# Since: 2.6
+##
+{ 'struct': 'BlockdevOptionsNbd',
+  'data': { 'address': 'SocketAddress',
+'*export': 'str',
+'*tls-creds': 'str' } }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.  Many options are available for all
@@ -2101,7 +2120,7 @@
   'https':  'BlockdevOptionsFile',
 # TODO iscsi: Wait for structured options
   'luks':   'BlockdevOptionsLUKS',
-# TODO nbd: Should take InetSocketAddress for 'host'?
+  'nbd':'BlockdevOptionsNbd',
 # TODO nfs: Wait for structured options
   'null-aio':   'BlockdevOptionsNull',
   'null-co':'BlockdevOptionsNull',
-- 
2.8.0

[Qemu-block] [PATCH v3 06/14] block/nbd: Add nbd_has_filename_options_conflict()

2016-04-06 Thread Max Reitz

Right now, we have four possible options that conflict with specifying
an NBD filename, and a future patch will add another one ("address").
This future option is a nested QDict that is flattened at this point,
requiring as to test each option whether its key has an "address."
prefix. Therefore, we will then need to iterate through all options.

Adding this iteration logic now will simplify adding the new option
later. A nice side effect is that the user will not receive a long list
of five options which are not supposed to be specified with a filename,
but we can actually print the problematic option.

Signed-off-by: Max Reitz 
---
 block/nbd.c | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index d12bcc6..1736f68 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -120,6 +120,25 @@ out:
 return ret;
 }
 
+static bool nbd_has_filename_options_conflict(QDict *options, Error **errp)
+{
+const QDictEntry *e;
+
+for (e = qdict_first(options); e; e = qdict_next(options, e)) {
+if (!strcmp(e->key, "host")
+|| !strcmp(e->key, "port")
+|| !strcmp(e->key, "path")
+|| !strcmp(e->key, "export"))
+{
+error_setg(errp, "Option '%s' cannot be used with a file name",
+   e->key);
+return true;
+}
+}
+
+return false;
+}
+
 static void nbd_parse_filename(const char *filename, QDict *options,
Error **errp)
 {
@@ -128,12 +147,7 @@ static void nbd_parse_filename(const char *filename, QDict 
*options,
 const char *host_spec;
 const char *unixpath;
 
-if (qdict_haskey(options, "host")
-|| qdict_haskey(options, "port")
-|| qdict_haskey(options, "path"))
-{
-error_setg(errp, "host/port/path and a file name may not be specified "
- "at the same time");
+if (nbd_has_filename_options_conflict(options, errp)) {
 return;
 }
 
-- 
2.8.0

[Qemu-block] [PATCH v3 09/14] block/nbd: Use SocketAddress options

2016-04-06 Thread Max Reitz

Drop the use of legacy options in favor of the SocketAddress
representation, even for internal use (i.e. for storing the result of
the filename parsing).

Signed-off-by: Max Reitz 
---
 block/nbd.c | 34 +-
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 9ae66ba..82dcb5e 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -89,9 +89,13 @@ static int nbd_parse_uri(const char *filename, QDict 
*options)
 ret = -EINVAL;
 goto out;
 }
-qdict_put(options, "path", qstring_from_str(qp->p[0].value));
+qdict_put(options, "address.type", qstring_from_str("unix"));
+qdict_put(options, "address.data.path",
+  qstring_from_str(qp->p[0].value));
 } else {
 QString *host;
+char *port_str;
+
 /* nbd[+tcp]://host[:port]/export */
 if (!uri->server) {
 ret = -EINVAL;
@@ -106,12 +110,12 @@ static int nbd_parse_uri(const char *filename, QDict 
*options)
 host = qstring_from_str(uri->server);
 }
 
-qdict_put(options, "host", host);
-if (uri->port) {
-char* port_str = g_strdup_printf("%d", uri->port);
-qdict_put(options, "port", qstring_from_str(port_str));
-g_free(port_str);
-}
+qdict_put(options, "address.type", qstring_from_str("inet"));
+qdict_put(options, "address.data.host", host);
+
+port_str = g_strdup_printf("%d", uri->port ?: NBD_DEFAULT_PORT);
+qdict_put(options, "address.data.port", qstring_from_str(port_str));
+g_free(port_str);
 }
 
 out:
@@ -188,7 +192,8 @@ static void nbd_parse_filename(const char *filename, QDict 
*options,
 
 /* are we a UNIX or TCP socket? */
 if (strstart(host_spec, "unix:", )) {
-qdict_put(options, "path", qstring_from_str(unixpath));
+qdict_put(options, "address.type", qstring_from_str("unix"));
+qdict_put(options, "address.data.path", qstring_from_str(unixpath));
 } else {
 InetSocketAddress *addr = NULL;
 
@@ -197,8 +202,9 @@ static void nbd_parse_filename(const char *filename, QDict 
*options,
 goto out;
 }
 
-qdict_put(options, "host", qstring_from_str(addr->host));
-qdict_put(options, "port", qstring_from_str(addr->port));
+qdict_put(options, "address.type", qstring_from_str("inet"));
+qdict_put(options, "address.data.host", qstring_from_str(addr->host));
+qdict_put(options, "address.data.port", qstring_from_str(addr->port));
 qapi_free_InetSocketAddress(addr);
 }
 
@@ -528,10 +534,12 @@ static void nbd_refresh_filename(BlockDriverState *bs, 
QDict *options)
 }
 } else {
 if (path) {
-qdict_put(opts, "path", qstring_from_str(path));
+qdict_put(opts, "address.type", qstring_from_str("unix"));
+qdict_put(opts, "address.data.path", qstring_from_str(path));
 } else {
-qdict_put(opts, "host", qstring_from_str(host));
-qdict_put(opts, "port", qstring_from_str(port));
+qdict_put(opts, "address.type", qstring_from_str("inet"));
+qdict_put(opts, "address.data.host", qstring_from_str(host));
+qdict_put(opts, "address.data.port", qstring_from_str(port));
 }
 }
 if (export) {
-- 
2.8.0

[Qemu-block] [PATCH v3 07/14] block/nbd: "address" in nbd_refresh_filename()

2016-04-06 Thread Max Reitz

As of a future patch, the NBD block driver will accept a SocketAddress
structure for a new "address" option. In order to support this,
nbd_refresh_filename() needs some changes.

The two TODOs introduced by this patch will be removed in the very next
one. They exist to explain that it is currently impossible for
nbd_refresh_filename() to emit an "address.*" option (which the NBD
block driver does not handle yet). The next patch will arm these code
paths, but it will also enable handling of these options.

Signed-off-by: Max Reitz 
---
 block/nbd.c | 84 -
 1 file changed, 61 insertions(+), 23 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 1736f68..3adf302 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -441,37 +441,75 @@ static void nbd_attach_aio_context(BlockDriverState *bs,
 static void nbd_refresh_filename(BlockDriverState *bs, QDict *options)
 {
 QDict *opts = qdict_new();
-const char *path   = qdict_get_try_str(options, "path");
-const char *host   = qdict_get_try_str(options, "host");
-const char *port   = qdict_get_try_str(options, "port");
+bool can_generate_filename = true;
+const char *path = NULL, *host = NULL, *port = NULL;
 const char *export = qdict_get_try_str(options, "export");
 const char *tlscreds = qdict_get_try_str(options, "tls-creds");
 
-if (host && !port) {
-port = stringify(NBD_DEFAULT_PORT);
+if (qdict_get_try_str(options, "address.type")) {
+/* This path will only be possible as of a future patch;
+ * TODO: Remove this note once it is */
+
+const char *type = qdict_get_str(options, "address.type");
+
+if (!strcmp(type, "unix")) {
+path = qdict_get_str(options, "address.data.path");
+} else if (!strcmp(type, "inet")) {
+host = qdict_get_str(options, "address.data.host");
+port = qdict_get_str(options, "address.data.port");
+
+can_generate_filename = !qdict_haskey(options, "address.data.to")
+ && !qdict_haskey(options, "address.data.ipv4")
+ && !qdict_haskey(options, 
"address.data.ipv6");
+} else {
+can_generate_filename = false;
+}
+} else {
+path = qdict_get_try_str(options, "path");
+host = qdict_get_try_str(options, "host");
+port = qdict_get_try_str(options, "port");
+
+if (host && !port) {
+port = stringify(NBD_DEFAULT_PORT);
+}
 }
 
 qdict_put(opts, "driver", qstring_from_str("nbd"));
 
-if (path && export) {
-snprintf(bs->exact_filename, sizeof(bs->exact_filename),
- "nbd+unix:///%s?socket=%s", export, path);
-} else if (path && !export) {
-snprintf(bs->exact_filename, sizeof(bs->exact_filename),
- "nbd+unix://?socket=%s", path);
-} else if (!path && export) {
-snprintf(bs->exact_filename, sizeof(bs->exact_filename),
- "nbd://%s:%s/%s", host, port, export);
-} else if (!path && !export) {
-snprintf(bs->exact_filename, sizeof(bs->exact_filename),
- "nbd://%s:%s", host, port);
-}
-
-if (path) {
-qdict_put(opts, "path", qstring_from_str(path));
+if (can_generate_filename) {
+if (path && export) {
+snprintf(bs->exact_filename, sizeof(bs->exact_filename),
+ "nbd+unix:///%s?socket=%s", export, path);
+} else if (path && !export) {
+snprintf(bs->exact_filename, sizeof(bs->exact_filename),
+ "nbd+unix://?socket=%s", path);
+} else if (!path && export) {
+snprintf(bs->exact_filename, sizeof(bs->exact_filename),
+ "nbd://%s:%s/%s", host, port, export);
+} else if (!path && !export) {
+snprintf(bs->exact_filename, sizeof(bs->exact_filename),
+ "nbd://%s:%s", host, port);
+}
+}
+
+if (qdict_get_try_str(options, "address.type")) {
+/* This path will only be possible as of a future patch;
+ * TODO: Remove this note once it is */
+
+const QDictEntry *e;
+for (e = qdict_first(options); e; e = qdict_next(options, e)) {
+if (!strncmp(e->key, "address.", 8)) {
+qobject_incref(e->value);
+qdict_put_obj(opts, e->key, e->value);
+}
+}
 } else {
-qdict_put(opts, "host", qstring_from_str(host));
-qdict_put(opts, "port", qstring_from_str(port));
+if (path) {
+qdict_put(opts, "path", qstring_from_str(path));
+} else {
+qdict_put(opts, "host", qstring_from_str(host));
+qdict_put(opts, "port", qstring_from_str(port));
+}
 }
 if (export) {
 qdict_put(opts, "export", qstring_from_str(export));
-- 
2.8.0

[Qemu-block] [PATCH v3 03/14] block/nbd: Reject port parameter without host

2016-04-06 Thread Max Reitz

This is better than the generic block layer finding out later that the
port parameter has not been used.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/nbd.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index d9b946f..2112ec0 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -201,6 +201,10 @@ static SocketAddress *nbd_config(BDRVNBDState *s, QDict 
*options, char **export,
 }
 return NULL;
 }
+if (qdict_haskey(options, "port") && !qdict_haskey(options, "host")) {
+error_setg(errp, "port may not be used without host");
+return NULL;
+}
 
 saddr = g_new0(SocketAddress, 1);
 
-- 
2.8.0

[Qemu-block] [PATCH v3 04/14] block/nbd: Default port in nbd_refresh_filename()

2016-04-06 Thread Max Reitz

Instead of not emitting the port in nbd_refresh_filename(), just set it
to the default if the user did not specify it. This makes the logic a
bit simpler.

Signed-off-by: Max Reitz 
---
 block/nbd.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 2112ec0..efa5d3d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -433,6 +433,10 @@ static void nbd_refresh_filename(BlockDriverState *bs, 
QDict *options)
 const char *export = qdict_get_try_str(options, "export");
 const char *tlscreds = qdict_get_try_str(options, "tls-creds");
 
+if (host && !port) {
+port = stringify(NBD_DEFAULT_PORT);
+}
+
 qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("nbd")));
 
 if (path && export) {
@@ -441,27 +445,19 @@ static void nbd_refresh_filename(BlockDriverState *bs, 
QDict *options)
 } else if (path && !export) {
 snprintf(bs->exact_filename, sizeof(bs->exact_filename),
  "nbd+unix://?socket=%s", path);
-} else if (!path && export && port) {
+} else if (!path && export) {
 snprintf(bs->exact_filename, sizeof(bs->exact_filename),
  "nbd://%s:%s/%s", host, port, export);
-} else if (!path && export && !port) {
-snprintf(bs->exact_filename, sizeof(bs->exact_filename),
- "nbd://%s/%s", host, export);
-} else if (!path && !export && port) {
+} else if (!path && !export) {
 snprintf(bs->exact_filename, sizeof(bs->exact_filename),
  "nbd://%s:%s", host, port);
-} else if (!path && !export && !port) {
-snprintf(bs->exact_filename, sizeof(bs->exact_filename),
- "nbd://%s", host);
 }
 
 if (path) {
 qdict_put_obj(opts, "path", QOBJECT(qstring_from_str(path)));
-} else if (port) {
-qdict_put_obj(opts, "host", QOBJECT(qstring_from_str(host)));
-qdict_put_obj(opts, "port", QOBJECT(qstring_from_str(port)));
 } else {
 qdict_put_obj(opts, "host", QOBJECT(qstring_from_str(host)));
+qdict_put_obj(opts, "port", QOBJECT(qstring_from_str(port)));
 }
 if (export) {
 qdict_put_obj(opts, "export", QOBJECT(qstring_from_str(export)));
-- 
2.8.0

[Qemu-block] [PATCH v3 05/14] block/nbd: Use qdict_put()

2016-04-06 Thread Max Reitz

Instead of inlining this nice macro (i.e. resorting to
qdict_put_obj(..., QOBJECT(...))), use it.

Signed-off-by: Max Reitz 
---
 block/nbd.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index efa5d3d..d12bcc6 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -437,7 +437,7 @@ static void nbd_refresh_filename(BlockDriverState *bs, 
QDict *options)
 port = stringify(NBD_DEFAULT_PORT);
 }
 
-qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("nbd")));
+qdict_put(opts, "driver", qstring_from_str("nbd"));
 
 if (path && export) {
 snprintf(bs->exact_filename, sizeof(bs->exact_filename),
@@ -454,16 +454,16 @@ static void nbd_refresh_filename(BlockDriverState *bs, 
QDict *options)
 }
 
 if (path) {
-qdict_put_obj(opts, "path", QOBJECT(qstring_from_str(path)));
+qdict_put(opts, "path", qstring_from_str(path));
 } else {
-qdict_put_obj(opts, "host", QOBJECT(qstring_from_str(host)));
-qdict_put_obj(opts, "port", QOBJECT(qstring_from_str(port)));
+qdict_put(opts, "host", qstring_from_str(host));
+qdict_put(opts, "port", qstring_from_str(port));
 }
 if (export) {
-qdict_put_obj(opts, "export", QOBJECT(qstring_from_str(export)));
+qdict_put(opts, "export", qstring_from_str(export));
 }
 if (tlscreds) {
-qdict_put_obj(opts, "tls-creds", QOBJECT(qstring_from_str(tlscreds)));
+qdict_put(opts, "tls-creds", qstring_from_str(tlscreds));
 }
 
 bs->full_open_options = opts;
-- 
2.8.0

[Qemu-block] [PATCH v3 01/14] qdict: Add qdict_change_key()

2016-04-06 Thread Max Reitz

This is a shorthand function for changing a QDict's entry's key.

Signed-off-by: Max Reitz 
---
 include/qapi/qmp/qdict.h |  1 +
 qobject/qdict.c  | 23 +++
 2 files changed, 24 insertions(+)

diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
index 8a3ac13..3e0d7df 100644
--- a/include/qapi/qmp/qdict.h
+++ b/include/qapi/qmp/qdict.h
@@ -39,6 +39,7 @@ size_t qdict_size(const QDict *qdict);
 void qdict_put_obj(QDict *qdict, const char *key, QObject *value);
 void qdict_del(QDict *qdict, const char *key);
 int qdict_haskey(const QDict *qdict, const char *key);
+bool qdict_change_key(QDict *qdict, const char *old_key, const char *new_key);
 QObject *qdict_get(const QDict *qdict, const char *key);
 QDict *qobject_to_qdict(const QObject *obj);
 void qdict_iter(const QDict *qdict,
diff --git a/qobject/qdict.c b/qobject/qdict.c
index ae6ad06..2eacb3d 100644
--- a/qobject/qdict.c
+++ b/qobject/qdict.c
@@ -170,6 +170,29 @@ int qdict_haskey(const QDict *qdict, const char *key)
 }
 
 /**
+ * qdict_change_key(): Changes an entry's key
+ *
+ * Removes the entry with the key 'old_key' and inserts its associated value as
+ * a new entry with the key 'new_key'.
+ *
+ * Returns false if 'old_key' does not exist, true otherwise.
+ */
+bool qdict_change_key(QDict *qdict, const char *old_key, const char *new_key)
+{
+QObject *value = qdict_get(qdict, old_key);
+
+if (!value) {
+return false;
+}
+
+qobject_incref(value);
+qdict_del(qdict, old_key);
+qdict_put_obj(qdict, new_key, value);
+
+return true;
+}
+
+/**
  * qdict_size(): Return the size of the dictionary
  */
 size_t qdict_size(const QDict *qdict)
-- 
2.8.0

[Qemu-block] [PATCH v3 02/14] block/nbd: Drop trailing "." in error messages

2016-04-06 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block/nbd.c   | 4 ++--
 tests/qemu-iotests/051.out| 4 ++--
 tests/qemu-iotests/051.pc.out | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index f7ea3b3..d9b946f 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -195,9 +195,9 @@ static SocketAddress *nbd_config(BDRVNBDState *s, QDict 
*options, char **export,
 
 if (qdict_haskey(options, "path") == qdict_haskey(options, "host")) {
 if (qdict_haskey(options, "path")) {
-error_setg(errp, "path and host may not be used at the same 
time.");
+error_setg(errp, "path and host may not be used at the same time");
 } else {
-error_setg(errp, "one of path and host must be specified.");
+error_setg(errp, "one of path and host must be specified");
 }
 return NULL;
 }
diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index c1291ff..5b3e991 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -222,7 +222,7 @@ Testing: -drive driver=file
 QEMU_PROG: -drive driver=file: The 'file' block driver requires a file name
 
 Testing: -drive driver=nbd
-QEMU_PROG: -drive driver=nbd: one of path and host must be specified.
+QEMU_PROG: -drive driver=nbd: one of path and host must be specified
 
 Testing: -drive driver=raw
 QEMU_PROG: -drive driver=raw: Can't use 'raw' as a block driver for the 
protocol level
@@ -231,7 +231,7 @@ Testing: -drive file.driver=file
 QEMU_PROG: -drive file.driver=file: The 'file' block driver requires a file 
name
 
 Testing: -drive file.driver=nbd
-QEMU_PROG: -drive file.driver=nbd: one of path and host must be specified.
+QEMU_PROG: -drive file.driver=nbd: one of path and host must be specified
 
 Testing: -drive file.driver=raw
 QEMU_PROG: -drive file.driver=raw: Can't use 'raw' as a block driver for the 
protocol level
diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
index ec6d222..6395a30 100644
--- a/tests/qemu-iotests/051.pc.out
+++ b/tests/qemu-iotests/051.pc.out
@@ -316,7 +316,7 @@ Testing: -drive driver=file
 QEMU_PROG: -drive driver=file: The 'file' block driver requires a file name
 
 Testing: -drive driver=nbd
-QEMU_PROG: -drive driver=nbd: one of path and host must be specified.
+QEMU_PROG: -drive driver=nbd: one of path and host must be specified
 
 Testing: -drive driver=raw
 QEMU_PROG: -drive driver=raw: Can't use 'raw' as a block driver for the 
protocol level
@@ -325,7 +325,7 @@ Testing: -drive file.driver=file
 QEMU_PROG: -drive file.driver=file: The 'file' block driver requires a file 
name
 
 Testing: -drive file.driver=nbd
-QEMU_PROG: -drive file.driver=nbd: one of path and host must be specified.
+QEMU_PROG: -drive file.driver=nbd: one of path and host must be specified
 
 Testing: -drive file.driver=raw
 QEMU_PROG: -drive file.driver=raw: Can't use 'raw' as a block driver for the 
protocol level
-- 
2.8.0

[Qemu-block] [PATCH v3 for-2.7 00/14] qapi: Allow blockdev-add for NBD

2016-04-06 Thread Max Reitz

Turns out NBD is not so simple to do if you do it right. Anyway, this
series adds blockdev-add support for NBD clients.

Patch 1 adds a QDict function which I needed in later NBD patches for
handling legacy options (move "host" to "address.data.host" etc.).

Patches 2, 3, 4, and 5 are minor patches with no functional relation to
this series, other than later patches will touch the code they touch,
too.

Patches 6 and 7 prepare the code for the addition of a new option
prefix, which is "address.".

Patch 8 makes the NBD client accept a SocketAddress under the "address"
option (or rather, a flattened SocketAddress QDict with its keys
prefixed by "address."). The old options "host", "port", and "path" are
supported as legacy options and translated to the respective
SocketAddress representation.

Patch 9 drops usage of "host", "port", and "path" outside of
nbd_has_filename_options_conflict(),
nbd_process_legacy_socket_options(), and nbd_refresh_filename(), making
those options nothing but legacy.

Patch 10, the goal of this series, is again not very complicated.

Patches 11, 12, and 13 are required for the iotest added in patch 16. It
will invoke qemu-nbd, so patch 13 is required. Besides qemu-nbd, it will
launch an NBD server VM concurrently to the client VM, which is why
patch 14 is required. And finally, it will test whether we can add an
NBD BDS by passing it a file descriptor, which patch 15 is needed for
(so we use the socket_scm_helper to pass sockets to qemu).

Patch 14 then adds the iotest for NBD's blockdev-add interface.


*** This series requires Daniel's qdict_crumple() function (from his
"Provide a QOM-based authorization API" series). ***


v2:
- Dropped patches 2 and 3; use Daniel's qdict_crumple() instead.
- Patch 7: Not sure why the diff differs, seems functionally like the
  same patch to me. Maybe git changed something about its default diff
  algorithm.
- Patch 8: Use qdict_crumple() instead of qdict_unflatten()
- Patch 10: Rebase conflicts


git-backport-diff against v2:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/14:[] [--] 'qdict: Add qdict_change_key()'
002/14:[] [--] 'block/nbd: Drop trailing "." in error messages'
003/14:[] [--] 'block/nbd: Reject port parameter without host'
004/14:[] [--] 'block/nbd: Default port in nbd_refresh_filename()'
005/14:[] [--] 'block/nbd: Use qdict_put()'
006/14:[] [--] 'block/nbd: Add nbd_has_filename_options_conflict()'
007/14:[0008] [FC] 'block/nbd: "address" in nbd_refresh_filename()'
008/14:[0024] [FC] 'block/nbd: Accept SocketAddress'
009/14:[] [--] 'block/nbd: Use SocketAddress options'
010/14:[0004] [FC] 'qapi: Allow blockdev-add for NBD'
011/14:[] [-C] 'iotests.py: Add qemu_nbd function'
012/14:[] [--] 'iotests.py: Allow concurrent qemu instances'
013/14:[] [--] 'socket_scm_helper: Accept fd directly'
014/14:[] [-C] 'iotests: Add test for NBD's blockdev-add interface'


Max Reitz (14):
  qdict: Add qdict_change_key()
  block/nbd: Drop trailing "." in error messages
  block/nbd: Reject port parameter without host
  block/nbd: Default port in nbd_refresh_filename()
  block/nbd: Use qdict_put()
  block/nbd: Add nbd_has_filename_options_conflict()
  block/nbd: "address" in nbd_refresh_filename()
  block/nbd: Accept SocketAddress
  block/nbd: Use SocketAddress options
  qapi: Allow blockdev-add for NBD
  iotests.py: Add qemu_nbd function
  iotests.py: Allow concurrent qemu instances
  socket_scm_helper: Accept fd directly
  iotests: Add test for NBD's blockdev-add interface

 block/nbd.c| 235 ++---
 include/qapi/qmp/qdict.h   |   1 +
 qapi/block-core.json   |  23 +++-
 qobject/qdict.c|  23 
 tests/qemu-iotests/051.out |   4 +-
 tests/qemu-iotests/051.pc.out  |   4 +-
 tests/qemu-iotests/147 | 194 +++
 tests/qemu-iotests/147.out |   5 +
 tests/qemu-iotests/group   |   1 +
 tests/qemu-iotests/iotests.py  |  22 ++-
 tests/qemu-iotests/socket_scm_helper.c |  29 ++--
 11 files changed, 443 insertions(+), 98 deletions(-)
 create mode 100755 tests/qemu-iotests/147
 create mode 100644 tests/qemu-iotests/147.out

-- 
2.8.0

Re: [Qemu-block] [PATCH v3 0/7] bdrv_flush_io_queue removal, shared LinuxAioState

2016-04-06 Thread Paolo Bonzini



On 06/04/2016 20:19, Kevin Wolf wrote:
>> > 
>> > Great, I'll send v4 as soon as possible for inclusion in the block-next
>> > branch.
> Are you still working on v4 or did I miss it? All of the block-next
> candidate series are waiting to be rebased on this one, and moving parts
> at the bottom of a longish dependency chain are kind of nasty.

Yes, I am waiting for Fam's bdrv_co_drain patch, which is this one
depends on.

Paolo

Re: [Qemu-block] [PATCH v3 0/7] bdrv_flush_io_queue removal, shared LinuxAioState

2016-04-06 Thread Kevin Wolf

Am 30.03.2016 um 15:17 hat Paolo Bonzini geschrieben:
> On 30/03/2016 15:06, Stefan Hajnoczi wrote:
> > Acked-by: Stefan Hajnoczi 
> 
> Great, I'll send v4 as soon as possible for inclusion in the block-next
> branch.

Are you still working on v4 or did I miss it? All of the block-next
candidate series are waiting to be rebased on this one, and moving parts
at the bottom of a longish dependency chain are kind of nasty.

Kevin

[Qemu-block] [PATCH v2 for-2.7 4/8] block: Drop blk_new_with_bs()

2016-04-06 Thread Max Reitz

Its only caller is blk_new_open(), so we can just inline it there.

Signed-off-by: Max Reitz 
---
 block/block-backend.c | 28 ++--
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 66b8bad..4e8e8ab 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -175,26 +175,6 @@ BlockBackend *blk_new(Error **errp)
 }
 
 /*
- * Create a new BlockBackend with a new BlockDriverState attached.
- * Otherwise just like blk_new(), which see.
- */
-BlockBackend *blk_new_with_bs(Error **errp)
-{
-BlockBackend *blk;
-BlockDriverState *bs;
-
-blk = blk_new(errp);
-if (!blk) {
-return NULL;
-}
-
-bs = bdrv_new_root();
-blk->root = bdrv_root_attach_child(bs, "root", _root);
-blk->root->opaque = blk;
-return blk;
-}
-
-/*
  * Calls blk_new_with_bs() and then calls bdrv_open() on the BlockDriverState.
  *
  * Just as with bdrv_open(), after having called this function the reference to
@@ -210,21 +190,25 @@ BlockBackend *blk_new_open(const char *filename, const 
char *reference,
QDict *options, int flags, Error **errp)
 {
 BlockBackend *blk;
+BlockDriverState *bs;
 int ret;
 
-blk = blk_new_with_bs(errp);
+blk = blk_new(errp);
 if (!blk) {
 QDECREF(options);
 return NULL;
 }
 
-ret = bdrv_open(>root->bs, filename, reference, options, flags, errp);
+bs = NULL;
+ret = bdrv_open(, filename, reference, options, flags, errp);
 if (ret < 0) {
 blk_unref(blk);
 return NULL;
 }
 
 blk_set_enable_write_cache(blk, true);
+blk->root = bdrv_root_attach_child(bs, "root", _root);
+blk->root->opaque = blk;
 
 return blk;
 }
-- 
2.8.0

[Qemu-block] [PATCH v2 for-2.7 3/8] tests: Drop BDS from test-throttle.c

2016-04-06 Thread Max Reitz

Now that throttling has been moved to the BlockBackend level, we do not
need to create a BDS along with the BB in the I/O throttling test.

Signed-off-by: Max Reitz 
---
 tests/test-throttle.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/test-throttle.c b/tests/test-throttle.c
index 5ec966c..d7fb0a6 100644
--- a/tests/test-throttle.c
+++ b/tests/test-throttle.c
@@ -578,9 +578,9 @@ static void test_groups(void)
 BlockBackend *blk1, *blk2, *blk3;
 BlockBackendPublic *blkp1, *blkp2, *blkp3;
 
-blk1 = blk_new_with_bs(_abort);
-blk2 = blk_new_with_bs(_abort);
-blk3 = blk_new_with_bs(_abort);
+blk1 = blk_new(_abort);
+blk2 = blk_new(_abort);
+blk3 = blk_new(_abort);
 
 blkp1 = blk_get_public(blk1);
 blkp2 = blk_get_public(blk2);
-- 
2.8.0

[Qemu-block] [PATCH v2 for-2.7 8/8] block: Drop bdrv_parent_cb_...() from bdrv_close()

2016-04-06 Thread Max Reitz

bdrv_close() now asserts that the BDS's refcount is 0, therefore it
cannot have any parents and the bdrv_parent_cb_change_media() call is a
no-op.

Signed-off-by: Max Reitz 
---
 block.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/block.c b/block.c
index 963abcc..7d7fd89 100644
--- a/block.c
+++ b/block.c
@@ -2113,8 +2113,6 @@ static void bdrv_close(BlockDriverState *bs)
 bdrv_release_named_dirty_bitmaps(bs);
 assert(QLIST_EMPTY(>dirty_bitmaps));
 
-bdrv_parent_cb_change_media(bs, false);
-
 if (bs->drv) {
 BdrvChild *child, *next;
 
-- 
2.8.0

[Qemu-block] [PATCH v2 for-2.7 7/8] block: Assert !bs->refcnt in bdrv_close()

2016-04-06 Thread Max Reitz

The only caller of bdrv_close() left is bdrv_delete(). We may as well
assert that, in a way (there are some things in bdrv_close() that make
more sense under that assumption, such as the call to
bdrv_release_all_dirty_bitmaps() which in turn assumes that no frozen
bitmaps are attached to the BDS).

In addition, being called only in bdrv_delete() means that we can drop
bdrv_close()'s forward declaration at the top of block.c.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 355c4be..963abcc 100644
--- a/block.c
+++ b/block.c
@@ -74,8 +74,6 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
-static void bdrv_close(BlockDriverState *bs);
-
 #ifdef _WIN32
 static int is_windows_drive_prefix(const char *filename)
 {
@@ -2106,6 +2104,7 @@ static void bdrv_close(BlockDriverState *bs)
 BdrvAioNotifier *ban, *ban_next;
 
 assert(!bs->job);
+assert(!bs->refcnt);
 
 bdrv_drained_begin(bs); /* complete I/O */
 bdrv_flush(bs);
-- 
2.8.0

[Qemu-block] [PATCH v2 for-2.7 6/8] block: Make bdrv_open() return a BDS

2016-04-06 Thread Max Reitz

There are no callers to bdrv_open() or bdrv_open_inherit() left that
pass a pointer to a non-NULL BDS pointer as the first argument of these
functions, so we can finally drop that parameter and just make them
return the new BDS.

Generally, the following pattern is applied:

bs = NULL;
ret = bdrv_open(, ..., _err);
if (ret < 0) {
error_propagate(errp, local_err);
...
}

by

bs = bdrv_open(..., errp);
if (!bs) {
ret = -EINVAL;
...
}

Of course, there are only a few instances where the pattern is really
pure.

Signed-off-by: Max Reitz 
---
 block.c   | 123 +-
 block/block-backend.c |   6 +--
 block/vvfat.c |   8 ++--
 blockdev.c|  38 ++--
 include/block/block.h |   4 +-
 5 files changed, 63 insertions(+), 116 deletions(-)

diff --git a/block.c b/block.c
index b6d1796..355c4be 100644
--- a/block.c
+++ b/block.c
@@ -64,10 +64,12 @@ static QTAILQ_HEAD(, BlockDriverState) all_bdrv_states =
 static QLIST_HEAD(, BlockDriver) bdrv_drivers =
 QLIST_HEAD_INITIALIZER(bdrv_drivers);
 
-static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename,
- const char *reference, QDict *options, int flags,
- BlockDriverState *parent,
- const BdrvChildRole *child_role, Error **errp);
+static BlockDriverState *bdrv_open_inherit(const char *filename,
+   const char *reference,
+   QDict *options, int flags,
+   BlockDriverState *parent,
+   const BdrvChildRole *child_role,
+   Error **errp);
 
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
@@ -1338,14 +1340,13 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*parent_options,
 qdict_put(options, "driver", qstring_from_str(bs->backing_format));
 }
 
-backing_hd = NULL;
-ret = bdrv_open_inherit(_hd,
-*backing_filename ? backing_filename : NULL,
-reference, options, 0, bs, _backing,
-errp);
-if (ret < 0) {
+backing_hd = bdrv_open_inherit(*backing_filename ? backing_filename : NULL,
+   reference, options, 0, bs, _backing,
+   errp);
+if (!backing_hd) {
 bs->open_flags |= BDRV_O_NO_BACKING;
 error_prepend(errp, "Could not open backing file: ");
+ret = -EINVAL;
 goto free_exit;
 }
 
@@ -1385,7 +1386,6 @@ BdrvChild *bdrv_open_child(const char *filename,
 BdrvChild *c = NULL;
 BlockDriverState *bs;
 QDict *image_options;
-int ret;
 char *bdref_key_dot;
 const char *reference;
 
@@ -1405,10 +1405,9 @@ BdrvChild *bdrv_open_child(const char *filename,
 goto done;
 }
 
-bs = NULL;
-ret = bdrv_open_inherit(, filename, reference, image_options, 0,
-parent, child_role, errp);
-if (ret < 0) {
+bs = bdrv_open_inherit(filename, reference, image_options, 0,
+   parent, child_role, errp);
+if (!bs) {
 goto done;
 }
 
@@ -1429,7 +1428,6 @@ static BlockDriverState 
*bdrv_append_temp_snapshot(BlockDriverState *bs,
 int64_t total_size;
 QemuOpts *opts = NULL;
 BlockDriverState *bs_snapshot;
-Error *local_err = NULL;
 int ret;
 
 /* if snapshot, we create a temporary backing file and open it
@@ -1468,12 +1466,10 @@ static BlockDriverState 
*bdrv_append_temp_snapshot(BlockDriverState *bs,
 qdict_put(snapshot_options, "driver",
   qstring_from_str("qcow2"));
 
-bs_snapshot = NULL;
-ret = bdrv_open(_snapshot, NULL, NULL, snapshot_options,
-flags, _err);
+bs_snapshot = bdrv_open(NULL, NULL, snapshot_options, flags, errp);
 snapshot_options = NULL;
-if (ret < 0) {
-error_propagate(errp, local_err);
+if (!bs_snapshot) {
+ret = -EINVAL;
 goto out;
 }
 
@@ -1504,10 +1500,12 @@ out:
  * should be opened. If specified, neither options nor a filename may be given,
  * nor can an existing BDS be reused (that is, *pbs has to be NULL).
  */
-static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename,
- const char *reference, QDict *options, int flags,
- BlockDriverState *parent,
- const BdrvChildRole *child_role, Error **errp)
+static BlockDriverState *bdrv_open_inherit(const char *filename,
+   const char *reference,
+   QDict *options, int flags,
+

[Qemu-block] [PATCH v2 for-2.7 5/8] block: Drop bdrv_new_root()

2016-04-06 Thread Max Reitz

By now it has become just a wrapper around bdrv_new() and it has only a
single user. Use bdrv_new() there instead and drop this function.

Signed-off-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block.c   | 5 -
 include/block/block.h | 1 -
 2 files changed, 6 deletions(-)

diff --git a/block.c b/block.c
index d27..b6d1796 100644
--- a/block.c
+++ b/block.c
@@ -222,11 +222,6 @@ void bdrv_register(BlockDriver *bdrv)
 QLIST_INSERT_HEAD(_drivers, bdrv, list);
 }
 
-BlockDriverState *bdrv_new_root(void)
-{
-return bdrv_new();
-}
-
 BlockDriverState *bdrv_new(void)
 {
 BlockDriverState *bs;
diff --git a/include/block/block.h b/include/block/block.h
index 762984e..31fcd07 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -198,7 +198,6 @@ BlockDriver *bdrv_find_format(const char *format_name);
 int bdrv_create(BlockDriver *drv, const char* filename,
 QemuOpts *opts, Error **errp);
 int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp);
-BlockDriverState *bdrv_new_root(void);
 BlockDriverState *bdrv_new(void);
 void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
 void bdrv_replace_in_backing_chain(BlockDriverState *old,
-- 
2.8.0

[Qemu-block] [PATCH v2 for-2.7 2/8] block: Let bdrv_open_inherit() return the snapshot

2016-04-06 Thread Max Reitz

If bdrv_open_inherit() creates a snapshot BDS and *pbs is NULL, that
snapshot BDS should be returned instead of the BDS under it.

To this end, bdrv_append_temp_snapshot() now returns the snapshot BDS
instead of just appending it on top of the snapshotted BDS. Also, it
calls bdrv_ref() before bdrv_append() (which bdrv_open_inherit() has to
undo if not returning the overlay).

Signed-off-by: Max Reitz 
---
 block.c | 33 ++---
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index b6a452a..d27 100644
--- a/block.c
+++ b/block.c
@@ -1424,8 +1424,10 @@ done:
 return c;
 }
 
-static int bdrv_append_temp_snapshot(BlockDriverState *bs, int flags,
- QDict *snapshot_options, Error **errp)
+static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
+   int flags,
+   QDict *snapshot_options,
+   Error **errp)
 {
 /* TODO: extra byte is a hack to ensure MAX_PATH space on Windows. */
 char *tmp_filename = g_malloc0(PATH_MAX + 1);
@@ -1441,7 +1443,6 @@ static int bdrv_append_temp_snapshot(BlockDriverState 
*bs, int flags,
 /* Get the required size from the image */
 total_size = bdrv_getlength(bs);
 if (total_size < 0) {
-ret = total_size;
 error_setg_errno(errp, -total_size, "Could not get image size");
 goto out;
 }
@@ -1481,12 +1482,16 @@ static int bdrv_append_temp_snapshot(BlockDriverState 
*bs, int flags,
 goto out;
 }
 
+bdrv_ref(bs_snapshot);
 bdrv_append(bs_snapshot, bs);
 
+g_free(tmp_filename);
+return bs_snapshot;
+
 out:
 QDECREF(snapshot_options);
 g_free(tmp_filename);
-return ret;
+return NULL;
 }
 
 /*
@@ -1705,17 +1710,31 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
const char *filename,
 }
 
 QDECREF(options);
-*pbs = bs;
 
 /* For snapshot=on, create a temporary qcow2 overlay. bs points to the
  * temporary snapshot afterwards. */
 if (snapshot_flags) {
-ret = bdrv_append_temp_snapshot(bs, snapshot_flags, snapshot_options,
-_err);
+BlockDriverState *snapshot_bs;
+snapshot_bs = bdrv_append_temp_snapshot(bs, snapshot_flags,
+snapshot_options, _err);
 snapshot_options = NULL;
 if (local_err) {
+ret = -EINVAL;
 goto close_and_fail;
 }
+if (!*pbs) {
+/* The reference is now held by the overlay BDS */
+bdrv_unref(bs);
+bs = snapshot_bs;
+} else {
+/* It is still referenced in the same way that *pbs was referenced,
+ * however that may be */
+bdrv_unref(snapshot_bs);
+}
+}
+
+if (!*pbs) {
+*pbs = bs;
 }
 
 return 0;
-- 
2.8.0

[Qemu-block] [PATCH v2 for-2.7 1/8] block: Drop useless bdrv_new() call

2016-04-06 Thread Max Reitz

bdrv_append_temp_snapshot() uses bdrv_new() to create an empty BDS
before invoking bdrv_open() on that BDS. This is probably a relict from
when it used to do some modifications on that empty BDS, but now that is
unnecessary, so we can just set bs_snapshot to NULL and let bdrv_open()
do the rest.

Signed-off-by: Max Reitz 
---
 block.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 1be887a..b6a452a 100644
--- a/block.c
+++ b/block.c
@@ -1472,8 +1472,7 @@ static int bdrv_append_temp_snapshot(BlockDriverState 
*bs, int flags,
 qdict_put(snapshot_options, "driver",
   qstring_from_str("qcow2"));
 
-bs_snapshot = bdrv_new();
-
+bs_snapshot = NULL;
 ret = bdrv_open(_snapshot, NULL, NULL, snapshot_options,
 flags, _err);
 snapshot_options = NULL;
-- 
2.8.0

[Qemu-block] [PATCH v2 for-2.7 0/8] blockdev: (Nearly) free clean-up work

2016-04-06 Thread Max Reitz

After a lot has been restructed in the block layer in the past, we can
now reap at least one of the fruits: Make bdrv_open() return a BDS!


This series depends on the following series/patches:
- Revert "block: Forbid I/O throttling on nodes with multiple parents
  for 2.6"
  This is something I suppose Kevin will send when the 2.7 development
  window opens.
- "block: Move I/O throttling to BlockBackend" by Kevin
- "block: Remove BlockDriverState.blk" by Kevin


v2:
- Patch 1: bdrv_open_backing_file() has been changed already, so that
  part can be dropped from the patch
- Patch 2:
  - bdrv_append_temp_snapshot() now takes an additional argument
(snapshot_options)
  - Drop a superfluous assignment to ret [Berto]
  - The function is now also static already, so we don't need to do that
- Patch 3: Added because this test has been added in the meantime
- Patch 4: Rebase conflicts
- Patch 6:
  - Rebase conflicts
  - A whole lot of places we no longer need to touch because they use
blk_new_open() instead of bdrv_open()
- Patch 8: Added because patch 7 makes this possible.


git-backport-diff against v1:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/8:[down] 'block: Drop useless bdrv_new() call'
    Actually [0006] [FC] with a changed commit title
002/8:[0015] [FC] 'block: Let bdrv_open_inherit() return the snapshot'
003/8:[down] 'tests: Drop BDS from test-throttle.c'
004/8:[0019] [FC] 'block: Drop blk_new_with_bs()'
005/8:[] [-C] 'block: Drop bdrv_new_root()'
006/8:[0135] [FC] 'block: Make bdrv_open() return a BDS'
007/8:[] [-C] 'block: Assert !bs->refcnt in bdrv_close()'
008/8:[down] 'block: Drop bdrv_parent_cb_...() from bdrv_close()'


Max Reitz (8):
  block: Drop useless bdrv_new() call
  block: Let bdrv_open_inherit() return the snapshot
  tests: Drop BDS from test-throttle.c
  block: Drop blk_new_with_bs()
  block: Drop bdrv_new_root()
  block: Make bdrv_open() return a BDS
  block: Assert !bs->refcnt in bdrv_close()
  block: Drop bdrv_parent_cb_...() from bdrv_close()

 block.c   | 139 --
 block/block-backend.c |  30 +++
 block/vvfat.c |   8 +--
 blockdev.c|  38 +-
 include/block/block.h |   5 +-
 tests/test-throttle.c |   6 +--
 6 files changed, 83 insertions(+), 143 deletions(-)

-- 
2.8.0

[Qemu-block] [PATCH v2] qcow2: Prevent backing file names longer than 1023

2016-04-06 Thread Max Reitz

We reject backing file names with a length of more than 1023 characters
when opening a qcow2 file, so we should not produce such files
ourselves.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Max Reitz 
---
v2:
- Move the check to qcow2_change_backing_file() so the BDS state will
  not be modified if that condition is not met [Kevin]
---
 block/qcow2.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 056525c..470734b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1986,6 +1986,10 @@ static int qcow2_change_backing_file(BlockDriverState 
*bs,
 {
 BDRVQcow2State *s = bs->opaque;
 
+if (backing_file && strlen(backing_file) > 1023) {
+return -EINVAL;
+}
+
 pstrcpy(bs->backing_file, sizeof(bs->backing_file), backing_file ?: "");
 pstrcpy(bs->backing_format, sizeof(bs->backing_format), backing_fmt ?: "");
 
-- 
2.8.0

Re: [Qemu-block] [PATCH 6/7] qemu-iotests: 141: reduce likelihood of race condition on systems with fast IO

2016-04-06 Thread Kevin Wolf

Am 06.04.2016 um 18:15 hat Max Reitz geschrieben:
> On 05.04.2016 11:21, Sascha Silbe wrote:
> > On systems with fast IO, qemu-io may write more than 1 MiB before
> > receiving the block-job-cancel command, causing the test case to fail.
> > 
> > 141 is inherently racy, but we can at least reduce the likelihood of the
> > job completing before the cancel command arrives by bumping the size of
> > the data getting written; we'll try 32 MiB for a start.
> 
> Hm, interesting. I tried to prevent this by setting the block jobs'
> speed to 1, which should make it stop after the block job has processed
> the first block of data.
> 
> I won't oppose this patch, because if it fixes things for you, that's
> good. But I don't think it should be necessary.

We don't generally change test cases when they fail. Making a test case
pass doesn't "fix things" per se. It only helps when the failure is a
false positive.

In this case, it looks like there might be a problem with block job
throttling, so maybe we should look into that before changing the test?

Kevin

> > Once we actually move enough data around for the block job not to
> > complete prematurely, the test will still fail because the offset value
> > in the BLOCK_JOB_CANCELLED event will vary. Since this is more or less
> > inherent to the nature of this event, we just replace it with a fixed
> > value globally (in _filter_qmp), the same way we handle timestamps.
> > 
> > Signed-off-by: Sascha Silbe 
> > Reviewed-by: Bo Tu 
> > ---
> >  tests/qemu-iotests/141   | 11 ++-
> >  tests/qemu-iotests/141.out   | 24 
> >  tests/qemu-iotests/common.filter |  1 +
> >  3 files changed, 19 insertions(+), 17 deletions(-)
> 





pgpCbRzZJU0u4.pgp
Description: PGP signature

Re: [Qemu-block] [PATCH 7/7] qemu-iotests: iotests.py: get rid of all

2016-04-06 Thread Max Reitz

On 05.04.2016 11:21, Sascha Silbe wrote:
> The __all__ list contained a typo for as long as the iotests module
> existed. That typo prevented "from iotests import *" (which is the
> only case where iotests.__all__ is used at all) from ever working.

Hahaha :D

Nice one.

> The names used by iotests are highly prone to name collisions, so
> importing them all unconditionally is a bad idea anyway. Since __all__
> is not adding any value, let's just get rid of it.
> 
> Fixes: f345cfd0 ("qemu-iotests: add iotests Python module")
> Signed-off-by: Sascha Silbe 
> Reviewed-by: Bo Tu 
> ---
>  tests/qemu-iotests/iotests.py | 4 
>  1 file changed, 4 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH] qcow2: Prevent backing file names longer than 1023

2016-04-06 Thread Kevin Wolf

Am 06.04.2016 um 17:14 hat Max Reitz geschrieben:
> We reject backing file names with a length of more than 1023 characters
> when opening a qcow2 file, so we should not produce such files
> ourselves.
> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 056525c..011a0ae 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1957,6 +1957,11 @@ int qcow2_update_header(BlockDriverState *bs)
>  if (s->image_backing_file) {
>  size_t backing_file_len = strlen(s->image_backing_file);
>  
> +if (backing_file_len > 1023) {
> +ret = -EINVAL;
> +goto fail;
> +}
> +
>  if (buflen < backing_file_len) {
>  ret = -ENOSPC;
>  goto fail;

We should probably already check this in qcow2_change_backing_file(), so
that s->image_backing_file can never contain anything longer than that.
If you like, you can keep an assertion here.

The advantage is that in qcow2_change_backing_file() we can fail the
operation before all of the variables are updated and therefore become
inconsistent with the on-disk state of the image.

Kevin

Re: [Qemu-block] [PATCH 6/7] qemu-iotests: 141: reduce likelihood of race condition on systems with fast IO

2016-04-06 Thread Max Reitz

On 05.04.2016 11:21, Sascha Silbe wrote:
> On systems with fast IO, qemu-io may write more than 1 MiB before
> receiving the block-job-cancel command, causing the test case to fail.
> 
> 141 is inherently racy, but we can at least reduce the likelihood of the
> job completing before the cancel command arrives by bumping the size of
> the data getting written; we'll try 32 MiB for a start.

Hm, interesting. I tried to prevent this by setting the block jobs'
speed to 1, which should make it stop after the block job has processed
the first block of data.

I won't oppose this patch, because if it fixes things for you, that's
good. But I don't think it should be necessary.

Max

> 
> Once we actually move enough data around for the block job not to
> complete prematurely, the test will still fail because the offset value
> in the BLOCK_JOB_CANCELLED event will vary. Since this is more or less
> inherent to the nature of this event, we just replace it with a fixed
> value globally (in _filter_qmp), the same way we handle timestamps.
> 
> Signed-off-by: Sascha Silbe 
> Reviewed-by: Bo Tu 
> ---
>  tests/qemu-iotests/141   | 11 ++-
>  tests/qemu-iotests/141.out   | 24 
>  tests/qemu-iotests/common.filter |  1 +
>  3 files changed, 19 insertions(+), 17 deletions(-)



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 5/7] qemu-iotests: 068: don't require KVM

2016-04-06 Thread Max Reitz

On 05.04.2016 11:21, Sascha Silbe wrote:
> None of the other test cases explicitly enable KVM and there's no
> obvious reason for 068 to require it. Drop this so all test cases can be
> executed in environments where KVM is not available (e.g. because the
> user doesn't have sufficient permissions to access /dev/kvm).
> 
> Signed-off-by: Sascha Silbe 
> Reviewed-by: Bo Tu 
> ---
>  tests/qemu-iotests/068 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Acked-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error

2016-04-06 Thread Jeff Cody

On Wed, Apr 06, 2016 at 04:47:32PM +0200, Niels de Vos wrote:
> On Wed, Apr 06, 2016 at 08:44:18AM -0400, Jeff Cody wrote:
> > On Wed, Apr 06, 2016 at 01:02:16PM +0200, Kevin Wolf wrote:
> > > [ Adding some CCs ]
> > > 
> > > Am 06.04.2016 um 05:29 hat Jeff Cody geschrieben:
> > > > Upon receiving an I/O error after an fsync, by default gluster will
> > > > dump its cache.  However, QEMU will retry the fsync, which is especially
> > > > useful when encountering errors such as ENOSPC when using the 
> > > > werror=stop
> > > > option.  When using caching with gluster, however, the last written data
> > > > will be lost upon encountering ENOSPC.  Using the cache xlator option of
> 
> Use "write-behind xlator" instead of "cache xlator". There are different
> caches in Gluster.
>

Thanks, I'll update the commit message.

> > > > 'resync-failed-syncs-after-fsync' should cause gluster to retain the
> > > > cached data after a failed fsync, so that ENOSPC and other transient
> > > > errors are recoverable.
> > > > 
> > > > Signed-off-by: Jeff Cody 
> > > > ---
> > > >  block/gluster.c | 27 +++
> > > >  configure   |  8 
> > > >  2 files changed, 35 insertions(+)
> > > > 
> > > > diff --git a/block/gluster.c b/block/gluster.c
> > > > index 30a827e..b1cf71b 100644
> > > > --- a/block/gluster.c
> > > > +++ b/block/gluster.c
> > > > @@ -330,6 +330,23 @@ static int qemu_gluster_open(BlockDriverState *bs, 
> > > >  QDict *options,
> > > >  goto out;
> > > >  }
> > > >  
> > > > +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> > > > +/* Without this, if fsync fails for a recoverable reason (for 
> > > > instance,
> > > > + * ENOSPC), gluster will dump its cache, preventing retries.  This 
> > > > means
> > > > + * almost certain data loss.  Not all gluster versions support the
> > > > + * 'resync-failed-syncs-after-fsync' key value, but there is no 
> > > > way to
> > > > + * discover during runtime if it is supported (this api returns 
> > > > success for
> > > > + * unknown key/value pairs) */
> > > 
> > > Honestly, this sucks. There is apparently no way to operate gluster so
> > > we can safely recover after a failed fsync. "We hope everything is fine,
> > > but depending on your gluster version, we may now corrupt your image"
> > > isn't very good.
> > > 
> > > We need to consider very carefully if this is good enough to go on after
> > > an error. I'm currently leaning towards "no". That is, we should only
> > > enable this after Gluster provides us a way to make sure that the option
> > > is really set.
> 
> Unfortunately it is also possible to disable the write-behind xlator as
> well. This would cause setting the option to fail too :-/ At the moment
> there is no real useful way to detect if write-behind is disabled (it is
> enabled by default).
> 
> > > > +ret = glfs_set_xlator_option (s->glfs, "*-write-behind",
> > > > +   
> > > > "resync-failed-syncs-after-fsync",
> > > > +   "on");
> > > > +if (ret < 0) {
> > > > +error_setg_errno(errp, errno, "Unable to set xlator key/value 
> > > > pair");
> > > > +ret = -errno;
> > > > +goto out;
> > > > +}
> > > > +#endif
> > > 
> > > We also need to consider the case without CONFIG_GLUSTERFS_XLATOR_OPT.
> > > In this case (as well as theoretically in the case that the option
> > > didn't take effect - if only we could know about it), a failed
> > > glfs_fsync_async() is fatal and we need to stop operating on the image,
> > > i.e. set bs->drv = NULL like when we detect corruption in qcow2 images.
> > > The guest will see a broken disk that fails all I/O requests, but that's
> > > better than corrupting data.
> > >
> > 
> > Gluster versions that don't support CONFIG_GLUSTERFS_XLATOR_OPT will
> > also not have the gluster patch that removes the file descriptor
> > invalidation upon error (unless that was a relatively new
> > bug/feature).  But if that is the case, every operation on the file
> > descriptor in those versions will return error.  But it is also rather
> > old versions that don't support glfs_set_xlator_option() I believe.
> 
> Indeed, glfs_set_xlator_option() was introduced with glusterfs-3.4.0. We
> are currently on glusterfs-3.7, with the oldest supported version of
> 3.5. In ~2 months we hopefully have a 3.8 release and that will cause

Is it possible for 3.8 to be able to validate key/value pairs in
glfs_set_xlator_option()?  Or is it something that by design is decoupled
enough that it isn't feasible?

This is the second instance I know of that a lack of error information from
the gluster api has made it difficult to determine if a feature is
supported (the other instance being SEEK_DATA/SEEK_HOLE).  It'd be nice if
error returns were more consistent in 3.8 in the API, is possible.

> the end-of-life of 3.5. 3.4 has been EOL for about a year now, hopefully
>

Re: [Qemu-block] [PATCH 4/7] qemu-iotests: 148: properly skip test if quorum support is missing

2016-04-06 Thread Max Reitz

On 05.04.2016 11:21, Sascha Silbe wrote:
> qemu-iotests test case 148 already had some code for skipping the test
> if quorum support is missing, but it didn't work in all
> cases. TestQuorumEvents.setUp() gets run before the actual test class
> (which contains the skipping code) and tries to start qemu with a drive
> using the quorum driver. For some reason this works fine when using
> qcow2, but fails for raw.
> 
> As the entire test case requires quorum, just check for availability
> before even starting the test suite. Introduce a verify_quorum()
> function in iotests.py for this purpose so future test cases can make
> use of it.
> 
> Signed-off-by: Sascha Silbe 
> Reviewed-by: Bo Tu 
> ---
>  tests/qemu-iotests/148| 4 +---
>  tests/qemu-iotests/iotests.py | 5 +
>  2 files changed, 6 insertions(+), 3 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH for-2.6] block: Forbid I/O throttling on nodes with multiple parents for 2.6

2016-04-06 Thread Paolo Bonzini



On 06/04/2016 17:55, Alberto Garcia wrote:
>> > [BB] overlay
>> >  |
>> >  |
>> > [BDS] qcow2 [BB] backing
>> >  ||
>> >  ++
>> >  .|
>> >  . I/O throttling
>> >  .|
>> > [BDS] raw
>> >   .
>> >   .
>> >   .
> I see, thanks for the explanation!
> 
> But what's the use of having a BlockBackend on a node that is used as a
> backing image? Is it because "we allow this so let's prevent the user
> from doing weird things", or is there anything useful we can do with it?

The backing image can be the one that is visible to the guest, while the
overlay can be a block-backup target (with 'sync':'none) whose
BlockBackend is created by the NBD server.

Paolo

Re: [Qemu-block] [PATCH for-2.6] block: Forbid I/O throttling on nodes with multiple parents for 2.6

2016-04-06 Thread Alberto Garcia

On Wed 06 Apr 2016 11:59:47 AM CEST, Kevin Wolf wrote:
> Let me draw some ASCII art. I'm drawing I/O throttling like a filter
> to make the difference clear; in reality it's part of the BDS now or
> of the BB in 2.7, so don't let this confuse you.
>
> Let's assume these command line options:
>
> -drive file=backimg.img,if=none,id=backing,iops=1024
> -drive file=overlay.qcow2,id=overlay,backing=backing
>
> If we didn't catch this configuration and error out, this is what we
> would end up with in 2.6:
>
> [BB] overlay
>  |
>  |
> [BDS] qcow2 [BB] backing
>  ||
>  ++
>  .|
>  . I/O throttling
>  .|
> [BDS] raw
>   .
>   .
>   .

I see, thanks for the explanation!

But what's the use of having a BlockBackend on a node that is used as a
backing image? Is it because "we allow this so let's prevent the user
from doing weird things", or is there anything useful we can do with it?

Berto

Re: [Qemu-block] [PATCH 3/7] qemu-iotests: iotests.VM: remove qtest socket on error

2016-04-06 Thread Max Reitz

On 05.04.2016 11:21, Sascha Silbe wrote:
> On error, VM.launch() cleaned up the monitor unix socket, but left the
> qtest unix socket behind. This caused the remaining sub-tests to fail
> with EADDRINUSE:
> 
> +==
> +ERROR: testQuorum (__main__.TestFifoQuorumEvents)
> +--
> +Traceback (most recent call last):
> +  File "148", line 63, in setUp
> +self.vm.launch()
> +  File "/home6/silbe/qemu/tests/qemu-iotests/iotests.py", line 247, in launch
> +self._qmp.accept()
> +  File "/home6/silbe/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 
> 141, in accept
> +return self.__negotiate_capabilities()
> +  File "/home6/silbe/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 
> 57, in __negotiate_capabilities
> +raise QMPConnectError
> +QMPConnectError
> +
> +==
> +ERROR: testQuorum (__main__.TestQuorumEvents)
> +--
> +Traceback (most recent call last):
> +  File "148", line 63, in setUp
> +self.vm.launch()
> +  File "/home6/silbe/qemu/tests/qemu-iotests/iotests.py", line 244, in launch
> +self._qtest = qtest.QEMUQtestProtocol(self._qtest_path, server=True)
> +  File "/home6/silbe/qemu/tests/qemu-iotests/../../scripts/qtest.py", line 
> 33, in __init__
> +self._sock.bind(self._address)
> +  File "/usr/lib64/python2.7/socket.py", line 224, in meth
> +return getattr(self._sock,name)(*args)
> +error: [Errno 98] Address already in use
> 
> Fix this by cleaning up both the monitor socket and the qtest socket iff
> they exist.
> 
> Signed-off-by: Sascha Silbe 
> Reviewed-by: Bo Tu 
> ---
>  tests/qemu-iotests/iotests.py | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)

Not sure about the leading underscore (it appears to be the only such
function in iotests.py besides __init__()), but I guess it's a test so
it doesn't really matter anyway.

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 2/7] qemu-iotests: fix 051 on non-PC architectures

2016-04-06 Thread Max Reitz

On 05.04.2016 11:21, Sascha Silbe wrote:
> Commit 61de4c68 [block: Remove BDRV_O_CACHE_WB] updated the reference
> output for PCs, but neglected to do the same for the generic reference
> output file. Fix 051 on all non-PC architectures by applying the same
> change to the generic output file.
> 
> Fixes: 61de4c68 ("block: Remove BDRV_O_CACHE_WB")
> Signed-off-by: Sascha Silbe 
> Reviewed-by: Bo Tu 
> ---
>  tests/qemu-iotests/051.out | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)

Well, that's interesting. I did note that in
http://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg06123.html
and thus didn't give an R-b. But in v2,
(http://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg06353.html)
my R-b magically had appeared there without the patch being changed. I
guess I need to be more careful with reviewing revised patch series from
now on, whoever they are from. :-)

Reviewed-by: Max Reitz 

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [Qemu-devel] [PATCH] qcow2: Prevent backing file names longer than 1023

2016-04-06 Thread Eric Blake

On 04/06/2016 09:14 AM, Max Reitz wrote:
> We reject backing file names with a length of more than 1023 characters
> when opening a qcow2 file, so we should not produce such files
> ourselves.
> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2.c | 5 +
>  1 file changed, 5 insertions(+)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 1/7] qemu-iotests: check: don't place files with predictable names in /tmp

2016-04-06 Thread Max Reitz

On 05.04.2016 11:21, Sascha Silbe wrote:
> Placing files with predictable or even hard-coded names in /tmp is a
> security risk and can prevent or disturb operation on a multi-user
> machine. Place them inside the "scratch" directory instead, as we
> already do for most other test-related files.
> 
> Signed-off-by: Sascha Silbe 
> Reviewed-by: Bo Tu 
> ---
>  tests/qemu-iotests/check | 21 +++--
>  1 file changed, 11 insertions(+), 10 deletions(-)

Nice side effect: With this patch it's possible to run multiple
instances of the iotests in parallel (for different formats/protocols)
without them interfering with each other.

Grepping for '/tmp' in the iotests directory yields more occurrences,
however: Many tests set the tmp variable to /tmp/$$. Let's see whether
we can just remove that or have to replace it by "${TEST_DIR}"/$$.

"common.filter" evaluates $tmp, but the single filter that does so is
actually never used any more. Other than that, only "common" evaluates
it, but "common" is sourced by "check". Thus I think those tests setting
$tmp is superfluous and dropping it should be fine.

For this patch:

Reviewed-by: Max Reitz 

You decide whether you want to drop the tmp=/tmp/$$ lines in the tests
in a dedicated (follow-up) patch or include it here.

Max

signature.asc
Description: OpenPGP digital signature

[Qemu-block] [PATCH] qcow2: Prevent backing file names longer than 1023

2016-04-06 Thread Max Reitz

We reject backing file names with a length of more than 1023 characters
when opening a qcow2 file, so we should not produce such files
ourselves.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Max Reitz 
---
 block/qcow2.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 056525c..011a0ae 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1957,6 +1957,11 @@ int qcow2_update_header(BlockDriverState *bs)
 if (s->image_backing_file) {
 size_t backing_file_len = strlen(s->image_backing_file);
 
+if (backing_file_len > 1023) {
+ret = -EINVAL;
+goto fail;
+}
+
 if (buflen < backing_file_len) {
 ret = -ENOSPC;
 goto fail;
-- 
2.8.0

Re: [Qemu-block] Possible bug: virtio-scsi + iothread (Former x-data-plane) = "Guest moved index from 0 to 61440" warning

2016-04-06 Thread Paolo Bonzini



On 06/04/2016 16:59, Zir Blazer wrote:
>> On Tue, Apr 05, 2016 at 07:49:47PM -0300, Zir Blazer wrote:
>> > The bare minimum script that produces the warning, should be this:
>> > #!/bin/bash
>> > qemu-system-x86_64 \-m 1024M \-enable-kvm \-object
> iothread,id=iothread0 \-device virtio-scsi-pci,iothread=iothread0 \
>> > This produces the "Guest moved index from 0 to 61440" warning.
>> 
>> It does not produce a warning with qemu.git/master. I see the BIOS
>> screen.
>> 
>> Stefan
> 
> Sad that you didn't manage to reproduce it. Maybe its fixed on latest
> QEMU, but to test that I would have to learn how to compile and install
> it myself (Or wait for the Arch Linux User Repository qemu-git to catch
> up, otherwise I will have to wait for 2.6).
> 
> I would like if someone could confirm if it happens on QEMU 2.5, or I'm
> the only one affected due to some other misconfiguration issue (Which
> would be rather rare, since what I'm using is a pretty much a fresh
> install, an Arch Linux has a mostly vanilla QEMU).

I can confirm that it shows the message in Fedora's QEMU 2.5 but not
with upstream 2.6.

Thanks,

Paolo

Re: [Qemu-block] Possible bug: virtio-scsi + iothread (Former x-data-plane) = "Guest moved index from 0 to 61440" warning

2016-04-06 Thread Zir Blazer

> On Tue, Apr 05, 2016 at 07:49:47PM -0300, Zir Blazer wrote:> > The bare 
> minimum script that produces the warning, should be this:
> > #!/bin/bash
> > qemu-system-x86_64 \-m 1024M \-enable-kvm \-object iothread,id=iothread0 
> > \-device virtio-scsi-pci,iothread=iothread0 \
> > This produces the "Guest moved index from 0 to 61440" warning.
> 
> It does not produce a warning with qemu.git/master. I see the BIOS
> screen.
> 
> Stefan

Sad that you didn't manage to reproduce it. Maybe its fixed on latest QEMU, but 
to test that I would have to learn how to compile and install it myself (Or 
wait for the Arch Linux User Repository qemu-git to catch up, otherwise I will 
have to wait for 2.6).I would like if someone could confirm if it happens on 
QEMU 2.5, or I'm the only one affected due to some other misconfiguration issue 
(Which would be rather rare, since what I'm using is a pretty much a fresh 
install, an Arch Linux has a mostly vanilla QEMU).

The system DOES POST (SeaBIOS) and boots too, it just merely prints a console 
warning, not a fatal error. I should have added a Screenshot earlier, to make 
it more clear about how it looks:
http://i.imgur.com/54nvRgd.png
Other things seems to work (Since scsi-cd is attached to one of the affected 
virtio-scsi-pci)...
Anyways, I will have to wait to test with a newer QEMU version. Thanks for 
testing it.

Re: [Qemu-block] [PATCH for-2.6 1/2] block/gluster: return correct error value

2016-04-06 Thread Niels de Vos

On Tue, Apr 05, 2016 at 11:29:51PM -0400, Jeff Cody wrote:
> Upon error, gluster will call the aio callback function with a
> ret value of -1, with errno set to the proper error value.  If
> we set the acb->ret value to the return value in the callback,
> that results in every error being EPERM (i.e. 1).  Instead, set
> it to the proper error result.
> 
> Signed-off-by: Jeff Cody 

Looks good to me.

Reviewed-by: Niels de Vos 

> ---
>  block/gluster.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index 7bface2..30a827e 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -245,7 +245,7 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, 
> ssize_t ret, void *arg)
>  if (!ret || ret == acb->size) {
>  acb->ret = 0; /* Success */
>  } else if (ret < 0) {
> -acb->ret = ret; /* Read/Write failed */
> +acb->ret = -errno; /* Read/Write failed */
>  } else {
>  acb->ret = -EIO; /* Partial read/write - fail it */
>  }
> -- 
> 1.9.3
>

Re: [Qemu-block] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error

2016-04-06 Thread Niels de Vos

On Wed, Apr 06, 2016 at 08:44:18AM -0400, Jeff Cody wrote:
> On Wed, Apr 06, 2016 at 01:02:16PM +0200, Kevin Wolf wrote:
> > [ Adding some CCs ]
> > 
> > Am 06.04.2016 um 05:29 hat Jeff Cody geschrieben:
> > > Upon receiving an I/O error after an fsync, by default gluster will
> > > dump its cache.  However, QEMU will retry the fsync, which is especially
> > > useful when encountering errors such as ENOSPC when using the werror=stop
> > > option.  When using caching with gluster, however, the last written data
> > > will be lost upon encountering ENOSPC.  Using the cache xlator option of

Use "write-behind xlator" instead of "cache xlator". There are different
caches in Gluster.

> > > 'resync-failed-syncs-after-fsync' should cause gluster to retain the
> > > cached data after a failed fsync, so that ENOSPC and other transient
> > > errors are recoverable.
> > > 
> > > Signed-off-by: Jeff Cody 
> > > ---
> > >  block/gluster.c | 27 +++
> > >  configure   |  8 
> > >  2 files changed, 35 insertions(+)
> > > 
> > > diff --git a/block/gluster.c b/block/gluster.c
> > > index 30a827e..b1cf71b 100644
> > > --- a/block/gluster.c
> > > +++ b/block/gluster.c
> > > @@ -330,6 +330,23 @@ static int qemu_gluster_open(BlockDriverState *bs,  
> > > QDict *options,
> > >  goto out;
> > >  }
> > >  
> > > +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> > > +/* Without this, if fsync fails for a recoverable reason (for 
> > > instance,
> > > + * ENOSPC), gluster will dump its cache, preventing retries.  This 
> > > means
> > > + * almost certain data loss.  Not all gluster versions support the
> > > + * 'resync-failed-syncs-after-fsync' key value, but there is no way 
> > > to
> > > + * discover during runtime if it is supported (this api returns 
> > > success for
> > > + * unknown key/value pairs) */
> > 
> > Honestly, this sucks. There is apparently no way to operate gluster so
> > we can safely recover after a failed fsync. "We hope everything is fine,
> > but depending on your gluster version, we may now corrupt your image"
> > isn't very good.
> > 
> > We need to consider very carefully if this is good enough to go on after
> > an error. I'm currently leaning towards "no". That is, we should only
> > enable this after Gluster provides us a way to make sure that the option
> > is really set.

Unfortunately it is also possible to disable the write-behind xlator as
well. This would cause setting the option to fail too :-/ At the moment
there is no real useful way to detect if write-behind is disabled (it is
enabled by default).

> > > +ret = glfs_set_xlator_option (s->glfs, "*-write-behind",
> > > +   
> > > "resync-failed-syncs-after-fsync",
> > > +   "on");
> > > +if (ret < 0) {
> > > +error_setg_errno(errp, errno, "Unable to set xlator key/value 
> > > pair");
> > > +ret = -errno;
> > > +goto out;
> > > +}
> > > +#endif
> > 
> > We also need to consider the case without CONFIG_GLUSTERFS_XLATOR_OPT.
> > In this case (as well as theoretically in the case that the option
> > didn't take effect - if only we could know about it), a failed
> > glfs_fsync_async() is fatal and we need to stop operating on the image,
> > i.e. set bs->drv = NULL like when we detect corruption in qcow2 images.
> > The guest will see a broken disk that fails all I/O requests, but that's
> > better than corrupting data.
> >
> 
> Gluster versions that don't support CONFIG_GLUSTERFS_XLATOR_OPT will
> also not have the gluster patch that removes the file descriptor
> invalidation upon error (unless that was a relatively new
> bug/feature).  But if that is the case, every operation on the file
> descriptor in those versions will return error.  But it is also rather
> old versions that don't support glfs_set_xlator_option() I believe.

Indeed, glfs_set_xlator_option() was introduced with glusterfs-3.4.0. We
are currently on glusterfs-3.7, with the oldest supported version of
3.5. In ~2 months we hopefully have a 3.8 release and that will cause
the end-of-life of 3.5. 3.4 has been EOL for about a year now, hopefully
all our users have upgraded, but we know that some users will stay on
unsupported versions for a long time...

However, the "resync-failed-syncs-after-fsync" option was only
introduced recently, with glusterfs-3.7.9. You could detect this with
pkg-config glusterfs-api >= 7.3.7.9 if need to be.

More details about the problem the option addresses are in the commit
message on http://review.gluster.org/13057 .

HTH,
Niels

Re: [Qemu-block] [Qemu-devel] [PATCH] qemu-img: fix formatting of error message

2016-04-06 Thread Eric Blake

On 04/06/2016 03:16 AM, Daniel P. Berrange wrote:
> The error_reportf_err() will not automatically append a
> ': ' before adding its suffix, so we must include that
> in the message we pass it, otherwise we get a badly
> formatted message lacking whitespace:
> 
> qemu-img: Could not open 
> 'driver=nbd,host=127.0.0.1,port=,tls-creds=tls0'Failed to connect socket: 
> Connection refused
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  qemu-img.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Eric Blake 

> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 06264d9..17c5cfd 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -256,7 +256,7 @@ static BlockBackend *img_open_opts(const char *optstr,
>  options = qemu_opts_to_qdict(opts, NULL);
>  blk = blk_new_open(NULL, NULL, options, flags, _err);
>  if (!blk) {
> -error_reportf_err(local_err, "Could not open '%s'", optstr);
> +error_reportf_err(local_err, "Could not open '%s': ", optstr);
>  return NULL;
>  }
>  blk_set_enable_write_cache(blk, !writethrough);
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error

2016-04-06 Thread Ric Wheeler



We had a thread discussing this not on the upstream list.

My summary of the thread is that I don't understand why gluster should drop 
cached data after a failed fsync() for any open file. For closed files, I think 
it might still happen but this is the same as any file system (and unlikely to 
be the case for qemu?).


I will note that Linux in general had (still has I think?) the behavior that 
once the process closes a file (or exits), we lose context to return an error 
to. From that point on, any failed IO from the page cache to the target disk 
will be dropped from cache. To hold things in the cache would lead it to fill 
with old data that is not really recoverable and we have no good way to know 
that the situation is repairable and how long that might take. Upstream kernel 
people have debated this, the behavior might be tweaked for certain types of errors.


Regards,

Ric


On 04/06/2016 07:02 AM, Kevin Wolf wrote:

[ Adding some CCs ]

Am 06.04.2016 um 05:29 hat Jeff Cody geschrieben:

Upon receiving an I/O error after an fsync, by default gluster will
dump its cache.  However, QEMU will retry the fsync, which is especially
useful when encountering errors such as ENOSPC when using the werror=stop
option.  When using caching with gluster, however, the last written data
will be lost upon encountering ENOSPC.  Using the cache xlator option of
'resync-failed-syncs-after-fsync' should cause gluster to retain the
cached data after a failed fsync, so that ENOSPC and other transient
errors are recoverable.

Signed-off-by: Jeff Cody 
---
  block/gluster.c | 27 +++
  configure   |  8 
  2 files changed, 35 insertions(+)

diff --git a/block/gluster.c b/block/gluster.c
index 30a827e..b1cf71b 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -330,6 +330,23 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict 
*options,
  goto out;
  }
  
+#ifdef CONFIG_GLUSTERFS_XLATOR_OPT

+/* Without this, if fsync fails for a recoverable reason (for instance,
+ * ENOSPC), gluster will dump its cache, preventing retries.  This means
+ * almost certain data loss.  Not all gluster versions support the
+ * 'resync-failed-syncs-after-fsync' key value, but there is no way to
+ * discover during runtime if it is supported (this api returns success for
+ * unknown key/value pairs) */

Honestly, this sucks. There is apparently no way to operate gluster so
we can safely recover after a failed fsync. "We hope everything is fine,
but depending on your gluster version, we may now corrupt your image"
isn't very good.

We need to consider very carefully if this is good enough to go on after
an error. I'm currently leaning towards "no". That is, we should only
enable this after Gluster provides us a way to make sure that the option
is really set.


+ret = glfs_set_xlator_option (s->glfs, "*-write-behind",
+   "resync-failed-syncs-after-fsync",
+   "on");
+if (ret < 0) {
+error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
+ret = -errno;
+goto out;
+}
+#endif

We also need to consider the case without CONFIG_GLUSTERFS_XLATOR_OPT.
In this case (as well as theoretically in the case that the option
didn't take effect - if only we could know about it), a failed
glfs_fsync_async() is fatal and we need to stop operating on the image,
i.e. set bs->drv = NULL like when we detect corruption in qcow2 images.
The guest will see a broken disk that fails all I/O requests, but that's
better than corrupting data.

Kevin

Re: [Qemu-block] [Qemu-devel] [PATCH] block: initialize qcrypto API at startup

2016-04-06 Thread Alex Bligh


On 6 Apr 2016, at 12:12, Daniel P. Berrange  wrote:

> Any programs which call the qcrypto APIs should ensure that
> qcrypto_init() has been called before anything else which
> can use crypto. Essentially this means right at the start
> of the main method before initializing anything else.
> 
> This is important because some versions of gnutls/gcrypt
> require explicit initialization before use.
> 
> Signed-off-by: Daniel P. Berrange 

Reviewed-by: Alex Bligh 
Tested-by: Alex Bligh 

> ---
> qemu-img.c | 6 ++
> qemu-io.c  | 6 ++
> qemu-nbd.c | 7 +++
> 3 files changed, 19 insertions(+)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 17c5cfd..1697762 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -37,6 +37,7 @@
> #include "block/block_int.h"
> #include "block/blockjob.h"
> #include "block/qapi.h"
> +#include "crypto/init.h"
> #include 
> 
> #define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION QEMU_PKGVERSION \
> @@ -3486,6 +3487,11 @@ int main(int argc, char **argv)
> exit(EXIT_FAILURE);
> }
> 
> +if (qcrypto_init(_error) < 0) {
> +error_reportf_err(local_error, "cannot initialize crypto: ");
> +exit(1);
> +}
> +
> module_call_init(MODULE_INIT_QOM);
> bdrv_init();
> if (argc < 2) {
> diff --git a/qemu-io.c b/qemu-io.c
> index 0a738f1..288bba8 100644
> --- a/qemu-io.c
> +++ b/qemu-io.c
> @@ -23,6 +23,7 @@
> #include "sysemu/block-backend.h"
> #include "block/block_int.h"
> #include "trace/control.h"
> +#include "crypto/init.h"
> 
> #define CMD_NOFILE_OK   0x01
> 
> @@ -443,6 +444,11 @@ int main(int argc, char **argv)
> progname = basename(argv[0]);
> qemu_init_exec_dir(argv[0]);
> 
> +if (qcrypto_init(_error) < 0) {
> +error_reportf_err(local_error, "cannot initialize crypto: ");
> +exit(1);
> +}
> +
> module_call_init(MODULE_INIT_QOM);
> qemu_add_opts(_object_opts);
> bdrv_init();
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index ca4a724..fa91ca3 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -31,6 +31,7 @@
> #include "qapi/qmp/qstring.h"
> #include "qom/object_interfaces.h"
> #include "io/channel-socket.h"
> +#include "crypto/init.h"
> 
> #include 
> #include 
> @@ -518,6 +519,12 @@ int main(int argc, char **argv)
> memset(_sigterm, 0, sizeof(sa_sigterm));
> sa_sigterm.sa_handler = termsig_handler;
> sigaction(SIGTERM, _sigterm, NULL);
> +
> +if (qcrypto_init(_err) < 0) {
> +error_reportf_err(local_err, "cannot initialize crypto: ");
> +exit(1);
> +}
> +
> module_call_init(MODULE_INIT_QOM);
> qemu_add_opts(_object_opts);
> qemu_init_exec_dir(argv[0]);
> -- 
> 2.5.5
> 
> 

-- 
Alex Bligh

Re: [Qemu-block] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error

2016-04-06 Thread Kevin Wolf

Am 06.04.2016 um 15:10 hat Jeff Cody geschrieben:
> On Wed, Apr 06, 2016 at 01:51:59PM +0200, Kevin Wolf wrote:
> > Am 06.04.2016 um 13:41 hat Kevin Wolf geschrieben:
> > > Am 06.04.2016 um 13:19 hat Ric Wheeler geschrieben:
> > > > 
> > > > We had a thread discussing this not on the upstream list.
> > > > 
> > > > My summary of the thread is that I don't understand why gluster
> > > > should drop cached data after a failed fsync() for any open file.
> > > 
> > > It certainly shouldn't, but it does by default. :-)
> > > 
> > > Have a look at commit 3fcead2d in glusterfs.git, which at least
> > > introduces an option to get usable behaviour:
> > > 
> > > { .key = {"resync-failed-syncs-after-fsync"},
> > >   .type = GF_OPTION_TYPE_BOOL,
> > >   .default_value = "off",
> > >   .description = "If sync of \"cached-writes issued before fsync\" "
> > >  "(to backend) fails, this option configures whether "
> > >  "to retry syncing them after fsync or forget them. "
> > >  "If set to on, cached-writes are retried "
> > >  "till a \"flush\" fop (or a successful sync) on sync 
> > > "
> > >  "failures. "
> > >  "fsync itself is failed irrespective of the value of 
> > > "
> > >  "this option. ",
> > > },
> > > 
> > > As you can see, the default is still to drop cached data, and this is
> > > with the file still opened. qemu needs to make sure that this option is
> > > set, and if Jeff's comment in the code below is right, there is no way
> > > currently to make sure that the option isn't silently ignored.
> > > 
> > > Can we get some function that sets an option and fails if the option is
> > > unknown? Or one that queries the state after setting an option, so we
> > > can check whether we succeeded in switching to the mode we need?
> > > 
> > > > For closed files, I think it might still happen but this is the same
> > > > as any file system (and unlikely to be the case for qemu?).
> > > 
> > > Our problem is only with open images. Dropping caches for files that
> > > qemu doesn't use any more is fine as far as I'm concerned.
> > > 
> > > Note that our usage can involve cases where we reopen a file with
> > > different flags, i.e. first open a second file descriptor, then close
> > > the first one. The image was never completely closed here and we would
> > > still want the cache to preserve our data in such cases.
> > 
> > Hm, actually, maybe we should just call bdrv_flush() before reopening an
> > image, and if an error is returned, we abort the reopen. It's far from
> > being a hot path, so the overhead of a flush shouldn't matter, and it
> > seems we're taking an unnecessary risk without doing this.
> >
> 
> [I seemed to have been dropped from the cc]
> 
> Are you talking about doing a bdrv_flush() on the new descriptor (i.e.
> reop_s->glfs)?  Because otherwise, we already do this in
> bdrv_reopen_prepare() on the original fd.  It happens right before the call
> to drv->bdrv_reopen_prepare():
> 
> 
> 2020 ret = bdrv_flush(reopen_state->bs);
> 2021 if (ret) {
> 2022 error_setg_errno(errp, -ret, "Error flushing drive");
> 2023 goto error;
> 2024 }
> 2025 
> 2026 if (drv->bdrv_reopen_prepare) {
> 2027 ret = drv->bdrv_reopen_prepare(reopen_state, queue, _err);

Ah, thanks. Yes, this is what I meant. I expected it somewhere close to
the bdrv_drain_all() call, so I missed the call you quoted. So that's
good news, at least this part of the problem doesn't exist then. :-)

Kevin

> > 
> > > > I will note that Linux in general had (still has I think?) the
> > > > behavior that once the process closes a file (or exits), we lose
> > > > context to return an error to. From that point on, any failed IO
> > > > from the page cache to the target disk will be dropped from cache.
> > > > To hold things in the cache would lead it to fill with old data that
> > > > is not really recoverable and we have no good way to know that the
> > > > situation is repairable and how long that might take. Upstream
> > > > kernel people have debated this, the behavior might be tweaked for
> > > > certain types of errors.
> > > 
> > > That's fine, we just don't want the next fsync() to signal success when
> > > in reality the cache has thrown away our data. As soon as we close the
> > > image, there is no next fsync(), so you can do whatever you like.
> > > 
> > > Kevin
> > > 
> > > > On 04/06/2016 07:02 AM, Kevin Wolf wrote:
> > > > >[ Adding some CCs ]
> > > > >
> > > > >Am 06.04.2016 um 05:29 hat Jeff Cody geschrieben:
> > > > >>Upon receiving an I/O error after an fsync, by default gluster will
> > > > >>dump its cache.  However, QEMU will retry the fsync, which is 
> > > > >>especially
> > > > >>useful when encountering errors such as ENOSPC when using the 
> > > > >>werror=stop
> > > > >>option.  When using caching with gluster, however, the

Re: [Qemu-block] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error

2016-04-06 Thread Jeff Cody

On Wed, Apr 06, 2016 at 01:51:59PM +0200, Kevin Wolf wrote:
> Am 06.04.2016 um 13:41 hat Kevin Wolf geschrieben:
> > Am 06.04.2016 um 13:19 hat Ric Wheeler geschrieben:
> > > 
> > > We had a thread discussing this not on the upstream list.
> > > 
> > > My summary of the thread is that I don't understand why gluster
> > > should drop cached data after a failed fsync() for any open file.
> > 
> > It certainly shouldn't, but it does by default. :-)
> > 
> > Have a look at commit 3fcead2d in glusterfs.git, which at least
> > introduces an option to get usable behaviour:
> > 
> > { .key = {"resync-failed-syncs-after-fsync"},
> >   .type = GF_OPTION_TYPE_BOOL,
> >   .default_value = "off",
> >   .description = "If sync of \"cached-writes issued before fsync\" "
> >  "(to backend) fails, this option configures whether "
> >  "to retry syncing them after fsync or forget them. "
> >  "If set to on, cached-writes are retried "
> >  "till a \"flush\" fop (or a successful sync) on sync "
> >  "failures. "
> >  "fsync itself is failed irrespective of the value of "
> >  "this option. ",
> > },
> > 
> > As you can see, the default is still to drop cached data, and this is
> > with the file still opened. qemu needs to make sure that this option is
> > set, and if Jeff's comment in the code below is right, there is no way
> > currently to make sure that the option isn't silently ignored.
> > 
> > Can we get some function that sets an option and fails if the option is
> > unknown? Or one that queries the state after setting an option, so we
> > can check whether we succeeded in switching to the mode we need?
> > 
> > > For closed files, I think it might still happen but this is the same
> > > as any file system (and unlikely to be the case for qemu?).
> > 
> > Our problem is only with open images. Dropping caches for files that
> > qemu doesn't use any more is fine as far as I'm concerned.
> > 
> > Note that our usage can involve cases where we reopen a file with
> > different flags, i.e. first open a second file descriptor, then close
> > the first one. The image was never completely closed here and we would
> > still want the cache to preserve our data in such cases.
> 
> Hm, actually, maybe we should just call bdrv_flush() before reopening an
> image, and if an error is returned, we abort the reopen. It's far from
> being a hot path, so the overhead of a flush shouldn't matter, and it
> seems we're taking an unnecessary risk without doing this.
>

[I seemed to have been dropped from the cc]

Are you talking about doing a bdrv_flush() on the new descriptor (i.e.
reop_s->glfs)?  Because otherwise, we already do this in
bdrv_reopen_prepare() on the original fd.  It happens right before the call
to drv->bdrv_reopen_prepare():


2020 ret = bdrv_flush(reopen_state->bs);
2021 if (ret) {
2022 error_setg_errno(errp, -ret, "Error flushing drive");
2023 goto error;
2024 }
2025 
2026 if (drv->bdrv_reopen_prepare) {
2027 ret = drv->bdrv_reopen_prepare(reopen_state, queue, _err);

> 
> > > I will note that Linux in general had (still has I think?) the
> > > behavior that once the process closes a file (or exits), we lose
> > > context to return an error to. From that point on, any failed IO
> > > from the page cache to the target disk will be dropped from cache.
> > > To hold things in the cache would lead it to fill with old data that
> > > is not really recoverable and we have no good way to know that the
> > > situation is repairable and how long that might take. Upstream
> > > kernel people have debated this, the behavior might be tweaked for
> > > certain types of errors.
> > 
> > That's fine, we just don't want the next fsync() to signal success when
> > in reality the cache has thrown away our data. As soon as we close the
> > image, there is no next fsync(), so you can do whatever you like.
> > 
> > Kevin
> > 
> > > On 04/06/2016 07:02 AM, Kevin Wolf wrote:
> > > >[ Adding some CCs ]
> > > >
> > > >Am 06.04.2016 um 05:29 hat Jeff Cody geschrieben:
> > > >>Upon receiving an I/O error after an fsync, by default gluster will
> > > >>dump its cache.  However, QEMU will retry the fsync, which is especially
> > > >>useful when encountering errors such as ENOSPC when using the 
> > > >>werror=stop
> > > >>option.  When using caching with gluster, however, the last written data
> > > >>will be lost upon encountering ENOSPC.  Using the cache xlator option of
> > > >>'resync-failed-syncs-after-fsync' should cause gluster to retain the
> > > >>cached data after a failed fsync, so that ENOSPC and other transient
> > > >>errors are recoverable.
> > > >>
> > > >>Signed-off-by: Jeff Cody 
> > > >>---
> > > >>  block/gluster.c | 27 +++
> > > >>  configure   |  8 
> > > >>  2 files changed,

Re: [Qemu-block] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error

2016-04-06 Thread Jeff Cody

On Wed, Apr 06, 2016 at 01:02:16PM +0200, Kevin Wolf wrote:
> [ Adding some CCs ]
> 
> Am 06.04.2016 um 05:29 hat Jeff Cody geschrieben:
> > Upon receiving an I/O error after an fsync, by default gluster will
> > dump its cache.  However, QEMU will retry the fsync, which is especially
> > useful when encountering errors such as ENOSPC when using the werror=stop
> > option.  When using caching with gluster, however, the last written data
> > will be lost upon encountering ENOSPC.  Using the cache xlator option of
> > 'resync-failed-syncs-after-fsync' should cause gluster to retain the
> > cached data after a failed fsync, so that ENOSPC and other transient
> > errors are recoverable.
> > 
> > Signed-off-by: Jeff Cody 
> > ---
> >  block/gluster.c | 27 +++
> >  configure   |  8 
> >  2 files changed, 35 insertions(+)
> > 
> > diff --git a/block/gluster.c b/block/gluster.c
> > index 30a827e..b1cf71b 100644
> > --- a/block/gluster.c
> > +++ b/block/gluster.c
> > @@ -330,6 +330,23 @@ static int qemu_gluster_open(BlockDriverState *bs,  
> > QDict *options,
> >  goto out;
> >  }
> >  
> > +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> > +/* Without this, if fsync fails for a recoverable reason (for instance,
> > + * ENOSPC), gluster will dump its cache, preventing retries.  This 
> > means
> > + * almost certain data loss.  Not all gluster versions support the
> > + * 'resync-failed-syncs-after-fsync' key value, but there is no way to
> > + * discover during runtime if it is supported (this api returns 
> > success for
> > + * unknown key/value pairs) */
> 
> Honestly, this sucks. There is apparently no way to operate gluster so
> we can safely recover after a failed fsync. "We hope everything is fine,
> but depending on your gluster version, we may now corrupt your image"
> isn't very good.
> 
> We need to consider very carefully if this is good enough to go on after
> an error. I'm currently leaning towards "no". That is, we should only
> enable this after Gluster provides us a way to make sure that the option
> is really set.
> 
> > +ret = glfs_set_xlator_option (s->glfs, "*-write-behind",
> > +   
> > "resync-failed-syncs-after-fsync",
> > +   "on");
> > +if (ret < 0) {
> > +error_setg_errno(errp, errno, "Unable to set xlator key/value 
> > pair");
> > +ret = -errno;
> > +goto out;
> > +}
> > +#endif
> 
> We also need to consider the case without CONFIG_GLUSTERFS_XLATOR_OPT.
> In this case (as well as theoretically in the case that the option
> didn't take effect - if only we could know about it), a failed
> glfs_fsync_async() is fatal and we need to stop operating on the image,
> i.e. set bs->drv = NULL like when we detect corruption in qcow2 images.
> The guest will see a broken disk that fails all I/O requests, but that's
> better than corrupting data.
>

Gluster versions that don't support CONFIG_GLUSTERFS_XLATOR_OPT will
also not have the gluster patch that removes the file descriptor
invalidation upon error (unless that was a relatively new
bug/feature).  But if that is the case, every operation on the file
descriptor in those versions will return error.  But it is also rather
old versions that don't support glfs_set_xlator_option() I believe.

Re: [Qemu-block] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error

2016-04-06 Thread Kevin Wolf

Am 06.04.2016 um 13:41 hat Kevin Wolf geschrieben:
> Am 06.04.2016 um 13:19 hat Ric Wheeler geschrieben:
> > 
> > We had a thread discussing this not on the upstream list.
> > 
> > My summary of the thread is that I don't understand why gluster
> > should drop cached data after a failed fsync() for any open file.
> 
> It certainly shouldn't, but it does by default. :-)
> 
> Have a look at commit 3fcead2d in glusterfs.git, which at least
> introduces an option to get usable behaviour:
> 
> { .key = {"resync-failed-syncs-after-fsync"},
>   .type = GF_OPTION_TYPE_BOOL,
>   .default_value = "off",
>   .description = "If sync of \"cached-writes issued before fsync\" "
>  "(to backend) fails, this option configures whether "
>  "to retry syncing them after fsync or forget them. "
>  "If set to on, cached-writes are retried "
>  "till a \"flush\" fop (or a successful sync) on sync "
>  "failures. "
>  "fsync itself is failed irrespective of the value of "
>  "this option. ",
> },
> 
> As you can see, the default is still to drop cached data, and this is
> with the file still opened. qemu needs to make sure that this option is
> set, and if Jeff's comment in the code below is right, there is no way
> currently to make sure that the option isn't silently ignored.
> 
> Can we get some function that sets an option and fails if the option is
> unknown? Or one that queries the state after setting an option, so we
> can check whether we succeeded in switching to the mode we need?
> 
> > For closed files, I think it might still happen but this is the same
> > as any file system (and unlikely to be the case for qemu?).
> 
> Our problem is only with open images. Dropping caches for files that
> qemu doesn't use any more is fine as far as I'm concerned.
> 
> Note that our usage can involve cases where we reopen a file with
> different flags, i.e. first open a second file descriptor, then close
> the first one. The image was never completely closed here and we would
> still want the cache to preserve our data in such cases.

Hm, actually, maybe we should just call bdrv_flush() before reopening an
image, and if an error is returned, we abort the reopen. It's far from
being a hot path, so the overhead of a flush shouldn't matter, and it
seems we're taking an unnecessary risk without doing this.

Kevin

> > I will note that Linux in general had (still has I think?) the
> > behavior that once the process closes a file (or exits), we lose
> > context to return an error to. From that point on, any failed IO
> > from the page cache to the target disk will be dropped from cache.
> > To hold things in the cache would lead it to fill with old data that
> > is not really recoverable and we have no good way to know that the
> > situation is repairable and how long that might take. Upstream
> > kernel people have debated this, the behavior might be tweaked for
> > certain types of errors.
> 
> That's fine, we just don't want the next fsync() to signal success when
> in reality the cache has thrown away our data. As soon as we close the
> image, there is no next fsync(), so you can do whatever you like.
> 
> Kevin
> 
> > On 04/06/2016 07:02 AM, Kevin Wolf wrote:
> > >[ Adding some CCs ]
> > >
> > >Am 06.04.2016 um 05:29 hat Jeff Cody geschrieben:
> > >>Upon receiving an I/O error after an fsync, by default gluster will
> > >>dump its cache.  However, QEMU will retry the fsync, which is especially
> > >>useful when encountering errors such as ENOSPC when using the werror=stop
> > >>option.  When using caching with gluster, however, the last written data
> > >>will be lost upon encountering ENOSPC.  Using the cache xlator option of
> > >>'resync-failed-syncs-after-fsync' should cause gluster to retain the
> > >>cached data after a failed fsync, so that ENOSPC and other transient
> > >>errors are recoverable.
> > >>
> > >>Signed-off-by: Jeff Cody 
> > >>---
> > >>  block/gluster.c | 27 +++
> > >>  configure   |  8 
> > >>  2 files changed, 35 insertions(+)
> > >>
> > >>diff --git a/block/gluster.c b/block/gluster.c
> > >>index 30a827e..b1cf71b 100644
> > >>--- a/block/gluster.c
> > >>+++ b/block/gluster.c
> > >>@@ -330,6 +330,23 @@ static int qemu_gluster_open(BlockDriverState *bs,  
> > >>QDict *options,
> > >>  goto out;
> > >>  }
> > >>+#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> > >>+/* Without this, if fsync fails for a recoverable reason (for 
> > >>instance,
> > >>+ * ENOSPC), gluster will dump its cache, preventing retries.  This 
> > >>means
> > >>+ * almost certain data loss.  Not all gluster versions support the
> > >>+ * 'resync-failed-syncs-after-fsync' key value, but there is no way 
> > >>to
> > >>+ * discover during runtime if it is supported (this api returns 
> > >>success for
> > >>+

Re: [Qemu-block] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error

2016-04-06 Thread Kevin Wolf

Am 06.04.2016 um 13:19 hat Ric Wheeler geschrieben:
> 
> We had a thread discussing this not on the upstream list.
> 
> My summary of the thread is that I don't understand why gluster
> should drop cached data after a failed fsync() for any open file.

It certainly shouldn't, but it does by default. :-)

Have a look at commit 3fcead2d in glusterfs.git, which at least
introduces an option to get usable behaviour:

{ .key = {"resync-failed-syncs-after-fsync"},
  .type = GF_OPTION_TYPE_BOOL,
  .default_value = "off",
  .description = "If sync of \"cached-writes issued before fsync\" "
 "(to backend) fails, this option configures whether "
 "to retry syncing them after fsync or forget them. "
 "If set to on, cached-writes are retried "
 "till a \"flush\" fop (or a successful sync) on sync "
 "failures. "
 "fsync itself is failed irrespective of the value of "
 "this option. ",
},

As you can see, the default is still to drop cached data, and this is
with the file still opened. qemu needs to make sure that this option is
set, and if Jeff's comment in the code below is right, there is no way
currently to make sure that the option isn't silently ignored.

Can we get some function that sets an option and fails if the option is
unknown? Or one that queries the state after setting an option, so we
can check whether we succeeded in switching to the mode we need?

> For closed files, I think it might still happen but this is the same
> as any file system (and unlikely to be the case for qemu?).

Our problem is only with open images. Dropping caches for files that
qemu doesn't use any more is fine as far as I'm concerned.

Note that our usage can involve cases where we reopen a file with
different flags, i.e. first open a second file descriptor, then close
the first one. The image was never completely closed here and we would
still want the cache to preserve our data in such cases.

> I will note that Linux in general had (still has I think?) the
> behavior that once the process closes a file (or exits), we lose
> context to return an error to. From that point on, any failed IO
> from the page cache to the target disk will be dropped from cache.
> To hold things in the cache would lead it to fill with old data that
> is not really recoverable and we have no good way to know that the
> situation is repairable and how long that might take. Upstream
> kernel people have debated this, the behavior might be tweaked for
> certain types of errors.

That's fine, we just don't want the next fsync() to signal success when
in reality the cache has thrown away our data. As soon as we close the
image, there is no next fsync(), so you can do whatever you like.

Kevin

> On 04/06/2016 07:02 AM, Kevin Wolf wrote:
> >[ Adding some CCs ]
> >
> >Am 06.04.2016 um 05:29 hat Jeff Cody geschrieben:
> >>Upon receiving an I/O error after an fsync, by default gluster will
> >>dump its cache.  However, QEMU will retry the fsync, which is especially
> >>useful when encountering errors such as ENOSPC when using the werror=stop
> >>option.  When using caching with gluster, however, the last written data
> >>will be lost upon encountering ENOSPC.  Using the cache xlator option of
> >>'resync-failed-syncs-after-fsync' should cause gluster to retain the
> >>cached data after a failed fsync, so that ENOSPC and other transient
> >>errors are recoverable.
> >>
> >>Signed-off-by: Jeff Cody 
> >>---
> >>  block/gluster.c | 27 +++
> >>  configure   |  8 
> >>  2 files changed, 35 insertions(+)
> >>
> >>diff --git a/block/gluster.c b/block/gluster.c
> >>index 30a827e..b1cf71b 100644
> >>--- a/block/gluster.c
> >>+++ b/block/gluster.c
> >>@@ -330,6 +330,23 @@ static int qemu_gluster_open(BlockDriverState *bs,  
> >>QDict *options,
> >>  goto out;
> >>  }
> >>+#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> >>+/* Without this, if fsync fails for a recoverable reason (for instance,
> >>+ * ENOSPC), gluster will dump its cache, preventing retries.  This 
> >>means
> >>+ * almost certain data loss.  Not all gluster versions support the
> >>+ * 'resync-failed-syncs-after-fsync' key value, but there is no way to
> >>+ * discover during runtime if it is supported (this api returns 
> >>success for
> >>+ * unknown key/value pairs) */
> >Honestly, this sucks. There is apparently no way to operate gluster so
> >we can safely recover after a failed fsync. "We hope everything is fine,
> >but depending on your gluster version, we may now corrupt your image"
> >isn't very good.
> >
> >We need to consider very carefully if this is good enough to go on after
> >an error. I'm currently leaning towards "no". That is, we should only
> >enable this after Gluster provides us a way to make sure that the option
> >is really set.
> >
> >>+ret

[Qemu-block] [PATCH] block: initialize qcrypto API at startup

2016-04-06 Thread Daniel P. Berrange

Any programs which call the qcrypto APIs should ensure that
qcrypto_init() has been called before anything else which
can use crypto. Essentially this means right at the start
of the main method before initializing anything else.

This is important because some versions of gnutls/gcrypt
require explicit initialization before use.

Signed-off-by: Daniel P. Berrange 
---
 qemu-img.c | 6 ++
 qemu-io.c  | 6 ++
 qemu-nbd.c | 7 +++
 3 files changed, 19 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 17c5cfd..1697762 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -37,6 +37,7 @@
 #include "block/block_int.h"
 #include "block/blockjob.h"
 #include "block/qapi.h"
+#include "crypto/init.h"
 #include 
 
 #define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION QEMU_PKGVERSION \
@@ -3486,6 +3487,11 @@ int main(int argc, char **argv)
 exit(EXIT_FAILURE);
 }
 
+if (qcrypto_init(_error) < 0) {
+error_reportf_err(local_error, "cannot initialize crypto: ");
+exit(1);
+}
+
 module_call_init(MODULE_INIT_QOM);
 bdrv_init();
 if (argc < 2) {
diff --git a/qemu-io.c b/qemu-io.c
index 0a738f1..288bba8 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -23,6 +23,7 @@
 #include "sysemu/block-backend.h"
 #include "block/block_int.h"
 #include "trace/control.h"
+#include "crypto/init.h"
 
 #define CMD_NOFILE_OK   0x01
 
@@ -443,6 +444,11 @@ int main(int argc, char **argv)
 progname = basename(argv[0]);
 qemu_init_exec_dir(argv[0]);
 
+if (qcrypto_init(_error) < 0) {
+error_reportf_err(local_error, "cannot initialize crypto: ");
+exit(1);
+}
+
 module_call_init(MODULE_INIT_QOM);
 qemu_add_opts(_object_opts);
 bdrv_init();
diff --git a/qemu-nbd.c b/qemu-nbd.c
index ca4a724..fa91ca3 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -31,6 +31,7 @@
 #include "qapi/qmp/qstring.h"
 #include "qom/object_interfaces.h"
 #include "io/channel-socket.h"
+#include "crypto/init.h"
 
 #include 
 #include 
@@ -518,6 +519,12 @@ int main(int argc, char **argv)
 memset(_sigterm, 0, sizeof(sa_sigterm));
 sa_sigterm.sa_handler = termsig_handler;
 sigaction(SIGTERM, _sigterm, NULL);
+
+if (qcrypto_init(_err) < 0) {
+error_reportf_err(local_err, "cannot initialize crypto: ");
+exit(1);
+}
+
 module_call_init(MODULE_INIT_QOM);
 qemu_add_opts(_object_opts);
 qemu_init_exec_dir(argv[0]);
-- 
2.5.5

Re: [Qemu-block] [PATCH for-2.6 2/2] block/gluster: prevent data loss after i/o error

2016-04-06 Thread Kevin Wolf

[ Adding some CCs ]

Am 06.04.2016 um 05:29 hat Jeff Cody geschrieben:
> Upon receiving an I/O error after an fsync, by default gluster will
> dump its cache.  However, QEMU will retry the fsync, which is especially
> useful when encountering errors such as ENOSPC when using the werror=stop
> option.  When using caching with gluster, however, the last written data
> will be lost upon encountering ENOSPC.  Using the cache xlator option of
> 'resync-failed-syncs-after-fsync' should cause gluster to retain the
> cached data after a failed fsync, so that ENOSPC and other transient
> errors are recoverable.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block/gluster.c | 27 +++
>  configure   |  8 
>  2 files changed, 35 insertions(+)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index 30a827e..b1cf71b 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -330,6 +330,23 @@ static int qemu_gluster_open(BlockDriverState *bs,  
> QDict *options,
>  goto out;
>  }
>  
> +#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
> +/* Without this, if fsync fails for a recoverable reason (for instance,
> + * ENOSPC), gluster will dump its cache, preventing retries.  This means
> + * almost certain data loss.  Not all gluster versions support the
> + * 'resync-failed-syncs-after-fsync' key value, but there is no way to
> + * discover during runtime if it is supported (this api returns success 
> for
> + * unknown key/value pairs) */

Honestly, this sucks. There is apparently no way to operate gluster so
we can safely recover after a failed fsync. "We hope everything is fine,
but depending on your gluster version, we may now corrupt your image"
isn't very good.

We need to consider very carefully if this is good enough to go on after
an error. I'm currently leaning towards "no". That is, we should only
enable this after Gluster provides us a way to make sure that the option
is really set.

> +ret = glfs_set_xlator_option (s->glfs, "*-write-behind",
> +   "resync-failed-syncs-after-fsync",
> +   "on");
> +if (ret < 0) {
> +error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
> +ret = -errno;
> +goto out;
> +}
> +#endif

We also need to consider the case without CONFIG_GLUSTERFS_XLATOR_OPT.
In this case (as well as theoretically in the case that the option
didn't take effect - if only we could know about it), a failed
glfs_fsync_async() is fatal and we need to stop operating on the image,
i.e. set bs->drv = NULL like when we detect corruption in qcow2 images.
The guest will see a broken disk that fails all I/O requests, but that's
better than corrupting data.

Kevin

Re: [Qemu-block] [PATCH] qemu-img: fix formatting of error message

2016-04-06 Thread Kevin Wolf

Am 06.04.2016 um 11:16 hat Daniel P. Berrange geschrieben:
> The error_reportf_err() will not automatically append a
> ': ' before adding its suffix, so we must include that
> in the message we pass it, otherwise we get a badly
> formatted message lacking whitespace:
> 
> qemu-img: Could not open 
> 'driver=nbd,host=127.0.0.1,port=,tls-creds=tls0'Failed to connect socket: 
> Connection refused
> 
> Signed-off-by: Daniel P. Berrange 

Thanks, applied to the block branch.

Kevin

Re: [Qemu-block] [PATCH for-2.6] block: Forbid I/O throttling on nodes with multiple parents for 2.6

2016-04-06 Thread Kevin Wolf

Am 06.04.2016 um 11:45 hat Alberto Garcia geschrieben:
> On Mon 04 Apr 2016 05:26:02 PM CEST, Kevin Wolf wrote:
> > The problem exists whenever a BDS has more users than just its BB, for
> > example it is used as a backing file for another node.
> 
> The code seems correct, but I don't understand this use case.

Let me draw some ASCII art. I'm drawing I/O throttling like a filter to
make the difference clear; in reality it's part of the BDS now or of the
BB in 2.7, so don't let this confuse you.

Let's assume these command line options:

-drive file=backimg.img,if=none,id=backing,iops=1024
-drive file=overlay.qcow2,id=overlay,backing=backing

If we didn't catch this configuration and error out, this is what we
would end up with in 2.6:

[BB] overlay
 |
 |
[BDS] qcow2 [BB] backing
 ||
 ++
 .|
 . I/O throttling
 .|
[BDS] raw
  .
  .
  .

After moving I/O throttling to the BDS level in 2.7, the resulting
configuration would look different:

[BB] overlay
 |
 |
[BDS] qcow2 [BB] backing
 ||
 ||
 | I/O throttling
 ||
 ++
 .  [BDS] raw
 ..
 ..
  .

As you can see, requests coming from the overlay image wouldn't be
throttled any more. If we allowed this set of options now, we would have
an API breakage in 2.7, which we don't want. That's why I want to
disable it for the 2.6 release.

Kevin

Re: [Qemu-block] [PATCH for-2.6] block: Forbid I/O throttling on nodes with multiple parents for 2.6

2016-04-06 Thread Alberto Garcia

On Mon 04 Apr 2016 05:26:02 PM CEST, Kevin Wolf wrote:
> The problem exists whenever a BDS has more users than just its BB, for
> example it is used as a backing file for another node.

The code seems correct, but I don't understand this use case.

Berto

[Qemu-block] [PATCH] qemu-img: fix formatting of error message

2016-04-06 Thread Daniel P. Berrange

The error_reportf_err() will not automatically append a
': ' before adding its suffix, so we must include that
in the message we pass it, otherwise we get a badly
formatted message lacking whitespace:

qemu-img: Could not open 
'driver=nbd,host=127.0.0.1,port=,tls-creds=tls0'Failed to connect socket: 
Connection refused

Signed-off-by: Daniel P. Berrange 
---
 qemu-img.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index 06264d9..17c5cfd 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -256,7 +256,7 @@ static BlockBackend *img_open_opts(const char *optstr,
 options = qemu_opts_to_qdict(opts, NULL);
 blk = blk_new_open(NULL, NULL, options, flags, _err);
 if (!blk) {
-error_reportf_err(local_err, "Could not open '%s'", optstr);
+error_reportf_err(local_err, "Could not open '%s': ", optstr);
 return NULL;
 }
 blk_set_enable_write_cache(blk, !writethrough);
-- 
2.5.5

Re: [Qemu-block] Possible bug: virtio-scsi + iothread (Former x-data-plane) = "Guest moved index from 0 to 61440" warning

2016-04-06 Thread Stefan Hajnoczi

On Tue, Apr 05, 2016 at 07:49:47PM -0300, Zir Blazer wrote:
> The bare minimum script that produces the warning, should be this:
> #!/bin/bash
> qemu-system-x86_64 \-m 1024M \-enable-kvm \-object iothread,id=iothread0 
> \-device virtio-scsi-pci,iothread=iothread0 \
> This produces the "Guest moved index from 0 to 61440" warning.

It does not produce a warning with qemu.git/master.  I see the BIOS
screen.

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-block] [PATCH v12 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-04-06 Thread Wen Congyang

On 04/01/2016 11:20 PM, Max Reitz wrote:
> On 31.03.2016 13:42, Alberto Garcia wrote:
>> On Wed 30 Mar 2016 05:07:15 PM CEST, Max Reitz wrote:
 I also have another (not directly related) question: why not simply
 use the node name when removing children? I understood that the idea
 was that it's possible to have the same node attached twice to the
 same Quorum, but can you actually do that? And what's the use case?
>>>
>>> What I like about using the child role name is that it automatically
>>> prevents you from specifying a node that is not a child of the given
>>> parent.
>>
>> Right, but checking if a node is not a child and returning an error is
>> very simple. And it doesn't require the user to keep track of the node
>> name *and* the child role name.
> 
> Yes. But I think that you need to know parent and child anyway if you
> want to modify (delete) an edge in the graph.
> 
> Also, it may be possible to have multiple parents per node. Actually, it
> is already possible because the BB-BDS relationship is modeled as a
> parent-child relationship. Thus, I'm not sure whether it would be
> sufficient to specify a single node if you want to delete a single edge.
> 
>> Unless I'm forgetting something this would be the first time we expose
>> the child role name in the API, that's why I'm wondering if it's
>> something worth doing.
> 
> Well, the roles are kind of exposed already. It's exactly what you
> specify in -drive or blockdev-add.
> 
>>> Which makes me notice that it might be a good idea to require the user
>>> to specify the child's role when adding a new child. In this version
>>> of this series (where only quorum is supported), the children are just
>>> inserted in numerical order (first free slot is taken first), but
>>> maybe the user wants to insert them in a different order.
>>
>> For the Quorum case it totally makes sense to let the user choose the
>> position of the new child.
>>
>> But for creating a Quorum array in the first place we don't require
>> that, the order is the one that the user provides, and the user does not
>> need to know about the child role names at that point.
> 
> Depends. If you create an empty quorum BDS and then add the children
> using the QAPI command introduced in this series, you are right. But if
> you add children along with creating the quorum BDS (be it via -drive or
> via blockdev-add), one has to specify the child role names.

I think the problem is that: the child role name is wrong.

If we always attach the child in the tail, we can do it like this:
the child role name is children.XXX, and the XXX's value is larger than
any child role name's XXX.

For example:
Quorum has one child: children.1(children.0 is removed)
We add a new child, its role name is children.2, not children.0.

If we want to attach the child not in the tail, for example:
Quorum has two children: children.0, children.1. And the new child should
be before children.1. In this case, we should rename children.1 to children.2
and the new child role name can be children.1. If we allow such usage, we
should rename the other child role name when add/deleting a child. It means
that we should query the role name again after add/deleting a child.

Thanks
Wen Congyang

> 
> Max
>

65 matches

Mail list logo