[Qemu-devel] [PATCH V3 2/2] block/nfs: fix naming of runtime opts

2017-02-01 Thread Peter Lieven
commit 94d6a7a accidentally left the naming of runtime opts and QAPI
scheme inconsistent. As one consequence passing of parameters in the
URI is broken. Sync the naming of the runtime opts to the QAPI
scheme.

Please note that this is technically backwards incompatible with the 2.8
release, but the 2.8 release is the only version that had the wrong naming.
Furthermore release 2.8 suffered from a NULL pointer dereference during
URI parsing.

Fixes: 94d6a7a76e9df9919629428f6c598e2b97d9426c
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven <p...@kamp.de>
---
 block/nfs.c | 46 +++---
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index baaecff..689eaa7 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -359,27 +359,27 @@ static QemuOptsList runtime_opts = {
 .help = "Path of the image on the host",
 },
 {
-.name = "uid",
+.name = "user",
 .type = QEMU_OPT_NUMBER,
 .help = "UID value to use when talking to the server",
 },
 {
-.name = "gid",
+.name = "group",
 .type = QEMU_OPT_NUMBER,
 .help = "GID value to use when talking to the server",
 },
 {
-.name = "tcp-syncnt",
+.name = "tcp-syn-count",
 .type = QEMU_OPT_NUMBER,
 .help = "Number of SYNs to send during the session establish",
 },
 {
-.name = "readahead",
+.name = "readahead-size",
 .type = QEMU_OPT_NUMBER,
 .help = "Set the readahead size in bytes",
 },
 {
-.name = "pagecache",
+.name = "page-cache-size",
 .type = QEMU_OPT_NUMBER,
 .help = "Set the pagecache size in bytes",
 },
@@ -508,29 +508,29 @@ static int64_t nfs_client_open(NFSClient *client, QDict *options,
 goto fail;
 }
 
-if (qemu_opt_get(opts, "uid")) {
-client->uid = qemu_opt_get_number(opts, "uid", 0);
+if (qemu_opt_get(opts, "user")) {
+client->uid = qemu_opt_get_number(opts, "user", 0);
 nfs_set_uid(client->context, client->uid);
 }
 
-if (qemu_opt_get(opts, "gid")) {
-client->gid = qemu_opt_get_number(opts, "gid", 0);
+if (qemu_opt_get(opts, "group")) {
+client->gid = qemu_opt_get_number(opts, "group", 0);
 nfs_set_gid(client->context, client->gid);
 }
 
-if (qemu_opt_get(opts, "tcp-syncnt")) {
-client->tcp_syncnt = qemu_opt_get_number(opts, "tcp-syncnt", 0);
+if (qemu_opt_get(opts, "tcp-syn-count")) {
+client->tcp_syncnt = qemu_opt_get_number(opts, "tcp-syn-count", 0);
 nfs_set_tcp_syncnt(client->context, client->tcp_syncnt);
 }
 
 #ifdef LIBNFS_FEATURE_READAHEAD
-if (qemu_opt_get(opts, "readahead")) {
+if (qemu_opt_get(opts, "readahead-size")) {
 if (open_flags & BDRV_O_NOCACHE) {
 error_setg(errp, "Cannot enable NFS readahead "
  "if cache.direct = on");
 goto fail;
 }
-client->readahead = qemu_opt_get_number(opts, "readahead", 0);
+client->readahead = qemu_opt_get_number(opts, "readahead-size", 0);
 if (client->readahead > QEMU_NFS_MAX_READAHEAD_SIZE) {
 error_report("NFS Warning: Truncating NFS readahead "
  "size to %d", QEMU_NFS_MAX_READAHEAD_SIZE);
@@ -545,13 +545,13 @@ static int64_t nfs_client_open(NFSClient *client, QDict *options,
 #endif
 
 #ifdef LIBNFS_FEATURE_PAGECACHE
-if (qemu_opt_get(opts, "pagecache")) {
+if (qemu_opt_get(opts, "page-cache-size")) {
 if (open_flags & BDRV_O_NOCACHE) {
 error_setg(errp, "Cannot enable NFS pagecache "
  "if cache.direct = on");
 goto fail;
 }
-client->pagecache = qemu_opt_get_number(opts, "pagecache", 0);
+client->pagecache = qemu_opt_get_number(opts, "page-cache-size", 0);
 if (client->pagecache > QEMU_NFS_MAX_PAGECACHE_SIZE) {
 error_report("NFS Warning: Truncating NFS pagecache "
  "size to %d pages", QEMU_NFS_MAX_PAGECACHE_SIZE);
@@ -804,22 +804,22 @@ static void nfs_refresh_filename(BlockDriverState *bs, QDict *options)
 qdict_put(opts, "path", qstring_from_str(client->path));
 
 if (client->uid) {
-

Re: [Qemu-devel] [PATCH V2 2/2] block/nfs: fix naming of runtime opts

2017-02-01 Thread Peter Lieven
Am 01.02.2017 um 02:06 schrieb Max Reitz:
> On 31.01.2017 16:56, Peter Lieven wrote:
>> commit 94d6a7a accidentally left the naming of runtime opts and QAPI
>> scheme inconsistent. As one consequence passing of parameters in the
>> URI is broken. Sync the naming of the runtime opts to the QAPI
>> scheme.
>>
>> Please note that this is technically backwards incompatible with the 2.8
>> release, but the 2.8 release is the only version that had the wrong naming.
>> Furthermore release 2.8 suffered from a NULL pointer dereference during
>> URI parsing.
> I always love things like this. "This technically breaks X, but
> obviously nobody used X, because X always crashed." Reminds me of how we
> success^W accidentally removed qcow2 encryption.
>
> Technically however this also changes the runtime parameter names, I
> think. Giving them probably did work in 2.8, so it is not compatible
> there. I don't mind very much, though, and you're the maintainer, so
> it's fine with me.
>
> Anyway, I think this patch should also update nfs_refresh_filename().

Thanks, that indeed is missing. Will send a V3.

Peter



[Qemu-devel] [PATCH V2 2/2] block/nfs: fix naming of runtime opts

2017-01-31 Thread Peter Lieven
commit 94d6a7a accidentally left the naming of runtime opts and QAPI
scheme inconsistent. As one consequence passing of parameters in the
URI is broken. Sync the naming of the runtime opts to the QAPI
scheme.

Please note that this is technically backwards incompatible with the 2.8
release, but the 2.8 release is the only version that had the wrong naming.
Furthermore release 2.8 suffered from a NULL pointer dereference during
URI parsing.

Fixes: 94d6a7a76e9df9919629428f6c598e2b97d9426c
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Eric Blake <ebl...@redhat.com>
---
 block/nfs.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index baaecff..464d547 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -359,27 +359,27 @@ static QemuOptsList runtime_opts = {
 .help = "Path of the image on the host",
 },
 {
-.name = "uid",
+.name = "user",
 .type = QEMU_OPT_NUMBER,
 .help = "UID value to use when talking to the server",
 },
 {
-.name = "gid",
+.name = "group",
 .type = QEMU_OPT_NUMBER,
 .help = "GID value to use when talking to the server",
 },
 {
-.name = "tcp-syncnt",
+.name = "tcp-syn-count",
 .type = QEMU_OPT_NUMBER,
 .help = "Number of SYNs to send during the session establish",
 },
 {
-.name = "readahead",
+.name = "readahead-size",
 .type = QEMU_OPT_NUMBER,
 .help = "Set the readahead size in bytes",
 },
 {
-.name = "pagecache",
+.name = "page-cache-size",
 .type = QEMU_OPT_NUMBER,
 .help = "Set the pagecache size in bytes",
 },
@@ -508,29 +508,29 @@ static int64_t nfs_client_open(NFSClient *client, QDict *options,
 goto fail;
 }
 
-if (qemu_opt_get(opts, "uid")) {
-client->uid = qemu_opt_get_number(opts, "uid", 0);
+if (qemu_opt_get(opts, "user")) {
+client->uid = qemu_opt_get_number(opts, "user", 0);
 nfs_set_uid(client->context, client->uid);
 }
 
-if (qemu_opt_get(opts, "gid")) {
-client->gid = qemu_opt_get_number(opts, "gid", 0);
+if (qemu_opt_get(opts, "group")) {
+client->gid = qemu_opt_get_number(opts, "group", 0);
 nfs_set_gid(client->context, client->gid);
 }
 
-if (qemu_opt_get(opts, "tcp-syncnt")) {
-client->tcp_syncnt = qemu_opt_get_number(opts, "tcp-syncnt", 0);
+if (qemu_opt_get(opts, "tcp-syn-count")) {
+client->tcp_syncnt = qemu_opt_get_number(opts, "tcp-syn-count", 0);
 nfs_set_tcp_syncnt(client->context, client->tcp_syncnt);
 }
 
 #ifdef LIBNFS_FEATURE_READAHEAD
-if (qemu_opt_get(opts, "readahead")) {
+if (qemu_opt_get(opts, "readahead-size")) {
 if (open_flags & BDRV_O_NOCACHE) {
 error_setg(errp, "Cannot enable NFS readahead "
  "if cache.direct = on");
 goto fail;
 }
-client->readahead = qemu_opt_get_number(opts, "readahead", 0);
+client->readahead = qemu_opt_get_number(opts, "readahead-size", 0);
 if (client->readahead > QEMU_NFS_MAX_READAHEAD_SIZE) {
 error_report("NFS Warning: Truncating NFS readahead "
  "size to %d", QEMU_NFS_MAX_READAHEAD_SIZE);
@@ -545,13 +545,13 @@ static int64_t nfs_client_open(NFSClient *client, QDict *options,
 #endif
 
 #ifdef LIBNFS_FEATURE_PAGECACHE
-if (qemu_opt_get(opts, "pagecache")) {
+if (qemu_opt_get(opts, "page-cache-size")) {
 if (open_flags & BDRV_O_NOCACHE) {
 error_setg(errp, "Cannot enable NFS pagecache "
  "if cache.direct = on");
 goto fail;
 }
-client->pagecache = qemu_opt_get_number(opts, "pagecache", 0);
+client->pagecache = qemu_opt_get_number(opts, "page-cache-size", 0);
 if (client->pagecache > QEMU_NFS_MAX_PAGECACHE_SIZE) {
 error_report("NFS Warning: Truncating NFS pagecache "
  "size to %d pages", QEMU_NFS_MAX_PAGECACHE_SIZE);
-- 
1.9.1




[Qemu-devel] [PATCH V2 0/2] fix segfault and naming of runtime opts

2017-01-31 Thread Peter Lieven
commit 94d6a7a accidentally left the naming of runtime opts and QAPI
scheme inconsistent. Furthermore a NULL pointer dereference resulted
in a segfault when parsing URI parameters.

v1->v2: Patch 1: fixed typo in commit msg [Eric]
Patch 2: noted backwards incompatibility with 2.8 [Eric]

Peter Lieven (2):
  block/nfs: fix NULL pointer dereference in URI parsing
  block/nfs: fix naming of runtime opts

 block/nfs.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

-- 
1.9.1




[Qemu-devel] [PATCH V2 1/2] block/nfs: fix NULL pointer dereference in URI parsing

2017-01-31 Thread Peter Lieven
parse_uint_full wants to put the parsed value into the
variable passed via its second argument which is NULL.

Fixes: 94d6a7a76e9df9919629428f6c598e2b97d9426c
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Eric Blake <ebl...@redhat.com>
---
 block/nfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/nfs.c b/block/nfs.c
index a564340..baaecff 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -108,12 +108,13 @@ static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)
 qdict_put(options, "path", qstring_from_str(uri->path));
 
 for (i = 0; i < qp->n; i++) {
+unsigned long long val;
 if (!qp->p[i].value) {
 error_setg(errp, "Value for NFS parameter expected: %s",
qp->p[i].name);
 goto out;
 }
-if (parse_uint_full(qp->p[i].value, NULL, 0)) {
+if (parse_uint_full(qp->p[i].value, &val, 0)) {
 error_setg(errp, "Illegal value for NFS parameter: %s",
qp->p[i].name);
 goto out;
-- 
1.9.1




[Qemu-devel] [PATCH] qga: ignore EBUSY when freezing a filesystem

2017-01-31 Thread Peter Lieven
the current implementation fails if we try to freeze an
already frozen filesystem. This can happen if a filesystem
is mounted more than once (e.g. with a bind mount).

Suggested-by: Christian Theune <c...@flyingcircus.io>
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven <p...@kamp.de>
---
 qga/commands-posix.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index ea37c09..73d93eb 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1243,6 +1243,9 @@ int64_t qmp_guest_fsfreeze_freeze_list(bool has_mountpoints,
  * filesystems may not implement fsfreeze for less obvious reasons.
  * these will report EOPNOTSUPP. we simply ignore these when tallying
  * the number of frozen filesystems.
+ * if a filesystem is mounted more than once (aka bind mount) a
+ * consecutive attempt to freeze an already frozen filesystem will
+ * return EBUSY.
  *
  * any other error means a failure to freeze a filesystem we
  * expect to be freezable, so return an error in those cases
@@ -1250,7 +1253,7 @@ int64_t qmp_guest_fsfreeze_freeze_list(bool has_mountpoints,
  */
 ret = ioctl(fd, FIFREEZE);
 if (ret == -1) {
-if (errno != EOPNOTSUPP) {
+if (errno != EOPNOTSUPP && errno != EBUSY) {
 error_setg_errno(errp, errno, "failed to freeze %s",
  mount->dirname);
 close(fd);
-- 
1.9.1




[Qemu-devel] [PATCH] block/iscsi: statically link qemu_iscsi_opts

2017-01-24 Thread Peter Lieven
commit f57b4b5f moved qemu_iscsi_opts into vl.c. This
made them invisible for qemu-img, qemu-nbd etc.

Fixes: f57b4b5fb127b60e1aade2684a8b16bc4f630b29
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven <p...@kamp.de>
---
 MAINTAINERS |  1 +
 block/Makefile.objs |  1 +
 block/iscsi-opts.c  | 71 +
 vl.c| 40 --
 4 files changed, 73 insertions(+), 40 deletions(-)
 create mode 100644 block/iscsi-opts.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 60b0b09..dcf6c4c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1632,6 +1632,7 @@ M: Peter Lieven <p...@kamp.de>
 L: qemu-bl...@nongnu.org
 S: Supported
 F: block/iscsi.c
+F: block/iscsi-opts.c
 
 NFS
 M: Jeff Cody <jc...@redhat.com>
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 0b8fd06..c6bd14e 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -14,6 +14,7 @@ block-obj-y += throttle-groups.o
 
 block-obj-y += nbd.o nbd-client.o sheepdog.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
+block-obj-$(if $(CONFIG_LIBISCSI),y,n) += iscsi-opts.o
 block-obj-$(CONFIG_LIBNFS) += nfs.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
diff --git a/block/iscsi-opts.c b/block/iscsi-opts.c
new file mode 100644
index 000..c8dbfe0
--- /dev/null
+++ b/block/iscsi-opts.c
@@ -0,0 +1,71 @@
+/*
+ * QEMU Block driver for iSCSI images (static options)
+ *
+ * Copyright (c) 2017 Peter Lieven <p...@kamp.de>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/config-file.h"
+
+#ifdef CONFIG_LIBISCSI
+static QemuOptsList qemu_iscsi_opts = {
+.name = "iscsi",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_iscsi_opts.head),
+.desc = {
+{
+.name = "user",
+.type = QEMU_OPT_STRING,
+.help = "username for CHAP authentication to target",
+},{
+.name = "password",
+.type = QEMU_OPT_STRING,
+.help = "password for CHAP authentication to target",
+},{
+.name = "password-secret",
+.type = QEMU_OPT_STRING,
+.help = "ID of the secret providing password for CHAP "
+"authentication to target",
+},{
+.name = "header-digest",
+.type = QEMU_OPT_STRING,
+.help = "HeaderDigest setting. "
+"{CRC32C|CRC32C-NONE|NONE-CRC32C|NONE}",
+},{
+.name = "initiator-name",
+.type = QEMU_OPT_STRING,
+.help = "Initiator iqn name to use when connecting",
+},{
+.name = "timeout",
+.type = QEMU_OPT_NUMBER,
+.help = "Request timeout in seconds (default 0 = no timeout)",
+},
+{ /* end of list */ }
+},
+};
+
+static void iscsi_block_opts_init(void)
+{
+qemu_add_opts(&qemu_iscsi_opts);
+}
+
+block_init(iscsi_block_opts_init);
+#endif
diff --git a/vl.c b/vl.c
index a260f30..1390b54 100644
--- a/vl.c
+++ b/vl.c
@@ -511,43 +511,6 @@ static QemuOptsList qemu_fw_cfg_opts = {
 },
 };
 
-#ifdef CONFIG_LIBISCSI
-static QemuOptsList qemu_iscsi_opts = {
-.name = "iscsi",
-.head = QTAILQ_HEAD_INITIALIZER(qemu_iscsi_opts.head),
-.desc = {
-{
-.name = "user",
-.type = QEMU_OPT_STRING,
-.help = "username for CHAP authentication to target",
-},{
-.name = "password",
-.type = QEMU_OPT_STRING,
-.help = "password for CHAP authentication to target",
-},{
-.name = "password-secret",
-  

Re: [Qemu-devel] [PATCH] block/iscsi: statically link qemu_iscsi_opts

2017-01-24 Thread Peter Lieven

Am 24.01.2017 um 13:52 schrieb Paolo Bonzini:


On 24/01/2017 13:49, Peter Lieven wrote:

+#ifdef CONFIG_LIBISCSI

The #ifdef is not needed here.


Right, can you drop it or shall I send a V2?

Peter



Paolo


+static QemuOptsList qemu_iscsi_opts = {
+.name = "iscsi",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_iscsi_opts.head),
+.desc = {
+{
+.name = "user",
+.type = QEMU_OPT_STRING,
+.help = "username for CHAP authentication to target",
+},{
+.name = "password",
+.type = QEMU_OPT_STRING,
+.help = "password for CHAP authentication to target",
+},{
+.name = "password-secret",
+.type = QEMU_OPT_STRING,
+.help = "ID of the secret providing password for CHAP "
+"authentication to target",
+},{
+.name = "header-digest",
+.type = QEMU_OPT_STRING,
+.help = "HeaderDigest setting. "
+"{CRC32C|CRC32C-NONE|NONE-CRC32C|NONE}",
+},{
+.name = "initiator-name",
+.type = QEMU_OPT_STRING,
+.help = "Initiator iqn name to use when connecting",
+},{
+.name = "timeout",
+.type = QEMU_OPT_NUMBER,
+.help = "Request timeout in seconds (default 0 = no timeout)",
+},
+{ /* end of list */ }
+},
+};
+
+static void iscsi_block_opts_init(void)
+{
+qemu_add_opts(&qemu_iscsi_opts);
+}
+
+block_init(iscsi_block_opts_init);
+#endif



--

Mit freundlichen Grüßen

Peter Lieven

...

  KAMP Netzwerkdienste GmbH
  Vestische Str. 89-91 | 46117 Oberhausen
  Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
  p...@kamp.de | http://www.kamp.de

  Geschäftsführer: Heiner Lante | Michael Lante
  Amtsgericht Duisburg | HRB Nr. 12154
  USt-Id-Nr.: DE 120607556

...




Re: [Qemu-devel] qemu-img: complains about missing iscsi option group

2017-01-24 Thread Peter Lieven

Am 24.01.2017 um 12:24 schrieb Paolo Bonzini:


On 19/01/2017 18:03, Daniel P. Berrange wrote:

Move the iscsi option registration out of vl.c and back into a shared file
in block/ such that all programs see the options, not just the emulators.

Any preference which file? It seems no other block driver registers options.

ISCSI is somewhat bizarre in that it has a separate global -iscsi arg
instead of just accepting all args via the -drive spec like every other
block driver does, so there's no precedent to follow.

So suggest we just need a block/iscsi-global.c to deal with this.

Why not block/iscsi.c itself?


As far as I understand block/iscsi is now compiled as a shared object.

Peter




Re: [Qemu-devel] qemu-img: complains about missing iscsi option group

2017-01-24 Thread Peter Lieven

Am 24.01.2017 um 12:29 schrieb Paolo Bonzini:


On 24/01/2017 12:25, Peter Lieven wrote:

Am 24.01.2017 um 12:24 schrieb Paolo Bonzini:

On 19/01/2017 18:03, Daniel P. Berrange wrote:

Move the iscsi option registration out of vl.c and back into a
shared file
in block/ such that all programs see the options, not just the
emulators.

Any preference which file? It seems no other block driver registers
options.

ISCSI is somewhat bizarre in that it has a separate global -iscsi arg
instead of just accepting all args via the -drive spec like every other
block driver does, so there's no precedent to follow.

So suggest we just need a block/iscsi-global.c to deal with this.

Why not block/iscsi.c itself?

As far as I understand block/iscsi is now compiled as a shared object.

Uh, right. :(


Okay, so a new file iscsi-static.c ?

Peter



Re: [Qemu-devel] [PATCH V2] qemu-img: optimize is_allocated_sectors_min

2017-01-23 Thread Peter Lieven

Am 21.01.2017 um 20:58 schrieb Max Reitz:

On 19.01.2017 17:35, Peter Lieven wrote:

the current implementation always splits requests if a buffer
begins or ends with zeroes independent of the length of the
zero area. Change this to really only split off zero areas
that have at least a length of 'min' bytes.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
  qemu-img.c | 44 ++--
  1 file changed, 14 insertions(+), 30 deletions(-)

That is because the idea is that you should delay the write operation as
much as possible. Say you have some empty space at the beginning of the
buffer and then some used space. Of course you should then only start
writing at the point where you actually have to write something. The
same goes for the end of the buffer. Why write zeroes when you can just
not do it?

On the other hand, it makes sense to not split a single write operation
into two just because there is some empty space in the middle. This is
because just writing that empty space may take less time than issuing
two operations, especially if that empty space is e.g. in the middle of
a qcow2 cluster anyway.

This is why empty space in the middle is treated differently than empty
space at the beginning or the end of the buffer.

Do you have a reason for changing this other than because it simplifies
the code? Which it does, admittedly, but I think the code does have a
reason for being how it is.


Hi Max,

thanks for looking into this. I was wrong about my analysis what was
going on with a problem I tried to track down. After your explanation
I think the current implementation is fine and it makes perfect sense
to cut off leading and trailing zeroes.

Peter




[Qemu-devel] [PATCH 2/2] block/nfs: fix naming of runtime opts

2017-01-20 Thread Peter Lieven
commit 94d6a7a accidentally left the naming of runtime opts and QAPI
scheme inconsistent. As one consequence passing of parameters in the
URI is broken. Sync the naming of the runtime opts to the QAPI
scheme.

Fixes: 94d6a7a76e9df9919629428f6c598e2b97d9426c
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven <p...@kamp.de>
---
 block/nfs.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index baaecff..464d547 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -359,27 +359,27 @@ static QemuOptsList runtime_opts = {
 .help = "Path of the image on the host",
 },
 {
-.name = "uid",
+.name = "user",
 .type = QEMU_OPT_NUMBER,
 .help = "UID value to use when talking to the server",
 },
 {
-.name = "gid",
+.name = "group",
 .type = QEMU_OPT_NUMBER,
 .help = "GID value to use when talking to the server",
 },
 {
-.name = "tcp-syncnt",
+.name = "tcp-syn-count",
 .type = QEMU_OPT_NUMBER,
 .help = "Number of SYNs to send during the session establish",
 },
 {
-.name = "readahead",
+.name = "readahead-size",
 .type = QEMU_OPT_NUMBER,
 .help = "Set the readahead size in bytes",
 },
 {
-.name = "pagecache",
+.name = "page-cache-size",
 .type = QEMU_OPT_NUMBER,
 .help = "Set the pagecache size in bytes",
 },
@@ -508,29 +508,29 @@ static int64_t nfs_client_open(NFSClient *client, QDict *options,
 goto fail;
 }
 
-if (qemu_opt_get(opts, "uid")) {
-client->uid = qemu_opt_get_number(opts, "uid", 0);
+if (qemu_opt_get(opts, "user")) {
+client->uid = qemu_opt_get_number(opts, "user", 0);
 nfs_set_uid(client->context, client->uid);
 }
 
-if (qemu_opt_get(opts, "gid")) {
-client->gid = qemu_opt_get_number(opts, "gid", 0);
+if (qemu_opt_get(opts, "group")) {
+client->gid = qemu_opt_get_number(opts, "group", 0);
 nfs_set_gid(client->context, client->gid);
 }
 
-if (qemu_opt_get(opts, "tcp-syncnt")) {
-client->tcp_syncnt = qemu_opt_get_number(opts, "tcp-syncnt", 0);
+if (qemu_opt_get(opts, "tcp-syn-count")) {
+client->tcp_syncnt = qemu_opt_get_number(opts, "tcp-syn-count", 0);
 nfs_set_tcp_syncnt(client->context, client->tcp_syncnt);
 }
 
 #ifdef LIBNFS_FEATURE_READAHEAD
-if (qemu_opt_get(opts, "readahead")) {
+if (qemu_opt_get(opts, "readahead-size")) {
 if (open_flags & BDRV_O_NOCACHE) {
 error_setg(errp, "Cannot enable NFS readahead "
  "if cache.direct = on");
 goto fail;
 }
-client->readahead = qemu_opt_get_number(opts, "readahead", 0);
+client->readahead = qemu_opt_get_number(opts, "readahead-size", 0);
 if (client->readahead > QEMU_NFS_MAX_READAHEAD_SIZE) {
 error_report("NFS Warning: Truncating NFS readahead "
  "size to %d", QEMU_NFS_MAX_READAHEAD_SIZE);
@@ -545,13 +545,13 @@ static int64_t nfs_client_open(NFSClient *client, QDict *options,
 #endif
 
 #ifdef LIBNFS_FEATURE_PAGECACHE
-if (qemu_opt_get(opts, "pagecache")) {
+if (qemu_opt_get(opts, "page-cache-size")) {
 if (open_flags & BDRV_O_NOCACHE) {
 error_setg(errp, "Cannot enable NFS pagecache "
  "if cache.direct = on");
 goto fail;
 }
-client->pagecache = qemu_opt_get_number(opts, "pagecache", 0);
+client->pagecache = qemu_opt_get_number(opts, "page-cache-size", 0);
 if (client->pagecache > QEMU_NFS_MAX_PAGECACHE_SIZE) {
 error_report("NFS Warning: Truncating NFS pagecache "
  "size to %d pages", QEMU_NFS_MAX_PAGECACHE_SIZE);
-- 
1.9.1




[Qemu-devel] [PATCH 1/2] block/nfs: fix NULL pointer dereference in URI parsing

2017-01-20 Thread Peter Lieven
parse_uint_full wants to put the parsed value into the
variable passed via its second argument which is NULL.

Fixes: 94d6a7a76e9df9919629428f6c598e2b97d9426c
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven <p...@kamp.de>
---
 block/nfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/nfs.c b/block/nfs.c
index a564340..baaecff 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -108,12 +108,13 @@ static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)
 qdict_put(options, "path", qstring_from_str(uri->path));
 
 for (i = 0; i < qp->n; i++) {
+unsigned long long val;
 if (!qp->p[i].value) {
 error_setg(errp, "Value for NFS parameter expected: %s",
qp->p[i].name);
 goto out;
 }
-if (parse_uint_full(qp->p[i].value, NULL, 0)) {
+if (parse_uint_full(qp->p[i].value, &val, 0)) {
 error_setg(errp, "Illegal value for NFS parameter: %s",
qp->p[i].name);
 goto out;
-- 
1.9.1




[Qemu-devel] [PATCH 0/2] block/nfs: fix segfault and naming of runtime opts

2017-01-20 Thread Peter Lieven
commit 94d6a7a accidentally left the naming of runtime opts and QAPI
scheme inconsistent. Furthermore a NULL pointer dereference resulted
in a segfault when parsing URI parameters.

Peter Lieven (2):
  block/nfs: fix NULL pointer dereference in URI parsing
  block/nfs: fix naming of runtime opts

 block/nfs.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

-- 
1.9.1




Re: [Qemu-devel] [PATCH v6 0/2] allow blockdev-add for NFS

2017-01-19 Thread Peter Lieven

Am 19.01.2017 um 18:08 schrieb Kevin Wolf:

Am 19.01.2017 um 16:58 hat Peter Lieven geschrieben:

Am 19.01.2017 um 16:55 schrieb Kevin Wolf:

Am 19.01.2017 um 16:44 hat Peter Lieven geschrieben:

Am 19.01.2017 um 16:42 schrieb Kevin Wolf:

Am 19.01.2017 um 16:34 hat Peter Lieven geschrieben:

Am 19.01.2017 um 16:20 schrieb Kevin Wolf:

Am 19.01.2017 um 15:59 hat Eric Blake geschrieben:

On 01/19/2017 08:30 AM, Peter Lieven wrote:

qemu-img: Could not open 'nfs://172.21.200.61/vcore-dev-cdrom/templates/VC_debian8-20170116.qcow2,linux?readahead=131072': Block protocol 'nfs' doesn't support the option 'readahead-size'

Please let me know if the below fix would be correct:

No, this needs to be fixed the other way round: runtime_opts must use
the names as specified in the schema, and nfs_client_open() must access
them as such. Without that, blockdev-add can't work (and the command
line only with the "wrong" old option names from the URL, whereas it
should be using the same names as the QAPI schema).

Shouldn't we support both for backwards compatibility?

blockdev-add only needs to support the modern naming.  But yes,
preserving back-compat spelling of the command-line spellings, as well
as matching blockdev-add spellings, is desirable.

We only just added the individual command line options, previously it
only supported the URL.

It's true that we have the messed up version of the options in 2.8, so
strictly speaking we would break compatibility with a release, but it's
only one release, it's only the nfs driver, and the documentation of the
options is the schema, which had the right option names even in 2.8
(they just didn't work).

So I wouldn't feel bad about removing the wrong names in this specific
case.

So want exactly do you want to do? Fix the names in the QAPI schema
to use the old naming?

No, fix the command line to use the names in the QAPI schema.

The option names from the URL were never supposed to be supported on the
command line.

Okay, so no backwards compatibility? I actually used the options on the command line...

Well, do you _need_ compatibility?

It can certainly be done, but as the (wrong) options on the command line
have only existed since November and were never documented, I wouldn't
bother unless there's a good reason.

Every QEMU before 2.8.0 was working with something like:

qemu -cdrom nfs://10.0.0.1/export/my.iso?readahead=131072

That will keep working. We're not changing the URL parsing, just the
runtime_opts and its accesses in nfs_client_open(). The translation in
nfs_parse_uri() stays intact with the fixes.

What will stop working (and only worked in 2.8.0) is this:

 qemu -drive media=cdrom,driver=nfs,server.host=10.0.0.1,path=export/my.iso,readahead=131072

Also, I think the fixes should be Cc: qemu-stable, so that 2.8.1 will
work correctly again.


Okay, I hope I got it. Will send a patch tomorrow.

Peter




Re: [Qemu-devel] qemu-img: complains about missing iscsi option group

2017-01-19 Thread Peter Lieven

Am 19.01.2017 um 17:38 schrieb Daniel P. Berrange:

On Thu, Jan 19, 2017 at 05:30:20PM +0100, Peter Lieven wrote:

Hi all,


since commit f57b4b5 (blockdev: prepare iSCSI block driver for dynamic loading)

qemu-img complains about a missing option group several times.


~/git/qemu$ ./qemu-img info iscsi://172.21.200.56:3260/iqn.2001-05.com.equallogic:0-8a0906-69d384e0a-aa3004e55e15878d-00lieven-data/0
qemu-img: There is no option group 'iscsi'
qemu-img: There is no option group 'iscsi'
qemu-img: There is no option group 'iscsi'
qemu-img: There is no option group 'iscsi'
image: iscsi://172.21.200.56:3260/iqn.2001-05.com.equallogic:0-8a0906-69d384e0a-aa3004e55e15878d-00lieven-data/0
file format: raw
virtual size: 25G (26848788480 bytes)
disk size: unavailable
cluster_size: 15728640

I don't know whats the proper way to fix this?

Move the iscsi option registration out of vl.c and back into a shared file
in block/ such that all programs see the options, not just the emulators.


Any preference which file? It seems no other block driver registers options.

Thanks,
Peter




[Qemu-devel] [PATCH] bitmap: assert that start and nr are non negative

2017-01-19 Thread Peter Lieven
commit e1123a3b introduced a data corruption regression
in the iscsi driver because it passed -1 as nr to bitmap_set
and bitmap_clear. Add an assertion to catch such flaws earlier.

Suggested-by: Fam Zheng <f...@redhat.com>
Signed-off-by: Peter Lieven <p...@kamp.de>
---
 util/bitmap.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/util/bitmap.c b/util/bitmap.c
index 43ed011..c1a84ca 100644
--- a/util/bitmap.c
+++ b/util/bitmap.c
@@ -164,6 +164,8 @@ void bitmap_set(unsigned long *map, long start, long nr)
 int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
 unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
 
+assert(start >= 0 && nr >= 0);
+
 while (nr - bits_to_set >= 0) {
 *p |= mask_to_set;
 nr -= bits_to_set;
@@ -184,6 +186,8 @@ void bitmap_set_atomic(unsigned long *map, long start, long nr)
 int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
 unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
 
+assert(start >= 0 && nr >= 0);
+
 /* First word */
 if (nr - bits_to_set > 0) {
 atomic_or(p, mask_to_set);
@@ -221,6 +225,8 @@ void bitmap_clear(unsigned long *map, long start, long nr)
 int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
 unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
 
+assert(start >= 0 && nr >= 0);
+
 while (nr - bits_to_clear >= 0) {
 *p &= ~mask_to_clear;
 nr -= bits_to_clear;
@@ -243,6 +249,8 @@ bool bitmap_test_and_clear_atomic(unsigned long *map, long start, long nr)
 unsigned long dirty = 0;
 unsigned long old_bits;
 
+assert(start >= 0 && nr >= 0);
+
 /* First word */
 if (nr - bits_to_clear > 0) {
 old_bits = atomic_fetch_and(p, ~mask_to_clear);
-- 
1.9.1
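The guard can be illustrated with a minimal bit-by-bit model (a sketch only; QEMU's real implementation works word-at-a-time):

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Minimal model of bitmap_set with the guard added above. The point is
 * only that a negative 'start' or 'nr' (as the iscsi bug produced) now
 * trips the assertion instead of silently scribbling on memory. */
static void bitmap_set_checked(unsigned long *map, long start, long nr)
{
    assert(start >= 0 && nr >= 0);
    for (long i = start; i < start + nr; i++) {
        map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
    }
}
```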




[Qemu-devel] [PATCH V2] qemu-img: optimize is_allocated_sectors_min

2017-01-19 Thread Peter Lieven
The current implementation always splits requests if a buffer begins or
ends with zeroes, independent of the length of the zero area. Change this
so that only zero areas that are at least 'min' bytes long are split off.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
 qemu-img.c | 44 ++--
 1 file changed, 14 insertions(+), 30 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 5df66fe..8e7357d 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1010,45 +1010,29 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum)
 }
 
 /*
- * Like is_allocated_sectors, but if the buffer starts with a used sector,
- * up to 'min' consecutive sectors containing zeros are ignored. This avoids
- * breaking up write requests for only small sparse areas.
+ * Like is_allocated_sectors, but only at least 'min' consecutive sectors
+ * containing zeros are considered unallocated. This avoids breaking up write
+ * requests for only small sparse areas.
  */
 static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum,
-int min)
+int min)
 {
-int ret;
-int num_checked, num_used;
-
-if (n < min) {
-min = n;
-}
-
-ret = is_allocated_sectors(buf, n, pnum);
-if (!ret) {
-return ret;
-}
-
-num_used = *pnum;
-buf += BDRV_SECTOR_SIZE * *pnum;
-n -= *pnum;
-num_checked = num_used;
+int num_used = 0;
 
 while (n > 0) {
-ret = is_allocated_sectors(buf, n, pnum);
-
-buf += BDRV_SECTOR_SIZE * *pnum;
-n -= *pnum;
-num_checked += *pnum;
-if (ret) {
-num_used = num_checked;
-} else if (*pnum >= min) {
+if (!is_allocated_sectors(buf, n, pnum) && *pnum >= min) {
 break;
 }
+num_used += *pnum;
+buf += BDRV_SECTOR_SIZE * *pnum;
+n -= *pnum;
 }
 
-*pnum = num_used;
-return 1;
+if (num_used) {
+*pnum = num_used;
+}
+
+return !!num_used;
 }
 
 /*
-- 
1.9.1
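The rewritten loop can be modeled on a toy buffer at a granularity of one byte per "sector" (a sketch: run_scan stands in for is_allocated_sectors, reporting the allocation state and length of the leading run; min_scan folds zero runs shorter than 'min' into the used area, as the patch above does for real sectors):

```c
#include <assert.h>

/* Report the allocation state (buf[i] != 0) and length of the leading run. */
static int run_scan(const unsigned char *buf, int n, int *pnum)
{
    int state = buf[0] != 0;
    int i = 1;
    while (i < n && (buf[i] != 0) == state) {
        i++;
    }
    *pnum = i;
    return state;
}

/* Length of the leading "used" area, with zero runs shorter than 'min'
 * folded into it. Returns 0 only if the buffer starts with a zero run of
 * at least 'min' bytes (then *pnum is that run's length). */
static int min_scan(const unsigned char *buf, int n, int *pnum, int min)
{
    int num_used = 0;

    while (n > 0) {
        if (!run_scan(buf, n, pnum) && *pnum >= min) {
            break;
        }
        num_used += *pnum;
        buf += *pnum;
        n -= *pnum;
    }

    if (num_used) {
        *pnum = num_used;
    }
    return num_used != 0;
}
```

With min = 3, a two-byte hole inside data is kept in one piece; with min = 2 the same hole splits the request.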




Re: [Qemu-devel] [PATCH v6 0/2] allow blockdev-add for NFS

2017-01-19 Thread Peter Lieven

On 19.01.2017 at 16:55, Kevin Wolf wrote:

On 19.01.2017 at 16:44, Peter Lieven wrote:

On 19.01.2017 at 16:42, Kevin Wolf wrote:

On 19.01.2017 at 16:34, Peter Lieven wrote:

On 19.01.2017 at 16:20, Kevin Wolf wrote:

On 19.01.2017 at 15:59, Eric Blake wrote:

On 01/19/2017 08:30 AM, Peter Lieven wrote:

qemu-img: Could not open
'nfs://172.21.200.61/vcore-dev-cdrom/templates/VC_debian8-20170116.qcow2,linux?readahead=131072':
Block protocol 'nfs' doesn't support the option 'readahead-size'

Please let me know if the below fix would be correct:

No, this needs to be fixed the other way round: runtime_opts must use
the names as specified in the schema, and nfs_client_open() must access
them as such. Without that, blockdev-add can't work (and the command
line only with the "wrong" old option names from the URL, whereas it
should be using the same names as the QAPI schema).

Shouldn't we support both for backwards compatibility?

blockdev-add only needs to support the modern naming.  But yes,
preserving back-compat spelling of the command-line spellings, as well
as matching blockdev-add spellings, is desirable.

We only just added the individual command line options, previously it
only supported the URL.

It's true that we have the messed up version of the options in 2.8, so
strictly speaking we would break compatibility with a release, but it's
only one release, it's only the nfs driver, and the documentation of the
options is the schema, which had the right option names even in 2.8
(they just didn't work).

So I wouldn't feel bad about removing the wrong names in this specific
case.

So what exactly do you want to do? Fix the names in the QAPI schema
to use the old naming?

No, fix the command line to use the names in the QAPI schema.

The option names from the URL were never supposed to be supported on the
command line.

Okay, so no backwards compatibility? I actually used the options on the command line...

Well, do you _need_ compatibility?

It can certainly be done, but as the (wrong) options on the command line
have only existed since November and were never documented, I wouldn't
bother unless there's a good reason.


Every QEMU release before 2.8.0 worked with something like:

qemu -cdrom nfs://10.0.0.1/expory/my.iso?readahead=131072

Peter



Re: [Qemu-devel] [PATCH] qemu-img: optimize is_allocated_sectors_min

2017-01-19 Thread Peter Lieven

Please ignore this one. The code was suboptimal. I will send v2.

Peter

On 19.01.2017 at 16:56, Peter Lieven wrote:

The current implementation always splits requests if a buffer begins or
ends with zeroes, independent of the length of the zero area. Change this
so that only zero areas that are at least 'min' bytes long are split off.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
  qemu-img.c | 38 --
  1 file changed, 12 insertions(+), 26 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 5df66fe..046696f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1010,45 +1010,31 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum)
  }
  
  /*

- * Like is_allocated_sectors, but if the buffer starts with a used sector,
- * up to 'min' consecutive sectors containing zeros are ignored. This avoids
- * breaking up write requests for only small sparse areas.
+ * Like is_allocated_sectors, but only at least 'min' consecutive sectors
+ * containing zeros are considered unallocated. This avoids breaking up write
+ * requests for only small sparse areas.
   */
  static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum,
-int min)
+int min)
  {
-int ret;
-int num_checked, num_used;
-
-if (n < min) {
-min = n;
-}
-
-ret = is_allocated_sectors(buf, n, pnum);
-if (!ret) {
-return ret;
-}
-
-num_used = *pnum;
-buf += BDRV_SECTOR_SIZE * *pnum;
-n -= *pnum;
-num_checked = num_used;
+int ret, num_checked = 0, num_used = 0;
  
  while (n > 0) {

  ret = is_allocated_sectors(buf, n, pnum);
-
  buf += BDRV_SECTOR_SIZE * *pnum;
  n -= *pnum;
  num_checked += *pnum;
-if (ret) {
-num_used = num_checked;
-} else if (*pnum >= min) {
+if (!ret && *pnum >= min) {
  break;
  }
+num_used = num_checked;
  }
  
-*pnum = num_used;

-return 1;
+if (num_used) {
+*pnum = num_used;
+}
+
+return !!num_used;
  }
  
  /*






[Qemu-devel] qemu-img: complains about missing iscsi option group

2017-01-19 Thread Peter Lieven

Hi all,


since commit f57b4b5 (blockdev: prepare iSCSI block driver for dynamic loading)

qemu-img complains about a missing option group several times.


~/git/qemu$ ./qemu-img info 
iscsi://172.21.200.56:3260/iqn.2001-05.com.equallogic:0-8a0906-69d384e0a-aa3004e55e15878d-00lieven-data/0
qemu-img: There is no option group 'iscsi'
qemu-img: There is no option group 'iscsi'
qemu-img: There is no option group 'iscsi'
qemu-img: There is no option group 'iscsi'
image: 
iscsi://172.21.200.56:3260/iqn.2001-05.com.equallogic:0-8a0906-69d384e0a-aa3004e55e15878d-00lieven-data/0
file format: raw
virtual size: 25G (26848788480 bytes)
disk size: unavailable
cluster_size: 15728640

I don't know what the proper way to fix this is.


Thanks,

Peter





Re: [Qemu-devel] [PATCH v6 0/2] allow blockdev-add for NFS

2017-01-19 Thread Peter Lieven

On 19.01.2017 at 16:20, Kevin Wolf wrote:

On 19.01.2017 at 15:59, Eric Blake wrote:

On 01/19/2017 08:30 AM, Peter Lieven wrote:

qemu-img: Could not open
'nfs://172.21.200.61/vcore-dev-cdrom/templates/VC_debian8-20170116.qcow2,linux?readahead=131072':
Block protocol 'nfs' doesn't support the option 'readahead-size'

Please let me know if the below fix would be correct:

No, this needs to be fixed the other way round: runtime_opts must use
the names as specified in the schema, and nfs_client_open() must access
them as such. Without that, blockdev-add can't work (and the command
line only with the "wrong" old option names from the URL, whereas it
should be using the same names as the QAPI schema).

Shouldn't we support both for backwards compatibility?

blockdev-add only needs to support the modern naming.  But yes,
preserving back-compat spelling of the command-line spellings, as well
as matching blockdev-add spellings, is desirable.

We only just added the individual command line options, previously it
only supported the URL.

It's true that we have the messed up version of the options in 2.8, so
strictly speaking we would break compatibility with a release, but it's
only one release, it's only the nfs driver, and the documentation of the
options is the schema, which had the right option names even in 2.8
(they just didn't work).

So I wouldn't feel bad about removing the wrong names in this specific
case.


So what exactly do you want to do? Fix the names in the QAPI schema
to use the old naming?

Peter




[Qemu-devel] [PATCH] qemu-img: optimize is_allocated_sectors_min

2017-01-19 Thread Peter Lieven
The current implementation always splits requests if a buffer begins or
ends with zeroes, independent of the length of the zero area. Change this
so that only zero areas that are at least 'min' bytes long are split off.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
 qemu-img.c | 38 --
 1 file changed, 12 insertions(+), 26 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 5df66fe..046696f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1010,45 +1010,31 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum)
 }
 
 /*
- * Like is_allocated_sectors, but if the buffer starts with a used sector,
- * up to 'min' consecutive sectors containing zeros are ignored. This avoids
- * breaking up write requests for only small sparse areas.
+ * Like is_allocated_sectors, but only at least 'min' consecutive sectors
+ * containing zeros are considered unallocated. This avoids breaking up write
+ * requests for only small sparse areas.
  */
 static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum,
-int min)
+int min)
 {
-int ret;
-int num_checked, num_used;
-
-if (n < min) {
-min = n;
-}
-
-ret = is_allocated_sectors(buf, n, pnum);
-if (!ret) {
-return ret;
-}
-
-num_used = *pnum;
-buf += BDRV_SECTOR_SIZE * *pnum;
-n -= *pnum;
-num_checked = num_used;
+int ret, num_checked = 0, num_used = 0;
 
 while (n > 0) {
 ret = is_allocated_sectors(buf, n, pnum);
-
 buf += BDRV_SECTOR_SIZE * *pnum;
 n -= *pnum;
 num_checked += *pnum;
-if (ret) {
-num_used = num_checked;
-} else if (*pnum >= min) {
+if (!ret && *pnum >= min) {
 break;
 }
+num_used = num_checked;
 }
 
-*pnum = num_used;
-return 1;
+if (num_used) {
+*pnum = num_used;
+}
+
+return !!num_used;
 }
 
 /*
-- 
1.9.1




Re: [Qemu-devel] [PATCH v6 0/2] allow blockdev-add for NFS

2017-01-19 Thread Peter Lieven

On 19.01.2017 at 16:42, Kevin Wolf wrote:

On 19.01.2017 at 16:34, Peter Lieven wrote:

On 19.01.2017 at 16:20, Kevin Wolf wrote:

On 19.01.2017 at 15:59, Eric Blake wrote:

On 01/19/2017 08:30 AM, Peter Lieven wrote:

qemu-img: Could not open
'nfs://172.21.200.61/vcore-dev-cdrom/templates/VC_debian8-20170116.qcow2,linux?readahead=131072':
Block protocol 'nfs' doesn't support the option 'readahead-size'

Please let me know if the below fix would be correct:

No, this needs to be fixed the other way round: runtime_opts must use
the names as specified in the schema, and nfs_client_open() must access
them as such. Without that, blockdev-add can't work (and the command
line only with the "wrong" old option names from the URL, whereas it
should be using the same names as the QAPI schema).

Shouldn't we support both for backwards compatibility?

blockdev-add only needs to support the modern naming.  But yes,
preserving back-compat spelling of the command-line spellings, as well
as matching blockdev-add spellings, is desirable.

We only just added the individual command line options, previously it
only supported the URL.

It's true that we have the messed up version of the options in 2.8, so
strictly speaking we would break compatibility with a release, but it's
only one release, it's only the nfs driver, and the documentation of the
options is the schema, which had the right option names even in 2.8
(they just didn't work).

So I wouldn't feel bad about removing the wrong names in this specific
case.

So what exactly do you want to do? Fix the names in the QAPI schema
to use the old naming?

No, fix the command line to use the names in the QAPI schema.

The option names from the URL were never supposed to be supported on the
command line.

Kevin


This is what I wanted to do (before I asked ;-)):

diff --git a/block/nfs.c b/block/nfs.c
index baaecff..7c997cd 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -119,19 +119,24 @@ static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)
qp->p[i].name);
 goto out;
 }
-if (!strcmp(qp->p[i].name, "uid")) {
+if (!strcmp(qp->p[i].name, "uid") ||
+!strcmp(qp->p[i].name, "user")) {
 qdict_put(options, "user",
   qstring_from_str(qp->p[i].value));
-} else if (!strcmp(qp->p[i].name, "gid")) {
+} else if (!strcmp(qp->p[i].name, "gid") ||
+   !strcmp(qp->p[i].name, "group")) {
 qdict_put(options, "group",
   qstring_from_str(qp->p[i].value));
-} else if (!strcmp(qp->p[i].name, "tcp-syncnt")) {
+} else if (!strcmp(qp->p[i].name, "tcp-syncnt") ||
+   !strcmp(qp->p[i].name, "tcp-syn-count")) {
 qdict_put(options, "tcp-syn-count",
   qstring_from_str(qp->p[i].value));
-} else if (!strcmp(qp->p[i].name, "readahead")) {
+} else if (!strcmp(qp->p[i].name, "readahead") ||
+   !strcmp(qp->p[i].name, "readahead-size")) {
 qdict_put(options, "readahead-size",
   qstring_from_str(qp->p[i].value));
-} else if (!strcmp(qp->p[i].name, "pagecache")) {
+} else if (!strcmp(qp->p[i].name, "pagecache") ||
+   !strcmp(qp->p[i].name, "page-cache-size")) {
 qdict_put(options, "page-cache-size",
   qstring_from_str(qp->p[i].value));
 } else if (!strcmp(qp->p[i].name, "debug")) {
@@ -359,27 +364,27 @@ static QemuOptsList runtime_opts = {
 .help = "Path of the image on the host",
 },
 {
-.name = "uid",
+.name = "user",
 .type = QEMU_OPT_NUMBER,
 .help = "UID value to use when talking to the server",
 },
 {
-.name = "gid",
+.name = "group",
 .type = QEMU_OPT_NUMBER,
 .help = "GID value to use when talking to the server",
 },
 {
-.name = "tcp-syncnt",
+.name = "tcp-syn-count",
 .type = QEMU_OPT_NUMBER,
 .help = "Number of SYNs to send during the session establish",
 },
 {
-.name = "readahead",
+.name = "readahead-size",
 .type = QEMU_OPT_NUMBER,
 .help = "Set the readahead size in bytes",
 },
 {
-.name = "pagecache",
+.name = "page-cache-size",
 .type = QEMU_OPT

Re: [Qemu-devel] [PATCH v6 0/2] allow blockdev-add for NFS

2017-01-19 Thread Peter Lieven

On 19.01.2017 at 16:42, Kevin Wolf wrote:

On 19.01.2017 at 16:34, Peter Lieven wrote:

On 19.01.2017 at 16:20, Kevin Wolf wrote:

On 19.01.2017 at 15:59, Eric Blake wrote:

On 01/19/2017 08:30 AM, Peter Lieven wrote:

qemu-img: Could not open
'nfs://172.21.200.61/vcore-dev-cdrom/templates/VC_debian8-20170116.qcow2,linux?readahead=131072':
Block protocol 'nfs' doesn't support the option 'readahead-size'

Please let me know if the below fix would be correct:

No, this needs to be fixed the other way round: runtime_opts must use
the names as specified in the schema, and nfs_client_open() must access
them as such. Without that, blockdev-add can't work (and the command
line only with the "wrong" old option names from the URL, whereas it
should be using the same names as the QAPI schema).

Shouldn't we support both for backwards compatibility?

blockdev-add only needs to support the modern naming.  But yes,
preserving back-compat spelling of the command-line spellings, as well
as matching blockdev-add spellings, is desirable.

We only just added the individual command line options, previously it
only supported the URL.

It's true that we have the messed up version of the options in 2.8, so
strictly speaking we would break compatibility with a release, but it's
only one release, it's only the nfs driver, and the documentation of the
options is the schema, which had the right option names even in 2.8
(they just didn't work).

So I wouldn't feel bad about removing the wrong names in this specific
case.

So what exactly do you want to do? Fix the names in the QAPI schema
to use the old naming?

No, fix the command line to use the names in the QAPI schema.

The option names from the URL were never supposed to be supported on the
command line.


Okay, so no backwards compatibility? I actually used the options on the command line...

Peter




Re: [Qemu-devel] [PATCH v6 0/2] allow blockdev-add for NFS

2017-01-19 Thread Peter Lieven

On 18.01.2017 at 10:59, Kevin Wolf wrote:

On 17.01.2017 at 16:14, Peter Lieven wrote:

On 31.10.2016 at 18:20, Kevin Wolf wrote:

On 31.10.2016 at 16:05, Ashijeet Acharya wrote:

Previously posted series patches:
v5: https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg07580.html
v4: https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg07449.html
v3: https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg06903.html
v2: https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg05844.html
v1: https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg04487.html

This series adds blockdev-add support for NFS block driver.

Thanks, fixed as commented on patch 1 and applied.

Hi,

it seems this series breaks passing options via URI.

1) in nfs_parse_uri

parse_uint_full(qp->p[i].value, NULL, 0)

segfaults, as the routine wants to set *NULL = val.

Yes, you're right.


2) all parameters that have different names in the options and the qdict,
e.g. readahead-size vs. readahead, cannot be passed via the URI.

$ qemu-img convert -p 
nfs://172.21.200.61/templates/VC_debian8-20170116.qcow2,linux\?readahead=131072 
iscsi://172.21.200.56:3260/iqn.2001-05.com.equallogic:0-8a0906-69d384e0a-aa3004e55e15878d-XXX/0

qemu-img: Could not open 
'nfs://172.21.200.61/vcore-dev-cdrom/templates/VC_debian8-20170116.qcow2,linux?readahead=131072':
 Block protocol 'nfs' doesn't support the option 'readahead-size'

Please let me know if the below fix would be correct:

No, this needs to be fixed the other way round: runtime_opts must use
the names as specified in the schema, and nfs_client_open() must access
them as such. Without that, blockdev-add can't work (and the command
line only with the "wrong" old option names from the URL, whereas it
should be using the same names as the QAPI schema).


Shouldn't we support both for backwards compatibility?

Peter



Re: [Qemu-devel] [PATCH v6 0/2] allow blockdev-add for NFS

2017-01-17 Thread Peter Lieven

On 31.10.2016 at 18:20, Kevin Wolf wrote:

On 31.10.2016 at 16:05, Ashijeet Acharya wrote:

Previously posted series patches:
v5: https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg07580.html
v4: https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg07449.html
v3: https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg06903.html
v2: https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg05844.html
v1: https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg04487.html

This series adds blockdev-add support for NFS block driver.

Thanks, fixed as commented on patch 1 and applied.


Hi,

it seems this series breaks passing options via URI.

1) in nfs_parse_uri

parse_uint_full(qp->p[i].value, NULL, 0)

segfaults, as the routine wants to set *NULL = val.

Program received signal SIGSEGV, Segmentation fault.
parse_uint (s=0x55d0ead0 "131072", value=value@entry=0x0, 
endptr=endptr@entry=0x7fffd870, base=base@entry=0) at util/cutils.c:475
475*value = val;
(gdb) bt full
#0  parse_uint (s=0x55d0ead0 "131072", value=value@entry=0x0, 
endptr=endptr@entry=0x7fffd870, base=base@entry=0) at util/cutils.c:475
r = 0
endp = 0x55d0ead6 ""
val = 131072
#1  0x55655ff2 in parse_uint_full (s=, 
value=value@entry=0x0, base=base@entry=0) at util/cutils.c:499
endp = 0x55d093f0 "\004"
r = 
#2  0x55603425 in nfs_parse_uri (filename=, 
options=0x55d093f0, errp=0x7fffd980) at block/nfs.c:116
uri = 0x55d0e920
qp = 0x55d0ea30
ret = -22
i = 0
__func__ = "nfs_parse_uri"
#3  0x5559c7bb in bdrv_fill_options (errp=0x7fffd968, flags=0x7fffd95c, 
filename=, options=) at block.c:1278
drvname = 
protocol = 
local_err = 0x0
parse_filename = true
drv = 0x558e3140 


2) all parameters that have different names in the options and the qdict,
e.g. readahead-size vs. readahead, cannot be passed via the URI.

$ qemu-img convert -p 
nfs://172.21.200.61/templates/VC_debian8-20170116.qcow2,linux\?readahead=131072 
iscsi://172.21.200.56:3260/iqn.2001-05.com.equallogic:0-8a0906-69d384e0a-aa3004e55e15878d-XXX/0

qemu-img: Could not open 
'nfs://172.21.200.61/vcore-dev-cdrom/templates/VC_debian8-20170116.qcow2,linux?readahead=131072':
 Block protocol 'nfs' doesn't support the option 'readahead-size'

Please let me know if the below fix would be correct:

lieven@lieven-pc:~/git/qemu$ git diff
diff --git a/block/nfs.c b/block/nfs.c
index a564340..0ff8268 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -108,30 +108,31 @@ static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)
 qdict_put(options, "path", qstring_from_str(uri->path));

 for (i = 0; i < qp->n; i++) {
+unsigned long long val;
 if (!qp->p[i].value) {
 error_setg(errp, "Value for NFS parameter expected: %s",
qp->p[i].name);
 goto out;
 }
-if (parse_uint_full(qp->p[i].value, NULL, 0)) {
+if (parse_uint_full(qp->p[i].value, &val, 0)) {
 error_setg(errp, "Illegal value for NFS parameter: %s",
qp->p[i].name);
 goto out;
 }
 if (!strcmp(qp->p[i].name, "uid")) {
-qdict_put(options, "user",
+qdict_put(options, "uid",
   qstring_from_str(qp->p[i].value));
 } else if (!strcmp(qp->p[i].name, "gid")) {
-qdict_put(options, "group",
+qdict_put(options, "gid",
   qstring_from_str(qp->p[i].value));
 } else if (!strcmp(qp->p[i].name, "tcp-syncnt")) {
-qdict_put(options, "tcp-syn-count",
+qdict_put(options, "tcp-syncnt",
   qstring_from_str(qp->p[i].value));
 } else if (!strcmp(qp->p[i].name, "readahead")) {
-qdict_put(options, "readahead-size",
+qdict_put(options, "readahead",
   qstring_from_str(qp->p[i].value));
 } else if (!strcmp(qp->p[i].name, "pagecache")) {
-qdict_put(options, "page-cache-size",
+qdict_put(options, "pagecache",
   qstring_from_str(qp->p[i].value));
 } else if (!strcmp(qp->p[i].name, "debug")) {
 qdict_put(options, "debug",


Thanks,
Peter




Re: [Qemu-devel] [Qemu-stable] [PATCH] block/iscsi: avoid data corruption with cache=writeback

2017-01-17 Thread Peter Lieven

On 17.01.2017 at 12:28, Fam Zheng wrote:

On Mon, 01/16 16:17, Peter Lieven wrote:

nb_cls_shrunk in iscsi_allocmap_update can become -1 if the
request starts and ends within the same cluster. This results
in passing -1 to bitmap_set and bitmap_clear and they don't
handle negative values properly. In the end this leads to data
corruption.

Fixes: e1123a3b40a1a9a625a29c8ed4debb7e206ea690
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven <p...@kamp.de>
---
  block/iscsi.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 6aeeb9e..1860f1b 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -499,14 +499,18 @@ iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
  if (allocated) {
  bitmap_set(iscsilun->allocmap, cl_num_expanded, nb_cls_expanded);
  } else {
-bitmap_clear(iscsilun->allocmap, cl_num_shrunk, nb_cls_shrunk);
+if (nb_cls_shrunk > 0) {
+bitmap_clear(iscsilun->allocmap, cl_num_shrunk, nb_cls_shrunk);
+}
  }
  
  if (iscsilun->allocmap_valid == NULL) {

  return;
  }
  if (valid) {
-bitmap_set(iscsilun->allocmap_valid, cl_num_shrunk, nb_cls_shrunk);
+if (nb_cls_shrunk > 0) {
+bitmap_set(iscsilun->allocmap_valid, cl_num_shrunk, nb_cls_shrunk);
+}
  } else {
  bitmap_clear(iscsilun->allocmap_valid, cl_num_expanded,
   nb_cls_expanded);
--
1.9.1



It's probably a good idea to add assertions on the parameters in bitmap_*.


Yes, I will send a follow-up patch.

Peter




Re: [Qemu-devel] [Qemu-stable] Data corruption in Qemu 2.7.1

2017-01-17 Thread Peter Lieven

On 17.01.2017 at 07:40, Fam Zheng wrote:

On Fri, 01/13 11:44, Peter Lieven wrote:

Hi,

I am currently facing a problem in our testing environment where I see file
system corruption with 2.7.1 on iSCSI and local storage (LVM).
Trying to bisect, but has anyone observed this before?

The information here is too scarce to tell but a file corruption is more often a
result of two writers modifying the disk concurrently. Have you ruled that out?
Is the corruption reproducible?


My issue was primarily with iSCSI and I could bisect it. I already sent a patch
to the list:

commit 0bd57e907311be6e4f97394cfd9afebe271457e2
Author: Peter Lieven <p...@kamp.de>
Date:   Mon Jan 16 16:10:26 2017 +0100

block/iscsi: avoid data corruption with cache=writeback

nb_cls_shrunk in iscsi_allocmap_update can become -1 if the
request starts and ends within the same cluster. This results
in passing -1 to bitmap_set and bitmap_clear and they don't
handle negative values properly. In the end this leads to data
corruption.

Fixes: e1123a3b40a1a9a625a29c8ed4debb7e206ea690
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven <p...@kamp.de>

However, one user also reported corruption with LVM. I will check if he uses 
virtio-scsi.

Thanks,
Peter




[Qemu-devel] [PATCH] block/iscsi: avoid data corruption with cache=writeback

2017-01-16 Thread Peter Lieven
nb_cls_shrunk in iscsi_allocmap_update can become -1 if the
request starts and ends within the same cluster. This results
in passing -1 to bitmap_set and bitmap_clear and they don't
handle negative values properly. In the end this leads to data
corruption.

Fixes: e1123a3b40a1a9a625a29c8ed4debb7e206ea690
Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven <p...@kamp.de>
---
 block/iscsi.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 6aeeb9e..1860f1b 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -499,14 +499,18 @@ iscsi_allocmap_update(IscsiLun *iscsilun, int64_t sector_num,
 if (allocated) {
 bitmap_set(iscsilun->allocmap, cl_num_expanded, nb_cls_expanded);
 } else {
-bitmap_clear(iscsilun->allocmap, cl_num_shrunk, nb_cls_shrunk);
+if (nb_cls_shrunk > 0) {
+bitmap_clear(iscsilun->allocmap, cl_num_shrunk, nb_cls_shrunk);
+}
 }
 
 if (iscsilun->allocmap_valid == NULL) {
 return;
 }
 if (valid) {
-bitmap_set(iscsilun->allocmap_valid, cl_num_shrunk, nb_cls_shrunk);
+if (nb_cls_shrunk > 0) {
+bitmap_set(iscsilun->allocmap_valid, cl_num_shrunk, nb_cls_shrunk);
+}
 } else {
 bitmap_clear(iscsilun->allocmap_valid, cl_num_expanded,
  nb_cls_expanded);
-- 
1.9.1
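How nb_cls_shrunk goes negative can be sketched with the usual round-up/round-down cluster arithmetic (a simplified model of the computation in iscsi_allocmap_update, with assumed names):

```c
#include <assert.h>

/* Count the clusters *fully* covered by the sector range [sector, sector + nr):
 * round the start up and the end down to cluster boundaries. For a request
 * that starts and ends inside the same cluster, the rounded-down end lands
 * before the rounded-up start and the count comes out as -1. */
static long nb_cls_shrunk(long sector, long nr, long cluster_sectors)
{
    long first = (sector + cluster_sectors - 1) / cluster_sectors;
    long last = (sector + nr) / cluster_sectors;
    return last - first;
}
```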




[Qemu-devel] Data corruption in Qemu 2.7.1

2017-01-13 Thread Peter Lieven
Hi,

I am currently facing a problem in our testing environment where I see file system
corruption with 2.7.1 on iSCSI and local storage (LVM).
Trying to bisect, but has anyone observed this before?

Thanks,
Peter



Re: [Qemu-devel] [PATCH v2 5/9] block: Pass unaligned discard requests to drivers

2016-11-21 Thread Peter Lieven

On 19.11.2016 at 23:05, Max Reitz wrote:

On 18.11.2016 02:13, Eric Blake wrote:

On 11/17/2016 05:44 PM, Max Reitz wrote:

Since the SCSI specification says nothing about a minimum
discard granularity, and only documents the preferred
alignment, it is best if the block layer gives the driver
every bit of information about discard requests, rather than
rounding it to alignment boundaries early.

Is this series supposed to address this issue? Because if so, I fail to
see where it does. If the device advertises 15 MB as the discard
granularity, then the iscsi driver will still drop all discard requests
that are not aligned to 15 MB boundaries, no?


I don't have access to the device in question, so I'm hoping Peter
chimes in (oops, how'd I miss him on original CC?).  Here's all the more
he said on v1:
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02223.html

My gut feel, however, is that the iscsi code is NOT rounding

Why are you relying on your gut feel when you can simply look into the
code in question? As a matter of fact, I know that you have looked at
the very piece of code in question because you touch it in patch 4.

Now that I looked at it again, though, I see my mistake: I assumed that
is_byte_request_lun_aligned() would check whether the request is aligned
to the advertised discard granularity. Of course, it does not; it only
checks whether it's aligned to the device's block_size.

So with the device in question, block_size is something like, say, 1 MB
and the pdiscard_alignment is 15 MB. With your series, the block layer
will first split off the head of an unaligned discard request (with the
rest being aligned to the pdiscard_alignment) and then again the head
off that head (with regards to the request_alignment).

The iscsi driver will discard the head of the head (which isn't aligned
to the request_alignment), but pass the part of the head that is aligned
to request_alignment and of course pass everything that's aligned to
pdiscard_alignment, too.

(And the same then happens for the tail.)

OK, now I see.


  (the qcow2
code rounded, but that's different); the regression happened in 2.7
because the block layer also started rounding, and this patch gets the
block layer rounding out of the way. If nothing changed in the iscsi
code in the meantime, then the iscsi code should now (once again) be
discarding all sizes, regardless of the 15M advertisement.

Well, all sizes that are aligned to the request_alignment.


Meanwhile, I _did_ test this patch with blkdebug (see 9/9), as well as
on a slightly modified NBD client that forced 15M alignments, and for
those cases, it definitely made the difference on whether all bytes were
passed (spread across several calls), vs. just the aligned bytes in the
middle of a request larger than 15M.


The only difference is that it's now the iscsi driver that drops the
request instead of the generic block layer.

If the iscsi driver was ever dropping it in the first place.

It wasn't dropping them, it was asserting that it was dropped; yes, my
mistake for thinking that is_byte_request_lun_aligned() would check
whether the request is aligned to pdiscard_alignment.


Sorry for not responding earlier. I was out of office for some days.

The iSCSI driver indeed checked only for alignment to block_size and passed
all other discard requests on, which the device either handled or simply
dropped.

The version now seems correct.

Thanks,
Peter



[Qemu-devel] qemu 2.6.2: e1000 network card hanging after live migration

2016-11-14 Thread Peter Lieven

Hi,

I have observed a virtual server with an e1000 network card losing network
connectivity
after live migration (not reproducible so far). In the log of the vServer I 
found several messages like:

kernel: [13833656.832108] e1000 :00:05.0 eth2: Reset adapter

When I restarted the vServer to try to restore connectivity I triggered the 
following assertion:

qemu-2.6.2: hw/pci/pci.c:280: pcibus_reset: Assertion `bus->irq_count[i] == 0' 
failed.

Is this something that has anyone observed before?

Thanks,
Peter



Re: [Qemu-devel] [Qemu-stable] [PATCH 2/2] block: Pass unaligned discard requests to drivers

2016-11-11 Thread Peter Lieven
On 11.11.2016 at 00:10, Eric Blake wrote:
> Discard is advisory, so rounding the requests to alignment
> boundaries is never semantically wrong from the data that
> the guest sees.  But at least the Dell Equallogic iSCSI SANs
> has an interesting property that its advertised discard
> alignment is 15M, yet documents that discarding a sequence
> of 1M slices will eventually result in the 15M page being
> marked as discarded, and it is possible to observe which
> pages have been discarded.
>
> Between commits 9f1963b and b8d0a980, we converted the block
> layer to a byte-based interface that ultimately ignores any
> unaligned head or tail based on the driver's advertised
> discard granularity, which means that qemu 2.7 refuses to
> pass any discard request smaller than 15M down to the Dell
> Equallogic hardware.  This is a slight regression in behavior
> compared to earlier qemu, where a guest executing discards
> in power-of-2 chunks used to be able to get every page
> discarded, but is now left with various pages still allocated
> because the guest requests did not align with the hardware's
> 15M pages.
>
> Since the SCSI specification says nothing about a minimum
> discard granularity, and only documents the preferred
> alignment, it is best if the block layer gives the driver
> every bit of information about discard requests, rather than
> rounding it to alignment boundaries early.
>
> Rework the block layer discard algorithm to mirror the write
> zero algorithm: always peel off any unaligned head or tail
> and manage that in isolation, then do the bulk of the request
> on an aligned boundary.  The fallback when the driver returns
> -ENOTSUP for an unaligned request is to silently ignore that
> portion of the discard request; but for devices that can pass
> the partial request all the way down to hardware, this can
> result in the hardware coalescing requests and discarding
> aligned pages after all.
>
> Reported by: Peter Lieven <p...@kamp.de>
> Signed-off-by: Eric Blake <ebl...@redhat.com>
> CC: qemu-sta...@nongnu.org
> ---
>  block/io.c | 36 +++-
>  1 file changed, 23 insertions(+), 13 deletions(-)
>
> diff --git a/block/io.c b/block/io.c
> index aa532a5..76c3030 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2421,7 +2421,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, 
> int64_t offset,
>  {
>  BdrvTrackedRequest req;
>  int max_pdiscard, ret;
> -int head, align;
> +int head, tail, align;
>
>  if (!bs->drv) {
>  return -ENOMEDIUM;
> @@ -2444,19 +2444,15 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState 
> *bs, int64_t offset,
>  return 0;
>  }
>
> -/* Discard is advisory, so ignore any unaligned head or tail */
> +/* Discard is advisory, but some devices track and coalesce
> + * unaligned requests, so we must pass everything down rather than
> + * round here.  Still, most devices will just silently ignore
> + * unaligned requests (by returning -ENOTSUP), so we must fragment
> + * the request accordingly.  */
>  align = MAX(bs->bl.pdiscard_alignment, bs->bl.request_alignment);
>  assert(align % bs->bl.request_alignment == 0);
>  head = offset % align;
> -if (head) {
> -head = MIN(count, align - head);
> -count -= head;
> -offset += head;
> -}
> -count = QEMU_ALIGN_DOWN(count, align);
> -if (!count) {
> -return 0;
> -}
> +tail = (offset + count) % align;
>
>  bdrv_inc_in_flight(bs);
>  tracked_request_begin(&req, bs, offset, count, BDRV_TRACKED_DISCARD);
> @@ -2468,11 +2464,25 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState 
> *bs, int64_t offset,
>
>  max_pdiscard = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pdiscard, 
> INT_MAX),
> align);
> -assert(max_pdiscard);
> +assert(max_pdiscard >= bs->bl.request_alignment);
>
>  while (count > 0) {
>  int ret;
> -int num = MIN(count, max_pdiscard);
> +int num = count;
> +
> +if (head) {
> +/* Make a small request up to the first aligned sector. */
> +num = MIN(count, align - head);
> +head = (head + num) % align;

Is there any way that head is != 0 after this?

Peter



Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-11-08 Thread Peter Lieven

On 25.10.2016 at 18:12, Eric Blake wrote:

On 10/25/2016 09:36 AM, Paolo Bonzini wrote:


On 25/10/2016 16:35, Eric Blake wrote:

So your argument is that we should always pass down every unaligned
less-than-optimum discard request all the way to the hardware, rather
than dropping it higher in the stack, even though discard requests are
already advisory, in order to leave the hardware as the ultimate
decision on whether to ignore the unaligned request?

Yes, I agree with Peter as to this.

Okay, I'll work on patches. I think it counts as bug fix, so appropriate
even if I miss soft freeze (I'd still like to get NBD write zero support
into 2.8, since it already missed 2.7, but that one is still awaiting
review with not much time left).



Hi Eric,

have you had time to look at this?
If you need help, let me know.

Peter



Re: [Qemu-devel] [PATCH v2 1/2] block/nfs: Introduce runtime_opts in NFS

2016-10-25 Thread Peter Lieven

On 25.10.2016 at 16:02, Kevin Wolf wrote:

Peter, there is a question for you hidden somewhere below.

On 24.10.2016 at 21:27, Ashijeet Acharya wrote:

Make NFS block driver use various fine grained runtime_opts.
Set .bdrv_parse_filename() to nfs_parse_filename() and introduce two
new functions nfs_parse_filename() and nfs_parse_uri() to help parsing
the URI.

Signed-off-by: Ashijeet Acharya 
---
  block/nfs.c | 348 +++-
  1 file changed, 253 insertions(+), 95 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index 8602a44..666ccf2 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -35,8 +35,12 @@
  #include "qemu/uri.h"
  #include "qemu/cutils.h"
  #include "sysemu/sysemu.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qint.h"
+#include "qapi/qmp/qstring.h"
  #include <nfsc/libnfs.h>
  
+

  #define QEMU_NFS_MAX_READAHEAD_SIZE 1048576
  #define QEMU_NFS_MAX_PAGECACHE_SIZE (8388608 / NFS_BLKSIZE)
  #define QEMU_NFS_MAX_DEBUG_LEVEL 2
@@ -61,6 +65,118 @@ typedef struct NFSRPC {
  NFSClient *client;
  } NFSRPC;
  
+static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)

+{
+URI *uri = NULL;
+QueryParams *qp = NULL;
+int ret = 0, i;
+
+uri = uri_parse(filename);
+if (!uri) {
+error_setg(errp, "Invalid URI specified");
+ret = -EINVAL;
+goto out;
+}
+if (strcmp(uri->scheme, "nfs") != 0) {
+error_setg(errp, "URI scheme must be 'nfs'");
+ret = -EINVAL;
+goto out;
+}
+
+if (!uri->server) {
+error_setg(errp, "missing hostname in URI");
+ret = -EINVAL;
+goto out;
+}
+
+if (!uri->path) {
+error_setg(errp, "missing file path in URI");
+ret = -EINVAL;
+goto out;
+}
+
+qp = query_params_parse(uri->query);
+if (!qp) {
+error_setg(errp, "could not parse query parameters");
+ret = -EINVAL;
+goto out;
+}
+
+qdict_put(options, "host", qstring_from_str(uri->server));
+
+qdict_put(options, "path", qstring_from_str(uri->path));

Just style, but why the empty line between both qdict_put() calls?


+
+for (i = 0; i < qp->n; i++) {
+if (!qp->p[i].value) {
+error_setg(errp, "Value for NFS parameter expected: %s",
+   qp->p[i].name);
+goto out;

You need to set ret = -EINVAL here.


+}
+if (parse_uint_full(qp->p[i].value, NULL, 0)) {
+error_setg(errp, "Illegal value for NFS parameter: %s",
+   qp->p[i].name);
+goto out;

And here.


+}
+if (!strcmp(qp->p[i].name, "uid")) {
+qdict_put(options, "uid",
+  qstring_from_str(qp->p[i].value));
+} else if (!strcmp(qp->p[i].name, "gid")) {
+qdict_put(options, "gid",
+  qstring_from_str(qp->p[i].value));
+} else if (!strcmp(qp->p[i].name, "tcp-syncnt")) {
+qdict_put(options, "tcp-syncnt",
+  qstring_from_str(qp->p[i].value));
+} else if (!strcmp(qp->p[i].name, "readahead")) {
+qdict_put(options, "readahead",
+  qstring_from_str(qp->p[i].value));
+} else if (!strcmp(qp->p[i].name, "pagecache")) {
+qdict_put(options, "pagecache",
+  qstring_from_str(qp->p[i].value));
+} else if (!strcmp(qp->p[i].name, "debug")) {
+qdict_put(options, "debug",
+  qstring_from_str(qp->p[i].value));
+} else {
+error_setg(errp, "Unknown NFS parameter name: %s",
+   qp->p[i].name);
+goto out;

And here.

If you prefer, you can set ret = -EINVAL at the top of the function and
do a ret = 0 only right before out:


+}
+}
+out:
+if (qp) {
+query_params_free(qp);
+}
+if (uri) {
+uri_free(uri);
+}
+return ret;
+}
+
+static void nfs_parse_filename(const char *filename, QDict *options,
+   Error **errp)
+{
+int ret = 0;
+
+if (qdict_haskey(options, "host") ||
+qdict_haskey(options, "path") ||
+qdict_haskey(options, "uid") ||
+qdict_haskey(options, "gid") ||
+qdict_haskey(options, "tcp-syncnt") ||
+qdict_haskey(options, "readahead") ||
+qdict_haskey(options, "pagecache") ||
+qdict_haskey(options, "debug")) {
+error_setg(errp, "host/path/uid/gid/tcp-syncnt/readahead/pagecache"
+ "/debug and a filename may not be used at the same"
+ " time");
+return;
+}
+
+ret = nfs_parse_uri(filename, options, errp);
+if (ret < 0) {
+error_setg(errp, "No valid URL specified");

errp has already been set by nfs_parse_uri(), you must not set it a
second time. Otherwise, this is what you get:

$ x86_64-softmmu/qemu-system-x86_64 

Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-10-25 Thread Peter Lieven

On 25.10.2016 at 15:59, Eric Blake wrote:

On 10/25/2016 07:42 AM, Peter Lieven wrote:

But hey, that firmware is seriously weird. :)

Yes, so you would not change the new implementation?

Even if the discard is e.g. 1MB it could theoretically be that internally
the device has a finer granularity. It's an optimal discard alignment,
not the minimum required discard size. I think that's a difference.
It does not say "I can't handle smaller discards".

The firmware is probably technically buggy for advertising too large of
a minimum granularity, if it can piece together smaller requests into a
larger discard.  If discards need to happen at a smaller granularity,
the firmware (or kernel quirk system) should fix the advertisement to
the actual granularity that it will honor.  I don't see a reason to
change qemu's current behavior.



The issue is that the optimal unmap granularity is only a hint.
There is no evidence of what happens with unaligned requests or requests
that are smaller. They could still lead to a discard operation in the
storage device. It just says "if you can send me discards of that size,
that's optimal for me". That does not mean that smaller or unaligned
requests have no effect.

Peter




Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-10-25 Thread Peter Lieven

On 25.10.2016 at 14:19, Paolo Bonzini wrote:


On 25/10/2016 14:12, Peter Lieven wrote:

On 25.10.2016 at 14:09, Paolo Bonzini wrote:

On 25/10/2016 14:03, Peter Lieven wrote:

On 01.08.2016 at 11:22, Paolo Bonzini wrote:

On 28/07/2016 04:39, Eric Blake wrote:

On 07/27/2016 01:25 AM, Fam Zheng wrote:

On Thu, 07/21 13:34, Eric Blake wrote:

+max_write_zeroes = max_write_zeroes / alignment * alignment;

Not using QEMU_ALIGN_DOWN despite patch 3?

Looks like I missed that on the rebase. Can fix if there is a
reason for
a respin.

Since Stefan acked this, I'm applying the patch and fixing it to use
QEMU_ALIGN_DOWN.

Paolo

Hi,

I came across a sort of regression we introduced with the dropping of
head and tail
of an unaligned discard.

The discard alignment that we use to trim the discard request is just a
hint.

I learned on the Equallogics that a page (which is unusually large at
15MB) is unallocated even if the discard happens in pieces, e.g. in
slices of 1MB requests.

  From my point of view I would like to restore the old behaviour.
What do
you think?

The right logic should be the one in Linux: if splitting a request, and
the next starting sector would be misaligned, stop the discard at the
previous aligned sector.  Otherwise leave everything alone.

Just to clarify: I mean if the guest sends incremental 1MB discards,
we would now drop all of them if the alignment is 15MB. Previously,
we sent all of the 1MB requests.

Yes.  In this case there would be no need at all to split the request,
so each request should be passed through.

But hey, that firmware is seriously weird. :)


Yes, so you would not change the new implementation?

Even if the discard is e.g. 1MB it could theoretically be that internally
the device has a finer granularity. It's an optimal discard alignment,
not the minimum required discard size. I think that's a difference.
It does not say "I can't handle smaller discards".

Peter




Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-10-25 Thread Peter Lieven

On 25.10.2016 at 14:09, Paolo Bonzini wrote:


On 25/10/2016 14:03, Peter Lieven wrote:

On 01.08.2016 at 11:22, Paolo Bonzini wrote:

On 28/07/2016 04:39, Eric Blake wrote:

On 07/27/2016 01:25 AM, Fam Zheng wrote:

On Thu, 07/21 13:34, Eric Blake wrote:

+max_write_zeroes = max_write_zeroes / alignment * alignment;

Not using QEMU_ALIGN_DOWN despite patch 3?

Looks like I missed that on the rebase. Can fix if there is a reason for
a respin.

Since Stefan acked this, I'm applying the patch and fixing it to use
QEMU_ALIGN_DOWN.

Paolo

Hi,

I came across a sort of regression we introduced with the dropping of
head and tail
of an unaligned discard.

The discard alignment that we use to trim the discard request is just a
hint.

I learned on the Equallogics that a page (which is unusually large at
15MB) is unallocated even if the discard happens in pieces, e.g. in
slices of 1MB requests.

 From my point of view I would like to restore the old behaviour. What do
you think?

The right logic should be the one in Linux: if splitting a request, and
the next starting sector would be misaligned, stop the discard at the
previous aligned sector.  Otherwise leave everything alone.


Just to clarify: I mean if the guest sends incremental 1MB discards,
we would now drop all of them if the alignment is 15MB. Previously,
we sent all of the 1MB requests.

Peter




Re: [Qemu-devel] [PATCH 4/4] block: Cater to iscsi with non-power-of-2 discard

2016-10-25 Thread Peter Lieven

On 01.08.2016 at 11:22, Paolo Bonzini wrote:


On 28/07/2016 04:39, Eric Blake wrote:

On 07/27/2016 01:25 AM, Fam Zheng wrote:

On Thu, 07/21 13:34, Eric Blake wrote:

+max_write_zeroes = max_write_zeroes / alignment * alignment;

Not using QEMU_ALIGN_DOWN despite patch 3?

Looks like I missed that on the rebase. Can fix if there is a reason for
a respin.

Since Stefan acked this, I'm applying the patch and fixing it to use
QEMU_ALIGN_DOWN.

Paolo


Hi,

I came across a sort of regression we introduced with the dropping of head and 
tail
of an unaligned discard.

The discard alignment that we use to trim the discard request is just a hint.

I learned on the Equallogics that a page (which is unusually large at
15MB) is unallocated even if the discard happens in pieces, e.g. in
slices of 1MB requests.

From my point of view I would like to restore the old behaviour. What do you 
think?

Thanks,
Peter




Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage

2016-10-18 Thread Peter Lieven

On 12.10.2016 at 23:18, Michael R. Hines wrote:

Peter,

Greetings from DigitalOcean. We're experiencing the same symptoms without this 
patch.
We have, collectively, many gigabytes of un-planned-for RSS being used 
per-hypervisor
that we would like to get rid of =).

Without explicitly trying this patch (will do that ASAP), we immediately 
noticed that the
192MB mentioned immediately melts away (Yay) when we disabled the coroutine 
thread pool explicitly,
with another ~100MB in additional stack usage that would likely also go away if 
we
applied the entirety of your patch.

Is there any chance you have revisited this or have a timeline for it?


Hi Michael,

the current master already includes some of the patches of this original 
series. There are still some changes left, but
what works for me is the current master +

diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 5816702..3eaef68 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -25,8 +25,6 @@ enum {
 };

 /** Free list to speed up creation */
-static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
-static unsigned int release_pool_size;
 static __thread QSLIST_HEAD(, Coroutine) alloc_pool = 
QSLIST_HEAD_INITIALIZER(pool);
 static __thread unsigned int alloc_pool_size;
 static __thread Notifier coroutine_pool_cleanup_notifier;
@@ -49,20 +47,10 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
 if (CONFIG_COROUTINE_POOL) {
co = QSLIST_FIRST(&alloc_pool);
 if (!co) {
-if (release_pool_size > POOL_BATCH_SIZE) {
-/* Slow path; a good place to register the destructor, too.  */
-if (!coroutine_pool_cleanup_notifier.notify) {
-coroutine_pool_cleanup_notifier.notify = 
coroutine_pool_cleanup;
- qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
-}
-
-/* This is not exact; there could be a little skew between
- * release_pool_size and the actual size of release_pool.  But
- * it is just a heuristic, it does not need to be perfect.
- */
-alloc_pool_size = atomic_xchg(&release_pool_size, 0);
-QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
-co = QSLIST_FIRST(&alloc_pool);
+/* Slow path; a good place to register the destructor, too.  */
+if (!coroutine_pool_cleanup_notifier.notify) {
+coroutine_pool_cleanup_notifier.notify = 
coroutine_pool_cleanup;
+ qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
 }
 }
 if (co) {
@@ -85,11 +73,6 @@ static void coroutine_delete(Coroutine *co)
 co->caller = NULL;

 if (CONFIG_COROUTINE_POOL) {
-if (release_pool_size < POOL_BATCH_SIZE * 2) {
-QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
-atomic_inc(&release_pool_size);
-return;
-}
 if (alloc_pool_size < POOL_BATCH_SIZE) {
QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
 alloc_pool_size++;

+ invoking qemu with the following environment variable set:

MALLOC_MMAP_THRESHOLD_=32768 qemu-system-x86_64 

The last one makes glibc automatically use mmap() when a malloc'ed
allocation exceeds 32 kBytes.

Hope this helps,
Peter




Re: [Qemu-devel] [Qemu-block] block/nfs: Fine grained runtime options in nfs

2016-10-18 Thread Peter Lieven

On 17.10.2016 at 21:34, Ashijeet Acharya wrote:

On Tue, Oct 18, 2016 at 12:59 AM, Eric Blake  wrote:

On 10/17/2016 01:00 PM, Ashijeet Acharya wrote:


One more relatively easy question though, will we include @port as an
option in runtime_opts while converting NFS to use several
runtime_opts? The reason I ask is that the URI syntax for NFS in
QEMU looks like this:

nfs://<host>/<path>[?param=value[&param2=value2[&...]]]

It's actually nfs://<host>[:port]/...

so the URI syntax already supports port.

But the commit message which added support for NFS had the URI which I
mentioned above and the code for NFS does not make use of 'port'
anywhere either, which is why I am a bit confused.


Hi Ashijeet,

don't worry, there is no port number when connecting to an NFS server.
The portmapper always listens on port 111. So theoretically we could
specify a port in the URL, but it is ignored.

BR,
Peter



Re: [Qemu-devel] [Qemu-stable] [ANNOUNCE] QEMU 2.6.1 Stable released

2016-09-30 Thread Peter Lieven

On 28.09.2016 at 21:52, Michael Roth wrote:

Quoting Peter Lieven (2016-09-27 06:30:27)

On 27.09.2016 at 12:28, Peter Lieven wrote:

 On 16.09.2016 at 15:56, Peter Lieven wrote:

 On 13.09.2016 at 20:04, Michael Roth wrote:

 Quoting Peter Lieven (2016-09-13 10:52:04)

 On 13.09.2016 at 17:42, Stefan Hajnoczi 
<stefa...@redhat.com> wrote:


 On Thu, Sep 08, 2016 at 03:58:26PM -0500, Michael Roth 
wrote:
 Quoting Stefan Hajnoczi (2016-09-05 12:54:35)

 On Fri, Aug 26, 2016 at 01:45:56PM +0200, 
Peter Lieven wrote:

 On 25.08.2016 at 19:23, Michael Roth wrote:
 Quoting Peter Lieven (2016-08-25 01:38:13)

 7c509d1 virtio: decrement vq->inuse in 
virtqueue_discard()
 700f26b virtio: recalculate vq->inuse 
after migration

 Looks like these got posted during the 
freeze :(


 The virtio thing is important because 
live migration is broken without
 the fix as  86cc089 is in 2.6.1.

 Not sure I understand the relation to 
86cc089. Wouldn't the check
 introduced there always pass due to target 
initializing inuse to 0?

 Or is the issue that the fix introduced in 
86cc089 is only partially
 effective due to inuse not being 
recalculated properly on target? That might
 warrant a 2.6.1.1...

 This is what Stefan wrote in the cover letter 
to the series:

 "I should mention this is for QEMU 2.7. These 
fixes are needed if the
 CVE-2016-5403 patch has been applied. Without 
these patches any device that holds VirtQueueElements across
 live migration will terminate with a "Virtqueue 
size exceeded" error message. virtio-balloon and virtio-scsi are affected. virtio-blk
probably too but I haven't tested it."

 Maybe

 The virtio inuse fixes are needed for stable 
(v2.6.2?) so that the
 spurious "Virtqueue size exceeded" on migration is 
solved.

 The error can be reproduced when there is a 
VirtQueueElement pending
 across migration (e.g. virtio-blk s->rq failed 
request list).

 Thanks for clarifying. I'm planning to do a 2.6.2 to 
capture these, the
 patches Peter mentioned, and some other fixes that 
came during 2.7 RC
 phase.

 I have an initial staging tree at:

  
https://github.com/mdroth/qemu/commits/stable-2.6-staging

 There's still a few PULLs in flight with patches I 
plan to pull in, but
 hoping to send out the patch round-up early next week 
and a release the
 following week.

 Two more candidates for stable:

 4b7f91e virtio: zero vq->inuse in virtio_reset()
 104e70c virtio-balloon: discard virtqueue element on reset

 They also deal with "Virtqueue size exceeded" errors.

 Stefan

There also seems to be a regression (segfault) in the VNC 
server in 2.6.1, but I am still investigating.

 Do you have a reproducer? I can try a bisect. Trying to get the 
initial
 staging tree posted today but want to make sure any known 
regressions are
 addressed beforehand.


 Hi Michael,

 we have not been able to reproduce anymore. My guess is that our client
 had a bug in the new version
 and that the regression can only happen in a special connection state.
 But we are still trying
 to reproduce.


 We are again able to reproduce the VNC error. The vServer dies with:

 qemu: qemu_mutex_lock: Invalid argument

 We are working on it.


The bug we faced is fixed upstream in:

ui: avoid crash if vnc client disconnects with writes pending

This should definitely go in 2.6.2

Other vnc related patches might also go in:

vnc: make sure we finish disconnect

vnc: ensure connection sharing/limits is always configured

vnc: fix crash when vnc_server_info_get has an error

vnc: don't crash getting server info if lsock is NULL

vnc-enc-tight: fix off-by-one bug

vnc: fix incorrect checking condition when upd

Re: [Qemu-devel] [PATCH V9 1/7] oslib-posix: add helpers for stack alloc and free

2016-09-27 Thread Peter Lieven

On 27.09.2016 at 13:59, Kevin Wolf wrote:

On 27.09.2016 at 11:58, Peter Lieven wrote:

the allocated stack will be adjusted to the minimum supported stack size
by the OS and rounded up to be a multiple of the system pagesize.
Additionally an architecture dependent guard page is added to the stack
to catch stack overflows.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
  include/sysemu/os-posix.h | 27 +++
  util/oslib-posix.c| 42 ++
  2 files changed, 69 insertions(+)

diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index 9c7dfdf..3cfedbc 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -60,4 +60,31 @@ int qemu_utimens(const char *path, const qemu_timespec 
*times);
  
  bool is_daemonized(void);
  
+/**

+ * qemu_alloc_stack:
+ * @sz: pointer to a size_t holding the requested usable stack size
+ *
+ * Allocate memory that can be used as a stack, for instance for
+ * coroutines. If the memory cannot be allocated, this function
+ * will abort (like g_malloc()). This function also inserts an
+ * additional guard page to catch a potential stack overflow.
+ * Note that the memory required for the guard page and alignment
+ * and minimal stack size restrictions will increase the value of sz.
+ *
+ * The allocated stack must be freed with qemu_free_stack().
+ *
+ * Returns: pointer to (the lowest address of) the stack memory.
+ */
+void *qemu_alloc_stack(size_t *sz);
+
+/**
+ * qemu_free_stack:
+ * @stack: stack to free
+ * @sz: size of stack in bytes
+ *
+ * Free a stack allocated via qemu_alloc_stack(). Note that sz must
+ * be exactly the adjusted stack size returned by qemu_alloc_stack.
+ */
+void qemu_free_stack(void *stack, size_t sz);
+
  #endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index f2d4e9e..5745229 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -499,3 +499,45 @@ pid_t qemu_fork(Error **errp)
  }
  return pid;
  }
+
+void *qemu_alloc_stack(size_t *sz)
+{
+void *ptr, *guardpage;
+size_t pagesz = getpagesize();
+#ifdef _SC_THREAD_STACK_MIN
+/* avoid stacks smaller than _SC_THREAD_STACK_MIN */
+long min_stack_sz = sysconf(_SC_THREAD_STACK_MIN);
+*sz = MAX(MAX(min_stack_sz, 0), *sz);
+#endif
+/* adjust stack size to a multiple of the page size */
+*sz = ROUND_UP(*sz, pagesz);
+/* allocate one extra page for the guard page */
+*sz += pagesz;
+
+ptr = mmap(NULL, *sz, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+if (ptr == MAP_FAILED) {
+abort();
+}
+
+#if defined(HOST_IA64)
+/* separate register stack */
+guardpage = ptr + (((sz - pagesz) / 2) & ~pagesz);

s/sz/*sz/


+#elif defined(HOST_HPPA)
+/* stack grows up */
+guardpage = ptr + sz - pagesz;

Here too. I can fix both while applying the series.


That would be nice, thank you.

Peter



Re: [Qemu-devel] [PATCH v2] block/iscsi: Adding iser support in Libiscsi-QEMU

2016-09-27 Thread Peter Lieven
i, filename);
+iscsi_url = iscsi_parse_full_url(iscsi, uri_string_unescape(filename, -1, 
NULL));
  if (iscsi_url == NULL) {
-error_setg(errp, "Failed to parse URL : %s", filename);
+error_setg(errp, "Failed to parse URL : %s", 
uri_string_unescape(filename, -1, NULL));

uri_string_unescape() returns a newly allocated string.  This is a
memory leak!

will be fixed in v3


Is unescaping a bug fix?  Please put it into a separate patch.

Because libvirt escapes the '?' character as %3F, I needed to unescape
the URI before parsing it.



  ret = -EINVAL;
  goto out;
  }
@@ -1609,7 +1640,13 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
  ret = -ENOMEM;
  goto out;
  }
-
+#if LIBISCSI_API_VERSION >= (20160603)
+if (iscsi_init_transport(iscsi, iscsi_url->transport)) {
+error_setg(errp, ("Error initializing transport."));
+ret = -EINVAL;
+goto out;
+}
+#endif
  if (iscsi_set_targetname(iscsi, iscsi_url->target)) {
  error_setg(errp, "iSCSI: Failed to set target name.");
  ret = -EINVAL;
@@ -2010,6 +2047,40 @@ static BlockDriver bdrv_iscsi = {
  .bdrv_attach_aio_context = iscsi_attach_aio_context,
  };
  +static BlockDriver bdrv_iser = {
+.format_name = "iser",
+.protocol_name   = "iser",
+
+.instance_size   = sizeof(IscsiLun),
+.bdrv_needs_filename = true,
+.bdrv_file_open  = iscsi_open,
+.bdrv_close  = iscsi_close,
+.bdrv_create = iscsi_create,
+.create_opts = &iscsi_create_opts,
+.bdrv_reopen_prepare   = iscsi_reopen_prepare,
+.bdrv_reopen_commit= iscsi_reopen_commit,
+.bdrv_invalidate_cache = iscsi_invalidate_cache,
+
+.bdrv_getlength  = iscsi_getlength,
+.bdrv_get_info   = iscsi_get_info,
+.bdrv_truncate   = iscsi_truncate,
+.bdrv_refresh_limits = iscsi_refresh_limits,
+
+.bdrv_co_get_block_status = iscsi_co_get_block_status,
+.bdrv_co_pdiscard  = iscsi_co_pdiscard,
+.bdrv_co_pwrite_zeroes = iscsi_co_pwrite_zeroes,
+.bdrv_co_readv = iscsi_co_readv,
+.bdrv_co_writev_flags  = iscsi_co_writev_flags,
+.bdrv_co_flush_to_disk = iscsi_co_flush,
+
+#ifdef __linux__
+.bdrv_aio_ioctl   = iscsi_aio_ioctl,
+#endif
+
+.bdrv_detach_aio_context = iscsi_detach_aio_context,
+.bdrv_attach_aio_context = iscsi_attach_aio_context,
+};

The commit description says '?iser' needs to be added to the URI. Why is
it necessary to define a new "iser" protocol block driver?

PaoloB asked to add this option as well. In libiscsi the URI parser checks
for 'iser://' as a different protocol or for '?iser' as a URI option.
I will add this to the commit message.







When you send a v3 please check your patches with scripts/checkpatch.pl for 
style errors.

Thanks,
Peter


--

Kind regards

Peter Lieven

...

  KAMP Netzwerkdienste GmbH
  Vestische Str. 89-91 | 46117 Oberhausen
  Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
  p...@kamp.de | http://www.kamp.de

  Geschäftsführer: Heiner Lante | Michael Lante
  Amtsgericht Duisburg | HRB Nr. 12154
  USt-Id-Nr.: DE 120607556

...




Re: [Qemu-devel] [Qemu-stable] [ANNOUNCE] QEMU 2.6.1 Stable released

2016-09-27 Thread Peter Lieven

On 27.09.2016 at 12:28, Peter Lieven wrote:

On 16.09.2016 at 15:56, Peter Lieven wrote:

On 13.09.2016 at 20:04, Michael Roth wrote:

Quoting Peter Lieven (2016-09-13 10:52:04)

On 13.09.2016 at 17:42, Stefan Hajnoczi <stefa...@redhat.com> wrote:


On Thu, Sep 08, 2016 at 03:58:26PM -0500, Michael Roth wrote:
Quoting Stefan Hajnoczi (2016-09-05 12:54:35)

On Fri, Aug 26, 2016 at 01:45:56PM +0200, Peter Lieven wrote:

On 25.08.2016 at 19:23, Michael Roth wrote:
Quoting Peter Lieven (2016-08-25 01:38:13)

7c509d1 virtio: decrement vq->inuse in virtqueue_discard()
700f26b virtio: recalculate vq->inuse after migration

Looks like these got posted during the freeze :(


The virtio thing is important because live migration is broken without
the fix as  86cc089 is in 2.6.1.

Not sure I understand the relation to 86cc089. Wouldn't the check
introduced there always pass due to target initializing inuse to 0?

Or is the issue that the fix introduced in 86cc089 is only partially
effective due to inuse not being recalculated properly on target? That might
warrant a 2.6.1.1...

This is what Stefan wrote in the cover letter to the series:

"I should mention this is for QEMU 2.7. These fixes are needed if the
CVE-2016-5403 patch has been applied. Without these patches any device that 
holds VirtQueueElements acros
live migration will terminate with a "Virtqueue size exceeded" error message. 
virtio-balloon and virtio-scsi are affected. virtio-bl
probably too but I haven't tested it."

Maybe

The virtio inuse fixes are needed for stable (v2.6.2?) so that the
spurious "Virtqueue size exceeded" on migration is solved.

The error can be reproduced when there is a VirtQueueElement pending
across migration (e.g. virtio-blk s->rq failed request list).

Thanks for clarifying. I'm planning to do a 2.6.2 to capture these, the
patches Peter mentioned, and some other fixes that came during 2.7 RC
phase.

I have an initial staging tree at:

  https://github.com/mdroth/qemu/commits/stable-2.6-staging

There's still a few PULLs in flight with patches I plan to pull in, but
hoping to send out the patch round-up early next week and a release the
following week.

Two more candidates for stable:

4b7f91e virtio: zero vq->inuse in virtio_reset()
104e70c virtio-balloon: discard virtqueue element on reset

They also deal with "Virtqueue size exceeded" errors.

Stefan

There also seems to be a regression (segfault) in the VNC server in 2.6.1, but 
I am still investigating.

Do you have a reproducer? I can try a bisect. Trying to get the initial
staging tree posted today but want to make sure any known regressions are
addressed beforehand.


Hi Michael,

we have not been able to reproduce anymore. My guess is that our client had a 
bug in the new version
and that the regression can only happen in a special connection state. But we 
are still trying
to reproduce.


We are again able to reproduce the VNC error. The vServer dies with:

qemu: qemu_mutex_lock: Invalid argument

We are working on it.


The bug we faced is fixed upstream in:

ui: avoid crash if vnc client disconnects with writes pending

This should definitely go in 2.6.2

Other vnc related patches might also go in:

vnc: make sure we finish disconnect

vnc: ensure connection sharing/limits is always configured

vnc: fix crash when vnc_server_info_get has an error

vnc: don't crash getting server info if lsock is NULL

vnc-enc-tight: fix off-by-one bug

vnc: fix incorrect checking condition when updating client


Unfortunately none of these had qemu-stable in CC.


Peter


Re: [Qemu-devel] [Qemu-stable] [ANNOUNCE] QEMU 2.6.1 Stable released

2016-09-27 Thread Peter Lieven

On 16.09.2016 at 15:56, Peter Lieven wrote:

On 13.09.2016 at 20:04, Michael Roth wrote:

Quoting Peter Lieven (2016-09-13 10:52:04)

On 13.09.2016 at 17:42, Stefan Hajnoczi <stefa...@redhat.com> wrote:


On Thu, Sep 08, 2016 at 03:58:26PM -0500, Michael Roth wrote:
Quoting Stefan Hajnoczi (2016-09-05 12:54:35)

On Fri, Aug 26, 2016 at 01:45:56PM +0200, Peter Lieven wrote:

On 25.08.2016 at 19:23, Michael Roth wrote:
Quoting Peter Lieven (2016-08-25 01:38:13)

7c509d1 virtio: decrement vq->inuse in virtqueue_discard()
700f26b virtio: recalculate vq->inuse after migration

Looks like these got posted during the freeze :(


The virtio thing is important because live migration is broken without
the fix, as 86cc089 is in 2.6.1.

Not sure I understand the relation to 86cc089. Wouldn't the check
introduced there always pass due to target initializing inuse to 0?

Or is the issue that the fix introduced in 86cc089 is only partially
effective due to inuse not being recalculated properly on target? That might
warrant a 2.6.1.1...

This is what Stefan wrote in the cover letter to the series:

"I should mention this is for QEMU 2.7. These fixes are needed if the
CVE-2016-5403 patch has been applied. Without these patches any device that
holds VirtQueueElements across live migration will terminate with a
"Virtqueue size exceeded" error message. virtio-balloon and virtio-scsi are
affected. virtio-blk probably too but I haven't tested it."

Maybe

The virtio inuse fixes are needed for stable (v2.6.2?) so that the
spurious "Virtqueue size exceeded" on migration is solved.

The error can be reproduced when there is a VirtQueueElement pending
across migration (e.g. virtio-blk s->rq failed request list).

Thanks for clarifying. I'm planning to do a 2.6.2 to capture these, the
patches Peter mentioned, and some other fixes that came during 2.7 RC
phase.

I have an initial staging tree at:

  https://github.com/mdroth/qemu/commits/stable-2.6-staging

There's still a few PULLs in flight with patches I plan to pull in, but
hoping to send out the patch round-up early next week and a release the
following week.

Two more candidates for stable:

4b7f91e virtio: zero vq->inuse in virtio_reset()
104e70c virtio-balloon: discard virtqueue element on reset

They also deal with "Virtqueue size exceeded" errors.

Stefan

There also seems to be a regression (segfault) in the VNC server in 2.6.1, but
I am still investigating.

Do you have a reproducer? I can try a bisect. Trying to get the initial
staging tree posted today but want to make sure any known regressions are
addressed beforehand.


Hi Michael,

we have not been able to reproduce it anymore. My guess is that our client had
a bug in the new version and that the regression can only happen in a special
connection state. But we are still trying to reproduce it.


We are again able to reproduce the VNC error. The vServer dies with:

qemu: qemu_mutex_lock: Invalid argument

We are working on it.

Peter


[Qemu-devel] [PATCH V9 2/7] coroutine-sigaltstack: rename coroutine struct appropriately

2016-09-27 Thread Peter Lieven
The name of the sigaltstack coroutine struct was misleading.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
 util/coroutine-sigaltstack.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index a7c3366..171cd44 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -34,7 +34,7 @@ typedef struct {
 Coroutine base;
 void *stack;
 sigjmp_buf env;
-} CoroutineUContext;
+} CoroutineSigAltStack;
 
 /**
  * Per-thread coroutine bookkeeping
@@ -44,7 +44,7 @@ typedef struct {
 Coroutine *current;
 
 /** The default coroutine */
-CoroutineUContext leader;
+CoroutineSigAltStack leader;
 
 /** Information for the signal handler (trampoline) */
 sigjmp_buf tr_reenter;
@@ -89,7 +89,7 @@ static void __attribute__((constructor)) coroutine_init(void)
  * (from the signal handler when it is not signal handling, read ahead
  * for more information).
  */
-static void coroutine_bootstrap(CoroutineUContext *self, Coroutine *co)
+static void coroutine_bootstrap(CoroutineSigAltStack *self, Coroutine *co)
 {
 /* Initialize longjmp environment and switch back the caller */
 if (!sigsetjmp(self->env, 0)) {
@@ -109,7 +109,7 @@ static void coroutine_bootstrap(CoroutineUContext *self, 
Coroutine *co)
  */
 static void coroutine_trampoline(int signal)
 {
-CoroutineUContext *self;
+CoroutineSigAltStack *self;
 Coroutine *co;
 CoroutineThreadState *coTS;
 
@@ -144,7 +144,7 @@ static void coroutine_trampoline(int signal)
 Coroutine *qemu_coroutine_new(void)
 {
 const size_t stack_size = 1 << 20;
-CoroutineUContext *co;
+CoroutineSigAltStack *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
 struct sigaction osa;
@@ -251,7 +251,7 @@ Coroutine *qemu_coroutine_new(void)
 
 void qemu_coroutine_delete(Coroutine *co_)
 {
-CoroutineUContext *co = DO_UPCAST(CoroutineUContext, base, co_);
+CoroutineSigAltStack *co = DO_UPCAST(CoroutineSigAltStack, base, co_);
 
 g_free(co->stack);
 g_free(co);
@@ -260,8 +260,8 @@ void qemu_coroutine_delete(Coroutine *co_)
 CoroutineAction qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
   CoroutineAction action)
 {
-CoroutineUContext *from = DO_UPCAST(CoroutineUContext, base, from_);
-CoroutineUContext *to = DO_UPCAST(CoroutineUContext, base, to_);
+CoroutineSigAltStack *from = DO_UPCAST(CoroutineSigAltStack, base, from_);
+CoroutineSigAltStack *to = DO_UPCAST(CoroutineSigAltStack, base, to_);
 CoroutineThreadState *s = coroutine_get_thread_state();
 int ret;
 
-- 
1.9.1




[Qemu-devel] [PATCH V9 1/7] oslib-posix: add helpers for stack alloc and free

2016-09-27 Thread Peter Lieven
The allocated stack will be adjusted to the minimum stack size supported
by the OS and rounded up to be a multiple of the system page size.
Additionally, an architecture-dependent guard page is added to the stack
to catch stack overflows.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
 include/sysemu/os-posix.h | 27 +++
 util/oslib-posix.c| 42 ++
 2 files changed, 69 insertions(+)

diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index 9c7dfdf..3cfedbc 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -60,4 +60,31 @@ int qemu_utimens(const char *path, const qemu_timespec 
*times);
 
 bool is_daemonized(void);
 
+/**
+ * qemu_alloc_stack:
+ * @sz: pointer to a size_t holding the requested usable stack size
+ *
+ * Allocate memory that can be used as a stack, for instance for
+ * coroutines. If the memory cannot be allocated, this function
+ * will abort (like g_malloc()). This function also inserts an
+ * additional guard page to catch a potential stack overflow.
+ * Note that the memory required for the guard page and alignment
+ * and minimal stack size restrictions will increase the value of sz.
+ *
+ * The allocated stack must be freed with qemu_free_stack().
+ *
+ * Returns: pointer to (the lowest address of) the stack memory.
+ */
+void *qemu_alloc_stack(size_t *sz);
+
+/**
+ * qemu_free_stack:
+ * @stack: stack to free
+ * @sz: size of stack in bytes
+ *
+ * Free a stack allocated via qemu_alloc_stack(). Note that sz must
+ * be exactly the adjusted stack size returned by qemu_alloc_stack.
+ */
+void qemu_free_stack(void *stack, size_t sz);
+
 #endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index f2d4e9e..5745229 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -499,3 +499,45 @@ pid_t qemu_fork(Error **errp)
 }
 return pid;
 }
+
+void *qemu_alloc_stack(size_t *sz)
+{
+void *ptr, *guardpage;
+size_t pagesz = getpagesize();
+#ifdef _SC_THREAD_STACK_MIN
+/* avoid stacks smaller than _SC_THREAD_STACK_MIN */
+long min_stack_sz = sysconf(_SC_THREAD_STACK_MIN);
+*sz = MAX(MAX(min_stack_sz, 0), *sz);
+#endif
+/* adjust stack size to a multiple of the page size */
+*sz = ROUND_UP(*sz, pagesz);
+/* allocate one extra page for the guard page */
+*sz += pagesz;
+
+ptr = mmap(NULL, *sz, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+if (ptr == MAP_FAILED) {
+abort();
+}
+
+#if defined(HOST_IA64)
+/* separate register stack */
+guardpage = ptr + (((*sz - pagesz) / 2) & ~pagesz);
+#elif defined(HOST_HPPA)
+/* stack grows up */
+guardpage = ptr + *sz - pagesz;
+#else
+/* stack grows down */
+guardpage = ptr;
+#endif
+if (mprotect(guardpage, pagesz, PROT_NONE) != 0) {
+abort();
+}
+
+return ptr;
+}
+
+void qemu_free_stack(void *stack, size_t sz)
+{
+munmap(stack, sz);
+}
-- 
1.9.1




[Qemu-devel] [PATCH V9 6/7] oslib-posix: add a configure switch to debug stack usage

2016-09-27 Thread Peter Lieven
This adds a knob to track the maximum stack usage of stacks
created by qemu_alloc_stack().

Signed-off-by: Peter Lieven <p...@kamp.de>
---
 configure  | 19 +++
 util/oslib-posix.c | 35 +++
 2 files changed, 54 insertions(+)

diff --git a/configure b/configure
index 8fa62ad..93ef00a 100755
--- a/configure
+++ b/configure
@@ -296,6 +296,7 @@ libiscsi=""
 libnfs=""
 coroutine=""
 coroutine_pool=""
+debug_stack_usage="no"
 seccomp=""
 glusterfs=""
 glusterfs_xlator_opt="no"
@@ -1004,6 +1005,8 @@ for opt do
   ;;
   --enable-coroutine-pool) coroutine_pool="yes"
   ;;
+  --enable-debug-stack-usage) debug_stack_usage="yes"
+  ;;
   --disable-docs) docs="no"
   ;;
   --enable-docs) docs="yes"
@@ -4276,6 +4279,17 @@ if test "$coroutine" = "gthread" -a "$coroutine_pool" = 
"yes"; then
   error_exit "'gthread' coroutine backend does not support pool (use 
--disable-coroutine-pool)"
 fi
 
+if test "$debug_stack_usage" = "yes"; then
+  if test "$cpu" = "ia64" -o "$cpu" = "hppa"; then
+error_exit "stack usage debugging is not supported for $cpu"
+  fi
+  if test "$coroutine_pool" = "yes"; then
+echo "WARN: disabling coroutine pool for stack usage debugging"
+coroutine_pool=no
+  fi
+fi
+
+
 ##
 # check if we have open_by_handle_at
 
@@ -4861,6 +4875,7 @@ echo "QGA MSI support   $guest_agent_msi"
 echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine"
 echo "coroutine pool$coroutine_pool"
+echo "debug stack usage $debug_stack_usage"
 echo "GlusterFS support $glusterfs"
 echo "Archipelago support $archipelago"
 echo "gcov  $gcov_tool"
@@ -5330,6 +5345,10 @@ else
   echo "CONFIG_COROUTINE_POOL=0" >> $config_host_mak
 fi
 
+if test "$debug_stack_usage" = "yes" ; then
+  echo "CONFIG_DEBUG_STACK_USAGE=y" >> $config_host_mak
+fi
+
 if test "$open_by_handle_at" = "yes" ; then
   echo "CONFIG_OPEN_BY_HANDLE=y" >> $config_host_mak
 fi
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 5745229..21a5637 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -50,6 +50,10 @@
 
 #include "qemu/mmap-alloc.h"
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+#include "qemu/error-report.h"
+#endif
+
 int qemu_get_thread_id(void)
 {
 #if defined(__linux__)
@@ -503,6 +507,9 @@ pid_t qemu_fork(Error **errp)
 void *qemu_alloc_stack(size_t *sz)
 {
 void *ptr, *guardpage;
+#ifdef CONFIG_DEBUG_STACK_USAGE
+void *ptr2;
+#endif
 size_t pagesz = getpagesize();
 #ifdef _SC_THREAD_STACK_MIN
 /* avoid stacks smaller than _SC_THREAD_STACK_MIN */
@@ -534,10 +541,38 @@ void *qemu_alloc_stack(size_t *sz)
 abort();
 }
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+for (ptr2 = ptr + pagesz; ptr2 < ptr + *sz; ptr2 += sizeof(uint32_t)) {
+*(uint32_t *)ptr2 = 0xdeadbeaf;
+}
+#endif
+
 return ptr;
 }
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+static __thread unsigned int max_stack_usage;
+#endif
+
 void qemu_free_stack(void *stack, size_t sz)
 {
+#ifdef CONFIG_DEBUG_STACK_USAGE
+unsigned int usage;
+void *ptr;
+
+for (ptr = stack + getpagesize(); ptr < stack + sz;
+ ptr += sizeof(uint32_t)) {
+if (*(uint32_t *)ptr != 0xdeadbeaf) {
+break;
+}
+}
+usage = sz - (uintptr_t) (ptr - stack);
+if (usage > max_stack_usage) {
+error_report("thread %d max stack usage increased from %u to %u",
+ qemu_get_thread_id(), max_stack_usage, usage);
+max_stack_usage = usage;
+}
+#endif
+
 munmap(stack, sz);
 }
-- 
1.9.1
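The watermark technique this patch uses can be demonstrated in isolation. Below is a hedged sketch (helper names are invented here, not QEMU API): fill the usable stack pages with a known pattern at allocation time, then at free time scan from the low end until the first overwritten word; on a downward-growing stack, everything above that point was touched:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WATERMARK 0xdeadbeafu  /* same pattern as the patch */

/* Fill a buffer with the watermark, as the patch does for the usable
 * stack pages when CONFIG_DEBUG_STACK_USAGE is enabled. */
static void watermark_fill(uint32_t *buf, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        buf[i] = WATERMARK;
    }
}

/* Scan upward from the low end (the direction the patch scans a
 * downward-growing stack) and return how many bytes at the top were
 * overwritten, i.e. the deepest observed stack usage. */
static size_t watermark_usage(const uint32_t *buf, size_t words)
{
    size_t i;
    for (i = 0; i < words; i++) {
        if (buf[i] != WATERMARK) {
            break;
        }
    }
    return (words - i) * sizeof(uint32_t);
}
```

This is cheap (one pass at free time, no instrumentation while the coroutine runs), which is why the configure switch only has to disable the coroutine pool, where stacks would be reused instead of freed.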




[Qemu-devel] [PATCH V9 5/7] coroutine-sigaltstack: use helper for allocating stack memory

2016-09-27 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
---
 util/coroutine-sigaltstack.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index a5bcb7e..f6fc49a 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -33,6 +33,7 @@
 typedef struct {
 Coroutine base;
 void *stack;
+size_t stack_size;
 sigjmp_buf env;
 } CoroutineSigAltStack;
 
@@ -143,7 +144,6 @@ static void coroutine_trampoline(int signal)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineSigAltStack *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
@@ -164,7 +164,8 @@ Coroutine *qemu_coroutine_new(void)
  */
 
 co = g_malloc0(sizeof(*co));
-co->stack = g_malloc(stack_size);
+co->stack_size = COROUTINE_STACK_SIZE;
+co->stack = qemu_alloc_stack(&co->stack_size);
co->base.entry_arg = &old_env; /* stash away our jmp_buf */
 
 coTS = coroutine_get_thread_state();
@@ -189,7 +190,7 @@ Coroutine *qemu_coroutine_new(void)
  * Set the new stack.
  */
 ss.ss_sp = co->stack;
-ss.ss_size = stack_size;
+ss.ss_size = co->stack_size;
 ss.ss_flags = 0;
if (sigaltstack(&ss, &oss) < 0) {
 abort();
@@ -253,7 +254,7 @@ void qemu_coroutine_delete(Coroutine *co_)
 {
 CoroutineSigAltStack *co = DO_UPCAST(CoroutineSigAltStack, base, co_);
 
-g_free(co->stack);
+qemu_free_stack(co->stack, co->stack_size);
 g_free(co);
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH V9 4/7] coroutine-ucontext: use helper for allocating stack memory

2016-09-27 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
---
 util/coroutine-ucontext.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 31254ab..6621f3f 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -34,6 +34,7 @@
 typedef struct {
 Coroutine base;
 void *stack;
+size_t stack_size;
 sigjmp_buf env;
 
 #ifdef CONFIG_VALGRIND_H
@@ -82,7 +83,6 @@ static void coroutine_trampoline(int i0, int i1)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 ucontext_t old_uc, uc;
 sigjmp_buf old_env;
@@ -101,17 +101,18 @@ Coroutine *qemu_coroutine_new(void)
 }
 
 co = g_malloc0(sizeof(*co));
-co->stack = g_malloc(stack_size);
+co->stack_size = COROUTINE_STACK_SIZE;
+co->stack = qemu_alloc_stack(&co->stack_size);
co->base.entry_arg = &old_env; /* stash away our jmp_buf */

uc.uc_link = &old_uc;
 uc.uc_stack.ss_sp = co->stack;
-uc.uc_stack.ss_size = stack_size;
+uc.uc_stack.ss_size = co->stack_size;
 uc.uc_stack.ss_flags = 0;
 
 #ifdef CONFIG_VALGRIND_H
 co->valgrind_stack_id =
-VALGRIND_STACK_REGISTER(co->stack, co->stack + stack_size);
+VALGRIND_STACK_REGISTER(co->stack, co->stack + co->stack_size);
 #endif
 
 arg.p = co;
@@ -149,7 +150,7 @@ void qemu_coroutine_delete(Coroutine *co_)
 valgrind_stack_deregister(co);
 #endif
 
-g_free(co->stack);
+qemu_free_stack(co->stack, co->stack_size);
 g_free(co);
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH V9 0/7] coroutine: mmap stack memory and stack size

2016-09-27 Thread Peter Lieven
I decided to split this from the rest of the QEMU RSS usage series as
it contains the more or less non-contentious patches.

I omitted the MAP_GROWSDOWN flag in mmap as we are not 100% sure which
side effects it has.

I kept the guard page, which now nicely makes the stacks visible in
smaps. The old version of the relevant patch lacked the MAP_FIXED flag
in the second call to mmap.

v8->v9:
 - Patch 1: return size of stack including guard page [Kevin]
 - Patch 2: added to rename the coroutine sigaltstack struct [Kevin]

v7->v8:
 The series failed on platforms with 64kB page size. Thus the following
 changes were made:
 - Patch 1: add the guard page to the stack memory and do not deduct it
   [Kevin, Stefan]
 - Patch 1: submit the requested stack size as a pointer so that
   qemu_alloc_stack can adjust the size according to system requirements
   and so that the full size is usable to the caller.
 - Patch 6: reduced the stack size to 60kB so that on systems with 4kB
   page size we still get 64kB allocations.

v6->v7:
 - Patch 1: avoid multiple calls to sysconf and getpagesize [Richard]

v5->v6:
 - Patch 1: added info that the guard page is deducted from stack memory to
commit msg and headers [Stefan]
 - rebased to master

v4->v5:
 - Patch 1: check if _SC_THREAD_STACK_MIN is defined
 - Patch 1: guard against sysconf(_SC_THREAD_STACK_MIN) returning -1 [Eric]

v3->v4:
 - Patch 1: add a static function to adjust the stack size [Richard]
 - Patch 1: round up the stack size to multiple of the pagesize.

v2->v3:
 - Patch 1,6: adjusted commit message to mention the guard page [Markus]

v1->v2:
 - Patch 1: added an architecture dependend guard page [Richard]
 - Patch 1: avoid stacks smaller than _SC_THREAD_STACK_MIN [Richard]
 - Patch 1: use mmap+mprotect instead of mmap+mmap [Richard]
 - Patch 5: u_int32_t -> uint32_t [Richard]
 - Patch 5: only available if stack grows down

Peter Lieven (7):
  oslib-posix: add helpers for stack alloc and free
  coroutine-sigaltstack: rename coroutine struct appropriately
  coroutine: add a macro for the coroutine stack size
  coroutine-ucontext: use helper for allocating stack memory
  coroutine-sigaltstack: use helper for allocating stack memory
  oslib-posix: add a configure switch to debug stack usage
  coroutine: reduce stack size to 60kB

 configure| 19 +++
 include/qemu/coroutine_int.h |  2 ++
 include/sysemu/os-posix.h| 27 
 util/coroutine-sigaltstack.c | 25 +++---
 util/coroutine-ucontext.c| 11 ---
 util/coroutine-win32.c   |  2 +-
 util/oslib-posix.c   | 77 
 7 files changed, 145 insertions(+), 18 deletions(-)

-- 
1.9.1




[Qemu-devel] [PATCH V9 7/7] coroutine: reduce stack size to 60kB

2016-09-27 Thread Peter Lieven
Evaluation with the recently introduced maximum stack usage monitoring revealed
that the actual used stack size was never above 4kB, so allocating a 1MB stack
for each coroutine is a lot of wasted memory. Reduce the stack size to
60kB, which should still give enough head room. The guard page added
in qemu_alloc_stack will catch a potential stack overflow introduced
by this commit. The 60kB plus guard page result in an allocation of
64kB per coroutine on systems where a page is 4kB.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
 include/qemu/coroutine_int.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index 14d4f1d..be14260 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -28,7 +28,7 @@
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"
 
-#define COROUTINE_STACK_SIZE (1 << 20)
+#define COROUTINE_STACK_SIZE 61440
 
 typedef enum {
 COROUTINE_YIELD = 1,
-- 
1.9.1
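The sizing claim in the commit message is easy to sanity-check. A tiny sketch (assuming a 4kB page size; not QEMU code, the helper name is made up here):

```c
#include <assert.h>

/* COROUTINE_STACK_SIZE as set by this patch, plus the one guard page
 * that qemu_alloc_stack() adds (assumed 4kB page size). */
enum { COROUTINE_STACK_SIZE = 61440, PAGE_SIZE = 4096 };

/* Total mapping per coroutine = usable stack + guard page. */
static int coroutine_alloc_size(void)
{
    return COROUTINE_STACK_SIZE + PAGE_SIZE;
}
```

So 61440 bytes is exactly 15 pages of usable stack, and the guard page brings each coroutine's mapping to a round 64kB.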




[Qemu-devel] [PATCH V9 3/7] coroutine: add a macro for the coroutine stack size

2016-09-27 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
Reviewed-by: Richard Henderson <r...@twiddle.net>
---
 include/qemu/coroutine_int.h | 2 ++
 util/coroutine-sigaltstack.c | 2 +-
 util/coroutine-ucontext.c| 2 +-
 util/coroutine-win32.c   | 2 +-
 4 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index 6df9d33..14d4f1d 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -28,6 +28,8 @@
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"
 
+#define COROUTINE_STACK_SIZE (1 << 20)
+
 typedef enum {
 COROUTINE_YIELD = 1,
 COROUTINE_TERMINATE = 2,
diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index 171cd44..a5bcb7e 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -143,7 +143,7 @@ static void coroutine_trampoline(int signal)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineSigAltStack *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 2bb7e10..31254ab 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -82,7 +82,7 @@ static void coroutine_trampoline(int i0, int i1)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 ucontext_t old_uc, uc;
 sigjmp_buf old_env;
diff --git a/util/coroutine-win32.c b/util/coroutine-win32.c
index 02e28e8..de6bd4f 100644
--- a/util/coroutine-win32.c
+++ b/util/coroutine-win32.c
@@ -71,7 +71,7 @@ static void CALLBACK coroutine_trampoline(void *co_)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineWin32 *co;
 
 co = g_malloc0(sizeof(*co));
-- 
1.9.1




Re: [Qemu-devel] [PATCH V8 1/6] oslib-posix: add helpers for stack alloc and free

2016-09-26 Thread Peter Lieven

On 26.09.2016 at 15:44, Kevin Wolf wrote:

On 26.09.2016 at 13:44, Peter Lieven wrote:

The allocated stack will be adjusted to the minimum stack size supported
by the OS and rounded up to be a multiple of the system page size.
Additionally, an architecture-dependent guard page is added to the stack
to catch stack overflows.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
  include/sysemu/os-posix.h | 27 +++
  util/oslib-posix.c| 43 +++
  2 files changed, 70 insertions(+)

diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index 9c7dfdf..4a0f493 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -60,4 +60,31 @@ int qemu_utimens(const char *path, const qemu_timespec 
*times);
  
  bool is_daemonized(void);
  
+/**

+ * qemu_alloc_stack:
+ * @sz: pointer to a size_t holding the requested stack size
+ *
+ * Allocate memory that can be used as a stack, for instance for
+ * coroutines. If the memory cannot be allocated, this function
+ * will abort (like g_malloc()). This function also inserts an
+ * additional guard page to catch a potential stack overflow.
+ * Note that the usable stack memory can be greater than the
+ * requested stack size due to alignment and minimal stack size
+ * restrictions. In this case the value of sz is adjusted.
+ *
+ * The allocated stack must be freed with qemu_free_stack().
+ *
+ * Returns: pointer to (the lowest address of) the stack memory.

Not quite. It's the pointer to the lowest address of the guard page,
while the returned stack size doesn't include the guard page. This is an
awkward interface, and consequently patch 3 fails to use it correctly.

So you end up with something like:

 +---+---+---+---+
 | G | . | . | . |
 +---+---+---+---+
 |***********|

 G = guard page
 . = allocated stack page
 * = stack as used for makecontext()

That is, the guard page is included in the stack used to create the
coroutine context, and the last page stays unused. On systems where we
only allocate a single page for the stack, this obviously means that the
tests still fail.


You are right. So I should adjust the size to allocsz instead?

The other option would be to keep version 7 of this series and adjust
COROUTINE_STACK_SIZE to MAX(2 * pagesize(), 1 << 16) to avoid the
problem.

Peter



[Qemu-devel] [PATCH V8 5/6] oslib-posix: add a configure switch to debug stack usage

2016-09-26 Thread Peter Lieven
This adds a knob to track the maximum stack usage of stacks
created by qemu_alloc_stack().

Signed-off-by: Peter Lieven <p...@kamp.de>
---
 configure  | 19 +++
 util/oslib-posix.c | 40 +++-
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 8fa62ad..93ef00a 100755
--- a/configure
+++ b/configure
@@ -296,6 +296,7 @@ libiscsi=""
 libnfs=""
 coroutine=""
 coroutine_pool=""
+debug_stack_usage="no"
 seccomp=""
 glusterfs=""
 glusterfs_xlator_opt="no"
@@ -1004,6 +1005,8 @@ for opt do
   ;;
   --enable-coroutine-pool) coroutine_pool="yes"
   ;;
+  --enable-debug-stack-usage) debug_stack_usage="yes"
+  ;;
   --disable-docs) docs="no"
   ;;
   --enable-docs) docs="yes"
@@ -4276,6 +4279,17 @@ if test "$coroutine" = "gthread" -a "$coroutine_pool" = 
"yes"; then
   error_exit "'gthread' coroutine backend does not support pool (use 
--disable-coroutine-pool)"
 fi
 
+if test "$debug_stack_usage" = "yes"; then
+  if test "$cpu" = "ia64" -o "$cpu" = "hppa"; then
+error_exit "stack usage debugging is not supported for $cpu"
+  fi
+  if test "$coroutine_pool" = "yes"; then
+echo "WARN: disabling coroutine pool for stack usage debugging"
+coroutine_pool=no
+  fi
+fi
+
+
 ##
 # check if we have open_by_handle_at
 
@@ -4861,6 +4875,7 @@ echo "QGA MSI support   $guest_agent_msi"
 echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine"
 echo "coroutine pool$coroutine_pool"
+echo "debug stack usage $debug_stack_usage"
 echo "GlusterFS support $glusterfs"
 echo "Archipelago support $archipelago"
 echo "gcov  $gcov_tool"
@@ -5330,6 +5345,10 @@ else
   echo "CONFIG_COROUTINE_POOL=0" >> $config_host_mak
 fi
 
+if test "$debug_stack_usage" = "yes" ; then
+  echo "CONFIG_DEBUG_STACK_USAGE=y" >> $config_host_mak
+fi
+
 if test "$open_by_handle_at" = "yes" ; then
   echo "CONFIG_OPEN_BY_HANDLE=y" >> $config_host_mak
 fi
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 7d053b8..18940d9 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -50,6 +50,10 @@
 
 #include "qemu/mmap-alloc.h"
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+#include "qemu/error-report.h"
+#endif
+
 int qemu_get_thread_id(void)
 {
 #if defined(__linux__)
@@ -503,6 +507,9 @@ pid_t qemu_fork(Error **errp)
 void *qemu_alloc_stack(size_t *sz)
 {
 void *ptr, *guardpage;
+#ifdef CONFIG_DEBUG_STACK_USAGE
+void *ptr2;
+#endif
 size_t pagesz = getpagesize();
 size_t allocsz;
 #ifdef _SC_THREAD_STACK_MIN
@@ -535,10 +542,41 @@ void *qemu_alloc_stack(size_t *sz)
 abort();
 }
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+for (ptr2 = ptr + pagesz; ptr2 < ptr + allocsz; ptr2 += sizeof(uint32_t)) {
+*(uint32_t *)ptr2 = 0xdeadbeaf;
+}
+#endif
+
 return ptr;
 }
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+static __thread unsigned int max_stack_usage;
+#endif
+
 void qemu_free_stack(void *stack, size_t sz)
 {
-munmap(stack, sz + getpagesize());
+#ifdef CONFIG_DEBUG_STACK_USAGE
+unsigned int usage;
+void *ptr;
+#endif
+size_t pagesz = getpagesize();
+size_t allocsz = sz + pagesz;
+
+#ifdef CONFIG_DEBUG_STACK_USAGE
+for (ptr = stack + pagesz; ptr < stack + allocsz; ptr += sizeof(uint32_t)) 
{
+if (*(uint32_t *)ptr != 0xdeadbeaf) {
+break;
+}
+}
+usage = sz - (uintptr_t) (ptr - stack);
+if (usage > max_stack_usage) {
+error_report("thread %d max stack usage increased from %u to %u",
+ qemu_get_thread_id(), max_stack_usage, usage);
+max_stack_usage = usage;
+}
+#endif
+
+munmap(stack, allocsz);
 }
-- 
1.9.1




[Qemu-devel] [PATCH V8 1/6] oslib-posix: add helpers for stack alloc and free

2016-09-26 Thread Peter Lieven
The allocated stack will be adjusted to the minimum stack size supported
by the OS and rounded up to be a multiple of the system page size.
Additionally, an architecture-dependent guard page is added to the stack
to catch stack overflows.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
 include/sysemu/os-posix.h | 27 +++
 util/oslib-posix.c| 43 +++
 2 files changed, 70 insertions(+)

diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index 9c7dfdf..4a0f493 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -60,4 +60,31 @@ int qemu_utimens(const char *path, const qemu_timespec 
*times);
 
 bool is_daemonized(void);
 
+/**
+ * qemu_alloc_stack:
+ * @sz: pointer to a size_t holding the requested stack size
+ *
+ * Allocate memory that can be used as a stack, for instance for
+ * coroutines. If the memory cannot be allocated, this function
+ * will abort (like g_malloc()). This function also inserts an
+ * additional guard page to catch a potential stack overflow.
+ * Note that the usable stack memory can be greater than the
+ * requested stack size due to alignment and minimal stack size
+ * restrictions. In this case the value of sz is adjusted.
+ *
+ * The allocated stack must be freed with qemu_free_stack().
+ *
+ * Returns: pointer to (the lowest address of) the stack memory.
+ */
+void *qemu_alloc_stack(size_t *sz);
+
+/**
+ * qemu_free_stack:
+ * @stack: stack to free
+ * @sz: size of stack in bytes
+ *
+ * Free a stack allocated via qemu_alloc_stack().
+ */
+void qemu_free_stack(void *stack, size_t sz);
+
 #endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index f2d4e9e..7d053b8 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -499,3 +499,46 @@ pid_t qemu_fork(Error **errp)
 }
 return pid;
 }
+
+void *qemu_alloc_stack(size_t *sz)
+{
+void *ptr, *guardpage;
+size_t pagesz = getpagesize();
+size_t allocsz;
+#ifdef _SC_THREAD_STACK_MIN
+/* avoid stacks smaller than _SC_THREAD_STACK_MIN */
+long min_stack_sz = sysconf(_SC_THREAD_STACK_MIN);
+*sz = MAX(MAX(min_stack_sz, 0), *sz);
+#endif
+/* adjust stack size to a multiple of the page size */
+*sz = ROUND_UP(*sz, pagesz);
+/* allocate one extra page for the guard page */
+allocsz = *sz + getpagesize();
+
+ptr = mmap(NULL, allocsz, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+if (ptr == MAP_FAILED) {
+abort();
+}
+
+#if defined(HOST_IA64)
+/* separate register stack */
+guardpage = ptr + (((allocsz - pagesz) / 2) & ~pagesz);
+#elif defined(HOST_HPPA)
+/* stack grows up */
+guardpage = ptr + allocsz - pagesz;
+#else
+/* stack grows down */
+guardpage = ptr;
+#endif
+if (mprotect(guardpage, pagesz, PROT_NONE) != 0) {
+abort();
+}
+
+return ptr;
+}
+
+void qemu_free_stack(void *stack, size_t sz)
+{
+munmap(stack, sz + getpagesize());
+}
-- 
1.9.1




[Qemu-devel] [PATCH V8 2/6] coroutine: add a macro for the coroutine stack size

2016-09-26 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
Reviewed-by: Richard Henderson <r...@twiddle.net>
---
 include/qemu/coroutine_int.h | 2 ++
 util/coroutine-sigaltstack.c | 2 +-
 util/coroutine-ucontext.c| 2 +-
 util/coroutine-win32.c   | 2 +-
 4 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index 6df9d33..14d4f1d 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -28,6 +28,8 @@
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"
 
+#define COROUTINE_STACK_SIZE (1 << 20)
+
 typedef enum {
 COROUTINE_YIELD = 1,
 COROUTINE_TERMINATE = 2,
diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index a7c3366..9c2854c 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -143,7 +143,7 @@ static void coroutine_trampoline(int signal)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 2bb7e10..31254ab 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -82,7 +82,7 @@ static void coroutine_trampoline(int i0, int i1)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 ucontext_t old_uc, uc;
 sigjmp_buf old_env;
diff --git a/util/coroutine-win32.c b/util/coroutine-win32.c
index 02e28e8..de6bd4f 100644
--- a/util/coroutine-win32.c
+++ b/util/coroutine-win32.c
@@ -71,7 +71,7 @@ static void CALLBACK coroutine_trampoline(void *co_)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineWin32 *co;
 
 co = g_malloc0(sizeof(*co));
-- 
1.9.1




[Qemu-devel] [PATCH V8 3/6] coroutine-ucontext: use helper for allocating stack memory

2016-09-26 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
---
 util/coroutine-ucontext.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 31254ab..6621f3f 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -34,6 +34,7 @@
 typedef struct {
 Coroutine base;
 void *stack;
+size_t stack_size;
 sigjmp_buf env;
 
 #ifdef CONFIG_VALGRIND_H
@@ -82,7 +83,6 @@ static void coroutine_trampoline(int i0, int i1)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 ucontext_t old_uc, uc;
 sigjmp_buf old_env;
@@ -101,17 +101,18 @@ Coroutine *qemu_coroutine_new(void)
 }
 
 co = g_malloc0(sizeof(*co));
-co->stack = g_malloc(stack_size);
+co->stack_size = COROUTINE_STACK_SIZE;
+co->stack = qemu_alloc_stack(&co->stack_size);
 co->base.entry_arg = &old_env; /* stash away our jmp_buf */
 
 uc.uc_link = &old_uc;
 uc.uc_stack.ss_sp = co->stack;
-uc.uc_stack.ss_size = stack_size;
+uc.uc_stack.ss_size = co->stack_size;
 uc.uc_stack.ss_flags = 0;
 
 #ifdef CONFIG_VALGRIND_H
 co->valgrind_stack_id =
-VALGRIND_STACK_REGISTER(co->stack, co->stack + stack_size);
+VALGRIND_STACK_REGISTER(co->stack, co->stack + co->stack_size);
 #endif
 
 arg.p = co;
@@ -149,7 +150,7 @@ void qemu_coroutine_delete(Coroutine *co_)
 valgrind_stack_deregister(co);
 #endif
 
-g_free(co->stack);
+qemu_free_stack(co->stack, co->stack_size);
 g_free(co);
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH V8 4/6] coroutine-sigaltstack: use helper for allocating stack memory

2016-09-26 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
---
 util/coroutine-sigaltstack.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index 9c2854c..d9c7f66 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -33,6 +33,7 @@
 typedef struct {
 Coroutine base;
 void *stack;
+size_t stack_size;
 sigjmp_buf env;
 } CoroutineUContext;
 
@@ -143,7 +144,6 @@ static void coroutine_trampoline(int signal)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
@@ -164,7 +164,8 @@ Coroutine *qemu_coroutine_new(void)
  */
 
 co = g_malloc0(sizeof(*co));
-co->stack = g_malloc(stack_size);
+co->stack_size = COROUTINE_STACK_SIZE;
+co->stack = qemu_alloc_stack(&co->stack_size);
 co->base.entry_arg = &old_env; /* stash away our jmp_buf */
 
 coTS = coroutine_get_thread_state();
@@ -189,7 +190,7 @@ Coroutine *qemu_coroutine_new(void)
  * Set the new stack.
  */
 ss.ss_sp = co->stack;
-ss.ss_size = stack_size;
+ss.ss_size = co->stack_size;
 ss.ss_flags = 0;
 if (sigaltstack(&ss, &oss) < 0) {
 abort();
@@ -253,7 +254,7 @@ void qemu_coroutine_delete(Coroutine *co_)
 {
 CoroutineUContext *co = DO_UPCAST(CoroutineUContext, base, co_);
 
-g_free(co->stack);
+qemu_free_stack(co->stack, co->stack_size);
 g_free(co);
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH V8 0/6] coroutine: mmap stack memory and stack size

2016-09-26 Thread Peter Lieven
I decided to split this from the rest of the Qemu RSS usage series as
it contains the more or less non-contentious patches.

I omitted the MAP_GROWSDOWN flag in mmap as we are not 100% sure which
side effects it has.

I kept the guard page, which now nicely makes the stacks visible in
smaps. The old version of the relevant patch lacked the MAP_FIXED flag
in the second call to mmap.

v7->v8:
 The series failed on platforms with 64kB page size. Thus the following changes
 were made:
 - Patch 1: add the guard page to the stack memory and do not deduct it
   [Kevin, Stephan]
 - Patch 1: submit the requested stack size as a pointer so that
   qemu_alloc_stack can adjust the size according to system requirements
   and the full size is usable to the caller.
 - Patch 6: reduce the stack size to 60kB so that on systems with 4kB page
   size we still get 64kB allocations.

v6->v7:
 - Patch 1: avoid multiple calls to sysconf and getpagesize [Richard]

v5->v6:
 - Patch 1: added info that the guard page is deducted from stack memory to
commit msg and headers [Stefan]
 - rebased to master

v4->v5:
 - Patch 1: check if _SC_THREAD_STACK_MIN is defined
 - Patch 1: guard against sysconf(_SC_THREAD_STACK_MIN) returning -1 [Eric]

v3->v4:
 - Patch 1: add a static function to adjust the stack size [Richard]
 - Patch 1: round up the stack size to multiple of the pagesize.

v2->v3:
 - Patch 1,6: adjusted commit message to mention the guard page [Markus]

v1->v2:
 - Patch 1: added an architecture dependend guard page [Richard]
 - Patch 1: avoid stacks smaller than _SC_THREAD_STACK_MIN [Richard]
 - Patch 1: use mmap+mprotect instead of mmap+mmap [Richard]
 - Patch 5: u_int32_t -> uint32_t [Richard]
 - Patch 5: only available if stack grows down

Peter Lieven (6):
  oslib-posix: add helpers for stack alloc and free
  coroutine: add a macro for the coroutine stack size
  coroutine-ucontext: use helper for allocating stack memory
  coroutine-sigaltstack: use helper for allocating stack memory
  oslib-posix: add a configure switch to debug stack usage
  coroutine: reduce stack size to 60kB

 configure| 19 +++
 include/qemu/coroutine_int.h |  2 ++
 include/sysemu/os-posix.h| 27 +++
 util/coroutine-sigaltstack.c |  9 ++---
 util/coroutine-ucontext.c| 11 +++---
 util/coroutine-win32.c   |  2 +-
 util/oslib-posix.c   | 81 
 7 files changed, 141 insertions(+), 10 deletions(-)

-- 
1.9.1




[Qemu-devel] [PATCH V8 6/6] coroutine: reduce stack size to 60kB

2016-09-26 Thread Peter Lieven
Evaluation with the recently introduced maximum stack usage monitoring revealed
that the actually used stack size was never above 4kB, so allocating a 1MB stack
for each coroutine is a lot of wasted memory. So reduce the stack size to
60kB, which should still give enough headroom. The guard page added
in qemu_alloc_stack will catch a potential stack overflow introduced
by this commit. The 60kB plus guard page will result in an allocation of
64kB per coroutine on systems where a page is 4kB.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
 include/qemu/coroutine_int.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index 14d4f1d..be14260 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -28,7 +28,7 @@
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"
 
-#define COROUTINE_STACK_SIZE (1 << 20)
+#define COROUTINE_STACK_SIZE 61440
 
 typedef enum {
 COROUTINE_YIELD = 1,
-- 
1.9.1




Re: [Qemu-devel] [Qemu-stable] [ANNOUNCE] QEMU 2.6.1 Stable released

2016-09-16 Thread Peter Lieven

Am 13.09.2016 um 20:04 schrieb Michael Roth:

Quoting Peter Lieven (2016-09-13 10:52:04)



Am 13.09.2016 um 17:42 schrieb Stefan Hajnoczi <stefa...@redhat.com>:


On Thu, Sep 08, 2016 at 03:58:26PM -0500, Michael Roth wrote:
Quoting Stefan Hajnoczi (2016-09-05 12:54:35)

On Fri, Aug 26, 2016 at 01:45:56PM +0200, Peter Lieven wrote:

Am 25.08.2016 um 19:23 schrieb Michael Roth:
Quoting Peter Lieven (2016-08-25 01:38:13)

7c509d1 virtio: decrement vq->inuse in virtqueue_discard()
700f26b virtio: recalculate vq->inuse after migration

Looks like these got posted during the freeze :(


The virtio thing is important because live migration is broken without
the fix as  86cc089 is in 2.6.1.

Not sure I understand the relation to 86cc089. Wouldn't the check
introduced there always pass due to target initializing inuse to 0?

Or is the issue that the fix introduced in 86cc089 is only partially
effective due to inuse not being recalculated properly on target? That might
warrant a 2.6.1.1...

This is what Stefan wrote in the cover letter to the series:

"I should mention this is for QEMU 2.7. These fixes are needed if the
CVE-2016-5403 patch has been applied. Without these patches any device that 
holds VirtQueueElements acros
live migration will terminate with a "Virtqueue size exceeded" error message. 
virtio-balloon and virtio-scsi are affected. virtio-bl
probably too but I haven't tested it."

Maybe

The virtio inuse fixes are needed for stable (v2.6.2?) so that the
spurious "Virtqueue size exceeded" on migration is solved.

The error can be reproduced when there is a VirtQueueElement pending
across migration (e.g. virtio-blk s->rq failed request list).

Thanks for clarifying. I'm planning to do a 2.6.2 to capture these, the
patches Peter mentioned, and some other fixes that came during 2.7 RC
phase.

I have an initial staging tree at:

  https://github.com/mdroth/qemu/commits/stable-2.6-staging

There's still a few PULLs in flight with patches I plan to pull in, but
hoping to send out the patch round-up early next week and a release the
following week.

Two more candidates for stable:

4b7f91e virtio: zero vq->inuse in virtio_reset()
104e70c virtio-balloon: discard virtqueue element on reset

They also deal with "Virtqueue size exceeded" errors.

Stefan

There also seems to be a regression (segfault) in the VNC server in 2.6.1, but
I am still investigating.

Do you have a reproducer? I can try a bisect. Trying to get the initial
staging tree posted today but want to make sure any known regressions are
addressed beforehand.


Hi Michael,

we have not been able to reproduce it anymore. My guess is that our client had a
bug in the new version and that the regression can only happen in a special
connection state. But we are still trying to reproduce it.

BTW, meanwhile another vnc bugfix popped up:

vnc: fix qemu crash because of SIGSEGV

BR,

Peter



Re: [Qemu-devel] [PATCH] net: limit allocation in nc_sendv_compat

2016-09-16 Thread Peter Lieven

Am 30.06.2016 um 15:39 schrieb Stefan Hajnoczi:

On Thu, Jun 30, 2016 at 11:49:40AM +0200, Peter Lieven wrote:

we only need to allocate enough memory to hold the packet. This might be
less than NET_BUFSIZE. Additionally fail early if the packet is larger
than NET_BUFSIZE.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
  net/net.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefa...@redhat.com>


Can somebody pick this up please?

Thanks,
Peter




Re: [Qemu-devel] [PULL 00/42] Block layer patches

2016-09-16 Thread Peter Lieven

Am 06.09.2016 um 12:36 schrieb Kevin Wolf:

Am 06.09.2016 um 12:12 hat Peter Maydell geschrieben:

On 5 September 2016 at 19:13, Kevin Wolf <kw...@redhat.com> wrote:

The following changes since commit e87d397e5ef66276ccc49b829527d605ca07d0ad:

   Open 2.8 development tree (2016-09-05 11:38:54 +0100)

are available in the git repository at:

   git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 1585512ba88324844c2722fd977b126fc2748604:

   coroutine: reduce stack size to 64kB (2016-09-05 19:06:49 +0200)


Block layer patches



I'm afraid this fails 'make check' on ppc64be:

MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))} gtester -k
--verbose -m=quick tests/test-coroutine
TEST: tests/test-coroutine... (pid=52932)
   /basic/co_queue: FAIL
GTester: last random seed: R02Sc3b6f36ad285d724ab0bb9d6b3b65ff7
(pid=52934)
   /basic/lifecycle:FAIL
GTester: last random seed: R02Sc30fe01c96aecf3873a5dab246c4945d
(pid=52935)
   /basic/yield:FAIL
GTester: last random seed: R02Sa343d307145707796c84673870963561
(pid=52936)
   /basic/nesting:  FAIL
GTester: last random seed: R02S9d9c125c3934e5f3d4fd794a7a3026b6
(pid=52937)
   /basic/self: FAIL
GTester: last random seed: R02Sf30817e4d44892fff95286f50d30b375
(pid=52938)
   /basic/in_coroutine: FAIL
GTester: last random seed: R02S4048ae6da7f2fe5614e9cecf846f712b
(pid=52939)
   /basic/order:FAIL
GTester: last random seed: R02S4261fe2d21feaee583cdd9f2a2433f5a
(pid=52940)
FAIL: tests/test-coroutine

Peter (Lieven), this is yours.

I'll drop the coroutine series from the pull request and send a v2.


Hi Kevin,

do you know how to check this on ppc64be? On x86_64 it's working fine.

Thanks,
Peter




Re: [Qemu-devel] [Qemu-stable] [ANNOUNCE] QEMU 2.6.1 Stable released

2016-09-13 Thread Peter Lieven


> Am 13.09.2016 um 20:04 schrieb Michael Roth <mdr...@linux.vnet.ibm.com>:
> 
> Quoting Peter Lieven (2016-09-13 10:52:04)
>> 
>> 
>>>> Am 13.09.2016 um 17:42 schrieb Stefan Hajnoczi <stefa...@redhat.com>:
>>>> 
>>>> On Thu, Sep 08, 2016 at 03:58:26PM -0500, Michael Roth wrote:
>>>> Quoting Stefan Hajnoczi (2016-09-05 12:54:35)
>>>>>> On Fri, Aug 26, 2016 at 01:45:56PM +0200, Peter Lieven wrote:
>>>>>>>> Am 25.08.2016 um 19:23 schrieb Michael Roth:
>>>>>>>> Quoting Peter Lieven (2016-08-25 01:38:13)
>>>>>>>> 7c509d1 virtio: decrement vq->inuse in virtqueue_discard()
>>>>>>>> 700f26b virtio: recalculate vq->inuse after migration
>>>>>>> Looks like these got posted during the freeze :(
>>>>>>> 
>>>>>>>> The virtio thing is important because live migration is broken without
>>>>>>>> the fix as  86cc089 is in 2.6.1.
>>>>>>> Not sure I understand the relation to 86cc089. Wouldn't the check
>>>>>>> introduced there always pass due to target initializing inuse to 0?
>>>>>>> 
>>>>>>> Or is the issue that the fix introduced in 86cc089 is only partially
>>>>>>> effective due to inuse not being recalculated properly on target? That 
>>>>>>> might
>>>>>>> warrant a 2.6.1.1...
>>>>>> 
>>>>>> This is what Stefan wrote in the cover letter to the series:
>>>>>> 
>>>>>> "I should mention this is for QEMU 2.7. These fixes are needed if the
>>>>>> CVE-2016-5403 patch has been applied. Without these patches any device 
>>>>>> that holds VirtQueueElements acros
>>>>>> live migration will terminate with a "Virtqueue size exceeded" error 
>>>>>> message. virtio-balloon and virtio-scsi are affected. virtio-bl
>>>>>> probably too but I haven't tested it."
>>>>>> 
>>>>>> Maybe
>>>>> 
>>>>> The virtio inuse fixes are needed for stable (v2.6.2?) so that the
>>>>> spurious "Virtqueue size exceeded" on migration is solved.
>>>>> 
>>>>> The error can be reproduced when there is a VirtQueueElement pending
>>>>> across migration (e.g. virtio-blk s->rq failed request list).
>>>> 
>>>> Thanks for clarifying. I'm planning to do a 2.6.2 to capture these, the
>>>> patches Peter mentioned, and some other fixes that came during 2.7 RC
>>>> phase.
>>>> 
>>>> I have an initial staging tree at:
>>>> 
>>>> https://github.com/mdroth/qemu/commits/stable-2.6-staging
>>>> 
>>>> There's still a few PULLs in flight with patches I plan to pull in, but
>>>> hoping to send out the patch round-up early next week and a release the
>>>> following week.
>>> 
>>> Two more candidates for stable:
>>> 
>>> 4b7f91e virtio: zero vq->inuse in virtio_reset()
>>> 104e70c virtio-balloon: discard virtqueue element on reset
>>> 
>>> They also deal with "Virtqueue size exceeded" errors.
>>> 
>>> Stefan
>> 
>> There also seems to be an regression (segfault) in the VNC server in 2.6.1, 
>> but i am still investigating.
> 
> Do you have a reproducer? I can try a bisect. Trying to get the initial
> staging tree posted today but want to make sure any known regressions are
> addressed beforehand.

I am out of the office till Monday, but if I remember correctly I saw mutex errors
(not segfaults) with 2.6.1 that were not there in 2.5.1.1. They happened while
my colleagues were experimenting with a new VNC client. So it's likely that a
certain connect/disconnect pattern is the trigger. I am not sure if the same
issue exists in master. For more details we might have to wait till I am back
at the office, sorry.

However, CC'ing Jan from Kamp. Maybe he has a reproducer.

Peter 





Re: [Qemu-devel] [Qemu-stable] [ANNOUNCE] QEMU 2.6.1 Stable released

2016-09-13 Thread Peter Lieven


> Am 13.09.2016 um 17:42 schrieb Stefan Hajnoczi <stefa...@redhat.com>:
> 
>> On Thu, Sep 08, 2016 at 03:58:26PM -0500, Michael Roth wrote:
>> Quoting Stefan Hajnoczi (2016-09-05 12:54:35)
>>>> On Fri, Aug 26, 2016 at 01:45:56PM +0200, Peter Lieven wrote:
>>>>> Am 25.08.2016 um 19:23 schrieb Michael Roth:
>>>>> Quoting Peter Lieven (2016-08-25 01:38:13)
>>>>>> 7c509d1 virtio: decrement vq->inuse in virtqueue_discard()
>>>>>> 700f26b virtio: recalculate vq->inuse after migration
>>>>> Looks like these got posted during the freeze :(
>>>>> 
>>>>>> The virtio thing is important because live migration is broken without
>>>>>> the fix as  86cc089 is in 2.6.1.
>>>>> Not sure I understand the relation to 86cc089. Wouldn't the check
>>>>> introduced there always pass due to target initializing inuse to 0?
>>>>> 
>>>>> Or is the issue that the fix introduced in 86cc089 is only partially
>>>>> effective due to inuse not being recalculated properly on target? That 
>>>>> might
>>>>> warrant a 2.6.1.1...
>>>> 
>>>> This is what Stefan wrote in the cover letter to the series:
>>>> 
>>>> "I should mention this is for QEMU 2.7. These fixes are needed if the
>>>> CVE-2016-5403 patch has been applied. Without these patches any device 
>>>> that holds VirtQueueElements acros
>>>> live migration will terminate with a "Virtqueue size exceeded" error 
>>>> message. virtio-balloon and virtio-scsi are affected. virtio-bl
>>>> probably too but I haven't tested it."
>>>> 
>>>> Maybe
>>> 
>>> The virtio inuse fixes are needed for stable (v2.6.2?) so that the
>>> spurious "Virtqueue size exceeded" on migration is solved.
>>> 
>>> The error can be reproduced when there is a VirtQueueElement pending
>>> across migration (e.g. virtio-blk s->rq failed request list).
>> 
>> Thanks for clarifying. I'm planning to do a 2.6.2 to capture these, the
>> patches Peter mentioned, and some other fixes that came during 2.7 RC
>> phase.
>> 
>> I have an initial staging tree at:
>> 
>>  https://github.com/mdroth/qemu/commits/stable-2.6-staging
>> 
>> There's still a few PULLs in flight with patches I plan to pull in, but
>> hoping to send out the patch round-up early next week and a release the
>> following week.
> 
> Two more candidates for stable:
> 
> 4b7f91e virtio: zero vq->inuse in virtio_reset()
> 104e70c virtio-balloon: discard virtqueue element on reset
> 
> They also deal with "Virtqueue size exceeded" errors.
> 
> Stefan

There also seems to be a regression (segfault) in the VNC server in 2.6.1, but
I am still investigating.

Peter



Re: [Qemu-devel] [Qemu-stable] [PATCH v2 for 2.7] ui: fix refresh of VNC server surface

2016-08-29 Thread Peter Lieven

Am 16.08.2016 um 18:30 schrieb Daniel P. Berrange:

In previous commit

   commit c7628bff4138ce906a3620d12e0820c1cf6c140d
   Author: Gerd Hoffmann <kra...@redhat.com>
   Date:   Fri Oct 30 12:10:09 2015 +0100

 vnc: only alloc server surface with clients connected

the VNC server was changed so that the 'vd->server' pixman
image was only allocated when a client is connected.

Since then if a client disconnects and then reconnects to
the VNC server all they will see is a black screen until
they do something that triggers a refresh. On a graphical
desktop this is not often noticed since there's many things
going on which cause a refresh. On a plain text console it
is really obvious since nothing refreshes frequently.

The problem is that the VNC server didn't update the guest
dirty bitmap, so still believes its server image is in sync
with the guest contents.

To fix this we must explicitly mark the entire guest desktop
as dirty after re-creating the server surface. Move this
logic into vnc_update_server_surface() so it is guaranteed
to be call in all code paths that re-create the surface
instead of only in vnc_dpy_switch()

Signed-off-by: Daniel P. Berrange <berra...@redhat.com>
---

NB This should go into 2.5 & 2.6 stable branches too

  ui/vnc.c | 20 +++++++++++---------
  1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index 853b57e..d1087c9 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -692,6 +692,8 @@ void *vnc_server_fb_ptr(VncDisplay *vd, int x, int y)
  
  static void vnc_update_server_surface(VncDisplay *vd)

  {
+int width, height;
+
  qemu_pixman_image_unref(vd->server);
  vd->server = NULL;
  
@@ -699,10 +701,15 @@ static void vnc_update_server_surface(VncDisplay *vd)

  return;
  }
  
+width = vnc_width(vd);
+height = vnc_height(vd);
   vd->server = pixman_image_create_bits(VNC_SERVER_FB_FORMAT,
-  vnc_width(vd),
-  vnc_height(vd),
+  width, height,
   NULL, 0);
+
+memset(vd->guest.dirty, 0x00, sizeof(vd->guest.dirty));
+vnc_set_area_dirty(vd->guest.dirty, vd, 0, 0,
+   width, height);
  }
  
  static void vnc_dpy_switch(DisplayChangeListener *dcl,

@@ -710,7 +717,6 @@ static void vnc_dpy_switch(DisplayChangeListener *dcl,
  {
  VncDisplay *vd = container_of(dcl, VncDisplay, dcl);
  VncState *vs;
-int width, height;
  
  vnc_abort_display_jobs(vd);

  vd->ds = surface;
@@ -722,11 +728,6 @@ static void vnc_dpy_switch(DisplayChangeListener *dcl,
  qemu_pixman_image_unref(vd->guest.fb);
  vd->guest.fb = pixman_image_ref(surface->image);
  vd->guest.format = surface->format;
-width = vnc_width(vd);
-height = vnc_height(vd);
-memset(vd->guest.dirty, 0x00, sizeof(vd->guest.dirty));
-vnc_set_area_dirty(vd->guest.dirty, vd, 0, 0,
-   width, height);
  
  QTAILQ_FOREACH(vs, &vd->clients, next) {

  vnc_colordepth(vs);
@@ -736,7 +737,8 @@ static void vnc_dpy_switch(DisplayChangeListener *dcl,
  }
  memset(vs->dirty, 0x00, sizeof(vs->dirty));
  vnc_set_area_dirty(vs->dirty, vd, 0, 0,
-   width, height);
+   vnc_width(vd),
+   vnc_height(vd));
  }
  }
  


I can confirm this issue now. It's only visible if the VNC client does not send
a SET_PIXEL_FORMAT message, which my client does, and this masked the issue for
me. The set pixel format routine invalidates the whole display, which results
in a vnc_dpy_switch that marks the guest fb dirty. Why this invalidation
takes place I do not understand, but this can be sorted out later.

For this patch:
Tested-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Peter Lieven <p...@kamp.de>

Peter



Re: [Qemu-devel] [Qemu-stable] [ANNOUNCE] QEMU 2.6.1 Stable released

2016-08-26 Thread Peter Lieven

Am 25.08.2016 um 19:23 schrieb Michael Roth:

Quoting Peter Lieven (2016-08-25 01:38:13)

Am 17.08.2016 um 21:30 schrieb Michael Roth:

Hi everyone,

I am pleased to announce that the QEMU v2.6.1 stable release is now
available:

http://wiki.qemu.org/download/qemu-2.6.1.tar.bz2

v2.6.1 is now tagged in the official qemu.git repository,
and the stable-2.6 branch has been updated accordingly:

http://git.qemu.org/?p=qemu.git;a=shortlog;h=refs/heads/stable-2.6

This is a fairly large update that addresses a broad range of bugs
and security issues. Users should upgrade accordingly.

Thank you to everyone involved!

Hi Michael,

thanks for putting this together. Unfortunately, I was on holiday during
the patch round up for 2.6.1

I additionally have the following 5 patches in case you want or need to
release a 2.6.1.1 or 2.6.2:

bd9f480 ui: fix refresh of VNC server surface
4c23084 net: limit allocation in nc_sendv_compat


the vnc fix was also on the list during the freeze. Looking at it at the moment.
There seems to be a second issue with the VNC server as well.
The other one is indeed missing. It's not critical, there is just too much memory
allocated. I will ping Stefan to PULL it for 2.8.


I don't see these in master yet.


bf97c17 iscsi: pass SCSI status back for SG_IO

I'll pull this in if there's another release, but doesn't look
like a regression from 2.6.0 at least.


No, it was not there all the time.




7c509d1 virtio: decrement vq->inuse in virtqueue_discard()
700f26b virtio: recalculate vq->inuse after migration

Looks like these got posted during the freeze :(


The virtio thing is important because live migration is broken without
the fix as  86cc089 is in 2.6.1.

Not sure I understand the relation to 86cc089. Wouldn't the check
introduced there always pass due to target initializing inuse to 0?

Or is the issue that the fix introduced in 86cc089 is only partially
effective due to inuse not being recalculated properly on target? That might
warrant a 2.6.1.1...


This is what Stefan wrote in the cover letter to the series:

"I should mention this is for QEMU 2.7. These fixes are needed if the
CVE-2016-5403 patch has been applied. Without these patches any device that 
holds VirtQueueElements acros
live migration will terminate with a "Virtqueue size exceeded" error message. 
virtio-balloon and virtio-scsi are affected. virtio-bl
probably too but I haven't tested it."

Maybe



Re: [Qemu-devel] [Qemu-stable] [PATCH v2 for 2.7] ui: fix refresh of VNC server surface

2016-08-26 Thread Peter Lieven

Am 25.08.2016 um 14:46 schrieb Daniel P. Berrange:

On Thu, Aug 25, 2016 at 09:15:52AM +0200, Peter Lieven wrote:

Am 24.08.2016 um 17:49 schrieb Daniel P. Berrange:

On Wed, Aug 24, 2016 at 04:46:31PM +0100, Peter Maydell wrote:

On 23 August 2016 at 07:50, Peter Lieven <p...@kamp.de> wrote:

Am 16.08.2016 um 18:30 schrieb Daniel P. Berrange:

In previous commit

 commit c7628bff4138ce906a3620d12e0820c1cf6c140d
 Author: Gerd Hoffmann <kra...@redhat.com>
 Date:   Fri Oct 30 12:10:09 2015 +0100

   vnc: only alloc server surface with clients connected

the VNC server was changed so that the 'vd->server' pixman
image was only allocated when a client is connected.

Since then if a client disconnects and then reconnects to
the VNC server all they will see is a black screen until
they do something that triggers a refresh. On a graphical
desktop this is not often noticed since there's many things
going on which cause a refresh. On a plain text console it
is really obvious since nothing refreshes frequently.

The problem is that the VNC server didn't update the guest
dirty bitmap, so still believes its server image is in sync
with the guest contents.

To fix this we must explicitly mark the entire guest desktop
as dirty after re-creating the server surface. Move this
logic into vnc_update_server_surface() so it is guaranteed
to be call in all code paths that re-create the surface
instead of only in vnc_dpy_switch()

Signed-off-by: Daniel P. Berrange <berra...@redhat.com>

I noticed that these patches is as well not in master yet and therefore
not included in the 2.7.0-rc4 tagged yesterday.

Dan, Gerd -- we're going to need an rc5 anyway -- can you
comment on whether this patch is "should be in rc5"
material? (If it is I can commit it to master directly.)

I think it should be, as the VNC server is pretty unusable when it does
not refresh the screen when you connect, but I'll let Gerd decide in case
there's some implication in my patch that I've mis-understood.

Hi Daniel,

I have had a look at the code and currently I do not understand why you
ran into a blank screen. The server surface is not created if there is no
client connected, this is correct. But the dirty bitmap for the server
exists even if there is no client connected. So the status of the dirty bitmap
should be the same with or without your patch.

IIUC, the dirty bitmap is used to decide when to update the server surface
from the guest framebuffer.  When the new client connects and we create a
new server surface, the dirty bitmap is clean, so QEMU never copies the
guest framebuffer into the new server surface hence I just get a blank
screen.


Okay, but this can only happen at the second connection. The first
vnc connection works, you disconnect, reconnect and then get a blank screen.
Indeed in this case there is in theory no vnc_dpy_switch in between. However,
I have the feeling that something else is additionally wrong - see below.




Can you tell what exactly you tried out to reproduce a potential issue?

Boot a Fedora guest, in text mode (ie no Xorg graphics). Then simply
disconnect & connect and I'll always get a black screen until I do something
that causes Linux to update the screen.


I can reproduce an issue if I have 2 VNC connections in parallel.

If I open the first VNC connection to a VM that does not update the screen, I 
have the following
calls to vnc_dpy_switch and vnc_update_server_surface.

# qemu start
vnc_update_server_surface: no clients
vnc_dpy_switch: 640 480

# first vnc client connects
vnc_update_server_surface: no clients
vnc_dpy_switch: 720 400
vnc_connect: 0x5582ceb56000
vnc_init_state: 0x5582ceb56000 (first_client 1)
vnc_update_server_surface
vnc_update_server_surface
vnc_dpy_switch: 720 400

# first client disconnects
vnc_disconnect_finished: 0x562b79aa3850 (last_client 1)
vnc_update_server_surface: no clients

# first client connects again
vnc_connect: 0x562b79aa3850
vnc_init_state: 0x562b79aa3850 (first_client 1)
vnc_update_server_surface
vnc_update_server_surface
vnc_dpy_switch: 720 400

# second client connects
vnc_connect: 0x562b79a6d890
vnc_init_state: 0x562b79a6d890 (first_client 0)
vnc_update_server_surface
vnc_dpy_switch: 720 400

# second client disconnects
vnc_disconnect_finished: 0x562b79a6d890 (last_client 0)

# first client disconnects
vnc_disconnect_finished: 0x562b79aa3850 (last_client 1)
vnc_update_server_surface: no clients

Strangely enough, I see no issues with the first client, but I wonder why each
vnc connect triggers multiple vnc_update_server_surface and vnc_dpy_switch
calls, and secondly why my second client has a blank screen.

So at first glance I think your patch is correct: we need to mark the server
surface dirty in vnc_update_server_surface because there are other callers to
it than vnc_dpy_switch.

But there seems to be another issue. I am trying to figure out what it is.

Peter



Re: [Qemu-devel] [Qemu-stable] [PATCH v2 for 2.7] ui: fix refresh of VNC server surface

2016-08-25 Thread Peter Lieven

Am 24.08.2016 um 17:49 schrieb Daniel P. Berrange:

On Wed, Aug 24, 2016 at 04:46:31PM +0100, Peter Maydell wrote:

On 23 August 2016 at 07:50, Peter Lieven <p...@kamp.de> wrote:

Am 16.08.2016 um 18:30 schrieb Daniel P. Berrange:

In previous commit

commit c7628bff4138ce906a3620d12e0820c1cf6c140d
Author: Gerd Hoffmann <kra...@redhat.com>
Date:   Fri Oct 30 12:10:09 2015 +0100

  vnc: only alloc server surface with clients connected

the VNC server was changed so that the 'vd->server' pixman
image was only allocated when a client is connected.

Since then if a client disconnects and then reconnects to
the VNC server all they will see is a black screen until
they do something that triggers a refresh. On a graphical
desktop this is not often noticed since there's many things
going on which cause a refresh. On a plain text console it
is really obvious since nothing refreshes frequently.

The problem is that the VNC server didn't update the guest
dirty bitmap, so still believes its server image is in sync
with the guest contents.

To fix this we must explicitly mark the entire guest desktop
as dirty after re-creating the server surface. Move this
logic into vnc_update_server_surface() so it is guaranteed
to be call in all code paths that re-create the surface
instead of only in vnc_dpy_switch()

Signed-off-by: Daniel P. Berrange <berra...@redhat.com>


I noticed that these patches is as well not in master yet and therefore
not included in the 2.7.0-rc4 tagged yesterday.

Dan, Gerd -- we're going to need an rc5 anyway -- can you
comment on whether this patch is "should be in rc5"
material? (If it is I can commit it to master directly.)

I think it should be, as the VNC server is pretty unusable when it does
not refresh the screen when you connect, but I'll let Gerd decide in case
there's some implication in my patch that I've mis-understood.


Hi Daniel,

I have had a look at the code and currently I do not understand why you
ran into a blank screen. The server surface is not created if there is no
client connected, this is correct. But the dirty bitmap for the server
exists even if there is no client connected. So the status of the dirty bitmap
should be the same with or without your patch.

Can you tell what exactly you tried out to reproduce a potential issue?

Thanks,
Peter



Re: [Qemu-devel] [Qemu-stable] [ANNOUNCE] QEMU 2.6.1 Stable released

2016-08-25 Thread Peter Lieven

Am 17.08.2016 um 21:30 schrieb Michael Roth:

Hi everyone,

I am pleased to announce that the QEMU v2.6.1 stable release is now
available:

   http://wiki.qemu.org/download/qemu-2.6.1.tar.bz2

v2.6.1 is now tagged in the official qemu.git repository,
and the stable-2.6 branch has been updated accordingly:

   http://git.qemu.org/?p=qemu.git;a=shortlog;h=refs/heads/stable-2.6

This is a fairly large update that addresses a broad range of bugs
and security issues. Users should upgrade accordingly.

Thank you to everyone involved!


Hi Michael,

thanks for putting this together. Unfortunately, I was on holiday during
the patch round-up for 2.6.1.

I additionally have the following 5 patches in case you want or need to
release a 2.6.1.1 or 2.6.2:

bd9f480 ui: fix refresh of VNC server surface
7c509d1 virtio: decrement vq->inuse in virtqueue_discard()
700f26b virtio: recalculate vq->inuse after migration
4c23084 net: limit allocation in nc_sendv_compat
bf97c17 iscsi: pass SCSI status back for SG_IO

The virtio fixes are important because live migration is broken without
them, as 86cc089 is in 2.6.1.

Thanks,
Peter



[Qemu-devel] [PATCH V7 1/6] oslib-posix: add helpers for stack alloc and free

2016-08-23 Thread Peter Lieven
the allocated stack will be adjusted to the minimum supported stack size
by the OS and rounded up to be a multiple of the system pagesize.
Additionally an architecture-dependent guard page is added to the stack
to catch stack overflows. The memory for the guard page is deducted from
stack memory so that the usable stack size is effectively reduced by the size
of one page. This is equivalent to how the glibc stack allocation routines
behave.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
 include/sysemu/os-posix.h | 27 ++
 util/oslib-posix.c| 48 +++
 2 files changed, 75 insertions(+)

diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index 9c7dfdf..87e60fe 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -60,4 +60,31 @@ int qemu_utimens(const char *path, const qemu_timespec *times);
 
 bool is_daemonized(void);
 
+/**
+ * qemu_alloc_stack:
+ * @sz: size of required stack in bytes
+ *
+ * Allocate memory that can be used as a stack, for instance for
+ * coroutines. If the memory cannot be allocated, this function
+ * will abort (like g_malloc()). This function also inserts a
+ * guard page to catch a potential stack overflow. The memory
+ * for the guard page is deducted from stack memory so that
+ * the usable stack size is effectively sz bytes minus the size
+ * of one page.
+ *
+ * The allocated stack must be freed with qemu_free_stack().
+ *
+ * Returns: pointer to (the lowest address of) the stack memory.
+ */
+void *qemu_alloc_stack(size_t sz);
+
+/**
+ * qemu_free_stack:
+ * @stack: stack to free
+ * @sz: size of stack in bytes
+ *
+ * Free a stack allocated via qemu_alloc_stack().
+ */
+void qemu_free_stack(void *stack, size_t sz);
+
 #endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index f2d4e9e..74dbe1c 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -499,3 +499,51 @@ pid_t qemu_fork(Error **errp)
 }
 return pid;
 }
+
+static size_t adjust_stack_size(size_t sz, size_t pagesz)
+{
+#ifdef _SC_THREAD_STACK_MIN
+/* avoid stacks smaller than _SC_THREAD_STACK_MIN */
+long min_stack_sz = sysconf(_SC_THREAD_STACK_MIN);
+sz = MAX(MAX(min_stack_sz, 0), sz);
+#endif
+/* adjust stack size to a multiple of the page size */
+sz = ROUND_UP(sz, pagesz);
+return sz;
+}
+
+void *qemu_alloc_stack(size_t sz)
+{
+void *ptr, *guardpage;
+size_t pagesz = getpagesize();
+sz = adjust_stack_size(sz, pagesz);
+
+ptr = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+if (ptr == MAP_FAILED) {
+abort();
+}
+
+#if defined(HOST_IA64)
+/* separate register stack */
+guardpage = ptr + (((sz - pagesz) / 2) & ~pagesz);
+#elif defined(HOST_HPPA)
+/* stack grows up */
+guardpage = ptr + sz - pagesz;
+#else
+/* stack grows down */
+guardpage = ptr;
+#endif
+if (mprotect(guardpage, pagesz, PROT_NONE) != 0) {
+abort();
+}
+
+return ptr;
+}
+
+void qemu_free_stack(void *stack, size_t sz)
+{
+size_t pagesz = getpagesize();
+sz = adjust_stack_size(sz, pagesz);
+munmap(stack, sz);
+}
-- 
1.9.1




[Qemu-devel] [PATCH V7 5/6] oslib-posix: add a configure switch to debug stack usage

2016-08-23 Thread Peter Lieven
this adds a knob to track the maximum stack usage of stacks
created by qemu_alloc_stack.

Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
---
 configure  | 19 +++
 util/oslib-posix.c | 36 
 2 files changed, 55 insertions(+)

diff --git a/configure b/configure
index 4b808f9..7e087c5 100755
--- a/configure
+++ b/configure
@@ -296,6 +296,7 @@ libiscsi=""
 libnfs=""
 coroutine=""
 coroutine_pool=""
+debug_stack_usage="no"
 seccomp=""
 glusterfs=""
 glusterfs_xlator_opt="no"
@@ -1005,6 +1006,8 @@ for opt do
   ;;
   --enable-coroutine-pool) coroutine_pool="yes"
   ;;
+  --enable-debug-stack-usage) debug_stack_usage="yes"
+  ;;
   --disable-docs) docs="no"
   ;;
   --enable-docs) docs="yes"
@@ -4306,6 +4309,17 @@ if test "$coroutine" = "gthread" -a "$coroutine_pool" = "yes"; then
   error_exit "'gthread' coroutine backend does not support pool (use --disable-coroutine-pool)"
 fi
 
+if test "$debug_stack_usage" = "yes"; then
+  if test "$cpu" = "ia64" -o "$cpu" = "hppa"; then
+error_exit "stack usage debugging is not supported for $cpu"
+  fi
+  if test "$coroutine_pool" = "yes"; then
+echo "WARN: disabling coroutine pool for stack usage debugging"
+coroutine_pool=no
+  fi
+fi
+
+
 ##
 # check if we have open_by_handle_at
 
@@ -4892,6 +4906,7 @@ echo "QGA MSI support   $guest_agent_msi"
 echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine"
 echo "coroutine pool$coroutine_pool"
+echo "debug stack usage $debug_stack_usage"
 echo "GlusterFS support $glusterfs"
 echo "Archipelago support $archipelago"
 echo "gcov  $gcov_tool"
@@ -5360,6 +5375,10 @@ else
   echo "CONFIG_COROUTINE_POOL=0" >> $config_host_mak
 fi
 
+if test "$debug_stack_usage" = "yes" ; then
+  echo "CONFIG_DEBUG_STACK_USAGE=y" >> $config_host_mak
+fi
+
 if test "$open_by_handle_at" = "yes" ; then
   echo "CONFIG_OPEN_BY_HANDLE=y" >> $config_host_mak
 fi
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 74dbe1c..b33c1cd 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -50,6 +50,10 @@
 
 #include "qemu/mmap-alloc.h"
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+#include "qemu/error-report.h"
+#endif
+
 int qemu_get_thread_id(void)
 {
 #if defined(__linux__)
@@ -515,6 +519,9 @@ static size_t adjust_stack_size(size_t sz, size_t pagesz)
 void *qemu_alloc_stack(size_t sz)
 {
 void *ptr, *guardpage;
+#ifdef CONFIG_DEBUG_STACK_USAGE
+void *ptr2;
+#endif
 size_t pagesz = getpagesize();
 sz = adjust_stack_size(sz, pagesz);
 
@@ -538,12 +545,41 @@ void *qemu_alloc_stack(size_t sz)
 abort();
 }
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+for (ptr2 = ptr + pagesz; ptr2 < ptr + sz; ptr2 += sizeof(uint32_t)) {
+*(uint32_t *)ptr2 = 0xdeadbeaf;
+}
+#endif
+
 return ptr;
 }
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+static __thread unsigned int max_stack_usage;
+#endif
+
 void qemu_free_stack(void *stack, size_t sz)
 {
+#ifdef CONFIG_DEBUG_STACK_USAGE
+unsigned int usage;
+void *ptr;
+#endif
 size_t pagesz = getpagesize();
 sz = adjust_stack_size(sz, pagesz);
+
+#ifdef CONFIG_DEBUG_STACK_USAGE
+for (ptr = stack + pagesz; ptr < stack + sz; ptr += sizeof(uint32_t)) {
+if (*(uint32_t *)ptr != 0xdeadbeaf) {
+break;
+}
+}
+usage = sz - (uintptr_t) (ptr - stack);
+if (usage > max_stack_usage) {
+error_report("thread %d max stack usage increased from %u to %u",
+ qemu_get_thread_id(), max_stack_usage, usage);
+max_stack_usage = usage;
+}
+#endif
+
 munmap(stack, sz);
 }
-- 
1.9.1




[Qemu-devel] [PATCH V7 2/6] coroutine: add a macro for the coroutine stack size

2016-08-23 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
Reviewed-by: Richard Henderson <r...@twiddle.net>
---
 include/qemu/coroutine_int.h | 2 ++
 util/coroutine-sigaltstack.c | 2 +-
 util/coroutine-ucontext.c| 2 +-
 util/coroutine-win32.c   | 2 +-
 4 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index 581a7f5..f62f83f 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -28,6 +28,8 @@
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"
 
+#define COROUTINE_STACK_SIZE (1 << 20)
+
 typedef enum {
 COROUTINE_YIELD = 1,
 COROUTINE_TERMINATE = 2,
diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index a7c3366..9c2854c 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -143,7 +143,7 @@ static void coroutine_trampoline(int signal)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 2bb7e10..31254ab 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -82,7 +82,7 @@ static void coroutine_trampoline(int i0, int i1)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 ucontext_t old_uc, uc;
 sigjmp_buf old_env;
diff --git a/util/coroutine-win32.c b/util/coroutine-win32.c
index 02e28e8..de6bd4f 100644
--- a/util/coroutine-win32.c
+++ b/util/coroutine-win32.c
@@ -71,7 +71,7 @@ static void CALLBACK coroutine_trampoline(void *co_)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineWin32 *co;
 
 co = g_malloc0(sizeof(*co));
-- 
1.9.1




[Qemu-devel] [PATCH V7 3/6] coroutine-ucontext: use helper for allocating stack memory

2016-08-23 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
Reviewed-by: Richard Henderson <r...@twiddle.net>
---
 util/coroutine-ucontext.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 31254ab..b7dea8c 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -82,7 +82,6 @@ static void coroutine_trampoline(int i0, int i1)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 ucontext_t old_uc, uc;
 sigjmp_buf old_env;
@@ -101,17 +100,17 @@ Coroutine *qemu_coroutine_new(void)
 }
 
 co = g_malloc0(sizeof(*co));
-co->stack = g_malloc(stack_size);
+co->stack = qemu_alloc_stack(COROUTINE_STACK_SIZE);
co->base.entry_arg = &old_env; /* stash away our jmp_buf */
 
uc.uc_link = &old_uc;
 uc.uc_stack.ss_sp = co->stack;
-uc.uc_stack.ss_size = stack_size;
+uc.uc_stack.ss_size = COROUTINE_STACK_SIZE;
 uc.uc_stack.ss_flags = 0;
 
 #ifdef CONFIG_VALGRIND_H
 co->valgrind_stack_id =
-VALGRIND_STACK_REGISTER(co->stack, co->stack + stack_size);
+VALGRIND_STACK_REGISTER(co->stack, co->stack + COROUTINE_STACK_SIZE);
 #endif
 
 arg.p = co;
@@ -149,7 +148,7 @@ void qemu_coroutine_delete(Coroutine *co_)
 valgrind_stack_deregister(co);
 #endif
 
-g_free(co->stack);
+qemu_free_stack(co->stack, COROUTINE_STACK_SIZE);
 g_free(co);
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH V7 6/6] coroutine: reduce stack size to 64kB

2016-08-23 Thread Peter Lieven
evaluation with the recently introduced maximum stack usage monitoring revealed
that the actually used stack size was never above 4kB, so allocating a 1MB stack
for each coroutine is a lot of wasted memory. Reduce the stack size to
64kB, which should still give enough headroom. The guard page added
in qemu_alloc_stack will catch a potential stack overflow introduced
by this commit.

Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Eric Blake <ebl...@redhat.com>
---
 include/qemu/coroutine_int.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index f62f83f..011910f 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -28,7 +28,7 @@
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"
 
-#define COROUTINE_STACK_SIZE (1 << 20)
+#define COROUTINE_STACK_SIZE (1 << 16)
 
 typedef enum {
 COROUTINE_YIELD = 1,
-- 
1.9.1




[Qemu-devel] [PATCH V7 0/6] coroutine: mmap stack memory and stack size

2016-08-23 Thread Peter Lieven
I decided to split this from the rest of the Qemu RSS usage series as
it contains the more or less non-contentious patches.

I omitted the MAP_GROWSDOWN flag in mmap as we are not 100% sure what
side effects it has.

I kept the guard page, which now nicely makes the stacks visible in
smaps. The old version of the relevant patch lacked the MAP_FIXED flag
in the second call to mmap.

The last patch, which reduces the stack size of coroutines to 64kB,
may be omitted if it is found too risky.

v6->v7:
 - Patch 1: avoid multiple calls to sysconf and getpagesize [Richard]

v5->v6:
 - Patch 1: added info that the guard page is deducted from stack memory to
commit msg and headers [Stefan]
 - rebased to master

v4->v5:
 - Patch 1: check if _SC_THREAD_STACK_MIN is defined
 - Patch 1: guard against sysconf(_SC_THREAD_STACK_MIN) returning -1 [Eric]

v3->v4:
 - Patch 1: add a static function to adjust the stack size [Richard]
 - Patch 1: round up the stack size to multiple of the pagesize.

v2->v3:
 - Patch 1,6: adjusted commit message to mention the guard page [Markus]

v1->v2:
 - Patch 1: added an architecture dependend guard page [Richard]
 - Patch 1: avoid stacks smaller than _SC_THREAD_STACK_MIN [Richard]
 - Patch 1: use mmap+mprotect instead of mmap+mmap [Richard]
 - Patch 5: u_int32_t -> uint32_t [Richard]
 - Patch 5: only available if stack grows down

Peter Lieven (6):
  oslib-posix: add helpers for stack alloc and free
  coroutine: add a macro for the coroutine stack size
  coroutine-ucontext: use helper for allocating stack memory
  coroutine-sigaltstack: use helper for allocating stack memory
  oslib-posix: add a configure switch to debug stack usage
  coroutine: reduce stack size to 64kB

 configure| 19 ++
 include/qemu/coroutine_int.h |  2 ++
 include/sysemu/os-posix.h| 27 ++
 util/coroutine-sigaltstack.c |  7 ++--
 util/coroutine-ucontext.c|  9 +++--
 util/coroutine-win32.c   |  2 +-
 util/oslib-posix.c   | 84 
 7 files changed, 140 insertions(+), 10 deletions(-)

-- 
1.9.1




[Qemu-devel] [PATCH V7 4/6] coroutine-sigaltstack: use helper for allocating stack memory

2016-08-23 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
Reviewed-by: Richard Henderson <r...@twiddle.net>
---
 util/coroutine-sigaltstack.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index 9c2854c..ccf4861 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -143,7 +143,6 @@ static void coroutine_trampoline(int signal)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
@@ -164,7 +163,7 @@ Coroutine *qemu_coroutine_new(void)
  */
 
 co = g_malloc0(sizeof(*co));
-co->stack = g_malloc(stack_size);
+co->stack = qemu_alloc_stack(COROUTINE_STACK_SIZE);
co->base.entry_arg = &old_env; /* stash away our jmp_buf */
 
 coTS = coroutine_get_thread_state();
@@ -189,7 +188,7 @@ Coroutine *qemu_coroutine_new(void)
  * Set the new stack.
  */
 ss.ss_sp = co->stack;
-ss.ss_size = stack_size;
+ss.ss_size = COROUTINE_STACK_SIZE;
 ss.ss_flags = 0;
if (sigaltstack(&ss, &oss) < 0) {
 abort();
@@ -253,7 +252,7 @@ void qemu_coroutine_delete(Coroutine *co_)
 {
 CoroutineUContext *co = DO_UPCAST(CoroutineUContext, base, co_);
 
-g_free(co->stack);
+qemu_free_stack(co->stack, COROUTINE_STACK_SIZE);
 g_free(co);
 }
 
-- 
1.9.1




Re: [Qemu-devel] [PATCH V6 1/6] oslib-posix: add helpers for stack alloc and free

2016-08-23 Thread Peter Lieven

Am 22.08.2016 um 17:19 schrieb Richard Henderson:

On 08/22/2016 06:04 AM, Peter Lieven wrote:

+static size_t adjust_stack_size(size_t sz)
+{
+#ifdef _SC_THREAD_STACK_MIN
+/* avoid stacks smaller than _SC_THREAD_STACK_MIN */
+sz = MAX(MAX(sysconf(_SC_THREAD_STACK_MIN), 0), sz);
+#endif


You need to place the sysconf result into a local variable.  What you have now 
expands to 4 invocations of the function.


Okay, I thought it would only be two calls and it's not critical, but I can
change that.



You might also consider passing in the pagesize, since you've already called 
getpagesize in one of the two users of this function.


I can change this as well, but isn't getpagesize just a macro on most systems?
If the call to getpagesize is costly, we should cache it somewhere, as there
are more critical sections that call it repeatedly.

Peter




Re: [Qemu-devel] [Qemu-stable] [PATCH v2 for 2.7] ui: fix refresh of VNC server surface

2016-08-23 Thread Peter Lieven

Am 16.08.2016 um 18:30 schrieb Daniel P. Berrange:

In previous commit

   commit c7628bff4138ce906a3620d12e0820c1cf6c140d
   Author: Gerd Hoffmann 
   Date:   Fri Oct 30 12:10:09 2015 +0100

 vnc: only alloc server surface with clients connected

the VNC server was changed so that the 'vd->server' pixman
image was only allocated when a client is connected.

Since then if a client disconnects and then reconnects to
the VNC server all they will see is a black screen until
they do something that triggers a refresh. On a graphical
desktop this is not often noticed since there's many things
going on which cause a refresh. On a plain text console it
is really obvious since nothing refreshes frequently.

The problem is that the VNC server didn't update the guest
dirty bitmap, so it still believes its server image is in sync
with the guest contents.

To fix this we must explicitly mark the entire guest desktop
as dirty after re-creating the server surface. Move this
logic into vnc_update_server_surface() so it is guaranteed
to be called in all code paths that re-create the surface,
instead of only in vnc_dpy_switch().

Signed-off-by: Daniel P. Berrange 


I noticed that this patch is likewise not in master yet and therefore
not included in the 2.7.0-rc4 tagged yesterday.

Peter



Re: [Qemu-devel] [Qemu-stable] [PATCH 0/2] virtio: fix VirtQueue->inuse field

2016-08-23 Thread Peter Lieven

Am 17.08.2016 um 15:58 schrieb Stefan Hajnoczi:

On Mon, Aug 15, 2016 at 01:54:14PM +0100, Stefan Hajnoczi wrote:

The VirtQueue->inuse field is not always updated correctly.  These patches fix
it.

Originally this series was called "virtio-balloon: fix stats vq migration" but
Ladi Prosek posted a nicer fix called "balloon: Fix failure of updating guest
memory status".  I dropped the virtio-balloon patches.

Changes from previous series:
  * Missing comma in error formatting [Fam]
  * virtio_descard() -> virtio_discard() [Michael]
  * Multi-line comment style [Cornelia]

Stefan Hajnoczi (2):
   virtio: recalculate vq->inuse after migration
   virtio: decrement vq->inuse in virtqueue_discard()

  hw/virtio/virtio.c | 16 
  1 file changed, 16 insertions(+)

I should mention this is for QEMU 2.7.  These fixes are needed if the
CVE-2016-5403 patch has been applied.

Without these patches any device that holds VirtQueueElements across
live migration will terminate with a "Virtqueue size exceeded" error
message.  virtio-balloon and virtio-scsi are affected.  virtio-blk
probably too but I haven't tested it.

Stefan


I noticed that these patches are not in master yet and therefore
not included in the 2.7.0-rc4 tagged yesterday. Is there any issue with them?

Peter



[Qemu-devel] [PATCH V6 5/6] oslib-posix: add a configure switch to debug stack usage

2016-08-22 Thread Peter Lieven
this adds a knob to track the maximum stack usage of stacks
created by qemu_alloc_stack.

Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
---
 configure  | 19 +++
 util/oslib-posix.c | 37 +
 2 files changed, 56 insertions(+)

diff --git a/configure b/configure
index 4b808f9..7e087c5 100755
--- a/configure
+++ b/configure
@@ -296,6 +296,7 @@ libiscsi=""
 libnfs=""
 coroutine=""
 coroutine_pool=""
+debug_stack_usage="no"
 seccomp=""
 glusterfs=""
 glusterfs_xlator_opt="no"
@@ -1005,6 +1006,8 @@ for opt do
   ;;
   --enable-coroutine-pool) coroutine_pool="yes"
   ;;
+  --enable-debug-stack-usage) debug_stack_usage="yes"
+  ;;
   --disable-docs) docs="no"
   ;;
   --enable-docs) docs="yes"
@@ -4306,6 +4309,17 @@ if test "$coroutine" = "gthread" -a "$coroutine_pool" = "yes"; then
   error_exit "'gthread' coroutine backend does not support pool (use --disable-coroutine-pool)"
 fi
 
+if test "$debug_stack_usage" = "yes"; then
+  if test "$cpu" = "ia64" -o "$cpu" = "hppa"; then
+error_exit "stack usage debugging is not supported for $cpu"
+  fi
+  if test "$coroutine_pool" = "yes"; then
+echo "WARN: disabling coroutine pool for stack usage debugging"
+coroutine_pool=no
+  fi
+fi
+
+
 ##
 # check if we have open_by_handle_at
 
@@ -4892,6 +4906,7 @@ echo "QGA MSI support   $guest_agent_msi"
 echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine"
 echo "coroutine pool$coroutine_pool"
+echo "debug stack usage $debug_stack_usage"
 echo "GlusterFS support $glusterfs"
 echo "Archipelago support $archipelago"
 echo "gcov  $gcov_tool"
@@ -5360,6 +5375,10 @@ else
   echo "CONFIG_COROUTINE_POOL=0" >> $config_host_mak
 fi
 
+if test "$debug_stack_usage" = "yes" ; then
+  echo "CONFIG_DEBUG_STACK_USAGE=y" >> $config_host_mak
+fi
+
 if test "$open_by_handle_at" = "yes" ; then
   echo "CONFIG_OPEN_BY_HANDLE=y" >> $config_host_mak
 fi
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 76b028e..8869f6c 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -50,6 +50,10 @@
 
 #include "qemu/mmap-alloc.h"
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+#include "qemu/error-report.h"
+#endif
+
 int qemu_get_thread_id(void)
 {
 #if defined(__linux__)
@@ -514,6 +518,9 @@ static size_t adjust_stack_size(size_t sz)
 void *qemu_alloc_stack(size_t sz)
 {
 void *ptr, *guardpage;
+#ifdef CONFIG_DEBUG_STACK_USAGE
+void *ptr2;
+#endif
 size_t pagesz = getpagesize();
 sz = adjust_stack_size(sz);
 
@@ -537,11 +544,41 @@ void *qemu_alloc_stack(size_t sz)
 abort();
 }
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+for (ptr2 = ptr + pagesz; ptr2 < ptr + sz; ptr2 += sizeof(uint32_t)) {
+*(uint32_t *)ptr2 = 0xdeadbeaf;
+}
+#endif
+
 return ptr;
 }
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+static __thread unsigned int max_stack_usage;
+#endif
+
 void qemu_free_stack(void *stack, size_t sz)
 {
+#ifdef CONFIG_DEBUG_STACK_USAGE
+unsigned int usage;
+void *ptr;
+#endif
 sz = adjust_stack_size(sz);
+
+#ifdef CONFIG_DEBUG_STACK_USAGE
+for (ptr = stack + getpagesize(); ptr < stack + sz;
+ ptr += sizeof(uint32_t)) {
+if (*(uint32_t *)ptr != 0xdeadbeaf) {
+break;
+}
+}
+usage = sz - (uintptr_t) (ptr - stack);
+if (usage > max_stack_usage) {
+error_report("thread %d max stack usage increased from %u to %u",
+ qemu_get_thread_id(), max_stack_usage, usage);
+max_stack_usage = usage;
+}
+#endif
+
 munmap(stack, sz);
 }
-- 
1.9.1




[Qemu-devel] [PATCH V6 3/6] coroutine-ucontext: use helper for allocating stack memory

2016-08-22 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
Reviewed-by: Richard Henderson <r...@twiddle.net>
---
 util/coroutine-ucontext.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 31254ab..b7dea8c 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -82,7 +82,6 @@ static void coroutine_trampoline(int i0, int i1)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 ucontext_t old_uc, uc;
 sigjmp_buf old_env;
@@ -101,17 +100,17 @@ Coroutine *qemu_coroutine_new(void)
 }
 
 co = g_malloc0(sizeof(*co));
-co->stack = g_malloc(stack_size);
+co->stack = qemu_alloc_stack(COROUTINE_STACK_SIZE);
co->base.entry_arg = &old_env; /* stash away our jmp_buf */
 
uc.uc_link = &old_uc;
 uc.uc_stack.ss_sp = co->stack;
-uc.uc_stack.ss_size = stack_size;
+uc.uc_stack.ss_size = COROUTINE_STACK_SIZE;
 uc.uc_stack.ss_flags = 0;
 
 #ifdef CONFIG_VALGRIND_H
 co->valgrind_stack_id =
-VALGRIND_STACK_REGISTER(co->stack, co->stack + stack_size);
+VALGRIND_STACK_REGISTER(co->stack, co->stack + COROUTINE_STACK_SIZE);
 #endif
 
 arg.p = co;
@@ -149,7 +148,7 @@ void qemu_coroutine_delete(Coroutine *co_)
 valgrind_stack_deregister(co);
 #endif
 
-g_free(co->stack);
+qemu_free_stack(co->stack, COROUTINE_STACK_SIZE);
 g_free(co);
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH V6 4/6] coroutine-sigaltstack: use helper for allocating stack memory

2016-08-22 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
Reviewed-by: Richard Henderson <r...@twiddle.net>
---
 util/coroutine-sigaltstack.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index 9c2854c..ccf4861 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -143,7 +143,6 @@ static void coroutine_trampoline(int signal)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
@@ -164,7 +163,7 @@ Coroutine *qemu_coroutine_new(void)
  */
 
 co = g_malloc0(sizeof(*co));
-co->stack = g_malloc(stack_size);
+co->stack = qemu_alloc_stack(COROUTINE_STACK_SIZE);
co->base.entry_arg = &old_env; /* stash away our jmp_buf */
 
 coTS = coroutine_get_thread_state();
@@ -189,7 +188,7 @@ Coroutine *qemu_coroutine_new(void)
  * Set the new stack.
  */
 ss.ss_sp = co->stack;
-ss.ss_size = stack_size;
+ss.ss_size = COROUTINE_STACK_SIZE;
 ss.ss_flags = 0;
if (sigaltstack(&ss, &oss) < 0) {
 abort();
@@ -253,7 +252,7 @@ void qemu_coroutine_delete(Coroutine *co_)
 {
 CoroutineUContext *co = DO_UPCAST(CoroutineUContext, base, co_);
 
-g_free(co->stack);
+qemu_free_stack(co->stack, COROUTINE_STACK_SIZE);
 g_free(co);
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH V6 1/6] oslib-posix: add helpers for stack alloc and free

2016-08-22 Thread Peter Lieven
the allocated stack will be adjusted to the minimum supported stack size
by the OS and rounded up to be a multiple of the system pagesize.
Additionally an architecture-dependent guard page is added to the stack
to catch stack overflows. The memory for the guard page is deducted from
stack memory so that the usable stack size is effectively reduced by the size
of one page. This is equivalent to how the glibc stack allocation routines
behave.

Signed-off-by: Peter Lieven <p...@kamp.de>
---
 include/sysemu/os-posix.h | 27 +++
 util/oslib-posix.c| 46 ++
 2 files changed, 73 insertions(+)

diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index 9c7dfdf..87e60fe 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -60,4 +60,31 @@ int qemu_utimens(const char *path, const qemu_timespec *times);
 
 bool is_daemonized(void);
 
+/**
+ * qemu_alloc_stack:
+ * @sz: size of required stack in bytes
+ *
+ * Allocate memory that can be used as a stack, for instance for
+ * coroutines. If the memory cannot be allocated, this function
+ * will abort (like g_malloc()). This function also inserts a
+ * guard page to catch a potential stack overflow. The memory
+ * for the guard page is deducted from stack memory so that
+ * the usable stack size is effectively sz bytes minus the size
+ * of one page.
+ *
+ * The allocated stack must be freed with qemu_free_stack().
+ *
+ * Returns: pointer to (the lowest address of) the stack memory.
+ */
+void *qemu_alloc_stack(size_t sz);
+
+/**
+ * qemu_free_stack:
+ * @stack: stack to free
+ * @sz: size of stack in bytes
+ *
+ * Free a stack allocated via qemu_alloc_stack().
+ */
+void qemu_free_stack(void *stack, size_t sz);
+
 #endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index f2d4e9e..76b028e 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -499,3 +499,49 @@ pid_t qemu_fork(Error **errp)
 }
 return pid;
 }
+
+static size_t adjust_stack_size(size_t sz)
+{
+#ifdef _SC_THREAD_STACK_MIN
+/* avoid stacks smaller than _SC_THREAD_STACK_MIN */
+sz = MAX(MAX(sysconf(_SC_THREAD_STACK_MIN), 0), sz);
+#endif
+/* adjust stack size to a multiple of the page size */
+sz = ROUND_UP(sz, getpagesize());
+return sz;
+}
+
+void *qemu_alloc_stack(size_t sz)
+{
+void *ptr, *guardpage;
+size_t pagesz = getpagesize();
+sz = adjust_stack_size(sz);
+
+ptr = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+if (ptr == MAP_FAILED) {
+abort();
+}
+
+#if defined(HOST_IA64)
+/* separate register stack */
+guardpage = ptr + (((sz - pagesz) / 2) & ~pagesz);
+#elif defined(HOST_HPPA)
+/* stack grows up */
+guardpage = ptr + sz - pagesz;
+#else
+/* stack grows down */
+guardpage = ptr;
+#endif
+if (mprotect(guardpage, pagesz, PROT_NONE) != 0) {
+abort();
+}
+
+return ptr;
+}
+
+void qemu_free_stack(void *stack, size_t sz)
+{
+sz = adjust_stack_size(sz);
+munmap(stack, sz);
+}
-- 
1.9.1




[Qemu-devel] [PATCH V6 6/6] coroutine: reduce stack size to 64kB

2016-08-22 Thread Peter Lieven
evaluation with the recently introduced maximum stack usage monitoring revealed
that the actually used stack size was never above 4kB, so allocating a 1MB stack
for each coroutine is a lot of wasted memory. Reduce the stack size to
64kB, which should still give enough headroom. The guard page added
in qemu_alloc_stack will catch a potential stack overflow introduced
by this commit.

Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Eric Blake <ebl...@redhat.com>
---
 include/qemu/coroutine_int.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index f62f83f..011910f 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -28,7 +28,7 @@
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"
 
-#define COROUTINE_STACK_SIZE (1 << 20)
+#define COROUTINE_STACK_SIZE (1 << 16)
 
 typedef enum {
 COROUTINE_YIELD = 1,
-- 
1.9.1




[Qemu-devel] [PATCH V6 2/6] coroutine: add a macro for the coroutine stack size

2016-08-22 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
Reviewed-by: Richard Henderson <r...@twiddle.net>
---
 include/qemu/coroutine_int.h | 2 ++
 util/coroutine-sigaltstack.c | 2 +-
 util/coroutine-ucontext.c| 2 +-
 util/coroutine-win32.c   | 2 +-
 4 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index 581a7f5..f62f83f 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -28,6 +28,8 @@
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"
 
+#define COROUTINE_STACK_SIZE (1 << 20)
+
 typedef enum {
 COROUTINE_YIELD = 1,
 COROUTINE_TERMINATE = 2,
diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index a7c3366..9c2854c 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -143,7 +143,7 @@ static void coroutine_trampoline(int signal)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 2bb7e10..31254ab 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -82,7 +82,7 @@ static void coroutine_trampoline(int i0, int i1)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 ucontext_t old_uc, uc;
 sigjmp_buf old_env;
diff --git a/util/coroutine-win32.c b/util/coroutine-win32.c
index 02e28e8..de6bd4f 100644
--- a/util/coroutine-win32.c
+++ b/util/coroutine-win32.c
@@ -71,7 +71,7 @@ static void CALLBACK coroutine_trampoline(void *co_)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineWin32 *co;
 
 co = g_malloc0(sizeof(*co));
-- 
1.9.1




[Qemu-devel] [PATCH V6 0/6] coroutine: mmap stack memory and stack size

2016-08-22 Thread Peter Lieven
I decided to split this from the rest of the QEMU RSS usage series as
it contains the more or less non-contentious patches.

I omitted the MAP_GROWSDOWN flag in mmap as we are not 100% sure which
side effects it has.

I kept the guard page, which now also nicely makes the stacks visible
in smaps. The old version of the relevant patch lacked the MAP_FIXED
flag in the second call to mmap.

The last patch, which reduces the stack size of coroutines to 64kB,
may be omitted if it is found too risky.

v5->v6:
 - Patch 1: added info that the guard page is deducted from stack memory to
commit msg and headers [Stefan]
 - rebased to master

v4->v5:
 - Patch 1: check if _SC_THREAD_STACK_MIN is defined
 - Patch 1: guard against sysconf(_SC_THREAD_STACK_MIN) returning -1 [Eric]

v3->v4:
 - Patch 1: add a static function to adjust the stack size [Richard]
 - Patch 1: round up the stack size to multiple of the pagesize.

v2->v3:
 - Patch 1,6: adjusted commit message to mention the guard page [Markus]

v1->v2:
 - Patch 1: added an architecture-dependent guard page [Richard]
 - Patch 1: avoid stacks smaller than _SC_THREAD_STACK_MIN [Richard]
 - Patch 1: use mmap+mprotect instead of mmap+mmap [Richard]
 - Patch 5: u_int32_t -> uint32_t [Richard]
 - Patch 5: only available if stack grows down

Peter Lieven (6):
  oslib-posix: add helpers for stack alloc and free
  coroutine: add a macro for the coroutine stack size
  coroutine-ucontext: use helper for allocating stack memory
  coroutine-sigaltstack: use helper for allocating stack memory
  oslib-posix: add a configure switch to debug stack usage
  coroutine: reduce stack size to 64kB

 configure| 19 ++
 include/qemu/coroutine_int.h |  2 ++
 include/sysemu/os-posix.h| 27 ++
 util/coroutine-sigaltstack.c |  7 ++--
 util/coroutine-ucontext.c|  9 +++--
 util/coroutine-win32.c   |  2 +-
 util/oslib-posix.c   | 83 
 7 files changed, 139 insertions(+), 10 deletions(-)

-- 
1.9.1




Re: [Qemu-devel] [PATCH V5 1/6] oslib-posix: add helpers for stack alloc and free

2016-08-08 Thread Peter Lieven
On 08.08.2016 at 12:37, Stefan Hajnoczi wrote:
> On Tue, Jul 12, 2016 at 06:23:01PM +0200, Peter Lieven wrote:
>> the allocated stack will be adjusted to the minimum supported stack size
>> by the OS and rounded up to be a multiple of the system pagesize.
>> Additionally an architecture dependent guard page is added to the stack
>> to catch stack overflows.
>>
>> Signed-off-by: Peter Lieven <p...@kamp.de>
>> ---
>>  include/sysemu/os-posix.h | 23 +++
>>  util/oslib-posix.c| 46 
>> ++
>>  2 files changed, 69 insertions(+)
>>
>> diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
>> index 9c7dfdf..7630665 100644
>> --- a/include/sysemu/os-posix.h
>> +++ b/include/sysemu/os-posix.h
>> @@ -60,4 +60,27 @@ int qemu_utimens(const char *path, const qemu_timespec 
>> *times);
>>  
>>  bool is_daemonized(void);
>>  
>> +/**
>> + * qemu_alloc_stack:
>> + * @sz: size of required stack in bytes
>> + *
>> + * Allocate memory that can be used as a stack, for instance for
>> + * coroutines. If the memory cannot be allocated, this function
>> + * will abort (like g_malloc()).
>> + *
>> + * The allocated stack must be freed with qemu_free_stack().
>> + *
>> + * Returns: pointer to (the lowest address of) the stack memory.
>> + */
>> +void *qemu_alloc_stack(size_t sz);
>> +
>> +/**
>> + * qemu_free_stack:
>> + * @stack: stack to free
>> + * @sz: size of stack in bytes
>> + *
>> + * Free a stack allocated via qemu_alloc_stack().
>> + */
>> +void qemu_free_stack(void *stack, size_t sz);
>> +
>>  #endif
>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>> index e2e1d4d..2303ca6 100644
>> --- a/util/oslib-posix.c
>> +++ b/util/oslib-posix.c
>> @@ -497,3 +497,49 @@ pid_t qemu_fork(Error **errp)
>>  }
>>  return pid;
>>  }
>> +
>> +static size_t adjust_stack_size(size_t sz)
>> +{
>> +#ifdef _SC_THREAD_STACK_MIN
>> +/* avoid stacks smaller than _SC_THREAD_STACK_MIN */
>> +sz = MAX(MAX(sysconf(_SC_THREAD_STACK_MIN), 0), sz);
>> +#endif
>> +/* adjust stack size to a multiple of the page size */
>> +sz = ROUND_UP(sz, getpagesize());
>> +return sz;
>> +}
>> +
>> +void *qemu_alloc_stack(size_t sz)
>> +{
>> +void *ptr, *guardpage;
>> +size_t pagesz = getpagesize();
>> +sz = adjust_stack_size(sz);
>> +
>> +ptr = mmap(NULL, sz, PROT_READ | PROT_WRITE,
> It's cleaner to count for the guard page separately and give the caller
> the sz bytes they expected:
>
>   sz + pagesz

I don't mind changing that, but the glibc stack allocation routine
does it exactly this way (see glibc/nptl/allocatestack.c).
As we already used other parts of its scheme (minimal stack size,
positioning of the guard page), we mimicked this as well.

Peter



[Qemu-devel] [RESEND PATCH V5 2/6] coroutine: add a macro for the coroutine stack size

2016-08-04 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
Reviewed-by: Richard Henderson <r...@twiddle.net>
---
 include/qemu/coroutine_int.h | 2 ++
 util/coroutine-sigaltstack.c | 2 +-
 util/coroutine-ucontext.c| 2 +-
 util/coroutine-win32.c   | 2 +-
 4 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index 42d6838..eac323a 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -28,6 +28,8 @@
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"
 
+#define COROUTINE_STACK_SIZE (1 << 20)
+
 typedef enum {
 COROUTINE_YIELD = 1,
 COROUTINE_TERMINATE = 2,
diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index a7c3366..9c2854c 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -143,7 +143,7 @@ static void coroutine_trampoline(int signal)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 2bb7e10..31254ab 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -82,7 +82,7 @@ static void coroutine_trampoline(int i0, int i1)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 ucontext_t old_uc, uc;
 sigjmp_buf old_env;
diff --git a/util/coroutine-win32.c b/util/coroutine-win32.c
index 02e28e8..de6bd4f 100644
--- a/util/coroutine-win32.c
+++ b/util/coroutine-win32.c
@@ -71,7 +71,7 @@ static void CALLBACK coroutine_trampoline(void *co_)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = 1 << 20;
+const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineWin32 *co;
 
 co = g_malloc0(sizeof(*co));
-- 
1.9.1




[Qemu-devel] [RESEND PATCH V5 5/6] oslib-posix: add a configure switch to debug stack usage

2016-08-04 Thread Peter Lieven
This adds a knob to track the maximum stack usage of stacks
created by qemu_alloc_stack().

Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
---
 configure  | 19 +++
 util/oslib-posix.c | 37 +
 2 files changed, 56 insertions(+)

diff --git a/configure b/configure
index 5ada56d..a7ee2f3 100755
--- a/configure
+++ b/configure
@@ -296,6 +296,7 @@ libiscsi=""
 libnfs=""
 coroutine=""
 coroutine_pool=""
+debug_stack_usage="no"
 seccomp=""
 glusterfs=""
 glusterfs_xlator_opt="no"
@@ -1005,6 +1006,8 @@ for opt do
   ;;
   --enable-coroutine-pool) coroutine_pool="yes"
   ;;
+  --enable-debug-stack-usage) debug_stack_usage="yes"
+  ;;
   --disable-docs) docs="no"
   ;;
   --enable-docs) docs="yes"
@@ -4302,6 +4305,17 @@ if test "$coroutine" = "gthread" -a "$coroutine_pool" = "yes"; then
   error_exit "'gthread' coroutine backend does not support pool (use --disable-coroutine-pool)"
 fi
 
+if test "$debug_stack_usage" = "yes"; then
+  if test "$cpu" = "ia64" -o "$cpu" = "hppa"; then
+error_exit "stack usage debugging is not supported for $cpu"
+  fi
+  if test "$coroutine_pool" = "yes"; then
+echo "WARN: disabling coroutine pool for stack usage debugging"
+coroutine_pool=no
+  fi
+fi
+
+
 ##
 # check if we have open_by_handle_at
 
@@ -4879,6 +4893,7 @@ echo "QGA MSI support   $guest_agent_msi"
 echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine"
 echo "coroutine pool$coroutine_pool"
+echo "debug stack usage $debug_stack_usage"
 echo "GlusterFS support $glusterfs"
 echo "Archipelago support $archipelago"
 echo "gcov  $gcov_tool"
@@ -5347,6 +5362,10 @@ else
   echo "CONFIG_COROUTINE_POOL=0" >> $config_host_mak
 fi
 
+if test "$debug_stack_usage" = "yes" ; then
+  echo "CONFIG_DEBUG_STACK_USAGE=y" >> $config_host_mak
+fi
+
 if test "$open_by_handle_at" = "yes" ; then
   echo "CONFIG_OPEN_BY_HANDLE=y" >> $config_host_mak
 fi
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 2303ca6..e818d38 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -50,6 +50,10 @@
 
 #include 
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+#include "qemu/error-report.h"
+#endif
+
 int qemu_get_thread_id(void)
 {
 #if defined(__linux__)
@@ -512,6 +516,9 @@ static size_t adjust_stack_size(size_t sz)
 void *qemu_alloc_stack(size_t sz)
 {
 void *ptr, *guardpage;
+#ifdef CONFIG_DEBUG_STACK_USAGE
+void *ptr2;
+#endif
 size_t pagesz = getpagesize();
 sz = adjust_stack_size(sz);
 
@@ -535,11 +542,41 @@ void *qemu_alloc_stack(size_t sz)
 abort();
 }
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+for (ptr2 = ptr + pagesz; ptr2 < ptr + sz; ptr2 += sizeof(uint32_t)) {
+*(uint32_t *)ptr2 = 0xdeadbeaf;
+}
+#endif
+
 return ptr;
 }
 
+#ifdef CONFIG_DEBUG_STACK_USAGE
+static __thread unsigned int max_stack_usage;
+#endif
+
 void qemu_free_stack(void *stack, size_t sz)
 {
+#ifdef CONFIG_DEBUG_STACK_USAGE
+unsigned int usage;
+void *ptr;
+#endif
 sz = adjust_stack_size(sz);
+
+#ifdef CONFIG_DEBUG_STACK_USAGE
+for (ptr = stack + getpagesize(); ptr < stack + sz;
+ ptr += sizeof(uint32_t)) {
+if (*(uint32_t *)ptr != 0xdeadbeaf) {
+break;
+}
+}
+usage = sz - (uintptr_t) (ptr - stack);
+if (usage > max_stack_usage) {
+error_report("thread %d max stack usage increased from %u to %u",
+ qemu_get_thread_id(), max_stack_usage, usage);
+max_stack_usage = usage;
+}
+#endif
+
 munmap(stack, sz);
 }
-- 
1.9.1




[Qemu-devel] [RESEND PATCH V5 1/6] oslib-posix: add helpers for stack alloc and free

2016-08-04 Thread Peter Lieven
The allocated stack will be adjusted to the minimum stack size supported
by the OS and rounded up to a multiple of the system page size.
Additionally, an architecture-dependent guard page is added to the stack
to catch stack overflows.

Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Eric Blake <ebl...@redhat.com>
---
 include/sysemu/os-posix.h | 23 +++
 util/oslib-posix.c| 46 ++
 2 files changed, 69 insertions(+)

diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index 9c7dfdf..7630665 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -60,4 +60,27 @@ int qemu_utimens(const char *path, const qemu_timespec *times);
 
 bool is_daemonized(void);
 
+/**
+ * qemu_alloc_stack:
+ * @sz: size of required stack in bytes
+ *
+ * Allocate memory that can be used as a stack, for instance for
+ * coroutines. If the memory cannot be allocated, this function
+ * will abort (like g_malloc()).
+ *
+ * The allocated stack must be freed with qemu_free_stack().
+ *
+ * Returns: pointer to (the lowest address of) the stack memory.
+ */
+void *qemu_alloc_stack(size_t sz);
+
+/**
+ * qemu_free_stack:
+ * @stack: stack to free
+ * @sz: size of stack in bytes
+ *
+ * Free a stack allocated via qemu_alloc_stack().
+ */
+void qemu_free_stack(void *stack, size_t sz);
+
 #endif
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index e2e1d4d..2303ca6 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -497,3 +497,49 @@ pid_t qemu_fork(Error **errp)
 }
 return pid;
 }
+
+static size_t adjust_stack_size(size_t sz)
+{
+#ifdef _SC_THREAD_STACK_MIN
+/* avoid stacks smaller than _SC_THREAD_STACK_MIN */
+sz = MAX(MAX(sysconf(_SC_THREAD_STACK_MIN), 0), sz);
+#endif
+/* adjust stack size to a multiple of the page size */
+sz = ROUND_UP(sz, getpagesize());
+return sz;
+}
+
+void *qemu_alloc_stack(size_t sz)
+{
+void *ptr, *guardpage;
+size_t pagesz = getpagesize();
+sz = adjust_stack_size(sz);
+
+ptr = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+if (ptr == MAP_FAILED) {
+abort();
+}
+
+#if defined(HOST_IA64)
+/* separate register stack */
+guardpage = ptr + (((sz - pagesz) / 2) & ~pagesz);
+#elif defined(HOST_HPPA)
+/* stack grows up */
+guardpage = ptr + sz - pagesz;
+#else
+/* stack grows down */
+guardpage = ptr;
+#endif
+if (mprotect(guardpage, pagesz, PROT_NONE) != 0) {
+abort();
+}
+
+return ptr;
+}
+
+void qemu_free_stack(void *stack, size_t sz)
+{
+sz = adjust_stack_size(sz);
+munmap(stack, sz);
+}
-- 
1.9.1




[Qemu-devel] [RESEND PATCH V5 3/6] coroutine-ucontext: use helper for allocating stack memory

2016-08-04 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
Reviewed-by: Richard Henderson <r...@twiddle.net>
---
 util/coroutine-ucontext.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 31254ab..b7dea8c 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -82,7 +82,6 @@ static void coroutine_trampoline(int i0, int i1)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 ucontext_t old_uc, uc;
 sigjmp_buf old_env;
@@ -101,17 +100,17 @@ Coroutine *qemu_coroutine_new(void)
 }
 
 co = g_malloc0(sizeof(*co));
-co->stack = g_malloc(stack_size);
+co->stack = qemu_alloc_stack(COROUTINE_STACK_SIZE);
 co->base.entry_arg = &old_env; /* stash away our jmp_buf */
 
 uc.uc_link = &old_uc;
 uc.uc_stack.ss_sp = co->stack;
-uc.uc_stack.ss_size = stack_size;
+uc.uc_stack.ss_size = COROUTINE_STACK_SIZE;
 uc.uc_stack.ss_flags = 0;
 
 #ifdef CONFIG_VALGRIND_H
 co->valgrind_stack_id =
-VALGRIND_STACK_REGISTER(co->stack, co->stack + stack_size);
+VALGRIND_STACK_REGISTER(co->stack, co->stack + COROUTINE_STACK_SIZE);
 #endif
 
 arg.p = co;
@@ -149,7 +148,7 @@ void qemu_coroutine_delete(Coroutine *co_)
 valgrind_stack_deregister(co);
 #endif
 
-g_free(co->stack);
+qemu_free_stack(co->stack, COROUTINE_STACK_SIZE);
 g_free(co);
 }
 
-- 
1.9.1




[Qemu-devel] [RESEND PATCH V5 4/6] coroutine-sigaltstack: use helper for allocating stack memory

2016-08-04 Thread Peter Lieven
Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Paolo Bonzini <pbonz...@redhat.com>
Reviewed-by: Richard Henderson <r...@twiddle.net>
---
 util/coroutine-sigaltstack.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index 9c2854c..ccf4861 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -143,7 +143,6 @@ static void coroutine_trampoline(int signal)
 
 Coroutine *qemu_coroutine_new(void)
 {
-const size_t stack_size = COROUTINE_STACK_SIZE;
 CoroutineUContext *co;
 CoroutineThreadState *coTS;
 struct sigaction sa;
@@ -164,7 +163,7 @@ Coroutine *qemu_coroutine_new(void)
  */
 
 co = g_malloc0(sizeof(*co));
-co->stack = g_malloc(stack_size);
+co->stack = qemu_alloc_stack(COROUTINE_STACK_SIZE);
 co->base.entry_arg = &old_env; /* stash away our jmp_buf */
 
 coTS = coroutine_get_thread_state();
@@ -189,7 +188,7 @@ Coroutine *qemu_coroutine_new(void)
  * Set the new stack.
  */
 ss.ss_sp = co->stack;
-ss.ss_size = stack_size;
+ss.ss_size = COROUTINE_STACK_SIZE;
 ss.ss_flags = 0;
 if (sigaltstack(&ss, &oss) < 0) {
 abort();
@@ -253,7 +252,7 @@ void qemu_coroutine_delete(Coroutine *co_)
 {
 CoroutineUContext *co = DO_UPCAST(CoroutineUContext, base, co_);
 
-g_free(co->stack);
+qemu_free_stack(co->stack, COROUTINE_STACK_SIZE);
 g_free(co);
 }
 
-- 
1.9.1




[Qemu-devel] [RESEND PATCH V5 0/6] coroutine: mmap stack memory and stack size

2016-08-04 Thread Peter Lieven
I decided to split this from the rest of the QEMU RSS usage series as
it contains the more or less non-contentious patches.

I omitted the MAP_GROWSDOWN flag in mmap as we are not 100% sure which
side effects it has.

I kept the guard page, which now also nicely makes the stacks visible
in smaps. The old version of the relevant patch lacked the MAP_FIXED
flag in the second call to mmap.

The last patch, which reduces the stack size of coroutines to 64kB,
may be omitted if it is found too risky.

v4->v5:
 - Patch 1: check if _SC_THREAD_STACK_MIN is defined
 - Patch 1: guard against sysconf(_SC_THREAD_STACK_MIN) returning -1 [Eric]

v3->v4:
 - Patch 1: add a static function to adjust the stack size [Richard]
 - Patch 1: round up the stack size to multiple of the pagesize.

v2->v3:
 - Patch 1,6: adjusted commit message to mention the guard page [Markus]

v1->v2:
 - Patch 1: added an architecture-dependent guard page [Richard]
 - Patch 1: avoid stacks smaller than _SC_THREAD_STACK_MIN [Richard]
 - Patch 1: use mmap+mprotect instead of mmap+mmap [Richard]
 - Patch 5: u_int32_t -> uint32_t [Richard]
 - Patch 5: only available if stack grows down

Peter Lieven (6):
  oslib-posix: add helpers for stack alloc and free
  coroutine: add a macro for the coroutine stack size
  coroutine-ucontext: use helper for allocating stack memory
  coroutine-sigaltstack: use helper for allocating stack memory
  oslib-posix: add a configure switch to debug stack usage
  coroutine: reduce stack size to 64kB

 configure| 19 ++
 include/qemu/coroutine_int.h |  2 ++
 include/sysemu/os-posix.h| 23 
 util/coroutine-sigaltstack.c |  7 ++--
 util/coroutine-ucontext.c|  9 +++--
 util/coroutine-win32.c   |  2 +-
 util/oslib-posix.c   | 83 
 7 files changed, 135 insertions(+), 10 deletions(-)

-- 
1.9.1




[Qemu-devel] [RESEND PATCH V5 6/6] coroutine: reduce stack size to 64kB

2016-08-04 Thread Peter Lieven
Evaluation with the recently introduced maximum stack usage monitoring revealed
that the actual stack usage was never above 4kB, so allocating a 1MB stack
for each coroutine wastes a lot of memory. Reduce the stack size to
64kB, which should still give enough headroom. The guard page added
in qemu_alloc_stack will catch a potential stack overflow introduced
by this commit.

Signed-off-by: Peter Lieven <p...@kamp.de>
Reviewed-by: Eric Blake <ebl...@redhat.com>
---
 include/qemu/coroutine_int.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index eac323a..f84d777 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -28,7 +28,7 @@
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"
 
-#define COROUTINE_STACK_SIZE (1 << 20)
+#define COROUTINE_STACK_SIZE (1 << 16)
 
 typedef enum {
 COROUTINE_YIELD = 1,
-- 
1.9.1




Re: [Qemu-devel] [PATCH] block/iscsi: Adding iser support in Libiscsi-QEMU

2016-07-28 Thread Peter Lieven

On 27.07.2016 at 12:02, Roy Shterman wrote:

iSER is a new transport layer supported in libiscsi;
it provides a zero-copy, RDMA-capable interface that can
improve performance.

A new API is introduced as an abstraction of the libiscsi transport layer.
In order to use the new iSER transport, one needs to add the '?iser' option
at the end of the libiscsi URI.

For now, iSER memory buffers are pre-allocated and pre-registered.
Hence, in order to work with iSER from QEMU, one needs to set the MEMLOCK
attribute of the VM large enough for all iSER buffers and RDMA
resources.

New functionality is also introduced in this commit: an API
for zero-copy command submission. iSER differs from TCP in the
data path, hence the I/O vectors must already be transferred when queueing
the PDU.

Signed-off-by: Roy Shterman 
---
  block/iscsi.c |   45 +
  1 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 7e78ade..6b95636 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -41,6 +41,7 @@
  #include "qapi/qmp/qstring.h"
  #include "crypto/secret.h"
  
+#include "qemu/uri.h"

  #include 
  #include 
  
@@ -484,6 +485,18 @@ iscsi_co_writev_flags(BlockDriverState *bs, int64_t sector_num, int nb_sectors,

   iscsi_co_init_iscsitask(iscsilun, &iTask);
  retry:
  if (iscsilun->use_16_for_rw) {
+#if LIBISCSI_API_VERSION >= (20160603)
+iTask.task = iscsi_write16_iov_task(iscsilun->iscsi, iscsilun->lun, lba,
+NULL, num_sectors * iscsilun->block_size,
+iscsilun->block_size, 0, 0, fua, 0, 0,
+iscsi_co_generic_cb, &iTask, (struct scsi_iovec *)iov->iov, iov->niov);
+} else {
+iTask.task = iscsi_write10_iov_task(iscsilun->iscsi, iscsilun->lun, lba,
+NULL, num_sectors * iscsilun->block_size,
+iscsilun->block_size, 0, 0, fua, 0, 0,
+iscsi_co_generic_cb, &iTask, (struct scsi_iovec *)iov->iov, iov->niov);
+}
+#else
   iTask.task = iscsi_write16_task(iscsilun->iscsi, iscsilun->lun, lba,
   NULL, num_sectors * iscsilun->block_size,
   iscsilun->block_size, 0, 0, fua, 0, 0,
@@ -494,11 +507,14 @@ retry:
   iscsilun->block_size, 0, 0, fua, 0, 0,
   iscsi_co_generic_cb, &iTask);
  }
+#endif
  if (iTask.task == NULL) {
  return -ENOMEM;
  }
+#if LIBISCSI_API_VERSION < (20160603)
   scsi_task_set_iov_out(iTask.task, (struct scsi_iovec *) iov->iov, iov->niov);
+#endif
  while (!iTask.complete) {
  iscsi_set_events(iscsilun);
  qemu_coroutine_yield();
@@ -677,6 +693,19 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState *bs,
   iscsi_co_init_iscsitask(iscsilun, &iTask);
  retry:
  if (iscsilun->use_16_for_rw) {
+#if LIBISCSI_API_VERSION >= (20160603)
+iTask.task = iscsi_read16_iov_task(iscsilun->iscsi, iscsilun->lun, lba,
+   num_sectors * iscsilun->block_size,
+   iscsilun->block_size, 0, 0, 0, 0, 0,
+   iscsi_co_generic_cb, &iTask, (struct scsi_iovec *)iov->iov, iov->niov);
+} else {
+iTask.task = iscsi_read10_iov_task(iscsilun->iscsi, iscsilun->lun, lba,
+   num_sectors * iscsilun->block_size,
+   iscsilun->block_size,
+   0, 0, 0, 0, 0,
+   iscsi_co_generic_cb, &iTask, (struct scsi_iovec *)iov->iov, iov->niov);
+}
+#else
  iTask.task = iscsi_read16_task(iscsilun->iscsi, iscsilun->lun, lba,
 num_sectors * iscsilun->block_size,
 iscsilun->block_size, 0, 0, 0, 0, 0,
@@ -688,11 +717,13 @@ retry:
 0, 0, 0, 0, 0,
  iscsi_co_generic_cb, &iTask);
  }
+#endif
  if (iTask.task == NULL) {
  return -ENOMEM;
  }
+#if LIBISCSI_API_VERSION < (20160603)
   scsi_task_set_iov_in(iTask.task, (struct scsi_iovec *) iov->iov, iov->niov);
-
+#endif
  while (!iTask.complete) {
  iscsi_set_events(iscsilun);
  qemu_coroutine_yield();
@@ -1477,9 +1508,9 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
 
   filename = qemu_opt_get(opts, "filename");
 
-iscsi_url = iscsi_parse_full_url(iscsi, filename);
+iscsi_url = iscsi_parse_full_url(iscsi, uri_string_unescape(filename, -1, NULL));
   if (iscsi_url == NULL) {
-

Re: [Qemu-devel] [PATCH V5 0/6] coroutine: mmap stack memory and stack size

2016-07-27 Thread Peter Lieven
On 12.07.2016 at 18:23, Peter Lieven wrote:
> I decided to split this from the rest of the Qemu RSS usage series as
> it contains the more or less non contentious patches.
>
> I omitted the MAP_GROWSDOWN flag in mmap as we are not 100% sure which
> side effects it has.
>
> I kept the guard page which is now nicely makes the stacks visible in
> smaps. The old version of the relevent patch lacked the MAP_FIXED flag
> in the second call to mmap.
>
> The last patch which reduces the stack size of coroutines to 64kB
> may be omitted if its found to risky.
>
> v4->v5:
>  - Patch 1: check if _SC_THREAD_STACK_MIN is defined
>  - Patch 1: guard against sysconf(_SC_THREAD_STACK_MIN) returning -1 [Eric]
>
> v3->v4:
>  - Patch 1: add a static function to adjust the stack size [Richard]
>  - Patch 1: round up the stack size to multiple of the pagesize.
>
> v2->v3:
>  - Patch 1,6: adjusted commit message to mention the guard page [Markus]
>
> v1->v2:
>  - Patch 1: added an architecture dependend guard page [Richard]
>  - Patch 1: avoid stacks smaller than _SC_THREAD_STACK_MIN [Richard]
>  - Patch 1: use mmap+mprotect instead of mmap+mmap [Richard]
>  - Patch 5: u_int32_t -> uint32_t [Richard]
>  - Patch 5: only available if stack grows down
>
> Peter Lieven (6):
>   oslib-posix: add helpers for stack alloc and free
>   coroutine: add a macro for the coroutine stack size
>   coroutine-ucontext: use helper for allocating stack memory
>   coroutine-sigaltstack: use helper for allocating stack memory
>   oslib-posix: add a configure switch to debug stack usage
>   coroutine: reduce stack size to 64kB
>
>  configure| 19 ++
>  include/qemu/coroutine_int.h |  2 ++
>  include/sysemu/os-posix.h| 23 
>  util/coroutine-sigaltstack.c |  7 ++--
>  util/coroutine-ucontext.c|  9 +++--
>  util/coroutine-win32.c   |  2 +-
>  util/oslib-posix.c   | 83 
> 
>  7 files changed, 135 insertions(+), 10 deletions(-)
>

Ping.

Peter


