Re: [Qemu-devel] [PATCH v12 5/8] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-07-29 Thread Wang, Wei W
On Sunday, July 30, 2017 12:23 PM, Michael S. Tsirkin wrote:
> On Sat, Jul 29, 2017 at 08:47:08PM +0800, Wei Wang wrote:
> > On 07/29/2017 07:08 AM, Michael S. Tsirkin wrote:
> > > On Thu, Jul 27, 2017 at 10:50:11AM +0800, Wei Wang wrote:
> > > > > > > OK I thought this over. While we might need these new APIs
> > > > > > > in the future, I think that at the moment, there's a way to
> > > > > > > implement this feature that is significantly simpler. Just
> > > > > > > add each s/g as a separate input buffer.
> > > > > > Should it be an output buffer?
> > > > > Hypervisor overwrites these pages with zeroes. Therefore it is
> > > > > writeable by device: DMA_FROM_DEVICE.
> > > > Why would the hypervisor need to zero the buffer?
> > > The page is supplied to hypervisor and can lose the value that is
> > > there.  That is the definition of writeable by device.
> >
> > I think for the free pages, it should be clear that they will be added
> > as output buffer to the device, because (as we discussed) they are
> > just hints, and some of them may be used by the guest after the report_ API 
> > is
> invoked.
> > The device/hypervisor should not use or discard them.
> 
> Discarding contents is exactly what you propose doing if migration is going 
> on,
> isn't it?

That's actually a different concept. Please let me explain it with this example:

The hypervisor receives the hint saying the guest PageX is a free page, but as 
we know, 
after that report_ API exits, the guest kernel may take PageX to use, so PageX 
is not free
page any more. At this time, if the hypervisor writes to the page, that would 
crash the guest.
So, I think the cornerstone of this work is that the hypervisor should not 
touch the
reported pages.

Best,
Wei



[Qemu-devel] [Bug 1473451] Re: Please support the native bios format for dec alpha

2017-07-29 Thread Richard Henderson
QEMU does not really implement a "true" ev67.

We cheat and implement something that is significantly faster to emulate.
E.g. doing all TLB refill within qemu, rather than in the PALcode.

So, no, there's no chance of running true SRM or ARC firmware.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1473451

Title:
  Please support the native bios format for dec alpha

Status in QEMU:
  New

Bug description:
  Currently qemu-system-alpha -bios parameter takes an ELF image.
  However HP maintains firmware updates for those systems.

  Some example rom files can be found here
  
ftp://ftp.hp.com/pub/alphaserver/firmware/current_platforms/v7.3_release/DS20_DS20e/

  It might allow things like using the SRM firmware.
  The ARC(nt) firmware would allow to build and test windows applications for 
that platforms without having the relevant hardware

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1473451/+subscriptions



Re: [Qemu-devel] [PATCH v12 5/8] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-07-29 Thread Michael S. Tsirkin
On Sat, Jul 29, 2017 at 08:47:08PM +0800, Wei Wang wrote:
> On 07/29/2017 07:08 AM, Michael S. Tsirkin wrote:
> > On Thu, Jul 27, 2017 at 10:50:11AM +0800, Wei Wang wrote:
> > > > > > OK I thought this over. While we might need these new APIs in
> > > > > > the future, I think that at the moment, there's a way to implement
> > > > > > this feature that is significantly simpler. Just add each s/g
> > > > > > as a separate input buffer.
> > > > > Should it be an output buffer?
> > > > Hypervisor overwrites these pages with zeroes. Therefore it is
> > > > writeable by device: DMA_FROM_DEVICE.
> > > Why would the hypervisor need to zero the buffer?
> > The page is supplied to hypervisor and can lose the value that
> > is there.  That is the definition of writeable by device.
> 
> I think for the free pages, it should be clear that they will be added as
> output buffer to the device, because (as we discussed) they are just hints,
> and some of them may be used by the guest after the report_ API is invoked.
> The device/hypervisor should not use or discard them.

Discarding contents is exactly what you propose doing if
migration is going on, isn't it?

> For the balloon pages, I kind of agree with the existing implementation
> (e.g. inside tell_host()), which uses virtqueue_add_outbuf (instead of
> _add_inbuf()).


This is because it does not pass SGs, it passes weirdly
formatted PA within the buffer.

> I think inbuf should be a buffer which will be written by the device and
> read by the
> driver.

If hypervisor can change it, it's an inbuf. Should not matter
whether driver reads it.

> The cmd buffer put on the vq for the device to send commands can be
> an
> inbuf, I think.
> 
> 
> > 
> > > I think it may only
> > > need to read out the info(base,size).
> > And then do what?
> 
> 
> For the free pages, the info will be used to clear the corresponding "1" in
> the dirty bitmap.
> For balloon pages, they will be made DONTNEED and given to other host
> processes to
> use (the device won't write them, so no need to set "write" when
> virtqueue_map_desc() in
> the device).
> 
> 
> > 
> > > I think it should be like this:
> > > the cmd hdr buffer: input, because the hypervisor would write it to
> > > send a cmd to the guest
> > > the payload buffer: output, for the hypervisor to read the info
> > These should be split.
> > 
> > We have:
> > 
> > 1. request that hypervisor sends to guest, includes ID: input
> > 2. synchronisation header with ID sent by guest: output
> > 3. list of pages: input
> > 
> > 2 and 3 must be on the same VQ. 1 can come on any VQ - reusing stats VQ
> > might make sense.
> 
> I would prefer to make the above item 1 come on the the free page vq,
> because the existing stat_vq doesn't support the cmd hdr.
> Now, I think it is also not necessary to move the existing stat_vq
> implementation to
> a new implementation under the cmd hdr, because
> that new support doesn't make a difference for stats, it will still use its
> stat_vq (the free
> page vq will be used for reporting free pages only)
> 
> What do you think?
> 
> 
> Best,
> Wei



[Qemu-devel] [PATCH for 2.10] hw/mps2_scc: fix incorrect properties

2017-07-29 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
sorry, I missed them in my review :(

 hw/misc/mps2-scc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
index cc58d26f29..32be2a9df1 100644
--- a/hw/misc/mps2-scc.c
+++ b/hw/misc/mps2-scc.c
@@ -270,9 +270,9 @@ static Property mps2_scc_properties[] = {
 /* Values for various read-only ID registers (which are specific
  * to the board model or FPGA image)
  */
-DEFINE_PROP_UINT32("scc-cfg4", MPS2SCC, aid, 0),
+DEFINE_PROP_UINT32("scc-cfg4", MPS2SCC, cfg4, 0),
 DEFINE_PROP_UINT32("scc-aid", MPS2SCC, aid, 0),
-DEFINE_PROP_UINT32("scc-id", MPS2SCC, aid, 0),
+DEFINE_PROP_UINT32("scc-id", MPS2SCC, id, 0),
 /* These are the initial settings for the source clocks on the board.
  * In hardware they can be configured via a config file read by the
  * motherboard configuration controller to suit the FPGA image.
-- 
2.13.3




[Qemu-devel] [PATCH v6 2/4] sockets: factor out a new try_bind() function

2017-07-29 Thread Knut Omang
A refactoring step to prepare for the problem
exposed by the test-listen test in the previous commit.

Simplify and reorganize the IPv6 specific extra
measures and move it out of the for loop to increase
code readability. No semantic changes.

Signed-off-by: Knut Omang 
Reviewed-by: Daniel P. Berrange 
---
 util/qemu-sockets.c | 69 ++
 1 file changed, 39 insertions(+), 30 deletions(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 1358c81..b4a2f24 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -149,6 +149,44 @@ int inet_ai_family_from_address(InetSocketAddress *addr,
 return PF_UNSPEC;
 }
 
+static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e)
+{
+#ifndef IPV6_V6ONLY
+return bind(socket, e->ai_addr, e->ai_addrlen);
+#else
+/*
+ * Deals with first & last cases in matrix in comment
+ * for inet_ai_family_from_address().
+ */
+int v6only =
+((!saddr->has_ipv4 && !saddr->has_ipv6) ||
+ (saddr->has_ipv4 && saddr->ipv4 &&
+  saddr->has_ipv6 && saddr->ipv6)) ? 0 : 1;
+int stat;
+
+ rebind:
+if (e->ai_family == PF_INET6) {
+qemu_setsockopt(socket, IPPROTO_IPV6, IPV6_V6ONLY, ,
+sizeof(v6only));
+}
+
+stat = bind(socket, e->ai_addr, e->ai_addrlen);
+if (!stat) {
+return 0;
+}
+
+/* If we got EADDRINUSE from an IPv6 bind & v6only is unset,
+ * it could be that the IPv4 port is already claimed, so retry
+ * with v6only set
+ */
+if (e->ai_family == PF_INET6 && errno == EADDRINUSE && !v6only) {
+v6only = 1;
+goto rebind;
+}
+return stat;
+#endif
+}
+
 static int inet_listen_saddr(InetSocketAddress *saddr,
  int port_offset,
  bool update_addr,
@@ -228,39 +266,10 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
 port_min = inet_getport(e);
 port_max = saddr->has_to ? saddr->to + port_offset : port_min;
 for (p = port_min; p <= port_max; p++) {
-#ifdef IPV6_V6ONLY
-/*
- * Deals with first & last cases in matrix in comment
- * for inet_ai_family_from_address().
- */
-int v6only =
-((!saddr->has_ipv4 && !saddr->has_ipv6) ||
- (saddr->has_ipv4 && saddr->ipv4 &&
-  saddr->has_ipv6 && saddr->ipv6)) ? 0 : 1;
-#endif
 inet_setport(e, p);
-#ifdef IPV6_V6ONLY
-rebind:
-if (e->ai_family == PF_INET6) {
-qemu_setsockopt(slisten, IPPROTO_IPV6, IPV6_V6ONLY, ,
-sizeof(v6only));
-}
-#endif
-if (bind(slisten, e->ai_addr, e->ai_addrlen) == 0) {
+if (try_bind(slisten, saddr, e) >= 0) {
 goto listen;
 }
-
-#ifdef IPV6_V6ONLY
-/* If we got EADDRINUSE from an IPv6 bind & V6ONLY is unset,
- * it could be that the IPv4 port is already claimed, so retry
- * with V6ONLY set
- */
-if (e->ai_family == PF_INET6 && errno == EADDRINUSE && !v6only) {
-v6only = 1;
-goto rebind;
-}
-#endif
-
 if (p == port_max) {
 if (!e->ai_next) {
 error_setg_errno(errp, errno, "Failed to bind socket");
-- 
git-series 0.9.1



[Qemu-devel] [PATCH v6 3/4] sockets: factor out create_fast_reuse_socket

2017-07-29 Thread Knut Omang
Another refactoring step to prepare for fixing the problem
exposed with the test-listen test in the previous commit

Signed-off-by: Knut Omang 
---
 util/qemu-sockets.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index b4a2f24..d0d2047 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -149,6 +149,16 @@ int inet_ai_family_from_address(InetSocketAddress *addr,
 return PF_UNSPEC;
 }
 
+static int create_fast_reuse_socket(struct addrinfo *e)
+{
+int slisten = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol);
+if (slisten < 0) {
+return -1;
+}
+socket_set_fast_reuse(slisten);
+return slisten;
+}
+
 static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e)
 {
 #ifndef IPV6_V6ONLY
@@ -253,7 +263,8 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
 getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen,
uaddr,INET6_ADDRSTRLEN,uport,32,
NI_NUMERICHOST | NI_NUMERICSERV);
-slisten = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol);
+
+slisten = create_fast_reuse_socket(e);
 if (slisten < 0) {
 if (!e->ai_next) {
 error_setg_errno(errp, errno, "Failed to create socket");
@@ -261,8 +272,6 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
 continue;
 }
 
-socket_set_fast_reuse(slisten);
-
 port_min = inet_getport(e);
 port_max = saddr->has_to ? saddr->to + port_offset : port_min;
 for (p = port_min; p <= port_max; p++) {
-- 
git-series 0.9.1



[Qemu-devel] [PATCH v6 4/4] sockets: Handle race condition between binds to the same port

2017-07-29 Thread Knut Omang
If an offset of ports is specified to the inet_listen_saddr function(),
and two or more processes tries to bind from these ports at the same time,
occasionally more than one process may be able to bind to the same
port. The condition is detected by listen() but too late to avoid a failure.

This function is called by socket_listen() and used
by all socket listening code in QEMU, so all cases where any form of dynamic
port selection is used should be subject to this issue.

Add code to close and re-establish the socket when this
condition is observed, hiding the race condition from the user.

Also clean up some issues with error handling to allow more
accurate reporting of the cause of an error.

This has been developed and tested by means of the
test-listen unit test in the previous commit.
Enable the test for make check now that it passes.

Reviewed-by: Bhavesh Davda 
Reviewed-by: Yuval Shaia 
Reviewed-by: Girish Moodalbail 
Signed-off-by: Knut Omang 
---
 tests/Makefile.include |  2 +-
 util/qemu-sockets.c| 58 ++-
 2 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index b37c0c8..9d2131d 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -128,7 +128,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF)
 gcov-files-check-bufferiszero-y = util/bufferiszero.c
 check-unit-y += tests/test-uuid$(EXESUF)
 check-unit-y += tests/ptimer-test$(EXESUF)
-#check-unit-y += tests/test-listen$(EXESUF)
+check-unit-y += tests/test-listen$(EXESUF)
 gcov-files-ptimer-test-y = hw/core/ptimer.c
 check-unit-y += tests/test-qapi-util$(EXESUF)
 gcov-files-test-qapi-util-y = qapi/qapi-util.c
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index d0d2047..6a511fb 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -206,7 +206,10 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
 char port[33];
 char uaddr[INET6_ADDRSTRLEN+1];
 char uport[33];
-int slisten, rc, port_min, port_max, p;
+int rc, port_min, port_max, p;
+int slisten = 0;
+int saved_errno = 0;
+bool socket_created = false;
 Error *err = NULL;
 
 memset(,0, sizeof(ai));
@@ -258,7 +261,7 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
 return -1;
 }
 
-/* create socket + bind */
+/* create socket + bind/listen */
 for (e = res; e != NULL; e = e->ai_next) {
 getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen,
uaddr,INET6_ADDRSTRLEN,uport,32,
@@ -266,37 +269,58 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
 
 slisten = create_fast_reuse_socket(e);
 if (slisten < 0) {
-if (!e->ai_next) {
-error_setg_errno(errp, errno, "Failed to create socket");
-}
 continue;
 }
 
+socket_created = true;
 port_min = inet_getport(e);
 port_max = saddr->has_to ? saddr->to + port_offset : port_min;
 for (p = port_min; p <= port_max; p++) {
 inet_setport(e, p);
-if (try_bind(slisten, saddr, e) >= 0) {
-goto listen;
-}
-if (p == port_max) {
-if (!e->ai_next) {
+rc = try_bind(slisten, saddr, e);
+if (rc) {
+if (errno == EADDRINUSE) {
+continue;
+} else {
 error_setg_errno(errp, errno, "Failed to bind socket");
+goto listen_failed;
 }
 }
+if (!listen(slisten, 1)) {
+goto listen_ok;
+}
+if (errno != EADDRINUSE) {
+error_setg_errno(errp, errno, "Failed to listen on socket");
+goto listen_failed;
+}
+/* Someone else managed to bind to the same port and beat us
+ * to listen on it! Socket semantics does not allow us to
+ * recover from this situation, so we need to recreate the
+ * socket to allow bind attempts for subsequent ports:
+ */
+closesocket(slisten);
+slisten = create_fast_reuse_socket(e);
+if (slisten < 0) {
+error_setg_errno(errp, errno,
+ "Failed to recreate failed listening socket");
+goto listen_failed;
+}
 }
+}
+error_setg_errno(errp, errno,
+ socket_created ?
+ "Failed to find an available port" :
+ "Failed to create a socket");
+listen_failed:
+saved_errno = errno;
+if (slisten >= 0) {
 closesocket(slisten);
 }
 freeaddrinfo(res);
+errno = saved_errno;
 return -1;
 
-listen:
-if (listen(slisten,1) != 0) {
-

[Qemu-devel] [PATCH v6 0/4] Unit test+fix for problem with QEMU handling of multiple bind()s to the same port

2017-07-29 Thread Knut Omang
This series contains:
* a unit test that exposes a race condition which causes QEMU to fail
  to find a port even when there is plenty of available ports.
* a refactor of the qemu-sockets inet_listen_saddr() function
  to better handle this situation.

Changes from v5:
* Also move setting of error from socket creation out
  into the main double for loop.
* Simplify and enhance reporting if socket creation fails:
  Only report failure to create sockets if none of the
  provided addrinfo list elements allows creation of a
  socket, otherwise report failure to find an available port.
* Further simplify if's within the for loops and make sure
  failure to recreate a socket gets distinctively reported.
* Rebased to current master.

Changes from v4:
* Move the complexity of recreating a socket and setting the error pointer
  into the main for loop, eliminating the try_bind_listen() function
  again. Cleaning up and improving error handling in the process.

Changes from v3:
* Test changes: Add missing license
  Add subtests for ipv4, ipv6 and both
  Various g_* usage improvements
* Split patch into 3 patches with two refactoring patches ahead
  of the actual fix.

Changes from v2:
* Non-trivial rebase + further abstraction
  on top of 7ad9af343c7f1c70c8015c7c519c312d8c5f9fa1
  'tests: add functional test validating ipv4/ipv6 address flag handling'

Changes from v1:
* Fix potential uninitialized variable only detected by optimize.
* Improve unexpected error detection in test-listen to give more
  details about why the test fails unexpectedly.
* Fix some line length style issues.

Thanks,
Knut

Knut Omang (4):
  tests: Add test-listen - a stress test for QEMU socket listen
  sockets: factor out a new try_bind() function
  sockets: factor out create_fast_reuse_socket
  sockets: Handle race condition between binds to the same port

 tests/Makefile.include |   2 +-
 tests/test-listen.c| 253 ++-
 util/qemu-sockets.c| 138 +++
 3 files changed, 345 insertions(+), 48 deletions(-)
 create mode 100644 tests/test-listen.c

base-commit: a588c4985eff363154d65aee8607d0a4601655f7
-- 
git-series 0.9.1



[Qemu-devel] [PATCH v6 1/4] tests: Add test-listen - a stress test for QEMU socket listen

2017-07-29 Thread Knut Omang
There's a potential race condition between multiple bind()'s
attempting to bind to the same port, which occasionally
allows more than one bind to succeed against the same port.

When a subsequent listen() call is made with the same socket
only one will succeed.

The current QEMU code does however not take this situation into account
and the listen will cause the code to break out and fail even
when there are actually available ports to use.

This test exposes two subtests:

/socket/listen-serial
/socket/listen-compete

The "compete" subtest creates a number of threads and have them all trying to 
bind
to the same port with a large enough offset input to
allow all threads to get it's own port.
The "serial" subtest just does the same, except in series in a
single thread.

The serial version passes, probably in most versions of QEMU.

The parallel version exposes the problem in a relatively reliable way,
eg. it fails a majority of times, but not with a 100% rate, occasional
passes can be seen. Nevertheless this is quite good given that
the bug was tricky to reproduce and has been left undetected for
a while.

The problem seems to be present in all versions of QEMU.

The original failure scenario occurred with VNC port allocation
in a traditional Xen based build, in different code
but with similar functionality.

Reported-by: Bhavesh Davda 
Signed-off-by: Knut Omang 
Reviewed-by: Yuval Shaia 
Reviewed-by: Bhavesh Davda 
Reviewed-by: Girish Moodalbail 
---
 tests/Makefile.include |   2 +-
 tests/test-listen.c| 253 ++-
 2 files changed, 255 insertions(+)
 create mode 100644 tests/test-listen.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 7af278d..b37c0c8 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -128,6 +128,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF)
 gcov-files-check-bufferiszero-y = util/bufferiszero.c
 check-unit-y += tests/test-uuid$(EXESUF)
 check-unit-y += tests/ptimer-test$(EXESUF)
+#check-unit-y += tests/test-listen$(EXESUF)
 gcov-files-ptimer-test-y = hw/core/ptimer.c
 check-unit-y += tests/test-qapi-util$(EXESUF)
 gcov-files-test-qapi-util-y = qapi/qapi-util.c
@@ -769,6 +770,7 @@ tests/test-arm-mptimer$(EXESUF): tests/test-arm-mptimer.o
 tests/test-qapi-util$(EXESUF): tests/test-qapi-util.o $(test-util-obj-y)
 tests/numa-test$(EXESUF): tests/numa-test.o
 tests/vmgenid-test$(EXESUF): tests/vmgenid-test.o tests/boot-sector.o 
tests/acpi-utils.o
+tests/test-listen$(EXESUF): tests/test-listen.o $(test-util-obj-y)
 
 tests/migration/stress$(EXESUF): tests/migration/stress.o
$(call quiet-command, $(LINKPROG) -static -O3 $(PTHREAD_LIB) -o $@ $< 
,"LINK","$(TARGET_DIR)$@")
diff --git a/tests/test-listen.c b/tests/test-listen.c
new file mode 100644
index 000..5c07537
--- /dev/null
+++ b/tests/test-listen.c
@@ -0,0 +1,253 @@
+/*
+ * Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.
+ *Author: Knut Omang 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ *
+ * Test parallel port listen configuration with
+ * dynamic port allocation
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "qemu-common.h"
+#include "qemu/thread.h"
+#include "qemu/sockets.h"
+#include "qapi/error.h"
+
+#define NAME_LEN 1024
+#define PORT_LEN 16
+
+struct thr_info {
+QemuThread thread;
+int to_port;
+bool ipv4;
+bool ipv6;
+int got_port;
+int eno;
+int fd;
+const char *errstr;
+char hostname[NAME_LEN + 1];
+char port[PORT_LEN + 1];
+};
+
+
+/* These two functions taken from test-io-channel-socket.c */
+static int check_bind(const char *hostname, bool *has_proto)
+{
+int fd = -1;
+struct addrinfo ai, *res = NULL;
+int rc;
+int ret = -1;
+
+memset(, 0, sizeof(ai));
+ai.ai_flags = AI_CANONNAME | AI_ADDRCONFIG;
+ai.ai_family = AF_UNSPEC;
+ai.ai_socktype = SOCK_STREAM;
+
+/* lookup */
+rc = getaddrinfo(hostname, NULL, , );
+if (rc != 0) {
+if (rc == EAI_ADDRFAMILY ||
+rc == EAI_FAMILY) {
+*has_proto = false;
+goto done;
+}
+goto cleanup;
+}
+
+fd = qemu_socket(res->ai_family, res->ai_socktype, res->ai_protocol);
+if (fd < 0) {
+goto cleanup;
+}
+
+if (bind(fd, res->ai_addr, res->ai_addrlen) < 0) {
+if (errno == EADDRNOTAVAIL) {
+*has_proto = false;
+goto done;
+}
+goto cleanup;
+}
+
+*has_proto = true;
+ done:
+ret = 0;
+
+ cleanup:
+if (fd != -1) {
+close(fd);
+}
+if (res) {
+freeaddrinfo(res);
+}
+return ret;
+}
+
+static int 

Re: [Qemu-devel] [PATCH v5 2/4] sockets: factor out create_fast_reuse_socket

2017-07-29 Thread Knut Omang
On Tue, 2017-07-25 at 10:38 +0100, Daniel P. Berrange wrote:
> On Sat, Jul 22, 2017 at 09:49:31AM +0200, Knut Omang wrote:
> > First refactoring step to prepare for fixing the problem
> > exposed with the test-listen test in the previous commit
> > 
> > Signed-off-by: Knut Omang 
> > ---
> >  util/qemu-sockets.c | 24 +---
> >  1 file changed, 17 insertions(+), 7 deletions(-)
> > 
> > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> > index 1358c81..578f25b 100644
> > --- a/util/qemu-sockets.c
> > +++ b/util/qemu-sockets.c
> > @@ -149,6 +149,20 @@ int inet_ai_family_from_address(InetSocketAddress 
> > *addr,
> >  return PF_UNSPEC;
> >  }
> >  
> > +static int create_fast_reuse_socket(struct addrinfo *e, Error **errp)
> > +{
> > +int slisten = qemu_socket(e->ai_family, e->ai_socktype, 
> > e->ai_protocol);
> > +if (slisten < 0) {
> > +if (!e->ai_next) {
> > +error_setg_errno(errp, errno, "Failed to create socket");
> > +}
> > +return -1;
> > +}
> > +
> > +socket_set_fast_reuse(slisten);
> > +return slisten;
> > +}
> 
> As mentioned in the previous review, I don't think we should have
> methods like this which sometimes report an error on failure, and
> sometimes don't report an error. It makes it hard to review the
> callers for correctness of error handling. 

Sorry, somehow this one slipped through..

> 
> > +
> >  static int inet_listen_saddr(InetSocketAddress *saddr,
> >   int port_offset,
> >   bool update_addr,
> > @@ -210,21 +224,17 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
> >  return -1;
> >  }
> >  
> > -/* create socket + bind */
> > +/* create socket + bind/listen */
> >  for (e = res; e != NULL; e = e->ai_next) {
> >  getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen,
> >     uaddr,INET6_ADDRSTRLEN,uport,32,
> >     NI_NUMERICHOST | NI_NUMERICSERV);
> > -slisten = qemu_socket(e->ai_family, e->ai_socktype, 
> > e->ai_protocol);
> > +
> > +slisten = create_fast_reuse_socket(e, );
> >  if (slisten < 0) {
> > -if (!e->ai_next) {
> > -error_setg_errno(errp, errno, "Failed to create socket");
> > -}
> 
> So please leave this here.

Since we are already outside of the scope of the original patch, 
let me suggest:

As far as I can see there's no good reason to have 
different behavior if creating sockets on the last 'struct addrinfo'
in the list fails than if an intermediate addrinfo fails and the next 
addrinfo in the list just happened to have no free ports.

As far as I can see the 'Failed to create socket' message will be
most useful to the user if creating a socket fails on all 
elements in the list, to detect issues with the network setup.

I have implemented this semantics in v6 - it should be an enhancement while 
also simplifying the code inside the for loops, getting rid  
of the conditional error setting and avoid code duplication by 
handling the same error twice in proximity.

Thanks,
Knut

> 
> >  continue;
> >  }
> >  
> > -socket_set_fast_reuse(slisten);
> > -
> >  port_min = inet_getport(e);
> >  port_max = saddr->has_to ? saddr->to + port_offset : port_min;
> >  for (p = port_min; p <= port_max; p++) {
> 
> 
> Regards,
> Daniel

Re: [Qemu-devel] [PATCH v5 4/4] sockets: Handle race condition between binds to the same port

2017-07-29 Thread Knut Omang
On Tue, 2017-07-25 at 10:54 +0100, Daniel P. Berrange wrote:
> On Sat, Jul 22, 2017 at 09:49:33AM +0200, Knut Omang wrote:
> > If an offset of ports is specified to the inet_listen_saddr function(),
> > and two or more processes tries to bind from these ports at the same time,
> > occasionally more than one process may be able to bind to the same
> > port. The condition is detected by listen() but too late to avoid a failure.
> > 
> > This function is called by socket_listen() and used
> > by all socket listening code in QEMU, so all cases where any form of dynamic
> > port selection is used should be subject to this issue.
> > 
> > Add code to close and re-establish the socket when this
> > condition is observed, hiding the race condition from the user.
> > 
> > Also clean up some issues with error handling to allow more
> > accurate reporting of the cause of an error.
> > 
> > This has been developed and tested by means of the
> > test-listen unit test in the previous commit.
> > Enable the test for make check now that it passes.
> > 
> > Signed-off-by: Knut Omang 
> > Reviewed-by: Bhavesh Davda 
> > Reviewed-by: Yuval Shaia 
> > Reviewed-by: Girish Moodalbail 
> > Signed-off-by: Knut Omang 
> > ---
> >  tests/Makefile.include |  2 +-
> >  util/qemu-sockets.c| 51 ---
> >  2 files changed, 39 insertions(+), 14 deletions(-)
> > 
> > diff --git a/tests/Makefile.include b/tests/Makefile.include
> > index b37c0c8..9d2131d 100644
> > --- a/tests/Makefile.include
> > +++ b/tests/Makefile.include
> > @@ -128,7 +128,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF)
> >  gcov-files-check-bufferiszero-y = util/bufferiszero.c
> >  check-unit-y += tests/test-uuid$(EXESUF)
> >  check-unit-y += tests/ptimer-test$(EXESUF)
> > -#check-unit-y += tests/test-listen$(EXESUF)
> > +check-unit-y += tests/test-listen$(EXESUF)
> >  gcov-files-ptimer-test-y = hw/core/ptimer.c
> >  check-unit-y += tests/test-qapi-util$(EXESUF)
> >  gcov-files-test-qapi-util-y = qapi/qapi-util.c
> > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> > index 39f207c..8ca3ba6 100644
> > --- a/util/qemu-sockets.c
> > +++ b/util/qemu-sockets.c
> > @@ -210,7 +210,9 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
> >  char port[33];
> >  char uaddr[INET6_ADDRSTRLEN+1];
> >  char uport[33];
> > -int slisten, rc, port_min, port_max, p;
> > +int rc, port_min, port_max, p;
> > +int slisten = 0;
> > +int saved_errno = 0;
> >  Error *err = NULL;
> >  
> >  memset(,0, sizeof(ai));
> > @@ -277,27 +279,50 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
> >  port_max = saddr->has_to ? saddr->to + port_offset : port_min;
> >  for (p = port_min; p <= port_max; p++) {
> >  inet_setport(e, p);
> > -if (try_bind(slisten, saddr, e) >= 0) {
> > -goto listen;
> > -}
> > -if (p == port_max) {
> > -if (!e->ai_next) {
> > +rc = try_bind(slisten, saddr, e);
> > +if (rc) {
> > +if (errno == EADDRINUSE) {
> 
> Add in  '&& e->ai_next' here, so we trigger the error handling
> on the else branch if this was the last address we had.

See my comment on patch 2 - with that reporting semantics, inability to
create a socket will only be reported if none of the addrinfo structs
allowed socket creation so this reporting case goes away.

> > +continue;
> > +} else {
> >  error_setg_errno(errp, errno, "Failed to bind socket");
> > +goto listen_failed;
> > +}
> > +}
> > +rc = listen(slisten, 1);
> > +if (!rc) {
> 
> Assigning to rc seems redundant here.

removed.

> > +goto listen_ok;
> > +} else if (errno == EADDRINUSE) {
> > +/* Someone else managed to bind to the same port and beat 
> > us
> > + * to listen on it! Socket semantics does not allow us to
> > + * recover from this situation, so we need to recreate the
> > + * socket to allow bind attempts for subsequent ports:
> > + */
> > +closesocket(slisten);
> > +slisten = create_fast_reuse_socket(e, errp);
> 
> create_fast_reuse_socket may report an error here
> 
> > +if (slisten >= 0) {
> > +continue;
> >  }
> >  }
> > +error_setg_errno(errp, errno, "Failed to listen on socket");
> 
> at which point you report another error on top of it with a stale
> errno value here. This bug would be more obvious if error reporting
> was pulled out of create_fast_reuse_socket() as described in the
> earlier patch.
> 
> I think it'd be clearer if 

Re: [Qemu-devel] [PATCH] 9pfs: include for XATTR_SIZE_MAX

2017-07-29 Thread Peter Maydell
On 29 July 2017 at 14:50, Patrick Steinhardt  wrote:
> On Fri, Jul 28, 2017 at 02:20:49PM -0300, Philippe Mathieu-Daudé wrote:
>> This is likely to break on BSD, but now than patchew has a NetBSD job
>> you can trigger a build RESENDing this patch.

> Thanks for the feedback! Is this really relevant in this context,
> though? Both 9p-local.c and 9p-handle.c already include linux
> headers, especially  is included without any ifdef
> around. As such, I simply assumed that this code is being built
> on Linux systems, only.

Yes; hw/Makefile.objs only includes the 9pfs directory
if CONFIG_VIRTFS is defined, and configure only sets that
on Linux hosts which have libpcap and libattr available.

thanks
-- PMM



[Qemu-devel] [PATCH v4 1/3] block: add bdrv_get_format_alloc_stat format interface

2017-07-29 Thread Vladimir Sementsov-Ogievskiy
The function should collect statistics, about used/unused by top-level
format driver space (in its .file) and allocation status
(data/zero/discarded/after-eof) of corresponding areas in this .file.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block.c   | 16 +++
 include/block/block.h |  3 ++
 include/block/block_int.h |  2 ++
 qapi/block-core.json  | 72 +++
 4 files changed, 93 insertions(+)

diff --git a/block.c b/block.c
index 50ba264143..7d720ae0c2 100644
--- a/block.c
+++ b/block.c
@@ -3407,6 +3407,22 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState 
*bs)
 }
 
 /**
+ * Collect format allocation info. See BlockFormatAllocInfo definition in
+ * qapi/block-core.json.
+ */
+int bdrv_get_format_alloc_stat(BlockDriverState *bs, BlockFormatAllocInfo 
*bfai)
+{
+BlockDriver *drv = bs->drv;
+if (!drv) {
+return -ENOMEDIUM;
+}
+if (drv->bdrv_get_format_alloc_stat) {
+return drv->bdrv_get_format_alloc_stat(bs, bfai);
+}
+return -ENOTSUP;
+}
+
+/**
  * Return number of sectors on success, -errno on error.
  */
 int64_t bdrv_nb_sectors(BlockDriverState *bs)
diff --git a/include/block/block.h b/include/block/block.h
index 9b355e92d8..646376a772 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -335,6 +335,9 @@ typedef enum {
 
 int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix);
 
+int bdrv_get_format_alloc_stat(BlockDriverState *bs,
+   BlockFormatAllocInfo *bfai);
+
 /* The units of offset and total_work_size may be chosen arbitrarily by the
  * block driver; total_work_size may change during the course of the amendment
  * operation */
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8d3724cce6..458c715e99 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -208,6 +208,8 @@ struct BlockDriver {
 int64_t (*bdrv_getlength)(BlockDriverState *bs);
 bool has_variable_length;
 int64_t (*bdrv_get_allocated_file_size)(BlockDriverState *bs);
+int (*bdrv_get_format_alloc_stat)(BlockDriverState *bs,
+  BlockFormatAllocInfo *bfai);
 
 int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs,
 uint64_t offset, uint64_t bytes, QEMUIOVector *qiov);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ea0b3e8b13..93f6995381 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -139,6 +139,78 @@
'*format-specific': 'ImageInfoSpecific' } }
 
 ##
+# @BlockFormatAllocInfo:
+#
+#
+# Allocation information of an underlying protocol file, as partitioned by a
+# format driver's utilization of said allocations.
+# All fields are in bytes.
+#
+# Regions of the underlying protocol file may be considered used or unused by
+# the format driver interpreting these regions. It is at the discretion of the
+# format driver (e.g. qcow2) which regions of its backing storage are 
considered
+# in-use or not.
+#
+# For now, the only format driver supporting this feature is Qcow2 which is a
+# cluster based format. Clusters considered in-use by qcow2 are those with a
+# non-zero refcount in the format metadata. All other clusters, if present, are
+# considered unused. Examples of unused allocations for the Qcow2 format are
+# leaked clusters, pre-allocated clusters, and recently freed clusters.
+#
+# Note: the whole underlying protocol file is described as well as all format
+# file allocations, not only virtual disk data (metadata, internal snapshots,
+# etc. are included).
+#
+# For the underlying protocol file there are native block-status types of the
+# regions:
+#  - data: allocated data
+#  - zero: reported as zero (for example, this type corresponds to holes for
+#  POSIX files on sparce file-system)
+#  - discarded: not allocated
+# 4th additional type is 'overrun', is data referenced by the format driver
+# located beyond EOF of the underlying protocol file. For example, a partially
+# allocated cluster at the end of a QCOW2 file, where Qcow2 generally operates
+# on complete clusters.
+#
+# So, the fields are:
+#
+# @used-data: used by the format file and backed by data in the underlying
+# protocol file
+#
+# @used-zero: used by the format file and backed by zeroes in the underlying
+# protocol file; which may be a filesystem hole for POSIX files.
+#
+# @used-discarded: used by the format file but actually unallocated in the
+#  underlying protocol file
+#
+# @used-overrun: used by the format file beyond the end of the underlying
+#protocol file
+#
+# @unused-data: allocated data in the underlying protocol file not used by the
+#   format file
+#
+# @unused-zero: reported-as-zero regions in the underlying protocol file not
+#   used by the format file
+#
+# @unused-discarded: 

[Qemu-devel] [PATCH v4 0/3] qemu-img check: format allocation info

2017-07-29 Thread Vladimir Sementsov-Ogievskiy
Hi all.

See 01 patch for the doc.

Question to discuss.
If I understand correctly get_block_status flags allocated, zero, and data
actually provide 5 possible combinations, which I combine into three.

allocated data zero
1 11\__ data
1 10/
1 01\__ zero
0 01/
0 00___ discarded

This division looks not bad, but it is not the only one possible.
Separating data is really useful - it shows leaked clusters..
So the question is, don't we want to adjust the division?
I'm ok with the current one.

v4: - reword docs in 01
- s/2.10/2.11/ for qapi

v3: - improve docs
- rename fields
- add 'zero' type of underlying file portions status. It as these
  areas cannot be presented as 'discarded', but they are not
  occupying real space, so we don't want to present them as
  'allocated data' too.
- remove last patch. It is not needed after introducing new naming
  for the fields

v2: fix build error, gcc things that some variables may be used
uninitialized (actually they didn't).

v1: 

These series is a replacement for "qemu-img check: unallocated size"
series.

There was a question, should we account allocated clusters in qcow2 but
actually holes in underalying file as allocated or not. Instead of
hiding this information in one-number statistic I've decided to print
the whole information, 5 numbers:

For allocated by top-level format driver (qcow2 for ex.) clusters, 3
numbers: number of bytes, which are:
 - allocated in underlying file
 - holes in underlying file
 - after end of underlying file

To account other areas of underlying file, 2 more numbers of bytes:
 - unallocated by top-level driver but allocated in underlying file
 - unallocated by top-level driver and holes in underlying file

Vladimir Sementsov-Ogievskiy (3):
  block: add bdrv_get_format_alloc_stat format interface
  qcow2: add .bdrv_get_format_alloc_stat
  qemu-img check: add format allocation info

 block.c   |  16 +
 block/qcow2-refcount.c| 152 ++
 block/qcow2.c |   2 +
 block/qcow2.h |   2 +
 include/block/block.h |   3 +
 include/block/block_int.h |   2 +
 qapi/block-core.json  |  78 +++-
 qemu-img.c|  42 +
 8 files changed, 296 insertions(+), 1 deletion(-)

-- 
2.11.1




[Qemu-devel] [PATCH v4 3/3] qemu-img check: add format allocation info

2017-07-29 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json |  6 +-
 qemu-img.c   | 42 ++
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 93f6995381..d662786261 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -249,6 +249,9 @@
 #   field is present if the driver for the image format
 #   supports it
 #
+# @format-alloc-info: Format-allocation information, see
+# BlockFormatAllocInfo description. (Since: 2.11)
+#
 # Since: 1.4
 #
 ##
@@ -257,7 +260,8 @@
'*image-end-offset': 'int', '*corruptions': 'int', '*leaks': 'int',
'*corruptions-fixed': 'int', '*leaks-fixed': 'int',
'*total-clusters': 'int', '*allocated-clusters': 'int',
-   '*fragmented-clusters': 'int', '*compressed-clusters': 'int' } }
+   '*fragmented-clusters': 'int', '*compressed-clusters': 'int',
+   '*format-alloc-info': 'BlockFormatAllocInfo' } }
 
 ##
 # @MapEntry:
diff --git a/qemu-img.c b/qemu-img.c
index b506839ef0..b3adf9a1a2 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -560,6 +560,35 @@ static void dump_json_image_check(ImageCheck *check, bool 
quiet)
 QDECREF(str);
 }
 
+static void dump_human_format_alloc_info(BlockFormatAllocInfo *bfai, bool 
quiet)
+{
+char *used_data = size_to_str(bfai->used_data);
+char *used_zero = size_to_str(bfai->used_zero);
+char *used_discarded = size_to_str(bfai->used_discarded);
+char *used_overrun = size_to_str(bfai->used_overrun);
+
+char *unused_data = size_to_str(bfai->unused_data);
+char *unused_zero = size_to_str(bfai->unused_zero);
+char *unused_discarded = size_to_str(bfai->unused_discarded);
+
+qprintf(quiet,
+"Format allocation info (including metadata):\n"
+"   datazero   discarded   after-eof\n"
+"used %10s  %10s  %10s  %10s\n"
+"unused   %10s  %10s  %10s\n",
+used_data, used_zero, used_discarded, used_overrun,
+unused_data, unused_zero, unused_discarded);
+
+g_free(used_data);
+g_free(used_zero);
+g_free(used_discarded);
+g_free(used_overrun);
+
+g_free(unused_data);
+g_free(unused_zero);
+g_free(unused_discarded);
+}
+
 static void dump_human_image_check(ImageCheck *check, bool quiet)
 {
 if (!(check->corruptions || check->leaks || check->check_errors)) {
@@ -601,6 +630,10 @@ static void dump_human_image_check(ImageCheck *check, bool 
quiet)
 qprintf(quiet,
 "Image end offset: %" PRId64 "\n", check->image_end_offset);
 }
+
+if (check->has_format_alloc_info) {
+dump_human_format_alloc_info(check->format_alloc_info, quiet);
+}
 }
 
 static int collect_image_check(BlockDriverState *bs,
@@ -611,6 +644,7 @@ static int collect_image_check(BlockDriverState *bs,
 {
 int ret;
 BdrvCheckResult result;
+BlockFormatAllocInfo *bfai = g_new0(BlockFormatAllocInfo, 1);
 
 ret = bdrv_check(bs, , fix);
 if (ret < 0) {
@@ -639,6 +673,14 @@ static int collect_image_check(BlockDriverState *bs,
 check->compressed_clusters  = result.bfi.compressed_clusters;
 check->has_compressed_clusters  = result.bfi.compressed_clusters != 0;
 
+ret = bdrv_get_format_alloc_stat(bs, bfai);
+if (ret < 0) {
+qapi_free_BlockFormatAllocInfo(bfai);
+} else {
+check->has_format_alloc_info = true;
+check->format_alloc_info = bfai;
+}
+
 return 0;
 }
 
-- 
2.11.1




[Qemu-devel] [PATCH v4 2/3] qcow2: add .bdrv_get_format_alloc_stat

2017-07-29 Thread Vladimir Sementsov-Ogievskiy
Realize .bdrv_get_format_alloc_stat interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2-refcount.c | 152 +
 block/qcow2.c  |   2 +
 block/qcow2.h  |   2 +
 3 files changed, 156 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7c06061aae..fd6027f0c2 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2931,3 +2931,155 @@ done:
 qemu_vfree(new_refblock);
 return ret;
 }
+
+typedef struct FormatAllocStatCo {
+BlockDriverState *bs;
+BlockFormatAllocInfo *bfai;
+int64_t ret;
+} FormatAllocStatCo;
+
+static void coroutine_fn qcow2_get_format_alloc_stat_entry(void *opaque)
+{
+int ret = 0;
+FormatAllocStatCo *nbco = opaque;
+BlockDriverState *bs = nbco->bs;
+BDRVQcow2State *s = bs->opaque;
+BlockFormatAllocInfo *bfai = nbco->bfai;
+int64_t cluster, file_sectors, sector;
+int refcount_block_offset;
+uint32_t i;
+bool used = false, f_data = false, f_zero = false;
+int dif, num = 0, f_num = 0;
+
+memset(bfai, 0, sizeof(*bfai));
+
+file_sectors = bdrv_nb_sectors(bs->file->bs);
+if (file_sectors < 0) {
+nbco->ret = file_sectors;
+return;
+}
+
+qemu_co_mutex_lock(>lock);
+
+for (sector = 0; sector < file_sectors; sector += dif) {
+if (f_num == 0) {
+BlockDriverState *file;
+ret = bdrv_get_block_status(bs->file->bs, sector,
+file_sectors - sector, _num, );
+if (ret < 0) {
+goto fail;
+}
+f_data = ret & BDRV_BLOCK_DATA;
+f_zero = ret & BDRV_BLOCK_ZERO;
+}
+
+if (num == 0) {
+uint64_t refcount;
+assert(((sector << BDRV_SECTOR_BITS) & (s->cluster_size - 1)) == 
0);
+ret = qcow2_get_refcount(
+bs, (sector << BDRV_SECTOR_BITS) >> s->cluster_bits, 
);
+if (ret < 0) {
+goto fail;
+}
+used = refcount > 0;
+num = s->cluster_size >> BDRV_SECTOR_BITS;
+}
+
+dif = MIN(f_num, MIN(num, file_sectors - sector));
+if (used) {
+if (f_data) {
+bfai->used_data += dif;
+} else if (f_zero) {
+bfai->used_zero += dif;
+} else {
+bfai->used_discarded += dif;
+}
+} else {
+if (f_data) {
+bfai->unused_data += dif;
+} else if (f_zero) {
+bfai->unused_zero += dif;
+} else {
+bfai->unused_discarded += dif;
+}
+}
+f_num -= dif;
+num -= dif;
+}
+
+assert(f_num == 0);
+
+if (used) {
+bfai->used_overrun += num;
+}
+
+cluster = size_to_clusters(s, sector << BDRV_SECTOR_BITS);
+refcount_block_offset = cluster & (s->refcount_block_size - 1);
+for (i = cluster >> s->refcount_block_bits;
+ i <= s->max_refcount_table_index; i++)
+{
+int j;
+
+if (!(s->refcount_table[i] & REFT_OFFSET_MASK)) {
+refcount_block_offset = 0;
+continue;
+}
+
+for (j = refcount_block_offset; j < s->refcount_block_size; j++) {
+uint64_t refcount;
+cluster = (i << s->refcount_block_bits) + j;
+
+ret = qcow2_get_refcount(bs, cluster, );
+if (ret < 0) {
+goto fail;
+}
+if (refcount > 0) {
+bfai->used_overrun++;
+}
+}
+
+refcount_block_offset = 0;
+}
+
+qemu_co_mutex_unlock(>lock);
+
+bfai->used_data = bfai->used_data << BDRV_SECTOR_BITS;
+bfai->used_zero = bfai->used_zero << BDRV_SECTOR_BITS;
+bfai->used_discarded = bfai->used_discarded << BDRV_SECTOR_BITS;
+bfai->used_overrun = bfai->used_overrun << BDRV_SECTOR_BITS;
+
+bfai->unused_data = bfai->unused_data << BDRV_SECTOR_BITS;
+bfai->unused_zero = bfai->unused_zero << BDRV_SECTOR_BITS;
+bfai->unused_discarded = bfai->unused_discarded << BDRV_SECTOR_BITS;
+
+nbco->ret = 0;
+return;
+
+fail:
+nbco->ret = ret;
+qemu_co_mutex_unlock(>lock);
+}
+
+/* qcow2_get_format_alloc_stat()
+ * Fills @bfai struct. In case of failure @bfai content is unpredicted.
+ */
+int qcow2_get_format_alloc_stat(BlockDriverState *bs,
+BlockFormatAllocInfo *bfai)
+{
+FormatAllocStatCo nbco = {
+.bs = bs,
+.bfai = bfai,
+.ret = -EINPROGRESS
+};
+
+if (qemu_in_coroutine()) {
+qcow2_get_format_alloc_stat_entry();
+} else {
+Coroutine *co =
+qemu_coroutine_create(qcow2_get_format_alloc_stat_entry, );
+qemu_coroutine_enter(co);
+BDRV_POLL_WHILE(bs, nbco.ret == -EINPROGRESS);
+}
+
+return nbco.ret;
+}

Re: [Qemu-devel] [PATCH] 9pfs: include for XATTR_SIZE_MAX

2017-07-29 Thread Patrick Steinhardt
On Fri, Jul 28, 2017 at 02:20:49PM -0300, Philippe Mathieu-Daudé wrote:
> > On Mon, Jun 26, 2017 at 12:20 PM, Patrick Steinhardt  wrote:
> >> The function `v9fs_xattrcreate` makes use of the define `XATTR_SIZE_MAX`
> >> to reject attempts of creating xattrs with an invalid size, which is
> >> defined in . On glibc-based systems, this header is
> >> indirectly included via , ,
> >> , but on other platforms this is not guaranteed due
> >> to not being part of the POSIX standard. One examples are systems based
> >> on musl libc, which do not include the  indirectly,
> >> which leads to `XATTR_SIZE_MAX` being undefined.
> >>
> >> Fix this error by directly include . As the 9P fs code
> >> is being Linux-based either way, we can simply do so without breaking
> >> other platforms. This enables building 9pfs on musl-based systems.
> >>
> >> Signed-off-by: Patrick Steinhardt 
> > Reviewed-by: Alistair Francis 
> >> ---
> >>   hw/9pfs/9p.c | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> >> index 96d2683348..48cd558e96 100644
> >> --- a/hw/9pfs/9p.c
> >> +++ b/hw/9pfs/9p.c
> >> @@ -13,6 +13,7 @@
> >>
> >>   #include "qemu/osdep.h"
> >>   #include 
> 
> This is likely to break on BSD, but now than patchew has a NetBSD job 
> you can trigger a build RESENDing this patch.
> 
> This should probably work:
> 
> #ifdef __linux__
> 
> >> +#include 
> 
> #endif
> 
> >>   #include "hw/virtio/virtio.h"
> >>   #include "qapi/error.h"
> >>   #include "qemu/error-report.h"
> >> --
> >> 2.13.2
> 
> Regards,
> 
> Phil.

Thanks for the feedback! Is this really relevant in this context,
though? Both 9p-local.c and 9p-handle.c already include linux
headers, especially  is included without any ifdef
around. As such, I simply assumed that this code is being built
on Linux systems, only.

Regards
Patrick


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v2 0/4] trace-events: print 0x before hex numbers

2017-07-29 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH v2 0/4] trace-events: print 0x before hex numbers
Message-id: 20170729131159.24949-1-vsement...@virtuozzo.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
3dd406d1d0 trace-events: fix code style: print 0x before hex numbers
32dc4922ee checkpatch: check trace-events code style
01905453d5 trace-events: fix code style: %# -> 0x%
e08ea289b5 coding_style: add point about 0x in trace-events

=== OUTPUT BEGIN ===
Checking PATCH 1/4: coding_style: add point about 0x in trace-events...
Checking PATCH 2/4: trace-events: fix code style: %# -> 0x%...
Checking PATCH 3/4: checkpatch: check trace-events code style...
WARNING: line over 80 characters
#25: FILE: scripts/checkpatch.pl:1343:
+   ERROR("Don't use '#' flag of printf format 
('%#') in " .

ERROR: line over 90 characters
#26: FILE: scripts/checkpatch.pl:1344:
+ "trace-events, use '0x' prefix instead\n" 
. $herecurr);

ERROR: line over 90 characters
#29: FILE: scripts/checkpatch.pl:1347:
+   qr/%[-+ 
*.0-9]*([hljztL]|ll|hh)?(x|X|"\s*PRI[xX][^"]*"?)/;

ERROR: line over 90 characters
#31: FILE: scripts/checkpatch.pl:1349:
+   # don't consider groups splitted by [.:/ ], 
like 2A.20:12ab

WARNING: line over 80 characters
#32: FILE: scripts/checkpatch.pl:1350:
+   my $tmpline = $rawline =~ s/($hex[.:\/ 
])+$hex//gr;

WARNING: line over 80 characters
#35: FILE: scripts/checkpatch.pl:1353:
+   ERROR("Hex numbers must be prefixed 
with '0x'\n" .

total: 3 errors, 3 warnings, 25 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 4/4: trace-events: fix code style: print 0x before hex numbers...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

[Qemu-devel] [PATCH v2 4/4] trace-events: fix code style: print 0x before hex numbers

2017-07-29 Thread Vladimir Sementsov-Ogievskiy
The only exception are groups of numers separated by symbols
'.', ' ', ':', '/', like 'ab.09.7d'.

This patch is made by the following:

> find . -name trace-events | xargs python script.py

where script.py is the following python script:
=
 #!/usr/bin/env python

import sys
import re
import fileinput

rhex = '%[-+ *.0-9]*(?:[hljztL]|ll|hh)?(?:x|X|"\s*PRI[xX][^"]*"?)'
rgroup = re.compile('((?:' + rhex + '[.:/ ])+' + rhex + ')')
rbad = re.compile('(?', arr[i])

sys.stdout.write(''.join(arr))
=

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 accel/tcg/trace-events|   2 +-
 block/trace-events|  22 +++
 hw/audio/trace-events |   4 +-
 hw/char/trace-events  |  12 ++--
 hw/display/trace-events   |  12 ++--
 hw/dma/trace-events   |  20 +++---
 hw/i386/xen/trace-events  |   4 +-
 hw/input/trace-events |   4 +-
 hw/intc/trace-events  | 156 +++---
 hw/isa/trace-events   |   4 +-
 hw/misc/trace-events  |  78 +++
 hw/net/trace-events   |  34 +-
 hw/nvram/trace-events |   2 +-
 hw/ppc/trace-events   |  64 +--
 hw/s390x/trace-events |  20 +++---
 hw/scsi/trace-events  | 112 -
 hw/sd/trace-events|   4 +-
 hw/timer/trace-events |  20 +++---
 hw/usb/trace-events   |  56 -
 hw/vfio/trace-events  |  44 ++---
 hw/virtio/trace-events|   6 +-
 linux-user/trace-events   |  10 +--
 migration/trace-events|  36 +--
 nbd/trace-events  |  18 +++---
 net/trace-events  |   4 +-
 target/arm/trace-events   |  10 +--
 target/s390x/trace-events |   2 +-
 target/sparc/trace-events |  30 -
 28 files changed, 395 insertions(+), 395 deletions(-)

diff --git a/accel/tcg/trace-events b/accel/tcg/trace-events
index 2de8359670..c22ad60af7 100644
--- a/accel/tcg/trace-events
+++ b/accel/tcg/trace-events
@@ -4,7 +4,7 @@
 # cpu-exec.c
 disable exec_tb(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR
 disable exec_tb_nocache(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR
-disable exec_tb_exit(void *last_tb, unsigned int flags) "tb:%p flags=%x"
+disable exec_tb_exit(void *last_tb, unsigned int flags) "tb:%p flags=0x%x"
 
 # translate-all.c
 translate_block(void *tb, uintptr_t pc, uint8_t *tb_code) "tb:%p, 
pc:0x%"PRIxPTR", tb_code:%p"
diff --git a/block/trace-events b/block/trace-events
index 622c754495..8fc853b135 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -5,8 +5,8 @@ bdrv_open_common(void *bs, const char *filename, int flags, 
const char *format_n
 bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
 
 # block/block-backend.c
-blk_co_preadv(void *blk, void *bs, int64_t offset, unsigned int bytes, int 
flags) "blk %p bs %p offset %"PRId64" bytes %u flags %x"
-blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int 
flags) "blk %p bs %p offset %"PRId64" bytes %u flags %x"
+blk_co_preadv(void *blk, void *bs, int64_t offset, unsigned int bytes, int 
flags) "blk %p bs %p offset %"PRId64" bytes %u flags 0x%x"
+blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int 
flags) "blk %p bs %p offset %"PRId64" bytes %u flags 0x%x"
 
 # block/io.c
 bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num 
%"PRId64" nb_sectors %d"
@@ -54,19 +54,19 @@ paio_submit_co(int64_t offset, int count, int type) "offset 
%"PRId64" count %d t
 paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "acb 
%p opaque %p offset %"PRId64" count %d type %d"
 
 # block/qcow2.c
-qcow2_writev_start_req(void *co, int64_t offset, int bytes) "co %p offset %" 
PRIx64 " bytes %d"
+qcow2_writev_start_req(void *co, int64_t offset, int bytes) "co %p offset 0x%" 
PRIx64 " bytes %d"
 qcow2_writev_done_req(void *co, int ret) "co %p ret %d"
 qcow2_writev_start_part(void *co) "co %p"
 qcow2_writev_done_part(void *co, int cur_bytes) "co %p cur_bytes %d"
-qcow2_writev_data(void *co, uint64_t offset) "co %p offset %" PRIx64
-qcow2_pwrite_zeroes_start_req(void *co, int64_t offset, int count) "co %p 
offset %" PRIx64 " count %d"
-qcow2_pwrite_zeroes(void *co, int64_t offset, int count) "co %p offset %" 
PRIx64 " count %d"
+qcow2_writev_data(void *co, uint64_t offset) "co %p offset 0x%" PRIx64
+qcow2_pwrite_zeroes_start_req(void *co, int64_t offset, int count) "co %p 
offset 0x%" PRIx64 " count %d"
+qcow2_pwrite_zeroes(void *co, int64_t offset, int count) "co %p offset 0x%" 
PRIx64 " count %d"
 
 # block/qcow2-cluster.c
-qcow2_alloc_clusters_offset(void *co, uint64_t offset, int bytes) "co %p 
offset %" PRIx64 " bytes %d"
-qcow2_handle_copied(void *co, uint64_t guest_offset, uint64_t host_offset, 
uint64_t bytes) "co %p guest_offset %" PRIx64 " host_offset %" PRIx64 " bytes 
%" PRIx64
-qcow2_handle_alloc(void *co, uint64_t guest_offset, uint64_t host_offset, 

[Qemu-devel] [PATCH v2 3/4] checkpatch: check trace-events code style

2017-07-29 Thread Vladimir Sementsov-Ogievskiy
Accordingly to CODING_STYLE, check that in trace-events:
1. hex numbers are prefixed with '0x'
2. '#' flag of printf is not used
3. The exclusion from 1. are period-separated groups of numbers

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/checkpatch.pl | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 4e91122813..fa478074b8 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1337,6 +1337,25 @@ sub process {
$rpt_cleaners = 1;
}
 
+# checks for trace-events files
+   if ($realfile =~ /trace-events$/ && $line =~ /^\+/) {
+   if ($rawline =~ /%[-+ 0]*#/) {
+   ERROR("Don't use '#' flag of printf format 
('%#') in " .
+ "trace-events, use '0x' prefix instead\n" 
. $herecurr);
+   } else {
+   my $hex =
+   qr/%[-+ 
*.0-9]*([hljztL]|ll|hh)?(x|X|"\s*PRI[xX][^"]*"?)/;
+
+   # don't consider groups splitted by [.:/ ], 
like 2A.20:12ab
+   my $tmpline = $rawline =~ s/($hex[.:\/ 
])+$hex//gr;
+
+   if ($tmpline =~ /(?

[Qemu-devel] [PATCH v2 0/4] trace-events: print 0x before hex numbers

2017-07-29 Thread Vladimir Sementsov-Ogievskiy
Hi all!

It is hard to read logs, when there are hex and dec numbers in one line, when
hex number doesn't contain any letters and don't have '0x' prefix.

So, here is a complete solution for the problem:

- add information into CODING_STYLE
- add a check into checkpatch.pl
- fix current state

The new rule for the style is:

Hex numbers should be prefixed by '0x', except groups of numbers,
separated by symbols ' ', '.', ':', '/', however '0x' can be used for
numbers in such groups too. Flag '#' in number format is not allowed.

Note: checkpatch fails on checkpatch change (03) due to long lines.
It is because checkpatch.pl is indented by tabs and when it checks
for long lines it consider tabs as 8 spaces. Looks like nobody cares,
so do I. I see two ways here:
 -  s/\t//g
 -  make exclusion in checkpatch.pl for checkpatch.pl to consider tabs
as 4 spaces, not 8.  However I don't want to fix it in the context
of these series.

v2: almost everything (style, checkpatch, excluding number groups)
v1: was a draft of the idea using two sed commands.

Vladimir Sementsov-Ogievskiy (4):
  coding_style: add point about 0x in trace-events
  trace-events: fix code style: %# -> 0x%
  checkpatch: check trace-events code style
  trace-events: fix code style: print 0x before hex numbers

 CODING_STYLE  |  23 ++
 accel/tcg/trace-events|   2 +-
 audio/trace-events|   4 +-
 block/trace-events|  28 
 hw/audio/trace-events |   4 +-
 hw/char/trace-events  |  12 ++--
 hw/display/trace-events   |  14 ++--
 hw/dma/trace-events   |  20 +++---
 hw/i386/xen/trace-events  |  26 +++
 hw/input/trace-events |   6 +-
 hw/intc/trace-events  | 176 +++---
 hw/isa/trace-events   |   4 +-
 hw/misc/trace-events  |  78 ++--
 hw/net/trace-events   |  52 +++---
 hw/nvram/trace-events |   2 +-
 hw/pci/trace-events   |   4 +-
 hw/ppc/trace-events   |  64 -
 hw/s390x/trace-events |  20 +++---
 hw/scsi/trace-events  | 118 +++
 hw/sd/trace-events|   4 +-
 hw/timer/trace-events |  20 +++---
 hw/usb/trace-events   |  56 +++
 hw/vfio/trace-events  |  44 ++--
 hw/virtio/trace-events|   6 +-
 hw/xen/trace-events   |   8 +--
 linux-user/trace-events   |  10 +--
 migration/trace-events|  36 +-
 nbd/trace-events  |  18 ++---
 net/trace-events  |   4 +-
 scripts/checkpatch.pl |  19 +
 target/arm/trace-events   |  10 +--
 target/s390x/trace-events |   2 +-
 target/sparc/trace-events |  30 
 trace-events  |  20 +++---
 34 files changed, 493 insertions(+), 451 deletions(-)

-- 
2.11.1




[Qemu-devel] [PATCH v2 1/4] coding_style: add point about 0x in trace-events

2017-07-29 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 CODING_STYLE | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/CODING_STYLE b/CODING_STYLE
index 2fa0c0b65b..2e6a0507be 100644
--- a/CODING_STYLE
+++ b/CODING_STYLE
@@ -123,3 +123,26 @@ We use traditional C-style /* */ comments and avoid // 
comments.
 
 Rationale: The // form is valid in C99, so this is purely a matter of
 consistency of style. The checkpatch script will warn you about this.
+
+8. trace-events style
+
+In trace-events files use '0x' prefix to specify hex numbers, as in:
+
+some_trace(unsigned x, uint64_t y) "x 0x%x y 0x" PRIx64
+
+The exclusion is a group of numbers, separated by symbols '.', '/', ':', ' ':
+
+some_other_trace(unsigned a, unsigned b, unsigned c) "num %x.%x.%x"
+
+However, you can use '0x' for such groups if you want. Anyway, be sure that
+it is obvious that numbers are in hex, ex.:
+
+data_dump(uint8_t c1, uint8_t c2, uint8_t c3) "bytes (in hex): %02x %02x %02x"
+
+For consistency do not use printf flag '#', like '%#x'.
+
+Rationale: hex numbers are hard to read in logs when there no 0x prefix,
+especially when (occasionally) the representation doesn't contain any letters
+and especially in one line with other decimal numbers. Number groups are
+allowed to not use '0x' because for some things notations like %x.%x.%x are
+used not only in Qemu. Also dumping raw data bytes with '0x' is less readable.
-- 
2.11.1




[Qemu-devel] [PATCH v2 2/4] trace-events: fix code style: %# -> 0x%

2017-07-29 Thread Vladimir Sementsov-Ogievskiy
In trace format '#' flag of printf is forbidden. Fix it to '0x%'.

This patch is created by the following:

check that we have a problem
> find . -name trace-events | xargs grep '%#' | wc -l
56

check that there are no cases with additional printf flags
> find . -name trace-events | xargs grep '%[-+ 0]+#' | wc -l
0

check that there are no wrong usage of '#' and '0x' together
> find . -name trace-events | xargs grep '0x%#' | wc -l
0

fix the problem
> find . -name trace-events | xargs sed -i 's/%#/0x%/g'

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 audio/trace-events   |  4 ++--
 block/trace-events   |  6 +++---
 hw/display/trace-events  |  2 +-
 hw/i386/xen/trace-events | 22 +++---
 hw/input/trace-events|  2 +-
 hw/intc/trace-events | 20 ++--
 hw/net/trace-events  | 18 +-
 hw/pci/trace-events  |  4 ++--
 hw/scsi/trace-events |  6 +++---
 hw/xen/trace-events  |  8 
 trace-events | 20 ++--
 11 files changed, 56 insertions(+), 56 deletions(-)

diff --git a/audio/trace-events b/audio/trace-events
index 517359039e..939ab99e76 100644
--- a/audio/trace-events
+++ b/audio/trace-events
@@ -3,7 +3,7 @@
 # audio/alsaaudio.c
 alsa_revents(int revents) "revents = %d"
 alsa_pollout(int i, int fd) "i = %d fd = %d"
-alsa_set_handler(int events, int index, int fd, int err) "events=%#x index=%d 
fd=%d err=%d"
+alsa_set_handler(int events, int index, int fd, int err) "events=0x%x index=%d 
fd=%d err=%d"
 alsa_wrote_zero(int len) "Failed to write %d frames (wrote zero)"
 alsa_read_zero(long len) "Failed to read %ld frames (read zero)"
 alsa_xrun_out(void) "Recovering from playback xrun"
@@ -13,5 +13,5 @@ alsa_resume_in(void) "Resuming suspended input stream"
 alsa_no_frames(int state) "No frames available and ALSA state is %d"
 
 # audio/ossaudio.c
-oss_version(int version) "OSS version = %#x"
+oss_version(int version) "OSS version = 0x%x"
 oss_invalid_available_size(int size, int bufsize) "Invalid available size, 
size=%d bufsize=%d"
diff --git a/block/trace-events b/block/trace-events
index 4a4df25323..622c754495 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -1,7 +1,7 @@
 # See docs/tracing.txt for syntax documentation.
 
 # block.c
-bdrv_open_common(void *bs, const char *filename, int flags, const char 
*format_name) "bs %p filename \"%s\" flags %#x format_name \"%s\""
+bdrv_open_common(void *bs, const char *filename, int flags, const char 
*format_name) "bs %p filename \"%s\" flags 0x%x format_name \"%s\""
 bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
 
 # block/block-backend.c
@@ -11,7 +11,7 @@ blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned 
int bytes, int flag
 # block/io.c
 bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num 
%"PRId64" nb_sectors %d"
 bdrv_co_writev(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num 
%"PRId64" nb_sectors %d"
-bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p 
offset %"PRId64" count %d flags %#x"
+bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p 
offset %"PRId64" count %d flags 0x%x"
 bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t 
cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %u 
cluster_offset %"PRId64" cluster_bytes %u"
 
 # block/stream.c
@@ -100,7 +100,7 @@ qed_need_check_timer_cb(void *s) "s %p"
 qed_start_need_check_timer(void *s) "s %p"
 qed_cancel_need_check_timer(void *s) "s %p"
 qed_aio_complete(void *s, void *acb, int ret) "s %p acb %p ret %d"
-qed_aio_setup(void *s, void *acb, int64_t sector_num, int nb_sectors, void 
*opaque, int flags) "s %p acb %p sector_num %"PRId64" nb_sectors %d opaque %p 
flags %#x"
+qed_aio_setup(void *s, void *acb, int64_t sector_num, int nb_sectors, void 
*opaque, int flags) "s %p acb %p sector_num %"PRId64" nb_sectors %d opaque %p 
flags 0x%x"
 qed_aio_next_io(void *s, void *acb, int ret, uint64_t cur_pos) "s %p acb %p 
ret %d cur_pos %"PRIu64
 qed_aio_read_data(void *s, void *acb, int ret, uint64_t offset, size_t len) "s 
%p acb %p ret %d offset %"PRIu64" len %zu"
 qed_aio_write_data(void *s, void *acb, int ret, uint64_t offset, size_t len) 
"s %p acb %p ret %d offset %"PRIu64" len %zu"
diff --git a/hw/display/trace-events b/hw/display/trace-events
index 3e896d2e3f..16d19122f0 100644
--- a/hw/display/trace-events
+++ b/hw/display/trace-events
@@ -5,7 +5,7 @@ jazz_led_read(uint64_t addr, uint8_t val) "read 
addr=0x%"PRIx64": 0x%x"
 jazz_led_write(uint64_t addr, uint8_t new) "write addr=0x%"PRIx64": 0x%x"
 
 # hw/display/xenfb.c
-xenfb_mouse_event(void *opaque, int dx, int dy, int dz, int button_state, int 
abs_pointer_wanted) "%p x %d y %d z %d bs %#x abs %d"
+xenfb_mouse_event(void *opaque, int dx, int dy, int dz, int button_state, int 
abs_pointer_wanted) "%p x %d y %d z %d bs 0x%x abs %d"
 

Re: [Qemu-devel] [PATCH v12 5/8] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-07-29 Thread Wei Wang

On 07/29/2017 07:08 AM, Michael S. Tsirkin wrote:

On Thu, Jul 27, 2017 at 10:50:11AM +0800, Wei Wang wrote:

OK I thought this over. While we might need these new APIs in
the future, I think that at the moment, there's a way to implement
this feature that is significantly simpler. Just add each s/g
as a separate input buffer.

Should it be an output buffer?

Hypervisor overwrites these pages with zeroes. Therefore it is
writeable by device: DMA_FROM_DEVICE.

Why would the hypervisor need to zero the buffer?

The page is supplied to hypervisor and can lose the value that
is there.  That is the definition of writeable by device.


I think for the free pages, it should be clear that they will be added as
output buffer to the device, because (as we discussed) they are just hints,
and some of them may be used by the guest after the report_ API is invoked.
The device/hypervisor should not use or discard them.

For the balloon pages, I kind of agree with the existing implementation
(e.g. inside tell_host()), which uses virtqueue_add_outbuf (instead of 
_add_inbuf()).
I think inbuf should be a buffer which will be written by the device and 
read by the
driver. The cmd buffer put on the vq for the device to send commands can 
be an

inbuf, I think.





I think it may only
need to read out the info(base,size).

And then do what?



For the free pages, the info will be used to clear the corresponding "1" 
in the dirty bitmap.
For balloon pages, they will be made DONTNEED and given to other host 
processes to
use (the device won't write them, so no need to set "write" when 
virtqueue_map_desc() in

the device).





I think it should be like this:
the cmd hdr buffer: input, because the hypervisor would write it to
send a cmd to the guest
the payload buffer: output, for the hypervisor to read the info

These should be split.

We have:

1. request that hypervisor sends to guest, includes ID: input
2. synchronisation header with ID sent by guest: output
3. list of pages: input

2 and 3 must be on the same VQ. 1 can come on any VQ - reusing stats VQ
might make sense.


I would prefer to make the above item 1 come on the the free page vq,
because the existing stat_vq doesn't support the cmd hdr.
Now, I think it is also not necessary to move the existing stat_vq 
implementation to

a new implementation under the cmd hdr, because
that new support doesn't make a difference for stats, it will still use 
its stat_vq (the free

page vq will be used for reporting free pages only)

What do you think?


Best,
Wei



Re: [Qemu-devel] QEMU NVDIMM as type 7 in e820 table

2017-07-29 Thread Haozhong Zhang
On 07/28/17 13:45 -0600, Ross Zwisler wrote:
> On Fri, Jul 28, 2017 at 11:11:10AM -0700, Dan Williams wrote:
> > On Fri, Jul 28, 2017 at 11:04 AM, Ross Zwisler
> >  wrote:
> > > I've been using the virtualized NVDIMM support in QEMU for testing, and I
> > > noticed that the physical addresses used by the virtual NVDIMMs aren't 
> > > present
> > > in the guest's e820 table.
> > >
> > > Here is the e820 table on my QEMU instance where I have one 32 GiB virtual
> > > NVDIMM:
> > >
> > > [0.00] e820: BIOS-provided physical RAM map:
> > > [0.00] BIOS-e820: [mem 0x-0x0009fbff] 
> > > usable
> > > [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] 
> > > reserved
> > > [0.00] BIOS-e820: [mem 0x000f-0x000f] 
> > > reserved
> > > [0.00] BIOS-e820: [mem 0x0010-0xbffdefff] 
> > > usable
> > > [0.00] BIOS-e820: [mem 0xbffdf000-0xbfff] 
> > > reserved
> > > [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] 
> > > reserved
> > > [0.00] BIOS-e820: [mem 0xfffc-0x] 
> > > reserved
> > > [0.00] BIOS-e820: [mem 0x0001-0x00023fff] 
> > > usable
> > >
> > > The physical addresses used by the virtual NVDIMM are 
> > > 0x24000-0xA4000.
> > > You can see this by looking at ndctl and the values we get from the NFIT:
> > >
> > > # ndctl list -R
> > > {
> > >   "dev":"region0",
> > >   "size":34359738368,
> > >   "available_size":0,
> > >   "type":"pmem"
> > > }
> > >
> > > # grep . /sys/bus/nd/devices/region0/{resource,size}
> > > region0/resource:0x24000
> > > region0/size:34359738368
> > >
> > > Or you can see the same info by using iasl to dump
> > > /sys/firmware/acpi/tables/NFIT:
> > >
> > > [028h 0040   2]Subtable Type :  [System Physical 
> > > Address Range]
> > > [02Ah 0042   2]   Length : 0038
> > >
> > > [02Ch 0044   2]  Range Index : 0002
> > > [02Eh 0046   2]Flags (decoded below) : 0003
> > >Add/Online Operation Only : 1
> > >   Proximity Domain Valid : 1
> > > [030h 0048   4] Reserved : 
> > > [034h 0052   4] Proximity Domain : 
> > > [038h 0056  16]   Address Range GUID : 
> > > 66F0D379-B4F3-4074-AC43-0D3318B78CDB
> > > [048h 0072   8]   Address Range Base : 00024000
> > > [050h 0080   8] Address Range Length : 0008
> > > [058h 0088   8] Memory Map Attribute : 8008
> > >
> > > I expected to see a type 7 region for the NVDIMM physical address range 
> > > in the
> > > e820 table, so something like:
> > >
> > > [0.00] e820: BIOS-provided physical RAM map:
> > > [0.00] BIOS-e820: [mem 0x-0x0009fbff] 
> > > usable
> > > [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] 
> > > reserved
> > > [0.00] BIOS-e820: [mem 0x000f-0x000f] 
> > > reserved
> > > [0.00] BIOS-e820: [mem 0x0010-0xbffdefff] 
> > > usable
> > > [0.00] BIOS-e820: [mem 0xbffdf000-0xbfff] 
> > > reserved
> > > [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] 
> > > reserved
> > > [0.00] BIOS-e820: [mem 0xfffc-0x] 
> > > reserved
> > > [0.00] BIOS-e820: [mem 0x0001-0x00023fff] 
> > > usable
> > > [0.00] BIOS-e820: [mem 0x00024000-0x000A4000] 
> > > persistent (type 7)
> > >
> > 
> > Do you need that informationin e820? Linux effectively ignores type-7.
> > As long as the range is treated as reserved it's not clear that you
> > need the e820 entry. We also infect the persistent type back into the
> > memory map when the NFIT driver loads. /proc/iomem should show the
> > right data.
> 
> [ Adding Linda & Toshi to see if they have an opinion. ]
> 
> I guess maybe we don't need it.  Yep, /proc/iomem looks good:
> 
>   # cat /proc/iomem
>   -0fff : Reserved
>   1000-0009fbff : System RAM
>   ...
>   1-23fff : System RAM
>   24000-a3fff : Persistent Memory
> 24000-a3fff : namespace0.0
> 
> I was just worried that this was an inconsistency between the way that virtual
> NVDIMMs are presented vs the way that they will be presented on bare metal.  I
> at least look at the e820 table to get my bearings of how memory is laid out -
> maybe I just need to look at /proc/iomem instead?

Do any OS or applications rely on the E820 information or the
consistency between E820 and NFIT to properly work? If any, I can make
a QEMU patch to build type-7 e820 entries.

Haozhong



Re: [Qemu-devel] 'make help' generate unrelated files

2017-07-29 Thread Peter Maydell
On 29 July 2017 at 02:55, Philippe Mathieu-Daudé  wrote:
> Is it necessary to generate this files to run 'make help'?
>
> $ ./configure
> [...]
> $ make help
> CHK version_gen.h
> UPD version_gen.h
>  DEP util.c
> CHK version_gen.h

This happens because we have a rule that says
"Makefile: $(GENERATED_FILES)"
which is how we ensure that the autogenerated header
files are always built before any .c file is compiled.
The unfortunate side effect is that we rebuild the
generated files before processing *any* makefile
target, even those like 'clean' and 'help' that
don't actually require them.

I did once have a go at fixing this some years back:
http://patchwork.ozlabs.org/patch/365288/
but as you'll note from the last comment by me
in that mail thread it had a fatal flaw for
the "build from clean" case...

thanks
-- PMM



Re: [Qemu-devel] tb_invalidate_phys_range and tb_invalidate_phys_page_range

2017-07-29 Thread Peter Maydell
On 29 July 2017 at 01:33, Emilio G. Cota  wrote:
> While working on the removal of tb_lock, I stumbled upon the following.
>
> Commit 77a8f1a51 ("linux-user: Fix stale tbs after mmap") introduced
> tb_invalidate_phys_range(), which we now have in accel/tcg/translate-all.c
> [ patchwork thread here: https://patchwork.ozlabs.org/patch/158556/ ]
>
> This function calls
> tb_invalidate_phys_page_range(). As the name suggests, the latter
> function expects the range to be within the same page.
> The former does not have this requirement, as stated in the comment
> above the function (which I confirmed running some linux-user code):
>
> +/*
> + * invalidate all TBs which intersect with the target physical pages
> + * starting in range [start;end[. NOTE: start and end may refer to
> + * different physical pages. 'is_cpu_write_access' should be true if called
> + * from a real cpu write access: the virtual CPU will exit the current
> + * TB if code is modified inside this TB.
> + */
> +void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end,
> +  int is_cpu_write_access)
> +{
> +while (start < end) {
> +tb_invalidate_phys_page_range(start, end, is_cpu_write_access);
> +start &= TARGET_PAGE_MASK;
> +start += TARGET_PAGE_SIZE;
> +}
> +}
>
> What I find puzzling is that here we pass 'end' unmodified, instead of
> making sure the range passed is within a page.
>
> Is this a bug, or am I missing something?

It's a bit odd, but looking at the code I think it doesn't matter.
The only thing that tb_invalidate_phys_page_range() uses 'end'
for is when it does the "does this TB I've found overlap the
range we're flushing" check with
 if (!(tb_end <= start || tb_start >= end))

If we pass an 'end' value that's larger than it would be if
we confined it to the page, this is safe because at worst
we'll invalidate more TBs than we needed to (and in practice
if the TB we've found is within the full range that
tb_invalidate_phys_range() is checking we need to invalidate
it anyway). I haven't quite worked it through but I rather
suspect that you can't have a TB that's in the list for
the PageDesc for 'start' and which overlaps in the
"full" [start-end) range but which wouldn't overlap
in a truncated [start-(end rounded down to end of page))
range.

Basically the reason for the "same page" restriction
is that tb_invalidate_phys_page_range() only does
a single page_find() for the 'start' address. So
as long as we call it once per page in the range
it doesn't matter that we pass an overly pessimistic
end. This is kind of bordering on "happens to work"
territory though, so maybe we could improve it...

thanks
-- PMM



Re: [Qemu-devel] [PATCH 2/3] COLO: Define COLOMode without QAPI

2017-07-29 Thread Hailiang Zhang

On 2017/7/29 1:17, Dr. David Alan Gilbert wrote:

* Markus Armbruster (arm...@redhat.com) wrote:

COLOMode is defined in the QAPI schema, but not used there.  Of the
stuff QAPI generates for it only the typedef is actually used.  Use of
QAPI is pointless and only complicates things, so don't.

Hmm, in one of the old COLO worlds I have, there's code to emit an event
on exiting from COLO and that event includes the mode it was in.


Yes, we need it in the later series.


If the intent is to bring that or similar back then it would be worth
keeping.


Agreed.  ;)


Dave


Cc: zhanghailiang 
Signed-off-by: Markus Armbruster 
---
  include/migration/colo.h |  6 ++
  qapi-schema.json | 16 
  2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index ff9874e..5d7c500 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -26,6 +26,12 @@ void migration_incoming_exit_colo(void);
  void *colo_process_incoming_thread(void *opaque);
  bool migration_incoming_in_colo_state(void);
  
+typedef enum {

+COLO_MODE_UNKNOWN,
+COLO_MODE_PRIMARY,
+COLO_MODE_SECONDARY,
+} COLOMode;
+
  COLOMode get_colo_mode(void);
  
  /* failover */

diff --git a/qapi-schema.json b/qapi-schema.json
index 9b6f6cb..3f0eb05 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1304,22 +1304,6 @@
  'vmstate-loaded' ] }
  
  ##

-# @COLOMode:
-#
-# The colo mode
-#
-# @unknown: unknown mode
-#
-# @primary: master side
-#
-# @secondary: slave side
-#
-# Since: 2.8
-##
-{ 'enum': 'COLOMode',
-  'data': [ 'unknown', 'primary', 'secondary'] }
-
-##
  # @FailoverStatus:
  #
  # An enumeration of COLO failover status
--
2.7.5



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

.