Re: [Qemu-devel] [PATCH v12 5/8] virtio-balloon: VIRTIO_BALLOON_F_SG
On Sunday, July 30, 2017 12:23 PM, Michael S. Tsirkin wrote: > On Sat, Jul 29, 2017 at 08:47:08PM +0800, Wei Wang wrote: > > On 07/29/2017 07:08 AM, Michael S. Tsirkin wrote: > > > On Thu, Jul 27, 2017 at 10:50:11AM +0800, Wei Wang wrote: > > > > > > > OK I thought this over. While we might need these new APIs > > > > > > > in the future, I think that at the moment, there's a way to > > > > > > > implement this feature that is significantly simpler. Just > > > > > > > add each s/g as a separate input buffer. > > > > > > Should it be an output buffer? > > > > > Hypervisor overwrites these pages with zeroes. Therefore it is > > > > > writeable by device: DMA_FROM_DEVICE. > > > > Why would the hypervisor need to zero the buffer? > > > The page is supplied to hypervisor and can lose the value that is > > > there. That is the definition of writeable by device. > > > > I think for the free pages, it should be clear that they will be added > > as output buffer to the device, because (as we discussed) they are > > just hints, and some of them may be used by the guest after the report_ API > > is > invoked. > > The device/hypervisor should not use or discard them. > > Discarding contents is exactly what you propose doing if migration is going > on, > isn't it? That's actually a different concept. Please let me explain it with this example: The hypervisor receives the hint saying the guest PageX is a free page, but as we know, after that report_ API exits, the guest kernel may take PageX to use, so PageX is not free page any more. At this time, if the hypervisor writes to the page, that would crash the guest. So, I think the cornerstone of this work is that the hypervisor should not touch the reported pages. Best, Wei
[Qemu-devel] [Bug 1473451] Re: Please support the native bios format for dec alpha
QEMU does not really implement a "true" ev67. We cheat and implement something that is significantly faster to emulate. E.g. doing all TLB refill within qemu, rather than in the PALcode. So, no, there's no chance of running true SRM or ARC firmware. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1473451 Title: Please support the native bios format for dec alpha Status in QEMU: New Bug description: Currently qemu-system-alpha -bios parameter takes an ELF image. However HP maintains firmware updates for those systems. Some example rom files can be found here ftp://ftp.hp.com/pub/alphaserver/firmware/current_platforms/v7.3_release/DS20_DS20e/ It might allow things like using the SRM firmware. The ARC(nt) firmware would allow to build and test windows applications for that platforms without having the relevant hardware To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1473451/+subscriptions
Re: [Qemu-devel] [PATCH v12 5/8] virtio-balloon: VIRTIO_BALLOON_F_SG
On Sat, Jul 29, 2017 at 08:47:08PM +0800, Wei Wang wrote: > On 07/29/2017 07:08 AM, Michael S. Tsirkin wrote: > > On Thu, Jul 27, 2017 at 10:50:11AM +0800, Wei Wang wrote: > > > > > > OK I thought this over. While we might need these new APIs in > > > > > > the future, I think that at the moment, there's a way to implement > > > > > > this feature that is significantly simpler. Just add each s/g > > > > > > as a separate input buffer. > > > > > Should it be an output buffer? > > > > Hypervisor overwrites these pages with zeroes. Therefore it is > > > > writeable by device: DMA_FROM_DEVICE. > > > Why would the hypervisor need to zero the buffer? > > The page is supplied to hypervisor and can lose the value that > > is there. That is the definition of writeable by device. > > I think for the free pages, it should be clear that they will be added as > output buffer to the device, because (as we discussed) they are just hints, > and some of them may be used by the guest after the report_ API is invoked. > The device/hypervisor should not use or discard them. Discarding contents is exactly what you propose doing if migration is going on, isn't it? > For the balloon pages, I kind of agree with the existing implementation > (e.g. inside tell_host()), which uses virtqueue_add_outbuf (instead of > _add_inbuf()). This is because it does not pass SGs, it passes weirdly formatted PA within the buffer. > I think inbuf should be a buffer which will be written by the device and > read by the > driver. If hypervisor can change it, it's an inbuf. Should not matter whether driver reads it. > The cmd buffer put on the vq for the device to send commands can be > an > inbuf, I think. > > > > > > > I think it may only > > > need to read out the info(base,size). > > And then do what? > > > For the free pages, the info will be used to clear the corresponding "1" in > the dirty bitmap. > For balloon pages, they will be made DONTNEED and given to other host > processes to > use (the device won't write them, so no need to set "write" when > virtqueue_map_desc() in > the device). > > > > > > > I think it should be like this: > > > the cmd hdr buffer: input, because the hypervisor would write it to > > > send a cmd to the guest > > > the payload buffer: output, for the hypervisor to read the info > > These should be split. > > > > We have: > > > > 1. request that hypervisor sends to guest, includes ID: input > > 2. synchronisation header with ID sent by guest: output > > 3. list of pages: input > > > > 2 and 3 must be on the same VQ. 1 can come on any VQ - reusing stats VQ > > might make sense. > > I would prefer to make the above item 1 come on the the free page vq, > because the existing stat_vq doesn't support the cmd hdr. > Now, I think it is also not necessary to move the existing stat_vq > implementation to > a new implementation under the cmd hdr, because > that new support doesn't make a difference for stats, it will still use its > stat_vq (the free > page vq will be used for reporting free pages only) > > What do you think? > > > Best, > Wei
[Qemu-devel] [PATCH for 2.10] hw/mps2_scc: fix incorrect properties
Signed-off-by: Philippe Mathieu-Daudé--- sorry, I missed them in my review :( hw/misc/mps2-scc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c index cc58d26f29..32be2a9df1 100644 --- a/hw/misc/mps2-scc.c +++ b/hw/misc/mps2-scc.c @@ -270,9 +270,9 @@ static Property mps2_scc_properties[] = { /* Values for various read-only ID registers (which are specific * to the board model or FPGA image) */ -DEFINE_PROP_UINT32("scc-cfg4", MPS2SCC, aid, 0), +DEFINE_PROP_UINT32("scc-cfg4", MPS2SCC, cfg4, 0), DEFINE_PROP_UINT32("scc-aid", MPS2SCC, aid, 0), -DEFINE_PROP_UINT32("scc-id", MPS2SCC, aid, 0), +DEFINE_PROP_UINT32("scc-id", MPS2SCC, id, 0), /* These are the initial settings for the source clocks on the board. * In hardware they can be configured via a config file read by the * motherboard configuration controller to suit the FPGA image. -- 2.13.3
[Qemu-devel] [PATCH v6 2/4] sockets: factor out a new try_bind() function
A refactoring step to prepare for the problem exposed by the test-listen test in the previous commit. Simplify and reorganize the IPv6 specific extra measures and move it out of the for loop to increase code readability. No semantic changes. Signed-off-by: Knut OmangReviewed-by: Daniel P. Berrange --- util/qemu-sockets.c | 69 ++ 1 file changed, 39 insertions(+), 30 deletions(-) diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c index 1358c81..b4a2f24 100644 --- a/util/qemu-sockets.c +++ b/util/qemu-sockets.c @@ -149,6 +149,44 @@ int inet_ai_family_from_address(InetSocketAddress *addr, return PF_UNSPEC; } +static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e) +{ +#ifndef IPV6_V6ONLY +return bind(socket, e->ai_addr, e->ai_addrlen); +#else +/* + * Deals with first & last cases in matrix in comment + * for inet_ai_family_from_address(). + */ +int v6only = +((!saddr->has_ipv4 && !saddr->has_ipv6) || + (saddr->has_ipv4 && saddr->ipv4 && + saddr->has_ipv6 && saddr->ipv6)) ? 0 : 1; +int stat; + + rebind: +if (e->ai_family == PF_INET6) { +qemu_setsockopt(socket, IPPROTO_IPV6, IPV6_V6ONLY, , +sizeof(v6only)); +} + +stat = bind(socket, e->ai_addr, e->ai_addrlen); +if (!stat) { +return 0; +} + +/* If we got EADDRINUSE from an IPv6 bind & v6only is unset, + * it could be that the IPv4 port is already claimed, so retry + * with v6only set + */ +if (e->ai_family == PF_INET6 && errno == EADDRINUSE && !v6only) { +v6only = 1; +goto rebind; +} +return stat; +#endif +} + static int inet_listen_saddr(InetSocketAddress *saddr, int port_offset, bool update_addr, @@ -228,39 +266,10 @@ static int inet_listen_saddr(InetSocketAddress *saddr, port_min = inet_getport(e); port_max = saddr->has_to ? saddr->to + port_offset : port_min; for (p = port_min; p <= port_max; p++) { -#ifdef IPV6_V6ONLY -/* - * Deals with first & last cases in matrix in comment - * for inet_ai_family_from_address(). - */ -int v6only = -((!saddr->has_ipv4 && !saddr->has_ipv6) || - (saddr->has_ipv4 && saddr->ipv4 && - saddr->has_ipv6 && saddr->ipv6)) ? 0 : 1; -#endif inet_setport(e, p); -#ifdef IPV6_V6ONLY -rebind: -if (e->ai_family == PF_INET6) { -qemu_setsockopt(slisten, IPPROTO_IPV6, IPV6_V6ONLY, , -sizeof(v6only)); -} -#endif -if (bind(slisten, e->ai_addr, e->ai_addrlen) == 0) { +if (try_bind(slisten, saddr, e) >= 0) { goto listen; } - -#ifdef IPV6_V6ONLY -/* If we got EADDRINUSE from an IPv6 bind & V6ONLY is unset, - * it could be that the IPv4 port is already claimed, so retry - * with V6ONLY set - */ -if (e->ai_family == PF_INET6 && errno == EADDRINUSE && !v6only) { -v6only = 1; -goto rebind; -} -#endif - if (p == port_max) { if (!e->ai_next) { error_setg_errno(errp, errno, "Failed to bind socket"); -- git-series 0.9.1
[Qemu-devel] [PATCH v6 3/4] sockets: factor out create_fast_reuse_socket
Another refactoring step to prepare for fixing the problem exposed with the test-listen test in the previous commit Signed-off-by: Knut Omang--- util/qemu-sockets.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c index b4a2f24..d0d2047 100644 --- a/util/qemu-sockets.c +++ b/util/qemu-sockets.c @@ -149,6 +149,16 @@ int inet_ai_family_from_address(InetSocketAddress *addr, return PF_UNSPEC; } +static int create_fast_reuse_socket(struct addrinfo *e) +{ +int slisten = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol); +if (slisten < 0) { +return -1; +} +socket_set_fast_reuse(slisten); +return slisten; +} + static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e) { #ifndef IPV6_V6ONLY @@ -253,7 +263,8 @@ static int inet_listen_saddr(InetSocketAddress *saddr, getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen, uaddr,INET6_ADDRSTRLEN,uport,32, NI_NUMERICHOST | NI_NUMERICSERV); -slisten = qemu_socket(e->ai_family, e->ai_socktype, e->ai_protocol); + +slisten = create_fast_reuse_socket(e); if (slisten < 0) { if (!e->ai_next) { error_setg_errno(errp, errno, "Failed to create socket"); @@ -261,8 +272,6 @@ static int inet_listen_saddr(InetSocketAddress *saddr, continue; } -socket_set_fast_reuse(slisten); - port_min = inet_getport(e); port_max = saddr->has_to ? saddr->to + port_offset : port_min; for (p = port_min; p <= port_max; p++) { -- git-series 0.9.1
[Qemu-devel] [PATCH v6 4/4] sockets: Handle race condition between binds to the same port
If an offset of ports is specified to the inet_listen_saddr function(), and two or more processes tries to bind from these ports at the same time, occasionally more than one process may be able to bind to the same port. The condition is detected by listen() but too late to avoid a failure. This function is called by socket_listen() and used by all socket listening code in QEMU, so all cases where any form of dynamic port selection is used should be subject to this issue. Add code to close and re-establish the socket when this condition is observed, hiding the race condition from the user. Also clean up some issues with error handling to allow more accurate reporting of the cause of an error. This has been developed and tested by means of the test-listen unit test in the previous commit. Enable the test for make check now that it passes. Reviewed-by: Bhavesh DavdaReviewed-by: Yuval Shaia Reviewed-by: Girish Moodalbail Signed-off-by: Knut Omang --- tests/Makefile.include | 2 +- util/qemu-sockets.c| 58 ++- 2 files changed, 42 insertions(+), 18 deletions(-) diff --git a/tests/Makefile.include b/tests/Makefile.include index b37c0c8..9d2131d 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -128,7 +128,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF) gcov-files-check-bufferiszero-y = util/bufferiszero.c check-unit-y += tests/test-uuid$(EXESUF) check-unit-y += tests/ptimer-test$(EXESUF) -#check-unit-y += tests/test-listen$(EXESUF) +check-unit-y += tests/test-listen$(EXESUF) gcov-files-ptimer-test-y = hw/core/ptimer.c check-unit-y += tests/test-qapi-util$(EXESUF) gcov-files-test-qapi-util-y = qapi/qapi-util.c diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c index d0d2047..6a511fb 100644 --- a/util/qemu-sockets.c +++ b/util/qemu-sockets.c @@ -206,7 +206,10 @@ static int inet_listen_saddr(InetSocketAddress *saddr, char port[33]; char uaddr[INET6_ADDRSTRLEN+1]; char uport[33]; -int slisten, rc, port_min, port_max, p; +int rc, port_min, port_max, p; +int slisten = 0; +int saved_errno = 0; +bool socket_created = false; Error *err = NULL; memset(,0, sizeof(ai)); @@ -258,7 +261,7 @@ static int inet_listen_saddr(InetSocketAddress *saddr, return -1; } -/* create socket + bind */ +/* create socket + bind/listen */ for (e = res; e != NULL; e = e->ai_next) { getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen, uaddr,INET6_ADDRSTRLEN,uport,32, @@ -266,37 +269,58 @@ static int inet_listen_saddr(InetSocketAddress *saddr, slisten = create_fast_reuse_socket(e); if (slisten < 0) { -if (!e->ai_next) { -error_setg_errno(errp, errno, "Failed to create socket"); -} continue; } +socket_created = true; port_min = inet_getport(e); port_max = saddr->has_to ? saddr->to + port_offset : port_min; for (p = port_min; p <= port_max; p++) { inet_setport(e, p); -if (try_bind(slisten, saddr, e) >= 0) { -goto listen; -} -if (p == port_max) { -if (!e->ai_next) { +rc = try_bind(slisten, saddr, e); +if (rc) { +if (errno == EADDRINUSE) { +continue; +} else { error_setg_errno(errp, errno, "Failed to bind socket"); +goto listen_failed; } } +if (!listen(slisten, 1)) { +goto listen_ok; +} +if (errno != EADDRINUSE) { +error_setg_errno(errp, errno, "Failed to listen on socket"); +goto listen_failed; +} +/* Someone else managed to bind to the same port and beat us + * to listen on it! Socket semantics does not allow us to + * recover from this situation, so we need to recreate the + * socket to allow bind attempts for subsequent ports: + */ +closesocket(slisten); +slisten = create_fast_reuse_socket(e); +if (slisten < 0) { +error_setg_errno(errp, errno, + "Failed to recreate failed listening socket"); +goto listen_failed; +} } +} +error_setg_errno(errp, errno, + socket_created ? + "Failed to find an available port" : + "Failed to create a socket"); +listen_failed: +saved_errno = errno; +if (slisten >= 0) { closesocket(slisten); } freeaddrinfo(res); +errno = saved_errno; return -1; -listen: -if (listen(slisten,1) != 0) { -
[Qemu-devel] [PATCH v6 0/4] Unit test+fix for problem with QEMU handling of multiple bind()s to the same port
This series contains: * a unit test that exposes a race condition which causes QEMU to fail to find a port even when there is plenty of available ports. * a refactor of the qemu-sockets inet_listen_saddr() function to better handle this situation. Changes from v5: * Also move setting of error from socket creation out into the main double for loop. * Simplify and enhance reporting if socket creation fails: Only report failure to create sockets if none of the provided addrinfo list elements allows creation of a socket, otherwise report failure to find an available port. * Further simplify if's within the for loops and make sure failure to recreate a socket gets distinctively reported. * Rebased to current master. Changes from v4: * Move the complexity of recreating a socket and setting the error pointer into the main for loop, eliminating the try_bind_listen() function again. Cleaning up and improving error handling in the process. Changes from v3: * Test changes: Add missing license Add subtests for ipv4, ipv6 and both Various g_* usage improvements * Split patch into 3 patches with two refactoring patches ahead of the actual fix. Changes from v2: * Non-trivial rebase + further abstraction on top of 7ad9af343c7f1c70c8015c7c519c312d8c5f9fa1 'tests: add functional test validating ipv4/ipv6 address flag handling' Changes from v1: * Fix potential uninitialized variable only detected by optimize. * Improve unexpected error detection in test-listen to give more details about why the test fails unexpectedly. * Fix some line length style issues. Thanks, Knut Knut Omang (4): tests: Add test-listen - a stress test for QEMU socket listen sockets: factor out a new try_bind() function sockets: factor out create_fast_reuse_socket sockets: Handle race condition between binds to the same port tests/Makefile.include | 2 +- tests/test-listen.c| 253 ++- util/qemu-sockets.c| 138 +++ 3 files changed, 345 insertions(+), 48 deletions(-) create mode 100644 tests/test-listen.c base-commit: a588c4985eff363154d65aee8607d0a4601655f7 -- git-series 0.9.1
[Qemu-devel] [PATCH v6 1/4] tests: Add test-listen - a stress test for QEMU socket listen
There's a potential race condition between multiple bind()'s attempting to bind to the same port, which occasionally allows more than one bind to succeed against the same port. When a subsequent listen() call is made with the same socket only one will succeed. The current QEMU code does however not take this situation into account and the listen will cause the code to break out and fail even when there are actually available ports to use. This test exposes two subtests: /socket/listen-serial /socket/listen-compete The "compete" subtest creates a number of threads and have them all trying to bind to the same port with a large enough offset input to allow all threads to get it's own port. The "serial" subtest just does the same, except in series in a single thread. The serial version passes, probably in most versions of QEMU. The parallel version exposes the problem in a relatively reliable way, eg. it fails a majority of times, but not with a 100% rate, occasional passes can be seen. Nevertheless this is quite good given that the bug was tricky to reproduce and has been left undetected for a while. The problem seems to be present in all versions of QEMU. The original failure scenario occurred with VNC port allocation in a traditional Xen based build, in different code but with similar functionality. Reported-by: Bhavesh DavdaSigned-off-by: Knut Omang Reviewed-by: Yuval Shaia Reviewed-by: Bhavesh Davda Reviewed-by: Girish Moodalbail --- tests/Makefile.include | 2 +- tests/test-listen.c| 253 ++- 2 files changed, 255 insertions(+) create mode 100644 tests/test-listen.c diff --git a/tests/Makefile.include b/tests/Makefile.include index 7af278d..b37c0c8 100644 --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -128,6 +128,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF) gcov-files-check-bufferiszero-y = util/bufferiszero.c check-unit-y += tests/test-uuid$(EXESUF) check-unit-y += tests/ptimer-test$(EXESUF) +#check-unit-y += tests/test-listen$(EXESUF) gcov-files-ptimer-test-y = hw/core/ptimer.c check-unit-y += tests/test-qapi-util$(EXESUF) gcov-files-test-qapi-util-y = qapi/qapi-util.c @@ -769,6 +770,7 @@ tests/test-arm-mptimer$(EXESUF): tests/test-arm-mptimer.o tests/test-qapi-util$(EXESUF): tests/test-qapi-util.o $(test-util-obj-y) tests/numa-test$(EXESUF): tests/numa-test.o tests/vmgenid-test$(EXESUF): tests/vmgenid-test.o tests/boot-sector.o tests/acpi-utils.o +tests/test-listen$(EXESUF): tests/test-listen.o $(test-util-obj-y) tests/migration/stress$(EXESUF): tests/migration/stress.o $(call quiet-command, $(LINKPROG) -static -O3 $(PTHREAD_LIB) -o $@ $< ,"LINK","$(TARGET_DIR)$@") diff --git a/tests/test-listen.c b/tests/test-listen.c new file mode 100644 index 000..5c07537 --- /dev/null +++ b/tests/test-listen.c @@ -0,0 +1,253 @@ +/* + * Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved. + *Author: Knut Omang + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation. + * + * Test parallel port listen configuration with + * dynamic port allocation + */ + +#include "qemu/osdep.h" +#include "libqtest.h" +#include "qemu-common.h" +#include "qemu/thread.h" +#include "qemu/sockets.h" +#include "qapi/error.h" + +#define NAME_LEN 1024 +#define PORT_LEN 16 + +struct thr_info { +QemuThread thread; +int to_port; +bool ipv4; +bool ipv6; +int got_port; +int eno; +int fd; +const char *errstr; +char hostname[NAME_LEN + 1]; +char port[PORT_LEN + 1]; +}; + + +/* These two functions taken from test-io-channel-socket.c */ +static int check_bind(const char *hostname, bool *has_proto) +{ +int fd = -1; +struct addrinfo ai, *res = NULL; +int rc; +int ret = -1; + +memset(, 0, sizeof(ai)); +ai.ai_flags = AI_CANONNAME | AI_ADDRCONFIG; +ai.ai_family = AF_UNSPEC; +ai.ai_socktype = SOCK_STREAM; + +/* lookup */ +rc = getaddrinfo(hostname, NULL, , ); +if (rc != 0) { +if (rc == EAI_ADDRFAMILY || +rc == EAI_FAMILY) { +*has_proto = false; +goto done; +} +goto cleanup; +} + +fd = qemu_socket(res->ai_family, res->ai_socktype, res->ai_protocol); +if (fd < 0) { +goto cleanup; +} + +if (bind(fd, res->ai_addr, res->ai_addrlen) < 0) { +if (errno == EADDRNOTAVAIL) { +*has_proto = false; +goto done; +} +goto cleanup; +} + +*has_proto = true; + done: +ret = 0; + + cleanup: +if (fd != -1) { +close(fd); +} +if (res) { +freeaddrinfo(res); +} +return ret; +} + +static int
Re: [Qemu-devel] [PATCH v5 2/4] sockets: factor out create_fast_reuse_socket
On Tue, 2017-07-25 at 10:38 +0100, Daniel P. Berrange wrote: > On Sat, Jul 22, 2017 at 09:49:31AM +0200, Knut Omang wrote: > > First refactoring step to prepare for fixing the problem > > exposed with the test-listen test in the previous commit > > > > Signed-off-by: Knut Omang> > --- > > util/qemu-sockets.c | 24 +--- > > 1 file changed, 17 insertions(+), 7 deletions(-) > > > > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c > > index 1358c81..578f25b 100644 > > --- a/util/qemu-sockets.c > > +++ b/util/qemu-sockets.c > > @@ -149,6 +149,20 @@ int inet_ai_family_from_address(InetSocketAddress > > *addr, > > return PF_UNSPEC; > > } > > > > +static int create_fast_reuse_socket(struct addrinfo *e, Error **errp) > > +{ > > +int slisten = qemu_socket(e->ai_family, e->ai_socktype, > > e->ai_protocol); > > +if (slisten < 0) { > > +if (!e->ai_next) { > > +error_setg_errno(errp, errno, "Failed to create socket"); > > +} > > +return -1; > > +} > > + > > +socket_set_fast_reuse(slisten); > > +return slisten; > > +} > > As mentioned in the previous review, I don't think we should have > methods like this which sometimes report an error on failure, and > sometimes don't report an error. It makes it hard to review the > callers for correctness of error handling. Sorry, somehow this one slipped through.. > > > + > > static int inet_listen_saddr(InetSocketAddress *saddr, > > int port_offset, > > bool update_addr, > > @@ -210,21 +224,17 @@ static int inet_listen_saddr(InetSocketAddress *saddr, > > return -1; > > } > > > > -/* create socket + bind */ > > +/* create socket + bind/listen */ > > for (e = res; e != NULL; e = e->ai_next) { > > getnameinfo((struct sockaddr*)e->ai_addr,e->ai_addrlen, > > uaddr,INET6_ADDRSTRLEN,uport,32, > > NI_NUMERICHOST | NI_NUMERICSERV); > > -slisten = qemu_socket(e->ai_family, e->ai_socktype, > > e->ai_protocol); > > + > > +slisten = create_fast_reuse_socket(e, ); > > if (slisten < 0) { > > -if (!e->ai_next) { > > -error_setg_errno(errp, errno, "Failed to create socket"); > > -} > > So please leave this here. Since we are already outside of the scope of the original patch, let me suggest: As far as I can see there's no good reason to have different behavior if creating sockets on the last 'struct addrinfo' in the list fails than if an intermediate addrinfo fails and the next addrinfo in the list just happened to have no free ports. As far as I can see the 'Failed to create socket' message will be most useful to the user if creating a socket fails on all elements in the list, to detect issues with the network setup. I have implemented this semantics in v6 - it should be an enhancement while also simplifying the code inside the for loops, getting rid of the conditional error setting and avoid code duplication by handling the same error twice in proximity. Thanks, Knut > > > continue; > > } > > > > -socket_set_fast_reuse(slisten); > > - > > port_min = inet_getport(e); > > port_max = saddr->has_to ? saddr->to + port_offset : port_min; > > for (p = port_min; p <= port_max; p++) { > > > Regards, > Daniel
Re: [Qemu-devel] [PATCH v5 4/4] sockets: Handle race condition between binds to the same port
On Tue, 2017-07-25 at 10:54 +0100, Daniel P. Berrange wrote: > On Sat, Jul 22, 2017 at 09:49:33AM +0200, Knut Omang wrote: > > If an offset of ports is specified to the inet_listen_saddr function(), > > and two or more processes tries to bind from these ports at the same time, > > occasionally more than one process may be able to bind to the same > > port. The condition is detected by listen() but too late to avoid a failure. > > > > This function is called by socket_listen() and used > > by all socket listening code in QEMU, so all cases where any form of dynamic > > port selection is used should be subject to this issue. > > > > Add code to close and re-establish the socket when this > > condition is observed, hiding the race condition from the user. > > > > Also clean up some issues with error handling to allow more > > accurate reporting of the cause of an error. > > > > This has been developed and tested by means of the > > test-listen unit test in the previous commit. > > Enable the test for make check now that it passes. > > > > Signed-off-by: Knut Omang> > Reviewed-by: Bhavesh Davda > > Reviewed-by: Yuval Shaia > > Reviewed-by: Girish Moodalbail > > Signed-off-by: Knut Omang > > --- > > tests/Makefile.include | 2 +- > > util/qemu-sockets.c| 51 --- > > 2 files changed, 39 insertions(+), 14 deletions(-) > > > > diff --git a/tests/Makefile.include b/tests/Makefile.include > > index b37c0c8..9d2131d 100644 > > --- a/tests/Makefile.include > > +++ b/tests/Makefile.include > > @@ -128,7 +128,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF) > > gcov-files-check-bufferiszero-y = util/bufferiszero.c > > check-unit-y += tests/test-uuid$(EXESUF) > > check-unit-y += tests/ptimer-test$(EXESUF) > > -#check-unit-y += tests/test-listen$(EXESUF) > > +check-unit-y += tests/test-listen$(EXESUF) > > gcov-files-ptimer-test-y = hw/core/ptimer.c > > check-unit-y += tests/test-qapi-util$(EXESUF) > > gcov-files-test-qapi-util-y = qapi/qapi-util.c > > diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c > > index 39f207c..8ca3ba6 100644 > > --- a/util/qemu-sockets.c > > +++ b/util/qemu-sockets.c > > @@ -210,7 +210,9 @@ static int inet_listen_saddr(InetSocketAddress *saddr, > > char port[33]; > > char uaddr[INET6_ADDRSTRLEN+1]; > > char uport[33]; > > -int slisten, rc, port_min, port_max, p; > > +int rc, port_min, port_max, p; > > +int slisten = 0; > > +int saved_errno = 0; > > Error *err = NULL; > > > > memset(,0, sizeof(ai)); > > @@ -277,27 +279,50 @@ static int inet_listen_saddr(InetSocketAddress *saddr, > > port_max = saddr->has_to ? saddr->to + port_offset : port_min; > > for (p = port_min; p <= port_max; p++) { > > inet_setport(e, p); > > -if (try_bind(slisten, saddr, e) >= 0) { > > -goto listen; > > -} > > -if (p == port_max) { > > -if (!e->ai_next) { > > +rc = try_bind(slisten, saddr, e); > > +if (rc) { > > +if (errno == EADDRINUSE) { > > Add in '&& e->ai_next' here, so we trigger the error handling > on the else branch if this was the last address we had. See my comment on patch 2 - with that reporting semantics, inability to create a socket will only be reported if none of the addrinfo structs allowed socket creation so this reporting case goes away. > > +continue; > > +} else { > > error_setg_errno(errp, errno, "Failed to bind socket"); > > +goto listen_failed; > > +} > > +} > > +rc = listen(slisten, 1); > > +if (!rc) { > > Assigning to rc seems redundant here. removed. > > +goto listen_ok; > > +} else if (errno == EADDRINUSE) { > > +/* Someone else managed to bind to the same port and beat > > us > > + * to listen on it! Socket semantics does not allow us to > > + * recover from this situation, so we need to recreate the > > + * socket to allow bind attempts for subsequent ports: > > + */ > > +closesocket(slisten); > > +slisten = create_fast_reuse_socket(e, errp); > > create_fast_reuse_socket may report an error here > > > +if (slisten >= 0) { > > +continue; > > } > > } > > +error_setg_errno(errp, errno, "Failed to listen on socket"); > > at which point you report another error on top of it with a stale > errno value here. This bug would be more obvious if error reporting > was pulled out of create_fast_reuse_socket() as described in the > earlier patch. > > I think it'd be clearer if
Re: [Qemu-devel] [PATCH] 9pfs: include for XATTR_SIZE_MAX
On 29 July 2017 at 14:50, Patrick Steinhardtwrote: > On Fri, Jul 28, 2017 at 02:20:49PM -0300, Philippe Mathieu-Daudé wrote: >> This is likely to break on BSD, but now than patchew has a NetBSD job >> you can trigger a build RESENDing this patch. > Thanks for the feedback! Is this really relevant in this context, > though? Both 9p-local.c and 9p-handle.c already include linux > headers, especially is included without any ifdef > around. As such, I simply assumed that this code is being built > on Linux systems, only. Yes; hw/Makefile.objs only includes the 9pfs directory if CONFIG_VIRTFS is defined, and configure only sets that on Linux hosts which have libpcap and libattr available. thanks -- PMM
[Qemu-devel] [PATCH v4 1/3] block: add bdrv_get_format_alloc_stat format interface
The function should collect statistics, about used/unused by top-level format driver space (in its .file) and allocation status (data/zero/discarded/after-eof) of corresponding areas in this .file. Signed-off-by: Vladimir Sementsov-Ogievskiy--- block.c | 16 +++ include/block/block.h | 3 ++ include/block/block_int.h | 2 ++ qapi/block-core.json | 72 +++ 4 files changed, 93 insertions(+) diff --git a/block.c b/block.c index 50ba264143..7d720ae0c2 100644 --- a/block.c +++ b/block.c @@ -3407,6 +3407,22 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState *bs) } /** + * Collect format allocation info. See BlockFormatAllocInfo definition in + * qapi/block-core.json. + */ +int bdrv_get_format_alloc_stat(BlockDriverState *bs, BlockFormatAllocInfo *bfai) +{ +BlockDriver *drv = bs->drv; +if (!drv) { +return -ENOMEDIUM; +} +if (drv->bdrv_get_format_alloc_stat) { +return drv->bdrv_get_format_alloc_stat(bs, bfai); +} +return -ENOTSUP; +} + +/** * Return number of sectors on success, -errno on error. */ int64_t bdrv_nb_sectors(BlockDriverState *bs) diff --git a/include/block/block.h b/include/block/block.h index 9b355e92d8..646376a772 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -335,6 +335,9 @@ typedef enum { int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix); +int bdrv_get_format_alloc_stat(BlockDriverState *bs, + BlockFormatAllocInfo *bfai); + /* The units of offset and total_work_size may be chosen arbitrarily by the * block driver; total_work_size may change during the course of the amendment * operation */ diff --git a/include/block/block_int.h b/include/block/block_int.h index 8d3724cce6..458c715e99 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -208,6 +208,8 @@ struct BlockDriver { int64_t (*bdrv_getlength)(BlockDriverState *bs); bool has_variable_length; int64_t (*bdrv_get_allocated_file_size)(BlockDriverState *bs); +int (*bdrv_get_format_alloc_stat)(BlockDriverState *bs, + BlockFormatAllocInfo *bfai); int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs, uint64_t offset, uint64_t bytes, QEMUIOVector *qiov); diff --git a/qapi/block-core.json b/qapi/block-core.json index ea0b3e8b13..93f6995381 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -139,6 +139,78 @@ '*format-specific': 'ImageInfoSpecific' } } ## +# @BlockFormatAllocInfo: +# +# +# Allocation information of an underlying protocol file, as partitioned by a +# format driver's utilization of said allocations. +# All fields are in bytes. +# +# Regions of the underlying protocol file may be considered used or unused by +# the format driver interpreting these regions. It is at the discretion of the +# format driver (e.g. qcow2) which regions of its backing storage are considered +# in-use or not. +# +# For now, the only format driver supporting this feature is Qcow2 which is a +# cluster based format. Clusters considered in-use by qcow2 are those with a +# non-zero refcount in the format metadata. All other clusters, if present, are +# considered unused. Examples of unused allocations for the Qcow2 format are +# leaked clusters, pre-allocated clusters, and recently freed clusters. +# +# Note: the whole underlying protocol file is described as well as all format +# file allocations, not only virtual disk data (metadata, internal snapshots, +# etc. are included). +# +# For the underlying protocol file there are native block-status types of the +# regions: +# - data: allocated data +# - zero: reported as zero (for example, this type corresponds to holes for +# POSIX files on sparce file-system) +# - discarded: not allocated +# 4th additional type is 'overrun', is data referenced by the format driver +# located beyond EOF of the underlying protocol file. For example, a partially +# allocated cluster at the end of a QCOW2 file, where Qcow2 generally operates +# on complete clusters. +# +# So, the fields are: +# +# @used-data: used by the format file and backed by data in the underlying +# protocol file +# +# @used-zero: used by the format file and backed by zeroes in the underlying +# protocol file; which may be a filesystem hole for POSIX files. +# +# @used-discarded: used by the format file but actually unallocated in the +# underlying protocol file +# +# @used-overrun: used by the format file beyond the end of the underlying +#protocol file +# +# @unused-data: allocated data in the underlying protocol file not used by the +# format file +# +# @unused-zero: reported-as-zero regions in the underlying protocol file not +# used by the format file +# +# @unused-discarded:
[Qemu-devel] [PATCH v4 0/3] qemu-img check: format allocation info
Hi all. See 01 patch for the doc. Question to discuss. If I understand correctly get_block_status flags allocated, zero, and data actually provide 5 possible combinations, which I combine into three. allocated data zero 1 11\__ data 1 10/ 1 01\__ zero 0 01/ 0 00___ discarded This division looks not bad, but it is not the only one possible. Separating data is really useful - it shows leaked clusters.. So the question is, don't we want to adjust the division? I'm ok with the current one. v4: - reword docs in 01 - s/2.10/2.11/ for qapi v3: - improve docs - rename fields - add 'zero' type of underlying file portions status. It as these areas cannot be presented as 'discarded', but they are not occupying real space, so we don't want to present them as 'allocated data' too. - remove last patch. It is not needed after introducing new naming for the fields v2: fix build error, gcc things that some variables may be used uninitialized (actually they didn't). v1: These series is a replacement for "qemu-img check: unallocated size" series. There was a question, should we account allocated clusters in qcow2 but actually holes in underalying file as allocated or not. Instead of hiding this information in one-number statistic I've decided to print the whole information, 5 numbers: For allocated by top-level format driver (qcow2 for ex.) clusters, 3 numbers: number of bytes, which are: - allocated in underlying file - holes in underlying file - after end of underlying file To account other areas of underlying file, 2 more numbers of bytes: - unallocated by top-level driver but allocated in underlying file - unallocated by top-level driver and holes in underlying file Vladimir Sementsov-Ogievskiy (3): block: add bdrv_get_format_alloc_stat format interface qcow2: add .bdrv_get_format_alloc_stat qemu-img check: add format allocation info block.c | 16 + block/qcow2-refcount.c| 152 ++ block/qcow2.c | 2 + block/qcow2.h | 2 + include/block/block.h | 3 + include/block/block_int.h | 2 + qapi/block-core.json | 78 +++- qemu-img.c| 42 + 8 files changed, 296 insertions(+), 1 deletion(-) -- 2.11.1
[Qemu-devel] [PATCH v4 3/3] qemu-img check: add format allocation info
Signed-off-by: Vladimir Sementsov-Ogievskiy--- qapi/block-core.json | 6 +- qemu-img.c | 42 ++ 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index 93f6995381..d662786261 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -249,6 +249,9 @@ # field is present if the driver for the image format # supports it # +# @format-alloc-info: Format-allocation information, see +# BlockFormatAllocInfo description. (Since: 2.11) +# # Since: 1.4 # ## @@ -257,7 +260,8 @@ '*image-end-offset': 'int', '*corruptions': 'int', '*leaks': 'int', '*corruptions-fixed': 'int', '*leaks-fixed': 'int', '*total-clusters': 'int', '*allocated-clusters': 'int', - '*fragmented-clusters': 'int', '*compressed-clusters': 'int' } } + '*fragmented-clusters': 'int', '*compressed-clusters': 'int', + '*format-alloc-info': 'BlockFormatAllocInfo' } } ## # @MapEntry: diff --git a/qemu-img.c b/qemu-img.c index b506839ef0..b3adf9a1a2 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -560,6 +560,35 @@ static void dump_json_image_check(ImageCheck *check, bool quiet) QDECREF(str); } +static void dump_human_format_alloc_info(BlockFormatAllocInfo *bfai, bool quiet) +{ +char *used_data = size_to_str(bfai->used_data); +char *used_zero = size_to_str(bfai->used_zero); +char *used_discarded = size_to_str(bfai->used_discarded); +char *used_overrun = size_to_str(bfai->used_overrun); + +char *unused_data = size_to_str(bfai->unused_data); +char *unused_zero = size_to_str(bfai->unused_zero); +char *unused_discarded = size_to_str(bfai->unused_discarded); + +qprintf(quiet, +"Format allocation info (including metadata):\n" +" datazero discarded after-eof\n" +"used %10s %10s %10s %10s\n" +"unused %10s %10s %10s\n", +used_data, used_zero, used_discarded, used_overrun, +unused_data, unused_zero, unused_discarded); + +g_free(used_data); +g_free(used_zero); +g_free(used_discarded); +g_free(used_overrun); + +g_free(unused_data); +g_free(unused_zero); +g_free(unused_discarded); +} + static void dump_human_image_check(ImageCheck *check, bool quiet) { if (!(check->corruptions || check->leaks || check->check_errors)) { @@ -601,6 +630,10 @@ static void dump_human_image_check(ImageCheck *check, bool quiet) qprintf(quiet, "Image end offset: %" PRId64 "\n", check->image_end_offset); } + +if (check->has_format_alloc_info) { +dump_human_format_alloc_info(check->format_alloc_info, quiet); +} } static int collect_image_check(BlockDriverState *bs, @@ -611,6 +644,7 @@ static int collect_image_check(BlockDriverState *bs, { int ret; BdrvCheckResult result; +BlockFormatAllocInfo *bfai = g_new0(BlockFormatAllocInfo, 1); ret = bdrv_check(bs, , fix); if (ret < 0) { @@ -639,6 +673,14 @@ static int collect_image_check(BlockDriverState *bs, check->compressed_clusters = result.bfi.compressed_clusters; check->has_compressed_clusters = result.bfi.compressed_clusters != 0; +ret = bdrv_get_format_alloc_stat(bs, bfai); +if (ret < 0) { +qapi_free_BlockFormatAllocInfo(bfai); +} else { +check->has_format_alloc_info = true; +check->format_alloc_info = bfai; +} + return 0; } -- 2.11.1
[Qemu-devel] [PATCH v4 2/3] qcow2: add .bdrv_get_format_alloc_stat
Realize .bdrv_get_format_alloc_stat interface. Signed-off-by: Vladimir Sementsov-Ogievskiy--- block/qcow2-refcount.c | 152 + block/qcow2.c | 2 + block/qcow2.h | 2 + 3 files changed, 156 insertions(+) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index 7c06061aae..fd6027f0c2 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -2931,3 +2931,155 @@ done: qemu_vfree(new_refblock); return ret; } + +typedef struct FormatAllocStatCo { +BlockDriverState *bs; +BlockFormatAllocInfo *bfai; +int64_t ret; +} FormatAllocStatCo; + +static void coroutine_fn qcow2_get_format_alloc_stat_entry(void *opaque) +{ +int ret = 0; +FormatAllocStatCo *nbco = opaque; +BlockDriverState *bs = nbco->bs; +BDRVQcow2State *s = bs->opaque; +BlockFormatAllocInfo *bfai = nbco->bfai; +int64_t cluster, file_sectors, sector; +int refcount_block_offset; +uint32_t i; +bool used = false, f_data = false, f_zero = false; +int dif, num = 0, f_num = 0; + +memset(bfai, 0, sizeof(*bfai)); + +file_sectors = bdrv_nb_sectors(bs->file->bs); +if (file_sectors < 0) { +nbco->ret = file_sectors; +return; +} + +qemu_co_mutex_lock(>lock); + +for (sector = 0; sector < file_sectors; sector += dif) { +if (f_num == 0) { +BlockDriverState *file; +ret = bdrv_get_block_status(bs->file->bs, sector, +file_sectors - sector, _num, ); +if (ret < 0) { +goto fail; +} +f_data = ret & BDRV_BLOCK_DATA; +f_zero = ret & BDRV_BLOCK_ZERO; +} + +if (num == 0) { +uint64_t refcount; +assert(((sector << BDRV_SECTOR_BITS) & (s->cluster_size - 1)) == 0); +ret = qcow2_get_refcount( +bs, (sector << BDRV_SECTOR_BITS) >> s->cluster_bits, ); +if (ret < 0) { +goto fail; +} +used = refcount > 0; +num = s->cluster_size >> BDRV_SECTOR_BITS; +} + +dif = MIN(f_num, MIN(num, file_sectors - sector)); +if (used) { +if (f_data) { +bfai->used_data += dif; +} else if (f_zero) { +bfai->used_zero += dif; +} else { +bfai->used_discarded += dif; +} +} else { +if (f_data) { +bfai->unused_data += dif; +} else if (f_zero) { +bfai->unused_zero += dif; +} else { +bfai->unused_discarded += dif; +} +} +f_num -= dif; +num -= dif; +} + +assert(f_num == 0); + +if (used) { +bfai->used_overrun += num; +} + +cluster = size_to_clusters(s, sector << BDRV_SECTOR_BITS); +refcount_block_offset = cluster & (s->refcount_block_size - 1); +for (i = cluster >> s->refcount_block_bits; + i <= s->max_refcount_table_index; i++) +{ +int j; + +if (!(s->refcount_table[i] & REFT_OFFSET_MASK)) { +refcount_block_offset = 0; +continue; +} + +for (j = refcount_block_offset; j < s->refcount_block_size; j++) { +uint64_t refcount; +cluster = (i << s->refcount_block_bits) + j; + +ret = qcow2_get_refcount(bs, cluster, ); +if (ret < 0) { +goto fail; +} +if (refcount > 0) { +bfai->used_overrun++; +} +} + +refcount_block_offset = 0; +} + +qemu_co_mutex_unlock(>lock); + +bfai->used_data = bfai->used_data << BDRV_SECTOR_BITS; +bfai->used_zero = bfai->used_zero << BDRV_SECTOR_BITS; +bfai->used_discarded = bfai->used_discarded << BDRV_SECTOR_BITS; +bfai->used_overrun = bfai->used_overrun << BDRV_SECTOR_BITS; + +bfai->unused_data = bfai->unused_data << BDRV_SECTOR_BITS; +bfai->unused_zero = bfai->unused_zero << BDRV_SECTOR_BITS; +bfai->unused_discarded = bfai->unused_discarded << BDRV_SECTOR_BITS; + +nbco->ret = 0; +return; + +fail: +nbco->ret = ret; +qemu_co_mutex_unlock(>lock); +} + +/* qcow2_get_format_alloc_stat() + * Fills @bfai struct. In case of failure @bfai content is unpredicted. + */ +int qcow2_get_format_alloc_stat(BlockDriverState *bs, +BlockFormatAllocInfo *bfai) +{ +FormatAllocStatCo nbco = { +.bs = bs, +.bfai = bfai, +.ret = -EINPROGRESS +}; + +if (qemu_in_coroutine()) { +qcow2_get_format_alloc_stat_entry(); +} else { +Coroutine *co = +qemu_coroutine_create(qcow2_get_format_alloc_stat_entry, ); +qemu_coroutine_enter(co); +BDRV_POLL_WHILE(bs, nbco.ret == -EINPROGRESS); +} + +return nbco.ret; +}
Re: [Qemu-devel] [PATCH] 9pfs: include for XATTR_SIZE_MAX
On Fri, Jul 28, 2017 at 02:20:49PM -0300, Philippe Mathieu-Daudé wrote: > > On Mon, Jun 26, 2017 at 12:20 PM, Patrick Steinhardtwrote: > >> The function `v9fs_xattrcreate` makes use of the define `XATTR_SIZE_MAX` > >> to reject attempts of creating xattrs with an invalid size, which is > >> defined in . On glibc-based systems, this header is > >> indirectly included via , , > >> , but on other platforms this is not guaranteed due > >> to not being part of the POSIX standard. One examples are systems based > >> on musl libc, which do not include the indirectly, > >> which leads to `XATTR_SIZE_MAX` being undefined. > >> > >> Fix this error by directly include . As the 9P fs code > >> is being Linux-based either way, we can simply do so without breaking > >> other platforms. This enables building 9pfs on musl-based systems. > >> > >> Signed-off-by: Patrick Steinhardt > > Reviewed-by: Alistair Francis > >> --- > >> hw/9pfs/9p.c | 1 + > >> 1 file changed, 1 insertion(+) > >> > >> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c > >> index 96d2683348..48cd558e96 100644 > >> --- a/hw/9pfs/9p.c > >> +++ b/hw/9pfs/9p.c > >> @@ -13,6 +13,7 @@ > >> > >> #include "qemu/osdep.h" > >> #include > > This is likely to break on BSD, but now than patchew has a NetBSD job > you can trigger a build RESENDing this patch. > > This should probably work: > > #ifdef __linux__ > > >> +#include > > #endif > > >> #include "hw/virtio/virtio.h" > >> #include "qapi/error.h" > >> #include "qemu/error-report.h" > >> -- > >> 2.13.2 > > Regards, > > Phil. Thanks for the feedback! Is this really relevant in this context, though? Both 9p-local.c and 9p-handle.c already include linux headers, especially is included without any ifdef around. As such, I simply assumed that this code is being built on Linux systems, only. Regards Patrick signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH v2 0/4] trace-events: print 0x before hex numbers
Hi, This series seems to have some coding style problems. See output below for more information: Subject: [Qemu-devel] [PATCH v2 0/4] trace-events: print 0x before hex numbers Message-id: 20170729131159.24949-1-vsement...@virtuozzo.com Type: series === TEST SCRIPT BEGIN === #!/bin/bash BASE=base n=1 total=$(git log --oneline $BASE.. | wc -l) failed=0 git config --local diff.renamelimit 0 git config --local diff.renames True commits="$(git log --format=%H --reverse $BASE..)" for c in $commits; do echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..." if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then failed=1 echo fi n=$((n+1)) done exit $failed === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 Switched to a new branch 'test' 3dd406d1d0 trace-events: fix code style: print 0x before hex numbers 32dc4922ee checkpatch: check trace-events code style 01905453d5 trace-events: fix code style: %# -> 0x% e08ea289b5 coding_style: add point about 0x in trace-events === OUTPUT BEGIN === Checking PATCH 1/4: coding_style: add point about 0x in trace-events... Checking PATCH 2/4: trace-events: fix code style: %# -> 0x%... Checking PATCH 3/4: checkpatch: check trace-events code style... WARNING: line over 80 characters #25: FILE: scripts/checkpatch.pl:1343: + ERROR("Don't use '#' flag of printf format ('%#') in " . ERROR: line over 90 characters #26: FILE: scripts/checkpatch.pl:1344: + "trace-events, use '0x' prefix instead\n" . $herecurr); ERROR: line over 90 characters #29: FILE: scripts/checkpatch.pl:1347: + qr/%[-+ *.0-9]*([hljztL]|ll|hh)?(x|X|"\s*PRI[xX][^"]*"?)/; ERROR: line over 90 characters #31: FILE: scripts/checkpatch.pl:1349: + # don't consider groups splitted by [.:/ ], like 2A.20:12ab WARNING: line over 80 characters #32: FILE: scripts/checkpatch.pl:1350: + my $tmpline = $rawline =~ s/($hex[.:\/ ])+$hex//gr; WARNING: line over 80 characters #35: FILE: scripts/checkpatch.pl:1353: + ERROR("Hex numbers must be prefixed with '0x'\n" . total: 3 errors, 3 warnings, 25 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Checking PATCH 4/4: trace-events: fix code style: print 0x before hex numbers... === OUTPUT END === Test command exited with code: 1 --- Email generated automatically by Patchew [http://patchew.org/]. Please send your feedback to patchew-de...@freelists.org
[Qemu-devel] [PATCH v2 4/4] trace-events: fix code style: print 0x before hex numbers
The only exception are groups of numers separated by symbols '.', ' ', ':', '/', like 'ab.09.7d'. This patch is made by the following: > find . -name trace-events | xargs python script.py where script.py is the following python script: = #!/usr/bin/env python import sys import re import fileinput rhex = '%[-+ *.0-9]*(?:[hljztL]|ll|hh)?(?:x|X|"\s*PRI[xX][^"]*"?)' rgroup = re.compile('((?:' + rhex + '[.:/ ])+' + rhex + ')') rbad = re.compile('(?', arr[i]) sys.stdout.write(''.join(arr)) = Signed-off-by: Vladimir Sementsov-Ogievskiy--- accel/tcg/trace-events| 2 +- block/trace-events| 22 +++ hw/audio/trace-events | 4 +- hw/char/trace-events | 12 ++-- hw/display/trace-events | 12 ++-- hw/dma/trace-events | 20 +++--- hw/i386/xen/trace-events | 4 +- hw/input/trace-events | 4 +- hw/intc/trace-events | 156 +++--- hw/isa/trace-events | 4 +- hw/misc/trace-events | 78 +++ hw/net/trace-events | 34 +- hw/nvram/trace-events | 2 +- hw/ppc/trace-events | 64 +-- hw/s390x/trace-events | 20 +++--- hw/scsi/trace-events | 112 - hw/sd/trace-events| 4 +- hw/timer/trace-events | 20 +++--- hw/usb/trace-events | 56 - hw/vfio/trace-events | 44 ++--- hw/virtio/trace-events| 6 +- linux-user/trace-events | 10 +-- migration/trace-events| 36 +-- nbd/trace-events | 18 +++--- net/trace-events | 4 +- target/arm/trace-events | 10 +-- target/s390x/trace-events | 2 +- target/sparc/trace-events | 30 - 28 files changed, 395 insertions(+), 395 deletions(-) diff --git a/accel/tcg/trace-events b/accel/tcg/trace-events index 2de8359670..c22ad60af7 100644 --- a/accel/tcg/trace-events +++ b/accel/tcg/trace-events @@ -4,7 +4,7 @@ # cpu-exec.c disable exec_tb(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR disable exec_tb_nocache(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR -disable exec_tb_exit(void *last_tb, unsigned int flags) "tb:%p flags=%x" +disable exec_tb_exit(void *last_tb, unsigned int flags) "tb:%p flags=0x%x" # translate-all.c translate_block(void *tb, uintptr_t pc, uint8_t *tb_code) "tb:%p, pc:0x%"PRIxPTR", tb_code:%p" diff --git a/block/trace-events b/block/trace-events index 622c754495..8fc853b135 100644 --- a/block/trace-events +++ b/block/trace-events @@ -5,8 +5,8 @@ bdrv_open_common(void *bs, const char *filename, int flags, const char *format_n bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d" # block/block-backend.c -blk_co_preadv(void *blk, void *bs, int64_t offset, unsigned int bytes, int flags) "blk %p bs %p offset %"PRId64" bytes %u flags %x" -blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int flags) "blk %p bs %p offset %"PRId64" bytes %u flags %x" +blk_co_preadv(void *blk, void *bs, int64_t offset, unsigned int bytes, int flags) "blk %p bs %p offset %"PRId64" bytes %u flags 0x%x" +blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int flags) "blk %p bs %p offset %"PRId64" bytes %u flags 0x%x" # block/io.c bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d" @@ -54,19 +54,19 @@ paio_submit_co(int64_t offset, int count, int type) "offset %"PRId64" count %d t paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "acb %p opaque %p offset %"PRId64" count %d type %d" # block/qcow2.c -qcow2_writev_start_req(void *co, int64_t offset, int bytes) "co %p offset %" PRIx64 " bytes %d" +qcow2_writev_start_req(void *co, int64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d" qcow2_writev_done_req(void *co, int ret) "co %p ret %d" qcow2_writev_start_part(void *co) "co %p" qcow2_writev_done_part(void *co, int cur_bytes) "co %p cur_bytes %d" -qcow2_writev_data(void *co, uint64_t offset) "co %p offset %" PRIx64 -qcow2_pwrite_zeroes_start_req(void *co, int64_t offset, int count) "co %p offset %" PRIx64 " count %d" -qcow2_pwrite_zeroes(void *co, int64_t offset, int count) "co %p offset %" PRIx64 " count %d" +qcow2_writev_data(void *co, uint64_t offset) "co %p offset 0x%" PRIx64 +qcow2_pwrite_zeroes_start_req(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d" +qcow2_pwrite_zeroes(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d" # block/qcow2-cluster.c -qcow2_alloc_clusters_offset(void *co, uint64_t offset, int bytes) "co %p offset %" PRIx64 " bytes %d" -qcow2_handle_copied(void *co, uint64_t guest_offset, uint64_t host_offset, uint64_t bytes) "co %p guest_offset %" PRIx64 " host_offset %" PRIx64 " bytes %" PRIx64 -qcow2_handle_alloc(void *co, uint64_t guest_offset, uint64_t host_offset,
[Qemu-devel] [PATCH v2 3/4] checkpatch: check trace-events code style
Accordingly to CODING_STYLE, check that in trace-events: 1. hex numbers are prefixed with '0x' 2. '#' flag of printf is not used 3. The exclusion from 1. are period-separated groups of numbers Signed-off-by: Vladimir Sementsov-Ogievskiy--- scripts/checkpatch.pl | 19 +++ 1 file changed, 19 insertions(+) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index 4e91122813..fa478074b8 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -1337,6 +1337,25 @@ sub process { $rpt_cleaners = 1; } +# checks for trace-events files + if ($realfile =~ /trace-events$/ && $line =~ /^\+/) { + if ($rawline =~ /%[-+ 0]*#/) { + ERROR("Don't use '#' flag of printf format ('%#') in " . + "trace-events, use '0x' prefix instead\n" . $herecurr); + } else { + my $hex = + qr/%[-+ *.0-9]*([hljztL]|ll|hh)?(x|X|"\s*PRI[xX][^"]*"?)/; + + # don't consider groups splitted by [.:/ ], like 2A.20:12ab + my $tmpline = $rawline =~ s/($hex[.:\/ ])+$hex//gr; + + if ($tmpline =~ /(?
[Qemu-devel] [PATCH v2 0/4] trace-events: print 0x before hex numbers
Hi all! It is hard to read logs, when there are hex and dec numbers in one line, when hex number doesn't contain any letters and don't have '0x' prefix. So, here is a complete solution for the problem: - add information into CODING_STYLE - add a check into checkpatch.pl - fix current state The new rule for the style is: Hex numbers should be prefixed by '0x', except groups of numbers, separated by symbols ' ', '.', ':', '/', however '0x' can be used for numbers in such groups too. Flag '#' in number format is not allowed. Note: checkpatch fails on checkpatch change (03) due to long lines. It is because checkpatch.pl is indented by tabs and when it checks for long lines it consider tabs as 8 spaces. Looks like nobody cares, so do I. I see two ways here: - s/\t//g - make exclusion in checkpatch.pl for checkpatch.pl to consider tabs as 4 spaces, not 8. However I don't want to fix it in the context of these series. v2: almost everything (style, checkpatch, excluding number groups) v1: was a draft of the idea using two sed commands. Vladimir Sementsov-Ogievskiy (4): coding_style: add point about 0x in trace-events trace-events: fix code style: %# -> 0x% checkpatch: check trace-events code style trace-events: fix code style: print 0x before hex numbers CODING_STYLE | 23 ++ accel/tcg/trace-events| 2 +- audio/trace-events| 4 +- block/trace-events| 28 hw/audio/trace-events | 4 +- hw/char/trace-events | 12 ++-- hw/display/trace-events | 14 ++-- hw/dma/trace-events | 20 +++--- hw/i386/xen/trace-events | 26 +++ hw/input/trace-events | 6 +- hw/intc/trace-events | 176 +++--- hw/isa/trace-events | 4 +- hw/misc/trace-events | 78 ++-- hw/net/trace-events | 52 +++--- hw/nvram/trace-events | 2 +- hw/pci/trace-events | 4 +- hw/ppc/trace-events | 64 - hw/s390x/trace-events | 20 +++--- hw/scsi/trace-events | 118 +++ hw/sd/trace-events| 4 +- hw/timer/trace-events | 20 +++--- hw/usb/trace-events | 56 +++ hw/vfio/trace-events | 44 ++-- hw/virtio/trace-events| 6 +- hw/xen/trace-events | 8 +-- linux-user/trace-events | 10 +-- migration/trace-events| 36 +- nbd/trace-events | 18 ++--- net/trace-events | 4 +- scripts/checkpatch.pl | 19 + target/arm/trace-events | 10 +-- target/s390x/trace-events | 2 +- target/sparc/trace-events | 30 trace-events | 20 +++--- 34 files changed, 493 insertions(+), 451 deletions(-) -- 2.11.1
[Qemu-devel] [PATCH v2 1/4] coding_style: add point about 0x in trace-events
Signed-off-by: Vladimir Sementsov-Ogievskiy--- CODING_STYLE | 23 +++ 1 file changed, 23 insertions(+) diff --git a/CODING_STYLE b/CODING_STYLE index 2fa0c0b65b..2e6a0507be 100644 --- a/CODING_STYLE +++ b/CODING_STYLE @@ -123,3 +123,26 @@ We use traditional C-style /* */ comments and avoid // comments. Rationale: The // form is valid in C99, so this is purely a matter of consistency of style. The checkpatch script will warn you about this. + +8. trace-events style + +In trace-events files use '0x' prefix to specify hex numbers, as in: + +some_trace(unsigned x, uint64_t y) "x 0x%x y 0x" PRIx64 + +The exclusion is a group of numbers, separated by symbols '.', '/', ':', ' ': + +some_other_trace(unsigned a, unsigned b, unsigned c) "num %x.%x.%x" + +However, you can use '0x' for such groups if you want. Anyway, be sure that +it is obvious that numbers are in hex, ex.: + +data_dump(uint8_t c1, uint8_t c2, uint8_t c3) "bytes (in hex): %02x %02x %02x" + +For consistency do not use printf flag '#', like '%#x'. + +Rationale: hex numbers are hard to read in logs when there no 0x prefix, +especially when (occasionally) the representation doesn't contain any letters +and especially in one line with other decimal numbers. Number groups are +allowed to not use '0x' because for some things notations like %x.%x.%x are +used not only in Qemu. Also dumping raw data bytes with '0x' is less readable. -- 2.11.1
[Qemu-devel] [PATCH v2 2/4] trace-events: fix code style: %# -> 0x%
In trace format '#' flag of printf is forbidden. Fix it to '0x%'. This patch is created by the following: check that we have a problem > find . -name trace-events | xargs grep '%#' | wc -l 56 check that there are no cases with additional printf flags > find . -name trace-events | xargs grep '%[-+ 0]+#' | wc -l 0 check that there are no wrong usage of '#' and '0x' together > find . -name trace-events | xargs grep '0x%#' | wc -l 0 fix the problem > find . -name trace-events | xargs sed -i 's/%#/0x%/g' Signed-off-by: Vladimir Sementsov-Ogievskiy--- audio/trace-events | 4 ++-- block/trace-events | 6 +++--- hw/display/trace-events | 2 +- hw/i386/xen/trace-events | 22 +++--- hw/input/trace-events| 2 +- hw/intc/trace-events | 20 ++-- hw/net/trace-events | 18 +- hw/pci/trace-events | 4 ++-- hw/scsi/trace-events | 6 +++--- hw/xen/trace-events | 8 trace-events | 20 ++-- 11 files changed, 56 insertions(+), 56 deletions(-) diff --git a/audio/trace-events b/audio/trace-events index 517359039e..939ab99e76 100644 --- a/audio/trace-events +++ b/audio/trace-events @@ -3,7 +3,7 @@ # audio/alsaaudio.c alsa_revents(int revents) "revents = %d" alsa_pollout(int i, int fd) "i = %d fd = %d" -alsa_set_handler(int events, int index, int fd, int err) "events=%#x index=%d fd=%d err=%d" +alsa_set_handler(int events, int index, int fd, int err) "events=0x%x index=%d fd=%d err=%d" alsa_wrote_zero(int len) "Failed to write %d frames (wrote zero)" alsa_read_zero(long len) "Failed to read %ld frames (read zero)" alsa_xrun_out(void) "Recovering from playback xrun" @@ -13,5 +13,5 @@ alsa_resume_in(void) "Resuming suspended input stream" alsa_no_frames(int state) "No frames available and ALSA state is %d" # audio/ossaudio.c -oss_version(int version) "OSS version = %#x" +oss_version(int version) "OSS version = 0x%x" oss_invalid_available_size(int size, int bufsize) "Invalid available size, size=%d bufsize=%d" diff --git a/block/trace-events b/block/trace-events index 4a4df25323..622c754495 100644 --- a/block/trace-events +++ b/block/trace-events @@ -1,7 +1,7 @@ # See docs/tracing.txt for syntax documentation. # block.c -bdrv_open_common(void *bs, const char *filename, int flags, const char *format_name) "bs %p filename \"%s\" flags %#x format_name \"%s\"" +bdrv_open_common(void *bs, const char *filename, int flags, const char *format_name) "bs %p filename \"%s\" flags 0x%x format_name \"%s\"" bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d" # block/block-backend.c @@ -11,7 +11,7 @@ blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int flag # block/io.c bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d" bdrv_co_writev(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d" -bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags %#x" +bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags 0x%x" bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, unsigned int cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %u" # block/stream.c @@ -100,7 +100,7 @@ qed_need_check_timer_cb(void *s) "s %p" qed_start_need_check_timer(void *s) "s %p" qed_cancel_need_check_timer(void *s) "s %p" qed_aio_complete(void *s, void *acb, int ret) "s %p acb %p ret %d" -qed_aio_setup(void *s, void *acb, int64_t sector_num, int nb_sectors, void *opaque, int flags) "s %p acb %p sector_num %"PRId64" nb_sectors %d opaque %p flags %#x" +qed_aio_setup(void *s, void *acb, int64_t sector_num, int nb_sectors, void *opaque, int flags) "s %p acb %p sector_num %"PRId64" nb_sectors %d opaque %p flags 0x%x" qed_aio_next_io(void *s, void *acb, int ret, uint64_t cur_pos) "s %p acb %p ret %d cur_pos %"PRIu64 qed_aio_read_data(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu" qed_aio_write_data(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu" diff --git a/hw/display/trace-events b/hw/display/trace-events index 3e896d2e3f..16d19122f0 100644 --- a/hw/display/trace-events +++ b/hw/display/trace-events @@ -5,7 +5,7 @@ jazz_led_read(uint64_t addr, uint8_t val) "read addr=0x%"PRIx64": 0x%x" jazz_led_write(uint64_t addr, uint8_t new) "write addr=0x%"PRIx64": 0x%x" # hw/display/xenfb.c -xenfb_mouse_event(void *opaque, int dx, int dy, int dz, int button_state, int abs_pointer_wanted) "%p x %d y %d z %d bs %#x abs %d" +xenfb_mouse_event(void *opaque, int dx, int dy, int dz, int button_state, int abs_pointer_wanted) "%p x %d y %d z %d bs 0x%x abs %d"
Re: [Qemu-devel] [PATCH v12 5/8] virtio-balloon: VIRTIO_BALLOON_F_SG
On 07/29/2017 07:08 AM, Michael S. Tsirkin wrote: On Thu, Jul 27, 2017 at 10:50:11AM +0800, Wei Wang wrote: OK I thought this over. While we might need these new APIs in the future, I think that at the moment, there's a way to implement this feature that is significantly simpler. Just add each s/g as a separate input buffer. Should it be an output buffer? Hypervisor overwrites these pages with zeroes. Therefore it is writeable by device: DMA_FROM_DEVICE. Why would the hypervisor need to zero the buffer? The page is supplied to hypervisor and can lose the value that is there. That is the definition of writeable by device. I think for the free pages, it should be clear that they will be added as output buffer to the device, because (as we discussed) they are just hints, and some of them may be used by the guest after the report_ API is invoked. The device/hypervisor should not use or discard them. For the balloon pages, I kind of agree with the existing implementation (e.g. inside tell_host()), which uses virtqueue_add_outbuf (instead of _add_inbuf()). I think inbuf should be a buffer which will be written by the device and read by the driver. The cmd buffer put on the vq for the device to send commands can be an inbuf, I think. I think it may only need to read out the info(base,size). And then do what? For the free pages, the info will be used to clear the corresponding "1" in the dirty bitmap. For balloon pages, they will be made DONTNEED and given to other host processes to use (the device won't write them, so no need to set "write" when virtqueue_map_desc() in the device). I think it should be like this: the cmd hdr buffer: input, because the hypervisor would write it to send a cmd to the guest the payload buffer: output, for the hypervisor to read the info These should be split. We have: 1. request that hypervisor sends to guest, includes ID: input 2. synchronisation header with ID sent by guest: output 3. list of pages: input 2 and 3 must be on the same VQ. 1 can come on any VQ - reusing stats VQ might make sense. I would prefer to make the above item 1 come on the the free page vq, because the existing stat_vq doesn't support the cmd hdr. Now, I think it is also not necessary to move the existing stat_vq implementation to a new implementation under the cmd hdr, because that new support doesn't make a difference for stats, it will still use its stat_vq (the free page vq will be used for reporting free pages only) What do you think? Best, Wei
Re: [Qemu-devel] QEMU NVDIMM as type 7 in e820 table
On 07/28/17 13:45 -0600, Ross Zwisler wrote: > On Fri, Jul 28, 2017 at 11:11:10AM -0700, Dan Williams wrote: > > On Fri, Jul 28, 2017 at 11:04 AM, Ross Zwisler > >wrote: > > > I've been using the virtualized NVDIMM support in QEMU for testing, and I > > > noticed that the physical addresses used by the virtual NVDIMMs aren't > > > present > > > in the guest's e820 table. > > > > > > Here is the e820 table on my QEMU instance where I have one 32 GiB virtual > > > NVDIMM: > > > > > > [0.00] e820: BIOS-provided physical RAM map: > > > [0.00] BIOS-e820: [mem 0x-0x0009fbff] > > > usable > > > [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] > > > reserved > > > [0.00] BIOS-e820: [mem 0x000f-0x000f] > > > reserved > > > [0.00] BIOS-e820: [mem 0x0010-0xbffdefff] > > > usable > > > [0.00] BIOS-e820: [mem 0xbffdf000-0xbfff] > > > reserved > > > [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] > > > reserved > > > [0.00] BIOS-e820: [mem 0xfffc-0x] > > > reserved > > > [0.00] BIOS-e820: [mem 0x0001-0x00023fff] > > > usable > > > > > > The physical addresses used by the virtual NVDIMM are > > > 0x24000-0xA4000. > > > You can see this by looking at ndctl and the values we get from the NFIT: > > > > > > # ndctl list -R > > > { > > > "dev":"region0", > > > "size":34359738368, > > > "available_size":0, > > > "type":"pmem" > > > } > > > > > > # grep . /sys/bus/nd/devices/region0/{resource,size} > > > region0/resource:0x24000 > > > region0/size:34359738368 > > > > > > Or you can see the same info by using iasl to dump > > > /sys/firmware/acpi/tables/NFIT: > > > > > > [028h 0040 2]Subtable Type : [System Physical > > > Address Range] > > > [02Ah 0042 2] Length : 0038 > > > > > > [02Ch 0044 2] Range Index : 0002 > > > [02Eh 0046 2]Flags (decoded below) : 0003 > > >Add/Online Operation Only : 1 > > > Proximity Domain Valid : 1 > > > [030h 0048 4] Reserved : > > > [034h 0052 4] Proximity Domain : > > > [038h 0056 16] Address Range GUID : > > > 66F0D379-B4F3-4074-AC43-0D3318B78CDB > > > [048h 0072 8] Address Range Base : 00024000 > > > [050h 0080 8] Address Range Length : 0008 > > > [058h 0088 8] Memory Map Attribute : 8008 > > > > > > I expected to see a type 7 region for the NVDIMM physical address range > > > in the > > > e820 table, so something like: > > > > > > [0.00] e820: BIOS-provided physical RAM map: > > > [0.00] BIOS-e820: [mem 0x-0x0009fbff] > > > usable > > > [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] > > > reserved > > > [0.00] BIOS-e820: [mem 0x000f-0x000f] > > > reserved > > > [0.00] BIOS-e820: [mem 0x0010-0xbffdefff] > > > usable > > > [0.00] BIOS-e820: [mem 0xbffdf000-0xbfff] > > > reserved > > > [0.00] BIOS-e820: [mem 0xfeffc000-0xfeff] > > > reserved > > > [0.00] BIOS-e820: [mem 0xfffc-0x] > > > reserved > > > [0.00] BIOS-e820: [mem 0x0001-0x00023fff] > > > usable > > > [0.00] BIOS-e820: [mem 0x00024000-0x000A4000] > > > persistent (type 7) > > > > > > > Do you need that informationin e820? Linux effectively ignores type-7. > > As long as the range is treated as reserved it's not clear that you > > need the e820 entry. We also infect the persistent type back into the > > memory map when the NFIT driver loads. /proc/iomem should show the > > right data. > > [ Adding Linda & Toshi to see if they have an opinion. ] > > I guess maybe we don't need it. Yep, /proc/iomem looks good: > > # cat /proc/iomem > -0fff : Reserved > 1000-0009fbff : System RAM > ... > 1-23fff : System RAM > 24000-a3fff : Persistent Memory > 24000-a3fff : namespace0.0 > > I was just worried that this was an inconsistency between the way that virtual > NVDIMMs are presented vs the way that they will be presented on bare metal. I > at least look at the e820 table to get my bearings of how memory is laid out - > maybe I just need to look at /proc/iomem instead? Do any OS or applications rely on the E820 information or the consistency between E820 and NFIT to properly work? If any, I can make a QEMU patch to build type-7 e820 entries. Haozhong
Re: [Qemu-devel] 'make help' generate unrelated files
On 29 July 2017 at 02:55, Philippe Mathieu-Daudéwrote: > Is it necessary to generate this files to run 'make help'? > > $ ./configure > [...] > $ make help > CHK version_gen.h > UPD version_gen.h > DEP util.c > CHK version_gen.h This happens because we have a rule that says "Makefile: $(GENERATED_FILES)" which is how we ensure that the autogenerated header files are always built before any .c file is compiled. The unfortunate side effect is that we rebuild the generated files before processing *any* makefile target, even those like 'clean' and 'help' that don't actually require them. I did once have a go at fixing this some years back: http://patchwork.ozlabs.org/patch/365288/ but as you'll note from the last comment by me in that mail thread it had a fatal flaw for the "build from clean" case... thanks -- PMM
Re: [Qemu-devel] tb_invalidate_phys_range and tb_invalidate_phys_page_range
On 29 July 2017 at 01:33, Emilio G. Cotawrote: > While working on the removal of tb_lock, I stumbled upon the following. > > Commit 77a8f1a51 ("linux-user: Fix stale tbs after mmap") introduced > tb_invalidate_phys_range(), which we now have in accel/tcg/translate-all.c > [ patchwork thread here: https://patchwork.ozlabs.org/patch/158556/ ] > > This function calls > tb_invalidate_phys_page_range(). As the name suggests, the latter > function expects the range to be within the same page. > The former does not have this requirement, as stated in the comment > above the function (which I confirmed running some linux-user code): > > +/* > + * invalidate all TBs which intersect with the target physical pages > + * starting in range [start;end[. NOTE: start and end may refer to > + * different physical pages. 'is_cpu_write_access' should be true if called > + * from a real cpu write access: the virtual CPU will exit the current > + * TB if code is modified inside this TB. > + */ > +void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end, > + int is_cpu_write_access) > +{ > +while (start < end) { > +tb_invalidate_phys_page_range(start, end, is_cpu_write_access); > +start &= TARGET_PAGE_MASK; > +start += TARGET_PAGE_SIZE; > +} > +} > > What I find puzzling is that here we pass 'end' unmodified, instead of > making sure the range passed is within a page. > > Is this a bug, or am I missing something? It's a bit odd, but looking at the code I think it doesn't matter. The only thing that tb_invalidate_phys_page_range() uses 'end' for is when it does the "does this TB I've found overlap the range we're flushing" check with if (!(tb_end <= start || tb_start >= end)) If we pass an 'end' value that's larger than it would be if we confined it to the page, this is safe because at worst we'll invalidate more TBs than we needed to (and in practice if the TB we've found is within the full range that tb_invalidate_phys_range() is checking we need to invalidate it anyway). I haven't quite worked it through but I rather suspect that you can't have a TB that's in the list for the PageDesc for 'start' and which overlaps in the "full" [start-end) range but which wouldn't overlap in a truncated [start-(end rounded down to end of page)) range. Basically the reason for the "same page" restriction is that tb_invalidate_phys_page_range() only does a single page_find() for the 'start' address. So as long as we call it once per page in the range it doesn't matter that we pass an overly pessimistic end. This is kind of bordering on "happens to work" territory though, so maybe we could improve it... thanks -- PMM
Re: [Qemu-devel] [PATCH 2/3] COLO: Define COLOMode without QAPI
On 2017/7/29 1:17, Dr. David Alan Gilbert wrote: * Markus Armbruster (arm...@redhat.com) wrote: COLOMode is defined in the QAPI schema, but not used there. Of the stuff QAPI generates for it only the typedef is actually used. Use of QAPI is pointless and only complicates things, so don't. Hmm, in one of the old COLO worlds I have, there's code to emit an event on exiting from COLO and that event includes the mode it was in. Yes, we need it in the later series. If the intent is to bring that or similar back then it would be worth keeping. Agreed. ;) Dave Cc: zhanghailiangSigned-off-by: Markus Armbruster --- include/migration/colo.h | 6 ++ qapi-schema.json | 16 2 files changed, 6 insertions(+), 16 deletions(-) diff --git a/include/migration/colo.h b/include/migration/colo.h index ff9874e..5d7c500 100644 --- a/include/migration/colo.h +++ b/include/migration/colo.h @@ -26,6 +26,12 @@ void migration_incoming_exit_colo(void); void *colo_process_incoming_thread(void *opaque); bool migration_incoming_in_colo_state(void); +typedef enum { +COLO_MODE_UNKNOWN, +COLO_MODE_PRIMARY, +COLO_MODE_SECONDARY, +} COLOMode; + COLOMode get_colo_mode(void); /* failover */ diff --git a/qapi-schema.json b/qapi-schema.json index 9b6f6cb..3f0eb05 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1304,22 +1304,6 @@ 'vmstate-loaded' ] } ## -# @COLOMode: -# -# The colo mode -# -# @unknown: unknown mode -# -# @primary: master side -# -# @secondary: slave side -# -# Since: 2.8 -## -{ 'enum': 'COLOMode', - 'data': [ 'unknown', 'primary', 'secondary'] } - -## # @FailoverStatus: # # An enumeration of COLO failover status -- 2.7.5 -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK .