Re: [Qemu-devel] [PATCH 2/2] tests: Check serial output of firmware boot of some machines
On Fri, Jul 15, 2016 at 09:25:16AM +0200, Thomas Huth wrote: > On 15.07.2016 05:21, David Gibson wrote: > > On Thu, Jul 14, 2016 at 11:57:46AM +0200, Thomas Huth wrote: > >> Some of the machines that we have got a firmware image for write > >> some output to the serial console while booting up. We can use > >> this output to make sure that the machine is basically working, > >> so this adds a test that checks the output of these machines > >> for some well-known "magic" strings. > >> > >> Signed-off-by: Thomas Huth> > > > I love this idea. A couple of queries about the implemntation. > > > >> --- > >> tests/Makefile.include | 8 > >> tests/boot-serial-test.c | 110 > >> +++ > >> 2 files changed, 118 insertions(+) > >> create mode 100644 tests/boot-serial-test.c > >> > >> diff --git a/tests/Makefile.include b/tests/Makefile.include > >> index b7784d3..ba1cc8d 100644 > >> --- a/tests/Makefile.include > >> +++ b/tests/Makefile.include > >> @@ -194,6 +194,7 @@ check-qtest-i386-y += tests/hd-geo-test$(EXESUF) > >> gcov-files-i386-y += hw/block/hd-geometry.c > >> check-qtest-i386-y += tests/boot-order-test$(EXESUF) > >> check-qtest-i386-y += tests/bios-tables-test$(EXESUF) > >> +check-qtest-i386-y += tests/boot-serial-test$(EXESUF) > >> check-qtest-i386-y += tests/pxe-test$(EXESUF) > >> check-qtest-i386-y += tests/rtc-test$(EXESUF) > >> check-qtest-i386-y += tests/ipmi-kcs-test$(EXESUF) > >> @@ -241,6 +242,8 @@ check-qtest-x86_64-y += $(check-qtest-i386-y) > >> gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c > >> gcov-files-x86_64-y = $(subst > >> i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y)) > >> > >> +check-qtest-alpha-y = tests/boot-serial-test$(EXESUF) > >> + > >> check-qtest-mips-y = tests/endianness-test$(EXESUF) > >> check-qtest-mips64-y = tests/endianness-test$(EXESUF) > >> check-qtest-mips64el-y = tests/endianness-test$(EXESUF) > >> @@ -248,12 +251,14 @@ check-qtest-mips64el-y = > >> tests/endianness-test$(EXESUF) > >> check-qtest-ppc-y = tests/endianness-test$(EXESUF) > >> check-qtest-ppc-y += tests/boot-order-test$(EXESUF) > >> check-qtest-ppc-y += tests/prom-env-test$(EXESUF) > >> +check-qtest-ppc-y += tests/boot-serial-test$(EXESUF) > >> > >> check-qtest-ppc64-y = tests/endianness-test$(EXESUF) > >> check-qtest-ppc64-y += tests/boot-order-test$(EXESUF) > >> check-qtest-ppc64-y += tests/spapr-phb-test$(EXESUF) > >> gcov-files-ppc64-y += ppc64-softmmu/hw/ppc/spapr_pci.c > >> check-qtest-ppc64-y += tests/prom-env-test$(EXESUF) > >> +check-qtest-ppc64-y += tests/boot-serial-test$(EXESUF) > >> > >> check-qtest-sh4-y = tests/endianness-test$(EXESUF) > >> check-qtest-sh4eb-y = tests/endianness-test$(EXESUF) > >> @@ -277,6 +282,8 @@ gcov-files-arm-y += arm-softmmu/hw/block/virtio-blk.c > >> check-qtest-microblazeel-y = $(check-qtest-microblaze-y) > >> check-qtest-xtensaeb-y = $(check-qtest-xtensa-y) > >> > >> +check-qtest-s390x-y = tests/boot-serial-test$(EXESUF) > >> + > >> check-qtest-generic-y += tests/qom-test$(EXESUF) > >> > >> qapi-schema += alternate-any.json > >> @@ -575,6 +582,7 @@ tests/ipmi-kcs-test$(EXESUF): tests/ipmi-kcs-test.o > >> tests/ipmi-bt-test$(EXESUF): tests/ipmi-bt-test.o > >> tests/hd-geo-test$(EXESUF): tests/hd-geo-test.o > >> tests/boot-order-test$(EXESUF): tests/boot-order-test.o $(libqos-obj-y) > >> +tests/boot-serial-test$(EXESUF): tests/boot-serial-test.o $(libqos-obj-y) > >> tests/bios-tables-test$(EXESUF): tests/bios-tables-test.o \ > >>tests/boot-sector.o $(libqos-obj-y) > >> tests/pxe-test$(EXESUF): tests/pxe-test.o tests/boot-sector.o > >> $(libqos-obj-y) > >> diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c > >> new file mode 100644 > >> index 000..3263dcd > >> --- /dev/null > >> +++ b/tests/boot-serial-test.c > >> @@ -0,0 +1,110 @@ > >> +/* > >> + * Test serial output of some machines. > >> + * > >> + * Copyright 2016 Thomas Huth, Red Hat Inc. > >> + * > >> + * This work is licensed under the terms of the GNU GPL, version 2 > >> + * or later. See the COPYING file in the top-level directory. > >> + * > >> + * This test is used to check that the serial output of the firmware > >> + * (that we provide for some machines) contains an expected string. > >> + * Thus we check that the firmware still boots at least to a certain > >> + * point and so we know that the machine is not completely broken. > >> + */ > >> + > >> +#include "qemu/osdep.h" > >> +#include "libqtest.h" > >> + > >> +typedef struct testdef { > >> +const char *arch; /* Target architecture */ > >> +const char *machine;/* Name of the machine */ > >> +const char *extra; /* Additional parameters */ > >> +const char *expect; /* Expected string in the serial output */ > >> +} testdef_t; > >> + > >> +static testdef_t tests[] = { > >> +{ "alpha", "clipper", "", "PCI:" }, > >> +{ "ppc", "ppce500", "", "U-Boot" }, > >>
Re: [Qemu-devel] [PATCH V2 1/1] linux-aio: prevent submitting more than MAX_EVENTS
On Wed, Jul 13, 2016 at 1:45 PM, Kevin Wolfwrote: > Am 13.07.2016 um 13:33 hat Roman Penyaev geschrieben: >> Just to be sure that we are on the same page: >> >> 1. We have this commit "linux-aio: Cancel BH if not needed" which >> >>a) introduces performance regression on my fio workloads on the >> following config: "iothread=1, VCPU=8, MQ=8". Performance >> dropped from 1878MB/s to 1606MB/s with Stefan's fix, that is >> ~14%. > > Do we already understand why the performance regresses with the patch? > As long as we don't, everything we do is just guesswork. Eventually the issue is clear. I test on /dev/nullb0, which completes all submitted bios almost immediately. That means, that after io_submit() is called it is worth trying to check completed requests and not to accumulate them in-flight. That is the theory. On practise happens the following: --- >>> sys_poll <<< sys_poll >>> aio_dispatch >>> aio_bh_poll <<< aio_bh_poll >>> node->io_read !!! ioq_submit(), submitted=98 <<< node->io_read >>> node->io_read !!! ioq_submit(), submitted=49 <<< node->io_read >>> node->io_read !!! ioq_submit(), submitted=47 <<< node->io_read <<< aio_dispatch >>> sys_poll <<< sys_poll >>> aio_dispatch >>> aio_bh_poll <<< aio_bh_poll >>> node->io_read !!! ioq_submit(), submitted=50 <<< node->io_read >>> node->io_read !!! ioq_submit(), submitted=43 <<< node->io_read >>> node->io_read !!! ioq_submit(), submitted=43 <<< node->io_read >>> node->io_read !!! ioq_submit(), submitted=8 <<< node->io_read >>> node->io_read ~~~ qemu_laio_completion_bh completed=338 <<< node->io_read <<< aio_dispatch --- * this run gave 1461MB/s * This is the very common hunk of the log which I see running fio load with the "linux-aio: Cancel BH if not needed" patch applied. The important thing which is worth paying attention to is submission of 338 requests (almost whole ring buffer of AIO context) before consuming requests completions. Very fast backend device completes submitted requests almost immediately, but we get a chance to fetch completions only some time later. The following is the common part of the log when "linux-aio: Cancel BH if not needed" is reverted: --- >>> sys_poll <<< sys_poll >>> dispatch >>> aio_bh_poll ~~~ qemu_laio_completion_bh completed=199 <<< aio_bh_poll >>> node->io_read !!! ioq_submit(), submitted=47 <<< node->io_read >>> node->io_read !!! ioq_submit(), submitted=49 <<< node->io_read >>> node->io_read !!! ioq_submit(), submitted=50 <<< node->io_read >>> node->io_read !!! ioq_submit(), submitted=43 <<< node->io_read >>> node->io_read <<< node->io_read <<< dispatch >>> sys_poll <<< sys_poll >>> dispatch >>> aio_bh_poll ~~~ qemu_laio_completion_bh, completed=189 <<< aio_bh_poll >>> node->io_read !!! ioq_submit(), submitted=46 <<< node->io_read >>> node->io_read !!! ioq_submit(), submitted=46 <<< node->io_read >>> node->io_read !!! ioq_submit(), submitted=51 <<< node->io_read >>> node->io_read !!! ioq_submit(), submitted=51 <<< node->io_read <<< dispatch --- * this run gave 1805MB/s * According to this part of the log I can say, that completions happen frequently, i.e. we get a chance to fetch completions more often, thus queue is always "refreshed" by new comming requests. To be more precise I collected some statistics: each time I enter qemu_laio_completion_bh() I account the number of collected requests in the bucket, e.g.: "~~~ qemu_laio_completion_bh completed=199" bucket[199] += 1; "~~~ qemu_laio_completion_bh, completed=189" bucket[189] += 1; When fio finishes I have a distribution of number of completed requests which I have observed in the ring buffer. Here is the sheet: https://docs.google.com/spreadsheets/d/12CIt6EKiJLqNx0OHNqiabR-oFBrqkH0LN3mjzZ5jGeo/edit?usp=sharing (Frankly, I could not think of anything better than to send a link on google docs, sorry if that insults someone). There is a chart which shows the whole picture of distribution: o X axis is a number of requests completed at once. o Y axis is a number of times we observe that
Re: [Qemu-devel] [PATCH] ppc: Yet another fix for the huge page support detection mechanism
On Fri, 15 Jul 2016 10:10:25 +0200 Thomas Huthwrote: > Commit 86b50f2e1bef ("Disable huge page support if it is not available > for main RAM") already made sure that huge page support is not announced > to the guest if the normal RAM of non-NUMA configurations is not backed > by a huge page filesystem. However, there is one more case that can go > wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured > with huge page support (and only the memory of a DIMM is configured with > it). When QEMU is started with the following command line for example, > the Linux guest currently crashes because it is trying to use huge pages > on a memory region that does not support huge pages: > > qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \ >memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 > \ >-device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \ >-numa node,nodeid=0 -numa node,nodeid=1 > > To fix this issue, we've got to make sure to disable huge page support, > too, when there is a NUMA node that is not using a memory backend with > huge page support. > > Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740 According to http://patchwork.ozlabs.org/patch/584741/ , it is best worded "Broken in commit 86b50f2e1bef" > Signed-off-by: Thomas Huth > --- > target-ppc/kvm.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c > index 884d564..7a8f555 100644 > --- a/target-ppc/kvm.c > +++ b/target-ppc/kvm.c > @@ -389,12 +389,16 @@ static long getrampagesize(void) > > object_child_foreach(memdev_root, find_max_supported_pagesize, ); > > -if (hpsize == LONG_MAX) { > +if (hpsize == LONG_MAX || hpsize == getpagesize()) { > return getpagesize(); > } > > -if (nb_numa_nodes == 0 && hpsize > getpagesize()) { > -/* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! > */ > +/* If NUMA is disabled or the NUMA nodes are not backed with a > + * memory-backend, then there is at least one node using "normal" > + * RAM. And since normal RAM has not been configured with "-mem-path" > + * (what we've checked earlier here already), we can not use huge pages! > + */ > +if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) { Dumb question: why only checking numa_info[0] ? > static bool warned; > if (!warned) { > error_report("Huge page support disabled (n/a for main > memory).");
Re: [Qemu-devel] [PATCH for 2.7 resend] linux-aio: share one LinuxAioState within an AioContext
On Mon, Jul 04, 2016 at 06:33:20PM +0200, Paolo Bonzini wrote: > This has better performance because it executes fewer system calls > and does not use a bottom half per disk. > > Originally proposed by Ming Lei. > > Acked-by: Stefan Hajnoczi> Signed-off-by: Paolo Bonzini > --- > async.c| 23 +++ > block/linux-aio.c | 10 ++-- > block/raw-posix.c | 119 > + > block/raw-win32.c | 2 +- > include/block/aio.h| 13 > {block => include/block}/raw-aio.h | 0 > 6 files changed, 57 insertions(+), 110 deletions(-) > rename {block => include/block}/raw-aio.h (100%) Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH v3 12/14] virtio-gpu: Use migrate_add_blocker for virgl migration blocking
On Thu, 14 Jul 2016 18:22:54 +0100 "Dr. David Alan Gilbert (git)"wrote: > From: "Dr. David Alan Gilbert" > > virgl conditionally registers a vmstate as unmigratable when virgl > is enabled; instead use the migrate_add_blocker mechanism. > > Signed-off-by: Dr. David Alan Gilbert > --- > hw/display/virtio-gpu.c| 19 +-- > include/hw/virtio/virtio-gpu.h | 2 ++ > 2 files changed, 15 insertions(+), 6 deletions(-) > > @@ -1169,13 +1165,23 @@ static void virtio_gpu_device_realize(DeviceState > *qdev, Error **errp) > } > > if (virtio_gpu_virgl_enabled(g->conf)) { > -vmstate_register(qdev, -1, _virtio_gpu_unmigratable, g); > +error_setg(>migration_blocker, "virgl is not yet migratable"); Suggest prepending with "virtio-gpu:". > +migrate_add_blocker(g->migration_blocker); > } else { > register_savevm(qdev, "virtio-gpu", -1, VIRTIO_GPU_VM_VERSION, > virtio_gpu_save, virtio_gpu_load, g); > } > } In any case, Reviewed-by: Cornelia Huck
[Qemu-devel] [PATCH] net: fix incorrect access to pointer
This is not dereferencing the pointer, and instead checking only the value of the pointer. Signed-off-by: Paolo Bonzini--- net/eth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/eth.c b/net/eth.c index 0be59c2..df81efb 100644 --- a/net/eth.c +++ b/net/eth.c @@ -211,7 +211,7 @@ void eth_get_protocols(const struct iovec *iov, int iovcnt, *l4hdr_off, sizeof(l4hdr_info->hdr.tcp), _info->hdr.tcp); -if (istcp) { +if (*istcp) { *l5hdr_off = *l4hdr_off + TCP_HEADER_DATA_OFFSET(_info->hdr.tcp); -- 2.7.4
[Qemu-devel] [PATCH] vnc-tight: fix regression with libxenstore
commit 095497ff added thread local storage for the color counting palette. Unfortunately, a VncPalette is about 7kB on a x86_64 system. This memory is reserved from the stack of every thread and it exhausted the stack space of a libxenstore thread. Fix this by allocating memory only for the VNC encoding thread. Fixes: 095497ffc66b7f031ff2a17f1e50f5cb105ce588 Reported-by: Juergen GrossTested-by: Juergen Gross Signed-off-by: Peter Lieven --- ui/vnc-enc-tight.c | 28 +--- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/ui/vnc-enc-tight.c b/ui/vnc-enc-tight.c index b8581dd..2b58739 100644 --- a/ui/vnc-enc-tight.c +++ b/ui/vnc-enc-tight.c @@ -1457,11 +1457,17 @@ static int send_sub_rect_jpeg(VncState *vs, int x, int y, int w, int h, } #endif -static __thread VncPalette color_count_palette; +static __thread VncPalette *color_count_palette; +static __thread Notifier vnc_tight_cleanup_notifier; + +static void vnc_tight_cleanup(Notifier *n, void *value) +{ +g_free(color_count_palette); +color_count_palette = NULL; +} static int send_sub_rect(VncState *vs, int x, int y, int w, int h) { -VncPalette *palette = _count_palette; uint32_t bg = 0, fg = 0; int colors; int ret = 0; @@ -1470,6 +1476,12 @@ static int send_sub_rect(VncState *vs, int x, int y, int w, int h) bool allow_jpeg = true; #endif +if (!color_count_palette) { +color_count_palette = g_malloc(sizeof(VncPalette)); +vnc_tight_cleanup_notifier.notify = vnc_tight_cleanup; +qemu_thread_atexit_add(_tight_cleanup_notifier); +} + vnc_framebuffer_update(vs, x, y, w, h, vs->tight.type); vnc_tight_start(vs); @@ -1490,17 +1502,19 @@ static int send_sub_rect(VncState *vs, int x, int y, int w, int h) } #endif -colors = tight_fill_palette(vs, x, y, w * h, , , palette); +colors = tight_fill_palette(vs, x, y, w * h, , , color_count_palette); #ifdef CONFIG_VNC_JPEG if (allow_jpeg && vs->tight.quality != (uint8_t)-1) { -ret = send_sub_rect_jpeg(vs, x, y, w, h, bg, fg, colors, palette, - force_jpeg); +ret = send_sub_rect_jpeg(vs, x, y, w, h, bg, fg, colors, + color_count_palette, force_jpeg); } else { -ret = send_sub_rect_nojpeg(vs, x, y, w, h, bg, fg, colors, palette); +ret = send_sub_rect_nojpeg(vs, x, y, w, h, bg, fg, colors, + color_count_palette); } #else -ret = send_sub_rect_nojpeg(vs, x, y, w, h, bg, fg, colors, palette); +ret = send_sub_rect_nojpeg(vs, x, y, w, h, bg, fg, colors, + color_count_palette); #endif return ret; -- 1.9.1
Re: [Qemu-devel] [PATCH v3 12/14] virtio-gpu: Use migrate_add_blocker for virgl migration blocking
On Do, 2016-07-14 at 18:22 +0100, Dr. David Alan Gilbert (git) wrote: > From: "Dr. David Alan Gilbert"> > virgl conditionally registers a vmstate as unmigratable when virgl > is enabled; instead use the migrate_add_blocker mechanism. > > Signed-off-by: Dr. David Alan Gilbert Reviewed-by: Gerd Hoffmann
Re: [Qemu-devel] [PATCH V2 1/1] linux-aio: prevent submitting more than MAX_EVENTS
On 15/07/2016 11:18, Roman Penyaev wrote: > Those 3 red spikes and a blue hill is what we have to focus on. The > blue hill at the right corner of the chart means that almost always the > ring buffer was observed as full, i.e. qemu_laio_completion_bh() got > a chance to reap completions not very often, meanwhile completed > requests stand in the ring buffer for quite a long time which degrades > the overall performance. > > The results covered by the red line are much better and that can be > explained by those 3 red spikes, which are almost in the middle of the > whole distribution, i.e. qemu_laio_completion_bh() is called more often, > completed requests do not stall, giving fio a chance to submit new fresh > requests. > > The theoretical fix would be to schedule completion BH just after > successful io_submit, i.e.: What about removing the qemu_bh_cancel but keeping the rest of the patch? I'm also interested in a graph with this patch ("linux-aio: prevent submitting more than MAX_EVENTS") on top of origin/master. Thanks for the analysis. Sometimes a picture _is_ worth a thousand words, even if it's measuring "only" second-order effects (# of completions is not what causes the slowdown, but # of completions affects latency which causes the slowdown). Paolo
Re: [Qemu-devel] [Qemu-block] [PATCH] aio_ctx_check: follow CODING_STYLE
On Fri, Jul 15, 2016 at 09:48:50AM +0800, Cao jin wrote: > On 07/14/2016 10:08 PM, Eric Blake wrote: > > On 07/14/2016 07:10 AM, Cao jin wrote: > > > replace tab with spaces > > > > > > Signed-off-by: Cao jin> > > --- > > > async.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > Whitespace-only changes are best done as part of a series that is > > already touching nearby code for other reasons (depending on the size of > > the whitespace changes and on the rest of your patch, it may be okay to > > squash the whitespace change in place, or better to split into separate > > patches to make review of both patches easier). Otherwise, it just > > makes 'git blame' output dirtier. > > I see. > Since async.c & aio-posix.c are belong to the same maintaiers, so, Fam & > Stefan, is it ok to squash this into that "remove useless parameter" patch? > If not, we can just forget this one. The "remove useless parameter" patch doesn't touch the function you are modifying here. Please don't squash the patches. Since you have already posted this patch I will merge it. In the future please don't submit whitespace changes, tiny coding style cleanups, etc in by themselves. Thanks for all your contributions. I do not want to discourage you but my view is that code changes should only be made if they fix a bug, improve performance measurably, add a feature, or significantly improve the code. Every patch has a cost in terms of code review, merging/testing, backporting, bisecting, documentation, etc. We could discuss each of these in detail but basically a code change creates work or takes time from one or more people in these areas. In a perfect world with unlimited resources all patches would be equally welcome. Due to limited resources it's best to submit the types of patches I mentioned above where the cost/benefit ratio is favorable. Thanks, Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] Regression with commit 095497ffc66b7f031
On 15/07/16 12:35, Paolo Bonzini wrote: > > > On 15/07/2016 12:12, Gerd Hoffmann wrote: >> On Fr, 2016-07-15 at 12:02 +0200, Paolo Bonzini wrote: >>> >>> On 15/07/2016 10:47, Juergen Gross wrote: Nothing scaring and no real difference between working and not working variant. Meanwhile I've been digging a little bit deeper and found the reason: libxenstore is setting up a reader thread which is waiting for the watch to fire. With above commit the stack size of that thread (16kB) is too small. Setting it to 32kB made qemu work again. >>> >>> This makes very little sense (not your fault)... The commit doesn't >>> change stack usage at all, TLS should not be on the stack. >>> >>> Can you capture a backtrace where the 16K stack is exceeded? Perhaps >>> it's only due to inlining decision on the compiler, in which case >>> Peter's patch from today is only a bandaid. >> >> Hmm, I guess I hold off the vnc pull request for now ... > > Go ahead. I looked at glibc source code and the patch is okay. Paolo, do you know of any interface to obtain the size of the TLS area taken from the stack (before calling pthread_create() )? This would enable me to modify libxenstore to set the stack size to a sensible value without having to choose a magic number which might fit for qemu, but not for other users of libxenstore in the future. Juergen
Re: [Qemu-devel] [PATCH V2 1/1] linux-aio: prevent submitting more than MAX_EVENTS
On Fri, Jul 15, 2016 at 12:17 PM, Roman Penyaevwrote: > On Fri, Jul 15, 2016 at 11:58 AM, Paolo Bonzini wrote: >> >> >> On 15/07/2016 11:18, Roman Penyaev wrote: >>> Those 3 red spikes and a blue hill is what we have to focus on. The >>> blue hill at the right corner of the chart means that almost always the >>> ring buffer was observed as full, i.e. qemu_laio_completion_bh() got >>> a chance to reap completions not very often, meanwhile completed >>> requests stand in the ring buffer for quite a long time which degrades >>> the overall performance. >>> >>> The results covered by the red line are much better and that can be >>> explained by those 3 red spikes, which are almost in the middle of the >>> whole distribution, i.e. qemu_laio_completion_bh() is called more often, >>> completed requests do not stall, giving fio a chance to submit new fresh >>> requests. >>> >>> The theoretical fix would be to schedule completion BH just after >>> successful io_submit, i.e.: >> >> What about removing the qemu_bh_cancel but keeping the rest of the patch? > > That exactly what I did. Numbers go to expected from ~1600MB/s to ~1800MB/s. > So basically this hunk of the debatable patch: > > if (event_notifier_test_and_clear(>e)) { > -qemu_bh_schedule(s->completion_bh); > +qemu_laio_completion_bh(s); > } > > does not have any impact and can be ignored. At least I did not notice > anything important. > >> >> I'm also interested in a graph with this patch ("linux-aio: prevent >> submitting more than MAX_EVENTS") on top of origin/master. > > I can plot it also of course. So, finally I have it. Same link: https://docs.google.com/spreadsheets/d/12CIt6EKiJLqNx0OHNqiabR-oFBrqkH0LN3mjzZ5jGeo/edit?usp=sharing last sheet: "1789MB/s" Not that much interesting: almost all the time we complete maximum: MAX_LIMIT requests at once. But of course that expected on such device. Probably other good metrics should be taken into account. -- Roman
Re: [Qemu-devel] Regression with commit 095497ffc66b7f031
On 15/07/16 09:39, Peter Lieven wrote: > Am 15.07.2016 um 08:32 schrieb Juergen Gross: >> Commit 095497ffc66b7f031ff2a17f1e50f5cb105ce588 ("vnc-enc-tight: use >> thread local storage for palette") introduced a regression with Xen: >> Since this commit qemu used as a device model is no longer capable >> to register Xenstore watches (that's the effect visible to a user). >> Reverting this commit makes qemu behave well again. I have no idea >> why that commit would have this effect with Xen, may be some memory >> is clobbered? > > I personally have no idea, maybe @Paolo has? > > Maybe the corruption happens somewhere else and is just visible > due to this change. > > Do you see sth when you ran qemu/xen in valgrind? Nothing scaring and no real difference between working and not working variant. Meanwhile I've been digging a little bit deeper and found the reason: libxenstore is setting up a reader thread which is waiting for the watch to fire. With above commit the stack size of that thread (16kB) is too small. Setting it to 32kB made qemu work again. So I'd recommend to have just a thread local palette pointer and allocate the palette when needed and don't free it after using it but keep it for reuse. Do you want to write that patch or should I do it? Juergen
[Qemu-devel] [PATCH] exec: avoid realloc in phys_map_node_reserve
this is the first step in reducing the brk heap fragmentation created by the map->nodes memory allocation. Since the introduction of RCU the freeing of the PhysPageMaps is delayed so that sometimes several hundred are allocated at the same time. Even worse the memory for map->nodes is allocated and shortly afterwards reallocated. Since the number of nodes it grows to in the end is the same for all PhysPageMaps remember this value and at least avoid the reallocation. The large number of simultaneous allocations (about 450 x 70kB in my configuration) has to be addressed later. Signed-off-by: Peter Lieven--- exec.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/exec.c b/exec.c index 011babd..60cf46a 100644 --- a/exec.c +++ b/exec.c @@ -187,10 +187,12 @@ struct CPUAddressSpace { static void phys_map_node_reserve(PhysPageMap *map, unsigned nodes) { +static unsigned alloc_hint = 16; if (map->nodes_nb + nodes > map->nodes_nb_alloc) { -map->nodes_nb_alloc = MAX(map->nodes_nb_alloc * 2, 16); +map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, alloc_hint); map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, map->nodes_nb + nodes); map->nodes = g_renew(Node, map->nodes, map->nodes_nb_alloc); +alloc_hint = map->nodes_nb_alloc; } } -- 1.9.1
Re: [Qemu-devel] [PATCH] vnc-tight: fix regression with libxenstore
Am 15.07.2016 um 12:07 schrieb Gerd Hoffmann: > On Fr, 2016-07-15 at 11:45 +0200, Peter Lieven wrote: >> commit 095497ff added thread local storage for the color counting >> palette. Unfortunately, a VncPalette is about 7kB on a x86_64 system. >> This memory is reserved from the stack of every thread and it >> exhausted the stack space of a libxenstore thread. >> >> Fix this by allocating memory only for the VNC encoding thread. > Added to vnc queue. Please wait. Paolo mentioned that TLS is not allocated from the stack. Maybe this patch is ok, but we need a different commit message then. Peter
Re: [Qemu-devel] [PATCH] aio_ctx_check: follow CODING_STYLE
On Thu, Jul 14, 2016 at 09:10:43PM +0800, Cao jin wrote: > replace tab with spaces > > Signed-off-by: Cao jin> --- > async.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan signature.asc Description: PGP signature
[Qemu-devel] [RFC PATCH V6 2/6] colo-base: add colo-base to define and handle packet
COLO-base used by colo-compare and filter-rewriter. this can share common data structure like:net packet, and share other functions. Signed-off-by: Zhang ChenSigned-off-by: Li Zhijian Signed-off-by: Wen Congyang --- net/Makefile.objs | 1 + net/colo-base.c| 74 + net/colo-base.h| 38 + net/colo-compare.c | 119 - trace-events | 3 ++ 5 files changed, 233 insertions(+), 2 deletions(-) create mode 100644 net/colo-base.c create mode 100644 net/colo-base.h diff --git a/net/Makefile.objs b/net/Makefile.objs index ba92f73..119589f 100644 --- a/net/Makefile.objs +++ b/net/Makefile.objs @@ -17,3 +17,4 @@ common-obj-y += filter.o common-obj-y += filter-buffer.o common-obj-y += filter-mirror.o common-obj-y += colo-compare.o +common-obj-y += colo-base.o diff --git a/net/colo-base.c b/net/colo-base.c new file mode 100644 index 000..f5d5de9 --- /dev/null +++ b/net/colo-base.c @@ -0,0 +1,74 @@ +/* + * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO) + * (a.k.a. Fault Tolerance or Continuous Replication) + * + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD. + * Copyright (c) 2016 FUJITSU LIMITED + * Copyright (c) 2016 Intel Corporation + * + * Author: Zhang Chen + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "qemu/error-report.h" +#include "net/colo-base.h" + +int parse_packet_early(Packet *pkt) +{ +int network_length; +uint8_t *data = pkt->data; +uint16_t l3_proto; +ssize_t l2hdr_len = eth_get_l2_hdr_length(data); + +if (pkt->size < ETH_HLEN) { +error_report("pkt->size < ETH_HLEN"); +return 1; +} +pkt->network_layer = data + ETH_HLEN; +l3_proto = eth_get_l3_proto(data, l2hdr_len); +if (l3_proto != ETH_P_IP) { +return 1; +} + +network_length = pkt->ip->ip_hl * 4; +if (pkt->size < ETH_HLEN + network_length) { +error_report("pkt->size < network_layer + network_length"); +return 1; +} +pkt->transport_layer = pkt->network_layer + network_length; +if (!pkt->transport_layer) { +error_report("pkt->transport_layer is valid"); +return 1; +} + +return 0; +} + +Packet *packet_new(const void *data, int size) +{ +Packet *pkt = g_slice_new(Packet); + +pkt->data = g_memdup(data, size); +pkt->size = size; + +return pkt; +} + +void packet_destroy(void *opaque, void *user_data) +{ +Packet *pkt = opaque; + +g_free(pkt->data); +g_slice_free(Packet, pkt); +} + +/* + * Clear hashtable, stop this hash growing really huge + */ +void connection_hashtable_reset(GHashTable *connection_track_table) +{ +g_hash_table_remove_all(connection_track_table); +} diff --git a/net/colo-base.h b/net/colo-base.h new file mode 100644 index 000..48835e7 --- /dev/null +++ b/net/colo-base.h @@ -0,0 +1,38 @@ +/* + * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO) + * (a.k.a. Fault Tolerance or Continuous Replication) + * + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD. + * Copyright (c) 2016 FUJITSU LIMITED + * Copyright (c) 2016 Intel Corporation + * + * Author: Zhang Chen + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#ifndef QEMU_COLO_BASE_H +#define QEMU_COLO_BASE_H + +#include "slirp/slirp.h" +#include "qemu/jhash.h" + +#define HASHTABLE_MAX_SIZE 16384 + +typedef struct Packet { +void *data; +union { +uint8_t *network_layer; +struct ip *ip; +}; +uint8_t *transport_layer; +int size; +} Packet; + +int parse_packet_early(Packet *pkt); +void connection_hashtable_reset(GHashTable *connection_track_table); +Packet *packet_new(const void *data, int size); +void packet_destroy(void *opaque, void *user_data); + +#endif /* QEMU_COLO_BASE_H */ diff --git a/net/colo-compare.c b/net/colo-compare.c index 0402958..7c52cc8 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -27,13 +27,38 @@ #include "sysemu/char.h" #include "qemu/sockets.h" #include "qapi-visit.h" +#include "net/colo-base.h" +#include "trace.h" #define TYPE_COLO_COMPARE "colo-compare" #define COLO_COMPARE(obj) \ OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE) #define COMPARE_READ_LEN_MAX NET_BUFSIZE +#define MAX_QUEUE_SIZE 1024 +/* + + CompareState ++ + | | + +---+ +---+ +---+ + |conn list +--->conn +->conn | + +---+ +---+ +---+ + | | | | | | +
Re: [Qemu-devel] [PATCH v5 04/10] block: Support meta dirty bitmap
On 15.07.2016 14:04, Max Reitz wrote: > On 14.07.2016 22:00, John Snow wrote: >> On 06/22/2016 11:53 AM, Max Reitz wrote: >>> On 03.06.2016 06:32, Fam Zheng wrote: The added group of operations enables tracking of the changed bits in the dirty bitmap. Signed-off-by: Fam Zheng--- block/dirty-bitmap.c | 52 include/block/dirty-bitmap.h | 9 2 files changed, 61 insertions(+) diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c index 628b77c..9c53c56 100644 --- a/block/dirty-bitmap.c +++ b/block/dirty-bitmap.c @@ -38,6 +38,7 @@ */ struct BdrvDirtyBitmap { HBitmap *bitmap;/* Dirty sector bitmap implementation */ +HBitmap *meta; /* Meta dirty bitmap */ BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */ char *name; /* Optional non-empty unique ID */ int64_t size; /* Size of the bitmap (Number of sectors) */ @@ -103,6 +104,56 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, return bitmap; } +/* bdrv_create_meta_dirty_bitmap + * + * Create a meta dirty bitmap that tracks the changes of bits in @bitmap. I.e. + * when a dirty status bit in @bitmap is changed (either from reset to set or + * the other way around), its respective meta dirty bitmap bit will be marked + * dirty as well. >>> >>> Not wrong, but I'd like a note here that this is not an >>> when-and-only-when relationship, i.e. that bits in the meta bitmap may >>> be set even without the tracked bits in the dirty bitmap having changed. >>> >> >> How? >> >> You mean, if the caller manually starts setting things in the meta >> bitmap object? >> >> That's their fault then, isn't it? > > No, I mean something that I mentioned in a reply to some previous > version (the patch adding the test): > > http://lists.nongnu.org/archive/html/qemu-block/2016-03/msg00332.html > > Fam's reply is here: > > http://lists.nongnu.org/archive/html/qemu-block/2016-06/msg00097.html > > (Interesting how that reply took nearly three months and yours took > nearly one, there most be something about this series that makes > replying to replies very cumbersome :-)) I just remembered that it's very much justified now, as you have only recently adopted this series. It's just always funny to get a “What are you talking about?” reply to some nagging I sent out long enough in the past that I can't even remember myself, so I have to look it up, too. Sorry :-) Max > What I meant by “then it would update meta” is that it would update *all > of the range* even though only a single bit has actually been changed. > > So the answer to your “how” is: See patch 2, the changes to > hbitmap_set() (and hbitmap_reset()). If any of the bits in the given > range is changed, all of the range is marked as having changed in the > meta bitmap. > > So all we guarantee is that every time a bit is changed, the > corresponding bit in the meta bitmap will be set. But we do not > guarantee that a bit in the meta bitmap stays cleared as long as the > corresponding range of the underlying bitmap stays the same. > > Max > >> >>> Maybe this should be mentioned somewhere in patch 2, too. Or maybe only >>> in patch 2. >>> + * + * @bitmap: the block dirty bitmap for which to create a meta dirty bitmap. + * @chunk_size: how many bytes of bitmap data does each bit in the meta bitmap + * track. + */ +void bdrv_create_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap, + int chunk_size) +{ +assert(!bitmap->meta); +bitmap->meta = hbitmap_create_meta(bitmap->bitmap, + chunk_size * BITS_PER_BYTE); +} + +void bdrv_release_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap) +{ +assert(bitmap->meta); +hbitmap_free_meta(bitmap->bitmap); +bitmap->meta = NULL; +} + +int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs, + BdrvDirtyBitmap *bitmap, int64_t sector, + int nb_sectors) +{ +uint64_t i; +int gran = bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS; + +/* To optimize: we can make hbitmap to internally check the range in a + * coarse level, or at least do it word by word. */ >>> >>> We could also multiply gran by the granularity of the meta bitmap. Each >>> bit of the meta bitmap tracks at least eight bits of the dirty bitmap, >>> so we're calling hbitmap_get() at least eight times as often as >>> necessary here. >>> >>> Or we just use int gran = hbitmap_granularity(bitmap->meta);. >>> >>> Not exactly
[Qemu-devel] [PATCH] net: fix incorrect argument to iov_to_buf
Coverity reports a "suspicious sizeof" which is indeed wrong. Signed-off-by: Paolo Bonzini--- net/eth.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/eth.c b/net/eth.c index 95fe15c..0be59c2 100644 --- a/net/eth.c +++ b/net/eth.c @@ -418,7 +418,7 @@ _eth_get_rss_ex_dst_addr(const struct iovec *pkt, int pkt_frags, bytes_read = iov_to_buf(pkt, pkt_frags, rthdr_offset + sizeof(*ext_hdr), -dst_addr, sizeof(dst_addr)); +dst_addr, sizeof(*dst_addr)); return bytes_read == sizeof(dst_addr); } @@ -467,7 +467,7 @@ _eth_get_rss_ex_src_addr(const struct iovec *pkt, int pkt_frags, bytes_read = iov_to_buf(pkt, pkt_frags, opt_offset + sizeof(opthdr), -src_addr, sizeof(src_addr)); +src_addr, sizeof(*src_addr)); return bytes_read == sizeof(src_addr); } -- 2.7.4
Re: [Qemu-devel] Regression with commit 095497ffc66b7f031
On 15/07/16 11:03, Peter Lieven wrote: > Am 15.07.2016 um 10:47 schrieb Juergen Gross: >> On 15/07/16 09:39, Peter Lieven wrote: >>> Am 15.07.2016 um 08:32 schrieb Juergen Gross: Commit 095497ffc66b7f031ff2a17f1e50f5cb105ce588 ("vnc-enc-tight: use thread local storage for palette") introduced a regression with Xen: Since this commit qemu used as a device model is no longer capable to register Xenstore watches (that's the effect visible to a user). Reverting this commit makes qemu behave well again. I have no idea why that commit would have this effect with Xen, may be some memory is clobbered? >>> I personally have no idea, maybe @Paolo has? >>> >>> Maybe the corruption happens somewhere else and is just visible >>> due to this change. >>> >>> Do you see sth when you ran qemu/xen in valgrind? >> Nothing scaring and no real difference between working and not working >> variant. >> >> Meanwhile I've been digging a little bit deeper and found the reason: >> libxenstore is setting up a reader thread which is waiting for the >> watch to fire. With above commit the stack size of that thread (16kB) >> is too small. Setting it to 32kB made qemu work again. >> >> So I'd recommend to have just a thread local palette pointer and >> allocate the palette when needed and don't free it after using it but >> keep it for reuse. Do you want to write that patch or should I do it? > > As you like. But as I have introduced this regression, maybe I should > fix it ;-) Sure. > Actually I do not understand what libxenstore confuses about 16 and 32kB, > but I have no knowledge about XEN. However, let me know if this here works > for you: Thanks, is working again. You can add Tested-by: Juergen Grosswhen submitting it. The thread local static variable you added is occupying stack space in each thread started by qemu. As it is rather large (about 6kB on a 64 bit system) the stack of the thread created by libxenstore is exhausted resulting in a failing library call. Juergen > > diff --git a/ui/vnc-enc-tight.c b/ui/vnc-enc-tight.c > index b8581dd..5731cf6 100644 > --- a/ui/vnc-enc-tight.c > +++ b/ui/vnc-enc-tight.c > @@ -1457,11 +1457,18 @@ static int send_sub_rect_jpeg(VncState *vs, int x, > int y, int w, int h, > } > #endif > > -static __thread VncPalette color_count_palette; > +static __thread VncPalette *color_count_palette = NULL; > +static __thread Notifier vnc_tight_cleanup_notifier; > + > +static void vnc_tight_cleanup(Notifier *n, void *value) > +{ > +printf("thread %d: free tight palette %p\n", qemu_get_thread_id(), > color_count_palette); > +g_free(color_count_palette); > +color_count_palette = NULL; > +} > > static int send_sub_rect(VncState *vs, int x, int y, int w, int h) > { > -VncPalette *palette = _count_palette; > uint32_t bg = 0, fg = 0; > int colors; > int ret = 0; > @@ -1470,6 +1477,13 @@ static int send_sub_rect(VncState *vs, int x, int y, > int w, int h) > bool allow_jpeg = true; > #endif > > +if (!color_count_palette) { > +color_count_palette = g_malloc(sizeof(VncPalette)); > +vnc_tight_cleanup_notifier.notify = vnc_tight_cleanup; > +qemu_thread_atexit_add(_tight_cleanup_notifier); > +printf("thread %d: alloc tight palette %p\n", qemu_get_thread_id(), > color_count_palette); > +} > + > vnc_framebuffer_update(vs, x, y, w, h, vs->tight.type); > > vnc_tight_start(vs); > @@ -1490,17 +1504,17 @@ static int send_sub_rect(VncState *vs, int x, int y, > int w, int h) > } > #endif > > -colors = tight_fill_palette(vs, x, y, w * h, , , palette); > +colors = tight_fill_palette(vs, x, y, w * h, , , > color_count_palette); > > #ifdef CONFIG_VNC_JPEG > if (allow_jpeg && vs->tight.quality != (uint8_t)-1) { > -ret = send_sub_rect_jpeg(vs, x, y, w, h, bg, fg, colors, palette, > +ret = send_sub_rect_jpeg(vs, x, y, w, h, bg, fg, colors, > color_count_palette, > force_jpeg); > } else { > -ret = send_sub_rect_nojpeg(vs, x, y, w, h, bg, fg, colors, palette); > +ret = send_sub_rect_nojpeg(vs, x, y, w, h, bg, fg, colors, > color_count_palette); > } > #else > -ret = send_sub_rect_nojpeg(vs, x, y, w, h, bg, fg, colors, palette); > +ret = send_sub_rect_nojpeg(vs, x, y, w, h, bg, fg, colors, > color_count_palette); > #endif > > return ret; > > Peter > >
Re: [Qemu-devel] [PATCH] vnc-enc-tight: fix off-by-one bug
On Di, 2016-07-12 at 17:31 +0800, Herongguang (Stephen) wrote: > In tight_encode_indexed_rect32, buf(or src)’s size is count. In for loop, > the logic is supposed to be that i is an index into src, i should be > incremented when incrementing src. > > This is broken when src is incremented but i is not before while loop, > resulting in off-by-one bug in while loop. > > Signed-off-by: He RongguangAdded to vnc queue. Patch is whitespace mangled, had to use "patch --ignore-whitespace" to get it applied. Can you please use 'git send-email' to send patches in the future? That is the best way to avoid your mail client breaking patches. thanks, Gerd
Re: [Qemu-devel] Regression with commit 095497ffc66b7f031
On 15/07/2016 12:12, Gerd Hoffmann wrote: > On Fr, 2016-07-15 at 12:02 +0200, Paolo Bonzini wrote: >> >> On 15/07/2016 10:47, Juergen Gross wrote: >>> Nothing scaring and no real difference between working and not working >>> variant. >>> >>> Meanwhile I've been digging a little bit deeper and found the reason: >>> libxenstore is setting up a reader thread which is waiting for the >>> watch to fire. With above commit the stack size of that thread (16kB) >>> is too small. Setting it to 32kB made qemu work again. >> >> This makes very little sense (not your fault)... The commit doesn't >> change stack usage at all, TLS should not be on the stack. >> >> Can you capture a backtrace where the 16K stack is exceeded? Perhaps >> it's only due to inlining decision on the compiler, in which case >> Peter's patch from today is only a bandaid. > > Hmm, I guess I hold off the vnc pull request for now ... Go ahead. I looked at glibc source code and the patch is okay. Paolo
Re: [Qemu-devel] [PATCH V2 1/1] linux-aio: prevent submitting more than MAX_EVENTS
On 15/07/2016 12:17, Roman Penyaev wrote: > On Fri, Jul 15, 2016 at 11:58 AM, Paolo Bonziniwrote: >> >> >> On 15/07/2016 11:18, Roman Penyaev wrote: >>> Those 3 red spikes and a blue hill is what we have to focus on. The >>> blue hill at the right corner of the chart means that almost always the >>> ring buffer was observed as full, i.e. qemu_laio_completion_bh() got >>> a chance to reap completions not very often, meanwhile completed >>> requests stand in the ring buffer for quite a long time which degrades >>> the overall performance. >>> >>> The results covered by the red line are much better and that can be >>> explained by those 3 red spikes, which are almost in the middle of the >>> whole distribution, i.e. qemu_laio_completion_bh() is called more often, >>> completed requests do not stall, giving fio a chance to submit new fresh >>> requests. >>> >>> The theoretical fix would be to schedule completion BH just after >>> successful io_submit, i.e.: >> >> What about removing the qemu_bh_cancel but keeping the rest of the patch? > > That exactly what I did. Numbers go to expected from ~1600MB/s to ~1800MB/s. > So basically this hunk of the debatable patch: > > if (event_notifier_test_and_clear(>e)) { > -qemu_bh_schedule(s->completion_bh); > +qemu_laio_completion_bh(s); > } > > does not have any impact and can be ignored. At least I did not notice > anything important. > >> >> I'm also interested in a graph with this patch ("linux-aio: prevent >> submitting more than MAX_EVENTS") on top of origin/master. > > I can plot it also of course. > >> >> Thanks for the analysis. Sometimes a picture _is_ worth a thousand >> words, even if it's measuring "only" second-order effects (# of >> completions is not what causes the slowdown, but # of completions >> affects latency which causes the slowdown). > > Yes, you are right, latency. With userspace io_getevents ~0 costs we > can peek requests as often as we like to decrease latency on very > fast devices. That can also bring something. Probably after each > io_submit() it makes sense to peek and complete something. Right, especially 1) because io_getevents with timeout 0 is cheap (it peeks at the ring buffer before the syscall); 2) because we want anyway to replace io_getevents with userspace code through your other patch. Paolo
Re: [Qemu-devel] [Qemu-block] [PATCH] aio_ctx_check: follow CODING_STYLE
On 07/15/2016 06:40 PM, Stefan Hajnoczi wrote: On Fri, Jul 15, 2016 at 09:48:50AM +0800, Cao jin wrote: On 07/14/2016 10:08 PM, Eric Blake wrote: On 07/14/2016 07:10 AM, Cao jin wrote: replace tab with spaces Signed-off-by: Cao jin--- async.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Whitespace-only changes are best done as part of a series that is already touching nearby code for other reasons (depending on the size of the whitespace changes and on the rest of your patch, it may be okay to squash the whitespace change in place, or better to split into separate patches to make review of both patches easier). Otherwise, it just makes 'git blame' output dirtier. I see. Since async.c & aio-posix.c are belong to the same maintaiers, so, Fam & Stefan, is it ok to squash this into that "remove useless parameter" patch? If not, we can just forget this one. The "remove useless parameter" patch doesn't touch the function you are modifying here. Please don't squash the patches. Since you have already posted this patch I will merge it. In the future please don't submit whitespace changes, tiny coding style cleanups, etc in by themselves. Thanks for all your contributions. I do not want to discourage you but my view is that code changes should only be made if they fix a bug, improve performance measurably, add a feature, or significantly improve the code. Every patch has a cost in terms of code review, merging/testing, backporting, bisecting, documentation, etc. We could discuss each of these in detail but basically a code change creates work or takes time from one or more people in these areas. In a perfect world with unlimited resources all patches would be equally welcome. Due to limited resources it's best to submit the types of patches I mentioned above where the cost/benefit ratio is favorable. Thanks, Stefan Thanks Stefan, and sorry for the inconvenience brought to you. I thought this kind of tiny things would be very simple for maintainers, now I understand -- Yours Sincerely, Cao jin
[Qemu-devel] [RFC PATCH V6 5/6] colo-compare: introduce packet comparison thread
If primary packet is same with secondary packet, we will send primary packet and drop secondary packet, otherwise notify COLO frame to do checkpoint. If primary packet comes and secondary packet not, after REGULAR_PACKET_CHECK_MS milliseconds we set the primary packet as old_packet,then do a checkpoint. Signed-off-by: Zhang ChenSigned-off-by: Li Zhijian Signed-off-by: Wen Congyang --- net/colo-base.c| 1 + net/colo-base.h| 3 + net/colo-compare.c | 216 + trace-events | 2 + 4 files changed, 222 insertions(+) diff --git a/net/colo-base.c b/net/colo-base.c index 7e91dec..eb1b631 100644 --- a/net/colo-base.c +++ b/net/colo-base.c @@ -132,6 +132,7 @@ Packet *packet_new(const void *data, int size) pkt->data = g_memdup(data, size); pkt->size = size; +pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST); return pkt; } diff --git a/net/colo-base.h b/net/colo-base.h index 0505608..06d6dca 100644 --- a/net/colo-base.h +++ b/net/colo-base.h @@ -17,6 +17,7 @@ #include "slirp/slirp.h" #include "qemu/jhash.h" +#include "qemu/timer.h" #define HASHTABLE_MAX_SIZE 16384 @@ -28,6 +29,8 @@ typedef struct Packet { }; uint8_t *transport_layer; int size; +/* Time of packet creation, in wall clock ms */ +int64_t creation_ms; } Packet; typedef struct ConnectionKey { diff --git a/net/colo-compare.c b/net/colo-compare.c index 5f87710..942e326 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -36,6 +36,8 @@ #define COMPARE_READ_LEN_MAX NET_BUFSIZE #define MAX_QUEUE_SIZE 1024 +/* TODO: Should be configurable */ +#define REGULAR_PACKET_CHECK_MS 3000 /* + CompareState ++ @@ -83,6 +85,10 @@ typedef struct CompareState { GQueue unprocessed_connections; /* proxy current hash size */ uint32_t hashtable_size; +/* compare thread, a thread for each NIC */ +QemuThread thread; +/* Timer used on the primary to find packets that are never matched */ +QEMUTimer *timer; } CompareState; typedef struct CompareClass { @@ -170,6 +176,112 @@ static int packet_enqueue(CompareState *s, int mode) return 0; } +/* + * The IP packets sent by primary and secondary + * will be compared in here + * TODO support ip fragment, Out-Of-Order + * return:0 means packet same + *> 0 || < 0 means packet different + */ +static int colo_packet_compare(Packet *ppkt, Packet *spkt) +{ +trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src), + inet_ntoa(ppkt->ip->ip_dst), spkt->size, + inet_ntoa(spkt->ip->ip_src), + inet_ntoa(spkt->ip->ip_dst)); + +if (ppkt->size == spkt->size) { +return memcmp(ppkt->data, spkt->data, spkt->size); +} else { +return -1; +} +} + +static int colo_packet_compare_all(Packet *spkt, Packet *ppkt) +{ +trace_colo_compare_main("compare all"); +return colo_packet_compare(ppkt, spkt); +} + +static void colo_old_packet_check_one(void *opaque_packet, + void *opaque_found) +{ +int64_t now; +bool *found_old = (bool *)opaque_found; +Packet *ppkt = (Packet *)opaque_packet; + +if (*found_old) { +/* Someone found an old packet earlier in the queue */ +return; +} + +now = qemu_clock_get_ms(QEMU_CLOCK_HOST); +if ((now - ppkt->creation_ms) > REGULAR_PACKET_CHECK_MS) { +trace_colo_old_packet_check_found(ppkt->creation_ms); +*found_old = true; +} +} + +static void colo_old_packet_check_one_conn(void *opaque, + void *user_data) +{ +bool found_old = false; +Connection *conn = opaque; + +g_queue_foreach(>primary_list, colo_old_packet_check_one, +_old); +if (found_old) { +/* do checkpoint will flush old packet */ +/* TODO: colo_notify_checkpoint();*/ +} +} + +/* + * Look for old packets that the secondary hasn't matched, + * if we have some then we have to checkpoint to wake + * the secondary up. + */ +static void colo_old_packet_check(void *opaque) +{ +CompareState *s = opaque; + +g_queue_foreach(>conn_list, colo_old_packet_check_one_conn, NULL); +} + +/* + * called from the compare thread on the primary + * for compare connection + */ +static void colo_compare_connection(void *opaque, void *user_data) +{ +CompareState *s = user_data; +Connection *conn = opaque; +Packet *pkt = NULL; +GList *result = NULL; +int ret; + +while (!g_queue_is_empty(>primary_list) && + !g_queue_is_empty(>secondary_list)) { +pkt = g_queue_pop_tail(>primary_list); +result = g_queue_find_custom(>secondary_list, + pkt, (GCompareFunc)colo_packet_compare_all); + +if (result) { +
[Qemu-devel] [PATCH] e1000e: fix incorrect access to pointer
This is not dereferencing the pointer, and instead checking only the value of the pointer. Signed-off-by: Paolo Bonzini--- hw/net/e1000e_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c index 6050d8b..badb1fe 100644 --- a/hw/net/e1000e_core.c +++ b/hw/net/e1000e_core.c @@ -281,7 +281,7 @@ e1000e_intrmgr_delay_rx_causes(E1000ECore *core, uint32_t *causes) /* Check if delayed RX interrupts disabled by client or if there are causes that cannot be delayed */ -if ((rdtr == 0) || (causes != 0)) { +if ((rdtr == 0) || (*causes != 0)) { return false; } @@ -322,7 +322,7 @@ e1000e_intrmgr_delay_tx_causes(E1000ECore *core, uint32_t *causes) *causes &= ~delayable_causes; /* If there are causes that cannot be delayed */ -if (causes != 0) { +if (*causes != 0) { return false; } -- 2.7.4
Re: [Qemu-devel] [PATCH 1/1] spapr: Ensure CPU cores are added contiguously and removed in LIFO order
On Fri, 15 Jul 2016 15:29:01 +1000 David Gibsonwrote: > On Thu, Jul 14, 2016 at 10:27:15AM +0200, Igor Mammedov wrote: > > On Thu, 14 Jul 2016 10:51:27 +1000 > > David Gibson wrote: > > > > > On Wed, Jul 13, 2016 at 12:20:20PM +0530, Bharata B Rao wrote: > > > > If CPU core addition or removal is allowed in random order leading to > > > > holes in the core id range (and hence in the cpu_index range), migration > > > > can fail as migration with holes in cpu_index range isn't yet handled > > > > correctly. > > > > > > > > Prevent this situation by enforcing the addition in contiguous order > > > > and removal in LIFO order so that we never end up with holes in > > > > cpu_index range. > > > > > > > > Signed-off-by: Bharata B Rao > > > > --- > > > > While there is work in progress to support migration when there are > > > > holes > > > > in cpu_index range resulting from out-of-order plug or unplug, this > > > > patch > > > > is intended as a last resort if no easy, risk-free and elegant solution > > > > emerges before 2.7 dev cycle ends. > > > > > > Applied to ppc-for-2.7. We can revert it once the problems with > > > cpu_index are sorted out. > > You'd need to add machine type specific compat option here, > > so that new-qemu -M 2.7 wouldn't allow out of order too and > > could be migrated to old-qemu -M 2.7 > > Hmm, do we care about migration from newer back to older versions of > qemu upstream? upstream we don't, though it would reduce maintenance head-ache. (if isn't hard then why not do it upstream either)
[Qemu-devel] [PATCH] checkpatch: consider git extended headers valid patches
Renames look like this with git-diff(1) when diff.renames = true is set: diff --git a/a b/b similarity index 100% rename from a rename to b This raises the "Does not appear to be a unified-diff format patch" error because checkpatch.pl only considers a diff valid if it contains at least one "@@" hunk. This patch accepts renames and copies too so that checkpatch.pl exits successfully when a diff only renames/copies files. The git diff extended header format is described on the git-diff(1) man page. Reported-by: Colin LordSigned-off-by: Stefan Hajnoczi --- scripts/checkpatch.pl | 5 + 1 file changed, 5 insertions(+) diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl index cf32c8f..afa7f79 100755 --- a/scripts/checkpatch.pl +++ b/scripts/checkpatch.pl @@ -1279,6 +1279,11 @@ sub process { } } +# Accept git diff extended headers as valid patches + if ($line =~ /^(?:rename|copy) (?:from|to) [\w\/\.\-]+\s*$/) { + $is_patch = 1; + } + #check the patch for a signoff: if ($line =~ /^\s*signed-off-by:/i) { # This is a signoff, if ugly, so do not double report. -- 2.7.4
Re: [Qemu-devel] [PATCH 7/8] pc: acpi: memhp: nvdimm hotplug support
On Fri, Jul 15, 2016 at 03:49:12PM +0800, Xiao Guangrong wrote: > > > On 07/14/2016 08:17 PM, Stefan Hajnoczi wrote: > > On Mon, Jul 11, 2016 at 09:45:17PM +0800, Xiao Guangrong wrote: > > > diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c > > > index 73fa62d..d1c2e92 100644 > > > --- a/hw/acpi/memory_hotplug.c > > > +++ b/hw/acpi/memory_hotplug.c > > > @@ -239,11 +239,6 @@ void acpi_memory_plug_cb(HotplugHandler > > > *hotplug_dev, MemHotplugState *mem_st, > > >DeviceState *dev, Error **errp) > > > { > > > MemStatus *mdev; > > > -DeviceClass *dc = DEVICE_GET_CLASS(dev); > > > - > > > -if (!dc->hotpluggable) { > > > -return; > > > -} > > > > > > mdev = acpi_memory_slot_status(mem_st, dev, errp); > > > if (!mdev) { > > > > Did you mean to include this hunk in the patch? Looks like something > > left over from debugging/prototyping. > > > > As all memory devices, both pc-dimm and nvdimm, have supported hotplug now, i > think this chunk can be removed. > > In fact, this check was introduced by nvdimm. :) Fair enough. Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH v3 13/14] virtio-gpu: Wrap in vmstate
On Do, 2016-07-14 at 18:22 +0100, Dr. David Alan Gilbert (git) wrote: > From: "Dr. David Alan Gilbert"> > Forcibly convert it to a vmstate wrapper; proper conversion > comes later. > > Signed-off-by: Dr. David Alan Gilbert > Reviewed-by: Cornelia Huck > --- > hw/display/virtio-gpu.c | 17 +++-- > 1 file changed, 7 insertions(+), 10 deletions(-) Reviewed-by: Gerd Hoffmann
Re: [Qemu-devel] [PATCH v3 11/14] virtio-input: Wrap in vmstate
On Do, 2016-07-14 at 18:22 +0100, Dr. David Alan Gilbert (git) wrote: > From: "Dr. David Alan Gilbert"> > Forcibly convert it to a vmstate wrapper; proper conversion > comes later. > > Signed-off-by: Dr. David Alan Gilbert > Reviewed-by: Cornelia Huck Reviewed-by: Gerd Hoffmann
[Qemu-devel] [PATCH v4] aio-posix: remove useless parameter
Parameter **errp of aio_context_setup() is useless, remove it and clean up the related code. Cc: Stefan HajnocziCc: Fam Zheng Cc: Eric Blake Signed-off-by: Cao jin --- aio-posix.c | 3 ++- aio-win32.c | 2 +- async.c | 8 ++-- include/block/aio.h | 2 +- 4 files changed, 6 insertions(+), 9 deletions(-) v4 changelog: 1. replace plain errno with strerror(errno) (Eric) diff --git a/aio-posix.c b/aio-posix.c index 6006122..43162a9 100644 --- a/aio-posix.c +++ b/aio-posix.c @@ -485,12 +485,13 @@ bool aio_poll(AioContext *ctx, bool blocking) return progress; } -void aio_context_setup(AioContext *ctx, Error **errp) +void aio_context_setup(AioContext *ctx) { #ifdef CONFIG_EPOLL_CREATE1 assert(!ctx->epollfd); ctx->epollfd = epoll_create1(EPOLL_CLOEXEC); if (ctx->epollfd == -1) { +fprintf(stderr, "Failed to create epoll instance: %s", strerror(errno)); ctx->epoll_available = false; } else { ctx->epoll_available = true; diff --git a/aio-win32.c b/aio-win32.c index 6aaa32a..c8c249e 100644 --- a/aio-win32.c +++ b/aio-win32.c @@ -371,6 +371,6 @@ bool aio_poll(AioContext *ctx, bool blocking) return progress; } -void aio_context_setup(AioContext *ctx, Error **errp) +void aio_context_setup(AioContext *ctx) { } diff --git a/async.c b/async.c index fb7dd92..8589017 100644 --- a/async.c +++ b/async.c @@ -327,14 +327,10 @@ AioContext *aio_context_new(Error **errp) { int ret; AioContext *ctx; -Error *local_err = NULL; ctx = (AioContext *) g_source_new(_source_funcs, sizeof(AioContext)); -aio_context_setup(ctx, _err); -if (local_err) { -error_propagate(errp, local_err); -goto fail; -} +aio_context_setup(ctx); + ret = event_notifier_init(>notifier, false); if (ret < 0) { error_setg_errno(errp, -ret, "Failed to initialize event notifier"); diff --git a/include/block/aio.h b/include/block/aio.h index 88a64ee..0922b69 100644 --- a/include/block/aio.h +++ b/include/block/aio.h @@ -439,6 +439,6 @@ static inline bool aio_node_check(AioContext *ctx, bool is_external) * * Initialize the aio context. */ -void aio_context_setup(AioContext *ctx, Error **errp); +void aio_context_setup(AioContext *ctx); #endif -- 2.1.0
Re: [Qemu-devel] [PATCH] vnc-tight: fix regression with libxenstore
On 15/07/2016 12:10, Peter Lieven wrote: > Am 15.07.2016 um 12:07 schrieb Gerd Hoffmann: >> On Fr, 2016-07-15 at 11:45 +0200, Peter Lieven wrote: >>> commit 095497ff added thread local storage for the color counting >>> palette. Unfortunately, a VncPalette is about 7kB on a x86_64 system. >>> This memory is reserved from the stack of every thread and it >>> exhausted the stack space of a libxenstore thread. >>> >>> Fix this by allocating memory only for the VNC encoding thread. >> Added to vnc queue. > > Please wait. Paolo mentioned that TLS is not allocated from the stack. Actually it does---which is not a problem, but then the stack size from pthread attributes should be increased IMHO. Anyway, the patch is okay. Paolo > Maybe this patch is ok, but we need a different commit message then.
[Qemu-devel] [RFC PATCH V6 1/6] colo-compare: introduce colo compare initialization
This a COLO net ascii figure: Primary qemu Secondary qemu +--+ ++ | +-+ | | +---+ | | | | | | | | | | |guest| | | | guest | | | | | | | | | | | +---^--+--+ | | +-+++ | | | | | | ^| | | | | | | || | | | +--+ | || | |netfilter| | | || | netfilter|| | | +--+ ---+|| | +---+ | | | | | |||| | | || filter excute order | | | | | | |||| | | || +---> | | | | | | |||| | | || TCP | | | | +-+--+--+ +--v-+ | ++ || | | ++ +---++---v+rewriter++ ++ | | | | | | || | || || | | | | || | || | | | | | filter | | filter +> colo <+ +> filter +--> adjust | adjust +--> filter | | | | | | mirror | | redirector | | | compare | | || | | redirector | | ack| seq| | redirector | | | | | | | || | || | || | | | || | || | | | | +^--+ ++ | +-+--+ | || | ++ ++--+ +---++ | | | | | tx rx | || || | txall | rx | | | | || || || +---+ | | | || || || || | | | filter excute order | || || || | | | +---> | || ++| | +---+ || | | |||| | | +--+ ++ |guest receive |guest send || ++v+ | | NOTE: filter direction is rx/tx/all | tap | rx:receive packets sent to the netdev | | tx:receive packets sent by the netdev +--+ In COLO-compare. Packets coming from the primary char indev will be sent to outdev Packets coming from the secondary char dev will be dropped colo-comapre need two input chardev and one output chardev: primary_in=chardev1-id secondary_in=chardev2-id outdev=chardev3-id usage: primary: -netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device
[Qemu-devel] [RFC PATCH V6 3/6] Jhash: add linux kernel jhashtable in qemu
Jhash used by colo-compare and filter-rewriter to save and lookup net connection info Signed-off-by: Zhang ChenSigned-off-by: Li Zhijian Signed-off-by: Wen Congyang --- include/qemu/jhash.h | 61 1 file changed, 61 insertions(+) create mode 100644 include/qemu/jhash.h diff --git a/include/qemu/jhash.h b/include/qemu/jhash.h new file mode 100644 index 000..0fcd875 --- /dev/null +++ b/include/qemu/jhash.h @@ -0,0 +1,61 @@ +/* jhash.h: Jenkins hash support. + * + * Copyright (C) 2006. Bob Jenkins (bob_jenk...@burtleburtle.net) + * + * http://burtleburtle.net/bob/hash/ + * + * These are the credits from Bob's sources: + * + * lookup3.c, by Bob Jenkins, May 2006, Public Domain. + * + * These are functions for producing 32-bit hashes for hash table lookup. + * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final() + * are externally useful functions. Routines to test the hash are +included + * if SELF_TEST is defined. You can use this free for any purpose. +It's in + * the public domain. It has no warranty. + * + * Copyright (C) 2009-2010 Jozsef Kadlecsik (kad...@blackhole.kfki.hu) + * + * I've modified Bob's hash to be useful in the Linux kernel, and + * any bugs present are my fault. + * Jozsef + */ + +#ifndef QEMU_JHASH_H__ +#define QEMU_JHASH_H__ + +#include "qemu/bitops.h" + +/* + * hashtable relation copy from linux kernel jhash + */ + +/* __jhash_mix -- mix 3 32-bit values reversibly. */ +#define __jhash_mix(a, b, c)\ +{ \ +a -= c; a ^= rol32(c, 4); c += b; \ +b -= a; b ^= rol32(a, 6); a += c; \ +c -= b; c ^= rol32(b, 8); b += a; \ +a -= c; a ^= rol32(c, 16); c += b; \ +b -= a; b ^= rol32(a, 19); a += c; \ +c -= b; c ^= rol32(b, 4); b += a; \ +} + +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */ +#define __jhash_final(a, b, c) \ +{ \ +c ^= b; c -= rol32(b, 14); \ +a ^= c; a -= rol32(c, 11); \ +b ^= a; b -= rol32(a, 25); \ +c ^= b; c -= rol32(b, 16); \ +a ^= c; a -= rol32(c, 4); \ +b ^= a; b -= rol32(a, 14); \ +c ^= b; c -= rol32(b, 24); \ +} + +/* An arbitrary initial parameter */ +#define JHASH_INITVAL 0xdeadbeef + +#endif /* QEMU_JHASH_H__ */ -- 2.7.4
Re: [Qemu-devel] [PATCH] Move README to markdown
On Fri, Jul 15, 2016 at 12:31:11AM -0400, Pranith Kumar wrote: > Move the README file to markdown so that it makes the github page look > prettier. I know that github repo is a mirror and not the official > repo, but I think it doesn't hurt to have it in markdown format. > > Signed-off-by: Pranith Kumar> --- > README => README.md | 41 - > 1 file changed, 20 insertions(+), 21 deletions(-) > rename README => README.md (85%) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature
Re: [Qemu-devel] [SeaBIOS] [PATCH 5/5] [wip] sercon: initial split-output implementation
Hi, > I'm okay with the cut-and-paste. But, another option would be to use > the iretw at the end of the existing irqentry_extrastack to implement > the ljmpw into the main vgabios. Something like (totally untested): > > entry_10_hooked: > pushfw // Setup for iretw in irqentry_arg > pushl %cs:sercon_int10_hook_resume > > pushl $handle_10 > #if CONFIG_ENTRY_EXTRASTACK > jmp irqentry_arg_extrastack > #else > jmp irqentry_arg > #endif Good idea, I'll try it. > Separately, have you considered choosing a separate entry point for > entry_10_hooked. That is, changing the above pushl to > $handle_sercon_hooked and introducing that function in sercon.c. It > seems it would reduce a number of "if (!sercon_splitmode())" checks in > the main code, as handle_sercon_hooked() could just use a smaller > switch statement and ignore requests it doesn't need to support. Makes sense indeed. All functions which only return information are not needed in the splitmode case. > Finally, one high level observation is that we know there are a number > of quirks in various vgabios emulators. For example, we know some > emulators don't handle certain 32bit instructions when in 16bit mode > (hence scripts/vgafixup.py), we know some versions of Windows use an > emulator that doesn't like some stack relative instructions (hence the > vgabios is compiled without -fomit-frame-pointer), and we know Windows > Vista doesn't like the extra stack in high ram (the skifree bug). Any > thoughts on what happens with these quirks if the main seabios code > hooks int10? Good question. Do the emulators (both win, xorg) use the int10 vector set by seabios in the first place? Or go they load the vgabios and run it, including the initialization, and use whatever entry point the init code sets up? I suspect it is the latter. But needs investigation and testing. /me places the item on the todo list. Also a serial console for windows guests isn't that useful, so I wouldn't worry too much about windows emulator issues. cheers, Gerd
Re: [Qemu-devel] [PATCH v3 00/14] virtio migration: Flip outer layer to vmstate
On Thu, 14 Jul 2016 18:22:42 +0100 "Dr. David Alan Gilbert (git)"wrote: > From: "Dr. David Alan Gilbert" > > Hi, > This series converts the outer most layer of virtio to > use VMState macros; this is the easy bit, but I'm hoping that > having done that, the next trick is to nibble away at the virtio_save/load > functions and all of the zillions of device/bus helpers. Looks good, and I'd like to see this in 2.7.
[Qemu-devel] [PATCH] AioContext: correct comments
Correct comments of field notify_me Cc: Kevin WolfCc: Max Reitz Signed-off-by: Cao jin --- include/block/aio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/block/aio.h b/include/block/aio.h index 0922b69..f89a1df 100644 --- a/include/block/aio.h +++ b/include/block/aio.h @@ -71,7 +71,7 @@ struct AioContext { * event_notifier_set necessary. * * Bit 0 is reserved for GSource usage of the AioContext, and is 1 - * between a call to aio_ctx_check and the next call to aio_ctx_dispatch. + * between a call to aio_ctx_prepare and the next call to aio_ctx_check. * Bits 1-31 simply count the number of active calls to aio_poll * that are in the prepare or poll phase. * -- 2.1.0
Re: [Qemu-devel] [PATCH] ppc: Yet another fix for the huge page support detection mechanism
On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote: > Commit 86b50f2e1bef ("Disable huge page support if it is not available > for main RAM") already made sure that huge page support is not announced > to the guest if the normal RAM of non-NUMA configurations is not backed > by a huge page filesystem. However, there is one more case that can go > wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured > with huge page support (and only the memory of a DIMM is configured with > it). When QEMU is started with the following command line for example, > the Linux guest currently crashes because it is trying to use huge pages > on a memory region that does not support huge pages: > > qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \ >memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 > \ >-device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \ >-numa node,nodeid=0 -numa node,nodeid=1 > > To fix this issue, we've got to make sure to disable huge page support, > too, when there is a NUMA node that is not using a memory backend with > huge page support. > > Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740 > Signed-off-by: Thomas Huth> --- > target-ppc/kvm.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c > index 884d564..7a8f555 100644 > --- a/target-ppc/kvm.c > +++ b/target-ppc/kvm.c > @@ -389,12 +389,16 @@ static long getrampagesize(void) > > object_child_foreach(memdev_root, find_max_supported_pagesize, ); > > -if (hpsize == LONG_MAX) { > +if (hpsize == LONG_MAX || hpsize == getpagesize()) { > return getpagesize(); > } > > -if (nb_numa_nodes == 0 && hpsize > getpagesize()) { > -/* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! > */ > +/* If NUMA is disabled or the NUMA nodes are not backed with a > + * memory-backend, then there is at least one node using "normal" > + * RAM. And since normal RAM has not been configured with "-mem-path" > + * (what we've checked earlier here already), we can not use huge pages! > + */ > +if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) { Is that second clause sufficient, or do you need to loop through and check the memdev of every node? > static bool warned; > if (!warned) { > error_report("Huge page support disabled (n/a for main > memory)."); -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [Qemu-devel] Regression with commit 095497ffc66b7f031
Am 15.07.2016 um 12:02 schrieb Paolo Bonzini: > > On 15/07/2016 10:47, Juergen Gross wrote: >> Nothing scaring and no real difference between working and not working >> variant. >> >> Meanwhile I've been digging a little bit deeper and found the reason: >> libxenstore is setting up a reader thread which is waiting for the >> watch to fire. With above commit the stack size of that thread (16kB) >> is too small. Setting it to 32kB made qemu work again. > This makes very little sense (not your fault)... The commit doesn't > change stack usage at all, TLS should not be on the stack. But we still allocate the VncPalette for every thread, right? Even if it has nothing todo with VNC. Peter
Re: [Qemu-devel] [PATCH] vnc-tight: fix regression with libxenstore
On Fr, 2016-07-15 at 11:45 +0200, Peter Lieven wrote: > commit 095497ff added thread local storage for the color counting > palette. Unfortunately, a VncPalette is about 7kB on a x86_64 system. > This memory is reserved from the stack of every thread and it > exhausted the stack space of a libxenstore thread. > > Fix this by allocating memory only for the VNC encoding thread. Added to vnc queue. thanks, Gerd
Re: [Qemu-devel] Regression with commit 095497ffc66b7f031
Am 15.07.2016 um 12:12 schrieb Paolo Bonzini: > > On 15/07/2016 12:07, Peter Lieven wrote: >> Am 15.07.2016 um 12:02 schrieb Paolo Bonzini: >>> On 15/07/2016 10:47, Juergen Gross wrote: Nothing scaring and no real difference between working and not working variant. Meanwhile I've been digging a little bit deeper and found the reason: libxenstore is setting up a reader thread which is waiting for the watch to fire. With above commit the stack size of that thread (16kB) is too small. Setting it to 32kB made qemu work again. >>> This makes very little sense (not your fault)... The commit doesn't >>> change stack usage at all, TLS should not be on the stack. >> But we still allocate the VncPalette for every thread, right? Even >> if it has nothing todo with VNC. > Yes, I'm just trying to understand the root cause. Which is that glibc > actually does carve out TLS space from the requested stack size. That > means that a program that has a lot of TLS variables, or has big TLS > variables, will fail in weird ways. > > So that's two reasons why your patch is okay. :) Okay, then I will wait for the analysis and then resubmit that Patch with a different commit message. Peter
Re: [Qemu-devel] Regression with commit 095497ffc66b7f031
On 15/07/2016 12:07, Peter Lieven wrote: > Am 15.07.2016 um 12:02 schrieb Paolo Bonzini: >> >> On 15/07/2016 10:47, Juergen Gross wrote: >>> Nothing scaring and no real difference between working and not working >>> variant. >>> >>> Meanwhile I've been digging a little bit deeper and found the reason: >>> libxenstore is setting up a reader thread which is waiting for the >>> watch to fire. With above commit the stack size of that thread (16kB) >>> is too small. Setting it to 32kB made qemu work again. >> This makes very little sense (not your fault)... The commit doesn't >> change stack usage at all, TLS should not be on the stack. > > But we still allocate the VncPalette for every thread, right? Even > if it has nothing todo with VNC. Yes, I'm just trying to understand the root cause. Which is that glibc actually does carve out TLS space from the requested stack size. That means that a program that has a lot of TLS variables, or has big TLS variables, will fail in weird ways. So that's two reasons why your patch is okay. :) Paolo
[Qemu-devel] [PATCH] linux-user: Fix type for SIOCATMARK ioctl
The SIOCATMARK ioctl takes an argument which should be a pointer to an integer where the kernel will write the result. We were incorrectly declaring it as TYPE_NULL which would mean it would always fail (with EFAULT) when it should succeed. Correct the type. Signed-off-by: Peter Maydell--- This fixes a failure in the LTP sockioctl01 test. linux-user/ioctls.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h index 4b36baa..7e2c133 100644 --- a/linux-user/ioctls.h +++ b/linux-user/ioctls.h @@ -120,7 +120,7 @@ MK_PTR(MK_STRUCT(STRUCT_fiemap))) #endif - IOCTL(SIOCATMARK, 0, TYPE_NULL) + IOCTL(SIOCATMARK, IOC_R, MK_PTR(TYPE_INT)) IOCTL(SIOCGIFNAME, IOC_RW, MK_PTR(TYPE_INT)) IOCTL(SIOCGIFFLAGS, IOC_W | IOC_R, MK_PTR(MK_STRUCT(STRUCT_short_ifreq))) IOCTL(SIOCSIFFLAGS, IOC_W, MK_PTR(MK_STRUCT(STRUCT_short_ifreq))) -- 1.9.1
Re: [Qemu-devel] [PATCH v3 1/1] linux-aio: prevent submitting more than MAX_EVENTS
On Wed, Jul 13, 2016 at 03:03:24PM +0200, Roman Pen wrote: > Invoking io_setup(MAX_EVENTS) we ask kernel to create ring buffer for us > with specified number of events. But kernel ring buffer allocation logic > is a bit tricky (ring buffer is page size aligned + some percpu allocation > are required) so eventually more than requested events number is allocated. > > From a userspace side we have to follow the convention and should not try > to io_submit() more or logic, which consumes completed events, should be > changed accordingly. The pitfall is in the following sequence: > > MAX_EVENTS = 128 > io_setup(MAX_EVENTS) > > io_submit(MAX_EVENTS) > io_submit(MAX_EVENTS) > > /* now 256 events are in-flight */ > > io_getevents(MAX_EVENTS) = 128 > > /* we can handle only 128 events at once, to be sure > * that nothing is pended the io_getevents(MAX_EVENTS) > * call must be invoked once more or hang will happen. */ > > To prevent the hang or reiteration of io_getevents() call this patch > restricts the number of in-flights, which is now limited to MAX_EVENTS. > > Signed-off-by: Roman Pen> Reviewed-by: Fam Zheng > Reviewed-by: Paolo Bonzini > Cc: Stefan Hajnoczi > Cc: qemu-devel@nongnu.org > --- > v3: > o comment tweaks. > > v2: > o comment tweaks. > o fix QEMU coding style. > > block/linux-aio.c | 26 -- > 1 file changed, 16 insertions(+), 10 deletions(-) Thanks, applied to my block tree: https://github.com/stefanha/qemu/commits/block Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH v5 04/10] block: Support meta dirty bitmap
On 14.07.2016 22:00, John Snow wrote: > On 06/22/2016 11:53 AM, Max Reitz wrote: >> On 03.06.2016 06:32, Fam Zheng wrote: >>> The added group of operations enables tracking of the changed bits in >>> the dirty bitmap. >>> >>> Signed-off-by: Fam Zheng>>> --- >>> block/dirty-bitmap.c | 52 >>> >>> include/block/dirty-bitmap.h | 9 >>> 2 files changed, 61 insertions(+) >>> >>> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c >>> index 628b77c..9c53c56 100644 >>> --- a/block/dirty-bitmap.c >>> +++ b/block/dirty-bitmap.c >>> @@ -38,6 +38,7 @@ >>> */ >>> struct BdrvDirtyBitmap { >>> HBitmap *bitmap;/* Dirty sector bitmap implementation */ >>> +HBitmap *meta; /* Meta dirty bitmap */ >>> BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status >>> */ >>> char *name; /* Optional non-empty unique ID */ >>> int64_t size; /* Size of the bitmap (Number of sectors) >>> */ >>> @@ -103,6 +104,56 @@ BdrvDirtyBitmap >>> *bdrv_create_dirty_bitmap(BlockDriverState *bs, >>> return bitmap; >>> } >>> >>> +/* bdrv_create_meta_dirty_bitmap >>> + * >>> + * Create a meta dirty bitmap that tracks the changes of bits in @bitmap. >>> I.e. >>> + * when a dirty status bit in @bitmap is changed (either from reset to set >>> or >>> + * the other way around), its respective meta dirty bitmap bit will be >>> marked >>> + * dirty as well. >> >> Not wrong, but I'd like a note here that this is not an >> when-and-only-when relationship, i.e. that bits in the meta bitmap may >> be set even without the tracked bits in the dirty bitmap having changed. >> > > How? > > You mean, if the caller manually starts setting things in the meta > bitmap object? > > That's their fault then, isn't it? No, I mean something that I mentioned in a reply to some previous version (the patch adding the test): http://lists.nongnu.org/archive/html/qemu-block/2016-03/msg00332.html Fam's reply is here: http://lists.nongnu.org/archive/html/qemu-block/2016-06/msg00097.html (Interesting how that reply took nearly three months and yours took nearly one, there most be something about this series that makes replying to replies very cumbersome :-)) What I meant by “then it would update meta” is that it would update *all of the range* even though only a single bit has actually been changed. So the answer to your “how” is: See patch 2, the changes to hbitmap_set() (and hbitmap_reset()). If any of the bits in the given range is changed, all of the range is marked as having changed in the meta bitmap. So all we guarantee is that every time a bit is changed, the corresponding bit in the meta bitmap will be set. But we do not guarantee that a bit in the meta bitmap stays cleared as long as the corresponding range of the underlying bitmap stays the same. Max > >> Maybe this should be mentioned somewhere in patch 2, too. Or maybe only >> in patch 2. >> >>> + * >>> + * @bitmap: the block dirty bitmap for which to create a meta dirty bitmap. >>> + * @chunk_size: how many bytes of bitmap data does each bit in the meta >>> bitmap >>> + * track. >>> + */ >>> +void bdrv_create_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap, >>> + int chunk_size) >>> +{ >>> +assert(!bitmap->meta); >>> +bitmap->meta = hbitmap_create_meta(bitmap->bitmap, >>> + chunk_size * BITS_PER_BYTE); >>> +} >>> + >>> +void bdrv_release_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap) >>> +{ >>> +assert(bitmap->meta); >>> +hbitmap_free_meta(bitmap->bitmap); >>> +bitmap->meta = NULL; >>> +} >>> + >>> +int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs, >>> + BdrvDirtyBitmap *bitmap, int64_t sector, >>> + int nb_sectors) >>> +{ >>> +uint64_t i; >>> +int gran = bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS; >>> + >>> +/* To optimize: we can make hbitmap to internally check the range in a >>> + * coarse level, or at least do it word by word. */ >> >> We could also multiply gran by the granularity of the meta bitmap. Each >> bit of the meta bitmap tracks at least eight bits of the dirty bitmap, >> so we're calling hbitmap_get() at least eight times as often as >> necessary here. >> >> Or we just use int gran = hbitmap_granularity(bitmap->meta);. >> >> Not exactly wrong, though, so: >> >> Reviewed-by: Max Reitz >> >>> +for (i = sector; i < sector + nb_sectors; i += gran) { >>> +if (hbitmap_get(bitmap->meta, i)) { >>> +return true; >>> +} >>> +} >>> +return false; >>> +} >> > signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v6 5/6] tests: removed skipped flushes from block test traces
On 14.07.2016 19:06, Eric Blake wrote: On 07/14/2016 06:29 AM, Denis V. Lunev wrote: From: Evgeny Yakovlevbdrv_co_flush is now skipping flushes in case underlying media has no actual changes. This affected some blkdebug testcases that were expecting error logs from failure-injected flushes which are now skipped entirely. This change removes expected flush error logs from block tests 026 071 089 Signed-off-by: Evgeny Yakovlev Signed-off-by: Denis V. Lunev CC: Kevin Wolf CC: Max Reitz CC: Stefan Hajnoczi CC: Fam Zheng CC: John Snow --- tests/qemu-iotests/026.out.nocache | 50 -- tests/qemu-iotests/071.out | 8 -- tests/qemu-iotests/089.out | 2 -- 3 files changed, 60 deletions(-) If the previous patch broke the testsuite, then this should be squashed in with that patch so that bisection doesn't land on a broken testsuite. Seeing the testsuite change alongside code change is just fine; it proves what impact the code change has. Will do.
[Qemu-devel] [PATCH] tap: fix memory leak on failure to create a multiqueue tap device
Reported by Coverity. Signed-off-by: Paolo Bonzini--- net/tap.c | 22 -- 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/net/tap.c b/net/tap.c index e9c32f3..6a2cedc 100644 --- a/net/tap.c +++ b/net/tap.c @@ -787,8 +787,8 @@ int net_init_tap(const NetClientOptions *opts, const char *name, return -1; } } else if (tap->has_fds) { -char **fds = g_new(char *, MAX_TAP_QUEUES); -char **vhost_fds = g_new(char *, MAX_TAP_QUEUES); +char **fds = g_new0(char *, MAX_TAP_QUEUES); +char **vhost_fds = g_new0(char *, MAX_TAP_QUEUES); int nfds, nvhosts; if (tap->has_ifname || tap->has_script || tap->has_downscript || @@ -806,7 +806,7 @@ int net_init_tap(const NetClientOptions *opts, const char *name, if (nfds != nvhosts) { error_setg(errp, "The number of fds passed does not match " "the number of vhostfds passed"); -return -1; +goto free_fail; } } @@ -814,7 +814,7 @@ int net_init_tap(const NetClientOptions *opts, const char *name, fd = monitor_fd_param(cur_mon, fds[i], ); if (fd == -1) { error_propagate(errp, err); -return -1; +goto free_fail; } fcntl(fd, F_SETFL, O_NONBLOCK); @@ -824,7 +824,7 @@ int net_init_tap(const NetClientOptions *opts, const char *name, } else if (vnet_hdr != tap_probe_vnet_hdr(fd)) { error_setg(errp, "vnet_hdr not consistent across given tap fds"); -return -1; +goto free_fail; } net_init_tap_one(tap, peer, "tap", name, ifname, @@ -833,11 +833,21 @@ int net_init_tap(const NetClientOptions *opts, const char *name, vnet_hdr, fd, ); if (err) { error_propagate(errp, err); -return -1; +goto free_fail; } } g_free(fds); g_free(vhost_fds); +return 0; + +free_fail: +for (i = 0; i < nfds; i++) { +g_free(fds[i]); +g_free(vhost_fds[i]); +} +g_free(fds); +g_free(vhost_fds); +return -1; } else if (tap->has_helper) { if (tap->has_ifname || tap->has_script || tap->has_downscript || tap->has_vnet_hdr || tap->has_queues || tap->has_vhostfds) { -- 2.7.4
[Qemu-devel] [Bug 955379] Re: cmake hangs with qemu-arm-static
Please provide exact reproduction instructions -- I need enough information that I can completely replicate your setup and what you're doing: exactly how you've set up any chroot or whatever other guest setup you have, what cmake command you're running, and so on. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/955379 Title: cmake hangs with qemu-arm-static Status in QEMU: Fix Committed Status in Linaro QEMU: Confirmed Status in qemu-linaro package in Ubuntu: Confirmed Bug description: I'm using git commit 3e7ecd976b06f... configured with --target-list =arm-linux-user --static in a chroot environment to compile some things. I ran into this problem with both pcl and opencv-2.3.1. cmake consistently freezes at some point during its execution, though in a different spot each time, usually during a step when it's searching for some libraries. For instance, pcl most commonly stops after: [snip] -- Boost version: 1.46.1 -- Found the following Boost libraries: -- system -- filesystem -- thread -- date_time -- checking for module 'eigen3' -- found eigen3, version 3.0.1 which is perplexing because it freezes after finding what it wants, not during the search. When it does get past that point, it does so almost immediately but freezes somewhere else. I'm using 64-bit Ubuntu 11.10 with kernel release 3.0.0-16-generic with an Intel i5. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/955379/+subscriptions
Re: [Qemu-devel] [PATCH v4 05/16] pc: enforce adding CPUs contiguously and removing them in opposit order
On Thu, 14 Jul 2016 14:10:24 -0400 Bandan Daswrote: > Igor Mammedov writes: > > > it will still allow us to use cpu_index as migration instance_id > > since when CPUs are added contiguously (from the first to the last) > > and removed in opposite order, cpu_index stays stable and it's > > reproducable on destination side. > > > > Signed-off-by: Igor Mammedov > > --- > > While there is work in progress to support migration when there are holes > > in cpu_index range resulting from out-of-order plug or unplug, this patch > > is intended as a last resort if no easy, risk-free and elegant solution > > emerges before 2.7 dev cycle ends. > > I think this (or a modified version) is appropriate comment > material to accompany the changes. Ok if you are sure this code > is short-lived, but if it stays longer, a comment is definitely > helpful. Maybe a bit of reasoning added to the error message is > fine too. dwg is looking at cpu_index refactoring but that's not 2.7 material, this patch is doing what the similar spapr patch did (which David applied to his ppc queue). Perhaps moving comment under --- to commit message itself would be better as to leave trace of future refactoring plans. > > --- > > hw/i386/pc.c | 34 ++ > > 1 file changed, 34 insertions(+) > > > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > > index 33c5f97..75a92d0 100644 > > --- a/hw/i386/pc.c > > +++ b/hw/i386/pc.c > > @@ -1762,6 +1762,23 @@ static void pc_cpu_unplug_request_cb(HotplugHandler > > *hotplug_dev, > > goto out; > > } > > > > +if (idx < pcms->possible_cpus->len - 1 && > > +pcms->possible_cpus->cpus[idx + 1].cpu != NULL) { > > +X86CPU *cpu; > > + > > +for (idx = pcms->possible_cpus->len - 1; > > + pcms->possible_cpus->cpus[idx].cpu == NULL; idx--) { > > +;; > > +} > > + > > +cpu = X86_CPU(pcms->possible_cpus->cpus[idx].cpu); > > +error_setg(_err, "CPU [socket-id: %u, core-id: %u," > > + " thread-id: %u] should be removed first", > > + cpu->socket_id, cpu->core_id, cpu->thread_id); > > +goto out; > > + > > +} > > + > > hhc = HOTPLUG_HANDLER_GET_CLASS(pcms->acpi_dev); > > hhc->unplug_request(HOTPLUG_HANDLER(pcms->acpi_dev), dev, _err); > > > > @@ -1860,6 +1877,23 @@ static void pc_cpu_pre_plug(HotplugHandler > > *hotplug_dev, > > return; > > } > > > > +if (idx != 0 && pcms->possible_cpus->cpus[idx - 1].cpu == NULL) { > > +PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); > > + > > +for (idx = 1; pcms->possible_cpus->cpus[idx].cpu != NULL; idx++) { > > +;; > > +} > > + > > +x86_topo_ids_from_apicid(pcms->possible_cpus->cpus[idx].arch_id, > > + smp_cores, smp_threads, ); > > + > > +if (!pcmc->legacy_cpu_hotplug) { > > +error_setg(errp, "CPU [socket: %u, core: %u, thread: %u] > > should be" > > + " added first", topo.pkg_id, topo.core_id, > > topo.smt_id); > > +return; > > +} > > +} > > + > > /* if 'address' properties socket-id/core-id/thread-id are not set, > > set them > > * so that query_hotpluggable_cpus would show correct values > > */
Re: [Qemu-devel] [PATCH v2 12/13] virtio-gpu: Wrap in vmstate
Hi, > Hmm yes; I think the right fix here is to convert that into > migrate_add_blocker / migrate_del_blocker Oh, didn't notice we have that, yes that sounds good. cheers, Gerd
Re: [Qemu-devel] Regression with commit 095497ffc66b7f031
On Fr, 2016-07-15 at 12:02 +0200, Paolo Bonzini wrote: > > On 15/07/2016 10:47, Juergen Gross wrote: > > Nothing scaring and no real difference between working and not working > > variant. > > > > Meanwhile I've been digging a little bit deeper and found the reason: > > libxenstore is setting up a reader thread which is waiting for the > > watch to fire. With above commit the stack size of that thread (16kB) > > is too small. Setting it to 32kB made qemu work again. > > This makes very little sense (not your fault)... The commit doesn't > change stack usage at all, TLS should not be on the stack. > > Can you capture a backtrace where the 16K stack is exceeded? Perhaps > it's only due to inlining decision on the compiler, in which case > Peter's patch from today is only a bandaid. Hmm, I guess I hold off the vnc pull request for now ... cheers, Gerd
Re: [Qemu-devel] [PATCH V2 1/1] linux-aio: prevent submitting more than MAX_EVENTS
On Fri, Jul 15, 2016 at 11:58 AM, Paolo Bonziniwrote: > > > On 15/07/2016 11:18, Roman Penyaev wrote: >> Those 3 red spikes and a blue hill is what we have to focus on. The >> blue hill at the right corner of the chart means that almost always the >> ring buffer was observed as full, i.e. qemu_laio_completion_bh() got >> a chance to reap completions not very often, meanwhile completed >> requests stand in the ring buffer for quite a long time which degrades >> the overall performance. >> >> The results covered by the red line are much better and that can be >> explained by those 3 red spikes, which are almost in the middle of the >> whole distribution, i.e. qemu_laio_completion_bh() is called more often, >> completed requests do not stall, giving fio a chance to submit new fresh >> requests. >> >> The theoretical fix would be to schedule completion BH just after >> successful io_submit, i.e.: > > What about removing the qemu_bh_cancel but keeping the rest of the patch? That exactly what I did. Numbers go to expected from ~1600MB/s to ~1800MB/s. So basically this hunk of the debatable patch: if (event_notifier_test_and_clear(>e)) { -qemu_bh_schedule(s->completion_bh); +qemu_laio_completion_bh(s); } does not have any impact and can be ignored. At least I did not notice anything important. > > I'm also interested in a graph with this patch ("linux-aio: prevent > submitting more than MAX_EVENTS") on top of origin/master. I can plot it also of course. > > Thanks for the analysis. Sometimes a picture _is_ worth a thousand > words, even if it's measuring "only" second-order effects (# of > completions is not what causes the slowdown, but # of completions > affects latency which causes the slowdown). Yes, you are right, latency. With userspace io_getevents ~0 costs we can peek requests as often as we like to decrease latency on very fast devices. That can also bring something. Probably after each io_submit() it makes sense to peek and complete something. -- Roman
[Qemu-devel] [RFC PATCH V6 6/6] colo-compare: add TCP, UDP, ICMP packet comparison
We add TCP,UDP,ICMP packet comparison to replace IP packet comparison. This can increase the accuracy of the package comparison. less checkpoint more efficiency. Signed-off-by: Zhang ChenSigned-off-by: Li Zhijian Signed-off-by: Wen Congyang --- net/colo-compare.c | 174 +++-- trace-events | 4 ++ 2 files changed, 174 insertions(+), 4 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index 942e326..9737ec6 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -18,6 +18,7 @@ #include "qapi/qmp/qerror.h" #include "qapi/error.h" #include "net/net.h" +#include "net/eth.h" #include "net/vhost_net.h" #include "qom/object_interfaces.h" #include "qemu/iov.h" @@ -197,9 +198,158 @@ static int colo_packet_compare(Packet *ppkt, Packet *spkt) } } -static int colo_packet_compare_all(Packet *spkt, Packet *ppkt) +/* + * called from the compare thread on the primary + * for compare tcp packet + * compare_tcp copied from Dr. David Alan Gilbert's branch + */ +static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) +{ +struct tcphdr *ptcp, *stcp; +int res; +char *sdebug, *ddebug; + +trace_colo_compare_main("compare tcp"); +if (ppkt->size != spkt->size) { +if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) { +trace_colo_compare_main("pkt size not same"); +} +return -1; +} + +ptcp = (struct tcphdr *)ppkt->transport_layer; +stcp = (struct tcphdr *)spkt->transport_layer; + +if (ptcp->th_seq != stcp->th_seq) { +if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) { +trace_colo_compare_main("pkt tcp seq not same"); +} +return -1; +} + +/* + * The 'identification' field in the IP header is *very* random + * it almost never matches. Fudge this by ignoring differences in + * unfragmented packets; they'll normally sort themselves out if different + * anyway, and it should recover at the TCP level. + * An alternative would be to get both the primary and secondary to rewrite + * somehow; but that would need some sync traffic to sync the state + */ +if (ntohs(ppkt->ip->ip_off) & IP_DF) { +spkt->ip->ip_id = ppkt->ip->ip_id; +/* and the sum will be different if the IDs were different */ +spkt->ip->ip_sum = ppkt->ip->ip_sum; +} + +res = memcmp(ppkt->data + ETH_HLEN, spkt->data + ETH_HLEN, +(spkt->size - ETH_HLEN)); + +if (res != 0 && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) { +sdebug = strdup(inet_ntoa(ppkt->ip->ip_src)); +ddebug = strdup(inet_ntoa(ppkt->ip->ip_dst)); +fprintf(stderr, "%s: src/dst: %s/%s p: seq/ack=%u/%u" +" s: seq/ack=%u/%u res=%d flags=%x/%x\n", __func__, + sdebug, ddebug, + ntohl(ptcp->th_seq), ntohl(ptcp->th_ack), + ntohl(stcp->th_seq), ntohl(stcp->th_ack), + res, ptcp->th_flags, stcp->th_flags); + +trace_colo_compare_tcp_miscompare("Primary len", ppkt->size); +qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", ppkt->size); +trace_colo_compare_tcp_miscompare("Secondary len", spkt->size); +qemu_hexdump((char *)spkt->data, stderr, "colo-compare", spkt->size); + +g_free(sdebug); +g_free(ddebug); +} + +return res; +} + +/* + * called from the compare thread on the primary + * for compare udp packet + */ +static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt) +{ +int ret; + +trace_colo_compare_main("compare udp"); +ret = colo_packet_compare(ppkt, spkt); + +if (ret) { +trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size); +qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", ppkt->size); +trace_colo_compare_udp_miscompare("Secondary pkt size", spkt->size); +qemu_hexdump((char *)spkt->data, stderr, "colo-compare", spkt->size); +} + +return ret; +} + +/* + * called from the compare thread on the primary + * for compare icmp packet + */ +static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt) { -trace_colo_compare_main("compare all"); +int network_length; +struct icmp *icmp_ppkt, *icmp_spkt; + +trace_colo_compare_main("compare icmp"); +network_length = ppkt->ip->ip_hl * 4; +if (ppkt->size != spkt->size || +ppkt->size < network_length + ETH_HLEN) { +return -1; +} +icmp_ppkt = (struct icmp *)(ppkt->data + network_length + ETH_HLEN); +icmp_spkt = (struct icmp *)(spkt->data + network_length + ETH_HLEN); + +if ((icmp_ppkt->icmp_type == icmp_spkt->icmp_type) && +(icmp_ppkt->icmp_code == icmp_spkt->icmp_code)) { +if (icmp_ppkt->icmp_type == ICMP_REDIRECT) { +if (icmp_ppkt->icmp_gwaddr.s_addr != +
Re: [Qemu-devel] [PATCH] ppc: Yet another fix for the huge page support detection mechanism
On 15.07.2016 10:35, David Gibson wrote: > On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote: >> Commit 86b50f2e1bef ("Disable huge page support if it is not available >> for main RAM") already made sure that huge page support is not announced >> to the guest if the normal RAM of non-NUMA configurations is not backed >> by a huge page filesystem. However, there is one more case that can go >> wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured >> with huge page support (and only the memory of a DIMM is configured with >> it). When QEMU is started with the following command line for example, >> the Linux guest currently crashes because it is trying to use huge pages >> on a memory region that does not support huge pages: >> >> qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \ >> >> memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \ >>-device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \ >>-numa node,nodeid=0 -numa node,nodeid=1 >> >> To fix this issue, we've got to make sure to disable huge page support, >> too, when there is a NUMA node that is not using a memory backend with >> huge page support. >> >> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740 >> Signed-off-by: Thomas Huth>> --- >> target-ppc/kvm.c | 10 +++--- >> 1 file changed, 7 insertions(+), 3 deletions(-) >> >> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c >> index 884d564..7a8f555 100644 >> --- a/target-ppc/kvm.c >> +++ b/target-ppc/kvm.c >> @@ -389,12 +389,16 @@ static long getrampagesize(void) >> >> object_child_foreach(memdev_root, find_max_supported_pagesize, ); >> >> -if (hpsize == LONG_MAX) { >> +if (hpsize == LONG_MAX || hpsize == getpagesize()) { >> return getpagesize(); >> } >> >> -if (nb_numa_nodes == 0 && hpsize > getpagesize()) { >> -/* No NUMA nodes and normal RAM without -mem-path ==> no huge >> pages! */ >> +/* If NUMA is disabled or the NUMA nodes are not backed with a >> + * memory-backend, then there is at least one node using "normal" >> + * RAM. And since normal RAM has not been configured with "-mem-path" >> + * (what we've checked earlier here already), we can not use huge pages! >> + */ >> +if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) { > > Is that second clause sufficient, or do you need to loop through and > check the memdev of every node? Checking the first entry should be sufficient. QEMU forces you to specify either a memory backend for all NUMA nodes (which we should have looked at during the object_child_foreach() some lines earlier), or you must not specify a memory backend for any NUMA node at all. You can not mix the settings, so checking numa_info[0] is enough. Thomas signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 3/7] hw/mips: fix PCI bus initialization
On Thu, Jul 14, 2016 at 04:43:42PM +0300, Marcel Apfelbaum wrote: > Delay the host-bridge 'realization' until the > PCI root bus is attached. > > Signed-off-by: Marcel Apfelbaum> --- > hw/mips/gt64xxx_pci.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Acked-by: Leon Alrae Tested-by: Leon Alrae Thanks, Leon > > diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c > index 3f4523d..4811843 100644 > --- a/hw/mips/gt64xxx_pci.c > +++ b/hw/mips/gt64xxx_pci.c > @@ -1167,7 +1167,6 @@ PCIBus *gt64120_register(qemu_irq *pic) > DeviceState *dev; > > dev = qdev_create(NULL, TYPE_GT64120_PCI_HOST_BRIDGE); > -qdev_init_nofail(dev); > d = GT64120_PCI_HOST_BRIDGE(dev); > phb = PCI_HOST_BRIDGE(dev); > memory_region_init(>pci0_mem, OBJECT(dev), "pci0-mem", UINT32_MAX); > @@ -1178,6 +1177,7 @@ PCIBus *gt64120_register(qemu_irq *pic) > >pci0_mem, > get_system_io(), > PCI_DEVFN(18, 0), 4, TYPE_PCI_BUS); > +qdev_init_nofail(dev); > memory_region_init_io(>ISD_mem, OBJECT(dev), _mem_ops, d, > "isd-mem", 0x1000); > > pci_create_simple(phb->bus, PCI_DEVFN(0, 0), "gt64120_pci"); > -- > 2.4.3 >
Re: [Qemu-devel] [PATCH V2 1/1] linux-aio: prevent submitting more than MAX_EVENTS
On 15/07/2016 13:35, Roman Penyaev wrote: > On Fri, Jul 15, 2016 at 12:17 PM, Roman Penyaev >wrote: >> On Fri, Jul 15, 2016 at 11:58 AM, Paolo Bonzini wrote: >>> >>> >>> On 15/07/2016 11:18, Roman Penyaev wrote: Those 3 red spikes and a blue hill is what we have to focus on. The blue hill at the right corner of the chart means that almost always the ring buffer was observed as full, i.e. qemu_laio_completion_bh() got a chance to reap completions not very often, meanwhile completed requests stand in the ring buffer for quite a long time which degrades the overall performance. The results covered by the red line are much better and that can be explained by those 3 red spikes, which are almost in the middle of the whole distribution, i.e. qemu_laio_completion_bh() is called more often, completed requests do not stall, giving fio a chance to submit new fresh requests. The theoretical fix would be to schedule completion BH just after successful io_submit, i.e.: >>> >>> What about removing the qemu_bh_cancel but keeping the rest of the patch? >> >> That exactly what I did. Numbers go to expected from ~1600MB/s to ~1800MB/s. >> So basically this hunk of the debatable patch: >> >> if (event_notifier_test_and_clear(>e)) { >> -qemu_bh_schedule(s->completion_bh); >> +qemu_laio_completion_bh(s); >> } >> >> does not have any impact and can be ignored. At least I did not notice >> anything important. Thanks, this means that we should either add back the other line, or wrap qemu_laio_completion_bh in a loop. The rationale is that an io_getevents that doesn't find any event is extremely cheap. >>> I'm also interested in a graph with this patch ("linux-aio: prevent >>> submitting more than MAX_EVENTS") on top of origin/master. >> >> I can plot it also of course. > > So, finally I have it. > > Same link: > https://docs.google.com/spreadsheets/d/12CIt6EKiJLqNx0OHNqiabR-oFBrqkH0LN3mjzZ5jGeo/edit?usp=sharing > > last sheet: > "1789MB/s" > > Not that much interesting: almost all the time we complete maximum: > MAX_LIMIT requests at once. But of course that expected on such > device. Probably other good metrics should be taken into account. And this means that we probably should raise MAX_LIMIT. Paolo
[Qemu-devel] [kvm-unit-tests PATCH v3 08/10] arm/arm64: gicv2: add an IPI test
Signed-off-by: Andrew Jones--- v2: add more details in the output if a test fails, report spurious interrupts if we get them --- arm/Makefile.common | 6 +- arm/gic.c | 194 arm/unittests.cfg | 7 ++ 3 files changed, 204 insertions(+), 3 deletions(-) create mode 100644 arm/gic.c diff --git a/arm/Makefile.common b/arm/Makefile.common index 41239c37e0920..bc38183ab86e0 100644 --- a/arm/Makefile.common +++ b/arm/Makefile.common @@ -9,9 +9,9 @@ ifeq ($(LOADADDR),) LOADADDR = 0x4000 endif -tests-common = \ - $(TEST_DIR)/selftest.flat \ - $(TEST_DIR)/spinlock-test.flat +tests-common = $(TEST_DIR)/selftest.flat +tests-common += $(TEST_DIR)/spinlock-test.flat +tests-common += $(TEST_DIR)/gic.flat all: test_cases diff --git a/arm/gic.c b/arm/gic.c new file mode 100644 index 0..cf7ec1c90413c --- /dev/null +++ b/arm/gic.c @@ -0,0 +1,194 @@ +/* + * GIC tests + * + * GICv2 + * . test sending/receiving IPIs + * + * Copyright (C) 2016, Red Hat Inc, Andrew Jones + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#include +#include +#include +#include +#include +#include +#include + +static int gic_version; +static int acked[NR_CPUS], spurious[NR_CPUS]; +static cpumask_t ready; + +static void nr_cpu_check(int nr) +{ + if (nr_cpus < nr) + report_abort("At least %d cpus required", nr); +} + +static void wait_on_ready(void) +{ + cpumask_set_cpu(smp_processor_id(), ); + while (!cpumask_full()) + cpu_relax(); +} + +static void check_acked(cpumask_t *mask) +{ + int missing = 0, extra = 0, unexpected = 0; + int nr_pass, cpu, i; + + /* Wait up to 5s for all interrupts to be delivered */ + for (i = 0; i < 50; ++i) { + mdelay(100); + nr_pass = 0; + for_each_present_cpu(cpu) { + smp_rmb(); + nr_pass += cpumask_test_cpu(cpu, mask) ? + acked[cpu] == 1 : acked[cpu] == 0; + } + if (nr_pass == nr_cpus) { + report("Completed in %d ms", true, ++i * 100); + return; + } + } + + for_each_present_cpu(cpu) { + if (cpumask_test_cpu(cpu, mask)) { + if (!acked[cpu]) + ++missing; + else if (acked[cpu] > 1) + ++extra; + } else { + if (acked[cpu]) + ++unexpected; + } + } + + report("Timed-out (5s). ACKS: missing=%d extra=%d unexpected=%d", + false, missing, extra, unexpected); +} + +static void ipi_handler(struct pt_regs *regs __unused) +{ + u32 iar = readl(gicv2_cpu_base() + GIC_CPU_INTACK); + + if (iar != GICC_INT_SPURIOUS) { + writel(iar, gicv2_cpu_base() + GIC_CPU_EOI); + smp_rmb(); /* pairs with wmb in ipi_test functions */ + ++acked[smp_processor_id()]; + smp_wmb(); /* pairs with rmb in check_acked */ + } else { + ++spurious[smp_processor_id()]; + smp_wmb(); + } +} + +static void ipi_test_self(void) +{ + cpumask_t mask; + + report_prefix_push("self"); + memset(acked, 0, sizeof(acked)); + smp_wmb(); + cpumask_clear(); + cpumask_set_cpu(0, ); + writel(2 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT); + check_acked(); + report_prefix_pop(); +} + +static void ipi_test_smp(void) +{ + cpumask_t mask; + unsigned long tlist; + + report_prefix_push("target-list"); + memset(acked, 0, sizeof(acked)); + smp_wmb(); + tlist = cpumask_bits(_present_mask)[0] & 0xaa; + cpumask_bits()[0] = tlist; + writel((u8)tlist << 16, gicv2_dist_base() + GIC_DIST_SOFTINT); + check_acked(); + report_prefix_pop(); + + report_prefix_push("broadcast"); + memset(acked, 0, sizeof(acked)); + smp_wmb(); + cpumask_copy(, _present_mask); + cpumask_clear_cpu(0, ); + writel(1 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT); + check_acked(); + report_prefix_pop(); +} + +static void ipi_enable(void) +{ + gicv2_enable_defaults(); +#ifdef __arm__ + install_exception_handler(EXCPTN_IRQ, ipi_handler); +#else + install_irq_handler(EL1H_IRQ, ipi_handler); +#endif + local_irq_enable(); +} + +static void ipi_recv(void) +{ + ipi_enable(); + cpumask_set_cpu(smp_processor_id(), ); + while (1) + wfi(); +} + +int main(int argc, char **argv) +{ + char pfx[8]; + int cpu; + + gic_version = gic_init(); + if (!gic_version) + report_abort("No gic present!"); +
[Qemu-devel] [kvm-unit-tests PATCH v3 07/10] arm/arm64: add initial gicv3 support
Signed-off-by: Andrew Jones--- v2: configure irqs as NS GRP1 --- lib/arm/asm/arch_gicv3.h | 184 ++ lib/arm/asm/gic-v3.h | 321 + lib/arm/asm/gic.h | 1 + lib/arm/gic.c | 73 +++ lib/arm64/asm/arch_gicv3.h | 169 lib/arm64/asm/gic-v3.h | 1 + lib/arm64/asm/sysreg.h | 44 +++ 7 files changed, 793 insertions(+) create mode 100644 lib/arm/asm/arch_gicv3.h create mode 100644 lib/arm/asm/gic-v3.h create mode 100644 lib/arm64/asm/arch_gicv3.h create mode 100644 lib/arm64/asm/gic-v3.h create mode 100644 lib/arm64/asm/sysreg.h diff --git a/lib/arm/asm/arch_gicv3.h b/lib/arm/asm/arch_gicv3.h new file mode 100644 index 0..d529a7eb62807 --- /dev/null +++ b/lib/arm/asm/arch_gicv3.h @@ -0,0 +1,184 @@ +/* + * All ripped off from arch/arm/include/asm/arch_gicv3.h + * + * Copyright (C) 2016, Red Hat Inc, Andrew Jones + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#ifndef _ASMARM_ARCH_GICV3_H_ +#define _ASMARM_ARCH_GICV3_H_ + +#ifndef __ASSEMBLY__ + +#include +#include +#include + +#define __stringify xstr + + +#define __ACCESS_CP15(CRn, Op1, CRm, Op2) p15, Op1, %0, CRn, CRm, Op2 +#define __ACCESS_CP15_64(Op1, CRm) p15, Op1, %Q0, %R0, CRm + +#define ICC_EOIR1 __ACCESS_CP15(c12, 0, c12, 1) +#define ICC_DIR__ACCESS_CP15(c12, 0, c11, 1) +#define ICC_IAR1 __ACCESS_CP15(c12, 0, c12, 0) +#define ICC_SGI1R __ACCESS_CP15_64(0, c12) +#define ICC_PMR__ACCESS_CP15(c4, 0, c6, 0) +#define ICC_CTLR __ACCESS_CP15(c12, 0, c12, 4) +#define ICC_SRE__ACCESS_CP15(c12, 0, c12, 5) +#define ICC_IGRPEN1__ACCESS_CP15(c12, 0, c12, 7) + +#define ICC_HSRE __ACCESS_CP15(c12, 4, c9, 5) + +#define ICH_VSEIR __ACCESS_CP15(c12, 4, c9, 4) +#define ICH_HCR__ACCESS_CP15(c12, 4, c11, 0) +#define ICH_VTR__ACCESS_CP15(c12, 4, c11, 1) +#define ICH_MISR __ACCESS_CP15(c12, 4, c11, 2) +#define ICH_EISR __ACCESS_CP15(c12, 4, c11, 3) +#define ICH_ELSR __ACCESS_CP15(c12, 4, c11, 5) +#define ICH_VMCR __ACCESS_CP15(c12, 4, c11, 7) + +#define __LR0(x) __ACCESS_CP15(c12, 4, c12, x) +#define __LR8(x) __ACCESS_CP15(c12, 4, c13, x) + +#define ICH_LR0__LR0(0) +#define ICH_LR1__LR0(1) +#define ICH_LR2__LR0(2) +#define ICH_LR3__LR0(3) +#define ICH_LR4__LR0(4) +#define ICH_LR5__LR0(5) +#define ICH_LR6__LR0(6) +#define ICH_LR7__LR0(7) +#define ICH_LR8__LR8(0) +#define ICH_LR9__LR8(1) +#define ICH_LR10 __LR8(2) +#define ICH_LR11 __LR8(3) +#define ICH_LR12 __LR8(4) +#define ICH_LR13 __LR8(5) +#define ICH_LR14 __LR8(6) +#define ICH_LR15 __LR8(7) + +/* LR top half */ +#define __LRC0(x) __ACCESS_CP15(c12, 4, c14, x) +#define __LRC8(x) __ACCESS_CP15(c12, 4, c15, x) + +#define ICH_LRC0 __LRC0(0) +#define ICH_LRC1 __LRC0(1) +#define ICH_LRC2 __LRC0(2) +#define ICH_LRC3 __LRC0(3) +#define ICH_LRC4 __LRC0(4) +#define ICH_LRC5 __LRC0(5) +#define ICH_LRC6 __LRC0(6) +#define ICH_LRC7 __LRC0(7) +#define ICH_LRC8 __LRC8(0) +#define ICH_LRC9 __LRC8(1) +#define ICH_LRC10 __LRC8(2) +#define ICH_LRC11 __LRC8(3) +#define ICH_LRC12 __LRC8(4) +#define ICH_LRC13 __LRC8(5) +#define ICH_LRC14 __LRC8(6) +#define ICH_LRC15 __LRC8(7) + +#define __AP0Rx(x) __ACCESS_CP15(c12, 4, c8, x) +#define ICH_AP0R0 __AP0Rx(0) +#define ICH_AP0R1 __AP0Rx(1) +#define ICH_AP0R2 __AP0Rx(2) +#define ICH_AP0R3 __AP0Rx(3) + +#define __AP1Rx(x) __ACCESS_CP15(c12, 4, c9, x) +#define ICH_AP1R0 __AP1Rx(0) +#define ICH_AP1R1 __AP1Rx(1) +#define
Re: [Qemu-devel] [PULL 1/1] Add optionrom compatible with fw_cfg DMA version
On Thu, Jul 14, 2016 at 2:52 PM, Paolo Bonziniwrote: > From: Marc Marí > > This optionrom is based on linuxboot.S. > > Signed-off-by: Marc Marí > Signed-off-by: Richard W.M. Jones > Message-Id: <1464027093-24073-2-git-send-email-rjo...@redhat.com> > [Add -fno-toplevel-reorder, support clang without -m16. - Paolo] > Signed-off-by: Paolo Bonzini > --- > .gitignore| 4 + > Makefile | 2 +- > hw/i386/pc.c | 10 +- > hw/nvram/fw_cfg.c | 2 +- > include/hw/i386/pc.h | 4 + > include/hw/nvram/fw_cfg.h | 1 + > pc-bios/linuxboot_dma.bin | Bin 0 -> 1024 bytes > pc-bios/optionrom/Makefile| 42 -- > pc-bios/optionrom/code16gcc.h | 3 + > pc-bios/optionrom/linuxboot_dma.c | 294 > ++ > 10 files changed, 349 insertions(+), 13 deletions(-) > create mode 100644 pc-bios/linuxboot_dma.bin > create mode 100644 pc-bios/optionrom/code16gcc.h > create mode 100644 pc-bios/optionrom/linuxboot_dma.c CCoptionrom/linuxboot_dma.o clang-3.8: error: unsupported argument '-32' to option 'Wa,' $ rpm -qi clang Name: clang Version : 3.8.0 Release : 2.fc24 Architecture: x86_64 Stefan
[Qemu-devel] [PATCH Qemu] Change spice-server protocol for GL texture passing
--- ui/spice-core.c| 5 - ui/spice-display.c | 29 - 2 files changed, 8 insertions(+), 26 deletions(-) diff --git a/ui/spice-core.c b/ui/spice-core.c index da05054..f7647f7 100644 --- a/ui/spice-core.c +++ b/ui/spice-core.c @@ -828,11 +828,6 @@ void qemu_spice_init(void) #ifdef HAVE_SPICE_GL if (qemu_opt_get_bool(opts, "gl", 0)) { -if ((port != 0) || (tls_port != 0)) { -error_report("SPICE GL support is local-only for now and " - "incompatible with -spice port/tls-port"); -exit(1); -} if (egl_rendernode_init() != 0) { error_report("Failed to initialize EGL render node for SPICE GL"); exit(1); diff --git a/ui/spice-display.c b/ui/spice-display.c index 2a77a54..72137bd 100644 --- a/ui/spice-display.c +++ b/ui/spice-display.c @@ -852,6 +852,10 @@ static void qemu_spice_gl_block_timer(void *opaque) static QEMUGLContext qemu_spice_gl_create_context(DisplayChangeListener *dcl, QEMUGLParams *params) { +SimpleSpiceDisplay *ssd = container_of(dcl, SimpleSpiceDisplay, dcl); + +spice_qxl_gl_init(>qxl, qemu_egl_display, qemu_egl_rn_ctx); + eglMakeCurrent(qemu_egl_display, EGL_NO_SURFACE, EGL_NO_SURFACE, qemu_egl_rn_ctx); return qemu_egl_create_context(dcl, params); @@ -864,28 +868,11 @@ static void qemu_spice_gl_scanout(DisplayChangeListener *dcl, uint32_t w, uint32_t h) { SimpleSpiceDisplay *ssd = container_of(dcl, SimpleSpiceDisplay, dcl); -EGLint stride = 0, fourcc = 0; -int fd = -1; - -if (tex_id) { -fd = egl_get_fd_for_texture(tex_id, , ); -if (fd < 0) { -fprintf(stderr, "%s: failed to get fd for texture\n", __func__); -return; -} -dprint(1, "%s: %dx%d (stride %d, fourcc 0x%x)\n", __func__, - w, h, stride, fourcc); -} else { -dprint(1, "%s: no texture (no framebuffer)\n", __func__); -} - -assert(!tex_id || fd >= 0); -/* note: spice server will close the fd */ -spice_qxl_gl_scanout(>qxl, fd, - surface_width(ssd->ds), - surface_height(ssd->ds), - stride, fourcc, y_0_top); +spice_qxl_gl_scanout_texture(>qxl, tex_id, + surface_width(ssd->ds), + surface_height(ssd->ds), + y_0_top); qemu_spice_gl_monitor_config(ssd, x, y, w, h); } -- 2.7.4
Re: [Qemu-devel] Regression with commit 095497ffc66b7f031
On 15/07/2016 12:41, Juergen Gross wrote: > On 15/07/16 12:35, Paolo Bonzini wrote: >> >> >> On 15/07/2016 12:12, Gerd Hoffmann wrote: >>> On Fr, 2016-07-15 at 12:02 +0200, Paolo Bonzini wrote: On 15/07/2016 10:47, Juergen Gross wrote: > Nothing scaring and no real difference between working and not working > variant. > > Meanwhile I've been digging a little bit deeper and found the reason: > libxenstore is setting up a reader thread which is waiting for the > watch to fire. With above commit the stack size of that thread (16kB) > is too small. Setting it to 32kB made qemu work again. This makes very little sense (not your fault)... The commit doesn't change stack usage at all, TLS should not be on the stack. Can you capture a backtrace where the 16K stack is exceeded? Perhaps it's only due to inlining decision on the compiler, in which case Peter's patch from today is only a bandaid. >>> >>> Hmm, I guess I hold off the vnc pull request for now ... >> >> Go ahead. I looked at glibc source code and the patch is okay. > > Paolo, do you know of any interface to obtain the size of the TLS area > taken from the stack (before calling pthread_create() )? https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01643.html has a patch that _removes_ code to do this from the golang runtime. The comments there say that only with glibc before version 2.16 the static TLS size is taken out of the stack size... What version of glibc are you using? Paolo
[Qemu-devel] [kvm-unit-tests PATCH v3 06/10] arm/arm64: add initial gicv2 support
Add some gicv2 support. This just adds init and enable functions, allowing unit tests to start messing with it. Signed-off-by: Andrew Jones--- arm/Makefile.common| 1 + lib/arm/asm/gic-v2.h | 74 ++ lib/arm/asm/gic.h | 20 ++ lib/arm/gic.c | 69 ++ lib/arm64/asm/gic-v2.h | 1 + lib/arm64/asm/gic.h| 1 + 6 files changed, 166 insertions(+) create mode 100644 lib/arm/asm/gic-v2.h create mode 100644 lib/arm/asm/gic.h create mode 100644 lib/arm/gic.c create mode 100644 lib/arm64/asm/gic-v2.h create mode 100644 lib/arm64/asm/gic.h diff --git a/arm/Makefile.common b/arm/Makefile.common index ccb554d9251a4..41239c37e0920 100644 --- a/arm/Makefile.common +++ b/arm/Makefile.common @@ -42,6 +42,7 @@ cflatobjs += lib/arm/mmu.o cflatobjs += lib/arm/bitops.o cflatobjs += lib/arm/psci.o cflatobjs += lib/arm/smp.o +cflatobjs += lib/arm/gic.o libeabi = lib/arm/libeabi.a eabiobjs = lib/arm/eabi_compat.o diff --git a/lib/arm/asm/gic-v2.h b/lib/arm/asm/gic-v2.h new file mode 100644 index 0..973c2bf3cc796 --- /dev/null +++ b/lib/arm/asm/gic-v2.h @@ -0,0 +1,74 @@ +/* + * All GIC* defines are lifted from include/linux/irqchip/arm-gic.h + * + * Copyright (C) 2016, Red Hat Inc, Andrew Jones + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#ifndef _ASMARM_GIC_V2_H_ +#define _ASMARM_GIC_V2_H_ + +#define GIC_CPU_CTRL 0x00 +#define GIC_CPU_PRIMASK0x04 +#define GIC_CPU_BINPOINT 0x08 +#define GIC_CPU_INTACK 0x0c +#define GIC_CPU_EOI0x10 +#define GIC_CPU_RUNNINGPRI 0x14 +#define GIC_CPU_HIGHPRI0x18 +#define GIC_CPU_ALIAS_BINPOINT 0x1c +#define GIC_CPU_ACTIVEPRIO 0xd0 +#define GIC_CPU_IDENT 0xfc +#define GIC_CPU_DEACTIVATE 0x1000 + +#define GICC_ENABLE0x1 +#define GICC_INT_PRI_THRESHOLD 0xf0 + +#define GIC_CPU_CTRL_EOImodeNS (1 << 9) + +#define GICC_IAR_INT_ID_MASK 0x3ff +#define GICC_INT_SPURIOUS 1023 +#define GICC_DIS_BYPASS_MASK 0x1e0 + +#define GIC_DIST_CTRL 0x000 +#define GIC_DIST_CTR 0x004 +#define GIC_DIST_IGROUP0x080 +#define GIC_DIST_ENABLE_SET0x100 +#define GIC_DIST_ENABLE_CLEAR 0x180 +#define GIC_DIST_PENDING_SET 0x200 +#define GIC_DIST_PENDING_CLEAR 0x280 +#define GIC_DIST_ACTIVE_SET0x300 +#define GIC_DIST_ACTIVE_CLEAR 0x380 +#define GIC_DIST_PRI 0x400 +#define GIC_DIST_TARGET0x800 +#define GIC_DIST_CONFIG0xc00 +#define GIC_DIST_SOFTINT 0xf00 +#define GIC_DIST_SGI_PENDING_CLEAR 0xf10 +#define GIC_DIST_SGI_PENDING_SET 0xf20 + +#define GICD_ENABLE0x1 +#define GICD_DISABLE 0x0 +#define GICD_INT_ACTLOW_LVLTRIG0x0 +#define GICD_INT_EN_CLR_X320x +#define GICD_INT_EN_SET_SGI0x +#define GICD_INT_EN_CLR_PPI0x +#define GICD_INT_DEF_PRI 0xa0 +#define GICD_INT_DEF_PRI_X4((GICD_INT_DEF_PRI << 24) |\ + (GICD_INT_DEF_PRI << 16) |\ + (GICD_INT_DEF_PRI << 8) |\ + GICD_INT_DEF_PRI) +#ifndef __ASSEMBLY__ + +struct gicv2_data { + void *dist_base; + void *cpu_base; +}; +extern struct gicv2_data gicv2_data; + +#define gicv2_dist_base() (gicv2_data.dist_base) +#define gicv2_cpu_base() (gicv2_data.cpu_base) + +extern int gicv2_init(void); +extern void gicv2_enable_defaults(void); + +#endif /* !__ASSEMBLY__ */ +#endif /* _ASMARM_GIC_V2_H_ */ diff --git a/lib/arm/asm/gic.h b/lib/arm/asm/gic.h new file mode 100644 index 0..b1237d1c5ef22 --- /dev/null +++ b/lib/arm/asm/gic.h @@ -0,0 +1,20 @@ +/* + * Copyright (C) 2016, Red Hat Inc, Andrew Jones + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#ifndef _ASMARM_GIC_H_ +#define _ASMARM_GIC_H_ + +#include + +/* + * gic_init will try to find all known gics, and then + * initialize the gic data for the one found. + * returns + * 0 : no gic was found + * > 0 : the gic version of the gic found + */ +extern int gic_init(void); + +#endif /* _ASMARM_GIC_H_ */ diff --git a/lib/arm/gic.c b/lib/arm/gic.c new file mode 100644 index 0..64a3049c9e8ce --- /dev/null +++ b/lib/arm/gic.c @@ -0,0 +1,69 @@ +/* + * Copyright (C) 2016, Red Hat Inc, Andrew Jones + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#include +#include
[Qemu-devel] [kvm-unit-tests PATCH v3 00/10] arm/arm64: add gic framework
v3: - Rebased on latest master - Added Alex's r-b's v2: Rebased on latest master + my "populate argv[0]" series (will send a REPOST for that shortly. Additionally a few patches got fixes/features; 07/10 got same fix as kernel 7c9b973061 "irqchip/gic-v3: Configure all interrupts as non-secure Group-1" in order to continue working over TCG, as the gicv3 code for TCG removed a hack it had there to make Linux happy. 08/10 added more output for when things fail (if they fail) 09/10 switched gicv3 broadcast implementation to using IRM. This found a bug in a recent (but not tip) kernel, which I was about to fix, but then I saw MarcZ beat me to it. 10/10 actually check that the input irq is the received irq Import defines, and steal enough helper functions, from Linux to enable programming of the gic (v2 and v3). Then use the framework to add an initial test (an ipi test; self, target-list, broadcast). It's my hope that this framework will be a suitable base on which more tests may be easily added, particularly because we have vgic-new and tcg gicv3 emulation getting close to merge. (v3 UPDATE: vgic-new and tcg gicv3 are merged now) To run it, along with other tests, just do ./configure [ --arch=[arm|arm64] --cross-prefix=$PREFIX ] make export QEMU=$PATH_TO_QEMU ./run_tests.sh To run it separately do, e.g. $QEMU -machine virt,accel=tcg -cpu cortex-a57 \ -device virtio-serial-device \ -device virtconsole,chardev=ctd -chardev testdev,id=ctd \ -display none -serial stdio \ -kernel arm/gic.flat \ -smp 123 -machine gic-version=3 -append ipi ^^ note, we can go nuts with nr-cpus on TCG :-) Or, a KVM example using a different "sender" cpu and irq (other than zero) $QEMU -machine virt,accel=kvm -cpu host \ -device virtio-serial-device \ -device virtconsole,chardev=ctd -chardev testdev,id=ctd \ -display none -serial stdio \ -kernel arm/gic.flat \ -smp 48 -machine gic-version=3 -append 'ipi sender=42 irq=1' Patches: 01-05: fixes and functionality needed by the later gic patches 06-07: code theft from Linux (defines, helper functions) 08-10: arm/gic.flat (the base of the gic unit test), currently just IPI Available here: https://github.com/rhdrjones/kvm-unit-tests/commits/arm/gic Andrew Jones (10): lib: xstr: allow multiple args arm64: fix get_"sysreg32" and make MPIDR 64bit arm/arm64: smp: support more than 8 cpus arm/arm64: add some delay routines arm/arm64: irq enable/disable arm/arm64: add initial gicv2 support arm/arm64: add initial gicv3 support arm/arm64: gicv2: add an IPI test arm/arm64: gicv3: add an IPI test arm/arm64: gic: don't just use zero arm/Makefile.common| 7 +- arm/gic.c | 381 + arm/run| 19 ++- arm/selftest.c | 5 +- arm/unittests.cfg | 13 ++ lib/arm/asm/arch_gicv3.h | 184 ++ lib/arm/asm/gic-v2.h | 74 + lib/arm/asm/gic-v3.h | 321 ++ lib/arm/asm/gic.h | 21 +++ lib/arm/asm/processor.h| 38 - lib/arm/asm/setup.h| 4 +- lib/arm/gic.c | 142 + lib/arm/processor.c| 15 ++ lib/arm/setup.c| 12 +- lib/arm64/asm/arch_gicv3.h | 169 lib/arm64/asm/gic-v2.h | 1 + lib/arm64/asm/gic-v3.h | 1 + lib/arm64/asm/gic.h| 1 + lib/arm64/asm/processor.h | 53 ++- lib/arm64/asm/sysreg.h | 44 ++ lib/arm64/processor.c | 15 ++ lib/libcflat.h | 4 +- 22 files changed, 1498 insertions(+), 26 deletions(-) create mode 100644 arm/gic.c create mode 100644 lib/arm/asm/arch_gicv3.h create mode 100644 lib/arm/asm/gic-v2.h create mode 100644 lib/arm/asm/gic-v3.h create mode 100644 lib/arm/asm/gic.h create mode 100644 lib/arm/gic.c create mode 100644 lib/arm64/asm/arch_gicv3.h create mode 100644 lib/arm64/asm/gic-v2.h create mode 100644 lib/arm64/asm/gic-v3.h create mode 100644 lib/arm64/asm/gic.h create mode 100644 lib/arm64/asm/sysreg.h -- 2.7.4
[Qemu-devel] [kvm-unit-tests PATCH v3 04/10] arm/arm64: add some delay routines
Allow a thread to wait some specified amount of time. Can specify in cycles, usecs, and msecs. Reviewed-by: Alex BennéeSigned-off-by: Andrew Jones --- lib/arm/asm/processor.h | 19 +++ lib/arm/processor.c | 15 +++ lib/arm64/asm/processor.h | 19 +++ lib/arm64/processor.c | 15 +++ 4 files changed, 68 insertions(+) diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h index d2048f5f5f7e6..afc903ca7d4ab 100644 --- a/lib/arm/asm/processor.h +++ b/lib/arm/asm/processor.h @@ -5,7 +5,9 @@ * * This work is licensed under the terms of the GNU LGPL, version 2. */ +#include #include +#include enum vector { EXCPTN_RST, @@ -51,4 +53,21 @@ extern int mpidr_to_cpu(unsigned long mpidr); extern void start_usr(void (*func)(void *arg), void *arg, unsigned long sp_usr); extern bool is_user(void); +static inline u64 get_cntvct(void) +{ + u64 vct; + isb(); + asm volatile("mrrc p15, 1, %Q0, %R0, c14" : "=r" (vct)); + return vct; +} + +extern void delay(u64 cycles); +extern void udelay(unsigned long usecs); + +static inline void mdelay(unsigned long msecs) +{ + while (msecs--) + udelay(1000); +} + #endif /* _ASMARM_PROCESSOR_H_ */ diff --git a/lib/arm/processor.c b/lib/arm/processor.c index 54fdb87ef0196..c2ee360df6884 100644 --- a/lib/arm/processor.c +++ b/lib/arm/processor.c @@ -9,6 +9,7 @@ #include #include #include +#include static const char *processor_modes[] = { "USER_26", "FIQ_26" , "IRQ_26" , "SVC_26" , @@ -141,3 +142,17 @@ bool is_user(void) { return current_thread_info()->flags & TIF_USER_MODE; } + +void delay(u64 cycles) +{ + u64 start = get_cntvct(); + while ((get_cntvct() - start) < cycles) + cpu_relax(); +} + +void udelay(unsigned long usec) +{ + unsigned int frq; + asm volatile("mrc p15, 0, %0, c14, c0, 0" : "=r" (frq)); + delay((u64)usec * frq / 100); +} diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h index 7e448dc81a6aa..94f7ce35b65c1 100644 --- a/lib/arm64/asm/processor.h +++ b/lib/arm64/asm/processor.h @@ -17,8 +17,10 @@ #define SCTLR_EL1_M(1 << 0) #ifndef __ASSEMBLY__ +#include #include #include +#include enum vector { EL1T_SYNC, @@ -89,5 +91,22 @@ extern int mpidr_to_cpu(unsigned long mpidr); extern void start_usr(void (*func)(void *arg), void *arg, unsigned long sp_usr); extern bool is_user(void); +static inline u64 get_cntvct(void) +{ + u64 vct; + isb(); + asm volatile("mrs %0, cntvct_el0" : "=r" (vct)); + return vct; +} + +extern void delay(u64 cycles); +extern void udelay(unsigned long usecs); + +static inline void mdelay(unsigned long msecs) +{ + while (msecs--) + udelay(1000); +} + #endif /* !__ASSEMBLY__ */ #endif /* _ASMARM64_PROCESSOR_H_ */ diff --git a/lib/arm64/processor.c b/lib/arm64/processor.c index deeab4ec9c8ac..50fa835c6f1e3 100644 --- a/lib/arm64/processor.c +++ b/lib/arm64/processor.c @@ -9,6 +9,7 @@ #include #include #include +#include static const char *vector_names[] = { "el1t_sync", @@ -253,3 +254,17 @@ bool is_user(void) { return current_thread_info()->flags & TIF_USER_MODE; } + +void delay(u64 cycles) +{ + u64 start = get_cntvct(); + while ((get_cntvct() - start) < cycles) + cpu_relax(); +} + +void udelay(unsigned long usec) +{ + unsigned int frq; + asm volatile("mrs %0, cntfrq_el0" : "=r" (frq)); + delay((u64)usec * frq / 100); +} -- 2.7.4
Re: [Qemu-devel] [PATCH v3 0/2] Report format specific info for LUKS block driver
On 14.06.2016 18:24, Daniel P. Berrange wrote: > This is a followup to: > > v1: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01723.html > v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03642.html > > The 'qemu-img info' tool has ability to print format specific > information, eg with qcow2 it reports two extra items: > > $ qemu-img info ~/VirtualMachines/demo.qcow2 > image: /home/berrange/VirtualMachines/demo.qcow2 > file format: qcow2 > virtual size: 3.0G (3221225472 bytes) > disk size: 140K > cluster_size: 65536 > Format specific information: > compat: 0.10 > refcount bits: 16 > > > This is not currently wired up for the LUKS driver. This patch > series adds that support so that we can report useful data about > the LUKS volume such as the crypto algorithm choices, key slot > usage and other volume metadata. > > The first patch extends the crypto API to allow querying of the > format specific metadata > > The second patches extends the block API to allow the LUKS driver > to report the format specific metadata. > > $ qemu-img info ~/VirtualMachines/demo.luks > image: /home/berrange/VirtualMachines/demo.luks > file format: luks > virtual size: 98M (102760448 bytes) > disk size: 100M > encrypted: yes > Format specific information: > ivgen alg: plain64 > hash alg: sha1 > cipher alg: aes-128 > uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594 > cipher mode: xts > slots: > [0]: > active: true > iters: 572706 > key offset: 4096 > stripes: 4000 > [1]: > active: false > key offset: 135168 > [2]: > active: false > key offset: 266240 > [3]: > active: false > key offset: 397312 > [4]: > active: false > key offset: 528384 > [5]: > active: false > key offset: 659456 > [6]: > active: false > key offset: 790528 > [7]: > active: false > key offset: 921600 > payload offset: 2097152 > master key iters: 142375 > > Technically most of the code changes here are in the crypto > layer, rather than the block layer. I'm fine with both patches > going through the block maintainer tree, or can submit a both > patches myself as, for sake of simplicity of merge. > > Changed in v3: > > - Do full struct copy instead of field-by-field copy (Max) > - Simplify handling of linked list pointers (Max) > - Use g_strndup with uuid to guarantee null termination (Max) > - Misc typos (Max) > > Changed in v2: > > - Drop patches related to creating a text output visitor to >format the ImageInfoSpecific data. This will be continued >in a separate patch series > - Fix key offset to be in bytes instead of sectors > - Drop the duplicated ImageInfoSpecificLUKS type and just >directly use QCryptoBlockInfoLUKS type in block layer > - Skip reporting stripes/iters if keyslot is inactive > - Add missing QAPI schema docs > > > > Daniel P. Berrange (2): > crypto: add support for querying parameters for block encryption > block: export LUKS specific data to qemu-img info > > block/crypto.c | 49 > crypto/block-luks.c| 67 > crypto/block.c | 17 +++ > crypto/blockpriv.h | 4 +++ > include/crypto/block.h | 16 +++ > qapi/block-core.json | 6 +++- > qapi/crypto.json | 76 > ++ > 7 files changed, 234 insertions(+), 1 deletion(-) Thanks, I've applied the series to my block branch: https://github.com/XanClic/qemu/commits/block Max signature.asc Description: OpenPGP digital signature
[Qemu-devel] [PATCH v7 1/4] ide: refactor retry_unit set and clear into separate function
From: Evgeny YakovlevCode to set and clear state associated with retry in moved into ide_set_retry and ide_clear_retry to make adding retry setups easier. Signed-off-by: Evgeny Yakovlev Signed-off-by: Denis V. Lunev Reviewed-by: Paolo Bonzini CC: Kevin Wolf CC: Max Reitz CC: Stefan Hajnoczi CC: Fam Zheng CC: John Snow --- hw/ide/core.c | 22 -- 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 029f6b9..b72346e 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -466,6 +466,20 @@ void ide_abort_command(IDEState *s) s->error = ABRT_ERR; } +static void ide_set_retry(IDEState *s) +{ +s->bus->retry_unit = s->unit; +s->bus->retry_sector_num = ide_get_sector(s); +s->bus->retry_nsector = s->nsector; +} + +static void ide_clear_retry(IDEState *s) +{ +s->bus->retry_unit = -1; +s->bus->retry_sector_num = 0; +s->bus->retry_nsector = 0; +} + /* prepare data transfer and tell what to do after */ void ide_transfer_start(IDEState *s, uint8_t *buf, int size, EndTransferFunc *end_transfer_func) @@ -756,9 +770,7 @@ void dma_buf_commit(IDEState *s, uint32_t tx_bytes) void ide_set_inactive(IDEState *s, bool more) { s->bus->dma->aiocb = NULL; -s->bus->retry_unit = -1; -s->bus->retry_sector_num = 0; -s->bus->retry_nsector = 0; +ide_clear_retry(s); if (s->bus->dma->ops->set_inactive) { s->bus->dma->ops->set_inactive(s->bus->dma, more); } @@ -914,9 +926,7 @@ static void ide_sector_start_dma(IDEState *s, enum ide_dma_cmd dma_cmd) void ide_start_dma(IDEState *s, BlockCompletionFunc *cb) { s->io_buffer_index = 0; -s->bus->retry_unit = s->unit; -s->bus->retry_sector_num = ide_get_sector(s); -s->bus->retry_nsector = s->nsector; +ide_set_retry(s); if (s->bus->dma->ops->start_dma) { s->bus->dma->ops->start_dma(s->bus->dma, s, cb); } -- 2.1.4
[Qemu-devel] [PATCH v7 0/4] block: ignore flush requests when storage is clean
Changes from v6: - squashed patches 5-6 into patch 4 to avoid test faults on git bissect - changed sector number from 0 to 1 in patch 3 Changes from v5: - Removed failed flush traces in block tests 026 071 089 - Changed BLOCK_JOB_READY event order in block tests 141 144 Changes from v4: - Moved to write generation scheme instead of dirty flag - Added retry setup to IDE PIO and FLUSH requests Changes from v3: - Fixed a typo in commit message - Rebased on Kevin'n origin/block Changes from v2: - Better comments - Rebased on latest master Changes from v1: - Flush requests that should be skipped will now wait for completion of any previous requests already in flight - Fixed IDE and AHCI tests to dirty media for new flush behaviour - Fixed a problem in IDE CMD_FLUSH_CACHE failure handling Signed-off-by: Evgeny YakovlevSigned-off-by: Denis V. Lunev CC: Kevin Wolf CC: Max Reitz CC: Stefan Hajnoczi CC: Fam Zheng CC: John Snow Evgeny Yakovlev (4): ide: refactor retry_unit set and clear into separate function ide: set retry_unit for PIO and FLUSH requests tests: in IDE and AHCI tests perform DMA write before flushing block: ignore flush requests when storage is clean block.c| 3 +++ block/io.c | 21 hw/ide/core.c | 24 +- include/block/block_int.h | 5 tests/ahci-test.c | 34 -- tests/ide-test.c | 43 tests/qemu-iotests/026.out.nocache | 50 -- tests/qemu-iotests/071.out | 8 -- tests/qemu-iotests/089.out | 2 -- tests/qemu-iotests/141.out | 4 +-- tests/qemu-iotests/144.out | 2 +- 11 files changed, 125 insertions(+), 71 deletions(-) -- 2.1.4
Re: [Qemu-devel] [PATCH v18 4/4] block/gluster: add support for multiple gluster servers
On Thu, Jul 14, 2016 at 5:35 PM, Markus Armbrusterwrote: > Interface and error message review only. > > Prasanna Kumar Kalever writes: > >> This patch adds a way to specify multiple volfile servers to the gluster >> block backend of QEMU with tcp|rdma transport types and their port numbers. >> >> Problem: >> >> Currently VM Image on gluster volume is specified like this: >> >> file=gluster[+tcp]://host[:port]/testvol/a.img >> >> Assuming we have three hosts in trusted pool with replica 3 volume >> in action and unfortunately host (mentioned in the command above) went down >> for some reason, since the volume is replica 3 we now have other 2 hosts >> active from which we can boot the VM. >> >> But currently there is no mechanism to pass the other 2 gluster host >> addresses to qemu. > > Awkward. Perhaps: > > Say we have three hosts in a trusted pool with replica 3 volume > in action. When the host mentioned in the command above goes down > for some reason, the other two hosts are still available. But there's > currently no way to tell QEMU about them. Will incorporate in v19 > >> Solution: >> >> New way of specifying VM Image on gluster volume with volfile servers: >> (We still support old syntax to maintain backward compatibility) >> >> Basic command line syntax looks like: >> >> Pattern I: >> -drive driver=gluster, >> volume=testvol,path=/path/a.raw, >> server.0.host=1.2.3.4, >>[server.0.port=24007,] >>[server.0.transport=tcp,] >> server.1.host=5.6.7.8, >>[server.1.port=24008,] >>[server.1.transport=rdma,] > > Don't forget to update this line when you drop transport 'rdma'. More > of the same below. > >> server.2.host=/var/run/glusterd.socket, >> server.2.transport=unix ... >> >> Pattern II: >> 'json:{"driver":"qcow2","file":{"driver":"gluster", >>"volume":"testvol","path":"/path/a.qcow2", >>"server":[{tuple0},{tuple1}, ...{tupleN}]}}' > > Suggest to add -drive here, for symmetry with pattern I. > > JSON calls the things in { ... } objects, not tuples. Let's stick to > JSON terminology: [{object0},{object1}, ...]. But I think spelling > things out to a similar degree as in pattern I would be clearer: > >-drive 'json:{"driver": "qcow2", > "file": { "driver": "gluster", >"volume": "testvol", >"path": "/path/a.qcow2", >"server": [ > { "host": "1.2.3.4", > "port": 24007, > "transport": "tcp" }, > { "host": "5.6.7.8" > ... }, > ... ] } > ... }' IMO, this doesn't work, I have given a try. > >>driver => 'gluster' (protocol name) >>volume => name of gluster volume where our VM image resides >>path=> absolute path of image in gluster volume >> >> {tuple} => {"host":"1.2.3.4"[,"port":"24007","transport":"tcp"]} >> >>host=> host address (hostname/ipv4/ipv6 addresses/socket path) >>port=> port number on which glusterd is listening. (default 24007) >>transport => transport type used to connect to gluster management >> daemon, >>it can be tcp|rdma|unix (default 'tcp') >> >> Examples: >> 1. >> -drive driver=qcow2,file.driver=gluster, >> file.volume=testvol,file.path=/path/a.qcow2, >> file.server.0.host=1.2.3.4, >> file.server.0.port=24007, >> file.server.0.transport=tcp, >> file.server.1.host=5.6.7.8, >> file.server.1.port=24008, >> file.server.1.transport=rdma, >> file.server.2.host=/var/run/glusterd.socket >> file.server.1.transport=unix >> 2. >> 'json:{"driver":"qcow2","file":{"driver":"gluster","volume":"testvol", >> "path":"/path/a.qcow2","server": >> [{"host":"1.2.3.4","port":"24007","transport":"tcp"}, >> {"host":"4.5.6.7","port":"24008","transport":"rdma"}, >> {"host":"/var/run/glusterd.socket","transport":"unix"}] } }' > > Ah, working examples. Good! No need to expand your abbreviated > description of pattern II then, just say object instead of tuple. But if > you prefer to expand it, go ahead, your choice. > >> This patch gives a mechanism to provide all the server addresses, which are >> in >> replica set, so in case host1 is down VM can still boot from any of the >> active hosts. >> >> This is equivalent to the backup-volfile-servers option supported by >> mount.glusterfs (FUSE way of mounting gluster volume) >> >> Credits: Sincere thanks to Kevin Wolf and >> "Deepak C Shetty" for inputs and all their support >> >> Signed-off-by: Prasanna Kumar Kalever >> --- >> block/gluster.c
Re: [Qemu-devel] [PATCH v3 00/14] virtio migration: Flip outer layer to vmstate
On (Fri) 15 Jul 2016 [10:23:10], Cornelia Huck wrote: > On Thu, 14 Jul 2016 18:22:42 +0100 > "Dr. David Alan Gilbert (git)"wrote: > > > From: "Dr. David Alan Gilbert" > > > > Hi, > > This series converts the outer most layer of virtio to > > use VMState macros; this is the easy bit, but I'm hoping that > > having done that, the next trick is to nibble away at the virtio_save/load > > functions and all of the zillions of device/bus helpers. > > Looks good, and I'd like to see this in 2.7. Michael mentioned he's interested in reviewing, so we'll wait a few days for him to go through this. Amit
[Qemu-devel] [kvm-unit-tests PATCH v3 01/10] lib: xstr: allow multiple args
Make implementation equivalent to Linux's include/linux/stringify.h Signed-off-by: Andrew Jones--- lib/libcflat.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/libcflat.h b/lib/libcflat.h index 72b1bf9668ef1..82005f5d014fb 100644 --- a/lib/libcflat.h +++ b/lib/libcflat.h @@ -27,8 +27,8 @@ #define __unused __attribute__((__unused__)) -#define xstr(s) xxstr(s) -#define xxstr(s) #s +#define xstr(s...) xxstr(s) +#define xxstr(s...) #s #define __ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask)) #define __ALIGN(x, a) __ALIGN_MASK(x, (typeof(x))(a) - 1) -- 2.7.4
[Qemu-devel] [kvm-unit-tests PATCH v3 02/10] arm64: fix get_"sysreg32" and make MPIDR 64bit
mrs is always 64bit, so we should always use a 64bit register. Sometimes we'll only want to return the lower 32, but not for MPIDR, as that does define fields in the upper 32. Reviewed-by: Alex BennéeSigned-off-by: Andrew Jones --- lib/arm64/asm/processor.h | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h index 84d5c7ce752b0..9a208ff729b7e 100644 --- a/lib/arm64/asm/processor.h +++ b/lib/arm64/asm/processor.h @@ -66,14 +66,17 @@ static inline unsigned long current_level(void) return el & 0xc; } -#define DEFINE_GET_SYSREG32(reg) \ -static inline unsigned int get_##reg(void) \ +#define DEFINE_GET_SYSREG(reg, type) \ +static inline type get_##reg(void) \ { \ - unsigned int reg; \ - asm volatile("mrs %0, " #reg "_el1" : "=r" (reg)); \ - return reg; \ + unsigned long r;\ + asm volatile("mrs %0, " #reg "_el1" : "=r" (r));\ + return (type)r; \ } -DEFINE_GET_SYSREG32(mpidr) +#define DEFINE_GET_SYSREG32(reg) DEFINE_GET_SYSREG(reg, unsigned int) +#define DEFINE_GET_SYSREG64(reg) DEFINE_GET_SYSREG(reg, unsigned long) + +DEFINE_GET_SYSREG64(mpidr) /* Only support Aff0 for now, gicv2 only */ #define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff)) -- 2.7.4
[Qemu-devel] [kvm-unit-tests PATCH v3 03/10] arm/arm64: smp: support more than 8 cpus
Reviewed-by: Alex BennéeSigned-off-by: Andrew Jones --- arm/run | 19 --- arm/selftest.c| 5 - lib/arm/asm/processor.h | 9 +++-- lib/arm/asm/setup.h | 4 ++-- lib/arm/setup.c | 12 +++- lib/arm64/asm/processor.h | 9 +++-- 6 files changed, 43 insertions(+), 15 deletions(-) diff --git a/arm/run b/arm/run index a2f35ef6a7e63..2d0698619606e 100755 --- a/arm/run +++ b/arm/run @@ -31,13 +31,6 @@ if [ -z "$ACCEL" ]; then fi fi -if [ "$HOST" = "aarch64" ] && [ "$ACCEL" = "kvm" ]; then - processor="host" - if [ "$ARCH" = "arm" ]; then - processor+=",aarch64=off" - fi -fi - qemu="${QEMU:-qemu-system-$ARCH_NAME}" qpath=$(which $qemu 2>/dev/null) @@ -53,6 +46,18 @@ fi M='-machine virt' +if [ "$ACCEL" = "kvm" ]; then + if $qemu $M,\? 2>&1 | grep gic-version > /dev/null; then + M+=',gic-version=host' + fi + if [ "$HOST" = "aarch64" ]; then + processor="host" + if [ "$ARCH" = "arm" ]; then + processor+=",aarch64=off" + fi + fi +fi + if ! $qemu $M -device '?' 2>&1 | grep virtconsole > /dev/null; then echo "$qpath doesn't support virtio-console for chr-testdev. Exiting." exit 2 diff --git a/arm/selftest.c b/arm/selftest.c index 196164f5313de..2f117f795d2dc 100644 --- a/arm/selftest.c +++ b/arm/selftest.c @@ -312,9 +312,10 @@ static bool psci_check(void) static cpumask_t smp_reported; static void cpu_report(void) { + unsigned long mpidr = get_mpidr(); int cpu = smp_processor_id(); - report("CPU%d online", true, cpu); + report("CPU(%3d) mpidr=%lx", mpidr_to_cpu(mpidr) == cpu, cpu, mpidr); cpumask_set_cpu(cpu, _reported); halt(); } @@ -343,6 +344,7 @@ int main(int argc, char **argv) } else if (strcmp(argv[1], "smp") == 0) { + unsigned long mpidr = get_mpidr(); int cpu; report("PSCI version", psci_check()); @@ -353,6 +355,7 @@ int main(int argc, char **argv) smp_boot_secondary(cpu, cpu_report); } + report("CPU(%3d) mpidr=%lx", mpidr_to_cpu(mpidr) == 0, 0, mpidr); cpumask_set_cpu(0, _reported); while (!cpumask_full(_reported)) cpu_relax(); diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h index f25e7eee3666c..d2048f5f5f7e6 100644 --- a/lib/arm/asm/processor.h +++ b/lib/arm/asm/processor.h @@ -40,8 +40,13 @@ static inline unsigned int get_mpidr(void) return mpidr; } -/* Only support Aff0 for now, up to 4 cpus */ -#define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff)) +#define MPIDR_HWID_BITMASK 0xff +extern int mpidr_to_cpu(unsigned long mpidr); + +#define MPIDR_LEVEL_SHIFT(level) \ + (((1 << level) >> 1) << 3) +#define MPIDR_AFFINITY_LEVEL(mpidr, level) \ + ((mpidr >> MPIDR_LEVEL_SHIFT(level)) & 0xff) extern void start_usr(void (*func)(void *arg), void *arg, unsigned long sp_usr); extern bool is_user(void); diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h index cb8fdbd38dd5d..c501c6ddd8657 100644 --- a/lib/arm/asm/setup.h +++ b/lib/arm/asm/setup.h @@ -10,8 +10,8 @@ #include #include -#define NR_CPUS8 -extern u32 cpus[NR_CPUS]; +#define NR_CPUS255 +extern u64 cpus[NR_CPUS]; extern int nr_cpus; #define NR_MEM_REGIONS 8 diff --git a/lib/arm/setup.c b/lib/arm/setup.c index 7e7b39f11dde1..b6e2d5815e723 100644 --- a/lib/arm/setup.c +++ b/lib/arm/setup.c @@ -24,12 +24,22 @@ extern unsigned long stacktop; extern void io_init(void); extern void setup_args_progname(const char *args); -u32 cpus[NR_CPUS] = { [0 ... NR_CPUS-1] = (~0U) }; +u64 cpus[NR_CPUS] = { [0 ... NR_CPUS-1] = (~0U) }; int nr_cpus; struct mem_region mem_regions[NR_MEM_REGIONS]; phys_addr_t __phys_offset, __phys_end; +int mpidr_to_cpu(unsigned long mpidr) +{ + int i; + + for (i = 0; i < nr_cpus; ++i) + if (cpus[i] == (mpidr & MPIDR_HWID_BITMASK)) + return i; + return -1; +} + static void cpu_set(int fdtnode __unused, u32 regval, void *info __unused) { int cpu = nr_cpus++; diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h index 9a208ff729b7e..7e448dc81a6aa 100644 --- a/lib/arm64/asm/processor.h +++ b/lib/arm64/asm/processor.h @@ -78,8 +78,13 @@ static inline type get_##reg(void) \ DEFINE_GET_SYSREG64(mpidr) -/* Only support Aff0 for now, gicv2 only */ -#define mpidr_to_cpu(mpidr) ((int)((mpidr) & 0xff)) +#define MPIDR_HWID_BITMASK 0xff00ff +extern int mpidr_to_cpu(unsigned long mpidr); + +#define MPIDR_LEVEL_SHIFT(level) \ + (((1 << level) >> 1) << 3) +#define MPIDR_AFFINITY_LEVEL(mpidr, level) \ +
[Qemu-devel] [kvm-unit-tests PATCH v3 05/10] arm/arm64: irq enable/disable
Reviewed-by: Alex BennéeSigned-off-by: Andrew Jones --- lib/arm/asm/processor.h | 10 ++ lib/arm64/asm/processor.h | 10 ++ 2 files changed, 20 insertions(+) diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h index afc903ca7d4ab..75a8d08b89330 100644 --- a/lib/arm/asm/processor.h +++ b/lib/arm/asm/processor.h @@ -35,6 +35,16 @@ static inline unsigned long current_cpsr(void) #define current_mode() (current_cpsr() & MODE_MASK) +static inline void local_irq_enable(void) +{ + asm volatile("cpsie i" : : : "memory", "cc"); +} + +static inline void local_irq_disable(void) +{ + asm volatile("cpsid i" : : : "memory", "cc"); +} + static inline unsigned int get_mpidr(void) { unsigned int mpidr; diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h index 94f7ce35b65c1..d54a4ed1c1876 100644 --- a/lib/arm64/asm/processor.h +++ b/lib/arm64/asm/processor.h @@ -68,6 +68,16 @@ static inline unsigned long current_level(void) return el & 0xc; } +static inline void local_irq_enable(void) +{ + asm volatile("msr daifclr, #2" : : : "memory"); +} + +static inline void local_irq_disable(void) +{ + asm volatile("msr daifset, #2" : : : "memory"); +} + #define DEFINE_GET_SYSREG(reg, type) \ static inline type get_##reg(void) \ { \ -- 2.7.4
[Qemu-devel] [PATCH v7 4/4] block: ignore flush requests when storage is clean
From: Evgeny YakovlevSome guests (win2008 server for example) do a lot of unnecessary flushing when underlying media has not changed. This adds additional overhead on host when calling fsync/fdatasync. This change introduces a write generation scheme in BlockDriverState. Current write generation is checked against last flushed generation to avoid unnessesary flushes. The problem with excessive flushing was found by a performance test which does parallel directory tree creation (from 2 processes). Results improved from 0.424 loops/sec to 0.432 loops/sec. Each loop creates 10^3 directories with 10 files in each. This affected some blkdebug testcases that were expecting error logs from failure-injected flushes which are now skipped entirely (tests 026 071 089). This also affects the performance of block jobs and thus BLOCK_JOB_READY events for driver-mirror and active block-commit commands now arrives faster, before QMP send successfully returns to caller (tests 141 144). Signed-off-by: Evgeny Yakovlev Signed-off-by: Denis V. Lunev Reviewed-by: Paolo Bonzini CC: Kevin Wolf CC: Max Reitz CC: Stefan Hajnoczi CC: Fam Zheng CC: John Snow --- block.c| 3 +++ block/io.c | 21 include/block/block_int.h | 5 tests/qemu-iotests/026.out.nocache | 50 -- tests/qemu-iotests/071.out | 8 -- tests/qemu-iotests/089.out | 2 -- tests/qemu-iotests/141.out | 4 +-- tests/qemu-iotests/144.out | 2 +- 8 files changed, 32 insertions(+), 63 deletions(-) diff --git a/block.c b/block.c index 823ff1d..060f88e 100644 --- a/block.c +++ b/block.c @@ -234,6 +234,8 @@ BlockDriverState *bdrv_new(void) bs->refcnt = 1; bs->aio_context = qemu_get_aio_context(); +qemu_co_queue_init(>flush_queue); + QTAILQ_INSERT_TAIL(_bdrv_states, bs, bs_list); return bs; @@ -2472,6 +2474,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset) ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS); bdrv_dirty_bitmap_truncate(bs); bdrv_parent_cb_resize(bs); +++bs->write_gen; } return ret; } diff --git a/block/io.c b/block/io.c index 7086908..f181ff7 100644 --- a/block/io.c +++ b/block/io.c @@ -1303,6 +1303,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs, } bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE); +++bs->write_gen; bdrv_set_dirty(bs, start_sector, end_sector - start_sector); if (bs->wr_highest_offset < offset + bytes) { @@ -2235,6 +2236,15 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs) tracked_request_begin(, bs, 0, 0, BDRV_TRACKED_FLUSH); +int current_gen = bs->write_gen; + +/* Wait until any previous flushes are completed */ +while (bs->flush_started_gen != bs->flushed_gen) { +qemu_co_queue_wait(>flush_queue); +} + +bs->flush_started_gen = current_gen; + /* Write back all layers by calling one driver function */ if (bs->drv->bdrv_co_flush) { ret = bs->drv->bdrv_co_flush(bs); @@ -2255,6 +2265,11 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs) goto flush_parent; } +/* Check if we really need to flush anything */ +if (bs->flushed_gen == current_gen) { +goto flush_parent; +} + BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK); if (bs->drv->bdrv_co_flush_to_disk) { ret = bs->drv->bdrv_co_flush_to_disk(bs); @@ -2285,6 +2300,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs) */ ret = 0; } + if (ret < 0) { goto out; } @@ -2295,6 +2311,10 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs) flush_parent: ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0; out: +/* Notify any pending flushes that we have completed */ +bs->flushed_gen = current_gen; +qemu_co_queue_restart_all(>flush_queue); + tracked_request_end(); return ret; } @@ -2420,6 +2440,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, } ret = 0; out: +++bs->write_gen; bdrv_set_dirty(bs, req.offset >> BDRV_SECTOR_BITS, req.bytes >> BDRV_SECTOR_BITS); tracked_request_end(); diff --git a/include/block/block_int.h b/include/block/block_int.h index 47b9aac..396bd2b 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -439,6 +439,11 @@ struct BlockDriverState { int copy_on_read; /* if nonzero, copy read backing sectors into image. note this is a reference count */ +CoQueue flush_queue;/* Serializing flush queue */ +unsigned int write_gen; /* Current data generation */ +
Re: [Qemu-devel] [PATCH Qemu] Change spice-server protocol for GL texture passing
Forgot to add RFC to the subject Frediano > > --- > ui/spice-core.c| 5 - > ui/spice-display.c | 29 - > 2 files changed, 8 insertions(+), 26 deletions(-) > > diff --git a/ui/spice-core.c b/ui/spice-core.c > index da05054..f7647f7 100644 > --- a/ui/spice-core.c > +++ b/ui/spice-core.c > @@ -828,11 +828,6 @@ void qemu_spice_init(void) > > #ifdef HAVE_SPICE_GL > if (qemu_opt_get_bool(opts, "gl", 0)) { > -if ((port != 0) || (tls_port != 0)) { > -error_report("SPICE GL support is local-only for now and " > - "incompatible with -spice port/tls-port"); > -exit(1); > -} > if (egl_rendernode_init() != 0) { > error_report("Failed to initialize EGL render node for SPICE > GL"); > exit(1); > diff --git a/ui/spice-display.c b/ui/spice-display.c > index 2a77a54..72137bd 100644 > --- a/ui/spice-display.c > +++ b/ui/spice-display.c > @@ -852,6 +852,10 @@ static void qemu_spice_gl_block_timer(void *opaque) > static QEMUGLContext qemu_spice_gl_create_context(DisplayChangeListener > *dcl, >QEMUGLParams *params) > { > +SimpleSpiceDisplay *ssd = container_of(dcl, SimpleSpiceDisplay, dcl); > + > +spice_qxl_gl_init(>qxl, qemu_egl_display, qemu_egl_rn_ctx); > + > eglMakeCurrent(qemu_egl_display, EGL_NO_SURFACE, EGL_NO_SURFACE, > qemu_egl_rn_ctx); > return qemu_egl_create_context(dcl, params); > @@ -864,28 +868,11 @@ static void qemu_spice_gl_scanout(DisplayChangeListener > *dcl, >uint32_t w, uint32_t h) > { > SimpleSpiceDisplay *ssd = container_of(dcl, SimpleSpiceDisplay, dcl); > -EGLint stride = 0, fourcc = 0; > -int fd = -1; > - > -if (tex_id) { > -fd = egl_get_fd_for_texture(tex_id, , ); > -if (fd < 0) { > -fprintf(stderr, "%s: failed to get fd for texture\n", __func__); > -return; > -} > -dprint(1, "%s: %dx%d (stride %d, fourcc 0x%x)\n", __func__, > - w, h, stride, fourcc); > -} else { > -dprint(1, "%s: no texture (no framebuffer)\n", __func__); > -} > - > -assert(!tex_id || fd >= 0); > > -/* note: spice server will close the fd */ > -spice_qxl_gl_scanout(>qxl, fd, > - surface_width(ssd->ds), > - surface_height(ssd->ds), > - stride, fourcc, y_0_top); > +spice_qxl_gl_scanout_texture(>qxl, tex_id, > + surface_width(ssd->ds), > + surface_height(ssd->ds), > + y_0_top); > > qemu_spice_gl_monitor_config(ssd, x, y, w, h); > }
[Qemu-devel] [PATCH 1/3] linux-user: Fix handling of iovec counts
In the kernel the length of an iovec is generally handled as an unsigned long, not an integer; fix the parameter to lock_iovec() accordingly. Signed-off-by: Peter Maydell--- linux-user/syscall.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 9dbd711..8d36d6c 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -2647,7 +2647,7 @@ static abi_long do_getsockopt(int sockfd, int level, int optname, } static struct iovec *lock_iovec(int type, abi_ulong target_addr, -int count, int copy) +abi_ulong count, int copy) { struct target_iovec *target_vec; struct iovec *vec; @@ -2660,7 +2660,7 @@ static struct iovec *lock_iovec(int type, abi_ulong target_addr, errno = 0; return NULL; } -if (count < 0 || count > IOV_MAX) { +if (count > IOV_MAX) { errno = EINVAL; return NULL; } @@ -2735,7 +2735,7 @@ static struct iovec *lock_iovec(int type, abi_ulong target_addr, } static void unlock_iovec(struct iovec *vec, abi_ulong target_addr, - int count, int copy) + abi_ulong count, int copy) { struct target_iovec *target_vec; int i; @@ -2962,7 +2962,7 @@ static abi_long do_sendrecvmsg_locked(int fd, struct target_msghdr *msgp, { abi_long ret, len; struct msghdr msg; -int count; +abi_ulong count; struct iovec *vec; abi_ulong target_vec; -- 1.9.1
[Qemu-devel] [PATCH 2/3] linux-user: Fix errno for sendrecvmsg with large iovec length
The sendmsg and recvmsg syscalls use a different errno to indicate an overlarge iovec length from readv and writev. Handle this special case in do_sendrcvmsg_locked() to avoid getting the default errno returned by lock_iovec(). Signed-off-by: Peter Maydell--- linux-user/syscall.c | 9 + 1 file changed, 9 insertions(+) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 8d36d6c..a21ee59 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -2985,6 +2985,15 @@ static abi_long do_sendrecvmsg_locked(int fd, struct target_msghdr *msgp, count = tswapal(msgp->msg_iovlen); target_vec = tswapal(msgp->msg_iov); + +if (count > IOV_MAX) { +/* sendrcvmsg returns a different errno for this condition than + * readv/writev, so we must catch it here before lock_iovec() does. + */ +ret = -TARGET_EMSGSIZE; +goto out2; +} + vec = lock_iovec(send ? VERIFY_READ : VERIFY_WRITE, target_vec, count, send); if (vec == NULL) { -- 1.9.1
Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
On 13/07/16 14:13, Paolo Bonzini wrote: > diff --git a/include/qemu/qht.h b/include/qemu/qht.h > index 70bfc68..f4f1d55 100644 > --- a/include/qemu/qht.h > +++ b/include/qemu/qht.h > @@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht); > * Attempting to insert a NULL @p is a bug. > * Inserting the same pointer @p with different @hash values is a bug. > * > + * In case of successful operation, smp_wmb() is implied before the pointer > is > + * inserted into the hash table. > + * > * Returns true on sucess. > * Returns false if the @p-@hash pair already exists in the hash table. > */ > @@ -83,6 +86,8 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash); > * > * Needs to be called under an RCU read-critical section. > * > + * smp_read_barrier_depends() is implied before the call to @func. > + * > * The user-provided @func compares pointers in QHT against @userp. > * If the function returns true, a match has been found. > * > @@ -105,6 +110,10 @@ void *qht_lookup(struct qht *ht, qht_lookup_func_t func, > const void *userp, > * This guarantees that concurrent lookups will always compare against valid > * data. > * > + * In case of successful operation, a smp_wmb() barrier is implied before and > + * after the pointer is removed from the hash table. In other words, > + * a successful qht_remove acts as a bidirectional write barrier. > + * I understand why an implied wmb can be expected after the entry is removed: so that the caller can trash the contents of the object removed. However that would require double-check on lookup side to make sure the entry hadn't been removed after the first lookup (something like seqlock read side). But I have no idea why we might like to promise wmb before the removal. Could you please share your point regarding this? I tempt to remove any promises on qht_remove() because it is not clear for me what would be the natural expectations on memory ordering. As of qht_insert() and qht_lookup(), I agree and this is enough guarantees for the series. Thanks, Sergey > * Returns true on success. > * Returns false if the @p-@hash pair was not found. > */ > diff --git a/util/qht.c b/util/qht.c > index 40d6e21..d38948e 100644 > --- a/util/qht.c > +++ b/util/qht.c > @@ -445,7 +445,11 @@ void *qht_do_lookup(struct qht_bucket *head, > qht_lookup_func_t func, > do { > for (i = 0; i < QHT_BUCKET_ENTRIES; i++) { > if (b->hashes[i] == hash) { > -void *p = atomic_read(>pointers[i]); > +/* The pointer is dereferenced before seqlock_read_retry, > + * so (unlike qht_insert__locked) we need to use > + * atomic_rcu_read here. > + */ > +void *p = atomic_rcu_read(>pointers[i]); > > if (likely(p) && likely(func(p, userp))) { > return p; > @@ -535,6 +539,7 @@ static bool qht_insert__locked(struct qht *ht, struct > qht_map *map, > atomic_rcu_set(>next, b); > } > b->hashes[i] = hash; > +/* smp_wmb() implicit in seqlock_write_begin. */ > atomic_set(>pointers[i], p); > seqlock_write_end(>sequence); > return true; > @@ -659,6 +664,9 @@ bool qht_remove__locked(struct qht_map *map, struct > qht_bucket *head, > } > if (q == p) { > qht_debug_assert(b->hashes[i] == hash); > +/* seqlock_write_begin and seqlock_write_end provide write > + * memory barrier semantics to callers of qht_remove. > + */ > seqlock_write_begin(>sequence); > qht_bucket_remove_entry(b, i); > seqlock_write_end(>sequence);
Re: [Qemu-devel] [PATCH v2] qcow2: do not allocate extra memory
On 14.07.2016 18:59, Vladimir Sementsov-Ogievskiy wrote: > There are no needs to allocate more than one cluster, as we set > avail_out for deflate to one cluster. > > Zlib docs (http://www.zlib.net/manual.html) says: > "deflate compresses as much data as possible, and stops when the input > buffer becomes empty or the output buffer becomes full." > > So, deflate will not write more than avail_out to output buffer. If > there is no enough space in output buffer for compressed data (it may be > larger than input data) deflate just returns Z_OK. (if all data is > compressed and written to output buffer deflate returns Z_STREAM_END). > > Signed-off-by: Vladimir Sementsov-Ogievskiy> --- > > v2: improve commit message > > block/qcow.c | 2 +- > block/qcow2.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) Thanks Vladimir, applied to my block branch (with s/no/not/ as proposed by Eric): https://github.com/XanClic/qemu/commits/block Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
On 15/07/2016 14:37, Sergey Fedorov wrote: > On 13/07/16 14:13, Paolo Bonzini wrote: >> diff --git a/include/qemu/qht.h b/include/qemu/qht.h >> index 70bfc68..f4f1d55 100644 >> --- a/include/qemu/qht.h >> +++ b/include/qemu/qht.h >> @@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht); >> * Attempting to insert a NULL @p is a bug. >> * Inserting the same pointer @p with different @hash values is a bug. >> * >> + * In case of successful operation, smp_wmb() is implied before the pointer >> is >> + * inserted into the hash table. >> + * >> * Returns true on sucess. >> * Returns false if the @p-@hash pair already exists in the hash table. >> */ >> @@ -83,6 +86,8 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash); >> * >> * Needs to be called under an RCU read-critical section. >> * >> + * smp_read_barrier_depends() is implied before the call to @func. >> + * >> * The user-provided @func compares pointers in QHT against @userp. >> * If the function returns true, a match has been found. >> * >> @@ -105,6 +110,10 @@ void *qht_lookup(struct qht *ht, qht_lookup_func_t >> func, const void *userp, >> * This guarantees that concurrent lookups will always compare against valid >> * data. >> * >> + * In case of successful operation, a smp_wmb() barrier is implied before >> and >> + * after the pointer is removed from the hash table. In other words, >> + * a successful qht_remove acts as a bidirectional write barrier. >> + * > > I understand why an implied wmb can be expected after the entry is > removed: so that the caller can trash the contents of the object > removed. However that would require double-check on lookup side to make > sure the entry hadn't been removed after the first lookup (something > like seqlock read side). But I have no idea why we might like to promise > wmb before the removal. Could you please share your point regarding this? The reasoning was to make qht_remove() "look to be ordered" with respect to writes. So anything after remove wouldn't bleed into it, nor would anything before. In the case of this series, it would let you remove the smp_wmb() after tb_mark_invalid(). However, it's also okay to only handle qht_insert() and qht_lookup(), and keep the memory barrier after the TB is marked invalid (though it is unnecessary). Paolo > As of qht_insert() and qht_lookup(), I agree and this is enough > guarantees for the series. > > Thanks, > Sergey > >> * Returns true on success. >> * Returns false if the @p-@hash pair was not found. >> */ >> diff --git a/util/qht.c b/util/qht.c >> index 40d6e21..d38948e 100644 >> --- a/util/qht.c >> +++ b/util/qht.c >> @@ -445,7 +445,11 @@ void *qht_do_lookup(struct qht_bucket *head, >> qht_lookup_func_t func, >> do { >> for (i = 0; i < QHT_BUCKET_ENTRIES; i++) { >> if (b->hashes[i] == hash) { >> -void *p = atomic_read(>pointers[i]); >> +/* The pointer is dereferenced before seqlock_read_retry, >> + * so (unlike qht_insert__locked) we need to use >> + * atomic_rcu_read here. >> + */ >> +void *p = atomic_rcu_read(>pointers[i]); >> >> if (likely(p) && likely(func(p, userp))) { >> return p; >> @@ -535,6 +539,7 @@ static bool qht_insert__locked(struct qht *ht, struct >> qht_map *map, >> atomic_rcu_set(>next, b); >> } >> b->hashes[i] = hash; >> +/* smp_wmb() implicit in seqlock_write_begin. */ >> atomic_set(>pointers[i], p); >> seqlock_write_end(>sequence); >> return true; >> @@ -659,6 +664,9 @@ bool qht_remove__locked(struct qht_map *map, struct >> qht_bucket *head, >> } >> if (q == p) { >> qht_debug_assert(b->hashes[i] == hash); >> +/* seqlock_write_begin and seqlock_write_end provide write >> + * memory barrier semantics to callers of qht_remove. >> + */ >> seqlock_write_begin(>sequence); >> qht_bucket_remove_entry(b, i); >> seqlock_write_end(>sequence); >
[Qemu-devel] [kvm-unit-tests PATCH v3 10/10] arm/arm64: gic: don't just use zero
Allow user to select who sends ipis and with which irq, rather than just always sending irq=0 from cpu0. Signed-off-by: Andrew Jones--- v2: actually check that the irq received was the irq sent, and (for gicv2) that the sender is the expected one. --- arm/gic.c | 80 ++- 1 file changed, 64 insertions(+), 16 deletions(-) diff --git a/arm/gic.c b/arm/gic.c index fc7ef241de3e2..d3ab97d4ae470 100644 --- a/arm/gic.c +++ b/arm/gic.c @@ -11,6 +11,7 @@ * This work is licensed under the terms of the GNU LGPL, version 2. */ #include +#include #include #include #include @@ -33,6 +34,8 @@ static struct gic *gic; static int gic_version; static int acked[NR_CPUS], spurious[NR_CPUS]; static cpumask_t ready; +static int sender; +static u32 irq; static void nr_cpu_check(int nr) { @@ -85,7 +88,16 @@ static void check_acked(cpumask_t *mask) static u32 gicv2_read_iar(void) { - return readl(gicv2_cpu_base() + GIC_CPU_INTACK); + u32 iar = readl(gicv2_cpu_base() + GIC_CPU_INTACK); + int src = (iar >> 10) & 7; + + if (src != sender) { + report("cpu%d received IPI from unexpected source cpu%d " + "(expected cpu%d)", + false, smp_processor_id(), src, sender); + } + + return iar & 0x3ff; } static void gicv2_write_eoi(u32 irq) @@ -99,9 +111,15 @@ static void ipi_handler(struct pt_regs *regs __unused) if (iar != GICC_INT_SPURIOUS) { gic->write_eoi(iar); - smp_rmb(); /* pairs with wmb in ipi_test functions */ - ++acked[smp_processor_id()]; - smp_wmb(); /* pairs with rmb in check_acked */ + if (iar == irq) { + smp_rmb(); /* pairs with wmb in ipi_test functions */ + ++acked[smp_processor_id()]; + smp_wmb(); /* pairs with rmb in check_acked */ + } else { + report("cpu%d received unexpected irq %u " + "(expected %u)", + false, smp_processor_id(), iar, irq); + } } else { ++spurious[smp_processor_id()]; smp_wmb(); @@ -110,19 +128,19 @@ static void ipi_handler(struct pt_regs *regs __unused) static void gicv2_ipi_send_self(void) { - writel(2 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT); + writel(2 << 24 | irq, gicv2_dist_base() + GIC_DIST_SOFTINT); } static void gicv2_ipi_send_tlist(cpumask_t *mask) { u8 tlist = (u8)cpumask_bits(mask)[0]; - writel(tlist << 16, gicv2_dist_base() + GIC_DIST_SOFTINT); + writel(tlist << 16 | irq, gicv2_dist_base() + GIC_DIST_SOFTINT); } static void gicv2_ipi_send_broadcast(void) { - writel(1 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT); + writel(1 << 24 | irq, gicv2_dist_base() + GIC_DIST_SOFTINT); } #define ICC_SGI1R_AFFINITY_1_SHIFT 16 @@ -165,7 +183,7 @@ static void gicv3_ipi_send_tlist(cpumask_t *mask) sgi1r = (MPIDR_TO_SGI_AFFINITY(cluster_id, 3) | MPIDR_TO_SGI_AFFINITY(cluster_id, 2) | -/* irq << 24 | */ +irq << 24 | MPIDR_TO_SGI_AFFINITY(cluster_id, 1) | tlist); @@ -187,7 +205,7 @@ static void gicv3_ipi_send_self(void) static void gicv3_ipi_send_broadcast(void) { - gicv3_write_sgi1r(1ULL << 40); + gicv3_write_sgi1r(1ULL << 40 | irq << 24); isb(); } @@ -199,7 +217,7 @@ static void ipi_test_self(void) memset(acked, 0, sizeof(acked)); smp_wmb(); cpumask_clear(); - cpumask_set_cpu(0, ); + cpumask_set_cpu(smp_processor_id(), ); gic->ipi.send_self(); check_acked(); report_prefix_pop(); @@ -214,7 +232,7 @@ static void ipi_test_smp(void) memset(acked, 0, sizeof(acked)); smp_wmb(); cpumask_copy(, _present_mask); - for (i = 0; i < nr_cpus; i += 2) + for (i = smp_processor_id() & 1; i < nr_cpus; i += 2) cpumask_clear_cpu(i, ); gic->ipi.send_tlist(); check_acked(); @@ -224,7 +242,7 @@ static void ipi_test_smp(void) memset(acked, 0, sizeof(acked)); smp_wmb(); cpumask_copy(, _present_mask); - cpumask_clear_cpu(0, ); + cpumask_clear_cpu(smp_processor_id(), ); gic->ipi.send_broadcast(); check_acked(); report_prefix_pop(); @@ -241,6 +259,15 @@ static void ipi_enable(void) local_irq_enable(); } +static void ipi_send(void) +{ + ipi_enable(); + wait_on_ready(); + ipi_test_self(); + ipi_test_smp(); + exit(report_summary()); +} + static void ipi_recv(void) { ipi_enable(); @@ -300,19 +327,40 @@ int
[Qemu-devel] [kvm-unit-tests PATCH v3 09/10] arm/arm64: gicv3: add an IPI test
Signed-off-by: Andrew Jones--- v2: use IRM for gicv3 broadcast --- arm/gic.c | 157 ++ arm/unittests.cfg | 6 +++ 2 files changed, 154 insertions(+), 9 deletions(-) diff --git a/arm/gic.c b/arm/gic.c index cf7ec1c90413c..fc7ef241de3e2 100644 --- a/arm/gic.c +++ b/arm/gic.c @@ -3,6 +3,8 @@ * * GICv2 * . test sending/receiving IPIs + * GICv3 + * . test sending/receiving IPIs * * Copyright (C) 2016, Red Hat Inc, Andrew Jones * @@ -16,6 +18,18 @@ #include #include +struct gic { + struct { + void (*enable)(void); + void (*send_self)(void); + void (*send_tlist)(cpumask_t *); + void (*send_broadcast)(void); + } ipi; + u32 (*read_iar)(void); + void (*write_eoi)(u32); +}; + +static struct gic *gic; static int gic_version; static int acked[NR_CPUS], spurious[NR_CPUS]; static cpumask_t ready; @@ -69,12 +83,22 @@ static void check_acked(cpumask_t *mask) false, missing, extra, unexpected); } +static u32 gicv2_read_iar(void) +{ + return readl(gicv2_cpu_base() + GIC_CPU_INTACK); +} + +static void gicv2_write_eoi(u32 irq) +{ + writel(irq, gicv2_cpu_base() + GIC_CPU_EOI); +} + static void ipi_handler(struct pt_regs *regs __unused) { - u32 iar = readl(gicv2_cpu_base() + GIC_CPU_INTACK); + u32 iar = gic->read_iar(); if (iar != GICC_INT_SPURIOUS) { - writel(iar, gicv2_cpu_base() + GIC_CPU_EOI); + gic->write_eoi(iar); smp_rmb(); /* pairs with wmb in ipi_test functions */ ++acked[smp_processor_id()]; smp_wmb(); /* pairs with rmb in check_acked */ @@ -84,6 +108,89 @@ static void ipi_handler(struct pt_regs *regs __unused) } } +static void gicv2_ipi_send_self(void) +{ + writel(2 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT); +} + +static void gicv2_ipi_send_tlist(cpumask_t *mask) +{ + u8 tlist = (u8)cpumask_bits(mask)[0]; + + writel(tlist << 16, gicv2_dist_base() + GIC_DIST_SOFTINT); +} + +static void gicv2_ipi_send_broadcast(void) +{ + writel(1 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT); +} + +#define ICC_SGI1R_AFFINITY_1_SHIFT 16 +#define ICC_SGI1R_AFFINITY_2_SHIFT 32 +#define ICC_SGI1R_AFFINITY_3_SHIFT 48 +#define MPIDR_TO_SGI_AFFINITY(cluster_id, level) \ + (MPIDR_AFFINITY_LEVEL(cluster_id, level) << ICC_SGI1R_AFFINITY_## level ## _SHIFT) + +static void gicv3_ipi_send_tlist(cpumask_t *mask) +{ + u16 tlist; + int cpu; + + for_each_cpu(cpu, mask) { + u64 mpidr = cpus[cpu], sgi1r; + u64 cluster_id = mpidr & ~0xffUL; + + tlist = 0; + + while (cpu < nr_cpus) { + if ((mpidr & 0xff) >= 16) { + printf("cpu%d MPIDR:aff0 is %d (>= 16)!\n", + cpu, (int)(mpidr & 0xff)); + break; + } + + tlist |= 1 << (mpidr & 0xf); + + cpu = cpumask_next(cpu, mask); + if (cpu >= nr_cpus) + break; + + mpidr = cpus[cpu]; + + if (cluster_id != (mpidr & ~0xffUL)) { + --cpu; + break; + } + } + + sgi1r = (MPIDR_TO_SGI_AFFINITY(cluster_id, 3) | +MPIDR_TO_SGI_AFFINITY(cluster_id, 2) | +/* irq << 24 | */ +MPIDR_TO_SGI_AFFINITY(cluster_id, 1) | +tlist); + + gicv3_write_sgi1r(sgi1r); + } + + /* Force the above writes to ICC_SGI1R_EL1 to be executed */ + isb(); +} + +static void gicv3_ipi_send_self(void) +{ + cpumask_t mask; + + cpumask_clear(); + cpumask_set_cpu(smp_processor_id(), ); + gicv3_ipi_send_tlist(); +} + +static void gicv3_ipi_send_broadcast(void) +{ + gicv3_write_sgi1r(1ULL << 40); + isb(); +} + static void ipi_test_self(void) { cpumask_t mask; @@ -93,7 +200,7 @@ static void ipi_test_self(void) smp_wmb(); cpumask_clear(); cpumask_set_cpu(0, ); - writel(2 << 24, gicv2_dist_base() + GIC_DIST_SOFTINT); + gic->ipi.send_self(); check_acked(); report_prefix_pop(); } @@ -101,14 +208,15 @@ static void ipi_test_self(void) static void ipi_test_smp(void) { cpumask_t mask; - unsigned long tlist; + int i; report_prefix_push("target-list"); memset(acked, 0, sizeof(acked)); smp_wmb(); - tlist = cpumask_bits(_present_mask)[0] & 0xaa; - cpumask_bits()[0] = tlist; - writel((u8)tlist << 16,
Re: [Qemu-devel] [Xen-devel] Regression with commit 095497ffc66b7f031
On 15/07/16 14:42, Paolo Bonzini wrote: > > > On 15/07/2016 12:41, Juergen Gross wrote: >> On 15/07/16 12:35, Paolo Bonzini wrote: >>> >>> >>> On 15/07/2016 12:12, Gerd Hoffmann wrote: On Fr, 2016-07-15 at 12:02 +0200, Paolo Bonzini wrote: > > On 15/07/2016 10:47, Juergen Gross wrote: >> Nothing scaring and no real difference between working and not working >> variant. >> >> Meanwhile I've been digging a little bit deeper and found the reason: >> libxenstore is setting up a reader thread which is waiting for the >> watch to fire. With above commit the stack size of that thread (16kB) >> is too small. Setting it to 32kB made qemu work again. > > This makes very little sense (not your fault)... The commit doesn't > change stack usage at all, TLS should not be on the stack. > > Can you capture a backtrace where the 16K stack is exceeded? Perhaps > it's only due to inlining decision on the compiler, in which case > Peter's patch from today is only a bandaid. Hmm, I guess I hold off the vnc pull request for now ... >>> >>> Go ahead. I looked at glibc source code and the patch is okay. >> >> Paolo, do you know of any interface to obtain the size of the TLS area >> taken from the stack (before calling pthread_create() )? > > https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01643.html has a patch > that _removes_ code to do this from the golang runtime. The comments > there say that only with glibc before version 2.16 the static TLS size > is taken out of the stack size... > > What version of glibc are you using? 2.19. But according to: https://sourceware.org/bugzilla/show_bug.cgi?id=11787 the issue is still present today. Juergen
Re: [Qemu-devel] [PATCH v4 11/11] nbd-server: Allow node name for nbd-server-add
On 14.07.2016 23:36, Eric Blake wrote: > On 07/14/2016 07:28 AM, Kevin Wolf wrote: >> There is no reason why an NBD server couldn't be started for any node, >> even if it's not on the top level. This converts nbd-server-add to >> accept a node-name. >> >> Note that there is a semantic difference between using a BlockBackend >> name and the node name of its root: In the former case, the NBD server >> is closed on eject; in the latter case, the NBD server doesn't drop its >> reference and keeps the image file open this way. >> >> Signed-off-by: Kevin Wolf>> --- >> blockdev-nbd.c | 21 + >> qapi/block.json | 4 ++-- >> 2 files changed, 11 insertions(+), 14 deletions(-) >> >> diff --git a/blockdev-nbd.c b/blockdev-nbd.c >> index c437d32..ca41cc6 100644 >> --- a/blockdev-nbd.c >> +++ b/blockdev-nbd.c >> @@ -145,7 +145,8 @@ void qmp_nbd_server_start(SocketAddress *addr, >> void qmp_nbd_server_add(const char *device, bool has_writable, bool >> writable, >> Error **errp) >> { >> -BlockBackend *blk; >> +BlockDriverState *bs = NULL; >> +BlockBackend *on_eject_blk; >> NBDExport *exp; >> >> if (!nbd_server) { >> @@ -158,26 +159,22 @@ void qmp_nbd_server_add(const char *device, bool >> has_writable, bool writable, >> return; > > Do we want to do any sanity checking that writing should only be > permitted on a root, and that when using a node name that is not a root > that writable must be false so as not to negatively change the BDS out > of under the feet of the other root? Do op-blockers already cover that? Well, one could argue that it's possible to create an NBD server on a non-root node today anyway, since creating BBs is not restricted to root nodes: blockdev-add(id=foo, other arguments...) blockdev-add(id=bar, backing=foo, other arguments...) And then you can create an NBD server on bar. I agree that this is not how it should be, though. However, I think that the fact that you need to specify a BB name for now deters people from doing stuff like that. If you can specify a node name, people will think it's completely fine to do so. Also note that only allowing NBD servers to be created on a root node doesn't really help you: blockdev-add(node-name=foo, ...) nbd-server-add(device=foo) blockdev-add(id=bar, backing=foo, ...) So, yeah, I think we just need the new op-blockers for this, I don't think the current op blockers cover this. Max signature.asc Description: OpenPGP digital signature
[Qemu-devel] [PATCH v7 2/4] ide: set retry_unit for PIO and FLUSH requests
From: Evgeny YakovlevThe following sequence of tests discovered a problem in IDE emulation: 1. Send DMA write to IDE device 0 2. Send CMD_FLUSH_CACHE to same IDE device which will be failed by block layer using blkdebug script in tests/ide-test:test_retry_flush When doing DMA request ide/core.c will set s->retry_unit to s->unit in ide_start_dma. When dma completes ide_set_inactive sets retry_unit to -1. After that ide_flush_cache runs and fails thanks to blkdebug. ide_flush_cb calls ide_handle_rw_error which asserts that s->retry_unit == s->unit. But s->retry_unit is still -1 after previous DMA completion and flush does not use anything related to retry. This patch restricts retry unit assertion only to ops that actually use retry logic. Signed-off-by: Evgeny Yakovlev Signed-off-by: Denis V. Lunev Reviewed-by: Paolo Bonzini CC: Kevin Wolf CC: Max Reitz CC: Stefan Hajnoczi CC: Fam Zheng CC: John Snow --- hw/ide/core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/ide/core.c b/hw/ide/core.c index b72346e..14f03d2 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -487,6 +487,7 @@ void ide_transfer_start(IDEState *s, uint8_t *buf, int size, s->end_transfer_func = end_transfer_func; s->data_ptr = buf; s->data_end = buf + size; +ide_set_retry(s); if (!(s->status & ERR_STAT)) { s->status |= DRQ_STAT; } @@ -1056,6 +1057,7 @@ static void ide_flush_cache(IDEState *s) } s->status |= BUSY_STAT; +ide_set_retry(s); block_acct_start(blk_get_stats(s->blk), >acct, 0, BLOCK_ACCT_FLUSH); s->pio_aiocb = blk_aio_flush(s->blk, ide_flush_cb, s); } -- 2.1.4
[Qemu-devel] [PATCH 0/3] linux-user: fix corner cases in sendrecvmsg
These patches fix some bugs in handling corner cases of sendrecvmsg; this allows us to pass the LTP 'recvmsg01' test case. thanks -- PMM Peter Maydell (3): linux-user: Fix handling of iovec counts linux-user: Fix errno for sendrecvmsg with large iovec length linux-user: Allow bad msg_name for recvfrom on connected socket linux-user/syscall.c | 28 ++-- 1 file changed, 22 insertions(+), 6 deletions(-) -- 1.9.1
Re: [Qemu-devel] [PATCH] qmp: add support for mixed typed input visitor
Eric Blakewrites: > On 07/14/2016 08:39 AM, Daniel P. Berrange wrote: >> On Thu, Jul 14, 2016 at 08:23:18AM -0600, Eric Blake wrote: >>> On 07/14/2016 08:16 AM, Daniel P. Berrange wrote: Add a qmp_mixed_input_visitor_new() method which returns a QMP input visitor that accepts either strings or the native data types. > > Question: do we want to allow: "key":1 when the QAPI is written > 'key':'str'? Your current patches allow the converse (allowing > "key":"1" when the QAPI is written 'key':'int'). To allow native types > to be consumed in mixed-mode where string is expected would require yet > another method for deciding how to handle non-strings in > v->visitor.type_str. Where it might be useful is in SocketAddress > parsing, in particular where InetSocketAddress.port is currently 'str' > but where it often takes an integer port number in addition to a string > for a named port alias; callers currently have to pass a stringized > integer, where mixed mode might make it easier to fudge things. I think we shouldn't do that. Let me explain why. The QMP input visitor is designed for QMP input. Works like this: the JSON parser converts a JSON value to a QObject, and the QMP input visitor converts the QObject to a QAPI object. Observations: * In JSON, types are obvious. The JSON parser's conversion to QObject is mostly straightforward: JSON object becomes QDict, array becomes QList, string becomes QString, false and true become QBool, null becomes qnull(). The only complication is JSON number, which becomes either QInt or QFloat, depending on its value. * QInt represents int64_t. Integers outside its range become QFloat. In particular, INT64_MAX+1..UINT64_MAX become QFloat. * The QMP input visitor checks the QObject's type matches the QAPI object's type. For object and array, this is recursive. To compensate for the split of JSON number into QInt and QFloat, it accepts both for QAPI type 'number'. * Despite its name, the QMP input visitor doesn't visit QMP, it visits a QObject. Makes it useful in other contexts, such as QemuOpts input. QemuOpts input works like this: the QemuOpts parser converts a key=value,... string to a QemuOpts, qemu_opts_to_qdict() converts to a QObject, and the QMP input visitor converts to a QAPI object. Observations: * In the key=value,... string, types are ambiguous. The QemuOpts parser disambiguates to bool, uint64_t, char * when given an option description (opts->list->desc[] not empty), else it treats all values as strings. Even when it disambiguates, it retains the string value. * QemuOpts that are ultimately fed to the QMP input visitor typically have no option description. Even if they have one, the types are thrown away: qemu_opts_to_qdict() works with the strings. * Since all scalars are strings, the QMP input visitor's type checking will fail when it runs into a scalar QAPI type other than str. This is the problem Dan needs solved. Let's compare the two pipelines JSON -> QObject -> QAPI object and key=value,... -> QObject -> QAPI object. Their first stages are conceptually similar, and the remaining stages are identical. The difference of interest here is how they pick QObject types: * The JSON pipeline picks in its first stage. * The QemuOpts pipeline can't do that, but to be able to reuse the rest of the pipeline, it arbitrarily picks QString. Good enough for its intial uses. Not good for the uses we need to support now. I believe we should change the QemuOpts pipeline to resolve types in its last stage. This requires a different input visitor: one that resolves types rather than checking them. I guess this is basically what Dan's "[PATCH v7 3/7] qapi: add a QmpInputVisitor that does string conversion" does. It's even less a *QMP* input visitor than the original, but let's not worry about that now. Now it gets ugly. A long time ago, under a lot of pressure to have QMP reach feature parity with HMP, I shoehorned device_add into QMP. QAPI didn't exist back then, so the pipeline was just JSON -> QObject. I added a -> QemuOpts stage, done by qemu_opts_from_qdict(), so I could use the existing qdev_device_add() unmodified. Despite such shortcuts, it took me ~50 patches (commit 0aef426..8bc27249). I've regretted it ever since. This JSON -> QObject -> QemuOpts pipeline is problematic: its second stage throws away all type information. It preserves JSON string values, but anything else gets converted to a string. Which may, if you're lucky, even resemble your JSON token string. Examples: JSONQemuOpts value of key "foo" "foo": "abracadabra" "abracadabra" "foo": -1 "-1" "foo": 1.024e3 "1024" "foo": 9223372036854775808 "9.2233720368547758e+18" "foo": 6.022140857e23 "6.022140857002e+23" "foo": false "off" "foo": null
Re: [Qemu-devel] [Qemu-block] [PATCH] AioContext: correct comments
On 15.07.2016 11:44, Cao jin wrote: > Correct comments of field notify_me > > Cc: Kevin Wolf> Cc: Max Reitz > Signed-off-by: Cao jin > --- > include/block/aio.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/block/aio.h b/include/block/aio.h > index 0922b69..f89a1df 100644 > --- a/include/block/aio.h > +++ b/include/block/aio.h > @@ -71,7 +71,7 @@ struct AioContext { > * event_notifier_set necessary. > * > * Bit 0 is reserved for GSource usage of the AioContext, and is 1 > - * between a call to aio_ctx_check and the next call to aio_ctx_dispatch. > + * between a call to aio_ctx_prepare and the next call to aio_ctx_check. > * Bits 1-31 simply count the number of active calls to aio_poll > * that are in the prepare or poll phase. > * Reviewed-by: Max Reitz CC-ing Stefan and Fam, since I think this file should actually be under their maintainership (should we add it to MAINTAINERS?). Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v3 01/11] util/qht: Document memory ordering assumptions
On 15/07/16 15:51, Paolo Bonzini wrote: > > On 15/07/2016 14:37, Sergey Fedorov wrote: >> I understand why an implied wmb can be expected after the entry is >> removed: so that the caller can trash the contents of the object >> removed. However that would require double-check on lookup side to make >> sure the entry hadn't been removed after the first lookup (something >> like seqlock read side). But I have no idea why we might like to promise >> wmb before the removal. Could you please share your point regarding this? > The reasoning was to make qht_remove() "look to be ordered" with respect > to writes. So anything after remove wouldn't bleed into it, nor would > anything before. > > In the case of this series, it would let you remove the smp_wmb() after > tb_mark_invalid(). However, it's also okay to only handle qht_insert() > and qht_lookup(), and keep the memory barrier after the TB is marked > invalid (though it is unnecessary). > I'm pretty sure the smp_wmb() after tb_mark_invalid() is unnecessary anyway. We don't rely on it at all because we're to recheck for tb_is_invalid() under tb_lock before tb_add_jump(). However we have to invalidate the CPU state atomically since we're going to check for it out of tb_lock. Kind regards, Sergey
[Qemu-devel] [PATCH v7 3/4] tests: in IDE and AHCI tests perform DMA write before flushing
From: Evgeny YakovlevDue to changes in flush behaviour clean disks stopped generating flush_to_disk events and IDE and AHCI tests that test flush commands started to fail. This change adds additional DMA writes to affected tests before sending flush commands so that bdrv_flush actually generates flush_to_disk event. Signed-off-by: Evgeny Yakovlev Signed-off-by: Denis V. Lunev Reviewed-by: Paolo Bonzini CC: Kevin Wolf CC: Max Reitz CC: Stefan Hajnoczi CC: Fam Zheng CC: John Snow --- tests/ahci-test.c | 34 -- tests/ide-test.c | 43 +++ 2 files changed, 75 insertions(+), 2 deletions(-) diff --git a/tests/ahci-test.c b/tests/ahci-test.c index 57dc44c..9c0adce 100644 --- a/tests/ahci-test.c +++ b/tests/ahci-test.c @@ -1063,11 +1063,34 @@ static void test_dma_fragmented(void) g_free(tx); } +/* + * Write sector 1 with random data to make AHCI storage dirty + * Needed for flush tests so that flushes actually go though the block layer + */ +static void make_dirty(AHCIQState* ahci, uint8_t port) +{ +uint64_t ptr; +unsigned bufsize = 512; + +ptr = ahci_alloc(ahci, bufsize); +g_assert(ptr); + +ahci_guest_io(ahci, port, CMD_WRITE_DMA, ptr, bufsize, 1); +ahci_free(ahci, ptr); +} + static void test_flush(void) { AHCIQState *ahci; +uint8_t port; ahci = ahci_boot_and_enable(NULL); + +port = ahci_port_select(ahci); +ahci_port_clear(ahci, port); + +make_dirty(ahci, port); + ahci_test_flush(ahci); ahci_shutdown(ahci); } @@ -1087,10 +1110,13 @@ static void test_flush_retry(void) debug_path, tmp_path, imgfmt); -/* Issue Flush Command and wait for error */ port = ahci_port_select(ahci); ahci_port_clear(ahci, port); +/* Issue write so that flush actually goes to disk */ +make_dirty(ahci, port); + +/* Issue Flush Command and wait for error */ cmd = ahci_guest_io_halt(ahci, port, CMD_FLUSH_CACHE, 0, 0, 0); ahci_guest_io_resume(ahci, cmd); @@ -1343,9 +1369,13 @@ static void test_flush_migrate(void) set_context(src->parent); -/* Issue Flush Command */ px = ahci_port_select(src); ahci_port_clear(src, px); + +/* Dirty device so that flush reaches disk */ +make_dirty(src, px); + +/* Issue Flush Command */ cmd = ahci_command_create(CMD_FLUSH_CACHE); ahci_command_commit(src, cmd, px); ahci_command_issue_async(src, cmd); diff --git a/tests/ide-test.c b/tests/ide-test.c index fed1b2e..8466d0f 100644 --- a/tests/ide-test.c +++ b/tests/ide-test.c @@ -499,6 +499,39 @@ static void test_identify(void) ide_test_quit(); } +/* + * Write sector 1 with random data to make IDE storage dirty + * Needed for flush tests so that flushes actually go though the block layer + */ +static void make_dirty(uint8_t device) +{ +uint8_t status; +size_t len = 512; +uintptr_t guest_buf; +void* buf; + +guest_buf = guest_alloc(guest_malloc, len); +buf = g_malloc(len); +g_assert(guest_buf); +g_assert(buf); + +memwrite(guest_buf, buf, len); + +PrdtEntry prdt[] = { +{ +.addr = cpu_to_le32(guest_buf), +.size = cpu_to_le32(len | PRDT_EOT), +}, +}; + +status = send_dma_request(CMD_WRITE_DMA, 1, 1, prdt, + ARRAY_SIZE(prdt), NULL); +g_assert_cmphex(status, ==, BM_STS_INTR); +assert_bit_clear(inb(IDE_BASE + reg_status), DF | ERR); + +g_free(buf); +} + static void test_flush(void) { uint8_t data; @@ -507,6 +540,11 @@ static void test_flush(void) "-drive file=blkdebug::%s,if=ide,cache=writeback,format=raw", tmp_path); +qtest_irq_intercept_in(global_qtest, "ioapic"); + +/* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */ +make_dirty(0); + /* Delay the completion of the flush request until we explicitly do it */ g_free(hmp("qemu-io ide0-hd0 \"break flush_to_os A\"")); @@ -549,6 +587,11 @@ static void test_retry_flush(const char *machine) "rerror=stop,werror=stop", debug_path, tmp_path); +qtest_irq_intercept_in(global_qtest, "ioapic"); + +/* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */ +make_dirty(0); + /* FLUSH CACHE command on device 0*/ outb(IDE_BASE + reg_device, 0); outb(IDE_BASE + reg_command, CMD_FLUSH_CACHE); -- 2.1.4
Re: [Qemu-devel] [PATCH] linux-aio: keep processing events if MAX_EVENTS reached
On Tue, Jul 12, 2016 at 04:12:42PM +0200, Roman Penyaev wrote: > On Tue, Jun 28, 2016 at 11:42 AM, Stefan Hajnocziwrote: > > On Mon, Jun 27, 2016 at 08:36:19PM +0200, Roman Penyaev wrote: > >> On Jun 27, 2016 6:37 PM, "Stefan Hajnoczi" wrote: > >> > > >> > Commit ccb9dc10129954d0bcd7814298ed445e684d5a2a ("linux-aio: Cancel BH > >> > if not needed") exposed a side-effect of scheduling the BH for nested > >> > event loops. When io_getevents(2) fetches MAX_EVENTS events then it's > >> > necessary to call it again. Failure to do so can lead to hung I/O > >> > because the eventfd has already been cleared and the completion BH will > >> > not run again. > >> > > >> > This patch fixes the hang by calling io_getevents(2) again if the events > >> > array was totally full. > >> > > >> > Reported-by: Roman Penyaev > >> > Signed-off-by: Stefan Hajnoczi > >> > --- > >> > block/linux-aio.c | 5 + > >> > 1 file changed, 5 insertions(+) > >> > > >> > diff --git a/block/linux-aio.c b/block/linux-aio.c > >> > index e468960..af15f85 100644 > >> > --- a/block/linux-aio.c > >> > +++ b/block/linux-aio.c > >> > @@ -117,6 +117,7 @@ static void qemu_laio_completion_bh(void *opaque) > >> > LinuxAioState *s = opaque; > >> > > >> > /* Fetch more completion events when empty */ > >> > +more_events: > >> > if (s->event_idx == s->event_max) { > >> > do { > >> > struct timespec ts = { 0 }; > >> > @@ -146,6 +147,10 @@ static void qemu_laio_completion_bh(void *opaque) > >> > qemu_laio_process_completion(laiocb); > >> > } > >> > > >> > +if (s->event_idx == MAX_EVENTS) { > >> > +goto more_events; /* there might still be events waiting for us > >> */ > >> > +} > >> > + > >> > >> I am on vacation and can't check anything, but seems you also > >> could reproduce an issue and seems you are right: what I've > >> also noticed is that io hangs only if total number of requests > >> in guest > than 128 (what is defined in linux-aio.c). > >> > >> But I do not understand why you have to repeat io_getevents in > >> case of return value == MAX_EVENTS. According to my > >> understanding you cant submit more than 128, so the first > >> io_getevents call returns you exactly what you've submitted. > > > > Hi Roman, > > There is nothing like discussions on qemu-devel from the beach. True > > vacation! > > > > The issue is: > > > > 1. s->events[] is only MAX_EVENTS large so each io_getevents() call can > >only fetch that maximum number of events. > > > > 2. qemu_laio_completion_cb() clears the eventfd so the event loop will > >not call us again (unless additional I/O requests complete). > > > > Therefore, returning from qemu_laio_completion_bh() without having > > processed all events causes a hang. > > Hi Stefan, > > The issue is clear now. The thing is that I had an assumption, that we > never submit more than MAX_EVENTS. Now I see that according to the > ioq_submit() this is not true, so we can have inflights > MAX_EVENTS. > > Frankly, that seems a bug to me, because we promise the kernel [when > we call io_setup(MAX_EVENTS)] not to submit more than specified value. > Kernel allocates ring buffer aligned to the page size, also does some > compensations to have enough free requests for each CPU, so the final > io events number *can be* > than we have requested. So eventually > kernel can "swallow" more events than MAX_EVENTS. You are right. QEMU shouldn't exceed MAX_EVENTS in-flight requests, even though the kernel overallocates resources so we don't see an error (yet). Your patch makes linux-aio.c more correct overall, so let's take that. > I did the following patch (at the bottom of this email), which restricts > submission more than MAX_EVENTS and the interesting thing, that it works > faster than your current fix for this hang: > > my setup: 1 iothread, VCPU=8, MQ=8 > > your current patch > "linux-aio: keep processing events if MAX_EVENTS reached": > >READ: io=48199MB, aggrb=1606.5MB/s, minb=1606.5MB/s, > maxb=1606.5MB/s, mint=30003msec, maxt=30003msec > WRITE: io=48056MB, aggrb=1601.8MB/s, minb=1601.8MB/s, > maxb=1601.8MB/s, mint=30003msec, maxt=30003msec > > mine changes: > >READ: io=53294MB, aggrb=1776.3MB/s, minb=1776.3MB/s, > maxb=1776.3MB/s, mint=30003msec, maxt=30003msec > WRITE: io=53177MB, aggrb=1772.4MB/s, minb=1772.4MB/s, > maxb=1772.4MB/s, mint=30003msec, maxt=30003msec > > But what is the most important thing here is, that reverting > "linux-aio: Cancel BH if not needed" brings these numbers: > >READ: io=56362MB, aggrb=1878.4MB/s, minb=1878.4MB/s, > maxb=1878.4MB/s, mint=30007msec, maxt=30007msec > WRITE: io=56255MB, aggrb=1874.8MB/s, minb=1874.8MB/s, > maxb=1874.8MB/s, mint=30007msec, maxt=30007msec > > So, it seems to me that "linux-aio: Cancel BH if not needed" introduces > regression. Unfortunately the patch does two
Re: [Qemu-devel] [PATCH v2 0/2] trace: [*-user] Add commandline arguments to control tracing
On Wed, Jun 22, 2016 at 12:04:30PM +0200, Lluís Vilanova wrote: > Adds three commandline arguments to the main *-user programs, following what's > already available in softmmu: > > * -trace-enable > * -trace-events > * -trace-file > > > Changes in v2 > = > > * Tell user to use 'help' instead of '?' [Eric Blake]. > * Remove newlines on argument docs for bsd-user [Eric Blake]. > > > Signed-off-by: Lluís Vilanova> --- > > Lluís Vilanova (2): > trace: [linux-user] Commandline arguments to control tracing > trace: [bsd-user] Commandline arguments to control tracing > > > bsd-user/main.c | 19 +++ > linux-user/main.c | 28 > 2 files changed, 47 insertions(+) Hi Lluís, Commit e9e0bb2af2248eabafb54402e3127f9f8a8690f5 ("trace: move qemu_trace_opts to trace/control.c") made trace_events_init() static. This conflicts with your patch series. I suggest changing this series to use -trace ... just like qemu/qemu-img/qemu-nbd. Stefan signature.asc Description: PGP signature
[Qemu-devel] [RFC PATCH 0/4] translate: [tcg] Generic translation framework
This series proposes a generic (target-agnostic) instruction translation framework. It basically provides a generic main loop for instruction disassembly, which calls target-specific functions when necessary. This generalization makes inserting new code in the main loop easier, and helps in keeping all targets in synch as to the contents of it. I've only ported i386 as an example to get some feedback, but I'm planning on porting ARM next to see how well it fits into the current organization. Signed-off-by: Lluís Vilanova--- Lluís Vilanova (4): Pass generic CPUState to gen_intermediate_code() queue: Add macro for incremental traversal target: [tcg] Add generic translation framework target: [tcg,i386] Port to generic translation framework include/exec/exec-all.h |2 include/exec/translate-all_template.h | 58 +++ include/qemu/queue.h |5 + include/qom/cpu.h | 21 ++ target-alpha/translate.c | 11 + target-arm/translate.c| 24 +-- target-cris/translate.c | 17 +- target-i386/cpu.h |2 target-i386/translate.c | 290 +++-- target-lm32/translate.c | 22 +-- target-m68k/translate.c | 15 +- target-microblaze/translate.c | 24 +-- target-mips/translate.c | 15 +- target-moxie/translate.c | 14 +- target-openrisc/translate.c | 24 +-- target-ppc/translate.c| 15 +- target-s390x/translate.c | 13 + target-sh4/translate.c| 15 +- target-sparc/translate.c | 11 + target-tilegx/translate.c |7 - target-tricore/translate.c|9 - target-unicore32/translate.c | 17 +- target-xtensa/translate.c | 13 + translate-all.c |2 translate-all_template.h | 160 ++ 25 files changed, 503 insertions(+), 303 deletions(-) create mode 100644 include/exec/translate-all_template.h create mode 100644 translate-all_template.h To: qemu-devel@nongnu.org Cc: Paolo Bonzini Cc: Peter Crosthwaite Cc: Richard Henderson
[Qemu-devel] [PATCH 2/4] queue: Add macro for incremental traversal
Adds macro QTAILQ_FOREACH_CONTINUE to support incremental list traversal. Signed-off-by: Lluís Vilanova--- include/qemu/queue.h |5 + 1 file changed, 5 insertions(+) diff --git a/include/qemu/queue.h b/include/qemu/queue.h index f781aa2..c19f7ee 100644 --- a/include/qemu/queue.h +++ b/include/qemu/queue.h @@ -414,6 +414,11 @@ struct { \ (var); \ (var) = ((var)->field.tqe_next)) +#define QTAILQ_FOREACH_CONTINUE(var, field) \ +for ((var) = ((var)->field.tqe_next); \ +(var); \ +(var) = ((var)->field.tqe_next)) + #define QTAILQ_FOREACH_SAFE(var, head, field, next_var) \ for ((var) = ((head)->tqh_first); \ (var) && ((next_var) = ((var)->field.tqe_next), 1); \
[Qemu-devel] [PATCH] test-logging: don't hard-code paths in /tmp
Since f6880b7f [qemu-log: support simple pid substitution for logs], test-logging creates files with hard-coded names in /tmp. In the best case, this prevents multiple developers from running "make check" on the same machine. In the worst case, it allows for symlink attacks, enabling an attacker to overwrite files that are writable to the developer running "make check". Instead of hard-coding the paths, create a temporary directory using g_dir_make_tmp() and clean it up afterwards. Fixes: f6880b7f ("qemu-log: support simple pid substitution for logs") Signed-off-by: Sascha Silbe--- tests/test-logging.c | 42 +++--- 1 file changed, 35 insertions(+), 7 deletions(-) diff --git a/tests/test-logging.c b/tests/test-logging.c index cdf13c6..faebc35 100644 --- a/tests/test-logging.c +++ b/tests/test-logging.c @@ -86,24 +86,52 @@ static void test_parse_range(void) error_free_or_abort(); } -static void test_parse_path(void) +static void test_parse_path(gconstpointer data) { +gchar const *tmp_path = data; +gchar *plain_path = g_build_filename(tmp_path, "qemu.log", NULL); +gchar *pid_infix_path = g_build_filename(tmp_path, "qemu-%d.log", NULL); +gchar *pid_suffix_path = g_build_filename(tmp_path, "qemu.log.%d", NULL); +gchar *double_pid_path = g_build_filename(tmp_path, "qemu-%d%d.log", NULL); Error *err = NULL; -qemu_set_log_filename("/tmp/qemu.log", _abort); -qemu_set_log_filename("/tmp/qemu-%d.log", _abort); -qemu_set_log_filename("/tmp/qemu.log.%d", _abort); +qemu_set_log_filename(plain_path, _abort); +qemu_set_log_filename(pid_infix_path, _abort); +qemu_set_log_filename(pid_suffix_path, _abort); -qemu_set_log_filename("/tmp/qemu-%d%d.log", ); +qemu_set_log_filename(double_pid_path, ); error_free_or_abort(); + +g_free(double_pid_path); +g_free(pid_suffix_path); +g_free(pid_infix_path); +g_free(plain_path); +} + +static void rmtree(gchar const *root) +{ +/* There should really be a g_rmtree(). Implementing it ourselves + * isn't really worth it just for a test, so let's just use rm. */ +gchar const *rm_args[] = { "rm", "-rf", root, NULL }; +g_spawn_sync(NULL, (gchar **)rm_args, NULL, + G_SPAWN_SEARCH_PATH, NULL, NULL, + NULL, NULL, NULL, NULL); } int main(int argc, char **argv) { +gchar *tmp_path = g_dir_make_tmp("qemu-test-logging.XX", NULL); +int rc; + g_test_init(, , NULL); +g_assert_nonnull(tmp_path); g_test_add_func("/logging/parse_range", test_parse_range); -g_test_add_func("/logging/parse_path", test_parse_path); +g_test_add_data_func("/logging/parse_path", tmp_path, test_parse_path); -return g_test_run(); +rc = g_test_run(); + +rmtree(tmp_path); +g_free(tmp_path); +return rc; } -- 1.9.1
[Qemu-devel] [PATCH] compiler: never omit assertions if using a static analysis tool
Assertions help both Coverity and the clang static analyzer avoid false positives, but on the other hand both are confused when the condition is compiled as (void)(x != FOO). Always expand assertion macros when using Coverity or clang, through a new QEMU_STATIC_ANALYSIS preprocessor symbol. This fixes a couple false positives in TCG. Signed-off-by: Paolo Bonzini--- include/qemu/compiler.h | 3 +++ tcg/tcg.h | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h index b64f899..338d3a6 100644 --- a/include/qemu/compiler.h +++ b/include/qemu/compiler.h @@ -3,6 +3,9 @@ #ifndef COMPILER_H #define COMPILER_H +#if defined __clang_analyzer__ || defined __COVERITY__ +#define QEMU_STATIC_ANALYSIS 1 +#endif /* | The macro QEMU_GNUC_PREREQ tests for minimum version of the GNU C compiler. diff --git a/tcg/tcg.h b/tcg/tcg.h index 66ae0c7..6046dcd 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -191,7 +191,7 @@ typedef uint64_t tcg_insn_unit; #endif -#ifdef CONFIG_DEBUG_TCG +#if defined CONFIG_DEBUG_TCG || defined QEMU_STATIC_ANALYSIS # define tcg_debug_assert(X) do { assert(X); } while (0) #elif QEMU_GNUC_PREREQ(4, 5) # define tcg_debug_assert(X) \ -- 2.7.4
Re: [Qemu-devel] QOM: best way for parents to pass information to children? (was Re: [PATCH RFC 07/16] qom/cpu: make nr-cores, nr-threads real properties)
Am 15.07.2016 um 18:10 schrieb Eduardo Habkost: > On Fri, Jul 15, 2016 at 11:11:38AM +0200, Igor Mammedov wrote: >> On Fri, 15 Jul 2016 08:35:30 +0200 >> Andrew Joneswrote: >>> On Thu, Jul 14, 2016 at 05:07:43PM -0300, Eduardo Habkost wrote: First of all, sorry for the horrible delay in replying to this thread. On Wed, Jun 15, 2016 at 10:56:20AM +1000, David Gibson wrote: > On Tue, Jun 14, 2016 at 08:19:49AM +0200, Andrew Jones wrote: >> On Tue, Jun 14, 2016 at 12:12:16PM +1000, David Gibson wrote: >>> On Sun, Jun 12, 2016 at 03:48:10PM +0200, Andrew Jones wrote: > [...] >> +static Property cpu_common_properties[] = { >> +DEFINE_PROP_INT32("nr-cores", CPUState, nr_cores, 1), >> +DEFINE_PROP_INT32("nr-threads", CPUState, nr_threads, 1), >> +DEFINE_PROP_END_OF_LIST() >> +}; > > Are you aware of the current CPU hotplug discussion that is going on? > I'm aware of it going on, but haven't been following it. > I'm not very involved there, but I think some of these reworks also > move > "nr_threads" into the CPU state already, e.g. see: nr_threads (and nr_cores) are already state in CPUState. This patch just exposes that state via properties. > > https://github.com/dgibson/qemu/commit/9d07719784ecbeebea71 > > ... so you might want to check these patches first to see whether you > can base your rework on them? Every cpu, and thus every machine, uses CPUState for its cpus. I'm not sure every machine will want to use that new abstract core class though. If they did, then we could indeed use nr_threads from there instead (and remove it from CPUState), but we'd still need nr_cores from the abstract cpu package class (CPUState). >>> >>> Hmm. Since the CPUState object represents just a single thread, it >>> seems weird to me that it would have nr_threads and nr_cores >>> information. Agreed it is weird, and I think we should try to move it away from CPUState, not make it part of the TYPE_CPU interface. nr_threads belongs to the actual container of the Thread objects, and nr_cores in the actual container of the Core objects. The problem is how to implement that in a non-intrusive way that would require changing the object hierarchy of all architectures. >>> >>> Exposing those as properties makes that much worse, because it's now >>> ABI, rather than internal detail we can clean up at some future time. >> >> CPUState is supposed to be "State of one CPU core or thread", which >> justifies having nr_threads state, as it may be describing a core. > > Um.. does it ever actually represent a (multithread) core in practice? > It would need to have duplicated register state for every thread were > that the case. AFAIK, CPUState is still always thread state. Or has this changed in some architectures, already? > >> I guess there's no justification for having nr_cores in there though. >> I agree adding the Core class is a good idea, assuming it will get used >> by all machines, and CPUState then gets changed to a Thread class. The >> question then, though, is do we also create a Socket class that contains >> nr_cores? > > That was roughly our intention with the way the cross platform hotplug > stuff is evolving. But the intention was that the Socket objects > would only need to be constructed for machine types where it makes > sense. So for example on the paravirt pseries platform, we'll only > have Core objects, because the socket distinction isn't really > meaningful. > >> And how will a Thread method get that information when it >> needs to emulate, e.g. CPUID, that requires it? It's a bit messy, so >> I'm open to all suggestions on it. > > So, if the Thread needs this information, I'm not opposed to it having > it internally (presumably populated earlier from the Core object). > But I am opposed to it being a locked in part of the interface by > having it as an exposed property. I agree we don't want to make this part of the external interface. In this case, if we don't add the properties, how exactly is the Machine or Core code supposed to pass that information to the Thread object? Maybe the intermediate steps could be: * Make the Thread code that uses CPUState::nr_{cores,threads} and smp_{cores,threads} get that info from MachineState instead. >>> >>> I have some patches already headed down this road. >>> * On the architectures where we already have a reasonable
[Qemu-devel] [PATCH] migration: set state to post-migrate on failure
From: "Dr. David Alan Gilbert"If a migration fails/is cancelled during the postcopy stage we currently end up with the runstate as finish-migrate, where it should be post-migrate. There's a small window in precopy where I think the same thing can happen, but I've never seen it. It rarely matters; the only postcopy case is if you restart a migration, which again is a case that rarely matters in postcopy because it's only safe to restart the migration if you know the destination hasn't been running (which you might if you started the destination with -S and hadn't got around to 'c' ing it before the postcopy failed). Even then it's a small window but potentially you could hit if there's a problem loading the devices on the destination. This corresponds to: https://bugzilla.redhat.com/show_bug.cgi?id=1355683 Signed-off-by: Dr. David Alan Gilbert --- migration/migration.c | 4 1 file changed, 4 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index c4e0193..955d5ee 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1837,6 +1837,10 @@ static void *migration_thread(void *opaque) } else { if (old_vm_running && !entered_postcopy) { vm_start(); +} else { +if (runstate_check(RUN_STATE_FINISH_MIGRATE)) { +runstate_set(RUN_STATE_POSTMIGRATE); +} } } qemu_bh_schedule(s->cleanup_bh); -- 2.7.4