Re: [Qemu-devel] [Qemu-block] [PATCH] raw-posix.c: Make physical devices usable in QEMU
On Jul 16, 2015, at 9:19 AM, Stefan Hajnoczi wrote: On Thu, Jul 09, 2015 at 10:02:26AM -0400, Programmingkid wrote: On Jul 9, 2015, at 6:52 AM, Stefan Hajnoczi wrote: On Tue, Jul 07, 2015 at 01:33:23PM -0400, Programmingkid wrote: Make physical devices like a USB flash drive or a CDROM drive work in QEMU. With this patch I can use a USB flash drive like a hard drive. Before this patch, QEMU would just quit with a message like resource busy. The commit message and the description are missing on Mac OS X. It should be clear right away that this applies to Mac only. This works fine on Linux and probably other host OSes. Yeah, that should have been done. Did you see any issues with the code? QEMU shouldn't silently open a different file than the one given by the user. The user should give the exact device file they want. If there is magic behavior it needs to be documented, but I don't see a reason why that's necessary in the case of device files. I think you are reviewing an older patch. The newest one doesn't do that. QEMU shouldn't mount/unmount the CD-ROM. atexit(3) doesn't handle crashes or abort(). Users may be confused to find their CD-ROM unmounted in those cases and would see this as a bug. Instead we should refuse mounted CD-ROMs so the user understands that block-level access requires them to unmount first. That can be done. It just wouldn't be as user friendly as having QEMU do it for the user :( The strcpy/sprintf usage in this patch is unsafe and can lead to buffer overflow, for example in the case of generating command-lines. The command-line buffer is only MAXPATHLEN so prepending the command to the filename could exceed the buffer size. There is also a buffer overflow in the array of devices that need to be mounted. What happens if there are more than 7 devices? Ok. Will correct this issue.
Re: [Qemu-devel] [PULL for-2.4 0/1] qxl: allow to specify head limit to qxl driver
On 16 July 2015 at 16:34, Gerd Hoffmann kra...@redhat.com wrote: Hi, A single spice patch, adding a new parameter to specify the number of heads for qxl (assuming spice-server version is new enough). please pull, Gerd The following changes since commit 2d5ee9e7a7dd495d233cf9613a865f63f88e3375: Merge remote-tracking branch 'remotes/lalrae/tags/mips-20150716' into staging (2015-07-16 10:40:23 +0100) are available in the git repository at: git://anongit.freedesktop.org/spice/qemu tags/pull-spice-20150716-1 for you to fetch changes up to 567161fdd47aeb6987e700702f6bbfef04ae0236: qxl: allow to specify head limit to qxl driver (2015-07-16 17:31:05 +0200) qxl: allow to specify head limit to qxl driver Applied, thanks. -- PMM
Re: [Qemu-devel] [PULL for-2.4 0/4] input: fixes for 2.4
On 16 July 2015 at 16:38, Gerd Hoffmann kra...@redhat.com wrote: Hi, A few input fixes for 2.4. Also enable virtio-input builds on non-linux hosts after fixing up the ioctl include. please pull, Gerd The following changes since commit f3a1b5068cea303a55e2a21a97e66d057eaae638: Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging (2015-07-13 13:35:51 +0100) are available in the git repository at: git://git.kraxel.org/qemu tags/pull-input-20150714-1 for you to fetch changes up to 8121c09e8a52fd47254479d8f5ccbbc20e7bb718: hid: clarify hid_keyboard_process_keycode (2015-07-14 13:48:45 +0200) input: fixes for 2.4 I'm afraid this doesn't build for Windows: In file included from /home/petmay01/linaro/qemu-for-merges/hw/input/virtio-input.c:13: /home/petmay01/linaro/qemu-for-merges/include/standard-headers/linux/input.h:890:1: error: SW_MAX redefined In file included from /usr/lib/gcc/i586-mingw32msvc/4.2.1-sjlj/../../../../i586-mingw32msvc/include/windows.h:55, from /home/petmay01/linaro/qemu-for-merges/include/sysemu/os-win32.h:29, from /home/petmay01/linaro/qemu-for-merges/include/qemu-common.h:48, from /home/petmay01/linaro/qemu-for-merges/include/qemu/iov.h:17, from /home/petmay01/linaro/qemu-for-merges/hw/input/virtio-input.c:7: /usr/lib/gcc/i586-mingw32msvc/4.2.1-sjlj/../../../../i586-mingw32msvc/include/winuser.h:729:1: error: this is the location of the previous definition thanks -- PMM
Re: [Qemu-devel] [PULL 00/11] RCU, KVM, memory API, crypto, Coverity fixes for 2.4.0-rc1
On 16 July 2015 at 17:55, Paolo Bonzini pbonz...@redhat.com wrote: The following changes since commit 6169b60285fe1ff730d840a49527e721bfb30899: Update version for v2.4.0-rc0 release (2015-07-09 17:56:56 +0100) are available in the git repository at: git://github.com/bonzini/qemu.git tags/for-upstream for you to fetch changes up to 4a8775ab71d2186fc1cd585ea80c000409965cde: crypto: avoid undefined behavior in nettle calls (2015-07-16 18:54:21 +0200) * rcu_register_thread fixes. * MIPS-KVM fixes. * Coverity fixes. * Nettle function prototype fixes. * Memory API refcount fix. I get a pile of assertions on OSX running rcutorture: GTESTER tests/rcutorture Assertion failed: (rcu_reader.depth == 0), function rcu_unregister_thread, file /Users/pm215/srcAssertion failed: (rcu_reader.de/qemu/util/rcu.c, line 304. pth == 0), function rcu_unregister_thread, file /Users/pm215/src/qemu/util/rcu.c, line 304. Assertion failed: (rcu_reader.depth == 0), function rcu_unregister_thread, file /Users/pm215/src/qemu/util/rcu.c, line 304. Assertion failed: (rcu_reader.depth == 0), function rcu_unregister_thread, file /Users/pm215/src/qemu/util/rcu.c, line 304. Assertion failed: (rcu_reader.depth == 0), function rcu_unregister_thread, file /Users/pm215/src/qemu/util/rcu.c, line 304. Assertion failed: (rcu_reader.depth == 0), function rcu_unregister_thread, file /Users/pm215/srcAssertion failed: (rcu_reader.de/qemu/util/rcu.c, line 304. pth == 0), function rcu_unregister_thread, file /Users/pm215/src/qemu/util/rcu.c, line 304. GTester: last random seed: R02S9b5149dbb406809df60686a3e8223c26 Assertion failed: (rcu_reader.depth == 0), function rcu_unregister_thread, file /Users/pm215/src/qemu/util/rcu.c, line 304. Assertion failed: (rcu_reader.depth == 0), function rcu_unregister_thread, file /Users/pm215/src/qemu/util/rcu.c, line 304. Assertion failed: (rcu_reader.depth == 0), function rcu_unregister_thread, file /Users/pm215/src/qemu/util/rcu.c, line 304. Assertion failed: (rcu_reader.deAssertion failed: (rcu_reader.depth == 0), function rcu_unregistpth == 0), function rcu_unregister_thread, file /Users/pm215/srcer_thread, file /Users/pm215/src/qemu/util/rcu.c, line 304. /qemu/util/rcu.c, line 304. GTester: last random seed: R02Sb915fd85eca48d367fd186bdfd39d8c7 make: *** [check-tests/rcutorture] Error 1 -- PMM
Re: [Qemu-devel] [PATCH 3/3] PAM: make PAM emulation closer to documentation
On 16/07/2015 16:41, Ефимов Василий wrote: The main problem is rendering memory tree to FlatView. I don't believe it's necessary to render a memory tree to the FlatView. You can use existing AddressSpaces. +/* Read from RAM and write to PCI */ +memory_region_init_io(pam-region[1], OBJECT(dev), pam_ops, pam, +pam-r-ram-w-pci, size); This can be done with memory_region_set_readonly on the RAM region. You need to set mr-ops in order to point to pam_ops; for a first proof of concept you can just set the field directly. The idea is to read directly from system RAM region and to write to PCI using I/O (because I do not see another way to emulate access type driven redirection with existent API). If we create RAM and make it read only then new useless RAM block will be created. Don't create RAM; modify the existing one. It is useless because of ram_addr of new region will be set to one within system RAM block. Hence, cleaner way is to create I/O region. You can use the existing RAM region and modify its properties (i.e. toggle mr-readonly) after setting special mr-ops. The special mr-ops will be used for writes when mr-readonly = true. Writes to the PCI memory space can use the PCI address space, with address_space_st*. There is no PCI AddressSpace (only MemoryRegion). But address_space_st* requires AddressSpace as argument. Then create one with address_space_init. However, can the guest see the difference between real mode 1 and the fake mode 1 that QEMU implements? Perhaps mode 1 can be left as is. +/* Read from PCI and write to RAM */ +memory_region_init_io(pam-region[2], OBJECT(dev), pam_ops, pam, +pam-r-pci-w-ram, size); Here you cannot run code from ROM, so it can be a pure MMIO region. Reads can use address_space_ld*, while writes can use memory_region_get_ram_ptr. Even in this mode it is possible for code to be executed from ROM. This can happen when particular PCI address is within ROM device connected to PCI bus. If it's just for pc.rom and isa-bios, introduce a new function pam_create_pci_region that creates pc.rom with memory_region_init_rom_device. The mr-ops can write to RAM (mode 2) or discard the write (mode 0). They you can make pc.rom 256K instead of 128K, and instead of an alias, you can manually copy the last 128K of the BIOS into the last 128K of pc.rom. Some adjustment will be necessary in order to support migration (perhaps creating two 128K regions pc.rom and pc.rom.mirror), but for a proof of concept the above should be enough. Paolo
[Qemu-devel] [PATCH v13 18/19] i.MX: Add qtest support for I2C device emulator.
This is using a ds1338 RTC chip on the I2C bus. This RTC chip is not present on the real 3DS PDK board. Signed-off-by: Jean-Christophe Dubois j...@tribudubois.net --- Changes since v1: * not present on v1 Changes since v2: * use a common header file for I2C regs definition Changes since v3: * rework GPL headers. Changes since v4: * none Changes since v5: * none Changes since v6: * none Changes since v7: * adapt to new i.MX I2C header file. Changes since v8: * no change Changes since v9: * no change Changes since v10: * no change Changes since v11: * no change Changes since v12: * no change tests/Makefile | 3 + tests/ds1338-test.c| 75 ++ tests/libqos/i2c-imx.c | 209 + tests/libqos/i2c.h | 3 + 4 files changed, 290 insertions(+) create mode 100644 tests/ds1338-test.c create mode 100644 tests/libqos/i2c-imx.c diff --git a/tests/Makefile b/tests/Makefile index c5e4744..93890a8 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -193,6 +193,7 @@ check-qtest-sparc64-y = tests/endianness-test$(EXESUF) gcov-files-sparc-y += hw/timer/m48t59.c gcov-files-sparc64-y += hw/timer/m48t59.c check-qtest-arm-y = tests/tmp105-test$(EXESUF) +check-qtest-arm-y = tests/ds1338-test$(EXESUF) gcov-files-arm-y += hw/misc/tmp105.c check-qtest-arm-y += tests/virtio-blk-test$(EXESUF) gcov-files-arm-y += arm-softmmu/hw/block/virtio-blk.c @@ -342,6 +343,7 @@ libqos-pc-obj-y = $(libqos-obj-y) tests/libqos/pci-pc.o libqos-pc-obj-y += tests/libqos/malloc-pc.o tests/libqos/libqos-pc.o libqos-pc-obj-y += tests/libqos/ahci.o libqos-omap-obj-y = $(libqos-obj-y) tests/libqos/i2c-omap.o +libqos-imx-obj-y = $(libqos-obj-y) tests/libqos/i2c-imx.o libqos-usb-obj-y = $(libqos-pc-obj-y) tests/libqos/usb.o libqos-virtio-obj-y = $(libqos-pc-obj-y) tests/libqos/virtio.o tests/libqos/virtio-pci.o tests/libqos/virtio-mmio.o tests/libqos/malloc-generic.o @@ -356,6 +358,7 @@ tests/hd-geo-test$(EXESUF): tests/hd-geo-test.o tests/boot-order-test$(EXESUF): tests/boot-order-test.o $(libqos-obj-y) tests/bios-tables-test$(EXESUF): tests/bios-tables-test.o $(libqos-obj-y) tests/tmp105-test$(EXESUF): tests/tmp105-test.o $(libqos-omap-obj-y) +tests/ds1338-test$(EXESUF): tests/ds1338-test.o $(libqos-imx-obj-y) tests/i440fx-test$(EXESUF): tests/i440fx-test.o $(libqos-pc-obj-y) tests/q35-test$(EXESUF): tests/q35-test.o $(libqos-pc-obj-y) tests/fw_cfg-test$(EXESUF): tests/fw_cfg-test.o $(libqos-pc-obj-y) diff --git a/tests/ds1338-test.c b/tests/ds1338-test.c new file mode 100644 index 000..fbc989b --- /dev/null +++ b/tests/ds1338-test.c @@ -0,0 +1,75 @@ +/* + * QTest testcase for the DS1338 RTC + * + * Copyright (c) 2013 Jean-Christophe Dubois + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. + */ + +#include libqtest.h +#include libqos/i2c.h + +#include glib.h + +#define IMX25_I2C_0_BASE 0x43F8 + +#define DS1338_ADDR 0x68 + +static I2CAdapter *i2c; +static uint8_t addr; + +#define bcd2bin(x)(((x) 0x0f) + ((x) 4) * 10) + +static void send_and_receive(void) +{ +uint8_t cmd[1]; +uint8_t resp[7]; +time_t now = time(NULL); +struct tm *tm_ptr = gmtime(now); + +/* reset the index in the RTC memory */ +cmd[0] = 0; +i2c_send(i2c, addr, cmd, 1); + +/* retrieve the date */ +i2c_recv(i2c, addr, resp, 7); + +/* check retreived time againt local time */ +g_assert_cmpuint(bcd2bin(resp[4]), == , tm_ptr-tm_mday); +g_assert_cmpuint(bcd2bin(resp[5]), == , 1 + tm_ptr-tm_mon); +g_assert_cmpuint(2000 + bcd2bin(resp[6]), == , 1900 + tm_ptr-tm_year); +} + +int main(int argc, char **argv) +{ +QTestState *s = NULL; +int ret; + +g_test_init(argc, argv, NULL); + +s = qtest_start(-display none -machine imx25_3ds); +i2c = imx_i2c_create(IMX25_I2C_0_BASE); +addr = DS1338_ADDR; + +qtest_add_func(/ds1338/tx-rx, send_and_receive); + +ret = g_test_run(); + +if (s) { +qtest_quit(s); +} +g_free(i2c); + +return ret; +} diff --git a/tests/libqos/i2c-imx.c b/tests/libqos/i2c-imx.c new file mode 100644 index 000..b5cef66 --- /dev/null +++ b/tests/libqos/i2c-imx.c @@ -0,0 +1,209 @@ +/* + * QTest i.MX I2C driver + * + * Copyright (c) 2013 Jean-Christophe Dubois + * + * This program is free
Re: [Qemu-devel] [PATCH v2 00/13] tcg/sparc v8plus code generation
On 07/15/2015 09:54 PM, Aurelien Jarno wrote: While I understand why we need the new trunc_shr_i32 opcode for MIPS64 (the 32-bit values must be kept sign-extended), I currently fail to see why it is needed for SPARC. As far as I recall, it improves code for extracting high parts of 64-bit quantities. Without this, we wind up with a 64-bit shift, requiring a 64-bit temp register, followed by the real truncate which can copy the data to a 32-bit destination register. r~
[Qemu-devel] [PATCH 1/2] tcg/i386: Extend addresses for 32-bit guests
Removing the ??? comment explaining why it (mostly) worked. Signed-off-by: Richard Henderson r...@twiddle.net --- tcg/i386/tcg-target.c | 105 +++--- 1 file changed, 65 insertions(+), 40 deletions(-) diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c index ff4d9cf..bbe2963 100644 --- a/tcg/i386/tcg-target.c +++ b/tcg/i386/tcg-target.c @@ -1434,8 +1434,8 @@ static inline void setup_guest_base_seg(void) { } #endif /* SOFTMMU */ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi, - TCGReg base, intptr_t ofs, int seg, - TCGMemOp memop) + TCGReg base, int index, intptr_t ofs, + int seg, TCGMemOp memop) { const TCGMemOp real_bswap = memop MO_BSWAP; TCGMemOp bswap = real_bswap; @@ -1448,13 +1448,16 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi, switch (memop MO_SSIZE) { case MO_UB: -tcg_out_modrm_offset(s, OPC_MOVZBL + seg, datalo, base, ofs); +tcg_out_modrm_sib_offset(s, OPC_MOVZBL + seg, datalo, + base, index, 0, ofs); break; case MO_SB: -tcg_out_modrm_offset(s, OPC_MOVSBL + P_REXW + seg, datalo, base, ofs); +tcg_out_modrm_sib_offset(s, OPC_MOVSBL + P_REXW + seg, datalo, + base, index, 0, ofs); break; case MO_UW: -tcg_out_modrm_offset(s, OPC_MOVZWL + seg, datalo, base, ofs); +tcg_out_modrm_sib_offset(s, OPC_MOVZWL + seg, datalo, + base, index, 0, ofs); if (real_bswap) { tcg_out_rolw_8(s, datalo); } @@ -1462,20 +1465,21 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi, case MO_SW: if (real_bswap) { if (have_movbe) { -tcg_out_modrm_offset(s, OPC_MOVBE_GyMy + P_DATA16 + seg, - datalo, base, ofs); +tcg_out_modrm_sib_offset(s, OPC_MOVBE_GyMy + P_DATA16 + seg, + datalo, base, index, 0, ofs); } else { -tcg_out_modrm_offset(s, OPC_MOVZWL + seg, datalo, base, ofs); +tcg_out_modrm_sib_offset(s, OPC_MOVZWL + seg, datalo, + base, index, 0, ofs); tcg_out_rolw_8(s, datalo); } tcg_out_modrm(s, OPC_MOVSWL + P_REXW, datalo, datalo); } else { -tcg_out_modrm_offset(s, OPC_MOVSWL + P_REXW + seg, - datalo, base, ofs); +tcg_out_modrm_sib_offset(s, OPC_MOVSWL + P_REXW + seg, + datalo, base, index, 0, ofs); } break; case MO_UL: -tcg_out_modrm_offset(s, movop + seg, datalo, base, ofs); +tcg_out_modrm_sib_offset(s, movop + seg, datalo, base, index, 0, ofs); if (bswap) { tcg_out_bswap32(s, datalo); } @@ -1483,19 +1487,22 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi, #if TCG_TARGET_REG_BITS == 64 case MO_SL: if (real_bswap) { -tcg_out_modrm_offset(s, movop + seg, datalo, base, ofs); +tcg_out_modrm_sib_offset(s, movop + seg, datalo, + base, index, 0, ofs); if (bswap) { tcg_out_bswap32(s, datalo); } tcg_out_ext32s(s, datalo, datalo); } else { -tcg_out_modrm_offset(s, OPC_MOVSLQ + seg, datalo, base, ofs); +tcg_out_modrm_sib_offset(s, OPC_MOVSLQ + seg, datalo, + base, index, 0, ofs); } break; #endif case MO_Q: if (TCG_TARGET_REG_BITS == 64) { -tcg_out_modrm_offset(s, movop + P_REXW + seg, datalo, base, ofs); +tcg_out_modrm_sib_offset(s, movop + P_REXW + seg, datalo, + base, index, 0, ofs); if (bswap) { tcg_out_bswap64(s, datalo); } @@ -1506,11 +1513,15 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi, datahi = t; } if (base != datalo) { -tcg_out_modrm_offset(s, movop + seg, datalo, base, ofs); -tcg_out_modrm_offset(s, movop + seg, datahi, base, ofs + 4); +tcg_out_modrm_sib_offset(s, movop + seg, datalo, + base, index, 0, ofs); +tcg_out_modrm_sib_offset(s, movop + seg, datahi, + base, index, 0, ofs + 4); } else { -tcg_out_modrm_offset(s, movop + seg, datahi, base, ofs + 4); -
Re: [Qemu-devel] [PATCH v2 0/3] AioContext: ctx-dispatching is dead, all hail ctx-notify_me
On 16/07/2015 21:05, Richard W.M. Jones wrote: Sorry to spoil things, but I'm still seeing this bug, although it is now a lot less frequent with your patch. I would estimate it happens more often than 1 in 5 runs with qemu.git, and probably 1 in 200 runs with qemu.git + the v2 patch series. It's the exact same hang in both cases. Is it possible that this patch doesn't completely close any race? Still, it is an improvement, so there is that. I would guess instead that there are two separate bugs, but it's not impossible that it's still there. Paolo
Re: [Qemu-devel] [PATCH v2 0/3] AioContext: ctx-dispatching is dead, all hail ctx-notify_me
On 17/07/2015 00:06, Paolo Bonzini wrote: On 16/07/2015 21:05, Richard W.M. Jones wrote: Sorry to spoil things, but I'm still seeing this bug, although it is now a lot less frequent with your patch. I would estimate it happens more often than 1 in 5 runs with qemu.git, and probably 1 in 200 runs with qemu.git + the v2 patch series. It's the exact same hang in both cases. Is it possible that this patch doesn't completely close any race? Still, it is an improvement, so there is that. I would guess instead that there are two separate bugs, but it's not impossible that it's still there. Reproduced after ~80 runs... Paolo
[Qemu-devel] [PULL 2/8] mips/kvm: Sign extend registers written to KVM
From: James Hogan james.ho...@imgtec.com In case we're running on a 64-bit host, be sure to sign extend the general purpose registers and hi/lo/pc before writing them to KVM, so as to take advantage of MIPS32/MIPS64 compatibility. Signed-off-by: James Hogan james.ho...@imgtec.com Cc: Paolo Bonzini pbonz...@redhat.com Cc: Leon Alrae leon.al...@imgtec.com Cc: Aurelien Jarno aurel...@aurel32.net Cc: k...@vger.kernel.org Cc: qemu-sta...@nongnu.org Message-Id: 1429871214-23514-3-git-send-email-james.ho...@imgtec.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- target-mips/kvm.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/target-mips/kvm.c b/target-mips/kvm.c index 85256f3..d287d42 100644 --- a/target-mips/kvm.c +++ b/target-mips/kvm.c @@ -628,12 +628,12 @@ int kvm_arch_put_registers(CPUState *cs, int level) /* Set the registers based on QEMU's view of things */ for (i = 0; i 32; i++) { -regs.gpr[i] = env-active_tc.gpr[i]; +regs.gpr[i] = (int64_t)(target_long)env-active_tc.gpr[i]; } -regs.hi = env-active_tc.HI[0]; -regs.lo = env-active_tc.LO[0]; -regs.pc = env-active_tc.PC; +regs.hi = (int64_t)(target_long)env-active_tc.HI[0]; +regs.lo = (int64_t)(target_long)env-active_tc.LO[0]; +regs.pc = (int64_t)(target_long)env-active_tc.PC; ret = kvm_vcpu_ioctl(cs, KVM_SET_REGS, regs); -- 2.4.3
[Qemu-devel] [Bug 1384892] Re: RTL8168 NIC VFIO not working anymore since QEMU 2.1
I made some time for limited testing. The released V2.1 puts the vfio-ed RTL8168 into a zombie state when running an IPFire VM, which requires a subsequent POWER cycle in order to let mii-tools show anything else than 'no link' (i.e. to get the LED back on). Pushing the reset button on the x64 metal was NOT enough. Then I binary-patched my V2.1 qemu x64 executable so it compares against a 'wrong' PCI ID inside vfio_probe_rtl8168_bar2_window_quirk() (to confirm comment #3). Yes, that made it work again. Further patch-attempts towards comment #10 all ended up in RTL zombie state (even though the IPFire VM was otherwise responsive). Each time a power cycle was required to get the RTL back alive. Could there be a general problem because the RTL must be usable in MSI-X mode for Windows and in in MSI mode for Linux ? Maybe a cmdline switch should be added to activate the quirk just when MSI-X is intended ? -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1384892 Title: RTL8168 NIC VFIO not working anymore since QEMU 2.1 Status in QEMU: New Bug description: After upgrading QEMU from 2.0 to 2.1 (and libiscsi from 1.7.0 to 1.12 as a dependency) my two RTL8168 NICs stopped working. The NICs do not respond to any command and even the LEDs on the network connection turn off, a few seconds after the VM started. To get them back running I had to downgrade to 2.0 and restart the system. Unfortunately, I have no clue what to do or how to debug this problem since there are no specific errors logged. I tried two different VMs: Debian Wheezy and IPFire (see attachment for further details). The QEMU 2.1 changelog states Support for RTL8168 NICs. so there were some major changes done, I guess. On the IPFire guest the kernel log shows many of these lines: r8169 :00:07.0 green1: rtl_eriar_cond == 1 (loop: 100, delay: 100) r8169 :00:07.0 green1: rtl_phy_reset_cond == 1 (loop: 100, delay: 1) On the Debian guest there is only: r8169 :00:07.0: firmware: agent loaded rtl_nic/rtl8168e-3.fw into memory r8169 :00:07.0: lan0: link down ADDRCONF(NETDEV_UP): lan0: link is not ready The commandline for IPFire can be seen in the attachment. It is the same for Debian. There are also the complete kernel logs for the working (2.0) and non-working (2.1) cases. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1384892/+subscriptions
Re: [Qemu-devel] [PATCH 2/2 v3] virtio-rng: Serve pending request if any after timer bumps up quota.
On (Wed) 15 Jul 2015 [17:46:48], Pankaj Gupta wrote: We are arming timer when we get first request from guest. Even if guest pulls all the data we will be serving guest only when timer bumps up new quota. When timer expires we check if we have a pending request from guest, we serve it and re-arm the timer else we don't do any thing. Signed-off-by: Pankaj Gupta pagu...@redhat.com --- hw/virtio/virtio-rng.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c index ae04352..3d9a002 100644 --- a/hw/virtio/virtio-rng.c +++ b/hw/virtio/virtio-rng.c @@ -142,9 +142,13 @@ static int virtio_rng_load(QEMUFile *f, void *opaque, int version_id) static void check_rate_limit(void *opaque) { VirtIORNG *vrng = opaque; +size_t size; vrng-quota_remaining = vrng-conf.max_bytes; -virtio_rng_process(vrng); +size = get_request_size(vrng-vq, 0); +if (size 0) { +virtio_rng_process(vrng); +} vrng-activate_timer = true; Ah, this isn't helping us much here; we might as well return in a similar way from virtio_rng_process. What I had written earlier was slightly different than this: So now with your changes, here is what we can do: instead of just activating the timer in check_rate_limit(), we can issue a get_request_size() call. If the return value is 0, it means we have a buffer queued up by the guest, and we can then call virtio_rng_process() and set activated to true. Else, no need to call virtio_rng_process at all, and the job for the timer is done, since there are no more outstanding requests from the guest. Anyway we can look at that later, patch 1 is already a big improvement so I think we should just stick with that, and think about other things in a different series. Thanks, Amit
Re: [Qemu-devel] [PATCH] qemu-char: Fix missed data on unix socket
On (Wed) 15 Jul 2015 [23:44:52], Nils Carlson wrote: On Mon, 13 Jul 2015, Nils Carlson wrote: On Mon, 13 Jul 2015, Amit Shah wrote: snip Also, returning TRUE there isn't right - if the connection ends, we should return FALSE. I agree that this seems reasonable. I will change it and re-test. I had a closer look, and it seems always returning true is intentional here, the called function, tcp_chr_disconnect(chr), handles the deregistration from handlers. If we were to return FALSE we would be duplicating work and possibly breaking things. Not sure how. Anyway, can you please start a new thread, with the author and reviewers of the patch CC'ed, so they can chime in as well? Amit
Re: [Qemu-devel] [PATCH V2] virtio-net: unbreak any layout
On Wed, Jul 15, 2015 at 03:56:07PM +0800, Jason Wang wrote: Commit 032a74a1c0fcdd5fd1c69e56126b4c857ee36611 (virtio-net: byteswap virtio-net header) breaks any layout by requiring out_sg[0].iov_len = n-guest_hdr_len. Fixing this by copying header to temporary buffer if swap is needed, and then use this buffer as part of out_sg. Fixes 032a74a1c0fcdd5fd1c69e56126b4c857ee36611 (virtio-net: byteswap virtio-net header) Cc: qemu-sta...@nongnu.org Cc: c...@fr.ibm.com Signed-off-by: Jason Wang jasow...@redhat.com --- Changes from V1: - avoid header copying if there's no need to do header swap - don't write the header back --- hw/net/virtio-net.c | 17 ++--- include/hw/virtio/virtio-access.h | 9 + 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index e3c2db3..12322bd 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1142,7 +1142,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) ssize_t ret, len; unsigned int out_num = elem.out_num; struct iovec *out_sg = elem.out_sg[0]; -struct iovec sg[VIRTQUEUE_MAX_SIZE]; +struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE]; +struct virtio_net_hdr hdr; if (out_num 1) { error_report(virtio-net header not in first element); @@ -1150,11 +1151,21 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) } if (n-has_vnet_hdr) { -if (out_sg[0].iov_len n-guest_hdr_len) { +if (iov_size(out_sg, out_num) n-guest_hdr_len) { error_report(virtio-net header incorrect); exit(1); } this scans the iov unnecessarily. How about checking return code from iov_to_buf instead? -virtio_net_hdr_swap(vdev, (void *) out_sg[0].iov_base); +if (virtio_needs_swap(vdev)) { +iov_to_buf(out_sg, out_num, 0, hdr, sizeof(hdr)); +virtio_net_hdr_swap(vdev, (void *) hdr); +sg2[0].iov_base = hdr; +sg2[0].iov_len = sizeof(hdr); +out_num = iov_copy(sg2[1], ARRAY_SIZE(sg2) - 1, + out_sg, out_num, + sizeof(hdr), -1); This might truncate packet if it does not fit in 1024 anymore. It might be better to just drop it. +out_num += 1; +out_sg = sg2; +} } /* diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h index cee5dd7..1ec1dfd 100644 --- a/include/hw/virtio/virtio-access.h +++ b/include/hw/virtio/virtio-access.h @@ -143,6 +143,15 @@ static inline uint64_t virtio_ldq_p(VirtIODevice *vdev, const void *ptr) } } +static inline bool virtio_needs_swap(VirtIODevice *vdev) +{ +#ifdef HOST_WORDS_BIGENDIAN +return virtio_access_is_big_endian(vdev) ? false : true; +#else +return virtio_access_is_big_endian(vdev) ? true : false; +#endif +} + static inline uint16_t virtio_tswap16(VirtIODevice *vdev, uint16_t s) { #ifdef HOST_WORDS_BIGENDIAN -- 2.1.4
Re: [Qemu-devel] [PATCH 2/2 v3] virtio-rng: Serve pending request if any after timer bumps up quota.
- Original Message - From: Amit Shah amit.s...@redhat.com To: Pankaj Gupta pagu...@redhat.com Cc: qemu-devel@nongnu.org, m...@redhat.com Sent: Thursday, 16 July, 2015 11:35:36 AM Subject: Re: [Qemu-devel] [PATCH 2/2 v3] virtio-rng: Serve pending request if any after timer bumps up quota. On (Wed) 15 Jul 2015 [17:46:48], Pankaj Gupta wrote: We are arming timer when we get first request from guest. Even if guest pulls all the data we will be serving guest only when timer bumps up new quota. When timer expires we check if we have a pending request from guest, we serve it and re-arm the timer else we don't do any thing. Signed-off-by: Pankaj Gupta pagu...@redhat.com --- hw/virtio/virtio-rng.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c index ae04352..3d9a002 100644 --- a/hw/virtio/virtio-rng.c +++ b/hw/virtio/virtio-rng.c @@ -142,9 +142,13 @@ static int virtio_rng_load(QEMUFile *f, void *opaque, int version_id) static void check_rate_limit(void *opaque) { VirtIORNG *vrng = opaque; + size_t size; vrng-quota_remaining = vrng-conf.max_bytes; - virtio_rng_process(vrng); + size = get_request_size(vrng-vq, 0); + if (size 0) { + virtio_rng_process(vrng); + } vrng-activate_timer = true; Ah, this isn't helping us much here; we might as well return in a similar way from virtio_rng_process. What I had written earlier was slightly different than this: So now with your changes, here is what we can do: instead of just activating the timer in check_rate_limit(), we can issue a get_request_size() call. If the return value is 0, it means we have a buffer queued up by the guest, and we can then call virtio_rng_process() and set activated to true. Else, no need to call virtio_rng_process at all, and the job for the timer is done, since there are no more outstanding requests from the guest. Anyway we can look at that later, patch 1 is already a big improvement so I think we should just stick with that, and think about other things in a different series. Sure. Thanks, Pankaj Thanks, Amit
Re: [Qemu-devel] [PATCH V2] virtio-net: unbreak any layout
On Thu, Jul 16, 2015 at 01:54:38PM +0800, Jason Wang wrote: On 07/15/2015 07:23 PM, Michael S. Tsirkin wrote: On Wed, Jul 15, 2015 at 03:56:07PM +0800, Jason Wang wrote: Commit 032a74a1c0fcdd5fd1c69e56126b4c857ee36611 (virtio-net: byteswap virtio-net header) breaks any layout by requiring out_sg[0].iov_len = n-guest_hdr_len. Fixing this by copying header to temporary buffer if swap is needed, and then use this buffer as part of out_sg. Fixes 032a74a1c0fcdd5fd1c69e56126b4c857ee36611 (virtio-net: byteswap virtio-net header) Cc: qemu-sta...@nongnu.org Cc: c...@fr.ibm.com Signed-off-by: Jason Wang jasow...@redhat.com --- Changes from V1: - avoid header copying if there's no need to do header swap - don't write the header back --- hw/net/virtio-net.c | 17 ++--- include/hw/virtio/virtio-access.h | 9 + 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index e3c2db3..12322bd 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1142,7 +1142,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) ssize_t ret, len; unsigned int out_num = elem.out_num; struct iovec *out_sg = elem.out_sg[0]; -struct iovec sg[VIRTQUEUE_MAX_SIZE]; +struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE]; +struct virtio_net_hdr hdr; if (out_num 1) { error_report(virtio-net header not in first element); @@ -1150,11 +1151,21 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) } if (n-has_vnet_hdr) { -if (out_sg[0].iov_len n-guest_hdr_len) { +if (iov_size(out_sg, out_num) n-guest_hdr_len) { error_report(virtio-net header incorrect); exit(1); } -virtio_net_hdr_swap(vdev, (void *) out_sg[0].iov_base); +if (virtio_needs_swap(vdev)) { +iov_to_buf(out_sg, out_num, 0, hdr, sizeof(hdr)); +virtio_net_hdr_swap(vdev, (void *) hdr); +sg2[0].iov_base = hdr; +sg2[0].iov_len = sizeof(hdr); +out_num = iov_copy(sg2[1], ARRAY_SIZE(sg2) - 1, + out_sg, out_num, + sizeof(hdr), -1); +out_num += 1; +out_sg = sg2; +} } /* I guess this works but iov cpy is pretty expensive. - When using the tun backend, we can just set VNETBE/VNETLE correctly and avoid swaps in userspace. It's easy to probe for - why don't we do this? We need to fix stable branches first. And if we do this for 2.4, if running on a older kernel which does not support VNETBE/VNETLE, do we want to fail the initialization? If yes, it breaks backward compatibility. If no, we still need the swap here. - OTOH when not using the tun backend, offloads are generally disabled, so the swap isn't needed since header is going to be all 0s. I agree it's better to check backend (e.g host_hdr_len) before trying to do the swap. - what's left is old kernels when using tun. how about probing for that explicitly? In other words, virtio_net_needs_swap as opposed to the generic virtio_needs_swap. So we still need swap anyway. How about apply this patch first for stable and 2.4 and probe the VNETBE/VNETLE explicitly on top for 2.5? I'm fine with it being a patch on top, but I think it's preferable to use VNETBE/VNETLE even in 2.4 if that's available. diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h index cee5dd7..1ec1dfd 100644 --- a/include/hw/virtio/virtio-access.h +++ b/include/hw/virtio/virtio-access.h @@ -143,6 +143,15 @@ static inline uint64_t virtio_ldq_p(VirtIODevice *vdev, const void *ptr) } } +static inline bool virtio_needs_swap(VirtIODevice *vdev) +{ +#ifdef HOST_WORDS_BIGENDIAN +return virtio_access_is_big_endian(vdev) ? false : true; +#else +return virtio_access_is_big_endian(vdev) ? true : false; +#endif +} + static inline uint16_t virtio_tswap16(VirtIODevice *vdev, uint16_t s) { #ifdef HOST_WORDS_BIGENDIAN -- 2.1.4
Re: [Qemu-devel] [PATCH v4 0/7] Fix QEMU crash during memory hotplug with vhost=on
On Wed, 15 Jul 2015 19:32:31 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 05:12:01PM +0200, Igor Mammedov wrote: On Thu, 9 Jul 2015 13:47:17 +0200 Igor Mammedov imamm...@redhat.com wrote: there also is yet another issue with vhost-user. It also has very low limit on amount of memory regions (if I recall correctly 8) and it's possible to trigger even without memory hotplug. one just need to start QEMU with a several -numa memdev= options to create a necessary amount of memory regions to trigger it. lowrisk option to fix it would be increasing limit in vhost-user backend. another option is disabling vhost and fall-back to virtio, but I don't know much about vhost if it's possible to to switch it off without loosing packets guest was sending at the moment and if it will work at all with vhost. With vhost-user you can't fall back to virtio: it's not an accelerator, it's the backend. Updating the protocol to support a bigger table is possible but old remotes won't be able to support it. it looks like increasing limit is the only option left. it's not ideal that old remotes /with hardcoded limit/ might not be able to handle bigger table but at least new ones and ones that handle VhostUserMsg payload dynamically would be able to work without crashing.
[Qemu-devel] [PULL 2/9] target-mips: fix to clear MSACSR.Cause
From: Yongbok Kim yongbok@imgtec.com MSACSR.Cause bits are needed to be cleared before a vector floating-point instructions. FEXDO.df, FEXUPL.df and FEXUPR.df were missed out. Signed-off-by: Yongbok Kim yongbok@imgtec.com Reviewed-by: Aurelien Jarno aurel...@aurel32.net Reviewed-by: Leon Alrae leon.al...@imgtec.com Signed-off-by: Leon Alrae leon.al...@imgtec.com --- target-mips/msa_helper.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/target-mips/msa_helper.c b/target-mips/msa_helper.c index 26ffdc7..a1cb48f 100644 --- a/target-mips/msa_helper.c +++ b/target-mips/msa_helper.c @@ -2642,6 +2642,8 @@ void helper_msa_fexdo_df(CPUMIPSState *env, uint32_t df, uint32_t wd, wr_t *pwt = (env-active_fpu.fpr[wt].wr); uint32_t i; +clear_msacsr_cause(env); + switch (df) { case DF_WORD: for (i = 0; i DF_ELEMENTS(DF_WORD); i++) { @@ -3192,6 +3194,8 @@ void helper_msa_fexupl_df(CPUMIPSState *env, uint32_t df, uint32_t wd, wr_t *pws = (env-active_fpu.fpr[ws].wr); uint32_t i; +clear_msacsr_cause(env); + switch (df) { case DF_WORD: for (i = 0; i DF_ELEMENTS(DF_WORD); i++) { @@ -3224,6 +3228,8 @@ void helper_msa_fexupr_df(CPUMIPSState *env, uint32_t df, uint32_t wd, wr_t *pws = (env-active_fpu.fpr[ws].wr); uint32_t i; +clear_msacsr_cause(env); + switch (df) { case DF_WORD: for (i = 0; i DF_ELEMENTS(DF_WORD); i++) { -- 2.1.0
[Qemu-devel] [PULL 9/9] target-mips: fix page fault address for LWL/LWR/LDL/LDR
From: Aurelien Jarno aurel...@aurel32.net When a LWL, LWR, LDL or LDR instruction triggers a page fault, QEMU currently reports the aligned address in CP0 BadVAddr, while the Windows NT kernel expects the unaligned address. This patch adds a byte access with the unaligned address at the beginning of the LWL/LWR/LDL/LDR instructions to possibly trigger a page fault and fill the QEMU TLB. Cc: Leon Alrae leon.al...@imgtec.com Reported-by: Hervé Poussineau hpous...@reactos.org Tested-by: Hervé Poussineau hpous...@reactos.org Signed-off-by: Aurelien Jarno aurel...@aurel32.net Signed-off-by: Leon Alrae leon.al...@imgtec.com --- target-mips/translate.c | 12 1 file changed, 12 insertions(+) diff --git a/target-mips/translate.c b/target-mips/translate.c index 4a1ffdb..d1de35a 100644 --- a/target-mips/translate.c +++ b/target-mips/translate.c @@ -2142,6 +2142,9 @@ static void gen_ld(DisasContext *ctx, uint32_t opc, break; case OPC_LDL: t1 = tcg_temp_new(); +/* Do a byte access to possibly trigger a page + fault with the unaligned address. */ +tcg_gen_qemu_ld_tl(t1, t0, ctx-mem_idx, MO_UB); tcg_gen_andi_tl(t1, t0, 7); #ifndef TARGET_WORDS_BIGENDIAN tcg_gen_xori_tl(t1, t1, 7); @@ -2163,6 +2166,9 @@ static void gen_ld(DisasContext *ctx, uint32_t opc, break; case OPC_LDR: t1 = tcg_temp_new(); +/* Do a byte access to possibly trigger a page + fault with the unaligned address. */ +tcg_gen_qemu_ld_tl(t1, t0, ctx-mem_idx, MO_UB); tcg_gen_andi_tl(t1, t0, 7); #ifdef TARGET_WORDS_BIGENDIAN tcg_gen_xori_tl(t1, t1, 7); @@ -2229,6 +2235,9 @@ static void gen_ld(DisasContext *ctx, uint32_t opc, break; case OPC_LWL: t1 = tcg_temp_new(); +/* Do a byte access to possibly trigger a page + fault with the unaligned address. */ +tcg_gen_qemu_ld_tl(t1, t0, ctx-mem_idx, MO_UB); tcg_gen_andi_tl(t1, t0, 3); #ifndef TARGET_WORDS_BIGENDIAN tcg_gen_xori_tl(t1, t1, 3); @@ -2251,6 +2260,9 @@ static void gen_ld(DisasContext *ctx, uint32_t opc, break; case OPC_LWR: t1 = tcg_temp_new(); +/* Do a byte access to possibly trigger a page + fault with the unaligned address. */ +tcg_gen_qemu_ld_tl(t1, t0, ctx-mem_idx, MO_UB); tcg_gen_andi_tl(t1, t0, 3); #ifdef TARGET_WORDS_BIGENDIAN tcg_gen_xori_tl(t1, t1, 3); -- 2.1.0
Re: [Qemu-devel] [RFC PATCH v0] spapr: Abort when hash table size requirement isn't met
On Wed, Jul 15, 2015 at 03:27:13PM +0530, Bharata B Rao wrote: [This patch addresses an issue which is not prominently seen in mainline, but seen frequently only in David's spapr-next branch. Though it is possible to see this issue with mainline too, the current version of the patch is intended for David's tree.] QEMU requests for hash table allocation through KVM_PPC_ALLOCATE_HTAB ioctl by providing the size hint via htab_shift value. Sometimes the hinted size requirement can't be met by the host and it returns with a lower value for htab_shift. This was fine until recently where the hash table size was dependent on guest RAM size. With the intention of supporting memory hotplug, hash table size was changed to depend on maxram size recently. Since it is typical to have maxram size to be much higher than RAM size, the possibility of host not being able to meet the size requirement has increased. This causes two problems: - When memory hotplug is supported, we will not be able to grow till maxram if the host wasn't able to satisfy the hash table size for the the full maxram range. This is a recoverable condition where the hotplug can be gracefully failed. - During migration, we can end up having different htab_shift values (and hence different hash table sizes) at the source and target due to which the migration fails. One possible way to solve this is to change (reduce) the maxram_size based on the negotiated value of htab_shit and use the changed value of maxram_size at the target during migration. However AFAIK, currently there is no way to communicate the changed maxram_size back to libvirt, so this solution may not be feasible. So it is the question of whether to allow the guest to boot with reduced hashtable size and fail migration (this is the current behaviour) or As done in this patch, prevent the booting of the VM altogether. I am leaning towards the former. Thoughts ? Regards, Bharata.
Re: [Qemu-devel] [PATCH] qemu-char: Fix missed data on unix socket
On Thu, 16 Jul 2015, Amit Shah wrote: On (Wed) 15 Jul 2015 [23:44:52], Nils Carlson wrote: On Mon, 13 Jul 2015, Nils Carlson wrote: On Mon, 13 Jul 2015, Amit Shah wrote: snip Also, returning TRUE there isn't right - if the connection ends, we should return FALSE. I agree that this seems reasonable. I will change it and re-test. I had a closer look, and it seems always returning true is intentional here, the called function, tcp_chr_disconnect(chr), handles the deregistration from handlers. If we were to return FALSE we would be duplicating work and possibly breaking things. Not sure how. Anyway, can you please start a new thread, with the author and reviewers of the patch CC'ed, so they can chime in as well? The authors and reviewers of which patch exactly? The fix for windows which broke cloudstack was done in 812c1057. The returning TRUE everywhere was done in commit cdbf6e16 by Paolo Bonzini, CC:d above. I don't think there was anything wrong with either patch, the former one simply broken a strange behaviour CloudStack was relying on which I would say is dubious (fire and forget messaging.) /Nils
[Qemu-devel] [Bug 1465935] Re: kvm_irqchip_commit_routes: Assertion `ret == 0' failed
The proposed fix seems not yet part of any qemu release but applied as commit bdf026317daa3b9dfa281f29e96fbb6fd48394c8 Author: 马文霜 kevin...@tencent.com Date: Wed Jul 1 15:41:41 2015 +0200 Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES to v2.4.0-rc0. So this would affect all current releases and the current development release (Wily/15.10). I would start there with reproduction/verification and work backwards from there. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1465935 Title: kvm_irqchip_commit_routes: Assertion `ret == 0' failed Status in QEMU: New Status in qemu package in Ubuntu: Confirmed Status in qemu source package in Precise: New Status in qemu source package in Trusty: New Status in qemu source package in Utopic: New Status in qemu source package in Vivid: New Bug description: Several my QEMU instances crashed, and in the qemu log, I can see this assertion failure, qemu-system-x86_64: /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:984: kvm_irqchip_commit_routes: Assertion `ret == 0' failed. The QEMU version is 2.0.0, HV OS is ubuntu 12.04, kernel 3.2.0-38. Guest OS is RHEL 6.3. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1465935/+subscriptions
[Qemu-devel] [PULL 3/9] disas/mips: fix disassembling R6 instructions
From: Yongbok Kim yongbok@imgtec.com In the Release 6 of the MIPS Architecture, LL, SC, LLD, SCD, PREF and CACHE instructions have 9 bits offsets. Signed-off-by: Yongbok Kim yongbok@imgtec.com Reviewed-by: Aurelien Jarno aurel...@aurel32.net Reviewed-by: Leon Alrae leon.al...@imgtec.com Signed-off-by: Leon Alrae leon.al...@imgtec.com --- disas/mips.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/disas/mips.c b/disas/mips.c index 32940fe..01336a8 100644 --- a/disas/mips.c +++ b/disas/mips.c @@ -1296,12 +1296,12 @@ const struct mips_opcode mips_builtin_opcodes[] = {dmod,d,s,t,0x00de, 0xfc0007ff, WR_d|RD_s|RD_t, 0, I64R6}, {ddivu, d,s,t,0x009f, 0xfc0007ff, WR_d|RD_s|RD_t, 0, I64R6}, {dmodu, d,s,t,0x00df, 0xfc0007ff, WR_d|RD_s|RD_t, 0, I64R6}, -{ll, t,o(b), 0x7c36, 0xfc7f, LDD|RD_b|WR_t,0, I32R6}, -{sc, t,o(b), 0x7c26, 0xfc7f, LDD|RD_b|WR_t,0, I32R6}, -{lld, t,o(b), 0x7c37, 0xfc7f, LDD|RD_b|WR_t,0, I64R6}, -{scd, t,o(b), 0x7c27, 0xfc7f, LDD|RD_b|WR_t,0, I64R6}, -{pref,h,o(b), 0x7c35, 0xfc7f, RD_b, 0, I32R6}, -{cache, k,o(b), 0x7c25, 0xfc7f, RD_b, 0, I32R6}, +{ll, t,+o(b), 0x7c36, 0xfc7f, LDD|RD_b|WR_t,0, I32R6}, +{sc, t,+o(b), 0x7c26, 0xfc7f, LDD|RD_b|WR_t,0, I32R6}, +{lld, t,+o(b), 0x7c37, 0xfc7f, LDD|RD_b|WR_t,0, I64R6}, +{scd, t,+o(b), 0x7c27, 0xfc7f, LDD|RD_b|WR_t,0, I64R6}, +{pref,h,+o(b), 0x7c35, 0xfc7f, RD_b, 0, I32R6}, +{cache, k,+o(b), 0x7c25, 0xfc7f, RD_b, 0, I32R6}, {seleqz, d,v,t,0x0035, 0xfc0007ff, WR_d|RD_s|RD_t, 0, I32R6}, {selnez, d,v,t,0x0037, 0xfc0007ff, WR_d|RD_s|RD_t, 0, I32R6}, {maddf.s, D,S,T,0x4618, 0xffe0003f, WR_D|RD_S|RD_T|FP_S, 0, I32R6}, -- 2.1.0
[Qemu-devel] [PULL 5/9] target-mips: correct DERET instruction
Fix Debug Mode flag clearing, and when DERET is placed between LL and SC do not make SC fail. Signed-off-by: Leon Alrae leon.al...@imgtec.com Reviewed-by: Aurelien Jarno aurel...@aurel32.net --- target-mips/op_helper.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c index d457a29..9c28631 100644 --- a/target-mips/op_helper.c +++ b/target-mips/op_helper.c @@ -2154,10 +2154,9 @@ void helper_deret(CPUMIPSState *env) debug_pre_eret(env); set_pc(env, env-CP0_DEPC); -env-hflags = MIPS_HFLAG_DM; +env-hflags = ~MIPS_HFLAG_DM; compute_hflags(env); debug_post_eret(env); -env-lladdr = 1; } #endif /* !CONFIG_USER_ONLY */ -- 2.1.0
[Qemu-devel] [PULL 1/9] target-mips: fix MIPS64R6-generic configuration
From: Yongbok Kim yongbok@imgtec.com Fix core configuration for MIPS64R6-generic to make it as close as I6400. I6400 core has 48-bit of Virtual Address available (SEGBITS). MIPS SIMD Architecture is available. Rearrange order of bits to match the specification. Signed-off-by: Yongbok Kim yongbok@imgtec.com Reviewed-by: Aurelien Jarno aurel...@aurel32.net Reviewed-by: Leon Alrae leon.al...@imgtec.com Signed-off-by: Leon Alrae leon.al...@imgtec.com --- target-mips/mips-defs.h | 2 +- target-mips/translate_init.c | 18 +- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/target-mips/mips-defs.h b/target-mips/mips-defs.h index 20aa87c..53b185e 100644 --- a/target-mips/mips-defs.h +++ b/target-mips/mips-defs.h @@ -11,7 +11,7 @@ #if defined(TARGET_MIPS64) #define TARGET_LONG_BITS 64 #define TARGET_PHYS_ADDR_SPACE_BITS 48 -#define TARGET_VIRT_ADDR_SPACE_BITS 42 +#define TARGET_VIRT_ADDR_SPACE_BITS 48 #else #define TARGET_LONG_BITS 32 #define TARGET_PHYS_ADDR_SPACE_BITS 40 diff --git a/target-mips/translate_init.c b/target-mips/translate_init.c index ddfaff8..9304e74 100644 --- a/target-mips/translate_init.c +++ b/target-mips/translate_init.c @@ -655,14 +655,14 @@ static const mips_def_t mips_defs[] = (2 CP0C1_DS) | (4 CP0C1_DL) | (3 CP0C1_DA) | (0 CP0C1_PC) | (1 CP0C1_WR) | (1 CP0C1_EP), .CP0_Config2 = MIPS_CONFIG2, -.CP0_Config3 = MIPS_CONFIG3 | (1 CP0C3_RXI) | (1 CP0C3_BP) | - (1 CP0C3_BI) | (1 CP0C3_ULRI) | (1 CP0C3_LPA) | - (1U CP0C3_M), -.CP0_Config4 = MIPS_CONFIG4 | (0xfc CP0C4_KScrExist) | - (3 CP0C4_IE) | (1 CP0C4_M), +.CP0_Config3 = MIPS_CONFIG3 | (1U CP0C3_M) | (1 CP0C3_MSAP) | + (1 CP0C3_BP) | (1 CP0C3_BI) | (1 CP0C3_ULRI) | + (1 CP0C3_RXI) | (1 CP0C3_LPA), +.CP0_Config4 = MIPS_CONFIG4 | (1U CP0C4_M) | (3 CP0C4_IE) | + (0xfc CP0C4_KScrExist), .CP0_Config5 = MIPS_CONFIG5 | (1 CP0C5_LLB), -.CP0_Config5_rw_bitmask = (1 CP0C5_SBRI) | (1 CP0C5_FRE) | - (1 CP0C5_UFE), +.CP0_Config5_rw_bitmask = (1 CP0C5_MSAEn) | (1 CP0C5_SBRI) | + (1 CP0C5_FRE) | (1 CP0C5_UFE), .CP0_LLAddr_rw_bitmask = 0, .CP0_LLAddr_shift = 0, .SYNCI_Step = 32, @@ -674,9 +674,9 @@ static const mips_def_t mips_defs[] = .CP1_fcr0 = (1 FCR0_FREP) | (1 FCR0_F64) | (1 FCR0_L) | (1 FCR0_W) | (1 FCR0_D) | (1 FCR0_S) | (0x00 FCR0_PRID) | (0x0 FCR0_REV), -.SEGBITS = 42, +.SEGBITS = 48, .PABITS = 48, -.insn_flags = CPU_MIPS64R6, +.insn_flags = CPU_MIPS64R6 | ASE_MSA, .mmu_type = MMU_TYPE_R4000, }, { -- 2.1.0
[Qemu-devel] [PULL 6/9] target-mips: fix logically dead code reported by Coverity
Make use of CMPOP in floating-point compare instructions. Signed-off-by: Leon Alrae leon.al...@imgtec.com Reviewed-by: Aurelien Jarno aurel...@aurel32.net --- target-mips/translate.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/target-mips/translate.c b/target-mips/translate.c index 7302857..4a1ffdb 100644 --- a/target-mips/translate.c +++ b/target-mips/translate.c @@ -9552,6 +9552,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1, gen_cmp_s(ctx, func-48, ft, fs, cc); opn = condnames[func-48]; } +optype = CMPOP; break; case OPC_ADD_D: check_cp1_registers(ctx, fs | ft | fd); @@ -10036,6 +10037,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1, gen_cmp_d(ctx, func-48, ft, fs, cc); opn = condnames[func-48]; } +optype = CMPOP; break; case OPC_CVT_S_D: check_cp1_registers(ctx, fs); @@ -10461,6 +10463,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1, gen_cmp_ps(ctx, func-48, ft, fs, cc); opn = condnames[func-48]; } +optype = CMPOP; break; default: MIPS_INVAL(opn); -- 2.1.0
Re: [Qemu-devel] [PATCH] AioContext: fix broken ctx-dispatching optimization
On 16/07/2015 11:14, Kevin Wolf wrote: With this information, I understand that what has changed is that the return value of g_main_context_iteration() changes from true before this patch (had the aio_notify() from aio_set_fd_handler() pending) to false after the patch (aio_notify() doesn't inject an event any more). This should mean that like above we can assert that the first iteration returns false, i.e. reverse the assertion (and indeed, with this change the test still passes for me). I was a bit undecided about this. In the end I decided that the calls to aio_poll/g_main_context_iteration were just to put the AioContext in a known state, and the assertions on the return value of g_assert were not really important. For this reason, the while loop seemed to express the intentions best, and I made it consistent between the AioContext and GSource cases. You changed the AioContext case in this same patch, even if you didn't quote my comment on that hunk. :-) Both cases were asserting the return value before. I'll change the testcase (other than the aio_notify testcase) in a separate patch. Paolo
Re: [Qemu-devel] [PULL 0/2] X86 queue, 2015-07-15
On 15 July 2015 at 21:27, Eduardo Habkost ehabk...@redhat.com wrote: The following changes since commit 7692401a0826803522cfde533bdcc149932ddc6a: Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20150715' into staging (2015-07-15 17:28:59 +0100) are available in the git repository at: git://github.com/ehabkost/qemu.git tags/x86-pull-request for you to fetch changes up to 3046bb5debc8153a542acb1df93b2a1a85527a15: target-i386: emulate CPUID level of real hardware (2015-07-15 17:05:59 -0300) X86 queue, 2015-07-15 Two bug fixes: * Memory leak due to extra g_strdup() when registering X86CPU alias properties * Fix CPUID levels so that W10 insider can run as guest OS Applied, thanks. -- PMM
Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
On Thu, 16 Jul 2015 10:30:41 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Thu, Jul 16, 2015 at 08:57:36AM +0200, Igor Mammedov wrote: On Wed, 15 Jul 2015 19:24:33 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 06:01:42PM +0200, Igor Mammedov wrote: On Wed, 15 Jul 2015 17:08:27 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote: Hi, A while ago I was pondering on the options available for retrieving a fw_cfg blob from the guest-side (now that we can insert fw_cfg files on the host-side command line, see commit 81b2b8106). So over the last couple of weekends I cooked up the sysfs kernel module below, which lists all fw_cfg files under /sys/firmware/fw_cfg/filename. One concern here is that there will be a conflict here if fw cfg is used by ACPI. I don't know whether that last is a good idea though, so maybe not a real concern. I think Igor wanted to make it so. I don't see any conflict here so far, it's just guest side module that accesses fw_cfg. If there's ACPI code that accesses fw_cfg in response to an interrupt, it'll race with fw cfg access by guest OS. On linux we might be able to block ACPI preventing it from running. We probably won't be able to do it on windows. wrt vmgenid series we were talking about possibility to access fw_cfg from ACPI device.init so it's unlikely that it will ever collide with much later sysfs accesses. Don't we need to get updates when it changes too? I've thought that it's pretty static and doesn't change. But if ACPI will start accessing fw_cfg from other methods it will race for sure. Maybe we shouldn't poke at fw cfg in ACPI then. Yep, maybe we shouldn't. The last vmgenid series just maps UUID region at fixed location (which is sufficient to build ACPI tables at machine done time) so there isn't real need to export address via fw_cfg.
[Qemu-devel] [PATCH v2 1/3] tests: remove irrelevant assertions from test-aio
In these tests, the purpose of the initial calls to aio_poll and g_main_context_iteration is simply to put the AioContext in a known state; the return value of the function does not really matter. The next patch will change those return values; change the assertions to a while loop which expresses the intention better. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- tests/test-aio.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tests/test-aio.c b/tests/test-aio.c index a7cb5c9..e7bbb83 100644 --- a/tests/test-aio.c +++ b/tests/test-aio.c @@ -331,7 +331,7 @@ static void test_wait_event_notifier(void) EventNotifierTestData data = { .n = 0, .active = 1 }; event_notifier_init(data.e, false); aio_set_event_notifier(ctx, data.e, event_ready_cb); -g_assert(!aio_poll(ctx, false)); +while (aio_poll(ctx, false)); g_assert_cmpint(data.n, ==, 0); g_assert_cmpint(data.active, ==, 1); @@ -356,7 +356,7 @@ static void test_flush_event_notifier(void) EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true }; event_notifier_init(data.e, false); aio_set_event_notifier(ctx, data.e, event_ready_cb); -g_assert(!aio_poll(ctx, false)); +while (aio_poll(ctx, false)); g_assert_cmpint(data.n, ==, 0); g_assert_cmpint(data.active, ==, 10); @@ -669,7 +669,7 @@ static void test_source_wait_event_notifier(void) EventNotifierTestData data = { .n = 0, .active = 1 }; event_notifier_init(data.e, false); aio_set_event_notifier(ctx, data.e, event_ready_cb); -g_assert(g_main_context_iteration(NULL, false)); +while (g_main_context_iteration(NULL, false)); g_assert_cmpint(data.n, ==, 0); g_assert_cmpint(data.active, ==, 1); @@ -694,7 +694,7 @@ static void test_source_flush_event_notifier(void) EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true }; event_notifier_init(data.e, false); aio_set_event_notifier(ctx, data.e, event_ready_cb); -g_assert(g_main_context_iteration(NULL, false)); +while (g_main_context_iteration(NULL, false)); g_assert_cmpint(data.n, ==, 0); g_assert_cmpint(data.active, ==, 10); -- 2.4.3
Re: [Qemu-devel] memory hotplug fails with mem-merge option
On Thu, Jul 16, 2015 at 10:25:09AM +0530, Bharata B Rao wrote: Hi, When I use -machine pc,mem-merge=off|on -m 1G,slots=4,maxmem=2G, adding memory-backend-ram object fails like below. Same failure is seen with -machine pseries,mem-merge=on|off. (qemu) object_add memory-backend-ram,id=ram0,size=1G qemu-system-x86_64: util/qemu-option.c:388: qemu_opt_get_bool_helper: Assertion `opt-desc opt-desc-type == QEMU_OPT_BOOL' failed. Same failure seen with -machine pseries,mem-merge=off when used together with -object memory-backend-file on QEMU commandline in which case boot time failure is observed. Regards, Bharata.
Re: [Qemu-devel] [PATCH v2 0/3] AioContext: ctx-dispatching is dead, all hail ctx-notify_me
Am 16.07.2015 um 11:56 hat Paolo Bonzini geschrieben: Apart from an additional assertion this is exactly the same code as v1, but split across three patches so that the important one focuses on the optimization. Paolo v1-v2 Split some changes to the tests to a separate patch Fix commit message [Laszlo] Clarify do...while loop in aio-win32.c [Kevin] Paolo Bonzini (3): tests: remove irrelevant assertions from test-aio aio-win32: reorganize polling loop AioContext: fix broken ctx-dispatching optimization Reviewed-by: Kevin Wolf kw...@redhat.com
Re: [Qemu-devel] [PATCH v9 0/4] remove icc bus/bridge
On Thu, 16 Jul 2015 10:45:41 +0800 Zhu Guihua zhugh.f...@cn.fujitsu.com wrote: ping... I'll look at it once 2.4 is released. On 07/03/2015 05:38 PM, Zhu Guihua wrote: ICC Bus was used for providing a hotpluggable bus for APIC and CPU, but now we use HotplugHandler to make hotplug. So ICC Bus is unnecessary. This code has passed the new pc-cpu-test. And I have tested with kvm along with kernel_irqchip=on/off, it works fine. This patch series is based on the latest master. v9: -use a callback to correct reset sequence for x86 -update apic mmio mapping v8: -add a wrapper to specify reset order v7: -update to register reset handler for main_system_bus when created -register reset handler for apic after all devices are initialized Chen Fan (2): apic: map APIC's MMIO region at each CPU's address space cpu/apic: drop icc bus/bridge Zhu Guihua (2): x86: use new method to correct reset sequence icc_bus: drop the unused files default-configs/i386-softmmu.mak | 1 - default-configs/x86_64-softmmu.mak | 1 - hw/cpu/Makefile.objs | 1 - hw/cpu/icc_bus.c | 118 - hw/i386/pc.c | 43 +++--- hw/i386/pc_piix.c | 9 +-- hw/i386/pc_q35.c | 9 +-- hw/intc/apic_common.c | 11 +--- include/hw/cpu/icc_bus.h | 82 -- include/hw/i386/apic_internal.h| 7 ++- include/hw/i386/pc.h | 2 +- target-i386/cpu.c | 30 +++--- 12 files changed, 52 insertions(+), 262 deletions(-) delete mode 100644 hw/cpu/icc_bus.c delete mode 100644 include/hw/cpu/icc_bus.h
Re: [Qemu-devel] write operations during read
Hi Gleb, Am 15.07.2015 um 13:04 hat Gleb Stepanov geschrieben: We are testing disk I/O perfomance using fio tool. I've measured performance on qcow2 image mounted to nb0 device. I run tests and launch iostat -x 1 command to browse I/O operations. Test consists of sequential read operations with 1 MB size. I've get following output. Device: rrqm/s wrqm/s r/s w/srkB/swkB/s sda 0.00 5.000.00 677.00 0.00 22812.00 sdb 0.00 0.000.00 545.00 0.00 0.00 dm-0 0.00 0.000.00 674.00 0.00 22784.00 dm-1 0.00 0.000.007.00 0.0028.00 dm-2 0.00 0.000.000.00 0.00 0.00 nb0 0.00 0.00 432.00 182.00 55296.00 23296.00 So, reading produce a lot of write operations. What is the reason why these write operations occured? Unfortunately you didn't include your qemu command line and didn't describe in detail what your test setup looked like, so I'll have to take a guess. Read requests never cause writes. However, you wrote 55 MB to the image before running the test. Did you make sure that this data was already flushed to the disk before starting your benchmark? If it was only sitting in the kernel page cache, the kernel can decide to write it out at any point. How to determina block size that qemu operates , because if i've writtien 55 mb with 1 mb blocks i should get 55 operations instead of 432. If you didn't use O_DIRECT in the benchmark, chances are that the guest kernel issues requests smaller than 1 MB to qemu. Be aware that you might get a few metadata reads in the guest filesystem, the qcow2 and the host filesystem layer, so slightly more than 55 requests is expected. Each layer may also have to split requests if the data is stored fragmented. Also, did you make sure that no processes other than fio access the disk? Kevin
Re: [Qemu-devel] [PATCH v4 0/7] Fix QEMU crash during memory hotplug with vhost=on
On Thu, Jul 16, 2015 at 11:42:36AM +0200, Igor Mammedov wrote: On Thu, 16 Jul 2015 10:35:33 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Thu, Jul 16, 2015 at 09:26:21AM +0200, Igor Mammedov wrote: On Wed, 15 Jul 2015 19:32:31 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 05:12:01PM +0200, Igor Mammedov wrote: On Thu, 9 Jul 2015 13:47:17 +0200 Igor Mammedov imamm...@redhat.com wrote: there also is yet another issue with vhost-user. It also has very low limit on amount of memory regions (if I recall correctly 8) and it's possible to trigger even without memory hotplug. one just need to start QEMU with a several -numa memdev= options to create a necessary amount of memory regions to trigger it. lowrisk option to fix it would be increasing limit in vhost-user backend. another option is disabling vhost and fall-back to virtio, but I don't know much about vhost if it's possible to to switch it off without loosing packets guest was sending at the moment and if it will work at all with vhost. With vhost-user you can't fall back to virtio: it's not an accelerator, it's the backend. Updating the protocol to support a bigger table is possible but old remotes won't be able to support it. it looks like increasing limit is the only option left. it's not ideal that old remotes /with hardcoded limit/ might not be able to handle bigger table but at least new ones and ones that handle VhostUserMsg payload dynamically would be able to work without crashing. I think we need a way for hotplug to fail gracefully. As long as we don't implement the hva trick, it's needed for old kernels with vhost in kernel, too. I don't see a reliable way to fail hotplug though. In case of hotplug failure path comes from memory listener which can't fail by design but it fails in vhost case, i.e. vhost side doesn't follow protocol. We already have considered idea of querying vhost, for limit from memory hotplug handler before mapping memory region but it has drawbacks: 1. amount of memory ranges changes during guest lifecycle as it initializes different devices. which leads to a case when we can hotplug more pc-dimms than cold-plug. Which leads to inability to migrate guest with hotplugged pc-dimms since target QEMU won't start with that amount of dimms from source due to hitting limit. 2. it's ugly hack to query random 'vhost' entity when plugging dimm device from modeling pov, but we can live with it if it helps QEMU not to crash. If it's acceptable to break/ignore #1 issue, I can post related QEMU patches that I have, at least qemu won't crash with old vhost backends. Old kvm has lower limits on 3 of slots as well. How is this handled? -- MST
Re: [Qemu-devel] [PATCH] crypto/cipher-nettle.c: Pass correct function type to cbc_encrypt and cbc_decrypt.
On Thu, Jul 16, 2015 at 11:04:15AM +0100, Peter Maydell wrote: Is this the same issue that Radim's patchset from a few days back is addressing? Yes, looks like it, so ignore this patch. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Re: [Qemu-devel] [PATCH 3/3] PAM: make PAM emulation closer to documentation
16.07.2015 12:05, Paolo Bonzini пишет: On 16/07/2015 10:35, Efimov Vasily wrote: This patch improves PAM emulation. PAM defines 4 memory access redirection modes. In mode 1 reads are directed to RAM and writes are directed to PCI. In mode 2 it is contrary. In mode 0 all access is directed to PCI. In mode 3 it is directed to RAM. Modes 0 and 3 are well emulated but modes 1 and 2 are not. The cause is: aliases are used while more complicated logic is required. The idea is to use ROM device like memory regions for mode 1 and 2 emulation instead of aliases. Writes are directed to proper destination region by specified I/O callback. Read redirection depends on type of source region. In most cases source region is RAM (or ROM), so ram_addr of PAM region is set to ram_addr of source region with offset. Otherwise, when source region is an I/O region, reading is redirected to source region read callback by PAM region one. Read source and write destination regions are updated by the memory commit callback. Note that we cannot use I/O region for PAM as it will violate trying to execute code outside RAM or ROM assertion. Signed-off-by: Efimov Vasily r...@ispras.ru --- hw/pci-host/pam.c | 238 +- include/hw/pci-host/pam.h | 10 +- 2 files changed, 223 insertions(+), 25 deletions(-) diff --git a/hw/pci-host/pam.c b/hw/pci-host/pam.c index 17d826c..9729b2b 100644 --- a/hw/pci-host/pam.c +++ b/hw/pci-host/pam.c @@ -27,43 +27,233 @@ * THE SOFTWARE. */ -#include qom/object.h -#include sysemu/sysemu.h #include hw/pci-host/pam.h +#include exec/address-spaces.h +#include exec/memory-internal.h +#include qemu/bswap.h + +static void pam_write(void *opaque, hwaddr addr, uint64_t data, + unsigned size) +{ +PAMMemoryRegion *pam = (PAMMemoryRegion *) opaque; +void *ptr; + +/* Destination region can be both RAM and IO. */ +if (!memory_access_is_direct(pam-mr_write_to, true)) { +memory_region_dispatch_write(pam-mr_write_to, + addr + pam-write_offset, data, size, + MEMTXATTRS_UNSPECIFIED); +} else { +ptr = memory_region_get_ram_ptr(pam-mr_write_to) + addr + + pam-write_offset; + +switch (size) { +case 1: +stb_p(ptr, data); +break; +case 2: +stw_he_p(ptr, data); +break; +case 4: +stl_he_p(ptr, data); +break; +case 8: +stq_he_p(ptr, data); +break; +default: +abort(); +} + +invalidate_and_set_dirty(pam-mr_write_to, addr + pam-pam_offset, + size); +} +} + The idea is very good, but the implementation relies on copying a lot of code from exec.c. If it is about pam_write then, for instance, I could suggest to outline corresponding part of address_space_rw to dedicated public function. The solution will require endianness conversion because of the part works with uint8_t buffer but not with uint64_t values. The rest of code looks up destination or source region or child region offset in memory sub-tree which root is PCI or RAM region provided on PAM creation. We cannon use common address_space_translate because it searches from address space root and will return current PAM region. To summarize, I suggest to move the code to exec.c. It is generic enough. Could you use an IOMMU memory region instead? Then a single region can be used to implement all four modes, and you don't hit the trying to execute code outside RAM or RAM. Did you mean MemoryRegion.iommu_ops ? The feature does not allow to change destination memory region. Also I has no find its using during write access from CPU. And there is: exec.c: address_space_translate_for_iotlb: assert(!section-mr-iommu_ops); Paolo
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
Am 10.06.2015 um 11:34 schrieb Fam Zheng: On Wed, 06/10 11:18, Christian Borntraeger wrote: Am 10.06.2015 um 04:12 schrieb Fam Zheng: On Tue, 06/09 11:01, Christian Borntraeger wrote: Am 09.06.2015 um 04:28 schrieb Fam Zheng: On Tue, 06/02 16:36, Christian Borntraeger wrote: Paolo, I bisected commit a0710f7995f914e3044e5899bd8ff6c43c62f916 Author: Paolo Bonzini pbonz...@redhat.com AuthorDate: Fri Feb 20 17:26:52 2015 +0100 Commit: Kevin Wolf kw...@redhat.com CommitDate: Tue Apr 28 15:36:08 2015 +0200 iothread: release iothread around aio_poll to cause a problem with hanging guests. Having many guests all with a kernel/ramdisk (via -kernel) and several null block devices will result in hangs. All hanging guests are in partition detection code waiting for an I/O to return so very early maybe even the first I/O. Reverting that commit fixes the hangs. Any ideas? For what its worth, I can no longer reproduce the issue on current master + cherry-pick of a0710f7995f (iothread: release iothread around aio_poll) bisect tells me that commit 53ec73e264f481b79b52efcadc9ceb8f8996975c Author: Fam Zheng f...@redhat.com AuthorDate: Fri May 29 18:53:14 2015 +0800 Commit: Stefan Hajnoczi stefa...@redhat.com CommitDate: Tue Jul 7 14:27:14 2015 +0100 block: Use bdrv_drain to replace uncessary bdrv_drain_all made the problem will blk-null go away. I still dont understand why. Christian
Re: [Qemu-devel] [PATCH v4 0/7] Fix QEMU crash during memory hotplug with vhost=on
On Thu, 16 Jul 2015 10:35:33 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Thu, Jul 16, 2015 at 09:26:21AM +0200, Igor Mammedov wrote: On Wed, 15 Jul 2015 19:32:31 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 05:12:01PM +0200, Igor Mammedov wrote: On Thu, 9 Jul 2015 13:47:17 +0200 Igor Mammedov imamm...@redhat.com wrote: there also is yet another issue with vhost-user. It also has very low limit on amount of memory regions (if I recall correctly 8) and it's possible to trigger even without memory hotplug. one just need to start QEMU with a several -numa memdev= options to create a necessary amount of memory regions to trigger it. lowrisk option to fix it would be increasing limit in vhost-user backend. another option is disabling vhost and fall-back to virtio, but I don't know much about vhost if it's possible to to switch it off without loosing packets guest was sending at the moment and if it will work at all with vhost. With vhost-user you can't fall back to virtio: it's not an accelerator, it's the backend. Updating the protocol to support a bigger table is possible but old remotes won't be able to support it. it looks like increasing limit is the only option left. it's not ideal that old remotes /with hardcoded limit/ might not be able to handle bigger table but at least new ones and ones that handle VhostUserMsg payload dynamically would be able to work without crashing. I think we need a way for hotplug to fail gracefully. As long as we don't implement the hva trick, it's needed for old kernels with vhost in kernel, too. I don't see a reliable way to fail hotplug though. In case of hotplug failure path comes from memory listener which can't fail by design but it fails in vhost case, i.e. vhost side doesn't follow protocol. We already have considered idea of querying vhost, for limit from memory hotplug handler before mapping memory region but it has drawbacks: 1. amount of memory ranges changes during guest lifecycle as it initializes different devices. which leads to a case when we can hotplug more pc-dimms than cold-plug. Which leads to inability to migrate guest with hotplugged pc-dimms since target QEMU won't start with that amount of dimms from source due to hitting limit. 2. it's ugly hack to query random 'vhost' entity when plugging dimm device from modeling pov, but we can live with it if it helps QEMU not to crash. If it's acceptable to break/ignore #1 issue, I can post related QEMU patches that I have, at least qemu won't crash with old vhost backends.
[Qemu-devel] [PATCH v2 0/3] AioContext: ctx-dispatching is dead, all hail ctx-notify_me
Apart from an additional assertion this is exactly the same code as v1, but split across three patches so that the important one focuses on the optimization. Paolo v1-v2 Split some changes to the tests to a separate patch Fix commit message [Laszlo] Clarify do...while loop in aio-win32.c [Kevin] Paolo Bonzini (3): tests: remove irrelevant assertions from test-aio aio-win32: reorganize polling loop AioContext: fix broken ctx-dispatching optimization aio-posix.c | 19 ++-- aio-win32.c | 41 ++ async.c | 21 +- docs/aio_notify.promela | 77 + include/block/aio.h | 29 +++ tests/test-aio.c| 26 +++-- 6 files changed, 98 insertions(+), 115 deletions(-) -- 2.4.3 diff from v1: diff --git a/aio-win32.c b/aio-win32.c index ae7c6cf..9d6c12f 100644 --- a/aio-win32.c +++ b/aio-win32.c @@ -312,7 +312,13 @@ bool aio_poll(AioContext *ctx, bool blocking) ctx-walking_handlers--; first = true; -/* wait until next event */ +/* ctx-notifier is always registered. */ +assert(count 0); + +/* Multiple iterations, all of them non-blocking except the first, + * may be necessary to process all pending events. After the first + * WaitForMultipleObjects call ctx-notify_me will be decremented. + */ do { HANDLE event; int ret;
[Qemu-devel] [PATCH v2 3/3] AioContext: fix broken ctx-dispatching optimization
This patch rewrites the ctx-dispatching optimization, which was the cause of some mysterious hangs that could be reproduced on aarch64 KVM only. The hangs were indirectly caused by aio_poll() and in particular by flash memory updates's call to blk_write(), which invokes aio_poll(). Fun stuff: they had an extremely short race window, so much that adding all kind of tracing to either the kernel or QEMU made it go away (a single printf made it half as reproducible). On the plus side, the failure mode (a hang until the next keypress) made it very easy to examine the state of the process with a debugger. And there was a very nice reproducer from Laszlo, which failed pretty often (more than half of the time) on any version of QEMU with a non-debug kernel; it also failed fast, while still in the firmware. So, it could have been worse. For some unknown reason they happened only with virtio-scsi, but that's not important. It's more interesting that they disappeared with io=native, making thread-pool.c a likely suspect for where the bug arose. thread-pool.c is also one of the few places which use bottom halves across threads, by the way. I hope that no other similar bugs exist, but just in case :) I am going to describe how the successful debugging went... Since the likely culprit was the ctx-dispatching optimization, which mostly affects bottom halves, the first observation was that there are two qemu_bh_schedule() invocations in the thread pool: the one in the aio worker and the one in thread_pool_completion_bh. The latter always causes the optimization to trigger, the former may or may not. In order to restrict the possibilities, I introduced new functions qemu_bh_schedule_slow() and qemu_bh_schedule_fast(): /* qemu_bh_schedule_slow: */ ctx = bh-ctx; bh-idle = 0; if (atomic_xchg(bh-scheduled, 1) == 0) { event_notifier_set(ctx-notifier); } /* qemu_bh_schedule_fast: */ ctx = bh-ctx; bh-idle = 0; assert(ctx-dispatching); atomic_xchg(bh-scheduled, 1); Notice how the atomic_xchg is still in qemu_bh_schedule_slow(). This was already debated a few months ago, so I assumed it to be correct. In retrospect this was a very good idea, as you'll see later. Changing thread_pool_completion_bh() to qemu_bh_schedule_fast() didn't trigger the assertion (as expected). Changing the worker's invocation to qemu_bh_schedule_slow() didn't hide the bug (another assumption which luckily held). This already limited heavily the amount of interaction between the threads, hinting that the problematic events must have triggered around thread_pool_completion_bh(). As mentioned early, invoking a debugger to examine the state of a hung process was pretty easy; the iothread was always waiting on a poll(..., -1) system call. Infinite timeouts are much rarer on x86, and this could be the reason why the bug was never observed there. With the buggy sequence more or less resolved to an interaction between thread_pool_completion_bh() and poll(..., -1), my tracing strategy was to just add a few qemu_clock_get_ns(QEMU_CLOCK_REALTIME) calls, hoping that the ordering of aio_ctx_prepare(), aio_ctx_dispatch, poll() and qemu_bh_schedule_fast() would provide some hint. The output was: (gdb) p last_prepare $3 = 103885451 (gdb) p last_dispatch $4 = 103876492 (gdb) p last_poll $5 = 115909333 (gdb) p last_schedule $6 = 115925212 Notice how the last call to qemu_poll_ns() came after aio_ctx_dispatch(). This makes little sense unless there is an aio_poll() call involved, and indeed with a slightly different instrumentation you can see that there is one: (gdb) p last_prepare $3 = 107569679 (gdb) p last_dispatch $4 = 107561600 (gdb) p last_aio_poll $5 = 110671400 (gdb) p last_schedule $6 = 110698917 So the scenario becomes clearer: iothread VCPU thread -- aio_ctx_prepare aio_ctx_check qemu_poll_ns(timeout=-1) aio_poll aio_dispatch thread_pool_completion_bh qemu_bh_schedule() At this point bh-scheduled = 1 and the iothread has not been woken up. The solution must be close, but this alone should not be a problem, because the bottom half is only rescheduled to account for rare situations (see commit 3c80ca1, thread-pool: avoid deadlock in nested aio_poll() calls, 2014-07-15). Introducing a third thread---a thread pool worker thread, which also does qemu_bh_schedule()---does bring out the problematic case. The third thread must be awakened *after* the callback is complete and thread_pool_completion_bh has redone the whole loop, explaining the short race window. And then this is what happens: thread pool worker
[Qemu-devel] [PATCH v2 2/3] aio-win32: reorganize polling loop
Preparatory bugfixes and tweaks to the loop before the next patch: - disable dispatch optimization during aio_prepare. This fixes a bug. - do not modify blocking until after the first WaitForMultipleObjects call. This is needed in the next patch. - change the loop to do...while. This makes it obvious that the loop is always entered at least once. In the next patch this is important because the first iteration undoes the ctx-notify_me increment that happened before entering the loop. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- aio-win32.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/aio-win32.c b/aio-win32.c index 233d8f5..9268b5c 100644 --- a/aio-win32.c +++ b/aio-win32.c @@ -284,11 +284,6 @@ bool aio_poll(AioContext *ctx, bool blocking) int timeout; aio_context_acquire(ctx); -have_select_revents = aio_prepare(ctx); -if (have_select_revents) { -blocking = false; -} - was_dispatching = ctx-dispatching; progress = false; @@ -304,6 +299,8 @@ bool aio_poll(AioContext *ctx, bool blocking) */ aio_set_dispatching(ctx, !blocking); +have_select_revents = aio_prepare(ctx); + ctx-walking_handlers++; /* fill fd sets */ @@ -317,12 +314,18 @@ bool aio_poll(AioContext *ctx, bool blocking) ctx-walking_handlers--; first = true; -/* wait until next event */ -while (count 0) { +/* ctx-notifier is always registered. */ +assert(count 0); + +/* Multiple iterations, all of them non-blocking except the first, + * may be necessary to process all pending events. After the first + * WaitForMultipleObjects call ctx-notify_me will be decremented. + */ +do { HANDLE event; int ret; -timeout = blocking +timeout = blocking !have_select_revents ? qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)) : 0; if (timeout) { aio_context_release(ctx); @@ -351,7 +354,7 @@ bool aio_poll(AioContext *ctx, bool blocking) blocking = false; progress |= aio_dispatch_handlers(ctx, event); -} +} while (count 0); progress |= timerlistgroup_run_timers(ctx-tlg); -- 2.4.3
Re: [Qemu-devel] [PATCH] crypto/cipher-nettle.c: Pass correct function type to cbc_encrypt and cbc_decrypt.
On 16 July 2015 at 11:08, Paolo Bonzini pbonz...@redhat.com wrote: On 16/07/2015 12:04, Peter Maydell wrote: Is this the same issue that Radim's patchset from a few days back is addressing? Yes. Are you going to apply it, or should I include it in the pull request I'll send out today? If you put it in your pull request that would be easier for me. thanks -- PMM
Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
On Thu, Jul 16, 2015 at 11:50:38AM +0200, Igor Mammedov wrote: On Thu, 16 Jul 2015 10:30:41 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Thu, Jul 16, 2015 at 08:57:36AM +0200, Igor Mammedov wrote: On Wed, 15 Jul 2015 19:24:33 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 06:01:42PM +0200, Igor Mammedov wrote: On Wed, 15 Jul 2015 17:08:27 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote: Hi, A while ago I was pondering on the options available for retrieving a fw_cfg blob from the guest-side (now that we can insert fw_cfg files on the host-side command line, see commit 81b2b8106). So over the last couple of weekends I cooked up the sysfs kernel module below, which lists all fw_cfg files under /sys/firmware/fw_cfg/filename. One concern here is that there will be a conflict here if fw cfg is used by ACPI. I don't know whether that last is a good idea though, so maybe not a real concern. I think Igor wanted to make it so. I don't see any conflict here so far, it's just guest side module that accesses fw_cfg. If there's ACPI code that accesses fw_cfg in response to an interrupt, it'll race with fw cfg access by guest OS. On linux we might be able to block ACPI preventing it from running. We probably won't be able to do it on windows. wrt vmgenid series we were talking about possibility to access fw_cfg from ACPI device.init so it's unlikely that it will ever collide with much later sysfs accesses. Don't we need to get updates when it changes too? I've thought that it's pretty static and doesn't change. vm gen id? No, its dynamic. But if ACPI will start accessing fw_cfg from other methods it will race for sure. Maybe we shouldn't poke at fw cfg in ACPI then. Yep, maybe we shouldn't. The last vmgenid series just maps UUID region at fixed location (which is sufficient to build ACPI tables at machine done time) so there isn't real need to export address via fw_cfg. Other versions just write into guest memory from QEMU, that will work fine too. -- MST
Re: [Qemu-devel] [PATCH 6/6] ARM: enable PMSAv7-style MPU on Cortex-M3/M4
On 15 July 2015 at 08:31, Alex Züpke alexander.zue...@hs-rm.de wrote: Am 14.07.2015 um 19:49 schrieb Peter Maydell: RASR is UNPREDICTABLE for non-word-size access, so we don't need this at all. It's from ARM recommended sample code: http://infocenter.arm.com/help/topic/com.arm.doc.dui0553a/BIHHHDDJ.html Ah, thanks. It turns out that this is a difference between implementations: earlier M profile cores (M3, M4) support byte and halfword accesses here, but later ones don't, so the architecture defines it as UNPREDICTABLE. Since we're modelling an M3, it's probably safest to implement it. You might find that extract32()/deposit32() make for a cleaner way to write it, though. -- PMM
Re: [Qemu-devel] [PATCH v4 0/7] Fix QEMU crash during memory hotplug with vhost=on
On Thu, 16 Jul 2015 13:24:35 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Thu, Jul 16, 2015 at 11:42:36AM +0200, Igor Mammedov wrote: On Thu, 16 Jul 2015 10:35:33 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Thu, Jul 16, 2015 at 09:26:21AM +0200, Igor Mammedov wrote: On Wed, 15 Jul 2015 19:32:31 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 05:12:01PM +0200, Igor Mammedov wrote: On Thu, 9 Jul 2015 13:47:17 +0200 Igor Mammedov imamm...@redhat.com wrote: there also is yet another issue with vhost-user. It also has very low limit on amount of memory regions (if I recall correctly 8) and it's possible to trigger even without memory hotplug. one just need to start QEMU with a several -numa memdev= options to create a necessary amount of memory regions to trigger it. lowrisk option to fix it would be increasing limit in vhost-user backend. another option is disabling vhost and fall-back to virtio, but I don't know much about vhost if it's possible to to switch it off without loosing packets guest was sending at the moment and if it will work at all with vhost. With vhost-user you can't fall back to virtio: it's not an accelerator, it's the backend. Updating the protocol to support a bigger table is possible but old remotes won't be able to support it. it looks like increasing limit is the only option left. it's not ideal that old remotes /with hardcoded limit/ might not be able to handle bigger table but at least new ones and ones that handle VhostUserMsg payload dynamically would be able to work without crashing. I think we need a way for hotplug to fail gracefully. As long as we don't implement the hva trick, it's needed for old kernels with vhost in kernel, too. I don't see a reliable way to fail hotplug though. In case of hotplug failure path comes from memory listener which can't fail by design but it fails in vhost case, i.e. vhost side doesn't follow protocol. We already have considered idea of querying vhost, for limit from memory hotplug handler before mapping memory region but it has drawbacks: 1. amount of memory ranges changes during guest lifecycle as it initializes different devices. which leads to a case when we can hotplug more pc-dimms than cold-plug. Which leads to inability to migrate guest with hotplugged pc-dimms since target QEMU won't start with that amount of dimms from source due to hitting limit. 2. it's ugly hack to query random 'vhost' entity when plugging dimm device from modeling pov, but we can live with it if it helps QEMU not to crash. If it's acceptable to break/ignore #1 issue, I can post related QEMU patches that I have, at least qemu won't crash with old vhost backends. Old kvm has lower limits on 3 of slots as well. How is this handled? the same ugly/non perfect way, pc_dimm_plug() - pc_dimm_memory_plug() { ... if (kvm_enabled() !kvm_has_free_slot(machine)) { error_setg(local_err, hypervisor has no free memory slots left); ...
Re: [Qemu-devel] [PATCH 3/3] PAM: make PAM emulation closer to documentation
On 16/07/2015 12:51, Ефимов Василий wrote: The rest of code looks up destination or source region or child region offset in memory sub-tree which root is PCI or RAM region provided on PAM creation. We cannon use common address_space_translate because it searches from address space root and will return current PAM region. To summarize, I suggest to move the code to exec.c. It is generic enough. All these mechanism are extremely low level. They are encapsulated within exec.c, and copying code to pam.c is not a good idea because you already have all the AddressSpaces and RAM MemoryRegions you need. Could you use an IOMMU memory region instead? Then a single region can be used to implement all four modes, and you don't hit the trying to execute code outside RAM or RAM. Did you mean MemoryRegion.iommu_ops ? The feature does not allow to change destination memory region. It does. You're right about this: exec.c: address_space_translate_for_iotlb: assert(!section-mr-iommu_ops); ... but an IOMMU region is not needed, and I think you can do everything without touching exec.c at all. +/* Read from RAM and write to PCI */ +memory_region_init_io(pam-region[1], OBJECT(dev), pam_ops, pam, +pam-r-ram-w-pci, size); This can be done with memory_region_set_readonly on the RAM region. You need to set mr-ops in order to point to pam_ops; for a first proof of concept you can just set the field directly. Writes to the PCI memory space can use the PCI address space, with address_space_st*. +/* Read from PCI and write to RAM */ +memory_region_init_io(pam-region[2], OBJECT(dev), pam_ops, pam, +pam-r-pci-w-ram, size); Here you cannot run code from ROM, so it can be a pure MMIO region. Reads can use address_space_ld*, while writes can use memory_region_get_ram_ptr. Paolo
Re: [Qemu-devel] [PATCH] AioContext: fix broken ctx-dispatching optimization
Am 16.07.2015 um 00:59 hat Paolo Bonzini geschrieben: On 15/07/2015 22:14, Kevin Wolf wrote: index 233d8f5..ae7c6cf 100644 --- a/aio-win32.c +++ b/aio-win32.c @@ -318,11 +313,11 @@ bool aio_poll(AioContext *ctx, bool blocking) first = true; /* wait until next event */ -while (count 0) { +do { HANDLE event; int ret; -timeout = blocking +timeout = blocking !have_select_revents ? qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)) : 0; if (timeout) { aio_context_release(ctx); Here I'm not sure why you switched to a do..while loop? It's not obvious to me how the change from aio_set_dispatching() to ctx-notify_me is related to this. I sort of expected a reviewer to ask me about this change. I would have explained this in the commit message, but it was already long enough. :) Rather than an extra section in the commit message, it should have been a comment in the code at least because it's not just the change that is confusing, but the resulting code is confusing as well. The reason is that, at least to me, the do..while loop reads almost like an assertion that count may indeed be 0. The count on entry is never zero, because the ctx-notifier is always registered. I changed the while to do...while so that it's obvious that ctx-notify_me is always decremented. In retrospect I could have added simply /* ctx-notifier is always registered. */ assert(count 0); above the do. Yes, please do that, it should be much clearer. With this information, I understand that what has changed is that the return value of g_main_context_iteration() changes from true before this patch (had the aio_notify() from aio_set_fd_handler() pending) to false after the patch (aio_notify() doesn't inject an event any more). This should mean that like above we can assert that the first iteration returns false, i.e. reverse the assertion (and indeed, with this change the test still passes for me). I was a bit undecided about this. In the end I decided that the calls to aio_poll/g_main_context_iteration were just to put the AioContext in a known state, and the assertions on the return value of g_assert were not really important. For this reason, the while loop seemed to express the intentions best, and I made it consistent between the AioContext and GSource cases. You changed the AioContext case in this same patch, even if you didn't quote my comment on that hunk. :-) Both cases were asserting the return value before. Kevin
Re: [Qemu-devel] [PATCH] AioContext: fix broken ctx-dispatching optimization
On 16/07/2015 07:29, Fam Zheng wrote: This patch rewrites the ctx-dispatching optimization I lost track, but what was the justification of this optimization? qemu_bh_schedule's call to aio_notify were showing in the profile as 20% or so. Paolo Anyway, awesome debugging and explanations!
Re: [Qemu-devel] [PATCH] AioContext: fix broken ctx-dispatching optimization
On Thu, Jul 16, 2015 at 01:29:20PM +0800, Fam Zheng wrote: On Wed, 07/15 19:13, Paolo Bonzini wrote: This patch rewrites the ctx-dispatching optimization I lost track, but what was the justification of this optimization? To avoid signalling the event notifier when it's not necessary. pgpOBgCfD5TyW.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH] AioContext: fix broken ctx-dispatching optimization
On Wed, Jul 15, 2015 at 07:13:23PM +0200, Paolo Bonzini wrote: This patch rewrites the ctx-dispatching optimization, which was the cause of some mysterious hangs that could be reproduced on aarch64 KVM only. The hangs were indirectly caused by aio_poll() and in particular by flash memory updates's call to blk_write(), which invokes aio_poll(). Fun stuff: they had an extremely short race window, so much that adding all kind of tracing to either the kernel or QEMU made it go away (a single printf made it half as reproducible). On the plus side, the failure mode (a hang until the next keypress) made it very easy to examine the state of the process with a debugger. And there was a very nice reproducer from Laszlo, which failed pretty often (more than half of the time) on any version of QEMU with a non-debug kernel; it also failed fast, while still in the firmware. So, it could have been worse. For some unknown reason they happened only with virtio-scsi, but that's not important. It's more interesting that they disappeared with io=native, making thread-pool.c a likely suspect for where the bug arose. thread-pool.c is also one of the few places which use bottom halves across threads, by the way. I hope that no other similar bugs exist, but just in case :) I am going to describe how the successful debugging went... Since the likely culprit was the ctx-dispatching optimization, which mostly affects bottom halves, the first observation was that there are two qemu_bh_schedule() invocations in the thread pool: the one in the aio worker and the one in thread_pool_completion_bh. The latter always causes the optimization to trigger, the former may or may not. In order to restrict the possibilities, I introduced new functions qemu_bh_schedule_slow() and qemu_bh_schedule_fast(): /* qemu_bh_schedule_slow: */ ctx = bh-ctx; bh-idle = 0; if (atomic_xchg(bh-scheduled, 1) == 0) { event_notifier_set(ctx-notifier); } /* qemu_bh_schedule_fast: */ ctx = bh-ctx; bh-idle = 0; assert(ctx-dispatching); atomic_xchg(bh-scheduled, 1); Notice how the atomic_xchg is still in qemu_bh_schedule_slow(). This was already debated a few months ago, so I assumed it to be correct. In retrospect this was a very good idea, as you'll see later. Changing thread_pool_completion_bh() to qemu_bh_schedule_fast() didn't trigger the assertion (as expected). Changing the worker's invocation to qemu_bh_schedule_slow() didn't hide the bug (another assumption which luckily held). This already limited heavily the amount of interaction between the threads, hinting that the problematic events must have triggered around thread_pool_completion_bh(). As mentioned early, invoking a debugger to examine the state of a hung process was pretty easy; the iothread was always waiting on a poll(..., -1) system call. Infinite timeouts are much rarer on x86, and this could be the reason why the bug was never observed there. With the buggy sequence more or less resolved to an interaction between thread_pool_completion_bh() and poll(..., -1), my tracing strategy was to just add a few qemu_clock_get_ns(QEMU_CLOCK_REALTIME) calls, hoping that the ordering of aio_ctx_prepare(), aio_ctx_dispatch, poll() and qemu_bh_schedule_fast() would provide some hint. The output was: (gdb) p last_prepare $3 = 103885451 (gdb) p last_dispatch $4 = 103876492 (gdb) p last_poll $5 = 115909333 (gdb) p last_schedule $6 = 115925212 Notice how the last call to qemu_poll_ns() came after aio_ctx_dispatch(). This makes little sense unless there is an aio_poll() call involved, and indeed with a slightly different instrumentation you can see that there is one: (gdb) p last_prepare $3 = 107569679 (gdb) p last_dispatch $4 = 107561600 (gdb) p last_aio_poll $5 = 110671400 (gdb) p last_schedule $6 = 110698917 So the scenario becomes clearer: iothread VCPU thread -- aio_ctx_prepare aio_ctx_check qemu_poll_ns(timeout=-1) aio_poll aio_dispatch thread_pool_completion_bh qemu_bh_schedule() At this point bh-scheduled = 1 and the iothread has not been woken up. The solution must be close, but this alone should not be a problem, because the bottom half is only rescheduled to account for rare situations (see commit 3c80ca1, thread-pool: avoid deadlock in nested aio_poll() calls, 2014-07-15). Introducing a third thread---a thread pool worker thread, which also does qemu_bh_schedule()---does bring out the problematic case. The third thread must be awakened *after* the callback is complete and thread_pool_completion_bh has redone the whole loop,
Re: [Qemu-devel] Patches
On Wed, Jul 15, 2015 at 12:26:44PM +0100, Peter Maydell wrote: On 15 July 2015 at 09:52, Stefan Hajnoczi stefa...@redhat.com wrote: On Tue, Jul 14, 2015 at 10:58:41AM +0100, Peter Maydell wrote: On 14 July 2015 at 09:12, Peter Crosthwaite peter.crosthwa...@xilinx.com wrote: Hi Stefan, Peter, Is patches down? I don't see any patches for the last 3 days and there is definitely new things on list: $ ./patches fetch http://vmsplice.net/~patches/patches.json Fetched info on 689 patch series Fetching mboxes... $ ./patches list | more Message-id: 1436556159-3002-1-git-send-email...@weilnetz.de From: Stefan Weil s...@weilnetz.de Date: 2015-07-10 Tags: for-2.4 [1/1] tci: Fix regression with INDEX_op_qemu_st_i32, INDEX_op_qemu_st_i64 The other patches instance also seems to have got wedged (http://wiki.qemu.org/patches/patches.json): that 2015-07-10 patch is the last one it knows about too. Strange, that is the same file as http://qemu-project.org/patches/patches.json. It unwedged itself later that same day. I experienced something weird today as well. patches fetch downloaded an incomplete patches.json file :(. I'll have to keep an eye on the server... Stefan pgpDqjotpK7wm.pgp Description: PGP signature
[Qemu-devel] [PATCH] crypto/cipher-nettle.c: Pass correct function type to cbc_encrypt and cbc_decrypt.
The prototypes of the {nettle_}cbc_encrypt and cbc_decrypt functions are: void cbc_encrypt(const void *ctx, nettle_cipher_func *f, size_t block_size, uint8_t *iv, size_t length, uint8_t *dst, const uint8_t *src); void cbc_decrypt(const void *ctx, nettle_cipher_func *f, size_t block_size, uint8_t *iv, size_t length, uint8_t *dst, const uint8_t *src); Since we passed nettle_crypt_func (instead of nettle_cipher_func) as the second argument, this gave the errors below. In file included from crypto/cipher.c:71:0: ./crypto/cipher-nettle.c: In function ‘qcrypto_cipher_encrypt’: ./crypto/cipher-nettle.c:154:39: error: passing argument 2 of ‘nettle_cbc_encrypt’ from incompatible pointer type [-Werror=incompatible-pointer-types] cbc_encrypt(ctx-ctx_encrypt, ctx-alg_encrypt, ^ In file included from ./crypto/cipher-nettle.c:24:0, from crypto/cipher.c:71: /usr/include/nettle/cbc.h:48:1: note: expected ‘void (*)(const void *, size_t, uint8_t *, const uint8_t *) {aka void (*)(const void *, long unsigned int, unsigned char *, const unsigned char *)}’ but argument is of type ‘void (*)(void *, size_t, uint8_t *, const uint8_t *) {aka void (*)(void *, long unsigned int, unsigned char *, const unsigned char *)}’ cbc_encrypt(const void *ctx, nettle_cipher_func *f, ^ In file included from crypto/cipher.c:71:0: ./crypto/cipher-nettle.c: In function ‘qcrypto_cipher_decrypt’: ./crypto/cipher-nettle.c:183:21: error: passing argument 2 of ‘nettle_cbc_decrypt’ from incompatible pointer type [-Werror=incompatible-pointer-types] ctx-alg_decrypt, ctx-niv, ctx-iv, ^ In file included from ./crypto/cipher-nettle.c:24:0, from crypto/cipher.c:71: /usr/include/nettle/cbc.h:54:1: note: expected ‘void (*)(const void *, size_t, uint8_t *, const uint8_t *) {aka void (*)(const void *, long unsigned int, unsigned char *, const unsigned char *)}’ but argument is of type ‘void (*)(void *, size_t, uint8_t *, const uint8_t *) {aka void (*)(void *, long unsigned int, unsigned char *, const unsigned char *)}’ cbc_decrypt(const void *ctx, nettle_cipher_func *f, Signed-off-by: Richard W.M. Jones rjo...@redhat.com --- crypto/cipher-nettle.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/crypto/cipher-nettle.c b/crypto/cipher-nettle.c index e5a14bc..f06ea92 100644 --- a/crypto/cipher-nettle.c +++ b/crypto/cipher-nettle.c @@ -27,8 +27,8 @@ typedef struct QCryptoCipherNettle QCryptoCipherNettle; struct QCryptoCipherNettle { void *ctx_encrypt; void *ctx_decrypt; -nettle_crypt_func *alg_encrypt; -nettle_crypt_func *alg_decrypt; +nettle_cipher_func *alg_encrypt; +nettle_cipher_func *alg_decrypt; uint8_t *iv; size_t niv; }; @@ -83,8 +83,8 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg, des_set_key(ctx-ctx_encrypt, rfbkey); g_free(rfbkey); -ctx-alg_encrypt = (nettle_crypt_func *)des_encrypt; -ctx-alg_decrypt = (nettle_crypt_func *)des_decrypt; +ctx-alg_encrypt = (nettle_cipher_func *)des_encrypt; +ctx-alg_decrypt = (nettle_cipher_func *)des_decrypt; ctx-niv = DES_BLOCK_SIZE; break; @@ -98,8 +98,8 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg, aes_set_encrypt_key(ctx-ctx_encrypt, nkey, key); aes_set_decrypt_key(ctx-ctx_decrypt, nkey, key); -ctx-alg_encrypt = (nettle_crypt_func *)aes_encrypt; -ctx-alg_decrypt = (nettle_crypt_func *)aes_decrypt; +ctx-alg_encrypt = (nettle_cipher_func *)aes_encrypt; +ctx-alg_decrypt = (nettle_cipher_func *)aes_decrypt; ctx-niv = AES_BLOCK_SIZE; break; -- 2.4.3
Re: [Qemu-devel] [PATCH] crypto/cipher-nettle.c: Pass correct function type to cbc_encrypt and cbc_decrypt.
On 16 July 2015 at 10:55, Richard W.M. Jones rjo...@redhat.com wrote: The prototypes of the {nettle_}cbc_encrypt and cbc_decrypt functions are: void cbc_encrypt(const void *ctx, nettle_cipher_func *f, size_t block_size, uint8_t *iv, size_t length, uint8_t *dst, const uint8_t *src); void cbc_decrypt(const void *ctx, nettle_cipher_func *f, size_t block_size, uint8_t *iv, size_t length, uint8_t *dst, const uint8_t *src); Since we passed nettle_crypt_func (instead of nettle_cipher_func) as the second argument, this gave the errors below. In file included from crypto/cipher.c:71:0: ./crypto/cipher-nettle.c: In function ‘qcrypto_cipher_encrypt’: ./crypto/cipher-nettle.c:154:39: error: passing argument 2 of ‘nettle_cbc_encrypt’ from incompatible pointer type [-Werror=incompatible-pointer-types] cbc_encrypt(ctx-ctx_encrypt, ctx-alg_encrypt, ^ In file included from ./crypto/cipher-nettle.c:24:0, from crypto/cipher.c:71: /usr/include/nettle/cbc.h:48:1: note: expected ‘void (*)(const void *, size_t, uint8_t *, const uint8_t *) {aka void (*)(const void *, long unsigned int, unsigned char *, const unsigned char *)}’ but argument is of type ‘void (*)(void *, size_t, uint8_t *, const uint8_t *) {aka void (*)(void *, long unsigned int, unsigned char *, const unsigned char *)}’ cbc_encrypt(const void *ctx, nettle_cipher_func *f, ^ In file included from crypto/cipher.c:71:0: ./crypto/cipher-nettle.c: In function ‘qcrypto_cipher_decrypt’: ./crypto/cipher-nettle.c:183:21: error: passing argument 2 of ‘nettle_cbc_decrypt’ from incompatible pointer type [-Werror=incompatible-pointer-types] ctx-alg_decrypt, ctx-niv, ctx-iv, ^ In file included from ./crypto/cipher-nettle.c:24:0, from crypto/cipher.c:71: /usr/include/nettle/cbc.h:54:1: note: expected ‘void (*)(const void *, size_t, uint8_t *, const uint8_t *) {aka void (*)(const void *, long unsigned int, unsigned char *, const unsigned char *)}’ but argument is of type ‘void (*)(void *, size_t, uint8_t *, const uint8_t *) {aka void (*)(void *, long unsigned int, unsigned char *, const unsigned char *)}’ cbc_decrypt(const void *ctx, nettle_cipher_func *f, Is this the same issue that Radim's patchset from a few days back is addressing? thanks -- PMM
Re: [Qemu-devel] [PATCH] crypto/cipher-nettle.c: Pass correct function type to cbc_encrypt and cbc_decrypt.
On 16/07/2015 12:04, Peter Maydell wrote: Is this the same issue that Radim's patchset from a few days back is addressing? Yes. Are you going to apply it, or should I include it in the pull request I'll send out today? Paolo
Re: [Qemu-devel] [PATCH V2 1/2] tests: introduce basic pci test for virtio-net
On Mon, Jul 13, 2015 at 02:04:19PM +0800, Jason Wang wrote: Signed-off-by: Jason Wang jasow...@redhat.com --- Changes from V1: - replace the magic value 12 with a macro --- tests/Makefile | 2 +- tests/virtio-net-test.c | 186 ++-- 2 files changed, 180 insertions(+), 8 deletions(-) I haven't reviewed this series in detail but it's great that you are doing this! Stefan pgpuyRglZJN3p.pgp Description: PGP signature
Re: [Qemu-devel] [PULL 0/9] target-mips queue
On 16 July 2015 at 09:17, Leon Alrae leon.al...@imgtec.com wrote: Hi, This pull request contains MIPS bug fixes for 2.4-rc1. Thanks, Leon Cc: Peter Maydell peter.mayd...@linaro.org Cc: Aurelien Jarno aurel...@aurel32.net The following changes since commit 661725da09f47eb92d356fac10a4cf3b7ad1f61d: Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20150714' into staging (2015-07-14 18:50:17 +0100) are available in the git repository at: git://github.com/lalrae/qemu.git tags/mips-20150716 for you to fetch changes up to 908680c6441ac468f4871d513f42be396ea0d264: target-mips: fix page fault address for LWL/LWR/LDL/LDR (2015-07-15 14:07:25 +0100) MIPS patches 2015-07-16 Changes: * bug fixes Applied, thanks. -- PMM
Re: [Qemu-devel] [PATCH 09/10 v12] target-tilegx: Generate tcg instructions to finish Hello world
Hello Maintainers: Please help review this patch when you have time. Thanks. On 06/13/2015 09:21 PM, Chen Gang wrote: Generate related tcg instructions, and qemu tilegx can finish running Hello world. The elf64 binary can be static or shared. Signed-off-by: Chen Gang gang.chen.5...@gmail.com --- target-tilegx/translate.c | 2966 + 1 file changed, 2966 insertions(+) create mode 100644 target-tilegx/translate.c diff --git a/target-tilegx/translate.c b/target-tilegx/translate.c new file mode 100644 index 000..1dd3a43 --- /dev/null +++ b/target-tilegx/translate.c @@ -0,0 +1,2966 @@ +/* + * QEMU TILE-Gx CPU + * + * Copyright (c) 2015 Chen Gang + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see + * http://www.gnu.org/licenses/lgpl-2.1.html + */ + +#include cpu.h +#include qemu/log.h +#include disas/disas.h +#include tcg-op.h +#include exec/cpu_ldst.h +#include opcode_tilegx.h +#include spr_def_64.h + +#define FMT64X %016 PRIx64 +#define TILEGX_TMP_REGS(TILEGX_MAX_INSTRUCTIONS_PER_BUNDLE + 1) + +static TCGv_ptr cpu_env; +static TCGv cpu_pc; +static TCGv cpu_regs[TILEGX_R_COUNT]; +static TCGv cpu_spregs[TILEGX_SPR_COUNT]; +#if defined(CONFIG_USER_ONLY) +static TCGv_i32 cpu_excparam; +#endif + +static const char * const reg_names[] = { + r0, r1, r2, r3, r4, r5, r6, r7, + r8, r9, r10, r11, r12, r13, r14, r15, +r16, r17, r18, r19, r20, r21, r22, r23, +r24, r25, r26, r27, r28, r29, r30, r31, +r32, r33, r34, r35, r36, r37, r38, r39, +r40, r41, r42, r43, r44, r45, r46, r47, +r48, r49, r50, r51, bp, tp, sp, lr +}; + +static const char * const spreg_names[] = { +cmpexch, criticalsec, simcontrol +}; + +/* It is for temporary registers */ +typedef struct DisasContextTemp { +uint8_t idx; /* index */ +TCGv val; /* value */ +} DisasContextTemp; + +/* This is the state at translation time. */ +typedef struct DisasContext { +uint64_t pc; /* Current pc */ +int exception; /* Current exception */ + +TCGv zero; /* For zero register */ + +DisasContextTemp *tmp_regcur; /* Current temporary registers */ +DisasContextTemp tmp_regs[TILEGX_TMP_REGS]; /* All temporary registers */ +struct { +TCGCond cond; /* Branch condition */ +TCGv dest; /* pc jump destination, if will jump */ +TCGv val1; /* Firt value for condition comparing */ +TCGv val2; /* Second value for condition comparing */ +} jmp; /* Jump object, only once in each TB block */ +} DisasContext; + +#include exec/gen-icount.h + +static void gen_exception(DisasContext *dc, int num) +{ +TCGv_i32 tmp = tcg_const_i32(num); + +gen_helper_exception(cpu_env, tmp); +tcg_temp_free_i32(tmp); +} + +/* + * All exceptions which can still let working flow continue are all in pipe x1, + * which is the last pipe of a bundle. So it is OK to only process the first + * exception within a bundle. + */ +static void set_exception(DisasContext *dc, int num) +{ +if (dc-exception == TILEGX_EXCP_NONE) { +dc-exception = num; +} +} + +static bool check_gr(DisasContext *dc, uint8_t reg) +{ +if (likely(reg TILEGX_R_COUNT)) { +return true; +} + +switch (reg) { +case TILEGX_R_SN: +case TILEGX_R_ZERO: +break; +case TILEGX_R_IDN0: +case TILEGX_R_IDN1: +set_exception(dc, TILEGX_EXCP_REG_IDN_ACCESS); +break; +case TILEGX_R_UDN0: +case TILEGX_R_UDN1: +case TILEGX_R_UDN2: +case TILEGX_R_UDN3: +set_exception(dc, TILEGX_EXCP_REG_UDN_ACCESS); +break; +default: +g_assert_not_reached(); +} +return false; +} + +static TCGv load_zero(DisasContext *dc) +{ +if (TCGV_IS_UNUSED_I64(dc-zero)) { +dc-zero = tcg_const_i64(0); +} +return dc-zero; +} + +static TCGv load_gr(DisasContext *dc, uint8_t reg) +{ +if (check_gr(dc, reg)) { +return cpu_regs[reg]; +} +return load_zero(dc); +} + +static TCGv dest_gr(DisasContext *dc, uint8_t rdst)
Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
On Wed, 15 Jul 2015 19:24:33 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 06:01:42PM +0200, Igor Mammedov wrote: On Wed, 15 Jul 2015 17:08:27 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote: Hi, A while ago I was pondering on the options available for retrieving a fw_cfg blob from the guest-side (now that we can insert fw_cfg files on the host-side command line, see commit 81b2b8106). So over the last couple of weekends I cooked up the sysfs kernel module below, which lists all fw_cfg files under /sys/firmware/fw_cfg/filename. One concern here is that there will be a conflict here if fw cfg is used by ACPI. I don't know whether that last is a good idea though, so maybe not a real concern. I think Igor wanted to make it so. I don't see any conflict here so far, it's just guest side module that accesses fw_cfg. If there's ACPI code that accesses fw_cfg in response to an interrupt, it'll race with fw cfg access by guest OS. On linux we might be able to block ACPI preventing it from running. We probably won't be able to do it on windows. wrt vmgenid series we were talking about possibility to access fw_cfg from ACPI device.init so it's unlikely that it will ever collide with much later sysfs accesses. But if ACPI will start accessing fw_cfg from other methods it will race for sure.
[Qemu-devel] [PATCH 2/3] memory: make function memory_access_is_direct public
Make function memory_access_is_direct public. It is required by PAM emulation. Signed-off-by: Efimov Vasily r...@ispras.ru --- exec.c | 12 include/exec/memory-internal.h | 12 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/exec.c b/exec.c index 4e37ded..27064b8 100644 --- a/exec.c +++ b/exec.c @@ -372,18 +372,6 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x return section; } -static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write) -{ -if (memory_region_is_ram(mr)) { -return !(is_write mr-readonly); -} -if (memory_region_is_romd(mr)) { -return !is_write; -} - -return false; -} - /* Called from RCU critical section */ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr, hwaddr *xlat, hwaddr *plen, diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index 801da82..89975b6 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -34,5 +34,17 @@ bool memory_region_access_valid(MemoryRegion *mr, hwaddr addr, void invalidate_and_set_dirty(MemoryRegion *mr, hwaddr addr, hwaddr length); +static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write) +{ +if (memory_region_is_ram(mr)) { +return !(is_write mr-readonly); +} +if (memory_region_is_romd(mr)) { +return !is_write; +} + +return false; +} + #endif #endif -- 1.9.1
[Qemu-devel] [PATCH 3/3] PAM: make PAM emulation closer to documentation
This patch improves PAM emulation. PAM defines 4 memory access redirection modes. In mode 1 reads are directed to RAM and writes are directed to PCI. In mode 2 it is contrary. In mode 0 all access is directed to PCI. In mode 3 it is directed to RAM. Modes 0 and 3 are well emulated but modes 1 and 2 are not. The cause is: aliases are used while more complicated logic is required. The idea is to use ROM device like memory regions for mode 1 and 2 emulation instead of aliases. Writes are directed to proper destination region by specified I/O callback. Read redirection depends on type of source region. In most cases source region is RAM (or ROM), so ram_addr of PAM region is set to ram_addr of source region with offset. Otherwise, when source region is an I/O region, reading is redirected to source region read callback by PAM region one. Read source and write destination regions are updated by the memory commit callback. Note that we cannot use I/O region for PAM as it will violate trying to execute code outside RAM or ROM assertion. Signed-off-by: Efimov Vasily r...@ispras.ru --- hw/pci-host/pam.c | 238 +- include/hw/pci-host/pam.h | 10 +- 2 files changed, 223 insertions(+), 25 deletions(-) diff --git a/hw/pci-host/pam.c b/hw/pci-host/pam.c index 17d826c..9729b2b 100644 --- a/hw/pci-host/pam.c +++ b/hw/pci-host/pam.c @@ -27,43 +27,233 @@ * THE SOFTWARE. */ -#include qom/object.h -#include sysemu/sysemu.h #include hw/pci-host/pam.h +#include exec/address-spaces.h +#include exec/memory-internal.h +#include qemu/bswap.h + +static void pam_write(void *opaque, hwaddr addr, uint64_t data, + unsigned size) +{ +PAMMemoryRegion *pam = (PAMMemoryRegion *) opaque; +void *ptr; + +/* Destination region can be both RAM and IO. */ +if (!memory_access_is_direct(pam-mr_write_to, true)) { +memory_region_dispatch_write(pam-mr_write_to, + addr + pam-write_offset, data, size, + MEMTXATTRS_UNSPECIFIED); +} else { +ptr = memory_region_get_ram_ptr(pam-mr_write_to) + addr + + pam-write_offset; + +switch (size) { +case 1: +stb_p(ptr, data); +break; +case 2: +stw_he_p(ptr, data); +break; +case 4: +stl_he_p(ptr, data); +break; +case 8: +stq_he_p(ptr, data); +break; +default: +abort(); +} + +invalidate_and_set_dirty(pam-mr_write_to, addr + pam-pam_offset, + size); +} +} + +static uint64_t pam_read(void *opaque, hwaddr addr, unsigned size) +{ +PAMMemoryRegion *pam = (PAMMemoryRegion *) opaque; +uint64_t ret = 0; + +/* Source region can be IO only. */ +memory_region_dispatch_read(pam-mr_read_from, pam-read_offset + addr, +ret, size, MEMTXATTRS_UNSPECIFIED); + +return ret; +} + +static MemoryRegionOps pam_ops = { +.write = pam_write, +.read = pam_read, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + +static void pam_set_current(PAMMemoryRegion *pam, unsigned new_current) +{ +assert(new_current = 3); + +if (new_current != pam-current) { +#ifdef DEBUG_PAM +printf(PAM 0xTARGET_FMT_plx: %d = %s\n, pam-pam_offset, + new_current, pam-region[new_current].name); +#endif +memory_region_set_enabled(pam-region[pam-current], false); +pam-current = new_current; +memory_region_set_enabled(pam-region[new_current], true); +} +} + +static void pam_leaf_mr_lookup(MemoryRegion *mr, hwaddr offset, + MemoryRegion **leaf, hwaddr *offset_within_leaf) +{ +MemoryRegion *other; +if (mr-alias) { +pam_leaf_mr_lookup(mr-alias, offset + mr-alias_offset, leaf, + offset_within_leaf); +} else { +if (QTAILQ_EMPTY(mr-subregions) + int128_lt(int128_make64(offset), mr-size)) { +*leaf = mr; +*offset_within_leaf = offset; +} else { +QTAILQ_FOREACH(other, mr-subregions, subregions_link) { +if (!other-enabled) { +continue; +} +if (other-addr = offset int128_lt(int128_make64(offset), +int128_add(int128_make64(other-addr), other-size))) { +pam_leaf_mr_lookup(other, offset - other-addr, leaf, + offset_within_leaf); +if (*leaf) { +return; +} +} +} +*leaf = NULL; +} +} +} + +static bool mr_has_mr(MemoryRegion *container, MemoryRegion *mr, + hwaddr *offset) +{ +hwaddr tmp_offset; +
[Qemu-devel] [PATCH 0/3] PAM emulation improvement
This patch series improves PAM emulation. PAM defines 4 memory access redirection modes. In mode 1 reads are directed to RAM and writes are directed to PCI. In mode 2 it is contrary. In mode 0 all access is directed to PCI. In mode 3 it is directed to RAM. Modes 0 and 3 are well emulated but modes 1 and 2 are not. The cause is: aliases are used while more complicated logic is required. The idea is to use ROM device like memory regions for mode 1 and 2 emulation instead of aliases. Writes are directed to proper destination region by specified I/O callback. Read redirection depends on type of source region. In most cases source region is RAM (or ROM), so ram_addr of PAM region is set to ram_addr of source region with offset. Otherwise, when source region is an I/O region, reading is redirected to source region read callback by PAM region one. Read source and write destination regions are updated by the memory commit callback. Note that we cannot use I/O region for PAM as it will violate trying to execute code outside RAM or ROM assertion. Qemu distribution includes SeaBIOS which has hacks to work around incorrect modes 1 and 2 emulation. This patch series is tested using modified SeaBIOS. It is forced to use mode 2 for copying its data. BIOS reads a value from memory and immediately writes it to same address. According to PAM definition, reads are directed to PCI (i.e. to BIOS ROM) and writes are directed to RAM. The patch for SeaBIOS is listed below. Both SeaBIOS versions works with new PAM but the modified one does not work with old PAM. == diff --git a/src/fw/shadow.c b/src/fw/shadow.c index 4f6..5b0e527 100644 --- a/src/fw/shadow.c +++ b/src/fw/shadow.c @@ -26,32 +26,43 @@ static void __make_bios_writable_intel(u16 bdf, u32 pam0) { // Make ram from 0xc-0xf writable -int clear = 0; int i; +unsigned *mem, *mem_limit; for (i=0; i6; i++) { u32 pam = pam0 + 1 + i; int reg = pci_config_readb(bdf, pam); -if (CONFIG_OPTIONROMS_DEPLOYED (reg 0x11) != 0x11) { -// Need to copy optionroms to work around qemu implementation -void *mem = (void*)(BUILD_ROM_START + i * 32*1024); -memcpy((void*)BUILD_BIOS_TMP_ADDR, mem, 32*1024); +if ((reg 0x11) != 0x11) { +mem = (unsigned *)(BUILD_ROM_START + i * 32 * 1024); +pci_config_writeb(bdf, pam, 0x22); +mem_limit = mem + 32 * 1024 / sizeof(unsigned); + +while (mem mem_limit) { +volatile unsigned tmp = *mem; +*mem = tmp; +mem++; +} pci_config_writeb(bdf, pam, 0x33); -memcpy(mem, (void*)BUILD_BIOS_TMP_ADDR, 32*1024); -clear = 1; } else { pci_config_writeb(bdf, pam, 0x33); } } -if (clear) -memset((void*)BUILD_BIOS_TMP_ADDR, 0, 32*1024); // Make ram from 0xf-0x10 writable int reg = pci_config_readb(bdf, pam0); -pci_config_writeb(bdf, pam0, 0x30); if (reg 0x10) // Ram already present. return; +pci_config_writeb(bdf, pam0, 0x22); +mem = (unsigned *)BUILD_BIOS_ADDR; +mem_limit = mem + 32 * 1024 / sizeof(unsigned); +while (mem mem_limit) { +volatile unsigned tmp = *mem; +*mem = tmp; +mem++; +} +pci_config_writeb(bdf, pam0, 0x33); + // Copy bios. extern u8 code32flat_start[], code32flat_end[]; memcpy(code32flat_start, code32flat_start + BIOS_SRC_OFFSET @@ -61,17 +72,6 @@ __make_bios_writable_intel(u16 bdf, u32 pam0) static void make_bios_writable_intel(u16 bdf, u32 pam0) { -int reg = pci_config_readb(bdf, pam0); -if (!(reg 0x10)) { -// QEMU doesn't fully implement the piix shadow capabilities - -// if ram isn't backing the bios segment when shadowing is -// disabled, the code itself wont be in memory. So, run the -// code from the high-memory flash location. -u32 pos = (u32)__make_bios_writable_intel + BIOS_SRC_OFFSET; -void (*func)(u16 bdf, u32 pam0) = (void*)pos; -func(bdf, pam0); -return; -} // Ram already present - just enable writes __make_bios_writable_intel(bdf, pam0); } Efimov Vasily (3): memory: make function invalidate_and_set_dirty public memory: make function memory_access_is_direct public PAM: make PAM emulation closer to documentation exec.c | 14 +-- hw/pci-host/pam.c | 238 - include/exec/memory-internal.h | 15 +++ include/hw/pci-host/pam.h | 10 +- 4 files changed, 239 insertions(+), 38 deletions(-) -- 1.9.1
Re: [Qemu-devel] [PATCH V2] virtio-net: unbreak any layout
On Thu, Jul 16, 2015 at 02:49:48PM +0800, Jason Wang wrote: On 07/16/2015 02:42 PM, Michael S. Tsirkin wrote: On Wed, Jul 15, 2015 at 03:56:07PM +0800, Jason Wang wrote: Commit 032a74a1c0fcdd5fd1c69e56126b4c857ee36611 (virtio-net: byteswap virtio-net header) breaks any layout by requiring out_sg[0].iov_len = n-guest_hdr_len. Fixing this by copying header to temporary buffer if swap is needed, and then use this buffer as part of out_sg. Fixes 032a74a1c0fcdd5fd1c69e56126b4c857ee36611 (virtio-net: byteswap virtio-net header) Cc: qemu-sta...@nongnu.org Cc: c...@fr.ibm.com Signed-off-by: Jason Wang jasow...@redhat.com --- Changes from V1: - avoid header copying if there's no need to do header swap - don't write the header back --- hw/net/virtio-net.c | 17 ++--- include/hw/virtio/virtio-access.h | 9 + 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index e3c2db3..12322bd 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1142,7 +1142,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) ssize_t ret, len; unsigned int out_num = elem.out_num; struct iovec *out_sg = elem.out_sg[0]; -struct iovec sg[VIRTQUEUE_MAX_SIZE]; +struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE]; +struct virtio_net_hdr hdr; if (out_num 1) { error_report(virtio-net header not in first element); @@ -1150,11 +1151,21 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) } if (n-has_vnet_hdr) { -if (out_sg[0].iov_len n-guest_hdr_len) { +if (iov_size(out_sg, out_num) n-guest_hdr_len) { error_report(virtio-net header incorrect); exit(1); } this scans the iov unnecessarily. How about checking return code from iov_to_buf instead? Looks ok since hdr is very short anyway. but iov_size scans the whole iov, not just the header. -virtio_net_hdr_swap(vdev, (void *) out_sg[0].iov_base); +if (virtio_needs_swap(vdev)) { +iov_to_buf(out_sg, out_num, 0, hdr, sizeof(hdr)); +virtio_net_hdr_swap(vdev, (void *) hdr); +sg2[0].iov_base = hdr; +sg2[0].iov_len = sizeof(hdr); +out_num = iov_copy(sg2[1], ARRAY_SIZE(sg2) - 1, + out_sg, out_num, + sizeof(hdr), -1); This might truncate packet if it does not fit in 1024 anymore. It might be better to just drop it. Ok, will do this in V3. +out_num += 1; +out_sg = sg2; +} } /* diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h index cee5dd7..1ec1dfd 100644 --- a/include/hw/virtio/virtio-access.h +++ b/include/hw/virtio/virtio-access.h @@ -143,6 +143,15 @@ static inline uint64_t virtio_ldq_p(VirtIODevice *vdev, const void *ptr) } } +static inline bool virtio_needs_swap(VirtIODevice *vdev) +{ +#ifdef HOST_WORDS_BIGENDIAN +return virtio_access_is_big_endian(vdev) ? false : true; +#else +return virtio_access_is_big_endian(vdev) ? true : false; +#endif +} + static inline uint16_t virtio_tswap16(VirtIODevice *vdev, uint16_t s) { #ifdef HOST_WORDS_BIGENDIAN -- 2.1.4
Re: [Qemu-devel] [PATCH V2] virtio-net: unbreak any layout
On 07/16/2015 03:29 PM, Michael S. Tsirkin wrote: On Thu, Jul 16, 2015 at 02:49:48PM +0800, Jason Wang wrote: On 07/16/2015 02:42 PM, Michael S. Tsirkin wrote: On Wed, Jul 15, 2015 at 03:56:07PM +0800, Jason Wang wrote: Commit 032a74a1c0fcdd5fd1c69e56126b4c857ee36611 (virtio-net: byteswap virtio-net header) breaks any layout by requiring out_sg[0].iov_len = n-guest_hdr_len. Fixing this by copying header to temporary buffer if swap is needed, and then use this buffer as part of out_sg. Fixes 032a74a1c0fcdd5fd1c69e56126b4c857ee36611 (virtio-net: byteswap virtio-net header) Cc: qemu-sta...@nongnu.org Cc: c...@fr.ibm.com Signed-off-by: Jason Wang jasow...@redhat.com --- Changes from V1: - avoid header copying if there's no need to do header swap - don't write the header back --- hw/net/virtio-net.c | 17 ++--- include/hw/virtio/virtio-access.h | 9 + 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index e3c2db3..12322bd 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1142,7 +1142,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) ssize_t ret, len; unsigned int out_num = elem.out_num; struct iovec *out_sg = elem.out_sg[0]; -struct iovec sg[VIRTQUEUE_MAX_SIZE]; +struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE]; +struct virtio_net_hdr hdr; if (out_num 1) { error_report(virtio-net header not in first element); @@ -1150,11 +1151,21 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) } if (n-has_vnet_hdr) { -if (out_sg[0].iov_len n-guest_hdr_len) { +if (iov_size(out_sg, out_num) n-guest_hdr_len) { error_report(virtio-net header incorrect); exit(1); } this scans the iov unnecessarily. How about checking return code from iov_to_buf instead? Looks ok since hdr is very short anyway. but iov_size scans the whole iov, not just the header. Right.
[Qemu-devel] [PULL 7/9] target-mips: fix resource leak reported by Coverity
UHI assert and link operations call lock_user_string() twice to obtain two strings pointed by gpr[4] and gpr[5]. If the second lock_user_string() fails, then the first one won't get freed. Fix this by introducing another macro responsible for obtaining two strings and handling allocation failure. Signed-off-by: Leon Alrae leon.al...@imgtec.com Reviewed-by: Aurelien Jarno aurel...@aurel32.net --- target-mips/mips-semi.c | 23 +++ 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/target-mips/mips-semi.c b/target-mips/mips-semi.c index 1162c76..5050940 100644 --- a/target-mips/mips-semi.c +++ b/target-mips/mips-semi.c @@ -220,6 +220,23 @@ static int copy_argn_to_target(CPUMIPSState *env, int arg_num, } \ } while (0) +#define GET_TARGET_STRINGS_2(p, addr, p2, addr2)\ +do {\ +p = lock_user_string(addr); \ +if (!p) { \ +gpr[2] = -1;\ +gpr[3] = EFAULT;\ +goto uhi_done; \ +} \ +p2 = lock_user_string(addr2); \ +if (!p2) { \ +unlock_user(p, addr, 0);\ +gpr[2] = -1;\ +gpr[3] = EFAULT;\ +goto uhi_done; \ +} \ +} while (0) + #define FREE_TARGET_STRING(p, gpr) \ do {\ unlock_user(p, gpr, 0); \ @@ -322,8 +339,7 @@ void helper_do_semihosting(CPUMIPSState *env) FREE_TARGET_STRING(p, gpr[4]); break; case UHI_assert: -GET_TARGET_STRING(p, gpr[4]); -GET_TARGET_STRING(p2, gpr[5]); +GET_TARGET_STRINGS_2(p, gpr[4], p2, gpr[5]); printf(assertion '); printf(\%s\, p); printf(': file \%s\, line %d\n, p2, (int)gpr[6]); @@ -341,8 +357,7 @@ void helper_do_semihosting(CPUMIPSState *env) break; #ifndef _WIN32 case UHI_link: -GET_TARGET_STRING(p, gpr[4]); -GET_TARGET_STRING(p2, gpr[5]); +GET_TARGET_STRINGS_2(p, gpr[4], p2, gpr[5]); gpr[2] = link(p, p2); gpr[3] = errno_mips(errno); FREE_TARGET_STRING(p2, gpr[5]); -- 2.1.0
[Qemu-devel] [PULL 8/9] linux-user: Fix MIPS N64 trap and break instruction bug
From: Andrew Bennett andrew.benn...@imgtec.com For the MIPS N64 ABI when QEMU reads the break/trap instruction so that it can inspect the break/trap code it reads 8 rather than 4 bytes which means it finds the code field from the instruction after the break/trap instruction. This then causes the break/trap handling code to fail because it does not understand the code number. The fix forces QEMU to always read 4 bytes of instruction data rather than deciding how much to read based on the ABI. Signed-off-by: Andrew Bennett andrew.benn...@imgtec.com Reviewed-by: Leon Alrae leon.al...@imgtec.com Signed-off-by: Leon Alrae leon.al...@imgtec.com --- linux-user/main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/linux-user/main.c b/linux-user/main.c index 05914b1..fdee981 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -2577,7 +2577,7 @@ done_syscall: code = (trap_instr 6) 0x3f; } } else { -ret = get_user_ual(trap_instr, env-active_tc.PC); +ret = get_user_u32(trap_instr, env-active_tc.PC); if (ret != 0) { goto error; } @@ -2611,7 +2611,7 @@ done_syscall: trap_instr = (instr[0] 16) | instr[1]; } else { -ret = get_user_ual(trap_instr, env-active_tc.PC); +ret = get_user_u32(trap_instr, env-active_tc.PC); } if (ret != 0) { -- 2.1.0
[Qemu-devel] [PULL 4/9] target-mips: fix ASID synchronisation for MIPS MT
From: Aurelien Jarno aurel...@aurel32.net When syncing the task ASID with EntryHi, correctly or the value instead of assigning it. Reported-by: Dr. David Alan Gilbert dgilb...@redhat.com Signed-off-by: Aurelien Jarno aurel...@aurel32.net Cc: Leon Alrae leon.al...@imgtec.com Reviewed-by: Leon Alrae leon.al...@imgtec.com Signed-off-by: Leon Alrae leon.al...@imgtec.com --- target-mips/op_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c index 2a9ddff..d457a29 100644 --- a/target-mips/op_helper.c +++ b/target-mips/op_helper.c @@ -661,7 +661,7 @@ static void sync_c0_tcstatus(CPUMIPSState *cpu, int tc, /* Sync the TASID with EntryHi. */ cpu-CP0_EntryHi = ~0xff; -cpu-CP0_EntryHi = tasid; +cpu-CP0_EntryHi |= tasid; compute_hflags(cpu); } -- 2.1.0
Re: [Qemu-devel] [PATCH V2] virtio-net: unbreak any layout
On 07/16/2015 02:42 PM, Michael S. Tsirkin wrote: On Wed, Jul 15, 2015 at 03:56:07PM +0800, Jason Wang wrote: Commit 032a74a1c0fcdd5fd1c69e56126b4c857ee36611 (virtio-net: byteswap virtio-net header) breaks any layout by requiring out_sg[0].iov_len = n-guest_hdr_len. Fixing this by copying header to temporary buffer if swap is needed, and then use this buffer as part of out_sg. Fixes 032a74a1c0fcdd5fd1c69e56126b4c857ee36611 (virtio-net: byteswap virtio-net header) Cc: qemu-sta...@nongnu.org Cc: c...@fr.ibm.com Signed-off-by: Jason Wang jasow...@redhat.com --- Changes from V1: - avoid header copying if there's no need to do header swap - don't write the header back --- hw/net/virtio-net.c | 17 ++--- include/hw/virtio/virtio-access.h | 9 + 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index e3c2db3..12322bd 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1142,7 +1142,8 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) ssize_t ret, len; unsigned int out_num = elem.out_num; struct iovec *out_sg = elem.out_sg[0]; -struct iovec sg[VIRTQUEUE_MAX_SIZE]; +struct iovec sg[VIRTQUEUE_MAX_SIZE], sg2[VIRTQUEUE_MAX_SIZE]; +struct virtio_net_hdr hdr; if (out_num 1) { error_report(virtio-net header not in first element); @@ -1150,11 +1151,21 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q) } if (n-has_vnet_hdr) { -if (out_sg[0].iov_len n-guest_hdr_len) { +if (iov_size(out_sg, out_num) n-guest_hdr_len) { error_report(virtio-net header incorrect); exit(1); } this scans the iov unnecessarily. How about checking return code from iov_to_buf instead? Looks ok since hdr is very short anyway. -virtio_net_hdr_swap(vdev, (void *) out_sg[0].iov_base); +if (virtio_needs_swap(vdev)) { +iov_to_buf(out_sg, out_num, 0, hdr, sizeof(hdr)); +virtio_net_hdr_swap(vdev, (void *) hdr); +sg2[0].iov_base = hdr; +sg2[0].iov_len = sizeof(hdr); +out_num = iov_copy(sg2[1], ARRAY_SIZE(sg2) - 1, + out_sg, out_num, + sizeof(hdr), -1); This might truncate packet if it does not fit in 1024 anymore. It might be better to just drop it. Ok, will do this in V3. +out_num += 1; +out_sg = sg2; +} } /* diff --git a/include/hw/virtio/virtio-access.h b/include/hw/virtio/virtio-access.h index cee5dd7..1ec1dfd 100644 --- a/include/hw/virtio/virtio-access.h +++ b/include/hw/virtio/virtio-access.h @@ -143,6 +143,15 @@ static inline uint64_t virtio_ldq_p(VirtIODevice *vdev, const void *ptr) } } +static inline bool virtio_needs_swap(VirtIODevice *vdev) +{ +#ifdef HOST_WORDS_BIGENDIAN +return virtio_access_is_big_endian(vdev) ? false : true; +#else +return virtio_access_is_big_endian(vdev) ? true : false; +#endif +} + static inline uint16_t virtio_tswap16(VirtIODevice *vdev, uint16_t s) { #ifdef HOST_WORDS_BIGENDIAN -- 2.1.4
[Qemu-devel] [PULL 0/9] target-mips queue
Hi, This pull request contains MIPS bug fixes for 2.4-rc1. Thanks, Leon Cc: Peter Maydell peter.mayd...@linaro.org Cc: Aurelien Jarno aurel...@aurel32.net The following changes since commit 661725da09f47eb92d356fac10a4cf3b7ad1f61d: Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20150714' into staging (2015-07-14 18:50:17 +0100) are available in the git repository at: git://github.com/lalrae/qemu.git tags/mips-20150716 for you to fetch changes up to 908680c6441ac468f4871d513f42be396ea0d264: target-mips: fix page fault address for LWL/LWR/LDL/LDR (2015-07-15 14:07:25 +0100) MIPS patches 2015-07-16 Changes: * bug fixes Andrew Bennett (1): linux-user: Fix MIPS N64 trap and break instruction bug Aurelien Jarno (2): target-mips: fix ASID synchronisation for MIPS MT target-mips: fix page fault address for LWL/LWR/LDL/LDR Leon Alrae (3): target-mips: correct DERET instruction target-mips: fix logically dead code reported by Coverity target-mips: fix resource leak reported by Coverity Yongbok Kim (3): target-mips: fix MIPS64R6-generic configuration target-mips: fix to clear MSACSR.Cause disas/mips: fix disassembling R6 instructions disas/mips.c | 12 ++-- linux-user/main.c| 4 ++-- target-mips/mips-defs.h | 2 +- target-mips/mips-semi.c | 23 +++ target-mips/msa_helper.c | 6 ++ target-mips/op_helper.c | 5 ++--- target-mips/translate.c | 15 +++ target-mips/translate_init.c | 18 +- 8 files changed, 60 insertions(+), 25 deletions(-)
Re: [Qemu-devel] [PATCH 08/10 v12] target-tilegx: Add several helpers for instructions translation
Hello Maintainers:brbrPlease help review this patch when you have time.brbrThanks.brbrOn 06/13/2015 09:19 PM, Chen Gang wrote:brgt; The related instructions are exception, cntlz, cnttz, shufflebytes, andbrgt; add_saturate.brgt;brgt; Signed-off-by: Chen Gang lt;gang.chen.5...@gmail.comgt;brgt; ---brgt; target-tilegx/helper.c | 83 ++brgt; target-tilegx/helper.h | 5 +++brgt; 2 files changed, 88 insertions(+)brgt; create mode 100644 target-tilegx/helper.cbrgt; create mode 100644 target-tilegx/helper.hbrgt;brgt; diff --git a/target-tilegx/helper.c b/target-tilegx/helper.cbrgt; new file mode 100644brgt; index 000..5ab41cdbrgt; --- /dev/nullbrgt; +++ b/target-tilegx/helper.cbrgt; @@ -0,0 +1,83 @@brgt; +/*brgt; + * QEMU TILE-Gx helpersbrgt; + *brgt; + * Copyright (c) 2015 Chen Gangbrgt; + *brgt; + * This library is free software; you can redistribute it and/orbrgt; + * modify it under the terms of the GNU Lesser General Publicbrgt; + * License as published by the Free Software Foundation; eitherbrgt; + * version 2.1 of the License, or (at your option) any later version.brgt; + *brgt; + * This library is distributed in the hope that it will be useful,brgt; + * but WITHOUT ANY WARRANTY; without even the implied warranty ofbrgt; + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNUbrgt; + * Lesser General Public License for more details.brgt; + *brgt; + * You should have received a copy of the GNU Lesser General Publicbrgt; + * License along with this library; if not, seebrgt; + * lt;http://www.gnu.org/licenses/lgpl-2.1.htmlgt;brgt; + */brgt; +brgt; +#include cpu.hbrgt; +#include qemu-common.hbrgt; +#include exec/helper-proto.hbrgt; +brgt; +#define SIGNBIT32 0x8000brgt; +brgt; +int64_t helper_add_saturate(CPUTLGState *env, uint64_t rsrc, uint64_t rsrcb)brgt; +{brgt; +uint32_t rdst = rsrc + rsrcb;brgt; +brgt; +if (((rdst ^ rsrc) amp; SIGNBIT32) amp;amp; !((rsrc ^ rsrcb) amp; SIGNBIT32)) {brgt; +rdst = ~(((int32_t)rsrc gt;gt; 31) ^ SIGNBIT32);brgt; + }brgt; +brgt; +return (int64_t)rdst;brgt; +}brgt; +brgt; +void helper_exception(CPUTLGState *env, uint32_t excp)brgt; +{brgt; + CPUState *cs = CPU(tilegx_env_get_cpu(env));brgt; +brgt; + cs-gt;exception_index = excp;brgt; +cpu_loop_exit(cs);brgt; +}brgt; +brgt; +uint64_t helper_cntlz(uint64_t arg)brgt; +{brgt; + return clz64(arg);brgt; +}brgt; +brgt; +uint64_t helper_cnttz(uint64_t arg)brgt; +{brgt; +return ctz64(arg);brgt; +}brgt; +brgt; +/*brgt; + * Functional Descriptionbrgt; + * uint64_t a = rf[SrcA];brgt; + * uint64_t b = rf[SrcB];brgt; + * uint64_t d = rf[Dest];brgt; + * uint64_t output = 0;brgt; + * unsigned int counter;brgt; + * for (counter = 0; counter lt; (WORD_SIZE / BYTE_SIZE); counter++)brgt; + * {brgt; + * int sel = getByte (b, counter) amp; 0xf;brgt; + * uint8_t byte = (sel lt; 8) ? getByte (d, sel) : getByte (a, (sel - 8));brgt; + * output = setByte (output, counter, byte);brgt; + * }brgt; + * rf[Dest] = output;brgt; + */brgt; +uint64_t helper_shufflebytes(uint64_t rdst, uint64_t rsrc, uint64_t rsrcb)brgt; +{brgt; +uint64_t vdst = 0;brgt; +int count;brgt; +brgt; +for (count = 0; count lt; 64; count += 8) {brgt; +uint64_t sel = rsrcb gt;gt; count;brgt; +uint64_t src = (sel amp; 8) ? rsrc : rdst;brgt; +vdst |= ((src gt;gt; ((sel amp; 7) * 8)) amp; 0xff) lt;lt; count;brgt; + }brgt; +brgt; +return vdst;brgt; +}brgt; diff --git a/target-tilegx/helper.h b/target-tilegx/helper.hbrgt; new file mode 100644brgt; index 000..1411c19brgt; --- /dev/nullbrgt; +++ b/target-tilegx/helper.hbrgt; @@ -0,0 +1,5 @@brgt; +DEF_HELPER_2(exception, noreturn, env, i32)brgt; +DEF_HELPER_FLAGS_1(cntlz, TCG_CALL_NO_RWG_SE, i64, i64)brgt; +DEF_HELPER_FLAGS_1(cnttz, TCG_CALL_NO_RWG_SE, i64, i64)brgt; +DEF_HELPER_FLAGS_3(shufflebytes, TCG_CALL_NO_RWG_SE, i64, i64, i64, i64)brgt; +DEF_HELPER_3(add_saturate, s64, env, i64, i64)brgt;brbr--brChen GangbrbrOpen, share, and attitude like air, water, and life which God blessedbr
Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
On Thu, Jul 16, 2015 at 08:57:36AM +0200, Igor Mammedov wrote: On Wed, 15 Jul 2015 19:24:33 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 06:01:42PM +0200, Igor Mammedov wrote: On Wed, 15 Jul 2015 17:08:27 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote: Hi, A while ago I was pondering on the options available for retrieving a fw_cfg blob from the guest-side (now that we can insert fw_cfg files on the host-side command line, see commit 81b2b8106). So over the last couple of weekends I cooked up the sysfs kernel module below, which lists all fw_cfg files under /sys/firmware/fw_cfg/filename. One concern here is that there will be a conflict here if fw cfg is used by ACPI. I don't know whether that last is a good idea though, so maybe not a real concern. I think Igor wanted to make it so. I don't see any conflict here so far, it's just guest side module that accesses fw_cfg. If there's ACPI code that accesses fw_cfg in response to an interrupt, it'll race with fw cfg access by guest OS. On linux we might be able to block ACPI preventing it from running. We probably won't be able to do it on windows. wrt vmgenid series we were talking about possibility to access fw_cfg from ACPI device.init so it's unlikely that it will ever collide with much later sysfs accesses. Don't we need to get updates when it changes too? But if ACPI will start accessing fw_cfg from other methods it will race for sure. Maybe we shouldn't poke at fw cfg in ACPI then. -- MST
Re: [Qemu-devel] [PATCH v4 0/7] Fix QEMU crash during memory hotplug with vhost=on
On Thu, Jul 16, 2015 at 09:26:21AM +0200, Igor Mammedov wrote: On Wed, 15 Jul 2015 19:32:31 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 05:12:01PM +0200, Igor Mammedov wrote: On Thu, 9 Jul 2015 13:47:17 +0200 Igor Mammedov imamm...@redhat.com wrote: there also is yet another issue with vhost-user. It also has very low limit on amount of memory regions (if I recall correctly 8) and it's possible to trigger even without memory hotplug. one just need to start QEMU with a several -numa memdev= options to create a necessary amount of memory regions to trigger it. lowrisk option to fix it would be increasing limit in vhost-user backend. another option is disabling vhost and fall-back to virtio, but I don't know much about vhost if it's possible to to switch it off without loosing packets guest was sending at the moment and if it will work at all with vhost. With vhost-user you can't fall back to virtio: it's not an accelerator, it's the backend. Updating the protocol to support a bigger table is possible but old remotes won't be able to support it. it looks like increasing limit is the only option left. it's not ideal that old remotes /with hardcoded limit/ might not be able to handle bigger table but at least new ones and ones that handle VhostUserMsg payload dynamically would be able to work without crashing. I think we need a way for hotplug to fail gracefully. As long as we don't implement the hva trick, it's needed for old kernels with vhost in kernel, too. -- MST
[Qemu-devel] [PATCH 1/3] memory: make function invalidate_and_set_dirty public
Make function invalidate_and_set_dirty public. It is required by PAM emulation. Signed-off-by: Efimov Vasily r...@ispras.ru --- exec.c | 2 +- include/exec/memory-internal.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/exec.c b/exec.c index 251dc79..4e37ded 100644 --- a/exec.c +++ b/exec.c @@ -2281,7 +2281,7 @@ int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr, #else -static void invalidate_and_set_dirty(MemoryRegion *mr, hwaddr addr, +void invalidate_and_set_dirty(MemoryRegion *mr, hwaddr addr, hwaddr length) { uint8_t dirty_log_mask = memory_region_get_dirty_log_mask(mr); diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h index fb467ac..801da82 100644 --- a/include/exec/memory-internal.h +++ b/include/exec/memory-internal.h @@ -31,5 +31,8 @@ extern const MemoryRegionOps unassigned_mem_ops; bool memory_region_access_valid(MemoryRegion *mr, hwaddr addr, unsigned size, bool is_write); +void invalidate_and_set_dirty(MemoryRegion *mr, hwaddr addr, + hwaddr length); + #endif #endif -- 1.9.1
Re: [Qemu-devel] [PATCH 3/3] PAM: make PAM emulation closer to documentation
On 16/07/2015 10:35, Efimov Vasily wrote: This patch improves PAM emulation. PAM defines 4 memory access redirection modes. In mode 1 reads are directed to RAM and writes are directed to PCI. In mode 2 it is contrary. In mode 0 all access is directed to PCI. In mode 3 it is directed to RAM. Modes 0 and 3 are well emulated but modes 1 and 2 are not. The cause is: aliases are used while more complicated logic is required. The idea is to use ROM device like memory regions for mode 1 and 2 emulation instead of aliases. Writes are directed to proper destination region by specified I/O callback. Read redirection depends on type of source region. In most cases source region is RAM (or ROM), so ram_addr of PAM region is set to ram_addr of source region with offset. Otherwise, when source region is an I/O region, reading is redirected to source region read callback by PAM region one. Read source and write destination regions are updated by the memory commit callback. Note that we cannot use I/O region for PAM as it will violate trying to execute code outside RAM or ROM assertion. Signed-off-by: Efimov Vasily r...@ispras.ru --- hw/pci-host/pam.c | 238 +- include/hw/pci-host/pam.h | 10 +- 2 files changed, 223 insertions(+), 25 deletions(-) diff --git a/hw/pci-host/pam.c b/hw/pci-host/pam.c index 17d826c..9729b2b 100644 --- a/hw/pci-host/pam.c +++ b/hw/pci-host/pam.c @@ -27,43 +27,233 @@ * THE SOFTWARE. */ -#include qom/object.h -#include sysemu/sysemu.h #include hw/pci-host/pam.h +#include exec/address-spaces.h +#include exec/memory-internal.h +#include qemu/bswap.h + +static void pam_write(void *opaque, hwaddr addr, uint64_t data, + unsigned size) +{ +PAMMemoryRegion *pam = (PAMMemoryRegion *) opaque; +void *ptr; + +/* Destination region can be both RAM and IO. */ +if (!memory_access_is_direct(pam-mr_write_to, true)) { +memory_region_dispatch_write(pam-mr_write_to, + addr + pam-write_offset, data, size, + MEMTXATTRS_UNSPECIFIED); +} else { +ptr = memory_region_get_ram_ptr(pam-mr_write_to) + addr + + pam-write_offset; + +switch (size) { +case 1: +stb_p(ptr, data); +break; +case 2: +stw_he_p(ptr, data); +break; +case 4: +stl_he_p(ptr, data); +break; +case 8: +stq_he_p(ptr, data); +break; +default: +abort(); +} + +invalidate_and_set_dirty(pam-mr_write_to, addr + pam-pam_offset, + size); +} +} + The idea is very good, but the implementation relies on copying a lot of code from exec.c. Could you use an IOMMU memory region instead? Then a single region can be used to implement all four modes, and you don't hit the trying to execute code outside RAM or RAM. Paolo
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
Am 16.07.2015 um 13:20 schrieb Paolo Bonzini: On 16/07/2015 13:03, Christian Borntraeger wrote: For what its worth, I can no longer reproduce the issue on current master + cherry-pick of a0710f7995f (iothread: release iothread around aio_poll) bisect tells me that commit 53ec73e264f481b79b52efcadc9ceb8f8996975c Author: Fam Zheng f...@redhat.com AuthorDate: Fri May 29 18:53:14 2015 +0800 Commit: Stefan Hajnoczi stefa...@redhat.com CommitDate: Tue Jul 7 14:27:14 2015 +0100 block: Use bdrv_drain to replace uncessary bdrv_drain_all made the problem will blk-null go away. I still dont understand why. It could be related to the AioContext problem that I'm fixing these days, too. Good news, we'll requeue the patch for 2.5. That was also something that I had in mind (in fact I retested this to check the ctx patch). master + cherry-pick of a0710f7995f + revert of 53ec73e26 + this fix still fails, so it was (is?) a different issue. The interesting part is that this problem required 2 or more disk (and we replace drain_all with single drains) so it somewhat sounds plausible. Christian
[Qemu-devel] [PATCH v2] target-arm: kvm: Differentiate registers based on write-back levels
Some registers like the CNTVCT register should only be written to the kernel as part of machine initialization or on vmload operations, but never during runtime, as this can potentially make time go backwards or create inconsistent time observations between VCPUs. Introduce a list of registers that should not be written back at runtime and check this list on syncing the register state to the KVM state. Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Changes since RFC: - Move cpreg_level to kvm_arm_cpreg_level and into kvm32.c and kvm64.c - Changed struct name and declare as static const dtc | 2 +- target-arm/kvm.c | 6 +- target-arm/kvm32.c | 30 +- target-arm/kvm64.c | 30 +- target-arm/kvm_arm.h | 12 +++- target-arm/machine.c | 2 +- 6 files changed, 76 insertions(+), 6 deletions(-) diff --git a/dtc b/dtc index 65cc4d2..bc895d6 16 --- a/dtc +++ b/dtc @@ -1 +1 @@ -Subproject commit 65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf +Subproject commit bc895d6d09695d05ceb8b52486ffe861d6cfbdde diff --git a/target-arm/kvm.c b/target-arm/kvm.c index 548bfd7..b278542 100644 --- a/target-arm/kvm.c +++ b/target-arm/kvm.c @@ -409,7 +409,7 @@ bool write_kvmstate_to_list(ARMCPU *cpu) return ok; } -bool write_list_to_kvmstate(ARMCPU *cpu) +bool write_list_to_kvmstate(ARMCPU *cpu, int level) { CPUState *cs = CPU(cpu); int i; @@ -421,6 +421,10 @@ bool write_list_to_kvmstate(ARMCPU *cpu) uint32_t v32; int ret; +if (kvm_arm_cpreg_level(regidx) level) { +continue; +} + r.id = regidx; switch (regidx KVM_REG_SIZE_MASK) { case KVM_REG_SIZE_U32: diff --git a/target-arm/kvm32.c b/target-arm/kvm32.c index d7e7d68..6769815 100644 --- a/target-arm/kvm32.c +++ b/target-arm/kvm32.c @@ -153,6 +153,34 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx) } } +typedef struct CPRegStateLevel { +uint64_t regidx; +int level; +} CPRegStateLevel; + +/* All coprocessor registers not listed in the following table are assumed to + * be of the level KVM_PUT_RUNTIME_STATE, a register should be written less + * often, you must add it to this table with a state of either + * KVM_PUT_RESET_STATE or KVM_PUT_FULL_STATE. + */ +static const CPRegStateLevel non_runtime_cpregs[] = { +{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE }, +}; + +int kvm_arm_cpreg_level(uint64_t regidx) +{ +int i; + +for (i = 0; i ARRAY_SIZE(non_runtime_cpregs); i++) { +const CPRegStateLevel *l = non_runtime_cpregs[i]; +if (l-regidx == regidx) { +return l-level; +} +} + +return KVM_PUT_RUNTIME_STATE; +} + #define ARM_MPIDR_HWID_BITMASK 0xFF #define ARM_CPU_ID_MPIDR 0, 0, 0, 5 @@ -367,7 +395,7 @@ int kvm_arch_put_registers(CPUState *cs, int level) * managed to update the CPUARMState with, and only allowing those * to be written back up into the kernel). */ -if (!write_list_to_kvmstate(cpu)) { +if (!write_list_to_kvmstate(cpu, level)) { return EINVAL; } diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c index ac34f51..d59f41c 100644 --- a/target-arm/kvm64.c +++ b/target-arm/kvm64.c @@ -139,6 +139,34 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx) } } +typedef struct CPRegStateLevel { +uint64_t regidx; +int level; +} CPRegStateLevel; + +/* All system not listed in the following table are assumed to be of the level + * KVM_PUT_RUNTIME_STATE, a register should be written less often, you must + * add it to this table with a state of either KVM_PUT_RESET_STATE or + * KVM_PUT_FULL_STATE. + */ +static const CPRegStateLevel non_runtime_cpregs[] = { +{ KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE }, +}; + +int kvm_arm_cpreg_level(uint64_t regidx) +{ +int i; + +for (i = 0; i ARRAY_SIZE(non_runtime_cpregs); i++) { +const CPRegStateLevel *l = non_runtime_cpregs[i]; +if (l-regidx == regidx) { +return l-level; +} +} + +return KVM_PUT_RUNTIME_STATE; +} + #define AARCH64_CORE_REG(x) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \ KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x)) @@ -280,7 +308,7 @@ int kvm_arch_put_registers(CPUState *cs, int level) return ret; } -if (!write_list_to_kvmstate(cpu)) { +if (!write_list_to_kvmstate(cpu, level)) { return EINVAL; } diff --git a/target-arm/kvm_arm.h b/target-arm/kvm_arm.h index 5abd591..7912d74 100644 --- a/target-arm/kvm_arm.h +++ b/target-arm/kvm_arm.h @@ -69,8 +69,18 @@ int kvm_arm_init_cpreg_list(ARMCPU *cpu); bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx); /** + * kvm_arm_cpreg_level + * regidx: KVM register index + * + * Return the level of this coprocessor/system register. Return value is + * either KVM_PUT_RUNTIME_STATE, KVM_PUT_RESET_STATE, or KVM_PUT_FULL_STATE. + */
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On 16/07/2015 13:24, Christian Borntraeger wrote: The interesting part is that this problem required 2 or more disk (and we replace drain_all with single drains) so it somewhat sounds plausible. Yes, indeed. It is very plausible. I wanted to reproduce it these days, so thanks for saving me a lot of time! I'll test your exact setup (master + AioContext fix + cherry-pick of a0710f7995f + revert of 53ec73e26). Thanks, Paolo
Re: [Qemu-devel] [PATCH 2/5] virtio-blk: disable scsi passthrough for 1.0 device
On Thu, Jul 16, 2015 at 02:37:12PM +0200, Cornelia Huck wrote: On Wed, 15 Jul 2015 21:51:32 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 05:38:53PM +0200, Cornelia Huck wrote: On Wed, 15 Jul 2015 17:39:18 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 04:30:51PM +0200, Cornelia Huck wrote: On Wed, 15 Jul 2015 17:11:57 +0300 Michael S. Tsirkin m...@redhat.com wrote: Fine, but revision is negotiated way before features are probed so why does it make a practical difference? Legacy drivers (that don't know about the set-revision command) will read features without revision negotiation - we need to offer them the legacy feature set. Right. So simply do if (revision 1) return features 0x and that will do this, will it not? Not for bits that we want to offer for legacy but not for modern. I don't think this selective offering works at least for scsi. scsi is a backend feature, if you connect a modern device in front the device simply does not work. It therefore makes no sense to attach a transitional device to such a backend. My point is that we're losing legacy features with that approach, and it would not be possible to offer them to legacy guests with newer qemus (at least with ccw). What's wrong with adding a disable-modern flag, like pci has? User can set that to get a legacy device. The whole idea behind the revision-stuff was that we don't need something like disable-modern. If the device is able to figure out on its own if it is to act as a modern or a legacy device, why require user intervention? It's about compatibility, e.g. being able to test legacy mode in transitional drivers in guests. Let's step back a bit. Why do we need legacy? To support old hosts and old guests. So what we need is a test matrix combining old/modern hosts with old/modern guests. I don't think forcing to an old version is something we should spend time on except during development. Consider also bugs, e.g. the fact that linux guests lack WCE in modern mode ATM. If we consider this a bug, shouldn't VERSION_1 be forced off unconditionally until specs and codebase are fixed? I don't see why. We'll never fix all bugs. It causes a performance regression but otherwise things work correctly. modern is off by default for now, this seems sufficient, if we never even ship it as an option we'll never find all bugs. What about the other way around (i.e. scsi is configured, therefore the device is legacy-only)? We'd only retain the scsi bit if it is actually wanted by the user's configuration. I would need to enforce a max revision of 0 for such a device in ccw, and pci could disable modern for it. Will have to think about it. But I think a flag to disable/enable modern is useful in any case, and it seems sufficient. I don't like the idea of disabling modern or legacy for ccw, where the differences between both are very minor. I also don't think requiring the user to specify a new flag on upgrade just to present the same features as before is a good idea: it is something that is easily missed and may lead to much headscratching. And doing this on a driver upgrade won't? As I said, if you believe this feature has value, argue that we shouldn't drop scsi off in virtio 1.0 then. I seem to have problems explaining myself :( I don't care about the feature, except for supporting old guests that should continue working as before even if the host updated. And without special configuration on the host, as this is too easily forgotten. What else is legacy good for, other than providing backwards compatibility? Maybe it's me having trouble explaining. There are two kind of blk devices: - those that allow scsi passthrough Such devices can't work properly through the modern interface. We could add a modern interface but device can not operate through it. - those that don't allow scsi passthrough Such devices can't work properly through the modern interface. We could add a modern interface and we'll get a transitional device. I think for 2.4 it's a good idea to avoid enabling modern interface by default. Therefore, for 2.4 we can keep scsi enabled unless modern is requested by user. I am also fine with just doing if (modern scsi) exit; -- MST
Re: [Qemu-devel] [Qemu-block] [PATCH] raw-posix.c: Make physical devices usable in QEMU
On Thu, Jul 09, 2015 at 10:02:26AM -0400, Programmingkid wrote: On Jul 9, 2015, at 6:52 AM, Stefan Hajnoczi wrote: On Tue, Jul 07, 2015 at 01:33:23PM -0400, Programmingkid wrote: Make physical devices like a USB flash drive or a CDROM drive work in QEMU. With this patch I can use a USB flash drive like a hard drive. Before this patch, QEMU would just quit with a message like resource busy. The commit message and the description are missing on Mac OS X. It should be clear right away that this applies to Mac only. This works fine on Linux and probably other host OSes. Yeah, that should have been done. Did you see any issues with the code? QEMU shouldn't silently open a different file than the one given by the user. The user should give the exact device file they want. If there is magic behavior it needs to be documented, but I don't see a reason why that's necessary in the case of device files. QEMU shouldn't mount/unmount the CD-ROM. atexit(3) doesn't handle crashes or abort(). Users may be confused to find their CD-ROM unmounted in those cases and would see this as a bug. Instead we should refuse mounted CD-ROMs so the user understands that block-level access requires them to unmount first. The strcpy/sprintf usage in this patch is unsafe and can lead to buffer overflow, for example in the case of generating command-lines. The command-line buffer is only MAXPATHLEN so prepending the command to the filename could exceed the buffer size. There is also a buffer overflow in the array of devices that need to be mounted. What happens if there are more than 7 devices? pgpzWeEYBeNpv.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH] qemu-char: Fix missed data on unix socket
On (Thu) 16 Jul 2015 [10:11:04], Nils Carlson wrote: On Thu, 16 Jul 2015, Amit Shah wrote: On (Wed) 15 Jul 2015 [23:44:52], Nils Carlson wrote: On Mon, 13 Jul 2015, Nils Carlson wrote: On Mon, 13 Jul 2015, Amit Shah wrote: snip Also, returning TRUE there isn't right - if the connection ends, we should return FALSE. I agree that this seems reasonable. I will change it and re-test. I had a closer look, and it seems always returning true is intentional here, the called function, tcp_chr_disconnect(chr), handles the deregistration from handlers. If we were to return FALSE we would be duplicating work and possibly breaking things. Not sure how. Anyway, can you please start a new thread, with the author and reviewers of the patch CC'ed, so they can chime in as well? The authors and reviewers of which patch exactly? The fix for windows which broke cloudstack was done in 812c1057. Yea, the author of this patch. Also, I meant the return TRUE in this patch, not the one Paolo added. The returning TRUE everywhere was done in commit cdbf6e16 by Paolo Bonzini, CC:d above. I don't think there was anything wrong with either patch, the former one simply broken a strange behaviour CloudStack was relying on which I would say is dubious (fire and forget messaging.) There need not be anything wrong with a previous patch; but just because someone touched the area recently, they might have more context, and even help in spotting the bug. Amit
[Qemu-devel] [PATCH 0/4] target-arm: Implement Secure physical timer
We managed to forget to implement the Secure physical timer when we added TZ support for AArch32. This patchset fixes the omission, and wires up its interrupt. This patchset sits on top of https://git.linaro.org/people/peter.maydell/qemu-arm.git target-arm-post-2.4 which is where I'm putting patches which I've reviewed but which aren't 2.4 material. Currently that's just Edgar's hyp timer series. (Warning, branch will rebase.) Since we're now wiring up four timer interrupts in virt and a15mpcore, I switched them to a data-driven loop for neatness. Incidentally, the reason that kernels worked and continue to work even if they're booting Secure is that they cope with timer interrupts coming in via either the NS timer irq line or the S timer irq line. The next step is to put the secure-GIC patchset on top of this. I think we're going to need to make the virt board default to non-secure at that point, because the QEMU UEFI blob doesn't cope with being booted Secure. That's probably for the best anyway, since then it will be the same for TCG and KVM. thanks -- PMM Peter Maydell (4): target-arm: Add the AArch64 view of the Secure physical timer target-arm: Add AArch32 banked register access to secure physical timer hw/arm/virt: Wire up secure timer interrupt hw/cpu/a15mpcore: Wire up hyp and secure physical timer interrupts hw/arm/virt.c| 28 +++-- hw/cpu/a15mpcore.c | 21 ++ target-arm/cpu-qom.h | 1 + target-arm/cpu.c | 2 + target-arm/cpu.h | 3 +- target-arm/helper.c | 114 +++ 6 files changed, 148 insertions(+), 21 deletions(-) -- 1.9.1
Re: [Qemu-devel] [PATCH 2/5] virtio-blk: disable scsi passthrough for 1.0 device
On Wed, 15 Jul 2015 21:51:32 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 05:38:53PM +0200, Cornelia Huck wrote: On Wed, 15 Jul 2015 17:39:18 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 04:30:51PM +0200, Cornelia Huck wrote: On Wed, 15 Jul 2015 17:11:57 +0300 Michael S. Tsirkin m...@redhat.com wrote: Fine, but revision is negotiated way before features are probed so why does it make a practical difference? Legacy drivers (that don't know about the set-revision command) will read features without revision negotiation - we need to offer them the legacy feature set. Right. So simply do if (revision 1) return features 0x and that will do this, will it not? Not for bits that we want to offer for legacy but not for modern. I don't think this selective offering works at least for scsi. scsi is a backend feature, if you connect a modern device in front the device simply does not work. It therefore makes no sense to attach a transitional device to such a backend. My point is that we're losing legacy features with that approach, and it would not be possible to offer them to legacy guests with newer qemus (at least with ccw). What's wrong with adding a disable-modern flag, like pci has? User can set that to get a legacy device. The whole idea behind the revision-stuff was that we don't need something like disable-modern. If the device is able to figure out on its own if it is to act as a modern or a legacy device, why require user intervention? It's about compatibility, e.g. being able to test legacy mode in transitional drivers in guests. Let's step back a bit. Why do we need legacy? To support old hosts and old guests. So what we need is a test matrix combining old/modern hosts with old/modern guests. I don't think forcing to an old version is something we should spend time on except during development. Consider also bugs, e.g. the fact that linux guests lack WCE in modern mode ATM. If we consider this a bug, shouldn't VERSION_1 be forced off unconditionally until specs and codebase are fixed? What about the other way around (i.e. scsi is configured, therefore the device is legacy-only)? We'd only retain the scsi bit if it is actually wanted by the user's configuration. I would need to enforce a max revision of 0 for such a device in ccw, and pci could disable modern for it. Will have to think about it. But I think a flag to disable/enable modern is useful in any case, and it seems sufficient. I don't like the idea of disabling modern or legacy for ccw, where the differences between both are very minor. I also don't think requiring the user to specify a new flag on upgrade just to present the same features as before is a good idea: it is something that is easily missed and may lead to much headscratching. And doing this on a driver upgrade won't? As I said, if you believe this feature has value, argue that we shouldn't drop scsi off in virtio 1.0 then. I seem to have problems explaining myself :( I don't care about the feature, except for supporting old guests that should continue working as before even if the host updated. And without special configuration on the host, as this is too easily forgotten. What else is legacy good for, other than providing backwards compatibility?
Re: [Qemu-devel] [PATCH 0/2] vmxnet3: Fix RX TCP/UDP checksum of partially summed packets
On Tue, Jul 14, 2015 at 11:55:14AM +0300, Shmulik Ladkani wrote: In case csum offloaded packet, vmxnet3 implementation always passes an RxCompDesc with the Checksum calculated and found correct notification to the OS. This emulates the observed ESXi behavior. Therefore, if packet has the NEEDS_CSUM bit set, we must calculate and place a fully computed checksum into the tcp/udp header. Otherwise, the OS driver will receive a checksum-correct indication but with the actual tcp/udp checksum field having just the pseudo header csum value. Fix, by converting partially summed packets to be fully checksummed. Shmulik Ladkani: net/vmxnet3: Refactor 'vmxnet_rx_pkt_attach_data' Dana Rubin: net/vmxnet3: Fix RX TCP/UDP checksum on partially summed packets hw/net/vmxnet3.c | 59 ++ hw/net/vmxnet_rx_pkt.c | 12 +++--- hw/net/vmxnet_rx_pkt.h | 11 ++ 3 files changed, 79 insertions(+), 3 deletions(-) -- 1.9.1 Thanks, applied to my net tree: https://github.com/stefanha/qemu/commits/net Stefan pgp6clwA4UFHu.pgp Description: PGP signature
Re: [Qemu-devel] iothread: release iothread around aio_poll causes random hangs at startup
On 16/07/2015 13:03, Christian Borntraeger wrote: For what its worth, I can no longer reproduce the issue on current master + cherry-pick of a0710f7995f (iothread: release iothread around aio_poll) bisect tells me that commit 53ec73e264f481b79b52efcadc9ceb8f8996975c Author: Fam Zheng f...@redhat.com AuthorDate: Fri May 29 18:53:14 2015 +0800 Commit: Stefan Hajnoczi stefa...@redhat.com CommitDate: Tue Jul 7 14:27:14 2015 +0100 block: Use bdrv_drain to replace uncessary bdrv_drain_all made the problem will blk-null go away. I still dont understand why. It could be related to the AioContext problem that I'm fixing these days, too. Good news, we'll requeue the patch for 2.5. Paolo
Re: [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path
* Amit Shah (amit.s...@redhat.com) wrote: On (Tue) 16 Jun 2015 [11:26:28], Dr. David Alan Gilbert (git) wrote: From: Dr. David Alan Gilbert dgilb...@redhat.com Open a return path, and handle messages that are received upon it. Signed-off-by: Dr. David Alan Gilbert dgilb...@redhat.com -/* migration thread support */ +/* + * Something bad happened to the RP stream, mark an error + * The caller shall print something to indicate why + */ +static void source_return_path_bad(MigrationState *s) Can you rename this to something like mark_source_rp_bad() ? Intent is clearer that way. Done. Also, the comment says caller will print something, but the invocations below are a mix of printfs and traces. Not saying the caller has to print always, but maybe only comment needs update. Yes, I've changed the comment, and changed one of the traces into an error_report. Thanks, Dave +{ +s-rp_state.error = true; +migrate_fd_cleanup_src_rp(s); +} + +/* + * Handles messages sent on the return path towards the source VM + * + */ +static void *source_return_path_thread(void *opaque) +{ +MigrationState *ms = opaque; +QEMUFile *rp = ms-rp_state.file; +uint16_t expected_len, header_len, header_type; +const int max_len = 512; +uint8_t buf[max_len]; +uint32_t tmp32; +int res; + +trace_source_return_path_thread_entry(); +while (rp !qemu_file_get_error(rp) +migration_already_active(ms)) { +trace_source_return_path_thread_loop_top(); +header_type = qemu_get_be16(rp); +header_len = qemu_get_be16(rp); + +switch (header_type) { +case MIG_RP_MSG_SHUT: +case MIG_RP_MSG_PONG: +expected_len = 4; +break; + +default: +error_report(RP: Received invalid message 0x%04x length 0x%04x, +header_type, header_len); +source_return_path_bad(ms); +goto out; +} +if (header_len expected_len) { +error_report(RP: Received message 0x%04x with +incorrect length %d expecting %d, +header_type, header_len, +expected_len); +source_return_path_bad(ms); +goto out; +} + +/* We know we've got a valid header by this point */ +res = qemu_get_buffer(rp, buf, header_len); +if (res != header_len) { +trace_source_return_path_thread_failed_read_cmd_data(); +source_return_path_bad(ms); +goto out; +} Amit -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] memory hotplug fails with mem-merge option
On Thu, 16 Jul 2015 16:00:07 +0530 Bharata B Rao bhar...@linux.vnet.ibm.com wrote: On Thu, Jul 16, 2015 at 10:25:09AM +0530, Bharata B Rao wrote: Hi, When I use -machine pc,mem-merge=off|on -m 1G,slots=4,maxmem=2G, adding memory-backend-ram object fails like below. Same failure is seen with -machine pseries,mem-merge=on|off. (qemu) object_add memory-backend-ram,id=ram0,size=1G qemu-system-x86_64: util/qemu-option.c:388: qemu_opt_get_bool_helper: Assertion `opt-desc opt-desc-type == QEMU_OPT_BOOL' failed. Same failure seen with -machine pseries,mem-merge=off when used together with -object memory-backend-file on QEMU commandline in which case boot time failure is observed. Regards, Bharata. looks like related to 0a7cf217 util/qemu-config: fix regression of qmp_query_command_line_options and 49d2e64 machine: remove qemu_machine_opts global list CCing Marcel in hope he knows how to fix it right away.
Re: [Qemu-devel] [PATCH v2 0/3] AioContext: ctx-dispatching is dead, all hail ctx-notify_me
Sorry for the very long delay in replying to this. I wanted to be absolutely sure I was reproducing the bug. Unfortunately I'm only able to reproduce the bug with qemu 2.3.0 (both the version in Fedora Rawhide, or the tagged v2.3.0 from git). I cannot currently reproduce it at all with upstream qemu from git. The patches understandably only apply to upstream qemu from git, and have quite a few tricky conflicts with v2.3.0. I'll keep trying on this one. It may be that the window for the bug to reproduce with qemu.git has got smaller. Rich. -- Notes on how I try to reproduce this: (1) Using Fedora Rawhide aarch64 (2) libguestfs checked out and compiled from git (3) kraxel's edk2.git-aarch64-0-20150713.b1115.g2ad9cf3.noarch (4) heisenscsi.pl (attached). $ export LIBGUESTFS_HV=/path/to/qemu/aarch64-softmmu/qemu-system-aarch64 $ while true; do echo .; ./run ./heisenscsi.pl ; done -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org heisenscsi.pl Description: Perl program
Re: [Qemu-devel] [Bug 1474263] Re: Image format was not specified warning should be suppressed for the vvfat (and probably nbd) driver
On Wed, Jul 15, 2015 at 10:54:47AM -0600, Eric Blake wrote: On 07/15/2015 09:42 AM, Max Reitz wrote: Hi, Indeed using non-raw images should not be used over NBD. The warning however is not superfluous, since qemu does indeed probe the image format, so a malicious guest might write a qcow2 header into the raw image, thus making qemu probe a qcow2 image the next time the same configuration is used. The problem would be solved by not making qemu probe the image format over NBD, but always assume raw; but I guess this will break existing use cases, even though they were wrong from the start. Anyway, this is solved by explicitly specifying the image format to be raw, which is what the warning says. I could actually see the use of non-raw over NBD. We support nested protocols (where you can use qcow2-qcow2-file), that is, where a file contains a qcow2 file whose contents are themselves a qcow2 image. (Perhaps useful in nested guests, where the outer qcow2 layer serves a disk to an L0 guest, which in turn uses the inner layer to present a disk to an L1 guest). In such a case, opening just one layer of qcow2 for service over NBD will expose the inner qcow2 image, and connecting qemu as an NBD client with format=raw will directly manipulate the qcow2 data seen by the L0 guest, while connecting as an NBD client with format=qcow2 will see the raw data seen by the L1 guest. But it's more likely to encounter this scenario with NBD, and not with vvfat. I agree that it's perfectly okay to use non-raw on top of NBD. We allow image formats on host block devices and iSCSI LUNs. Why shouldn't they be allowed on NBD exports? Stefan pgp8CmTZLbvHt.pgp Description: PGP signature
[Qemu-devel] [PATCH 3/4] hw/arm/virt: Wire up secure timer interrupt
Wire up the secure timer interrupt. Since we've defined that the plain old physical timer is the NS timer, we can drop the now-out-of-date comment about QEMU not having TZ. Use a data-driven loop to wire up the timer interrupts, since we now have four of them and the code is the same for each. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/arm/virt.c | 28 +++- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index aab99f7..95b1a9a 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -392,20 +392,22 @@ static void create_gic(VirtBoardInfo *vbi, qemu_irq *pic) for (i = 0; i smp_cpus; i++) { DeviceState *cpudev = DEVICE(qemu_get_cpu(i)); int ppibase = NUM_IRQS + i * GIC_INTERNAL + GIC_NR_SGIS; -/* physical timer; we wire it up to the non-secure timer's ID, - * since a real A15 always has TrustZone but QEMU doesn't. +int irq; +/* Mapping from the output timer irq lines from the CPU to the + * GIC PPI inputs we use for the virt board. */ -qdev_connect_gpio_out(cpudev, 0, - qdev_get_gpio_in(gicdev, - ppibase + ARCH_TIMER_NS_EL1_IRQ)); -/* virtual timer */ -qdev_connect_gpio_out(cpudev, 1, - qdev_get_gpio_in(gicdev, - ppibase + ARCH_TIMER_VIRT_IRQ)); -/* Hypervisor timer. */ -qdev_connect_gpio_out(cpudev, 2, - qdev_get_gpio_in(gicdev, - ppibase + ARCH_TIMER_NS_EL2_IRQ)); +const int timer_irq[] = { +[GTIMER_PHYS] = ARCH_TIMER_NS_EL1_IRQ, +[GTIMER_VIRT] = ARCH_TIMER_VIRT_IRQ, +[GTIMER_HYP] = ARCH_TIMER_NS_EL2_IRQ, +[GTIMER_SEC] = ARCH_TIMER_S_EL1_IRQ, +}; + +for (irq = 0; irq ARRAY_SIZE(timer_irq); irq++) { +qdev_connect_gpio_out(cpudev, irq, + qdev_get_gpio_in(gicdev, + ppibase + timer_irq[irq])); +} sysbus_connect_irq(gicbusdev, i, qdev_get_gpio_in(cpudev, ARM_CPU_IRQ)); sysbus_connect_irq(gicbusdev, i + smp_cpus, -- 1.9.1
[Qemu-devel] [PATCH 1/4] target-arm: Add the AArch64 view of the Secure physical timer
On CPUs with EL3, there are two physical timers, one for Secure and one for Non-secure. Implement this extra timer and the AArch64 registers which access it. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- target-arm/cpu-qom.h | 1 + target-arm/cpu.c | 2 ++ target-arm/cpu.h | 3 +- target-arm/helper.c | 87 4 files changed, 92 insertions(+), 1 deletion(-) diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h index 54db337..00c0716 100644 --- a/target-arm/cpu-qom.h +++ b/target-arm/cpu-qom.h @@ -225,6 +225,7 @@ int arm_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg); void arm_gt_ptimer_cb(void *opaque); void arm_gt_vtimer_cb(void *opaque); void arm_gt_htimer_cb(void *opaque); +void arm_gt_stimer_cb(void *opaque); #ifdef TARGET_AARCH64 int aarch64_cpu_gdb_read_register(CPUState *cpu, uint8_t *buf, int reg); diff --git a/target-arm/cpu.c b/target-arm/cpu.c index 3525348..fafc3ed 100644 --- a/target-arm/cpu.c +++ b/target-arm/cpu.c @@ -455,6 +455,8 @@ static void arm_cpu_initfn(Object *obj) arm_gt_vtimer_cb, cpu); cpu-gt_timer[GTIMER_HYP] = timer_new(QEMU_CLOCK_VIRTUAL, GTIMER_SCALE, arm_gt_htimer_cb, cpu); +cpu-gt_timer[GTIMER_SEC] = timer_new(QEMU_CLOCK_VIRTUAL, GTIMER_SCALE, +arm_gt_stimer_cb, cpu); qdev_init_gpio_out(DEVICE(cpu), cpu-gt_timer_outputs, ARRAY_SIZE(cpu-gt_timer_outputs)); #endif diff --git a/target-arm/cpu.h b/target-arm/cpu.h index 7346c5f..4a6cd51 100644 --- a/target-arm/cpu.h +++ b/target-arm/cpu.h @@ -114,7 +114,8 @@ typedef struct ARMGenericTimer { #define GTIMER_PHYS 0 #define GTIMER_VIRT 1 #define GTIMER_HYP 2 -#define NUM_GTIMERS 3 +#define GTIMER_SEC 3 +#define NUM_GTIMERS 4 typedef struct { uint64_t raw_tcr; diff --git a/target-arm/helper.c b/target-arm/helper.c index 4a7dd24..d31b946 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -1214,6 +1214,32 @@ static CPAccessResult gt_vtimer_access(CPUARMState *env, const ARMCPRegInfo *ri) return gt_timer_access(env, GTIMER_VIRT); } +static CPAccessResult gt_stimer_access(CPUARMState *env, + const ARMCPRegInfo *ri) +{ +/* The AArch64 register view of the secure physical timer is + * always accessible from EL3, and configurably accessible from + * Secure EL1. + */ +switch (arm_current_el(env)) { +case 1: +if (!arm_is_secure(env)) { +return CP_ACCESS_TRAP; +} +if (!(env-cp15.scr_el3 SCR_ST)) { +return CP_ACCESS_TRAP_EL3; +} +return CP_ACCESS_OK; +case 0: +case 2: +return CP_ACCESS_TRAP; +case 3: +return CP_ACCESS_OK; +default: +g_assert_not_reached(); +} +} + static uint64_t gt_get_countervalue(CPUARMState *env) { return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) / GTIMER_SCALE; @@ -1420,6 +1446,34 @@ static void gt_hyp_ctl_write(CPUARMState *env, const ARMCPRegInfo *ri, gt_ctl_write(env, ri, GTIMER_HYP, value); } +static void gt_sec_timer_reset(CPUARMState *env, const ARMCPRegInfo *ri) +{ +gt_timer_reset(env, ri, GTIMER_SEC); +} + +static void gt_sec_cval_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ +gt_cval_write(env, ri, GTIMER_SEC, value); +} + +static uint64_t gt_sec_tval_read(CPUARMState *env, const ARMCPRegInfo *ri) +{ +return gt_tval_read(env, ri, GTIMER_SEC); +} + +static void gt_sec_tval_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ +gt_tval_write(env, ri, GTIMER_SEC, value); +} + +static void gt_sec_ctl_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ +gt_ctl_write(env, ri, GTIMER_SEC, value); +} + void arm_gt_ptimer_cb(void *opaque) { ARMCPU *cpu = opaque; @@ -1441,6 +1495,13 @@ void arm_gt_htimer_cb(void *opaque) gt_recalc_timer(cpu, GTIMER_HYP); } +void arm_gt_stimer_cb(void *opaque) +{ +ARMCPU *cpu = opaque; + +gt_recalc_timer(cpu, GTIMER_SEC); +} + static const ARMCPRegInfo generic_timer_cp_reginfo[] = { /* Note that CNTFRQ is purely reads-as-written for the benefit * of software; writing it doesn't actually change the timer frequency. @@ -1570,6 +1631,32 @@ static const ARMCPRegInfo generic_timer_cp_reginfo[] = { .resetvalue = 0, .accessfn = gt_vtimer_access, .writefn = gt_virt_cval_write, .raw_writefn = raw_write, }, +/* Secure timer -- this is actually restricted to only EL3 + * and configurably Secure-EL1 via the accessfn. + */ +{ .name = CNTPS_TVAL_EL1, .state = ARM_CP_STATE_AA64, + .opc0 = 3, .opc1 = 7, .crn = 14, .crm = 2, .opc2 = 0, + .type = ARM_CP_NO_RAW | ARM_CP_IO,
[Qemu-devel] [PATCH 4/4] hw/cpu/a15mpcore: Wire up hyp and secure physical timer interrupts
Since we now support both the hypervisor and the secure physical timer, wire their interrupt lines up in the a15mpcore wrapper object. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- hw/cpu/a15mpcore.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/hw/cpu/a15mpcore.c b/hw/cpu/a15mpcore.c index acc419e..49727d0 100644 --- a/hw/cpu/a15mpcore.c +++ b/hw/cpu/a15mpcore.c @@ -79,14 +79,21 @@ static void a15mp_priv_realize(DeviceState *dev, Error **errp) for (i = 0; i s-num_cpu; i++) { DeviceState *cpudev = DEVICE(qemu_get_cpu(i)); int ppibase = s-num_irq - 32 + i * 32; -/* physical timer; we wire it up to the non-secure timer's ID, - * since a real A15 always has TrustZone but QEMU doesn't. +int irq; +/* Mapping from the output timer irq lines from the CPU to the + * GIC PPI inputs used on the A15: */ -qdev_connect_gpio_out(cpudev, 0, - qdev_get_gpio_in(gicdev, ppibase + 30)); -/* virtual timer */ -qdev_connect_gpio_out(cpudev, 1, - qdev_get_gpio_in(gicdev, ppibase + 27)); +const int timer_irq[] = { +[GTIMER_PHYS] = 30, +[GTIMER_VIRT] = 27, +[GTIMER_HYP] = 26, +[GTIMER_SEC] = 29, +}; +for (irq = 0; irq ARRAY_SIZE(timer_irq); irq++) { +qdev_connect_gpio_out(cpudev, irq, + qdev_get_gpio_in(gicdev, + ppibase + timer_irq[irq])); +} } /* Memory map (addresses are offsets from PERIPHBASE): -- 1.9.1
Re: [Qemu-devel] [virtio guest] vring_need_event() from virtqueue_kick_prepare()
On Thu, Jul 09, 2015 at 01:45:50PM +0300, Catalin Vasile wrote: I have managed to deal with it. The thing is I was using one of the latest versions of qemu, but an old Linux Kernel version of 3.12. Old guest kernel or old host kernel? New QEMU should support old guest kernels. Maybe you have found a bug? pgpAhWoBaNKMQ.pgp Description: PGP signature
Re: [Qemu-devel] write operations during read
On Wed, Jul 15, 2015 at 02:04:17PM +0300, Gleb Stepanov wrote: Hello, all. This is a duplicate. Please see my reply to your previous email. pgpngLK6Z9I3F.pgp Description: PGP signature
Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file
On Thu, 16 Jul 2015 13:25:56 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Thu, Jul 16, 2015 at 11:50:38AM +0200, Igor Mammedov wrote: On Thu, 16 Jul 2015 10:30:41 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Thu, Jul 16, 2015 at 08:57:36AM +0200, Igor Mammedov wrote: On Wed, 15 Jul 2015 19:24:33 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Wed, Jul 15, 2015 at 06:01:42PM +0200, Igor Mammedov wrote: On Wed, 15 Jul 2015 17:08:27 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote: Hi, A while ago I was pondering on the options available for retrieving a fw_cfg blob from the guest-side (now that we can insert fw_cfg files on the host-side command line, see commit 81b2b8106). So over the last couple of weekends I cooked up the sysfs kernel module below, which lists all fw_cfg files under /sys/firmware/fw_cfg/filename. One concern here is that there will be a conflict here if fw cfg is used by ACPI. I don't know whether that last is a good idea though, so maybe not a real concern. I think Igor wanted to make it so. I don't see any conflict here so far, it's just guest side module that accesses fw_cfg. If there's ACPI code that accesses fw_cfg in response to an interrupt, it'll race with fw cfg access by guest OS. On linux we might be able to block ACPI preventing it from running. We probably won't be able to do it on windows. wrt vmgenid series we were talking about possibility to access fw_cfg from ACPI device.init so it's unlikely that it will ever collide with much later sysfs accesses. Don't we need to get updates when it changes too? I've thought that it's pretty static and doesn't change. vm gen id? No, its dynamic. nope, I've mean address of buffer, not UUID itself. But if ACPI will start accessing fw_cfg from other methods it will race for sure. Maybe we shouldn't poke at fw cfg in ACPI then. Yep, maybe we shouldn't. The last vmgenid series just maps UUID region at fixed location (which is sufficient to build ACPI tables at machine done time) so there isn't real need to export address via fw_cfg. Other versions just write into guest memory from QEMU, that will work fine too. yep they do, except of that they are more complex on ACPI side and it's more difficult to debug. I like the last revision of series for simplicity of design and behavior.
[Qemu-devel] [PATCH 2/4] target-arm: Add AArch32 banked register access to secure physical timer
If EL3 is AArch32, then the secure physical timer is accessed via banking of the registers used for the non-secure physical timer. Implement this banking. Note that the access controls for the AArch32 banked registers remain the same as the physical-timer checks; they are not the same as the controls on the AArch64 secure timer registers. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- target-arm/helper.c | 27 +++ 1 file changed, 27 insertions(+) diff --git a/target-arm/helper.c b/target-arm/helper.c index d31b946..6fd5d93 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -1527,12 +1527,22 @@ static const ARMCPRegInfo generic_timer_cp_reginfo[] = { }, /* per-timer control */ { .name = CNTP_CTL, .cp = 15, .crn = 14, .crm = 2, .opc1 = 0, .opc2 = 1, + .secure = ARM_CP_SECSTATE_NS, .type = ARM_CP_IO | ARM_CP_ALIAS, .access = PL1_RW | PL0_R, .accessfn = gt_ptimer_access, .fieldoffset = offsetoflow32(CPUARMState, cp15.c14_timer[GTIMER_PHYS].ctl), .writefn = gt_phys_ctl_write, .raw_writefn = raw_write, }, +{ .name = CNTP_CTL(S), + .cp = 15, .crn = 14, .crm = 2, .opc1 = 0, .opc2 = 1, + .secure = ARM_CP_SECSTATE_S, + .type = ARM_CP_IO | ARM_CP_ALIAS, .access = PL1_RW | PL0_R, + .accessfn = gt_ptimer_access, + .fieldoffset = offsetoflow32(CPUARMState, + cp15.c14_timer[GTIMER_SEC].ctl), + .writefn = gt_sec_ctl_write, .raw_writefn = raw_write, +}, { .name = CNTP_CTL_EL0, .state = ARM_CP_STATE_AA64, .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 2, .opc2 = 1, .type = ARM_CP_IO, .access = PL1_RW | PL0_R, @@ -1558,10 +1568,18 @@ static const ARMCPRegInfo generic_timer_cp_reginfo[] = { }, /* TimerValue views: a 32 bit downcounting view of the underlying state */ { .name = CNTP_TVAL, .cp = 15, .crn = 14, .crm = 2, .opc1 = 0, .opc2 = 0, + .secure = ARM_CP_SECSTATE_NS, .type = ARM_CP_NO_RAW | ARM_CP_IO, .access = PL1_RW | PL0_R, .accessfn = gt_ptimer_access, .readfn = gt_phys_tval_read, .writefn = gt_phys_tval_write, }, +{ .name = CNTP_TVAL(S), + .cp = 15, .crn = 14, .crm = 2, .opc1 = 0, .opc2 = 0, + .secure = ARM_CP_SECSTATE_S, + .type = ARM_CP_NO_RAW | ARM_CP_IO, .access = PL1_RW | PL0_R, + .accessfn = gt_ptimer_access, + .readfn = gt_sec_tval_read, .writefn = gt_sec_tval_write, +}, { .name = CNTP_TVAL_EL0, .state = ARM_CP_STATE_AA64, .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 2, .opc2 = 0, .type = ARM_CP_NO_RAW | ARM_CP_IO, .access = PL1_RW | PL0_R, @@ -1602,12 +1620,21 @@ static const ARMCPRegInfo generic_timer_cp_reginfo[] = { }, /* Comparison value, indicating when the timer goes off */ { .name = CNTP_CVAL, .cp = 15, .crm = 14, .opc1 = 2, + .secure = ARM_CP_SECSTATE_NS, .access = PL1_RW | PL0_R, .type = ARM_CP_64BIT | ARM_CP_IO | ARM_CP_ALIAS, .fieldoffset = offsetof(CPUARMState, cp15.c14_timer[GTIMER_PHYS].cval), .accessfn = gt_ptimer_access, .writefn = gt_phys_cval_write, .raw_writefn = raw_write, }, +{ .name = CNTP_CVAL(S), .cp = 15, .crm = 14, .opc1 = 2, + .secure = ARM_CP_SECSTATE_S, + .access = PL1_RW | PL0_R, + .type = ARM_CP_64BIT | ARM_CP_IO | ARM_CP_ALIAS, + .fieldoffset = offsetof(CPUARMState, cp15.c14_timer[GTIMER_SEC].cval), + .accessfn = gt_ptimer_access, + .writefn = gt_sec_cval_write, .raw_writefn = raw_write, +}, { .name = CNTP_CVAL_EL0, .state = ARM_CP_STATE_AA64, .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 2, .opc2 = 2, .access = PL1_RW | PL0_R, -- 1.9.1
Re: [Qemu-devel] [Qemu-block] [PATCH 1/3] blockjob: Introduce block_job_relax_cpu
On Wed, Jul 15, 2015 at 06:32:54PM +0800, Fam Zheng wrote: On Tue, 07/14 13:31, Stefan Hajnoczi wrote: On Fri, Jul 10, 2015 at 05:42:48AM +0200, Alexandre DERUMIER wrote: By the way, why did you choose 10 milliseconds? That is quite long. If this function is called once per 10 ms disk I/O operations then we lose 50% utilization. 1 ms or less would be reasonable. From my tests, 1ms is not enough, It still hanging in guest or qmp queries. 10ms give me optimal balance between bitmap scan speed and guest responsiveness. Then I don't fully understand the bug. Fam: can you explain why 1ms isn't enough? In Alexandre's case, I suppose it's because the lseek is so slow that sleeping for 1ms would still let mirror coroutine to occupy, say, 90% of CPU time, so guest IO stutters. Perhaps we could move lseek to thread pool in the future. Right, that's the real problem here. If lseek is done in a worker thread than the coroutine yields in the meantime and the responsiveness problem is solved. This sounds like an important fix in the early 2.5 release cycle. pgpWzrqa3oBVa.pgp Description: PGP signature
Re: [Qemu-devel] Write requests, during read operations
On Tue, Jul 14, 2015 at 05:39:13PM +0300, Gleb Stepanov wrote: I've measured performance on qcow2 image mounted to nb0 device. nb0 device? I run tests and launch iostat -x 1 command to browse I/O operations. Test consists of sequential read operations with 1 MB size. I've get following output. Device: rrqm/s wrqm/s r/s w/srkB/swkB/s sda 0.00 5.000.00 677.00 0.00 22812.00 sdb 0.00 0.000.00 545.00 0.00 0.00 dm-0 0.00 0.000.00 674.00 0.00 22784.00 dm-1 0.00 0.000.007.00 0.0028.00 dm-2 0.00 0.000.000.00 0.00 0.00 nb0 0.00 0.00 432.00 182.00 55296.00 23296.00 So, reading produce a lot of write operations. What is the reason why these write operations occured? How to determina block size that qemu operates , because if i've writtien 55 mb with 1 mb blocks i should get 55 operations instead of 432. Please post details of the benchmark: * fio job file and fio command-line * commands used to launch QEMU or qemu-nbd pgpmdxYy8OZh_.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH 3/3] PAM: make PAM emulation closer to documentation
16.07.2015 14:10, Paolo Bonzini wrote: On 16/07/2015 12:51, Ефимов Василий wrote: The rest of code looks up destination or source region or child region offset in memory sub-tree which root is PCI or RAM region provided on PAM creation. We cannon use common address_space_translate because it searches from address space root and will return current PAM region. To summarize, I suggest to move the code to exec.c. It is generic enough. All these mechanism are extremely low level. They are encapsulated within exec.c, and copying code to pam.c is not a good idea because you already have all the AddressSpaces and RAM MemoryRegions you need. The core problem is there is no region type which can redirects access depending on whether it is read or write. The access type driven alias could be perfect. But it is quite difficult to invent such type of alias (or to generalize existing). The main problem is rendering memory tree to FlatView. Could you use an IOMMU memory region instead? Then a single region can be used to implement all four modes, and you don't hit the trying to execute code outside RAM or RAM. Did you mean MemoryRegion.iommu_ops ? The feature does not allow to change destination memory region. It does. You're right about this: exec.c: address_space_translate_for_iotlb: assert(!section-mr-iommu_ops); ... but an IOMMU region is not needed, and I think you can do everything without touching exec.c at all. +/* Read from RAM and write to PCI */ +memory_region_init_io(pam-region[1], OBJECT(dev), pam_ops, pam, +pam-r-ram-w-pci, size); This can be done with memory_region_set_readonly on the RAM region. You need to set mr-ops in order to point to pam_ops; for a first proof of concept you can just set the field directly. The idea is to read directly from system RAM region and to write to PCI using I/O (because I do not see another way to emulate access type driven redirection with existent API). If we create RAM and make it read only then new useless RAM block will be created. It is useless because of ram_addr of new region will be set to one within system RAM block. Hence, cleaner way is to create I/O region. Writes to the PCI memory space can use the PCI address space, with address_space_st*. There is no PCI AddressSpace (only MemoryRegion). But address_space_st* requires AddressSpace as argument. +/* Read from PCI and write to RAM */ +memory_region_init_io(pam-region[2], OBJECT(dev), pam_ops, pam, +pam-r-pci-w-ram, size); Here you cannot run code from ROM, so it can be a pure MMIO region. Reads can use address_space_ld*, while writes can use memory_region_get_ram_ptr. Even in this mode it is possible for code to be executed from ROM. This can happen when particular PCI address is within ROM device connected to PCI bus. I do not have examples where this happens in mode 2, but in mode 0 (where reads are also directed to PCI) this happens at startup time when BIOS is executed from ROM device on PCI. The path in memory tree is system - pam-pci - pci - isa-bios - pc.bios where pc.bios is ROM and pam-pci is alias. If this happens when PAM is in mode 2 then path should become system - pam-r-pci-w-ram where pam-r-pci-w-ram is ROM which ram_addr points to RAM block of pc.bios. Write handler of pam-r-pci-w-ram performs access to RAM block of pc.ram. Implemented mechanism performs search for leaf region. And it supports cases when leaf is I/O too. In this case pam-r-pci-w-ram (and pam-r-ram-w-pci) becomes clear I/O and both its handlers redirects access to corresponding region. Paolo Vasily
Re: [Qemu-devel] [RFC PATCH qemu v3 4/4] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering)
On Thu, 2015-07-16 at 15:11 +1000, David Gibson wrote: On Tue, Jul 14, 2015 at 10:21:54PM +1000, Alexey Kardashevskiy wrote: This makes use of the new memory registering feature. The idea is to provide the userspace ability to notify the host kernel about pages which are going to be used for DMA. Having this information, the host kernel can pin them all once per user process, do locked pages accounting (once) and not spent time on doing that in real time with possible failures which cannot be handled nicely in some cases. This adds a guest RAM memory listener which notifies a VFIO container about memory which needs to be pinned/unpinned. VFIO MMIO regions (i.e. skip dump regions) are skipped. The feature is only enabled for SPAPR IOMMU v2. The host kernel changes are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does not call it when v2 is detected and enabled. This does not change the guest visible interface. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru I've looked at this in more depth now, and attempting to unify the pre-reg and mapping listeners like this can't work - they need to be listening on different address spaces: mapping actions need to be listening on the PCI address space, whereas the pre-reg needs to be listening on address_space_memory. For x86 - for now - those end up being the same thing, but on Power they're not. We do need to be clear about what differences are due to the presence of a guest IOMMU versus which are due to arch or underlying IOMMU type. For now Power has a guest IOMMU and x86 doesn't, but that could well change in future: we could well implement the guest side IOMMU for x86 in future (or x86 could invent a paravirt IOMMU interface). On the other side, BenH's experimental powernv machine type could introduce Power machines without a guest side IOMMU (or at least an optional guest side IOMMU). The quick and dirty approach here is: 1. Leave the main listener as is 2. Add a new pre-reg notifier to the spapr iommu specific code, which listens on address_space_memory, *not* the PCI space It is dirty and that's exactly what I've been advising Alexey against because we have entirely too much dirty spapr specific code that doesn't need to be spapr specific. I don't see why separate address space matters, that's done at the point of registering the listener and so far doesn't play much a role in the actual listener behavior, just which regions it sees. The more generally correct approach, which allows for more complex IOMMU arrangements and the possibility of new IOMMU types with pre-reg is: 1. Have the core implement both a mapping listener and a pre-reg listener (optionally enabled by a per-iommu-type flag). Basically the first one sees what *is* mapped, the second sees what *could* be mapped. This just sounds like different address spaces, address_space_memory vs address_space_physical_memory. 2. As now, the mapping listener listens on PCI address space, if RAM blocks are added, immediately map them into the host IOMMU, if guest IOMMU blocks appear register a notifier which will mirror guest IOMMU mappings to the host IOMMU (this is what we do now). Right, this is done now, nothing new required. 3. The pre-reg listener also listens on the PCI address space. RAM blocks added are pre-registered immediately. But, if guest IOMMU blocks are added, instead of registering a guest-iommu notifier, we register another listener on the *target* AS of the guest IOMMU, same callbacks as this one. In practice that target AS will almost always resolve to address_space_memory, but this can at least in theory handle crazy guest setups with multiple layers of IOMMU. 4. Have to ensure that the pre-reg callbacks always happen before the mapping calls. For a system with an IOMMU backend which requires pre-registration, but doesn't have a guest IOMMU, we need to pre-reg, then host-iommu-map RAM blocks that appear in PCI address space. Both of these just seem like details of the iommu-type (pre-reg) specific parts of the listener, I'm not understanding what's fundamentally different about it that we can't do it now, within a single listener, nor do I see how it's all that different from x86. x86 mappings are more dynamic, but that's largely in legacy space for x86 due to some bizarre compatibility things. x86 also tries to map mmio regions for p2p, but these are just minor details to mappings that are largely the same. Thanks, Alex