Re: [PATCH] hw/arm/musicpal: Convert to qemu_add_kbd_event_handler()
On 22.01.24 10:50, Alex Bennée wrote:
> Jan Kiszka writes:
>
>> On 19.01.24 12:24, Alex Bennée wrote:
>>> Peter Maydell writes:
>>>
>>>> Convert the musicpal key input device to use
>>>> qemu_add_kbd_event_handler(). This lets us simplify it because we no
>>>> longer need to track whether we're in the middle of a PS/2 multibyte
>>>> key sequence.
>>>
>>> Well the key input all works as intended and looks good to me. I'm a
>>> little disappointed I couldn't get audio working on the musicpal machine
>>> but that is not a problem for this patch.
>>>
>>> Tested-by: Alex Bennée
>>> Reviewed-by: Alex Bennée
>>
>> Looks good to me as well, all keys still work fine.
>>
>> No idea what's the issue with sound, though. I think I haven't run the
>> whole stuff in a decade or so; I had to search for all the pieces first.
>> The webradio service originally behind this stopped its operations, at
>> least for this device, but manually entered favourites still work on the
>> real device - I still have one, though it is starting to develop some
>> issues as well.
>
> I navigated through the favourites and after pressing some keys it seems
> to indicate there was a stream of some sort (or at least a bitrate was
> reported ;-).
>
> The main issue I was having with sound was with pipewire - this would
> eventually generate a lot of warning messages because input devices are
> created but I guess the model wasn't draining the input buffers, so
> eventually we get:
>
> qemu: 0x7f1490259500: overrun write:5859188 filled:5842804 + size:940 > max:4194304
> qemu: 0x7f14902680a0: overrun write:5860128 filled:5843744 + size:940 > max:4194304
> qemu: 0x7f1490259500: overrun write:5861068 filled:5844684 + size:940 > max:4194304
> qemu: 0x7f14902680a0: overrun write:5862008 filled:5845624 + size:940 > max:4194304

I'm getting these here:

pulseaudio: set_source_output_volume() failed
pulseaudio: Reason: Invalid argument
...
> Is your image just a hacked up version of the original firmware or
> something we have source for? I guess there was never a rockbox port for
> the device?

It's an original firmware, nothing hacked. I do have some sources here,
but only partial ones: U-Boot and the kernel, not the complete userland
and not even all kernel drivers, IIRC.

Jan
Re: [PATCH] hw/arm/musicpal: Convert to qemu_add_kbd_event_handler()
On 19.01.24 12:24, Alex Bennée wrote:
> Peter Maydell writes:
>
>> Convert the musicpal key input device to use
>> qemu_add_kbd_event_handler(). This lets us simplify it because we no
>> longer need to track whether we're in the middle of a PS/2 multibyte
>> key sequence.
>>
>> In the conversion we move the keyboard handler registration from init
>> to realize, because devices shouldn't disturb the state of the
>> simulation by doing things like registering input handlers until
>> they're realized, so that device objects can be introspected
>> safely.
>>
>> The behaviour where key-repeat is permitted for the arrow-keys only
>> is intentional (added in commit 7c6ce4baedfcd0c), so we retain it,
>> and add a comment to that effect.
>
> Well the key input all works as intended and looks good to me. I'm a
> little disappointed I couldn't get audio working on the musicpal machine
> but that is not a problem for this patch.
>
> Tested-by: Alex Bennée
> Reviewed-by: Alex Bennée

Looks good to me as well, all keys still work fine.

No idea what's the issue with sound, though. I think I haven't run the
whole stuff in a decade or so; I had to search for all the pieces first.
The webradio service originally behind this stopped its operations, at
least for this device, but manually entered favourites still work on the
real device - I still have one, though it is starting to develop some
issues as well.

Thanks,
Jan
Re: [PATCH] hw/arm: virt: Add SBSA watchdog
On 05.05.22 10:40, Peter Maydell wrote:
> On Sun, 1 May 2022 at 19:07, Jan Kiszka wrote:
>>
>> From: Jan Kiszka
>>
>> The virt machine lacks a watchdog so far, while sbsa-ref has a simple
>> model that is also supported by Linux and even U-Boot. Let's take it to
>> allow, e.g., system integration tests of A/B software updates under
>> watchdog monitoring.
>>
>> Signed-off-by: Jan Kiszka
>
> The virt board has a PCI bus, and QEMU has a model of a
> PCI watchdog device -- the i6300esb. Can you use that?

While one could do that, it implies using completely alien Intel x86
host controller bits on this platform and would thus require changes in
firmware (e.g. adding a driver to U-Boot).

> In general I much prefer it if we can support use cases in
> virt via pluggable PCI devices rather than hard-wired MMIO
> devices -- it means that only users who want the functionality
> need to have the exposed attack surface area of the device
> present in their VM.

Valid point - would making this opt-in via a machine feature be
sufficient?

Jan

--
Siemens AG, Technology
Competence Center Embedded Linux
[PATCH] hw/arm: virt: Add SBSA watchdog
From: Jan Kiszka

The virt machine lacks a watchdog so far, while sbsa-ref has a simple
model that is also supported by Linux and even U-Boot. Let's take it to
allow, e.g., system integration tests of A/B software updates under
watchdog monitoring.

Signed-off-by: Jan Kiszka
---
An alternative could be the sp805. However, QEMU has no support for
that device yet.

 hw/arm/Kconfig        |  1 +
 hw/arm/virt.c         | 35 +++
 include/hw/arm/virt.h |  3 +++
 3 files changed, 39 insertions(+)

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 97f3b38019..8df04a3533 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -29,6 +29,7 @@ config ARM_VIRT
     select ACPI_APEI
     select ACPI_VIOT
     select VIRTIO_MEM_SUPPORTED
+    select WDT_SBSA
 
 config CHEETAH
     bool
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index f94278935f..a7b85438a8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -78,6 +78,7 @@
 #include "hw/virtio/virtio-mem-pci.h"
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/char/pl011.h"
+#include "hw/watchdog/sbsa_gwdt.h"
 #include "qemu/guest-random.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
@@ -154,6 +155,8 @@ static const MemMapEntry base_memmap[] = {
     [VIRT_NVDIMM_ACPI] =        { 0x0909, NVDIMM_ACPI_IO_LEN},
     [VIRT_PVTIME] =             { 0x090a, 0x0001 },
     [VIRT_SECURE_GPIO] =        { 0x090b, 0x1000 },
+    [VIRT_GWDT_CONTROL] =       { 0x090c, 0x1000 },
+    [VIRT_GWDT_REFRESH] =       { 0x090d, 0x1000 },
     [VIRT_MMIO] =               { 0x0a00, 0x0200 },
     /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
     [VIRT_PLATFORM_BUS] =       { 0x0c00, 0x0200 },
@@ -190,6 +193,7 @@ static const int a15irqmap[] = {
     [VIRT_GPIO] = 7,
     [VIRT_SECURE_UART] = 8,
     [VIRT_ACPI_GED] = 9,
+    [VIRT_GWDT_WS0] = 10,
     [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */
     [VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
     [VIRT_SMMU] = 74,    /* ...to 74 + NUM_SMMU_IRQS - 1 */
@@ -902,6 +906,35 @@ static void create_rtc(const VirtMachineState *vms)
     g_free(nodename);
 }
 
+static void create_wdt(const VirtMachineState *vms)
+{
+    hwaddr cbase = vms->memmap[VIRT_GWDT_CONTROL].base;
+    hwaddr csize = vms->memmap[VIRT_GWDT_CONTROL].size;
+    hwaddr rbase = vms->memmap[VIRT_GWDT_REFRESH].base;
+    hwaddr rsize = vms->memmap[VIRT_GWDT_REFRESH].size;
+    DeviceState *dev = qdev_new(TYPE_WDT_SBSA);
+    SysBusDevice *s = SYS_BUS_DEVICE(dev);
+    int irq = vms->irqmap[VIRT_GWDT_WS0];
+    const char compat[] = "arm,sbsa-gwdt";
+    MachineState *ms = MACHINE(vms);
+    char *nodename;
+
+    sysbus_realize_and_unref(s, &error_fatal);
+    sysbus_mmio_map(s, 1, cbase);
+    sysbus_mmio_map(s, 0, rbase);
+    sysbus_connect_irq(s, 0, qdev_get_gpio_in(vms->gic, irq));
+
+    nodename = g_strdup_printf("/watchdog@%" PRIx64, cbase);
+    qemu_fdt_add_subnode(ms->fdt, nodename);
+    qemu_fdt_setprop(ms->fdt, nodename, "compatible", compat, sizeof(compat));
+    qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg",
+                                 2, cbase, 2, csize, 2, rbase, 2, rsize);
+    qemu_fdt_setprop_cells(ms->fdt, nodename, "interrupts",
+                           GIC_FDT_IRQ_TYPE_SPI, irq,
+                           GIC_FDT_IRQ_FLAGS_LEVEL_HI);
+    g_free(nodename);
+}
+
 static DeviceState *gpio_key_dev;
 static void virt_powerdown_req(Notifier *n, void *opaque)
 {
@@ -2240,6 +2273,8 @@ static void machvirt_init(MachineState *machine)
 
     create_rtc(vms);
 
+    create_wdt(vms);
+
     create_pcie(vms);
 
     if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) {
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 15feabac63..7f295f771e 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -86,6 +86,9 @@ enum {
     VIRT_ACPI_GED,
     VIRT_NVDIMM_ACPI,
     VIRT_PVTIME,
+    VIRT_GWDT_WS0,
+    VIRT_GWDT_CONTROL,
+    VIRT_GWDT_REFRESH,
     VIRT_LOWMEMMAP_LAST,
 };
 
-- 
2.34.1
Re: [RFC PATCH 08/21] contrib/gitdm: Add Mentor Graphics to the domain map
On 06.10.20 14:41, Philippe Mathieu-Daudé wrote:
> On 10/6/20 11:44 AM, Alex Bennée wrote:
>>
>> Jan Kiszka writes:
>>
>>> On 05.10.20 22:52, Joseph Myers wrote:
>>>> On Mon, 5 Oct 2020, Alex Bennée wrote:
>>>>
>>>>> Joseph Myers writes:
>>>>>
>>>>>> On Sun, 4 Oct 2020, Philippe Mathieu-Daudé wrote:
>>>>>>
>>>>>>> There is a number of contributors from this domain,
>>>>>>> add its own entry to the gitdm domain map.
>>>>>>
>>>>>> At some point the main branding will be Siemens; not sure how you
>>>>>> want to handle that.
>>>>>
>>>>> We've already done something similar with WaveComp who have rolled up
>>>>> the various mips and imgtec contributions into
>>>>> contrib/gitdm/group-map-wavecomp.
>>>>>
>>>>> It's really up to you and which corporate entity would like internet
>>>>> bragging points. The only Siemens contributor I could find is Jan
>>>>> Kiszka but he has contributed a fair amount ;-)
>>>>
>>>> Given that the Mentor branding is going away (and the "Mentor Graphics"
>>>> version largely has gone away, "Mentor, a Siemens Business" is what's
>>>> currently used as a Mentor brand), probably it makes sense to use
>>>> Siemens for both codesourcery.com and mentor.com addresses.
>>>
>>> I think the key question is what this map is used for: Is it supposed
>>> to document the historic status, who was who at the time of
>>> contribution? Or is its purpose to help identifying the copyright
>>> holder of a contribution today?
>>
>> I don't know what others use them for but for me it was just an easy way
>> to get a survey of the companies and individuals involved over the last
>> year (2y, 3y, 5y... etc) of development.
>
> My personal interest is seeing how the corporate/academic/hobbyist
> contributions are split, and how this evolves over time.
>
> Since there were entries for some companies, I added more,
> but this is not a requisite and we can drop the patches if
> considered not useful or giving headaches to the contributors.
>
>> The consolidation of
>> contributions isn't overly distorting IMO. The biggest user is probably
>> the end of year state of the nation surveys wanting to see what impact
>> various organisations are having on a project and consolidation just
>> helps push you up the table a little more.
>>
>> The biggest counter example we have at the moment is RedHat/IBM which
>> AFAICT is because the RedHat guys are treated as a separate business
>> unit with their own unique identity.
>>
>> Either way I don't think it's a major issue - hence it is up to the
>> hackers to make the choice.
>
> Totally. In this particular case, we could even keep "Codesourcery"
> group separate, similarly to RH/IBM.

If there is a way to also express a timeline, then I would keep the
historic contributions separate and account those since the merger
(April 2017) to Siemens. You may even consider adding the phase when
CodeSourcery was owned by Mentor.

Jan

--
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux
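For concreteness, gitdm's employer attribution lives in plain-text map files. A hypothetical sketch of the date-split mapping Jan proposes is below; the file name, the cutoff dates, and in particular the `< date` end-of-validity syntax are assumptions to verify against gitdm's documentation, not something stated in this thread:

```
# contrib/gitdm/domain-map (illustrative sketch only)
# Attribute pre-merger work to the brand of the day, later work to Siemens.
codesourcery.com  CodeSourcery     < 2010-07-01
codesourcery.com  Mentor Graphics  < 2017-04-01
codesourcery.com  Siemens
mentor.com        Mentor Graphics  < 2017-04-01
mentor.com        Siemens
siemens.com       Siemens
```

If gitdm's domain map turns out not to support dated entries, the same split can be expressed per contributor in a group-map file, at the cost of listing individual addresses.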
Re: [RFC PATCH 08/21] contrib/gitdm: Add Mentor Graphics to the domain map
On 05.10.20 22:52, Joseph Myers wrote:
> On Mon, 5 Oct 2020, Alex Bennée wrote:
>
>> Joseph Myers writes:
>>
>>> On Sun, 4 Oct 2020, Philippe Mathieu-Daudé wrote:
>>>
>>>> There is a number of contributors from this domain,
>>>> add its own entry to the gitdm domain map.
>>>
>>> At some point the main branding will be Siemens; not sure how you want
>>> to handle that.
>>
>> We've already done something similar with WaveComp who have rolled up
>> the various mips and imgtec contributions into
>> contrib/gitdm/group-map-wavecomp.
>>
>> It's really up to you and which corporate entity would like internet
>> bragging points. The only Siemens contributor I could find is Jan Kiszka
>> but he has contributed a fair amount ;-)
>
> Given that the Mentor branding is going away (and the "Mentor Graphics"
> version largely has gone away, "Mentor, a Siemens Business" is what's
> currently used as a Mentor brand), probably it makes sense to use Siemens
> for both codesourcery.com and mentor.com addresses.

I think the key question is what this map is used for: Is it supposed to
document the historic status, who was who at the time of contribution?
Or is its purpose to help identifying the copyright holder of a
contribution today?

Jan

--
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux
Re: scripts/gdb: issues when loading modules after lx-symbols
On 05.10.20 13:05, Stefano Garzarella wrote: > On Mon, Oct 05, 2020 at 11:45:41AM +0200, Jan Kiszka wrote: >> On 05.10.20 11:29, Stefano Garzarella wrote: >>> On Mon, Oct 05, 2020 at 10:33:30AM +0200, Jan Kiszka wrote: >>>> On 05.10.20 10:14, Stefano Garzarella wrote: >>>>> On Sun, Oct 04, 2020 at 08:52:37PM +0200, Jan Kiszka wrote: >>>>>> On 01.10.20 16:31, Stefano Garzarella wrote: >>>>>>> Hi, >>>>>>> I had some issues with gdb scripts and kernel modules in Linux 5.9-rc7. >>>>>>> >>>>>>> If the modules are already loaded, and I do 'lx-symbols', all work fine. >>>>>>> But, if I load a kernel module after 'lx-symbols', I had this issue: >>>>>>> >>>>>>> [ 5093.393940] invalid opcode: [#1] SMP PTI >>>>>>> [ 5093.395134] CPU: 0 PID: 576 Comm: modprobe Not tainted >>>>>>> 5.9.0-rc7-ste-00010-gf0b671d9608d-dirty #2 >>>>>>> [ 5093.397566] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS >>>>>>> 1.13.0-2.fc32 04/01/2014 >>>>>>> [ 5093.400761] RIP: 0010:do_init_module+0x1/0x270 >>>>>>> [ 5093.402553] Code: ff ff e9 cf fe ff ff 0f 0b 49 c7 c4 f2 ff ff ff e9 >>>>>>> c1 fe ff ff e8 5f b2 65 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 >>>>>>> cc <1f> 44 00 00 55 ba 10 00 00 00 be c0 0c 00 00 48 89 e5 41 56 41 55 >>>>>>> [ 5093.409505] RSP: 0018:c9563d18 EFLAGS: 00010246 >>>>>>> [ 5093.412056] RAX: RBX: c010a0c0 RCX: >>>>>>> 4ee3 >>>>>>> [ 5093.414472] RDX: 4ee2 RSI: ea0001efe188 RDI: >>>>>>> c010a0c0 >>>>>>> [ 5093.416349] RBP: c9563e50 R08: R09: >>>>>>> 0002 >>>>>>> [ 5093.418044] R10: 0096 R11: 08a4 R12: >>>>>>> 88807a0d1280 >>>>>>> [ 5093.424721] R13: c010a110 R14: 88807a0d1300 R15: >>>>>>> c9563e70 >>>>>>> [ 5093.427138] FS: 7f018f632740() GS:88807dc0() >>>>>>> knlGS: >>>>>>> [ 5093.430037] CS: 0010 DS: ES: CR0: 80050033 >>>>>>> [ 5093.432279] CR2: 55fbe282b239 CR3: 7922a006 CR4: >>>>>>> 00170ef0 >>>>>>> [ 5093.435096] DR0: DR1: DR2: >>>>>>> >>>>>>> [ 5093.436765] DR3: DR6: fffe0ff0 DR7: >>>>>>> 0400 >>>>>>> [ 5093.439689] Call Trace: >>>>>>> [ 5093.440954] ? 
load_module+0x24b6/0x27d0 >>>>>>> [ 5093.443212] ? __kernel_read+0xd6/0x150 >>>>>>> [ 5093.445140] __do_sys_finit_module+0xd3/0xf0 >>>>>>> [ 5093.446877] __x64_sys_finit_module+0x1a/0x20 >>>>>>> [ 5093.449098] do_syscall_64+0x38/0x50 >>>>>>> [ 5093.450877] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>>>> [ 5093.456153] RIP: 0033:0x7f018f75c43d >>>>>>> [ 5093.457728] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa >>>>>>> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f >>>>>>> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2b 6a 0c 00 f7 d8 64 89 01 48 >>>>>>> [ 5093.466349] RSP: 002b:7ffd7f080368 EFLAGS: 0246 ORIG_RAX: >>>>>>> 0139 >>>>>>> [ 5093.470613] RAX: ffda RBX: 557e5c96f9c0 RCX: >>>>>>> 7f018f75c43d >>>>>>> [ 5093.474747] RDX: RSI: 557e5c964288 RDI: >>>>>>> 0003 >>>>>>> [ 5093.478049] RBP: 0004 R08: R09: >>>>>>> >>>>>>> [ 5093.481298] R10: 0003 R11: 0246 R12: >>>>>>> >>>>>>> [ 5093.483725] R13: 557e5c964288 R14: 557e5c96f950 R15: >>>>>>> 557e5c9775c0 >>>>>>> [ 5093.485778] Modules linked in: virtio_vdpa(+) vdpa sunrpc kvm_intel >>>>>>> kvm irqbypass virtio_blk virtio_rng rng_core [last
Re: scripts/gdb: issues when loading modules after lx-symbols
On 05.10.20 11:29, Stefano Garzarella wrote: > On Mon, Oct 05, 2020 at 10:33:30AM +0200, Jan Kiszka wrote: >> On 05.10.20 10:14, Stefano Garzarella wrote: >>> On Sun, Oct 04, 2020 at 08:52:37PM +0200, Jan Kiszka wrote: >>>> On 01.10.20 16:31, Stefano Garzarella wrote: >>>>> Hi, >>>>> I had some issues with gdb scripts and kernel modules in Linux 5.9-rc7. >>>>> >>>>> If the modules are already loaded, and I do 'lx-symbols', all work fine. >>>>> But, if I load a kernel module after 'lx-symbols', I had this issue: >>>>> >>>>> [ 5093.393940] invalid opcode: [#1] SMP PTI >>>>> [ 5093.395134] CPU: 0 PID: 576 Comm: modprobe Not tainted >>>>> 5.9.0-rc7-ste-00010-gf0b671d9608d-dirty #2 >>>>> [ 5093.397566] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS >>>>> 1.13.0-2.fc32 04/01/2014 >>>>> [ 5093.400761] RIP: 0010:do_init_module+0x1/0x270 >>>>> [ 5093.402553] Code: ff ff e9 cf fe ff ff 0f 0b 49 c7 c4 f2 ff ff ff e9 >>>>> c1 fe ff ff e8 5f b2 65 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 >>>>> cc <1f> 44 00 00 55 ba 10 00 00 00 be c0 0c 00 00 48 89 e5 41 56 41 55 >>>>> [ 5093.409505] RSP: 0018:c9563d18 EFLAGS: 00010246 >>>>> [ 5093.412056] RAX: RBX: c010a0c0 RCX: >>>>> 4ee3 >>>>> [ 5093.414472] RDX: 4ee2 RSI: ea0001efe188 RDI: >>>>> c010a0c0 >>>>> [ 5093.416349] RBP: c9563e50 R08: R09: >>>>> 0002 >>>>> [ 5093.418044] R10: 0096 R11: 08a4 R12: >>>>> 88807a0d1280 >>>>> [ 5093.424721] R13: c010a110 R14: 88807a0d1300 R15: >>>>> c9563e70 >>>>> [ 5093.427138] FS: 7f018f632740() GS:88807dc0() >>>>> knlGS: >>>>> [ 5093.430037] CS: 0010 DS: ES: CR0: 80050033 >>>>> [ 5093.432279] CR2: 55fbe282b239 CR3: 7922a006 CR4: >>>>> 00170ef0 >>>>> [ 5093.435096] DR0: DR1: DR2: >>>>> >>>>> [ 5093.436765] DR3: DR6: fffe0ff0 DR7: >>>>> 0400 >>>>> [ 5093.439689] Call Trace: >>>>> [ 5093.440954] ? load_module+0x24b6/0x27d0 >>>>> [ 5093.443212] ? 
__kernel_read+0xd6/0x150 >>>>> [ 5093.445140] __do_sys_finit_module+0xd3/0xf0 >>>>> [ 5093.446877] __x64_sys_finit_module+0x1a/0x20 >>>>> [ 5093.449098] do_syscall_64+0x38/0x50 >>>>> [ 5093.450877] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> [ 5093.456153] RIP: 0033:0x7f018f75c43d >>>>> [ 5093.457728] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa >>>>> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f >>>>> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2b 6a 0c 00 f7 d8 64 89 01 48 >>>>> [ 5093.466349] RSP: 002b:7ffd7f080368 EFLAGS: 0246 ORIG_RAX: >>>>> 0139 >>>>> [ 5093.470613] RAX: ffda RBX: 557e5c96f9c0 RCX: >>>>> 7f018f75c43d >>>>> [ 5093.474747] RDX: RSI: 557e5c964288 RDI: >>>>> 0003 >>>>> [ 5093.478049] RBP: 0004 R08: R09: >>>>> >>>>> [ 5093.481298] R10: 0003 R11: 0246 R12: >>>>> >>>>> [ 5093.483725] R13: 557e5c964288 R14: 557e5c96f950 R15: >>>>> 557e5c9775c0 >>>>> [ 5093.485778] Modules linked in: virtio_vdpa(+) vdpa sunrpc kvm_intel >>>>> kvm irqbypass virtio_blk virtio_rng rng_core [last unloaded: virtio_vdpa] >>>>> [ 5093.488695] ---[ end trace 23712ecebc11f53c ]--- >>>>> >>>>> Guest kernel: Linux 5.9-rc7 >>>>> gdb: GNU gdb (GDB) Fedora 9.1-6.fc32 >>>>> I tried with QEMU 4.2.1 and the latest master branch: same issue. >>>>> >>>>> >>>>> I did some digging, and skipping the gdb 'add-symbol-file' command in >>>>> symbol.py >>>>> avoid the issue, but of course I don't have the symbols loaded: >>>>> >>>>> diff --git a/scripts/gdb
Re: scripts/gdb: issues when loading modules after lx-symbols
On 05.10.20 10:14, Stefano Garzarella wrote: > On Sun, Oct 04, 2020 at 08:52:37PM +0200, Jan Kiszka wrote: >> On 01.10.20 16:31, Stefano Garzarella wrote: >>> Hi, >>> I had some issues with gdb scripts and kernel modules in Linux 5.9-rc7. >>> >>> If the modules are already loaded, and I do 'lx-symbols', all work fine. >>> But, if I load a kernel module after 'lx-symbols', I had this issue: >>> >>> [ 5093.393940] invalid opcode: [#1] SMP PTI >>> [ 5093.395134] CPU: 0 PID: 576 Comm: modprobe Not tainted >>> 5.9.0-rc7-ste-00010-gf0b671d9608d-dirty #2 >>> [ 5093.397566] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS >>> 1.13.0-2.fc32 04/01/2014 >>> [ 5093.400761] RIP: 0010:do_init_module+0x1/0x270 >>> [ 5093.402553] Code: ff ff e9 cf fe ff ff 0f 0b 49 c7 c4 f2 ff ff ff e9 c1 >>> fe ff ff e8 5f b2 65 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 cc >>> <1f> 44 00 00 55 ba 10 00 00 00 be c0 0c 00 00 48 89 e5 41 56 41 55 >>> [ 5093.409505] RSP: 0018:c9563d18 EFLAGS: 00010246 >>> [ 5093.412056] RAX: RBX: c010a0c0 RCX: >>> 4ee3 >>> [ 5093.414472] RDX: 4ee2 RSI: ea0001efe188 RDI: >>> c010a0c0 >>> [ 5093.416349] RBP: c9563e50 R08: R09: >>> 0002 >>> [ 5093.418044] R10: 0096 R11: 08a4 R12: >>> 88807a0d1280 >>> [ 5093.424721] R13: c010a110 R14: 88807a0d1300 R15: >>> c9563e70 >>> [ 5093.427138] FS: 7f018f632740() GS:88807dc0() >>> knlGS: >>> [ 5093.430037] CS: 0010 DS: ES: CR0: 80050033 >>> [ 5093.432279] CR2: 55fbe282b239 CR3: 7922a006 CR4: >>> 00170ef0 >>> [ 5093.435096] DR0: DR1: DR2: >>> >>> [ 5093.436765] DR3: DR6: fffe0ff0 DR7: >>> 0400 >>> [ 5093.439689] Call Trace: >>> [ 5093.440954] ? load_module+0x24b6/0x27d0 >>> [ 5093.443212] ? 
__kernel_read+0xd6/0x150 >>> [ 5093.445140] __do_sys_finit_module+0xd3/0xf0 >>> [ 5093.446877] __x64_sys_finit_module+0x1a/0x20 >>> [ 5093.449098] do_syscall_64+0x38/0x50 >>> [ 5093.450877] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [ 5093.456153] RIP: 0033:0x7f018f75c43d >>> [ 5093.457728] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 >>> 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 >>> <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2b 6a 0c 00 f7 d8 64 89 01 48 >>> [ 5093.466349] RSP: 002b:7ffd7f080368 EFLAGS: 0246 ORIG_RAX: >>> 0139 >>> [ 5093.470613] RAX: ffda RBX: 557e5c96f9c0 RCX: >>> 7f018f75c43d >>> [ 5093.474747] RDX: RSI: 557e5c964288 RDI: >>> 0003 >>> [ 5093.478049] RBP: 0004 R08: R09: >>> >>> [ 5093.481298] R10: 0003 R11: 0246 R12: >>> >>> [ 5093.483725] R13: 557e5c964288 R14: 557e5c96f950 R15: >>> 557e5c9775c0 >>> [ 5093.485778] Modules linked in: virtio_vdpa(+) vdpa sunrpc kvm_intel kvm >>> irqbypass virtio_blk virtio_rng rng_core [last unloaded: virtio_vdpa] >>> [ 5093.488695] ---[ end trace 23712ecebc11f53c ]--- >>> >>> Guest kernel: Linux 5.9-rc7 >>> gdb: GNU gdb (GDB) Fedora 9.1-6.fc32 >>> I tried with QEMU 4.2.1 and the latest master branch: same issue. >>> >>> >>> I did some digging, and skipping the gdb 'add-symbol-file' command in >>> symbol.py >>> avoid the issue, but of course I don't have the symbols loaded: >>> >>> diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py >>> index 1be9763cf8bb..eadfaa4d4907 100644 >>> --- a/scripts/gdb/linux/symbols.py >>> +++ b/scripts/gdb/linux/symbols.py >>> @@ -129,7 +129,7 @@ lx-symbols command.""" >>> filename=module_file, >>> addr=module_addr, >>> sections=self._section_arguments(module)) >>> -gdb.execute(cmdline, to_string=True) >>> +#gdb.execute(cmdline, to_string=True) >>> if module_name not in self.loaded_modules: >>>
Re: scripts/gdb: issues when loading modules after lx-symbols
On 01.10.20 16:31, Stefano Garzarella wrote:
> Hi,
> I had some issues with gdb scripts and kernel modules in Linux 5.9-rc7.
>
> If the modules are already loaded, and I do 'lx-symbols', all work fine.
> But, if I load a kernel module after 'lx-symbols', I had this issue:
>
> [ 5093.393940] invalid opcode: [#1] SMP PTI
> [ 5093.395134] CPU: 0 PID: 576 Comm: modprobe Not tainted 5.9.0-rc7-ste-00010-gf0b671d9608d-dirty #2
> [ 5093.397566] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> [ 5093.400761] RIP: 0010:do_init_module+0x1/0x270
> [ 5093.402553] Code: ff ff e9 cf fe ff ff 0f 0b 49 c7 c4 f2 ff ff ff e9 c1 fe ff ff e8 5f b2 65 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 cc <1f> 44 00 00 55 ba 10 00 00 00 be c0 0c 00 00 48 89 e5 41 56 41 55
> [ 5093.409505] RSP: 0018:c9563d18 EFLAGS: 00010246
> [ 5093.412056] RAX: RBX: c010a0c0 RCX: 4ee3
> [ 5093.414472] RDX: 4ee2 RSI: ea0001efe188 RDI: c010a0c0
> [ 5093.416349] RBP: c9563e50 R08: R09: 0002
> [ 5093.418044] R10: 0096 R11: 08a4 R12: 88807a0d1280
> [ 5093.424721] R13: c010a110 R14: 88807a0d1300 R15: c9563e70
> [ 5093.427138] FS: 7f018f632740() GS:88807dc0() knlGS:
> [ 5093.430037] CS: 0010 DS: ES: CR0: 80050033
> [ 5093.432279] CR2: 55fbe282b239 CR3: 7922a006 CR4: 00170ef0
> [ 5093.435096] DR0: DR1: DR2:
> [ 5093.436765] DR3: DR6: fffe0ff0 DR7: 0400
> [ 5093.439689] Call Trace:
> [ 5093.440954] ? load_module+0x24b6/0x27d0
> [ 5093.443212] ? __kernel_read+0xd6/0x150
> [ 5093.445140] __do_sys_finit_module+0xd3/0xf0
> [ 5093.446877] __x64_sys_finit_module+0x1a/0x20
> [ 5093.449098] do_syscall_64+0x38/0x50
> [ 5093.450877] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 5093.456153] RIP: 0033:0x7f018f75c43d
> [ 5093.457728] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2b 6a 0c 00 f7 d8 64 89 01 48
> [ 5093.466349] RSP: 002b:7ffd7f080368 EFLAGS: 0246 ORIG_RAX: 0139
> [ 5093.470613] RAX: ffda RBX: 557e5c96f9c0 RCX: 7f018f75c43d
> [ 5093.474747] RDX: RSI: 557e5c964288 RDI: 0003
> [ 5093.478049] RBP: 0004 R08: R09:
> [ 5093.481298] R10: 0003 R11: 0246 R12:
> [ 5093.483725] R13: 557e5c964288 R14: 557e5c96f950 R15: 557e5c9775c0
> [ 5093.485778] Modules linked in: virtio_vdpa(+) vdpa sunrpc kvm_intel kvm irqbypass virtio_blk virtio_rng rng_core [last unloaded: virtio_vdpa]
> [ 5093.488695] ---[ end trace 23712ecebc11f53c ]---
>
> Guest kernel: Linux 5.9-rc7
> gdb: GNU gdb (GDB) Fedora 9.1-6.fc32
> I tried with QEMU 4.2.1 and the latest master branch: same issue.
>
> I did some digging, and skipping the gdb 'add-symbol-file' command in
> symbols.py avoids the issue, but of course I don't have the symbols
> loaded:
>
> diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
> index 1be9763cf8bb..eadfaa4d4907 100644
> --- a/scripts/gdb/linux/symbols.py
> +++ b/scripts/gdb/linux/symbols.py
> @@ -129,7 +129,7 @@ lx-symbols command."""
>                  filename=module_file,
>                  addr=module_addr,
>                  sections=self._section_arguments(module))
> -            gdb.execute(cmdline, to_string=True)
> +            #gdb.execute(cmdline, to_string=True)
>              if module_name not in self.loaded_modules:
>                  self.loaded_modules.append(module_name)
>              else:
>
> I tried several modules and this happens every time after '(gdb) lx-symbols'.
>
> Do you have any hints?

I assume you are debugging a kernel inside QEMU/KVM, right? Does it
work without -enable-kvm?

Debugging guests in KVM mode at least was unstable for a long time. I
avoided setting soft-BPs - which is what the script does for the sake
of tracking module loading -, falling back to hw-BPs, as I had no time
to debug that further. /Maybe/ that's the issue here.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
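As a quick experiment along the lines Jan suggests, software versus hardware breakpoints can be compared by hand before touching symbols.py. A sketch of such a session follows; the port, paths, and exact workflow are illustrative assumptions:

```
# Guest started e.g. with: qemu-system-x86_64 -enable-kvm -s ...
(gdb) file vmlinux
(gdb) target remote :1234
(gdb) lx-symbols
# lx-symbols internally plants a *software* breakpoint to track module
# loading; to cross-check whether soft-BPs are the problem under KVM,
# set a hardware-assisted breakpoint on the same function:
(gdb) hbreak do_init_module
(gdb) continue
```

If the hardware breakpoint fires cleanly on `modprobe` while the stock lx-symbols path produces the invalid-opcode oops at `do_init_module+0x1`, that points at a stale software-breakpoint byte left in guest memory, consistent with Jan's suspicion.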
Re: [virtio-comment] Re: [RFC] ivshmem v2: Shared memory device specification
On 27.07.20 15:56, Michael S. Tsirkin wrote:
> On Mon, Jul 27, 2020 at 03:39:32PM +0200, Jan Kiszka wrote:
>> On 27.07.20 15:20, Michael S. Tsirkin wrote:
>>> On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote:
>>>> Vendor Specific Capability (ID 09h)
>>>>
>>>> This capability must always be present.
>>>>
>>>> | Offset | Register            | Content                                        |
>>>> |---:|:---|:---|
>>>> | 00h | ID                  | 09h                                            |
>>>> | 01h | Next Capability     | Pointer to next capability or 00h              |
>>>> | 02h | Length              | 20h if Base Address is present, 18h otherwise  |
>>>> | 03h | Privileged Control  | Bit 0 (read/write): one-shot interrupt mode    |
>>>> |     |                     | Bits 1-7: Reserved (0 on read, writes ignored) |
>>>> | 04h | State Table Size    | 32-bit size of read-only State Table           |
>>>> | 08h | R/W Section Size    | 64-bit size of common read/write section       |
>>>> | 10h | Output Section Size | 64-bit size of output sections                 |
>>>> | 18h | Base Address        | optional: 64-bit base address of shared memory |
>>>>
>>>> All registers are read-only. Writes are ignored, except to bit 0 of
>>>> the Privileged Control register.
>>>
>>> Is there value in making this follow the virtio vendor-specific
>>> capability format? That will cost several extra bytes - do you
>>> envision having many of these in the config space?
>>
>> Of course, this could be modeled via virtio_pci_cap as well. It would
>> add 12 unused bytes and one type byte. If it helps to make the device
>> look more virtio'ish, fine, but I'm afraid there are more differences
>> at PCI level.
>
> I guess it will be useful if we ever find it handy to make an ivshmem
> device also be a virtio device. Can't say why yet but if we don't care
> it vaguely seems kind of like a good idea. I guess it will also be
> handy if you ever need another vendor specific cap: you already get a
> way to identify it without breaking drivers.

I can look into that. Those 12 wasted bytes are a bit ugly, but so far
we are not short on config space, even in the non-extended range.

More problematic is that the existing specification of virtio_pci_cap
assumes that it describes a structure in a PCI resource, rather than
being that data itself, or even a register (Privileged Control). Would
it be possible to split the types into two ranges, one for the existing
structure, one for others - like ivshmem - that will only share the
cfg_type field?

>> I do not see a use case for having multiple of those caps above per
>> device. If someone comes around with a valid use case for having
>> multiple, non-contiguous shared memory regions for one device, we
>> would need to add registers for them. But that would also only work
>> for non-BAR regions due to limited BARs.
>
> OK, I guess this answers the below too.
>
>>> Also, do we want to define an extended capability format in case this
>>> is a pci extended capability?
>>
>> What would be the practical benefit? Do you see PCIe caps that could
>> become useful in virtual setups?
>
> So if we ever have a huge number of these caps, PCIe allows many more
> caps.
>
>> We don't do that for regular virtio devices either, do we?
>
> We don't, there's a small number of these so we don't run out of
> config space.

Right. But then it would not be a problem to add PCIe (right before
adding it becomes impossible) and push new caps into the extended
space. And all that without breaking existing drivers. It's just a cap,
and the spec so far does not state that there must be no other cap,
neither in current virtio nor for this ivshmem device.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
Re: [virtio-comment] Re: [RFC] ivshmem v2: Shared memory device specification
On 27.07.20 15:20, Michael S. Tsirkin wrote:
> On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote:
>> Vendor Specific Capability (ID 09h)
>>
>> This capability must always be present.
>>
>> | Offset | Register            | Content                                        |
>> |---:|:---|:---|
>> | 00h | ID                  | 09h                                            |
>> | 01h | Next Capability     | Pointer to next capability or 00h              |
>> | 02h | Length              | 20h if Base Address is present, 18h otherwise  |
>> | 03h | Privileged Control  | Bit 0 (read/write): one-shot interrupt mode    |
>> |     |                     | Bits 1-7: Reserved (0 on read, writes ignored) |
>> | 04h | State Table Size    | 32-bit size of read-only State Table           |
>> | 08h | R/W Section Size    | 64-bit size of common read/write section       |
>> | 10h | Output Section Size | 64-bit size of output sections                 |
>> | 18h | Base Address        | optional: 64-bit base address of shared memory |
>>
>> All registers are read-only. Writes are ignored, except to bit 0 of
>> the Privileged Control register.
>
> Is there value in making this follow the virtio vendor-specific
> capability format? That will cost several extra bytes - do you envision
> having many of these in the config space?

Of course, this could be modeled via virtio_pci_cap as well. It would
add 12 unused bytes and one type byte. If it helps to make the device
look more virtio'ish, fine, but I'm afraid there are more differences
at PCI level.

I do not see a use case for having multiple of those caps above per
device. If someone comes around with a valid use case for having
multiple, non-contiguous shared memory regions for one device, we would
need to add registers for them. But that would also only work for
non-BAR regions due to limited BARs.

> Also, do we want to define an extended capability format in case this
> is a pci extended capability?

What would be the practical benefit? Do you see PCIe caps that could
become useful in virtual setups? We don't do that for regular virtio
devices either, do we?

Thanks,
Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
Re: [PATCH] KVM: fix CPU reset wrt HF2_GIF_MASK
On 23.07.20 16:27, Vitaly Kuznetsov wrote:

HF2_GIF_MASK is set in env->hflags2 unconditionally on CPU reset (see x86_cpu_reset()) but when calling KVM_SET_NESTED_STATE, KVM_STATE_NESTED_GIF_SET is only valid for nSVM as e.g. nVMX code looks like:

    if (kvm_state->hdr.vmx.vmxon_pa == -1ull) {
        if (kvm_state->flags & ~KVM_STATE_NESTED_EVMCS)
            return -EINVAL;
    }

Also, when adjusting the environment after KVM_GET_NESTED_STATE we need not reset HF2_GIF_MASK on VMX as e.g. x86_cpu_pending_interrupt() expects it to be set. Alternatively, we could've made env->hflags2 SVM-only.

Reported-by: Jan Kiszka
Fixes: b16c0e20c742 ("KVM: add support for AMD nested live migration")
Signed-off-by: Vitaly Kuznetsov
---
 target/i386/kvm.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 2b6b7443d273..3c4647fbfbf6 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -3883,7 +3883,9 @@ static int kvm_put_nested_state(X86CPU *cpu)
     } else {
         env->nested_state->flags &= ~KVM_STATE_NESTED_GUEST_MODE;
     }
-    if (env->hflags2 & HF2_GIF_MASK) {
+
+    /* Don't set KVM_STATE_NESTED_GIF_SET on VMX as it is illegal */
+    if (cpu_has_svm(env) && (env->hflags2 & HF2_GIF_MASK)) {
         env->nested_state->flags |= KVM_STATE_NESTED_GIF_SET;
     } else {
         env->nested_state->flags &= ~KVM_STATE_NESTED_GIF_SET;
@@ -3925,10 +3927,14 @@ static int kvm_get_nested_state(X86CPU *cpu)
     } else {
         env->hflags &= ~HF_GUEST_MASK;
     }
-    if (env->nested_state->flags & KVM_STATE_NESTED_GIF_SET) {
-        env->hflags2 |= HF2_GIF_MASK;
-    } else {
-        env->hflags2 &= ~HF2_GIF_MASK;
+
+    /* Keep HF2_GIF_MASK set on !SVM as x86_cpu_pending_interrupt() needs it */
+    if (cpu_has_svm(env)) {
+        if (env->nested_state->flags & KVM_STATE_NESTED_GIF_SET) {
+            env->hflags2 |= HF2_GIF_MASK;
+        } else {
+            env->hflags2 &= ~HF2_GIF_MASK;
+        }
     }
 
     return ret;

Tested-by: Jan Kiszka

Thanks!
Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: 5.1.0-rc1 regression: reset fails with kvm and -cpu host
On 23.07.20 14:52, Dr. David Alan Gilbert wrote: * Vitaly Kuznetsov (vkuzn...@redhat.com) wrote: Philippe Mathieu-Daudé writes: +Vitaly On 7/23/20 10:40 AM, Dr. David Alan Gilbert wrote: * Eduardo Habkost (ehabk...@redhat.com) wrote: On Wed, Jul 22, 2020 at 04:47:32PM -0400, Eduardo Habkost wrote: On Wed, Jul 22, 2020 at 08:05:01PM +0200, Jan Kiszka wrote: On 22.07.20 19:35, Eduardo Habkost wrote: Hi Jan, What was the last version where it worked for you? Does using "-cpu host,-vmx" help? Yeah, -vmx does indeed help. I didn't have the time to bisect yet. Just check my reflog, picked eb6490f544, and that works. Thanks! I could reproduce it locally[1], I will bisect it. The good news is that "-cpu host,+vmx" still works, on commit eb6490f544. [1] Linux 5.6.19-300.fc32.x86_64, Intel Core i7-8665U CPU. Bisected to: commit b16c0e20c74218f2d69710cedad11da7dd4d2190 Author: Paolo Bonzini Date: Wed May 20 10:49:22 2020 -0400 KVM: add support for AMD nested live migration Support for nested guest live migration is part of Linux 5.8, add the corresponding code to QEMU. The migration format consists of a few flags, is an opaque 4k blob. The blob is in VMCB format (the control area represents the L1 VMCB control fields, the save area represents the pre-vmentry state; KVM does not use the host save area since the AMD manual allows that) but QEMU does not really care about that. However, the flags need to be copied to hflags/hflags2 and back. In addition, support for retrieving and setting the AMD nested virtualization states allows the L1 guest to be reset while running a nested guest, but a small bug in CPU reset needs to be fixed for that to work. Signed-off-by: Paolo Bonzini Guesswork led me to try reverting the chunk in kvm_put_nested_state; without it the reset seems to work; I can't explain that code though. 
(sorry, missed the beginning of the discussion)

So one does:

(qemu) system_reset

on Intel with '-cpu host' and the result is:

(qemu) KVM: entry failed, hardware error 0x8021

Interesting; I hadn't seen that error - I just see a hard hung guest rather than a reset one.

I've seen it once or twice, maybe that was also with a more complex command line. The point is that an invalid state is very likely loaded on reset. Not all invalid states cause KVM to complain, though. Some just lock up the guest.

Jan

(i7-8650U laptop 5.7.9 fedora 32)

Dave

If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors.

EAX=0064 EBX=91df5efe ECX= EDX=03f8 ESI= EDI=91ee32c0 EBP=90643260 ESP=00013c68 EIP=906428e6 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES = 9300 CS =f000 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= IDT=
CR0=6010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER=
Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

I can take a look (if no one beats me to it).

-- Vitaly

-- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
-- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [virtio-comment] [RFC] ivshmem v2: Shared memory device specification
On 23.07.20 08:54, Stefan Hajnoczi wrote: On Fri, Jul 17, 2020 at 06:15:58PM +0200, Jan Kiszka wrote: On 15.07.20 15:27, Stefan Hajnoczi wrote: On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote: Thanks for the responses. It would be great to update the spec with these clarifications. If BAR 2 is not present, the shared memory region is not relocatable by the user. In that case, the hypervisor has to implement the Base Address register in the vendor-specific capability. What does relocatable mean in this context? That the guest can decide (via BAR) where the resource should show up in the physical guest address space. We do not want to support this in setups like for static partitioning hypervisors, and then we use that side-channel read-only configuration. I see. I'm not sure what is vendor-specific about non-relocatable shared memory. I guess it could be added to the spec too? That "vendor-specific" comes from the PCI spec which - to my understanding - provides us no other means to introduce registers to the config space that are outside of the PCI spec. I could introduce a name for the ivshmem vendor cap and use that name here - would that be better? In any case, since "relocatable" hasn't been fully defined, I suggest making the statement more general: If BAR 2 is not present the hypervisor has to implement the Base Address Register in the vendor-specific capability. This can be used for vendor-specific shared memory functionality. Will integrate this. Thanks, Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: 5.1.0-rc1 regression: reset fails with kvm and -cpu host
On 22.07.20 19:35, Eduardo Habkost wrote: Hi Jan, What was the last version where it worked for you? Does using "-cpu host,-vmx" help? Yeah, -vmx does indeed help. I didn't have the time to bisect yet. Just check my reflog, picked eb6490f544, and that works. HTH, Jan On Wed, Jul 22, 2020 at 11:15:43AM +0200, Jan Kiszka wrote: Hi all, this locks up the guest: - qemu-system-x86_64 -enable-kvm -cpu host - trigger hard reset Host kernel: 5.7.7. Host CPU: i7-8850H Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
5.1.0-rc1 regression: reset fails with kvm and -cpu host
Hi all,

this locks up the guest:

- qemu-system-x86_64 -enable-kvm -cpu host
- trigger hard reset

Host kernel: 5.7.7. Host CPU: i7-8850H

Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: Inter-VM device emulation (call on Mon 20th July 2020)
On 21.07.20 12:49, Alex Bennée wrote: Stefan Hajnoczi writes:

Thank you everyone who joined! I didn't take notes but two things stood out:

1. The ivshmem v2 and virtio-vhost-user use cases are quite different, so combining them does not seem realistic. ivshmem v2 needs to be as simple for the hypervisor to implement as possible, even if this involves some sacrifices (e.g. not transparent to the Driver VM that is accessing the device, performance). virtio-vhost-user is more aimed at general-purpose device emulation, although support for arbitrary devices (e.g. PCI) would be important to serve all use cases.

I believe my phone gave up on the last few minutes of the call so I'll just say we are interested in being able to implement arbitrary devices in the inter-VM silos. Devices we are looking at:

- virtio-audio, virtio-video: performance-sensitive devices which provide a HAL abstraction to a common software core.
- virtio-rpmb: a secure device where the backend may need to reside in a secure virtualised world.
- virtio-scmi: a more complex device which allows the guest to make power and clock demands from the firmware.

Needless to say this starts to become complex with multiple moving parts. The flexibility of vhost-user seems to match up quite well with wanting to have a reasonably portable backend that just needs to be fed signals and a memory mapping. However we don't want daemons to automatically have a full view of the whole of the guest's system memory.

2. Alexander Graf's idea for a new Linux driver that provides an enforcing software IOMMU. This would be a character device driver that is mmapped by the device emulation process (either vhost-user-style on the host or another VMM for inter-VM device emulation). The Driver VMM can program mappings into the device and the page tables in the device emulation process will be updated.
This way the Driver VMM can share specific regions of guest RAM with the device emulation process and revoke those mappings later.

I'm wondering if there is enough plumbing on the guest side so a guest can use the virtio-iommu to mark out exactly which bits of memory the virtual device can have access to? At a minimum the virtqueues need to be accessible, and for larger transfers maybe a bounce buffer. However, for speed you want as wide a mapping as possible but no more. It would be nice, for example, if a block device could load data directly into the guest's block cache (zero-copy) but without getting a view of the kernel's internal data structures.

Welcome to a classic optimization triangle:

- speed -> direct mappings
- security -> restricted mapping
- simplicity -> static mapping

Pick two, you can't have them all. Well, you could try a little bit more of one, at the price of losing on another. But that's it. We chose the last two, ending up with probably the simplest but not fastest solution for type-1 hypervisors like Jailhouse. Specifically for non-Linux use cases - legacy RTOSes, often with limited driver stacks - having not only virtio but also even simpler channels over application-defined shared memory layouts is a requirement.

Another thing that came across in the call was quite a lot of assumptions about QEMU and Linux w.r.t. virtio. While our project will likely have Linux as a guest OS, we are looking specifically at enabling virtio for Type-1 hypervisors like Xen and the various safety-certified proprietary ones. It is unlikely that QEMU would be used as the VMM for these deployments. We want to work out what sort of common facilities hypervisors need to support to enable virtio so the daemons can be reusable and maybe set up with a minimal shim for the particular hypervisor in question.

I'm with you regarding stacks that are mappable not only on QEMU/Linux.
And also one that does not let the certification costs sky-rocket because of its mandated implementation complexity.

I'm not sure anymore if there will be only one device model. Maybe we should eventually think about a backend layer that can sit on something like virtio-vhost-user as well as on ivshmem-virtio, allowing the same device backend code to be plumbed into both transports. Why shouldn't the split that already works well under Linux - frontend device drivers vs. virtio transports - work here as well?

Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [PATCH for-5.1] target/arm: Always pass cacheattr in S1_ptw_translate
On 21.07.20 18:35, Richard Henderson wrote:

When we changed the interface of get_phys_addr_lpae to require the cacheattr parameter, this spot was missed. The compiler is unable to detect the use of NULL vs the nonnull attribute here.

Fixes: 7e98e21c098
Reported-by: Jan Kiszka
Signed-off-by: Richard Henderson
---
 target/arm/helper.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index c69a2baf1d..8ef0fb478f 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -10204,21 +10204,11 @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
     int s2prot;
     int ret;
     ARMCacheAttrs cacheattrs = {};
-    ARMCacheAttrs *pcacheattrs = NULL;
-
-    if (env->cp15.hcr_el2 & HCR_PTW) {
-        /*
-         * PTW means we must fault if this S1 walk touches S2 Device
-         * memory; otherwise we don't care about the attributes and can
-         * save the S2 translation the effort of computing them.
-         */
-        pcacheattrs = &cacheattrs;
-    }
 
     ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, ARMMMUIdx_Stage2,
                              false,
                              &s2pa, &txattrs, &s2prot, &s2size, fi,
-                             pcacheattrs);
+                             &cacheattrs);
     if (ret) {
         assert(fi->type != ARMFault_None);
         fi->s2addr = addr;
@@ -10226,8 +10216,11 @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
         fi->s1ptw = true;
         return ~0;
     }
-    if (pcacheattrs && (pcacheattrs->attrs & 0xf0) == 0) {
-        /* Access was to Device memory: generate Permission fault */
+    if ((env->cp15.hcr_el2 & HCR_PTW) && (cacheattrs.attrs & 0xf0) == 0) {
+        /*
+         * PTW set and S1 walk touched S2 Device memory:
+         * generate Permission fault.
+         */
         fi->type = ARMFault_Permission;
         fi->s2addr = addr;
         fi->stage2 = true;

Jup:

Tested-by: Jan Kiszka

Thanks,
Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
aarch64: Crash with qemu master when starting Jailhouse
Hi,

I first saw this a couple of weeks ago and ignored it, but it's still there today with master:

Thread 13 "qemu-system-aar" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f90e2ffd700 (LWP 26883)]
0x560ef0bddda7 in get_phys_addr_lpae (env=, address=address@entry=1095261192, access_type=access_type@entry=MMU_DATA_LOAD, mmu_idx=mmu_idx@entry=ARMMMUIdx_Stage2, s1_is_el0=s1_is_el0@entry=false, phys_ptr=phys_ptr@entry=0x7f90e2ffc200, txattrs=0x7f90e2ffc1ec, prot=0x7f90e2ffc1f0, page_size_ptr=0x7f90e2ffc1f8, fi=0x7f90e2ffc530, cacheattrs=0x0) at /data/qemu/target/arm/helper.c:11106
11106 cacheattrs->attrs = convert_stage2_attrs(env, extract32(attrs, 0, 4));
(gdb) bt
#0  0x560ef0bddda7 in get_phys_addr_lpae (env=, address=address@entry=1095261192, access_type=access_type@entry=MMU_DATA_LOAD, mmu_idx=mmu_idx@entry=ARMMMUIdx_Stage2, s1_is_el0=s1_is_el0@entry=false, phys_ptr=phys_ptr@entry=0x7f90e2ffc200, txattrs=0x7f90e2ffc1ec, prot=0x7f90e2ffc1f0, page_size_ptr=0x7f90e2ffc1f8, fi=0x7f90e2ffc530, cacheattrs=0x0) at /data/qemu/target/arm/helper.c:11106
#1  0x560ef0bde3c6 in S1_ptw_translate (env=env@entry=0x560ef32742b0, mmu_idx=mmu_idx@entry=ARMMMUIdx_Stage1_E1, addr=1095261192, txattrs=..., fi=fi@entry=0x7f90e2ffc530) at /data/qemu/target/arm/helper.c:10218
#2  0x560ef0bdd7f0 in arm_ldq_ptw (fi=0x7f90e2ffc530, mmu_idx=ARMMMUIdx_Stage1_E1, is_secure=false, addr=, cs=0x560ef326ac10) at /data/qemu/target/arm/helper.c:10284
#3  0x560ef0bdd7f0 in get_phys_addr_lpae (env=env@entry=0x560ef32742b0, address=address@entry=18446674270391351284, access_type=access_type@entry=MMU_INST_FETCH, mmu_idx=mmu_idx@entry=ARMMMUIdx_Stage1_E1, s1_is_el0=s1_is_el0@entry=false, phys_ptr=phys_ptr@entry=0x7f90e2ffc490, txattrs=0x7f90e2ffc518, prot=0x7f90e2ffc514, page_size_ptr=0x7f90e2ffc528, fi=0x7f90e2ffc530, cacheattrs=0x7f90e2ffc51c) at /data/qemu/target/arm/helper.c:11014
#4  0x560ef0bdfacb in get_phys_addr (env=env@entry=0x560ef32742b0, address=, address@entry=18446674270391351284, access_type=access_type@entry=MMU_INST_FETCH, mmu_idx=, mmu_idx@entry=ARMMMUIdx_Stage1_E1, phys_ptr=phys_ptr@entry=0x7f90e2ffc490, attrs=attrs@entry=0x7f90e2ffc518, prot=0x7f90e2ffc514, page_size=0x7f90e2ffc528, fi=0x7f90e2ffc530, cacheattrs=0x7f90e2ffc51c) at /data/qemu/target/arm/helper.c:12115
#5  0x560ef0bdf5ca in get_phys_addr (env=env@entry=0x560ef32742b0, address=address@entry=18446674270391351284, access_type=access_type@entry=MMU_INST_FETCH, mmu_idx=, phys_ptr=phys_ptr@entry=0x7f90e2ffc520, attrs=attrs@entry=0x7f90e2ffc518, prot=0x7f90e2ffc514, page_size=0x7f90e2ffc528, fi=0x7f90e2ffc530, cacheattrs=0x7f90e2ffc51c) at /data/qemu/target/arm/helper.c:11950
#6  0x560ef0bef669 in arm_cpu_tlb_fill (cs=0x560ef326ac10, address=18446674270391351284, size=, access_type=MMU_INST_FETCH, mmu_idx=2, probe=, retaddr=0) at /data/qemu/target/arm/tlb_helper.c:177
#7  0x560ef0adbd85 in tlb_fill (cpu=0x560ef326ac10, addr=18446674270391351284, size=0, access_type=MMU_INST_FETCH, mmu_idx=2, retaddr=0) at /data/qemu/accel/tcg/cputlb.c:1032
#8  0x560ef0adf216 in get_page_addr_code_hostp (env=, addr=addr@entry=18446674270391351284, hostp=hostp@entry=0x0) at /data/qemu/accel/tcg/cputlb.c:1211
#9  0x560ef0adf287 in get_page_addr_code (env=, addr=addr@entry=18446674270391351284) at /data/qemu/accel/tcg/cputlb.c:1243
#10 0x560ef0af21c4 in tb_htable_lookup (cpu=cpu@entry=0x560ef326ac10, pc=18446674270391351284, cs_base=, flags=2182107137, cf_mask=4278714368) at /data/qemu/accel/tcg/cpu-exec.c:337
#11 0x560ef0af2fd6 in tb_lookup__cpu_state (cf_mask=, flags=0x7f90e2ffc718, cs_base=0x7f90e2ffc720, pc=0x7f90e2ffc728, cpu=0x0) at /data/qemu/include/exec/tb-lookup.h:43
#12 0x560ef0af2fd6 in tb_find (cf_mask=524288, tb_exit=0, last_tb=0x0, cpu=0x0) at /data/qemu/accel/tcg/cpu-exec.c:404
#13 0x560ef0af2fd6 in cpu_exec (cpu=cpu@entry=0x560ef326ac10) at /data/qemu/accel/tcg/cpu-exec.c:748
#14 0x560ef0bb779f in tcg_cpu_exec (cpu=0x560ef326ac10) at /data/qemu/softmmu/cpus.c:1356
#15 0x560ef0bb980b in qemu_tcg_cpu_thread_fn (arg=arg@entry=0x560ef326ac10) at /data/qemu/softmmu/cpus.c:1664
#16 0x560ef10aaf76 in qemu_thread_start (args=) at /data/qemu/util/qemu-thread-posix.c:521
#17 0x7f919e9434f9 in start_thread () at /lib64/libpthread.so.0
#18 0x7f919e67bf2f in clone () at /lib64/libc.so.6

I've reproduced that with a local Jailhouse installation, but I suspect (do not have the time right now to check) that a vanilla jailhouse-images [1] build for qemu-arm64 will trigger it as well. Once time permits, I could try to generate and share such an image.

qemu 3.1.1.1 of my distro is fine, also f4d8cf148e43.

Any ideas?

Jan

[1] https://github.com/siemens/jailhouse-images

-- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [virtio-comment] [RFC] ivshmem v2: Shared memory device specification
On 15.07.20 15:27, Stefan Hajnoczi wrote: On Mon, May 25, 2020 at 09:58:28AM +0200, Jan Kiszka wrote:

IVSHMEM Device Specification

** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION! **

Hi Jan, Thanks for posting this! I have posted comments where I wasn't sure what the spec meant.

The goal of the Inter-VM Shared Memory (IVSHMEM) device model is to define the minimally needed building blocks a hypervisor has to provide for enabling guest-to-guest communication. The details of communication protocols shall remain opaque to the hypervisor so that guests are free to define them as simple or sophisticated as they need.

For that purpose, the IVSHMEM provides the following features to its users:

- Interconnection between up to 65536 peers
- Multi-purpose shared memory region
  - common read/writable section
  - output sections that are read/writable for one peer and only readable for the others
  - section with peer states
- Event signaling via interrupt to remote sides
- Support for life-cycle management via state value exchange and interrupt notification on changes, backed by a shared memory section
- Free choice of protocol to be used on top
- Protocol type declaration
- Register can be implemented either memory-mapped or via I/O, depending on platform support and lower VM-exit costs
- Unprivileged access to memory-mapped or I/O registers feasible
- Single discovery and configuration via standard PCI, no complexity by additionally defining a platform device model

Hypervisor Model

In order to provide a consistent link between peers, all connected instances of IVSHMEM devices need to be configured, created and run by the hypervisor according to the following requirements:

- The instances of the device shall appear as a PCI device to their users.
- The read/write shared memory section has to be of the same size for all peers. The size can be zero.
- If shared memory output sections are present (non-zero section size), there must be one reserved for each peer with exclusive write access. All output sections must have the same size and must be readable for all peers.
- The State Table must have the same size for all peers, must be large enough to hold the state values of all peers, and must be read-only for the user.

Who/what is the "user"? I guess this simply means that the State Table is read-only and only the hypervisor can update the table entries?

Read-only for the guest, right. I used the term "user" here and elsewhere, switching to "guest" might be better.

- State register changes (explicit writes, peer resets) have to be propagated to the other peers by updating the corresponding State Table entry and issuing an interrupt to all other peers if they enabled reception.
- Interrupt events triggered by a peer have to be delivered to the target peer, provided the receiving side is valid and has enabled the reception.
- All peers must have the same interrupt delivery features available, i.e. MSI-X with the same maximum number of vectors on platforms supporting this mechanism, otherwise INTx with one vector.

Guest-side Programming Model

An IVSHMEM device appears as a PCI device to its users. Unless otherwise noted, it conforms to the PCI Local Bus Specification, Revision 3.0. As such, it is discoverable via the PCI configuration space and provides a number of standard and custom PCI configuration registers.

### Shared Memory Region Layout

The shared memory region is divided into several sections.
+-----------------------------+ -
|                             | :
| Output Section for peer n-1 | : Output Section Size
|     (n = Maximum Peers)     | :
+-----------------------------+ -
:                             :
:                             :
+-----------------------------+ -
|                             | :
| Output Section for peer 1   | : Output Section Size
|                             | :
+-----------------------------+ -
|                             | :
| Output Section for peer 0   | : Output Section Size
|                             | :
+-----------------------------+ -
|                             | :
| Read/Write Section          | : R/W Section Size
|                             | :
+-----------------------------+ -
|                             | :
| State Table                 | : State Table Size
|                             | :
+-----------------------------+ <-- Shared memory base address

The first section consists of the mandatory State Table. Its size is defined by the State Table Size register and cannot be zero. This section is read-only.
Re: Inter-VM device emulation (call on Mon 20th July 2020)
On 15.07.20 13:23, Stefan Hajnoczi wrote: > Hi, > Several projects are underway to create an inter-VM device emulation > interface: > > * ivshmem v2 >https://www.mail-archive.com/qemu-devel@nongnu.org/msg706465.html > >A PCI device that provides shared-memory communication between VMs. >This device already exists but is limited in its current form. The >"v2" project updates IVSHMEM's capabilities and makes it suitable as >a VIRTIO transport. > >Jan Kiszka is working on this and has posted specs for review. > > * virtio-vhost-user >https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg06429.html > >A VIRTIO device that transports the vhost-user protocol. Allows >vhost-user device emulation to be implemented by another VM. > >Nikos Dragazis is working on this with QEMU, DPDK, and VIRTIO patches >posted. > > * VFIO-over-socket > > https://github.com/tmakatos/qemu/blob/master/docs/devel/vfio-over-socket.rst > >Similar to the vhost-user protocol in spirit but for any PCI device. >Uses the Linux VFIO ioctl API as the protocol instead of vhost. > >It doesn't have a virtio-vhost-user equivalent yet, but the same >approach could be applied to VFIO-over-socket too. > >Thanos Makatos and John G. Johnson are working on this. The draft >spec is available. > > Let's have a call to figure out: > > 1. What is unique about these approaches and how do they overlap? > 2. Can we focus development and code review efforts to get something >merged sooner? > > Jan and Nikos: do you have time to join on Monday, 20th of July at 15:00 > UTC? > https://www.timeanddate.com/worldclock/fixedtime.html?iso=20200720T1500 > Not at that slot, but one hour earlier or later would work for me (so far). Jan > Video call URL: https://bluejeans.com/240406010 > > It would be nice if Thanos and/or JJ could join the call too. Others > welcome too (feel free to forward this email)! 
> > Stefan > -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [PATCH v2] apic: Report current_count via 'info lapic'
On 07.02.20 07:43, Jan Kiszka wrote: From: Jan Kiszka

This is helpful when debugging stuck guest timers. As we need apic_get_current_count for that, and it is really not emulation specific, move it to apic_common.c and export it. Fix its style at this chance as well.

Signed-off-by: Jan Kiszka
Reviewed-by: Philippe Mathieu-Daudé
---
Changes in v2:
- fix style of apic_get_current_count

 hw/intc/apic.c                  | 18 ------------------
 hw/intc/apic_common.c           | 19 +++++++++++++++++++
 include/hw/i386/apic_internal.h |  1 +
 target/i386/helper.c            |  5 +++--
 4 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index bd40467965..f2207d0ace 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -615,24 +615,6 @@ int apic_accept_pic_intr(DeviceState *dev)
     return 0;
 }
 
-static uint32_t apic_get_current_count(APICCommonState *s)
-{
-    int64_t d;
-    uint32_t val;
-    d = (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - s->initial_count_load_time) >>
-        s->count_shift;
-    if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_TIMER_PERIODIC) {
-        /* periodic */
-        val = s->initial_count - (d % ((uint64_t)s->initial_count + 1));
-    } else {
-        if (d >= s->initial_count)
-            val = 0;
-        else
-            val = s->initial_count - d;
-    }
-    return val;
-}
-
 static void apic_timer_update(APICCommonState *s, int64_t current_time)
 {
     if (apic_next_timer(s, current_time)) {
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 9ec0f2deb2..fb432e83f2 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -189,6 +189,25 @@ bool apic_next_timer(APICCommonState *s, int64_t current_time)
     return true;
 }
 
+uint32_t apic_get_current_count(APICCommonState *s)
+{
+    int64_t d;
+    uint32_t val;
+    d = (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - s->initial_count_load_time) >>
+        s->count_shift;
+    if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_TIMER_PERIODIC) {
+        /* periodic */
+        val = s->initial_count - (d % ((uint64_t)s->initial_count + 1));
+    } else {
+        if (d >= s->initial_count) {
+            val = 0;
+        } else {
+            val = s->initial_count - d;
+        }
+    }
+    return val;
+}
+
 void apic_init_reset(DeviceState *dev)
 {
     APICCommonState *s;
diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
index b04bdd947f..2597000e03 100644
--- a/include/hw/i386/apic_internal.h
+++ b/include/hw/i386/apic_internal.h
@@ -211,6 +211,7 @@ void vapic_report_tpr_access(DeviceState *dev, CPUState *cpu, target_ulong ip,
                              TPRAccess access);
 
 int apic_get_ppr(APICCommonState *s);
+uint32_t apic_get_current_count(APICCommonState *s);
 
 static inline void apic_set_bit(uint32_t *tab, int index)
 {
diff --git a/target/i386/helper.c b/target/i386/helper.c
index c3a6e4fabe..e3c3726c29 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -370,10 +370,11 @@ void x86_cpu_dump_local_apic_state(CPUState *cs, int flags)
     dump_apic_lvt("LVTTHMR", lvt[APIC_LVT_THERMAL], false);
     dump_apic_lvt("LVTT", lvt[APIC_LVT_TIMER], true);
 
-    qemu_printf("Timer\t DCR=0x%x (divide by %u) initial_count = %u\n",
+    qemu_printf("Timer\t DCR=0x%x (divide by %u) initial_count = %u"
+                " current_count = %u\n",
                 s->divide_conf & APIC_DCR_MASK,
                 divider_conf(s->divide_conf),
-                s->initial_count);
+                s->initial_count, apic_get_current_count(s));
 
     qemu_printf("SPIV\t 0x%08x APIC %s, focus=%s, spurious vec %u\n",
                 s->spurious_vec,

Ping.

Jan
Re: [RFC] ivshmem v2: Shared memory device specification
On 17.06.20 17:32, Alex Bennée wrote: > > Jan Kiszka writes: > >> Hi all, >> >> as requested by Michael, find below the current version of the Inter-VM >> Shared Memory device specification version 2 (as version 1 could be >> considered what is currently in QEMU). >> >> This posting is intended to collect feedback from the virtio community >> before officially proposing it to become part of the spec. As you can >> see, the spec format is not yet integrated with the virtio documents at >> all. >> >> IVSHMEM has value of its own, allowing unprivileged applications to >> establish links to other applications in VMs connected via this >> transport. In addition, and that is where virtio comes into play even >> more, it can be used to build virtio backend-frontend connections >> between two VMs. Prototypes have been developed, see e.g. [1], >> specifying that transport is still a to-do. I will try to reserve a few >> cycle in the upcoming weeks for a first draft. >> >> My current strategy for these two pieces, ivshmem2 and virtio-shmem, is >> propose them both to virtio but allowing virtio-shmem to also be mapped >> on other shared memory channels for virtual machines. >> >> The ivshmem2 device model is fairly stable now, also being in use in >> Jailhouse for quite a while. Still there are some aspects that could be >> worth to discuss in particular: >> >> - Do we also need a platform device model, in addition to PCI? My >>feelings are negative, but there has been at least one request. >> >> - Should we add some feature flags, or is using the PCI revision ID >>sufficient (...to do that later)? Currently, there is no need for >>communicating features this way. >> >> - Should we try to model the doorbell interface more flexibly, in way >>that may allow mapping it on hardware-provided mailboxes (i.e. VM- >>exit free channels)? Unfortunately, I'm not yet aware of any hardware >>that could provide this feature and, thus, could act as a use case to >>design against. 
>> >> Thanks, >> Jan >> >> [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg668749.html >> >> --- >> >> IVSHMEM Device Specification >> >> >> ** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION! >> ** >> >> The goal of the Inter-VM Shared Memory (IVSHMEM) device model is to >> define the minimally needed building blocks a hypervisor has to >> provide for enabling guest-to-guest communication. The details of >> communication protocols shall remain opaque to the hypervisor so that >> guests are free to define them as simple or sophisticated as they >> need. >> >> For that purpose, the IVSHMEM provides the following features to its >> users: >> >> - Interconnection between up to 65536 peers >> >> - Multi-purpose shared memory region >> >> - common read/writable section >> >> - output sections that are read/writable for one peer and only >> readable for the others >> >> - section with peer states >> >> - Event signaling via interrupt to remote sides >> >> - Support for life-cycle management via state value exchange and >> interrupt notification on changes, backed by a shared memory >> section >> >> - Free choice of protocol to be used on top >> >> - Protocol type declaration >> >> - Register can be implemented either memory-mapped or via I/O, >> depending on platform support and lower VM-exit costs >> >> - Unprivileged access to memory-mapped or I/O registers feasible >> >> - Single discovery and configuration via standard PCI, no complexity >> by additionally defining a platform device model >> >> >> Hypervisor Model >> >> >> In order to provide a consistent link between peers, all connected >> instances of IVSHMEM devices need to be configured, created and run >> by the hypervisor according to the following requirements: >> >> - The instances of the device shall appear as a PCI device to their >> users. > > Would there ever be scope for a plain MMIO style device to be reported? 
As pointed out above, there has been a question already targeting an MMIO binding. > While I agree PCI is pretty useful from the point of view of easy > discovery, I note a number of machine types prefer device-tree-reported > devices because they are faster to bring up for cloudy fast-VM purposes
[RFC] ivshmem v2: Shared memory device specification
Hi all, as requested by Michael, find below the current version of the Inter-VM Shared Memory device specification version 2 (as version 1 could be considered what is currently in QEMU). This posting is intended to collect feedback from the virtio community before officially proposing it to become part of the spec. As you can see, the spec format is not yet integrated with the virtio documents at all. IVSHMEM has value of its own, allowing unprivileged applications to establish links to other applications in VMs connected via this transport. In addition, and that is where virtio comes into play even more, it can be used to build virtio backend-frontend connections between two VMs. Prototypes have been developed, see e.g. [1]; specifying that transport is still a to-do. I will try to reserve a few cycles in the upcoming weeks for a first draft. My current strategy for these two pieces, ivshmem2 and virtio-shmem, is to propose them both to virtio while allowing virtio-shmem to also be mapped on other shared memory channels for virtual machines. The ivshmem2 device model is fairly stable now, also being in use in Jailhouse for quite a while. Still, there are some aspects that could be worth discussing in particular: - Do we also need a platform device model, in addition to PCI? My feelings are negative, but there has been at least one request. - Should we add some feature flags, or is using the PCI revision ID sufficient (...to do that later)? Currently, there is no need for communicating features this way. - Should we try to model the doorbell interface more flexibly, in a way that may allow mapping it on hardware-provided mailboxes (i.e. VM-exit free channels)? Unfortunately, I'm not yet aware of any hardware that could provide this feature and, thus, could act as a use case to design against. 
Thanks, Jan [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg668749.html --- IVSHMEM Device Specification ** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION! ** The goal of the Inter-VM Shared Memory (IVSHMEM) device model is to define the minimally needed building blocks a hypervisor has to provide for enabling guest-to-guest communication. The details of communication protocols shall remain opaque to the hypervisor so that guests are free to define them as simple or sophisticated as they need. For that purpose, the IVSHMEM provides the following features to its users: - Interconnection between up to 65536 peers - Multi-purpose shared memory region - common read/writable section - output sections that are read/writable for one peer and only readable for the others - section with peer states - Event signaling via interrupt to remote sides - Support for life-cycle management via state value exchange and interrupt notification on changes, backed by a shared memory section - Free choice of protocol to be used on top - Protocol type declaration - Register can be implemented either memory-mapped or via I/O, depending on platform support and lower VM-exit costs - Unprivileged access to memory-mapped or I/O registers feasible - Single discovery and configuration via standard PCI, no complexity by additionally defining a platform device model Hypervisor Model In order to provide a consistent link between peers, all connected instances of IVSHMEM devices need to be configured, created and run by the hypervisor according to the following requirements: - The instances of the device shall appear as a PCI device to their users. - The read/write shared memory section has to be of the same size for all peers. The size can be zero. - If shared memory output sections are present (non-zero section size), there must be one reserved for each peer with exclusive write access. 
All output sections must have the same size and must be readable for all peers. - The State Table must have the same size for all peers, must be large enough to hold the state values of all peers, and must be read-only for the user. - State register changes (explicit writes, peer resets) have to be propagated to the other peers by updating the corresponding State Table entry and issuing an interrupt to all other peers if they enabled reception. - Interrupt events triggered by a peer have to be delivered to the target peer, provided the receiving side is valid and has enabled reception. - All peers must have the same interrupt delivery features available, i.e. MSI-X with the same maximum number of vectors on platforms supporting this mechanism, otherwise INTx with one vector. Guest-side Programming Model An IVSHMEM device appears as a PCI device to its users. Unless otherwise noted, it conforms to the PCI Local Bus Specification, Revision 3.0. As such, it is discoverable via the PCI configuration space and provides a number of
Re: [RFC][PATCH v2 2/3] docs/specs: Add specification of ivshmem device revision 2
On 21.05.20 20:18, Michael S. Tsirkin wrote: > On Thu, May 21, 2020 at 05:53:31PM +0100, Alex Bennée wrote: >> >> Jan Kiszka writes: >> >>> From: Jan Kiszka >>> >>> This imports the ivshmem v2 specification draft from Jailhouse where the >>> implementation is about to be merged now. The final home of the spec is >>> to be decided, this shall just simplify the review process at this >>> stage. >>> >>> Signed-off-by: Jan Kiszka >>> --- >>> docs/specs/ivshmem-2-device-spec.md | 376 >>> >>> 1 file changed, 376 insertions(+) >>> create mode 100644 docs/specs/ivshmem-2-device-spec.md >>> >>> diff --git a/docs/specs/ivshmem-2-device-spec.md >>> b/docs/specs/ivshmem-2-device-spec.md >>> new file mode 100644 >>> index 00..d93cb22b04 >>> --- /dev/null >>> +++ b/docs/specs/ivshmem-2-device-spec.md >>> @@ -0,0 +1,376 @@ >>> +IVSHMEM Device Specification >>> + >>> + >>> +** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE >>> SPECIFICATION! ** >> >> Has there been any proposal to the OASIS virtio spec to use this as a >> transport for VirtIO? >> Not yet, primarily because the second half - virtio over generic shared memory or even ivshmem2 in particular - has not been written yet. It only exists as prototypes. Lacking time. >> -- >> Alex Bennée > > > I think integrating this in the virtio spec need not gate the > implementation, but I'd like to see the current spec copied to the > virtio-comments list to possibly get comments from that community. > > Will do. Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [RFC PATCH] hw/arm/musicpal: Map the UART devices unconditionally
On 05.05.20 11:59, Philippe Mathieu-Daudé wrote: I can't find proper documentation or a datasheet, but it is likely that an MMIO-mapped serial device in the 0x8000..0x8000 range belongs to the SoC address space and thus is always mapped on the memory bus. Map the devices on the bus regardless of whether a chardev is attached. Signed-off-by: Philippe Mathieu-Daudé --- from 2019... found while doing housekeeping --- hw/arm/musicpal.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c index b2d0cfdac8..92f33ed87e 100644 --- a/hw/arm/musicpal.c +++ b/hw/arm/musicpal.c @@ -1619,14 +1619,10 @@ static void musicpal_init(MachineState *machine) pic[MP_TIMER2_IRQ], pic[MP_TIMER3_IRQ], pic[MP_TIMER4_IRQ], NULL); -if (serial_hd(0)) { -serial_mm_init(address_space_mem, MP_UART1_BASE, 2, pic[MP_UART1_IRQ], - 1825000, serial_hd(0), DEVICE_NATIVE_ENDIAN); -} -if (serial_hd(1)) { -serial_mm_init(address_space_mem, MP_UART2_BASE, 2, pic[MP_UART2_IRQ], - 1825000, serial_hd(1), DEVICE_NATIVE_ENDIAN); -} +serial_mm_init(address_space_mem, MP_UART1_BASE, 2, pic[MP_UART1_IRQ], + 1825000, serial_hd(0), DEVICE_NATIVE_ENDIAN); +serial_mm_init(address_space_mem, MP_UART2_BASE, 2, pic[MP_UART2_IRQ], + 1825000, serial_hd(1), DEVICE_NATIVE_ENDIAN); /* Register flash */ dinfo = drive_get(IF_PFLASH, 0, 0); I don't recall the details anymore either (more than 10 years ago now...), but this looks reasonable. Reviewed-by: Jan Kiszka Jan
Re: [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU
On 29.04.20 06:15, Liang Yan wrote: Hi, All, Did a test for these patches, all looked fine. Test environment: Host: opensuse tumbleweed + latest upstream qemu + these three patches Guest: opensuse tumbleweed root fs + custom kernel(5.5) + related uio-ivshmem driver + ivshmem-console/ivshmem-block tools 1. lspci show 00:04.0 Unassigned class [ff80]: Siemens AG Device 4106 (prog-if 02) Subsystem: Red Hat, Inc. Device 1100 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- Capabilities: [40] MSI-X: Enable+ Count=2 Masked- Vector table: BAR=1 offset= PBA: BAR=1 offset=0800 Kernel driver in use: virtio-ivshmem 2. virtio-ivshmem-console test 2.1 ivshmem2-server(host) airey:~/ivshmem/qemu/:[0]# ./ivshmem2-server -F -l 64K -n 2 -V 3 -P 0x8003 *** Example code, do not use in production *** 2.2 guest vm backend(test-01) localhost:~ # echo "110a 4106 1af4 1100 ffc003 ff" > /sys/bus/pci/drivers/uio_ivshmem/new_id [ 185.831277] uio_ivshmem :00:04.0: state_table at 0xfd80, size 0x1000 [ 185.835129] uio_ivshmem :00:04.0: rw_section at 0xfd801000, size 0x7000 localhost:~ # virtio/virtio-ivshmem-console /dev/uio0 Waiting for peer to be ready... 2.3 guest vm frontend(test-02) need to boot or reboot after backend is done 2.4 backend will serial output of frontend localhost:~ # virtio/virtio-ivshmem-console /dev/uio0 Waiting for peer to be ready... localhost:~/virtio # ./virtio-ivshmem-console /dev/uio0 Waiting for peer to be ready... 
Starting virtio device device_status: 0x0 device_status: 0x1 device_status: 0x3 device_features_sel: 1 device_features_sel: 0 driver_features_sel: 1 driver_features[1]: 0x13 driver_features_sel: 0 driver_features[0]: 0x1 device_status: 0xb queue_sel: 0 queue size: 8 queue driver vector: 1 queue desc: 0x200 queue driver: 0x280 queue device: 0x2c0 queue enable: 1 queue_sel: 1 queue size: 8 queue driver vector: 2 queue desc: 0x400 queue driver: 0x480 queue device: 0x4c0 queue enable: 1 device_status: 0xf Welcome to openSUSE Tumbleweed 20200326 - Kernel 5.5.0-rc5-1-default+ (hvc0). enp0s3: localhost login: 2.5 close backend and frontend will show localhost:~ # [ 185.685041] virtio-ivshmem :00:04.0: backend failed! 3. virtio-ivshmem-block test 3.1 ivshmem2-server(host) airey:~/ivshmem/qemu/:[0]# ./ivshmem2-server -F -l 1M -n 2 -V 2 -P 0x8002 *** Example code, do not use in production *** 3.2 guest vm backend(test-01) localhost:~ # echo "110a 4106 1af4 1100 ffc002 ff" > /sys/bus/pci/drivers/uio_ivshmem/new_id [ 77.701462] uio_ivshmem :00:04.0: state_table at 0xfd80, size 0x1000 [ 77.705231] uio_ivshmem :00:04.0: rw_section at 0xfd801000, size 0x000ff000 localhost:~ # virtio/virtio-ivshmem-block /dev/uio0 /root/disk.img Waiting for peer to be ready... 3.3 guest vm frontend(test-02) need to boot or reboot after backend is done 3.4 guest vm backend(test-01) localhost:~ # virtio/virtio-ivshmem-block /dev/uio0 /root/disk.img Waiting for peer to be ready... 
Starting virtio device device_status: 0x0 device_status: 0x1 device_status: 0x3 device_features_sel: 1 device_features_sel: 0 driver_features_sel: 1 driver_features[1]: 0x13 driver_features_sel: 0 driver_features[0]: 0x206 device_status: 0xb queue_sel: 0 queue size: 8 queue driver vector: 1 queue desc: 0x200 queue driver: 0x280 queue device: 0x2c0 queue enable: 1 device_status: 0xf 3.5 guest vm frontend(test-02), a new disk is attached: fdisk /dev/vdb Disk /dev/vdb: 192 KiB, 196608 bytes, 384 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes 3.6 close backend and frontend will show localhost:~ # [ 1312.284301] virtio-ivshmem :00:04.0: backend failed! Tested-by: Liang Yan Thanks for testing this! I'll look into your patch findings. Jan On 1/7/20 9:36 AM, Jan Kiszka wrote: Overdue update of the ivshmem 2.0 device model as presented at [1]. Changes in v2: - changed PCI device ID to Siemens-granted one, adjusted PCI device revision to 0 - removed unused feature register from device - addressed feedback on specification document - rebased over master This version is now fully in sync with the implementation for Jailhouse that is currently under review [2][3], UIO and virtio-ivshmem drivers are shared. Jailhouse will very likely pick up this revision of the device in order to move forward with stressing it. More details on the usage with QEMU were in the original cover letter (with adjustements to the new device ID): If you want to play with this, the basic setup of the shared memory device is described in patch 1 and 3. UIO driver and also the virtio-ivshmem prototype can be found at http://git.kiszka.org/?p=linu
Re: [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU
On 09.04.20 15:52, Liang Yan wrote: On 1/7/20 9:36 AM, Jan Kiszka wrote: Overdue update of the ivshmem 2.0 device model as presented at [1]. Changes in v2: - changed PCI device ID to Siemens-granted one, adjusted PCI device revision to 0 - removed unused feature register from device - addressed feedback on specification document - rebased over master This version is now fully in sync with the implementation for Jailhouse that is currently under review [2][3], UIO and virtio-ivshmem drivers are shared. Jailhouse will very likely pick up this revision of the device in order to move forward with stressing it. More details on the usage with QEMU were in the original cover letter (with adjustments to the new device ID): If you want to play with this, the basic setup of the shared memory device is described in patch 1 and 3. UIO driver and also the virtio-ivshmem prototype can be found at http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2 Accessing the device via UIO is trivial enough. If you want to use it for virtio, this is needed on the virtio console backend side in addition to the description in patch 3: modprobe uio_ivshmem echo "110a 4106 1af4 1100 ffc003 ff" > /sys/bus/pci/drivers/uio_ivshmem/new_id linux/tools/virtio/virtio-ivshmem-console /dev/uio0 And for virtio block: echo "110a 4106 1af4 1100 ffc002 ff" > /sys/bus/pci/drivers/uio_ivshmem/new_id linux/tools/virtio/virtio-ivshmem-console /dev/uio0 /path/to/disk.img After that, you can start the QEMU frontend instance with the virtio-ivshmem driver installed, which can use the new /dev/hvc* or /dev/vda* as usual. Hi, Jan, Nice work. I did a full test of this new version. The QEMU device part looks good, virtio console worked as expected. Just had some issues with the virtio-ivshmem-block tests here. I suppose you mean "linux/tools/virtio/virtio-ivshmem-block"? 
Yes, copy mistake, had the same issue over in https://github.com/siemens/jailhouse/blob/master/Documentation/inter-cell-communication.md Noticed "ffc002" is the main difference; however, I saw no response when running the echo command here. Is there anything I need to prepare? I built the driver into the guest kernel already. Do I need a new protocol or anything for the below command line? ivshmem2-server -F -l 64K -n 2 -V 3 -P 0x8003 Yes, you need to adjust that command line - didn't I document that somewhere? Looks like I didn't: ivshmem2-server -F -l 1M -n 2 -V 2 -P 0x8002 i.e. a bit more memory is good (but this isn't speed-optimized anyway), you only need 2 vectors here (but more do no harm), and the protocol indeed needs adjustment (that is the key). Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
[PATCH] hw/i386/intel_iommu: Fix out-of-bounds access on guest IRT
From: Jan Kiszka vtd_irte_get failed to check the index against the configured table size, causing an out-of-bounds access on guest memory and potentially misinterpreting the result. Signed-off-by: Jan Kiszka --- BTW, we still miss error reporting emulation, right? Therefore, I added that simple error_report_once thing, like the other paths do. hw/i386/intel_iommu.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 204b6841ec..df7ad254ac 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -3094,6 +3094,12 @@ static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index, uint16_t mask, source_id; uint8_t bus, bus_max, bus_min; +if (index >= iommu->intr_size) { +error_report_once("%s: index too large: ind=0x%x", + __func__, index); +return -VTD_FR_IR_INDEX_OVER; +} + addr = iommu->intr_root + index * sizeof(*entry); if (dma_memory_read(&address_space_memory, addr, entry, sizeof(*entry))) { -- 2.16.4 -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [PATCH v2 0/2] ui/gtk: Fix gd_refresh_rate_millihz() when widget window is not realized
On 08.02.20 17:10, Philippe Mathieu-Daudé wrote: Fix bug report from Jan Kiszka: https://www.mail-archive.com/qemu-devel@nongnu.org/msg678130.html https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02260.html Supersedes: <20200208143008.6157-1-f4...@amsat.org> Philippe Mathieu-Daudé (2): ui/gtk: Update gd_refresh_rate_millihz() to handle VirtualConsole ui/gtk: Fix gd_refresh_rate_millihz() when widget window is not realized ui/gtk.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) Yep. Tested-by: Jan Kiszka
Re: [PATCH] ui/gtk: Fix gd_refresh_rate_millihz() when using VirtualConsole
On 08.02.20 16:20, Jan Kiszka wrote: > On 08.02.20 15:30, Philippe Mathieu-Daudé wrote: >> Fix using virtual console under gtk 3.22.30 (mate 1.20.1): >> >> qemu-system-x86_64: Gdk: gdk_window_get_origin: assertion >> 'GDK_IS_WINDOW (window)' failed >> >> Fixes: c4c00922cc and 28b58f19d2 (display/gtk: get proper refreshrate) >> Reported-by: Jan Kiszka >> Signed-off-by: Philippe Mathieu-Daudé >> --- >> Cc: Nikola Pavlica >> Report: >> https://www.mail-archive.com/qemu-devel@nongnu.org/msg678130.html >> --- >> ui/gtk.c | 9 + >> 1 file changed, 5 insertions(+), 4 deletions(-) >> >> diff --git a/ui/gtk.c b/ui/gtk.c >> index d18892d1de..c59297ff4d 100644 >> --- a/ui/gtk.c >> +++ b/ui/gtk.c >> @@ -1965,11 +1965,11 @@ static GtkWidget >> *gd_create_menu_machine(GtkDisplayState *s) >> * If available, return the refresh rate of the display in milli-Hertz, >> * else return 0. >> */ >> -static int gd_refresh_rate_millihz(GtkDisplayState *s) >> +static int gd_refresh_rate_millihz(GtkWidget *window) >> { >> #ifdef GDK_VERSION_3_22 >> - GdkDisplay *dpy = gtk_widget_get_display(s->window); >> - GdkWindow *win = gtk_widget_get_window(s->window); >> + GdkDisplay *dpy = gtk_widget_get_display(window); >> + GdkWindow *win = gtk_widget_get_window(window); >> GdkMonitor *monitor = gdk_display_get_monitor_at_window(dpy, win); > > Nope, no change. It's triggered right from ui/gtk.c:1973, the line above. > >> return gdk_monitor_get_refresh_rate(monitor); >> @@ -2045,7 +2045,8 @@ static GSList *gd_vc_gfx_init(GtkDisplayState >> *s, VirtualConsole *vc, >> vc->gfx.kbd = qkbd_state_init(con); >> vc->gfx.dcl.con = con; >> - refresh_rate_millihz = gd_refresh_rate_millihz(s); >> + refresh_rate_millihz = gd_refresh_rate_millihz(vc->window ? 
>> + vc->window : >> s->window); >> if (refresh_rate_millihz) { >> vc->gfx.dcl.update_interval = MILLISEC_PER_SEC / >> refresh_rate_millihz; >> } >> > > Jan Here is the full backtrace, just in case: #0 0x7496cf70 in gdk_window_get_origin () from /usr/lib64/libgdk-3.so.0 #1 0x749582a0 in gdk_display_get_monitor_at_window () from /usr/lib64/libgdk-3.so.0 #2 0x55bb73e2 in gd_refresh_rate_millihz (window=0x579d6280) at /data/qemu/ui/gtk.c:1973 #3 gd_vc_gfx_init (view_menu=0x579f0590, group=0x0, idx=0, con=, vc=0x579d4a90, s=0x579d49f0) at /data/qemu/ui/gtk.c:2048 #4 gd_create_menu_view (s=0x579d49f0) at /data/qemu/ui/gtk.c:2149 #5 gd_create_menus (s=0x579d49f0) at /data/qemu/ui/gtk.c:2188 #6 gtk_display_init (ds=, opts=0x5661ed80 ) at /data/qemu/ui/gtk.c:2256 #7 0x5583d5a0 in main (argc=, argv=, envp=) at /data/qemu/vl.c:4358 Jan
Re: [PATCH] ui/gtk: Fix gd_refresh_rate_millihz() when using VirtualConsole
On 08.02.20 15:30, Philippe Mathieu-Daudé wrote: Fix using virtual console under gtk 3.22.30 (mate 1.20.1): qemu-system-x86_64: Gdk: gdk_window_get_origin: assertion 'GDK_IS_WINDOW (window)' failed Fixes: c4c00922cc and 28b58f19d2 (display/gtk: get proper refreshrate) Reported-by: Jan Kiszka Signed-off-by: Philippe Mathieu-Daudé --- Cc: Nikola Pavlica Report: https://www.mail-archive.com/qemu-devel@nongnu.org/msg678130.html --- ui/gtk.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/ui/gtk.c b/ui/gtk.c index d18892d1de..c59297ff4d 100644 --- a/ui/gtk.c +++ b/ui/gtk.c @@ -1965,11 +1965,11 @@ static GtkWidget *gd_create_menu_machine(GtkDisplayState *s) * If available, return the refresh rate of the display in milli-Hertz, * else return 0. */ -static int gd_refresh_rate_millihz(GtkDisplayState *s) +static int gd_refresh_rate_millihz(GtkWidget *window) { #ifdef GDK_VERSION_3_22 -GdkDisplay *dpy = gtk_widget_get_display(s->window); -GdkWindow *win = gtk_widget_get_window(s->window); +GdkDisplay *dpy = gtk_widget_get_display(window); +GdkWindow *win = gtk_widget_get_window(window); GdkMonitor *monitor = gdk_display_get_monitor_at_window(dpy, win); Nope, no change. It's triggered right from ui/gtk.c:1973, the line above. return gdk_monitor_get_refresh_rate(monitor); @@ -2045,7 +2045,8 @@ static GSList *gd_vc_gfx_init(GtkDisplayState *s, VirtualConsole *vc, vc->gfx.kbd = qkbd_state_init(con); vc->gfx.dcl.con = con; -refresh_rate_millihz = gd_refresh_rate_millihz(s); +refresh_rate_millihz = gd_refresh_rate_millihz(vc->window ? + vc->window : s->window); if (refresh_rate_millihz) { vc->gfx.dcl.update_interval = MILLISEC_PER_SEC / refresh_rate_millihz; } Jan
Re: [PATCH] ui/gtk: Get display refresh rate with GDK version 3.22 or later
On 16.01.20 02:12, Philippe Mathieu-Daudé wrote: > The GdkMonitor was introduced in GTK+ 3.22: > https://developer.gnome.org/gdk3/stable/api-index-3-22.html#api-index-3.22 > > If we build with older GTK+, the build fails: > > CC ui/gtk.o >qemu/ui/gtk.c: In function ‘gd_vc_gfx_init’: >qemu/ui/gtk.c:1973:5: error: unknown type name ‘GdkMonitor’ > GdkMonitor *monitor = gdk_display_get_monitor_at_window(dpy, win); > ^ >qemu/ui/gtk.c:1973:27: error: implicit declaration of function > ‘gdk_display_get_monitor_at_window’ [-Werror=implicit-function-declaration] > GdkMonitor *monitor = gdk_display_get_monitor_at_window(dpy, win); > ^ >qemu/ui/gtk.c:1973:5: error: nested extern declaration of > ‘gdk_display_get_monitor_at_window’ [-Werror=nested-externs] > GdkMonitor *monitor = gdk_display_get_monitor_at_window(dpy, win); > ^ >qemu/ui/gtk.c:1973:27: error: initialization makes pointer from integer > without a cast [-Werror=int-conversion] > GdkMonitor *monitor = gdk_display_get_monitor_at_window(dpy, win); > ^ >qemu/ui/gtk.c:2035:28: error: implicit declaration of function > ‘gdk_monitor_get_refresh_rate’ [-Werror=implicit-function-declaration] > refresh_rate_millihz = gdk_monitor_get_refresh_rate(monitor); >^ >qemu/ui/gtk.c:2035:5: error: nested extern declaration of > ‘gdk_monitor_get_refresh_rate’ [-Werror=nested-externs] > refresh_rate_millihz = gdk_monitor_get_refresh_rate(monitor); > ^ >cc1: all warnings being treated as errors >qemu/rules.mak:69: recipe for target 'ui/gtk.o' failed >make: *** [ui/gtk.o] Error 1 > > We only use the GdkMonitor API to get the monitor refresh rate. > > GTK+ provides convenient definition in > (already include by ) to check which API are available. > > Extract this code as a new gd_refresh_rate_millihz() function, > and check GDK_VERSION_3_22 is defined before calling its API. > If it is not defined, return 0. This is safe and fixes our build > failure. 
> > Fixes: c4c00922cc (display/gtk: get proper refreshrate) > Signed-off-by: Philippe Mathieu-Daudé > --- > Sorry I missed that, I only tested Nikola's patch on my workstation > which runs Fedora 30: > >$ pkg-config --modversion gtk+-3.0 >3.24.11 > > If Gerd acks this patch, we might consider having it directly > applied as a build fix. > --- > ui/gtk.c | 23 ++- > 1 file changed, 18 insertions(+), 5 deletions(-) > > diff --git a/ui/gtk.c b/ui/gtk.c > index 7355d34fcf..d18892d1de 100644 > --- a/ui/gtk.c > +++ b/ui/gtk.c > @@ -1961,6 +1961,23 @@ static GtkWidget > *gd_create_menu_machine(GtkDisplayState *s) > return machine_menu; > } > > +/* > + * If available, return the refresh rate of the display in milli-Hertz, > + * else return 0. > + */ > +static int gd_refresh_rate_millihz(GtkDisplayState *s) > +{ > +#ifdef GDK_VERSION_3_22 > +GdkDisplay *dpy = gtk_widget_get_display(s->window); > +GdkWindow *win = gtk_widget_get_window(s->window); > +GdkMonitor *monitor = gdk_display_get_monitor_at_window(dpy, win); > + > +return gdk_monitor_get_refresh_rate(monitor); > +#else > +return 0; > +#endif > +} > + > static GSList *gd_vc_gfx_init(GtkDisplayState *s, VirtualConsole *vc, > QemuConsole *con, int idx, > GSList *group, GtkWidget *view_menu) > @@ -1968,10 +1985,6 @@ static GSList *gd_vc_gfx_init(GtkDisplayState *s, > VirtualConsole *vc, > bool zoom_to_fit = false; > int refresh_rate_millihz; > > -GdkDisplay *dpy = gtk_widget_get_display(s->window); > -GdkWindow *win = gtk_widget_get_window(s->window); > -GdkMonitor *monitor = gdk_display_get_monitor_at_window(dpy, win); > - > vc->label = qemu_console_get_label(con); > vc->s = s; > vc->gfx.scale_x = 1.0; > @@ -2032,7 +2045,7 @@ static GSList *gd_vc_gfx_init(GtkDisplayState *s, > VirtualConsole *vc, > vc->gfx.kbd = qkbd_state_init(con); > vc->gfx.dcl.con = con; > > -refresh_rate_millihz = gdk_monitor_get_refresh_rate(monitor); > +refresh_rate_millihz = gd_refresh_rate_millihz(s); > if (refresh_rate_millihz) { > 
vc->gfx.dcl.update_interval = MILLISEC_PER_SEC / > refresh_rate_millihz; > } > This (as well as c4c00922cc) gives qemu-system-x86_64: Gdk: gdk_window_get_origin: assertion 'GDK_IS_WINDOW (window)' failed on startup under gtk 3.22.30 (mate 1.20.1). Jan
[PATCH v2] apic: Report current_count via 'info lapic'
From: Jan Kiszka This is helpful when debugging stuck guest timers. As we need apic_get_current_count for that, and it is really not emulation specific, move it to apic_common.c and export it. Fix its style at this chance as well. Signed-off-by: Jan Kiszka Reviewed-by: Philippe Mathieu-Daudé --- Changes in v2: - fix style of apic_get_current_count hw/intc/apic.c | 18 -- hw/intc/apic_common.c | 19 +++ include/hw/i386/apic_internal.h | 1 + target/i386/helper.c| 5 +++-- 4 files changed, 23 insertions(+), 20 deletions(-) diff --git a/hw/intc/apic.c b/hw/intc/apic.c index bd40467965..f2207d0ace 100644 --- a/hw/intc/apic.c +++ b/hw/intc/apic.c @@ -615,24 +615,6 @@ int apic_accept_pic_intr(DeviceState *dev) return 0; } -static uint32_t apic_get_current_count(APICCommonState *s) -{ -int64_t d; -uint32_t val; -d = (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - s->initial_count_load_time) >> -s->count_shift; -if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_TIMER_PERIODIC) { -/* periodic */ -val = s->initial_count - (d % ((uint64_t)s->initial_count + 1)); -} else { -if (d >= s->initial_count) -val = 0; -else -val = s->initial_count - d; -} -return val; -} - static void apic_timer_update(APICCommonState *s, int64_t current_time) { if (apic_next_timer(s, current_time)) { diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c index 9ec0f2deb2..fb432e83f2 100644 --- a/hw/intc/apic_common.c +++ b/hw/intc/apic_common.c @@ -189,6 +189,25 @@ bool apic_next_timer(APICCommonState *s, int64_t current_time) return true; } +uint32_t apic_get_current_count(APICCommonState *s) +{ +int64_t d; +uint32_t val; +d = (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - s->initial_count_load_time) >> +s->count_shift; +if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_TIMER_PERIODIC) { +/* periodic */ +val = s->initial_count - (d % ((uint64_t)s->initial_count + 1)); +} else { +if (d >= s->initial_count) { +val = 0; +} else { +val = s->initial_count - d; +} +} +return val; +} + void apic_init_reset(DeviceState *dev) { APICCommonState 
*s; diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h index b04bdd947f..2597000e03 100644 --- a/include/hw/i386/apic_internal.h +++ b/include/hw/i386/apic_internal.h @@ -211,6 +211,7 @@ void vapic_report_tpr_access(DeviceState *dev, CPUState *cpu, target_ulong ip, TPRAccess access); int apic_get_ppr(APICCommonState *s); +uint32_t apic_get_current_count(APICCommonState *s); static inline void apic_set_bit(uint32_t *tab, int index) { diff --git a/target/i386/helper.c b/target/i386/helper.c index c3a6e4fabe..e3c3726c29 100644 --- a/target/i386/helper.c +++ b/target/i386/helper.c @@ -370,10 +370,11 @@ void x86_cpu_dump_local_apic_state(CPUState *cs, int flags) dump_apic_lvt("LVTTHMR", lvt[APIC_LVT_THERMAL], false); dump_apic_lvt("LVTT", lvt[APIC_LVT_TIMER], true); -qemu_printf("Timer\t DCR=0x%x (divide by %u) initial_count = %u\n", +qemu_printf("Timer\t DCR=0x%x (divide by %u) initial_count = %u" +" current_count = %u\n", s->divide_conf & APIC_DCR_MASK, divider_conf(s->divide_conf), -s->initial_count); +s->initial_count, apic_get_current_count(s)); qemu_printf("SPIV\t 0x%08x APIC %s, focus=%s, spurious vec %u\n", s->spurious_vec, -- 2.16.4
Re: [PATCH] apic: Report current_count via 'info lapic'
On 06.02.20 23:36, Philippe Mathieu-Daudé wrote: > On 2/6/20 8:50 PM, Jan Kiszka wrote: >> From: Jan Kiszka >> >> This is helpful when debugging stuck guest timers. >> >> As we need apic_get_current_count for that, and it is really not >> emulation specific, move it to apic_common.c and export it. >> >> Signed-off-by: Jan Kiszka >> --- >> hw/intc/apic.c | 18 -- >> hw/intc/apic_common.c | 18 ++ >> include/hw/i386/apic_internal.h | 1 + >> target/i386/helper.c | 5 +++-- >> 4 files changed, 22 insertions(+), 20 deletions(-) >> >> diff --git a/hw/intc/apic.c b/hw/intc/apic.c >> index bd40467965..f2207d0ace 100644 >> --- a/hw/intc/apic.c >> +++ b/hw/intc/apic.c >> @@ -615,24 +615,6 @@ int apic_accept_pic_intr(DeviceState *dev) >> return 0; >> } >> -static uint32_t apic_get_current_count(APICCommonState *s) >> -{ >> - int64_t d; >> - uint32_t val; >> - d = (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - >> s->initial_count_load_time) >> >> - s->count_shift; >> - if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_TIMER_PERIODIC) { >> - /* periodic */ >> - val = s->initial_count - (d % ((uint64_t)s->initial_count + 1)); >> - } else { >> - if (d >= s->initial_count) >> - val = 0; >> - else >> - val = s->initial_count - d; >> - } >> - return val; >> -} >> - >> static void apic_timer_update(APICCommonState *s, int64_t current_time) >> { >> if (apic_next_timer(s, current_time)) { >> diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c >> index 9ec0f2deb2..6f4e877878 100644 >> --- a/hw/intc/apic_common.c >> +++ b/hw/intc/apic_common.c >> @@ -189,6 +189,24 @@ bool apic_next_timer(APICCommonState *s, int64_t >> current_time) >> return true; >> } >> +uint32_t apic_get_current_count(APICCommonState *s) >> +{ >> + int64_t d; >> + uint32_t val; >> + d = (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - >> s->initial_count_load_time) >> >> + s->count_shift; >> + if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_TIMER_PERIODIC) { >> + /* periodic */ >> + val = s->initial_count - (d % ((uint64_t)s->initial_count + 1)); >> 
+ } else { >> + if (d >= s->initial_count) >> + val = 0; >> + else >> + val = s->initial_count - d; > > Using QEMU style if () {} else {}: Yeah, that happens when you move old code - will address. > Reviewed-by: Philippe Mathieu-Daudé Thanks, Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
[PATCH] apic: Report current_count via 'info lapic'
From: Jan Kiszka

This is helpful when debugging stuck guest timers.

As we need apic_get_current_count for that, and it is really not
emulation specific, move it to apic_common.c and export it.

Signed-off-by: Jan Kiszka
---
 hw/intc/apic.c                  | 18 --
 hw/intc/apic_common.c           | 18 ++
 include/hw/i386/apic_internal.h |  1 +
 target/i386/helper.c            |  5 +++--
 4 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index bd40467965..f2207d0ace 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -615,24 +615,6 @@ int apic_accept_pic_intr(DeviceState *dev)
     return 0;
 }
 
-static uint32_t apic_get_current_count(APICCommonState *s)
-{
-    int64_t d;
-    uint32_t val;
-    d = (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - s->initial_count_load_time) >>
-        s->count_shift;
-    if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_TIMER_PERIODIC) {
-        /* periodic */
-        val = s->initial_count - (d % ((uint64_t)s->initial_count + 1));
-    } else {
-        if (d >= s->initial_count)
-            val = 0;
-        else
-            val = s->initial_count - d;
-    }
-    return val;
-}
-
 static void apic_timer_update(APICCommonState *s, int64_t current_time)
 {
     if (apic_next_timer(s, current_time)) {
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 9ec0f2deb2..6f4e877878 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -189,6 +189,24 @@ bool apic_next_timer(APICCommonState *s, int64_t current_time)
     return true;
 }
 
+uint32_t apic_get_current_count(APICCommonState *s)
+{
+    int64_t d;
+    uint32_t val;
+    d = (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - s->initial_count_load_time) >>
+        s->count_shift;
+    if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_TIMER_PERIODIC) {
+        /* periodic */
+        val = s->initial_count - (d % ((uint64_t)s->initial_count + 1));
+    } else {
+        if (d >= s->initial_count)
+            val = 0;
+        else
+            val = s->initial_count - d;
+    }
+    return val;
+}
+
 void apic_init_reset(DeviceState *dev)
 {
     APICCommonState *s;
diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
index b04bdd947f..2597000e03 100644
--- a/include/hw/i386/apic_internal.h
+++ b/include/hw/i386/apic_internal.h
@@ -211,6 +211,7 @@ void vapic_report_tpr_access(DeviceState *dev, CPUState *cpu, target_ulong ip,
                              TPRAccess access);
 
 int apic_get_ppr(APICCommonState *s);
+uint32_t apic_get_current_count(APICCommonState *s);
 
 static inline void apic_set_bit(uint32_t *tab, int index)
 {
diff --git a/target/i386/helper.c b/target/i386/helper.c
index c3a6e4fabe..e3c3726c29 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -370,10 +370,11 @@ void x86_cpu_dump_local_apic_state(CPUState *cs, int flags)
     dump_apic_lvt("LVTTHMR", lvt[APIC_LVT_THERMAL], false);
     dump_apic_lvt("LVTT", lvt[APIC_LVT_TIMER], true);
 
-    qemu_printf("Timer\t DCR=0x%x (divide by %u) initial_count = %u\n",
+    qemu_printf("Timer\t DCR=0x%x (divide by %u) initial_count = %u"
+                " current_count = %u\n",
                 s->divide_conf & APIC_DCR_MASK,
                 divider_conf(s->divide_conf),
-                s->initial_count);
+                s->initial_count, apic_get_current_count(s));
     qemu_printf("SPIV\t 0x%08x APIC %s, focus=%s, spurious vec %u\n",
                 s->spurious_vec,
--
2.16.4
Re: [RFC 0/9] Add an interVM memory sharing device
On 05.02.20 15:39, Stefan Hajnoczi wrote: On Tue, Feb 04, 2020 at 12:30:42PM +0100, i.kotrasi...@partner.samsung.com wrote: From: Igor Kotrasinski This patchset adds a "memory exposing" device that allows two QEMU instances to share arbitrary memory regions. Unlike ivshmem, it does not create a new region of memory that's shared between VMs, but instead allows one VM to access any memory region of the other VM we choose to share. The motivation for this device is a sort of ARM Trustzone "emulation", where a rich system running on one machine (e.g. x86_64 linux) is able to perform SMCs to a trusted system running on another (e.g. OpTEE on ARM). With a device that allows sharing arbitrary memory between VMs, this can be achieved with minimal changes to the trusted system and its linux driver while allowing the rich system to run on a speedier x86 emulator. I prepared additional patches for linux, OpTEE OS and OpTEE build system as a PoC that such emulation works and passes OpTEE tests; I'm not sure what would be the best way to share them. This patchset is my first foray into QEMU source code and I'm certain it's not yet ready to be merged in. I'm not sure whether memory sharing code has any race conditions or breaks rules of working with memory regions, or if having VMs communicate synchronously via chardevs is the right way to do it. I do believe the basic idea for sharing memory regions is sound and that it could be useful for inter-VM communication. Hi, Without having looked into the patches yet, I'm already wondering if you can use the existing -object memory-backend-file,size=512M,mem-path=/my/shared/mem feature for your use case? That's the existing mechanism for fully sharing guest RAM and if you want to share all of memory then maybe a device is not necessary - just share the memory. I suspect it's about sharing that memory in a discoverable way. Maybe it is also about the signalling channel defined in the device. 
OTOH, when it's really about sharing everything, even bidirectional, that rather looks like a pragmatic shortcut, not a generic model. The patches should clarify their use case a bit further, I think. The title suggests a generic sharing solution, but my impression is that it rather caters to a specific case under specific boundary conditions. Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
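For context, the existing mechanism Stefan refers to maps a host file as guest RAM on both QEMU instances; every guest page is then shared between the two VMs. A rough sketch (id, size, and path are placeholders; `share=on` is needed so writes become visible across processes):

```shell
# Start both VMs with their RAM backed by the same host file:
qemu-system-x86_64 \
    -object memory-backend-file,id=mem0,size=512M,mem-path=/dev/shm/shared-ram,share=on \
    -numa node,memdev=mem0 \
    -m 512 ...
```

As the thread notes, this shares everything but provides neither discoverability nor a signaling channel, which is what the proposed device adds.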
[RFC][PATCH v2 1/3] hw/misc: Add implementation of ivshmem revision 2 device
From: Jan Kiszka

This adds a reimplementation of ivshmem in its new revision 2 as a
separate device. The goal of this is not to enable sharing with v1,
rather to allow exploring the properties and potential limitations of
the new version prior to discussing its integration with the existing
code.

v2 always requires a server to interconnect two or more QEMU instances
because it provides signaling between peers unconditionally. Therefore,
only the interconnecting chardev, master mode, and the usage of
ioeventfd can be configured at device level. All other parameters are
defined by the server instance.

A new server protocol is introduced along with this. Its primary
difference is the introduction of a single welcome message that
contains all peer parameters, rather than a series of single-word
messages pushing them piece by piece.

A complicating difference in interrupt handling, compared to v1, is the
auto-disable mode of v2: When this is active, interrupt delivery is
disabled by the device after each interrupt event. This prevents the
usage of irqfd on the receiving side, but it lowers the handling cost
for guests that implemented interrupt throttling this way (specifically
when exposing the device via UIO).

No changes have been made to the ivshmem device regarding migration:
Only the master can live-migrate, slave peers have to hot-unplug the
device first.

The details of the device model will be specified in a succeeding
commit. Drivers for this device can currently be found under

    http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2

To instantiate an ivshmem v2 device, just add

    ...
    -chardev socket,path=/tmp/ivshmem_socket,id=ivshmem \
    -device ivshmem,chardev=ivshmem

provided the server opened its socket under the default path.
Signed-off-by: Jan Kiszka
---
 hw/misc/Makefile.objs      |    2 +-
 hw/misc/ivshmem2.c         | 1085
 include/hw/misc/ivshmem2.h |   48 ++
 include/hw/pci/pci_ids.h   |    2 +
 4 files changed, 1136 insertions(+), 1 deletion(-)
 create mode 100644 hw/misc/ivshmem2.c
 create mode 100644 include/hw/misc/ivshmem2.h

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index ba898a5781..90a4a6608c 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -26,7 +26,7 @@ common-obj-$(CONFIG_PUV3) += puv3_pm.o
 
 common-obj-$(CONFIG_MACIO) += macio/
 
-common-obj-$(CONFIG_IVSHMEM_DEVICE) += ivshmem.o
+common-obj-$(CONFIG_IVSHMEM_DEVICE) += ivshmem.o ivshmem2.o
 
 common-obj-$(CONFIG_REALVIEW) += arm_sysctl.o
 common-obj-$(CONFIG_NSERIES) += cbus.o
diff --git a/hw/misc/ivshmem2.c b/hw/misc/ivshmem2.c
new file mode 100644
index 00..d5f88ed0e9
--- /dev/null
+++ b/hw/misc/ivshmem2.c
@@ -0,0 +1,1085 @@
+/*
+ * Inter-VM Shared Memory PCI device, version 2.
+ *
+ * Copyright (c) Siemens AG, 2019
+ *
+ * Authors:
+ *  Jan Kiszka
+ *
+ * Based on ivshmem.c by Cam Macdonell
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qapi/error.h"
+#include "qemu/cutils.h"
+#include "hw/hw.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+#include "hw/qdev-properties.h"
+#include "sysemu/kvm.h"
+#include "migration/blocker.h"
+#include "migration/vmstate.h"
+#include "qemu/error-report.h"
+#include "qemu/event_notifier.h"
+#include "qemu/module.h"
+#include "qom/object_interfaces.h"
+#include "chardev/char-fe.h"
+#include "sysemu/qtest.h"
+#include "qapi/visitor.h"
+
+#include "hw/misc/ivshmem2.h"
+
+#define PCI_VENDOR_ID_IVSHMEM   PCI_VENDOR_ID_SIEMENS
+#define PCI_DEVICE_ID_IVSHMEM   0x4106
+
+#define IVSHMEM_MAX_PEERS       UINT16_MAX
+#define IVSHMEM_IOEVENTFD       0
+#define IVSHMEM_MSI             1
+
+#define IVSHMEM_REG_BAR_SIZE    0x1000
+
+#define IVSHMEM_REG_ID          0x00
+#define IVSHMEM_REG_MAX_PEERS   0x04
+#define IVSHMEM_REG_INT_CTRL    0x08
+#define IVSHMEM_REG_DOORBELL    0x0c
+#define IVSHMEM_REG_STATE       0x10
+
+#define IVSHMEM_INT_ENABLE      0x1
+
+#define IVSHMEM_ONESHOT_MODE    0x1
+
+#define IVSHMEM_DEBUG 0
+#define IVSHMEM_DPRINTF(fmt, ...)                       \
+    do {                                                \
+        if (IVSHMEM_DEBUG) {                            \
+            printf("IVSHMEM: " fmt, ## __VA_ARGS__);    \
+        }                                               \
+    } while (0)
+
+#define TYPE_IVSHMEM "ivshmem"
+#define IVSHMEM(obj) \
+    OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM)
+
+typedef struct Peer {
+    int nb_eventfds;
+    EventNotifier *eventfds;
+} Peer;
+
+typedef struct MSIVector {
+    PCIDevice *pdev;
+    int virq;
+    bool unmasked;
+} MSIVector;
+
+typedef struct IVShmemVndrCap {
+    uint8_t id;
[RFC][PATCH v2 2/3] docs/specs: Add specification of ivshmem device revision 2
From: Jan Kiszka

This imports the ivshmem v2 specification draft from Jailhouse where
the implementation is about to be merged now. The final home of the
spec is to be decided; this shall just simplify the review process at
this stage.

Signed-off-by: Jan Kiszka
---
 docs/specs/ivshmem-2-device-spec.md | 376
 1 file changed, 376 insertions(+)
 create mode 100644 docs/specs/ivshmem-2-device-spec.md

diff --git a/docs/specs/ivshmem-2-device-spec.md b/docs/specs/ivshmem-2-device-spec.md
new file mode 100644
index 00..d93cb22b04
--- /dev/null
+++ b/docs/specs/ivshmem-2-device-spec.md
@@ -0,0 +1,376 @@
+IVSHMEM Device Specification
+============================
+
+** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION! **
+
+The goal of the Inter-VM Shared Memory (IVSHMEM) device model is to
+define the minimally needed building blocks a hypervisor has to
+provide for enabling guest-to-guest communication. The details of
+communication protocols shall remain opaque to the hypervisor so that
+guests are free to define them as simple or sophisticated as they
+need.
+
+For that purpose, the IVSHMEM provides the following features to its
+users:
+
+- Interconnection between up to 65536 peers
+
+- Multi-purpose shared memory region
+
+    - common read/writable section
+
+    - output sections that are read/writable for one peer and only
+      readable for the others
+
+    - section with peer states
+
+- Event signaling via interrupt to remote sides
+
+- Support for life-cycle management via state value exchange and
+  interrupt notification on changes, backed by a shared memory
+  section
+
+- Free choice of protocol to be used on top
+
+    - Protocol type declaration
+
+- Registers can be implemented either memory-mapped or via I/O,
+  depending on platform support and lower VM-exit costs
+
+- Unprivileged access to memory-mapped or I/O registers feasible
+
+- Single discovery and configuration via standard PCI, no complexity
+  by additionally defining a platform device model
+
+
+Hypervisor Model
+----------------
+
+In order to provide a consistent link between peers, all connected
+instances of IVSHMEM devices need to be configured, created and run
+by the hypervisor according to the following requirements:
+
+- The instances of the device shall appear as a PCI device to their
+  users.
+
+- The read/write shared memory section has to be of the same size for
+  all peers. The size can be zero.
+
+- If shared memory output sections are present (non-zero section
+  size), there must be one reserved for each peer with exclusive
+  write access. All output sections must have the same size and must
+  be readable for all peers.
+
+- The State Table must have the same size for all peers, must be
+  large enough to hold the state values of all peers, and must be
+  read-only for the user.
+
+- State register changes (explicit writes, peer resets) have to be
+  propagated to the other peers by updating the corresponding State
+  Table entry and issuing an interrupt to all other peers if they
+  enabled reception.
+
+- Interrupt events triggered by a peer have to be delivered to the
+  target peer, provided the receiving side is valid and has enabled
+  the reception.
+
+- All peers must have the same interrupt delivery features available,
+  i.e. MSI-X with the same maximum number of vectors on platforms
+  supporting this mechanism, otherwise INTx with one vector.
+
+
+Guest-side Programming Model
+----------------------------
+
+An IVSHMEM device appears as a PCI device to its users. Unless
+otherwise noted, it conforms to the PCI Local Bus Specification,
+Revision 3.0. As such, it is discoverable via the PCI configuration
+space and provides a number of standard and custom PCI configuration
+registers.
+
+### Shared Memory Region Layout
+
+The shared memory region is divided into several sections.
+
+    +-----------------------------+   -
+    |                             |   :
+    | Output Section for peer n-1 |   : Output Section Size
+    |     (n = Maximum Peers)     |   :
+    +-----------------------------+   -
+    :                             :
+    :                             :
+    :                             :
+    +-----------------------------+   -
+    |                             |   :
+    |  Output Section for peer 1  |   : Output Section Size
+    |                             |   :
+    +-----------------------------+   -
+    |                             |   :
+    |  Output Section for peer 0  |   : Output Section Size
+    |                             |   :
+    +-----------------------------+   -
+    |                             |   :
+    |     Read/Write Section      |   : R/W Section Size
+    |                             |   :
+    +-----------------------------+   -
+    |                             |   :
+    |         State Table         |   : State Table Size
[RFC][PATCH v2 3/3] contrib: Add server for ivshmem revision 2
From: Jan Kiszka

This implements the server process for ivshmem v2 device models of
QEMU. Again, no effort has been spent yet on sharing code with the v1
server. Parts have been copied, others were rewritten.

In addition to parameters of v1, this server now also specifies

 - the maximum number of peers to be connected (required to know in
   advance because of v2's state table)
 - the size of the output sections (can be 0)
 - the protocol ID to be published to all peers

When a virtio protocol ID is chosen, only 2 peers can be connected.
Furthermore, the server will signal the backend variant of the ID to
the master instance and the frontend ID to the slave peer.

To start, e.g., a server that allows virtio console over ivshmem, call

    ivshmem2-server -F -l 64K -n 2 -V 3 -P 0x8003

TODO: specify the new server protocol.

Signed-off-by: Jan Kiszka
---
 Makefile                                  |   3 +
 Makefile.objs                             |   1 +
 configure                                 |   1 +
 contrib/ivshmem2-server/Makefile.objs     |   1 +
 contrib/ivshmem2-server/ivshmem2-server.c | 462 ++
 contrib/ivshmem2-server/ivshmem2-server.h | 158 ++
 contrib/ivshmem2-server/main.c            | 313
 7 files changed, 939 insertions(+)
 create mode 100644 contrib/ivshmem2-server/Makefile.objs
 create mode 100644 contrib/ivshmem2-server/ivshmem2-server.c
 create mode 100644 contrib/ivshmem2-server/ivshmem2-server.h
 create mode 100644 contrib/ivshmem2-server/main.c

diff --git a/Makefile b/Makefile
index 6b5ad1121b..33bb0eefdb 100644
--- a/Makefile
+++ b/Makefile
@@ -427,6 +427,7 @@ dummy := $(call unnest-vars,, \
                 elf2dmp-obj-y \
                 ivshmem-client-obj-y \
                 ivshmem-server-obj-y \
+                ivshmem2-server-obj-y \
                 rdmacm-mux-obj-y \
                 libvhost-user-obj-y \
                 vhost-user-scsi-obj-y \
@@ -655,6 +656,8 @@ ivshmem-client$(EXESUF): $(ivshmem-client-obj-y) $(COMMON_LDADDS)
 	$(call LINK, $^)
 ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) $(COMMON_LDADDS)
 	$(call LINK, $^)
+ivshmem2-server$(EXESUF): $(ivshmem2-server-obj-y) $(COMMON_LDADDS)
+	$(call LINK, $^)
 endif
 vhost-user-scsi$(EXESUF): $(vhost-user-scsi-obj-y) libvhost-user.a
 	$(call LINK, $^)
diff --git a/Makefile.objs b/Makefile.objs
index 02bf5ce11d..ce243975ef 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -115,6 +115,7 @@ qga-vss-dll-obj-y = qga/
 elf2dmp-obj-y = contrib/elf2dmp/
 ivshmem-client-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem-client/
 ivshmem-server-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem-server/
+ivshmem2-server-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem2-server/
 libvhost-user-obj-y = contrib/libvhost-user/
 vhost-user-scsi.o-cflags := $(LIBISCSI_CFLAGS)
 vhost-user-scsi.o-libs := $(LIBISCSI_LIBS)
diff --git a/configure b/configure
index 747d3b4120..1cb1427f1b 100755
--- a/configure
+++ b/configure
@@ -6165,6 +6165,7 @@ if test "$want_tools" = "yes" ; then
 fi
 if [ "$ivshmem" = "yes" ]; then
   tools="ivshmem-client\$(EXESUF) ivshmem-server\$(EXESUF) $tools"
+  tools="ivshmem2-server\$(EXESUF) $tools"
 fi
 if [ "$curl" = "yes" ]; then
   tools="elf2dmp\$(EXESUF) $tools"
diff --git a/contrib/ivshmem2-server/Makefile.objs b/contrib/ivshmem2-server/Makefile.objs
new file mode 100644
index 00..d233e18ec8
--- /dev/null
+++ b/contrib/ivshmem2-server/Makefile.objs
@@ -0,0 +1 @@
+ivshmem2-server-obj-y = ivshmem2-server.o main.o
diff --git a/contrib/ivshmem2-server/ivshmem2-server.c b/contrib/ivshmem2-server/ivshmem2-server.c
new file mode 100644
index 00..b341f1fcd0
--- /dev/null
+++ b/contrib/ivshmem2-server/ivshmem2-server.c
@@ -0,0 +1,462 @@
+/*
+ * Copyright 6WIND S.A., 2014
+ * Copyright (c) Siemens AG, 2019
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/sockets.h"
+#include "qemu/atomic.h"
+
+#include
+#include
+
+#include "ivshmem2-server.h"
+
+/* log a message on stdout if verbose=1 */
+#define IVSHMEM_SERVER_DEBUG(server, fmt, ...) \
+    do { \
+        if ((server)->args.verbose) { \
+            printf(fmt, ## __VA_ARGS__); \
+        } \
+    } while (0)
+
+/** maximum size of a huge page, used by ivshmem_server_ftruncate() */
+#define IVSHMEM_SERVER_MAX_HUGEPAGE_SIZE (1024 * 1024 * 1024)
+
+/** default listen backlog (number of sockets not accepted) */
+#define IVSHMEM_SERVER_LISTEN_BACKLOG 10
+
+/* send message to a client unix socket */
+static int ivshmem_server_send_msg(int sock_fd, void *payload, int len, int fd)
+{
+    in
[RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU
Overdue update of the ivshmem 2.0 device model as presented at [1].

Changes in v2:
 - changed PCI device ID to Siemens-granted one, adjusted PCI device
   revision to 0
 - removed unused feature register from device
 - addressed feedback on specification document
 - rebased over master

This version is now fully in sync with the implementation for Jailhouse
that is currently under review [2][3]; UIO and virtio-ivshmem drivers
are shared. Jailhouse will very likely pick up this revision of the
device in order to move forward with stressing it.

More details on the usage with QEMU were in the original cover letter
(with adjustments to the new device ID):

If you want to play with this, the basic setup of the shared memory
device is described in patches 1 and 3. UIO driver and also the
virtio-ivshmem prototype can be found at

    http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2

Accessing the device via UIO is trivial enough. If you want to use it
for virtio, this is needed on the virtio console backend side in
addition to the description in patch 3:

    modprobe uio_ivshmem
    echo "110a 4106 1af4 1100 ffc003 ff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
    linux/tools/virtio/virtio-ivshmem-console /dev/uio0

And for virtio block:

    echo "110a 4106 1af4 1100 ffc002 ff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
    linux/tools/virtio/virtio-ivshmem-console /dev/uio0 /path/to/disk.img

After that, you can start the QEMU frontend instance with the
virtio-ivshmem driver installed which can use the new /dev/hvc* or
/dev/vda* as usual.

Any feedback welcome!

Jan

PS: Let me know if I missed someone potentially interested in this
topic on CC - or if you would like to be dropped from the list.
[1] https://kvmforum2019.sched.com/event/TmxI [2] https://groups.google.com/forum/#!topic/jailhouse-dev/ffnCcRh8LOs [3] https://groups.google.com/forum/#!topic/jailhouse-dev/HX-0AGF1cjg Jan Kiszka (3): hw/misc: Add implementation of ivshmem revision 2 device docs/specs: Add specification of ivshmem device revision 2 contrib: Add server for ivshmem revision 2 Makefile |3 + Makefile.objs |1 + configure |1 + contrib/ivshmem2-server/Makefile.objs |1 + contrib/ivshmem2-server/ivshmem2-server.c | 462 contrib/ivshmem2-server/ivshmem2-server.h | 158 + contrib/ivshmem2-server/main.c| 313 + docs/specs/ivshmem-2-device-spec.md | 376 ++ hw/misc/Makefile.objs |2 +- hw/misc/ivshmem2.c| 1085 + include/hw/misc/ivshmem2.h| 48 ++ include/hw/pci/pci_ids.h |2 + 12 files changed, 2451 insertions(+), 1 deletion(-) create mode 100644 contrib/ivshmem2-server/Makefile.objs create mode 100644 contrib/ivshmem2-server/ivshmem2-server.c create mode 100644 contrib/ivshmem2-server/ivshmem2-server.h create mode 100644 contrib/ivshmem2-server/main.c create mode 100644 docs/specs/ivshmem-2-device-spec.md create mode 100644 hw/misc/ivshmem2.c create mode 100644 include/hw/misc/ivshmem2.h -- 2.16.4
Re: [RFC][PATCH 2/3] docs/specs: Add specification of ivshmem device revision 2
On 05.12.19 12:14, Markus Armbruster wrote: > This has been on the list for more than three weeks already. I > apologize for the delay. No problem! Your feedback is highly appreciated. > > Jan Kiszka writes: > >> From: Jan Kiszka >> >> This imports the ivshmem v2 specification draft from Jailhouse. Its >> final home is to be decided, this shall just simplify the review process >> at this stage. >> >> Note that specifically the Features register (offset 08h) is still under >> consideration. In particular, its bit 0 seems useless now as its benefit >> to guests, specifically when they want to be portable, is close to zero. >> Maybe the register should still be kept, with all bits RsvdZ, for easier >> extensibility. >> >> The rest appears now rather mature and reasonably implementable, as >> proven also by a version for Jailhouse. >> >> Signed-off-by: Jan Kiszka >> --- >> docs/specs/ivshmem-2-device-spec.md | 333 >> >> 1 file changed, 333 insertions(+) >> create mode 100644 docs/specs/ivshmem-2-device-spec.md >> >> diff --git a/docs/specs/ivshmem-2-device-spec.md >> b/docs/specs/ivshmem-2-device-spec.md >> new file mode 100644 >> index 00..98cfde585a >> --- /dev/null >> +++ b/docs/specs/ivshmem-2-device-spec.md >> @@ -0,0 +1,333 @@ >> +IVSHMEM Device Specification >> + >> + >> +** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE >> SPECIFICATION! 
** >> + >> +The Inter-VM Shared Memory device provides the following features to its >> users: >> + >> +- Interconnection between up to 65536 peers >> + >> +- Multi-purpose shared memory region >> + >> +- common read/writable section >> + >> +- unidirectional sections that are read/writable for one peer and only >> + readable for the others >> + >> +- section with peer states >> + >> +- Event signaling via interrupt to remote sides >> + >> +- Support for life-cycle management via state value exchange and interrupt >> + notification on changes, backed by a shared memory section >> + >> +- Free choice of protocol to be used on top >> + >> +- Protocol type declaration >> + >> +- Unprivileged access to memory-mapped or I/O registers feasible >> + >> +- Discoverable and configurable via standard PCI mechanisms > > Stating requirements is much appreciated. Design rationale would be > even better :) The idea is to avoid having to model also a platform device with platform discovery mechanism. Based on our experiences with providing a generic virtual PCI host controller, such a complication of specification and implementation is simply not needed. > > As pointed out many times, shared memory is not a solution to any > communication problem, it's merely a building block for building such > solutions: you invariably have to layer some protocol on top. In your > KVM Forum talk, you mention layering virtio on top. Makes sense to me. > But why does *this* virtio transport have to be an independent device? > Other transports aren't. This device is not only a virtio transport. As you correctly point out, it is a foundation for communication, not the full solution by itself. Its goal is to define the minimal piece that a hypervisor has to provide to its guests in order to enable them building full communication solutions of their preferred flavor on top. 
It does not want to deal with those details, may they be virtio, may they be something much simpler, much older or much more sophisticated (or complicated). All that shall not be the business of a hypervisor. > > Now let me indulge in spec nitpicking :) > >> + >> + >> +Hypervisor Model >> + >> + >> +In order to provide a consistent link between peers, all connected >> instances of >> +IVSHMEM devices need to be configured, created and run by the hypervisor >> +according to the following requirements: >> + >> +- The instances of the device need to be accessible via PCI programming >> + interfaces on all sides. > > What does that mean? "From a guest point of view, this shall be a PCI device." > >> + >> +- The read/write shared memory section has to be of the same size for all >> + peers and, if non-zero, has to reflect the same memory content for them. > > Isn't "same memory content" redundant with "shared memory"? Probably. > >> + >> +- If output sections are present (non-zero section size), there must be one >>
Re: [RFC][PATCH 0/3] IVSHMEM version 2 device for QEMU
On 03.12.19 06:53, Liang Yan wrote: > > On 12/2/19 1:16 AM, Jan Kiszka wrote: >> On 27.11.19 18:19, Jan Kiszka wrote: >>> Hi Liang, >>> >>> On 27.11.19 16:28, Liang Yan wrote: >>>> >>>> >>>> On 11/11/19 7:57 AM, Jan Kiszka wrote: >>>>> To get the ball rolling after my presentation of the topic at KVM Forum >>>>> [1] and many fruitful discussions around it, this is a first concrete >>>>> code series. As discussed, I'm starting with the IVSHMEM implementation >>>>> of a QEMU device and server. It's RFC because, besides specification >>>>> and >>>>> implementation details, there will still be some decisions needed about >>>>> how to integrate the new version best into the existing code bases. >>>>> >>>>> If you want to play with this, the basic setup of the shared memory >>>>> device is described in patch 1 and 3. UIO driver and also the >>>>> virtio-ivshmem prototype can be found at >>>>> >>>>> >>>>> http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2 >>>>> >>>>> >>>>> Accessing the device via UIO is trivial enough. If you want to use it >>>>> for virtio, this is additionally to the description in patch 3 >>>>> needed on >>>>> the virtio console backend side: >>>>> >>>>> modprobe uio_ivshmem >>>>> echo "1af4 1110 1af4 1100 ffc003 ff" > >>>>> /sys/bus/pci/drivers/uio_ivshmem/new_id >>>>> linux/tools/virtio/virtio-ivshmem-console /dev/uio0 >>>>> >>>>> And for virtio block: >>>>> >>>>> echo "1af4 1110 1af4 1100 ffc002 ff" > >>>>> /sys/bus/pci/drivers/uio_ivshmem/new_id >>>>> linux/tools/virtio/virtio-ivshmem-console /dev/uio0 >>>>> /path/to/disk.img >>>>> >>>>> After that, you can start the QEMU frontend instance with the >>>>> virtio-ivshmem driver installed which can use the new /dev/hvc* or >>>>> /dev/vda* as usual. >>>>> >>>>> Any feedback welcome! >>>> >>>> Hi, Jan, >>>> >>>> I have been playing your code for last few weeks, mostly study and test, >>>> of course. Really nice work. 
I have a few questions here: >>>> >>>> First, qemu part looks good, I tried test between couple VMs, and device >>>> could pop up correctly for all of them, but I had some problems when >>>> trying load driver. For example, if set up two VMs, vm1 and vm2, start >>>> ivshmem server as you suggested. vm1 could load uio_ivshmem and >>>> virtio_ivshmem correctly, vm2 could load uio_ivshmem but could not show >>>> up "/dev/uio0", virtio_ivshmem could not be loaded at all, these still >>>> exist even I switch the load sequence of vm1 and vm2, and sometimes >>>> reset "virtio_ivshmem" could crash both vm1 and vm2. Not quite sure this >>>> is bug or "Ivshmem Mode" issue, but I went through ivshmem-server code, >>>> did not related information. >>> >>> If we are only talking about one ivshmem link and vm1 is the master, >>> there is not role for virtio_ivshmem on it as backend. That is purely >>> a frontend driver. Vice versa for vm2: If you want to use its ivshmem >>> instance as virtio frontend, uio_ivshmem plays no role. >>> > Hi, Jan, > > Sorry for the late response. Just came back from Thanksgiving holiday. > > I have a few questions here. > First, how to decide master/slave node? I used two VMs here, they did > not show same behavior even if I change the boot sequence. The current mechanism works by selecting the VM that gets ID 0 as the backend, thus sending it also a different protocol ID than the frontend gets. Could possibly be improved by allowing selection also on the VM side (QEMU command line parameter etc.). Inherently, this only affects virtio over ivshmem. Other, symmetric protocols do not need this differentiation. > > Second, in order to run virtio-ivshmem-console demo, VM1 connect to VM2 > Console. So, need to install virtio frontend driver in VM2, then install > uio frontend driver in VM1 to get "/dev/uio0", then run demo, right? > Could you share your procedure? > > Also, I could not get "/dev/uio0" all the time. OK, should have collected this earlier.
This is h
Re: [RFC][PATCH 0/3] IVSHMEM version 2 device for QEMU
On 27.11.19 18:19, Jan Kiszka wrote: Hi Liang, On 27.11.19 16:28, Liang Yan wrote: On 11/11/19 7:57 AM, Jan Kiszka wrote: To get the ball rolling after my presentation of the topic at KVM Forum [1] and many fruitful discussions around it, this is a first concrete code series. As discussed, I'm starting with the IVSHMEM implementation of a QEMU device and server. It's RFC because, besides specification and implementation details, there will still be some decisions needed about how to integrate the new version best into the existing code bases. If you want to play with this, the basic setup of the shared memory device is described in patch 1 and 3. UIO driver and also the virtio-ivshmem prototype can be found at http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2 Accessing the device via UIO is trivial enough. If you want to use it for virtio, this is additionally to the description in patch 3 needed on the virtio console backend side: modprobe uio_ivshmem echo "1af4 1110 1af4 1100 ffc003 ff" > /sys/bus/pci/drivers/uio_ivshmem/new_id linux/tools/virtio/virtio-ivshmem-console /dev/uio0 And for virtio block: echo "1af4 1110 1af4 1100 ffc002 ff" > /sys/bus/pci/drivers/uio_ivshmem/new_id linux/tools/virtio/virtio-ivshmem-console /dev/uio0 /path/to/disk.img After that, you can start the QEMU frontend instance with the virtio-ivshmem driver installed which can use the new /dev/hvc* or /dev/vda* as usual. Any feedback welcome! Hi, Jan, I have been playing your code for last few weeks, mostly study and test, of course. Really nice work. I have a few questions here: First, qemu part looks good, I tried test between couple VMs, and device could pop up correctly for all of them, but I had some problems when trying load driver. For example, if set up two VMs, vm1 and vm2, start ivshmem server as you suggested. 
vm1 could load uio_ivshmem and virtio_ivshmem correctly, vm2 could load uio_ivshmem but could not show up "/dev/uio0", virtio_ivshmem could not be loaded at all, these still exist even if I switch the load sequence of vm1 and vm2, and sometimes resetting "virtio_ivshmem" could crash both vm1 and vm2. Not quite sure whether this is a bug or an "Ivshmem Mode" issue, but I went through the ivshmem-server code and did not find related information. If we are only talking about one ivshmem link and vm1 is the master, there is no role for virtio_ivshmem on it as backend. That is purely a frontend driver. Vice versa for vm2: If you want to use its ivshmem instance as virtio frontend, uio_ivshmem plays no role. The "crash" would be interesting to understand: Do you see kernel panics of the guests? Or are they stuck? Or are the QEMU instances stuck? Do you know that you can debug the guest kernels via gdb (and gdb-scripts of the kernel)? I started some code work recently, such as fixing code style issues and some work based on the above testing, however I know you are also working on RFC V2, and the protocol between server-client and client-client is not finalized yet either, things may change, so I would much appreciate if you could squeeze me into your develop schedule and share with me some plans, :-) Maybe I could send some pull request in your github repo? I'm currently working on a refresh of the Jailhouse queue and the kernel patches to incorporate just two smaller changes: - use Siemens device ID - drop "features" register from ivshmem device I have not yet touched the QEMU code for that so far, thus no conflict yet. I would wait for your patches then. If it helps us to work on this together, I can push things to github as well. Will drop you a note when done. We should just present the outcome frequently as new series to the list. I've updated my queues, mostly small changes, mostly to the kernel bits.
Besides the already announced places, you can also find them as PR targets on

https://github.com/siemens/qemu/commits/wip/ivshmem2
https://github.com/siemens/linux/commits/queues/ivshmem2

To give the whole thing broader coverage, I will now also move forward and integrate the current state into Jailhouse - at the risk of having to rework the interface there once again. But there are a number of users already requiring the extended features (or even using them), plus this gives nice test coverage of key components and properties.

Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [RFC][PATCH 0/3] IVSHMEM version 2 device for QEMU
Hi Liang,

On 27.11.19 16:28, Liang Yan wrote:

On 11/11/19 7:57 AM, Jan Kiszka wrote:

To get the ball rolling after my presentation of the topic at KVM Forum [1] and many fruitful discussions around it, this is a first concrete code series. As discussed, I'm starting with the IVSHMEM implementation of a QEMU device and server. It's RFC because, besides specification and implementation details, there will still be some decisions needed about how to integrate the new version best into the existing code bases.

If you want to play with this, the basic setup of the shared memory device is described in patches 1 and 3. The UIO driver and also the virtio-ivshmem prototype can be found at

http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2

Accessing the device via UIO is trivial enough. If you want to use it for virtio, this is needed on the virtio console backend side, in addition to the description in patch 3:

modprobe uio_ivshmem
echo "1af4 1110 1af4 1100 ffc003 ff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
linux/tools/virtio/virtio-ivshmem-console /dev/uio0

And for virtio block:

echo "1af4 1110 1af4 1100 ffc002 ff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
linux/tools/virtio/virtio-ivshmem-console /dev/uio0 /path/to/disk.img

After that, you can start the QEMU frontend instance with the virtio-ivshmem driver installed, which can use the new /dev/hvc* or /dev/vda* as usual.

Any feedback welcome!

Hi, Jan,

I have been playing with your code for the last few weeks, mostly studying and testing, of course. Really nice work. I have a few questions here:

First, the qemu part looks good. I tried tests between a couple of VMs, and the device popped up correctly for all of them, but I had some problems when trying to load the driver. For example, I set up two VMs, vm1 and vm2, and started the ivshmem server as you suggested.
vm1 could load uio_ivshmem and virtio_ivshmem correctly; vm2 could load uio_ivshmem, but "/dev/uio0" did not show up, and virtio_ivshmem could not be loaded at all. These issues persisted even when I switched the load order of vm1 and vm2, and sometimes resetting "virtio_ivshmem" crashed both vm1 and vm2. I am not quite sure whether this is a bug or an "Ivshmem Mode" issue; I went through the ivshmem-server code but did not find related information.

If we are only talking about one ivshmem link and vm1 is the master, there is no role for virtio_ivshmem on it as backend. That is purely a frontend driver. Vice versa for vm2: If you want to use its ivshmem instance as virtio frontend, uio_ivshmem plays no role.

The "crash" would be interesting to understand: Do you see kernel panics of the guests? Or are they stuck? Or are the QEMU instances stuck? Do you know that you can debug the guest kernels via gdb (and the gdb scripts of the kernel)?

I started some code work recently, such as fixing code style issues and some work based on the above testing. However, I know you are also working on RFC v2, and besides, the protocol between server-client and client-client is not finalized yet either, so things may change. I would much appreciate it if you could squeeze me into your development schedule and share some plans with me :-) Maybe I could send some pull requests to your github repo?

I'm currently working on a refresh of the Jailhouse queue and the kernel patches to incorporate just two smaller changes:

- use Siemens device ID
- drop "features" register from ivshmem device

I have not yet touched the QEMU code for that, thus no conflict yet. I would wait for your patches then. If it helps us to work on this together, I can push things to github as well. Will drop you a note when done. We should just present the outcome frequently as new series to the list.

I personally like this project a lot; there is a lot of potential and there are many use cases for it, especially devices like ivshmem-net/ivshmem-block.
Anyway, thanks for adding me to the list, and looking forward to your reply. Thanks for the positive feedback. I'm looking forward to working on this together! Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [RFC][PATCH 2/3] docs/specs: Add specification of ivshmem device revision 2
On 12.11.19 09:04, Michael S. Tsirkin wrote: On Mon, Nov 11, 2019 at 05:38:29PM +0100, Jan Kiszka wrote: On 11.11.19 17:11, Michael S. Tsirkin wrote: On Mon, Nov 11, 2019 at 03:27:43PM +, Daniel P. Berrangé wrote: On Mon, Nov 11, 2019 at 10:08:20AM -0500, Michael S. Tsirkin wrote: On Mon, Nov 11, 2019 at 02:59:07PM +0100, Jan Kiszka wrote: On 11.11.19 14:45, Michael S. Tsirkin wrote: On Mon, Nov 11, 2019 at 01:57:11PM +0100, Jan Kiszka wrote:

+| Offset | Register  | Content |
+|-------:|:----------|:--------|
+|    00h | Vendor ID | 1AF4h   |
+|    02h | Device ID | 1110h   |

Given it's a virtio vendor ID, please reserve a device ID with the virtio TC. Yeah, QEMU's IVSHMEM was always using that. I'm happy to make this finally official. And I guess we will just mark it reserved or something, right? Since at least IVSHMEM 1 isn't a virtio device. And will you be reusing the same ID for IVSHMEM 2 or a new one? 1110h isn't under either of the virtio PCI device ID allowed ranges according to the spec: "Any PCI device with PCI Vendor ID 0x1AF4, and PCI Device ID 0x1000 through 0x107F inclusive is a virtio device. ... Additionally, devices MAY utilize a Transitional PCI Device ID range, 0x1000 to 0x103F depending on the device type." So there's no need to reserve 0x1110h from the virtio spec POV. Well we do have: B.3 What Device Number? Device numbers can be reserved by the OASIS committee: email virtio-...@lists.oasis-open.org to secure a unique one. Meanwhile for experimental drivers, use 65535 and work backwards. So it seems it can in theory conflict at least with experimental virtio devices. Really it's messy that people are reusing the virtio vendor ID for random stuff - getting a vendor ID is only hard for a hobbyist, any big company already has an ID - but if it is a hobbyist and they at least register then it doesn't cause much harm. Note that ivshmem came from a research environment. I do not know if there was a check for the IDs at the point the code was merged.
That said, I may get a device ID here as well, provided I can explain that not a single "product" will own it, but rather an open specification. Jan OK, up to you - if you decide you want an ID reserved, pls let us know. Turned out to be much simpler than expected: I reserved device ID 4106h under the vendor ID 110Ah (Siemens AG) for the purpose of specifying a shared memory device via the VIRTIO TC. Will update this "detail" in the next revision of the patches, also resetting the device revision ID to 0, as we no longer need to tell it apart from the current implementation this way. Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [RFC][PATCH 2/3] docs/specs: Add specification of ivshmem device revision 2
On 11.11.19 17:11, Michael S. Tsirkin wrote: On Mon, Nov 11, 2019 at 03:27:43PM +, Daniel P. Berrangé wrote: On Mon, Nov 11, 2019 at 10:08:20AM -0500, Michael S. Tsirkin wrote: On Mon, Nov 11, 2019 at 02:59:07PM +0100, Jan Kiszka wrote: On 11.11.19 14:45, Michael S. Tsirkin wrote: On Mon, Nov 11, 2019 at 01:57:11PM +0100, Jan Kiszka wrote:

+| Offset | Register  | Content |
+|-------:|:----------|:--------|
+|    00h | Vendor ID | 1AF4h   |
+|    02h | Device ID | 1110h   |

Given it's a virtio vendor ID, please reserve a device ID with the virtio TC. Yeah, QEMU's IVSHMEM was always using that. I'm happy to make this finally official. And I guess we will just mark it reserved or something, right? Since at least IVSHMEM 1 isn't a virtio device. And will you be reusing the same ID for IVSHMEM 2 or a new one? 1110h isn't under either of the virtio PCI device ID allowed ranges according to the spec: "Any PCI device with PCI Vendor ID 0x1AF4, and PCI Device ID 0x1000 through 0x107F inclusive is a virtio device. ... Additionally, devices MAY utilize a Transitional PCI Device ID range, 0x1000 to 0x103F depending on the device type." So there's no need to reserve 0x1110h from the virtio spec POV. Well we do have: B.3 What Device Number? Device numbers can be reserved by the OASIS committee: email virtio-...@lists.oasis-open.org to secure a unique one. Meanwhile for experimental drivers, use 65535 and work backwards. So it seems it can in theory conflict at least with experimental virtio devices. Really it's messy that people are reusing the virtio vendor ID for random stuff - getting a vendor ID is only hard for a hobbyist, any big company already has an ID - but if it is a hobbyist and they at least register then it doesn't cause much harm. Note that ivshmem came from a research environment. I do not know if there was a check for the IDs at the point the code was merged. That said, I may get a device ID here as well, provided I can explain that not a single "product" will own it, but rather an open specification.
Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [RFC][PATCH 2/3] docs/specs: Add specification of ivshmem device revision 2
On 11.11.19 17:14, Michael S. Tsirkin wrote: On Mon, Nov 11, 2019 at 04:42:52PM +0100, Jan Kiszka wrote: On 11.11.19 16:27, Daniel P. Berrangé wrote: On Mon, Nov 11, 2019 at 10:08:20AM -0500, Michael S. Tsirkin wrote: On Mon, Nov 11, 2019 at 02:59:07PM +0100, Jan Kiszka wrote: On 11.11.19 14:45, Michael S. Tsirkin wrote: On Mon, Nov 11, 2019 at 01:57:11PM +0100, Jan Kiszka wrote:

+| Offset | Register  | Content |
+|-------:|:----------|:--------|
+|    00h | Vendor ID | 1AF4h   |
+|    02h | Device ID | 1110h   |

Given it's a virtio vendor ID, please reserve a device ID with the virtio TC. Yeah, QEMU's IVSHMEM was always using that. I'm happy to make this finally official. And I guess we will just mark it reserved or something, right? Since at least IVSHMEM 1 isn't a virtio device. And will you be reusing the same ID for IVSHMEM 2 or a new one? 1110h isn't under either of the virtio PCI device ID allowed ranges according to the spec: "Any PCI device with PCI Vendor ID 0x1AF4, and PCI Device ID 0x1000 through 0x107F inclusive is a virtio device. ... Additionally, devices MAY utilize a Transitional PCI Device ID range, 0x1000 to 0x103F depending on the device type." So there's no need to reserve 0x1110h from the virtio spec POV. Indeed. I have, however, ensured it is assigned to ivshmem from the POV of Red Hat's own internal tracking of allocated device IDs, under its vendor ID. If ivshmem 2 is now a virtio device, then it is a good thing that it will get a new/different PCI device ID, to show that it is not compatible with the old device impl. At this stage, it is just a PCI device that may be used in combination with virtio (stacked on top), but it is not designed like a normal virtio (PCI) device. That's because it lacks many properties of regular virtio devices, like queues. So, if such a device could become part of the virtio spec, it would be separate from the rest, and having an ID from the regular range would likely not be helpful in this regard.
Jan I agree it needs a separate ID not from the regular range. It's a distinct transport. Maybe even a distinct vendor ID - we could easily get another one if needed. That might be useful because I've seen the kernel's virtio-pci driver grabbing ivshmem devices from time to time. OTOH, that could likely also be improved in Linux, at least for future versions. Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [RFC][PATCH 2/3] docs/specs: Add specification of ivshmem device revision 2
On 11.11.19 16:27, Daniel P. Berrangé wrote: On Mon, Nov 11, 2019 at 10:08:20AM -0500, Michael S. Tsirkin wrote: On Mon, Nov 11, 2019 at 02:59:07PM +0100, Jan Kiszka wrote: On 11.11.19 14:45, Michael S. Tsirkin wrote: On Mon, Nov 11, 2019 at 01:57:11PM +0100, Jan Kiszka wrote:

+| Offset | Register  | Content |
+|-------:|:----------|:--------|
+|    00h | Vendor ID | 1AF4h   |
+|    02h | Device ID | 1110h   |

Given it's a virtio vendor ID, please reserve a device ID with the virtio TC. Yeah, QEMU's IVSHMEM was always using that. I'm happy to make this finally official. And I guess we will just mark it reserved or something, right? Since at least IVSHMEM 1 isn't a virtio device. And will you be reusing the same ID for IVSHMEM 2 or a new one? 1110h isn't under either of the virtio PCI device ID allowed ranges according to the spec: "Any PCI device with PCI Vendor ID 0x1AF4, and PCI Device ID 0x1000 through 0x107F inclusive is a virtio device. ... Additionally, devices MAY utilize a Transitional PCI Device ID range, 0x1000 to 0x103F depending on the device type." So there's no need to reserve 0x1110h from the virtio spec POV. Indeed. I have, however, ensured it is assigned to ivshmem from the POV of Red Hat's own internal tracking of allocated device IDs, under its vendor ID. If ivshmem 2 is now a virtio device, then it is a good thing that it will get a new/different PCI device ID, to show that it is not compatible with the old device impl. At this stage, it is just a PCI device that may be used in combination with virtio (stacked on top), but it is not designed like a normal virtio (PCI) device. That's because it lacks many properties of regular virtio devices, like queues. So, if such a device could become part of the virtio spec, it would be separate from the rest, and having an ID from the regular range would likely not be helpful in this regard. Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [RFC][PATCH 2/3] docs/specs: Add specification of ivshmem device revision 2
On 11.11.19 14:45, Michael S. Tsirkin wrote: On Mon, Nov 11, 2019 at 01:57:11PM +0100, Jan Kiszka wrote: +| Offset | Register | Content | +|---:|:---|:-| +|00h | Vendor ID | 1AF4h | +|02h | Device ID | 1110h | Given it's a virtio vendor ID, please reserve a device ID with the virtio TC. Yeah, QEMU's IVSHMEM was always using that. I'm happy to make this finally official. Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
[RFC][PATCH 1/3] hw/misc: Add implementation of ivshmem revision 2 device
From: Jan Kiszka

This adds a reimplementation of ivshmem in its new revision 2 as a separate device. The goal of this is not to enable sharing with v1, but rather to allow exploring the properties and potential limitations of the new version prior to discussing its integration with the existing code.

v2 always requires a server to interconnect two or more QEMU instances because it provides signaling between peers unconditionally. Therefore, only the interconnecting chardev, master mode, and the usage of ioeventfd can be configured at device level. All other parameters are defined by the server instance.

A new server protocol is introduced along with this. Its primary difference is the introduction of a single welcome message that contains all peer parameters, rather than a series of single-word messages pushing them piece by piece.

A complicating difference in interrupt handling, compared to v1, is the auto-disable mode of v2: When this is active, interrupt delivery is disabled by the device after each interrupt event. This prevents the usage of irqfd on the receiving side, but it lowers the handling cost for guests that implement interrupt throttling this way (specifically when exposing the device via UIO).

No changes have been made to the ivshmem device regarding migration: Only the master can live-migrate; slave peers have to hot-unplug the device first.

The details of the device model will be specified in a succeeding commit.

Drivers for this device can currently be found under

http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2

To instantiate an ivshmem v2 device, just add

... -chardev socket,path=/tmp/ivshmem_socket,id=ivshmem \
    -device ivshmem,chardev=ivshmem

provided the server opened its socket under the default path.
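The auto-disable behavior is easiest to see as a small model. The following sketch is illustrative only — the struct and helper names are invented, not taken from the QEMU model — but it captures the semantics described above: each delivered event disarms interrupt delivery until the guest re-enables it.

```c
#include <stdbool.h>

/* Toy model of the v2 one-shot interrupt semantics: after each delivered
 * event the device clears its interrupt-enable bit, and the guest must
 * re-arm it explicitly (as a UIO driver would on unmask) before the next
 * event can be delivered. */
struct toy_ivshmem {
    bool int_enable;   /* interrupt-enable bit of the control register */
    bool one_shot;     /* auto-disable mode active */
    int  delivered;    /* events actually seen by the guest */
};

/* A peer signals an event; returns true if an interrupt was delivered. */
static bool toy_signal(struct toy_ivshmem *d)
{
    if (!d->int_enable)
        return false;
    d->delivered++;
    if (d->one_shot)
        d->int_enable = false; /* device disarms itself */
    return true;
}

/* Guest re-arms interrupt delivery from its interrupt handler. */
static void toy_rearm(struct toy_ivshmem *d)
{
    d->int_enable = true;
}

/* Demo: two back-to-back signals deliver only one interrupt; a third
 * signal after re-arming delivers the second. Returns delivered count. */
static int toy_demo(void)
{
    struct toy_ivshmem d = { true, true, 0 };
    toy_signal(&d);
    toy_signal(&d);  /* dropped: device disarmed itself */
    toy_rearm(&d);
    toy_signal(&d);
    return d.delivered;
}
```

This also shows why irqfd cannot be used on the receiving side: the device must interpose on every delivery to clear the enable bit.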
Signed-off-by: Jan Kiszka --- hw/misc/Makefile.objs |2 +- hw/misc/ivshmem2.c | 1091 include/hw/misc/ivshmem2.h | 48 ++ 3 files changed, 1140 insertions(+), 1 deletion(-) create mode 100644 hw/misc/ivshmem2.c create mode 100644 include/hw/misc/ivshmem2.h diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs index ba898a5781..90a4a6608c 100644 --- a/hw/misc/Makefile.objs +++ b/hw/misc/Makefile.objs @@ -26,7 +26,7 @@ common-obj-$(CONFIG_PUV3) += puv3_pm.o common-obj-$(CONFIG_MACIO) += macio/ -common-obj-$(CONFIG_IVSHMEM_DEVICE) += ivshmem.o +common-obj-$(CONFIG_IVSHMEM_DEVICE) += ivshmem.o ivshmem2.o common-obj-$(CONFIG_REALVIEW) += arm_sysctl.o common-obj-$(CONFIG_NSERIES) += cbus.o diff --git a/hw/misc/ivshmem2.c b/hw/misc/ivshmem2.c new file mode 100644 index 00..aeb5c40e0b --- /dev/null +++ b/hw/misc/ivshmem2.c @@ -0,0 +1,1091 @@ +/* + * Inter-VM Shared Memory PCI device, version 2. + * + * Copyright (c) Siemens AG, 2019 + * + * Authors: + * Jan Kiszka + * + * Based on ivshmem.c by Cam Macdonell + * + * This code is licensed under the GNU GPL v2. 
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qapi/error.h"
+#include "qemu/cutils.h"
+#include "hw/hw.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+#include "hw/qdev-properties.h"
+#include "sysemu/kvm.h"
+#include "migration/blocker.h"
+#include "migration/vmstate.h"
+#include "qemu/error-report.h"
+#include "qemu/event_notifier.h"
+#include "qemu/module.h"
+#include "qom/object_interfaces.h"
+#include "chardev/char-fe.h"
+#include "sysemu/qtest.h"
+#include "qapi/visitor.h"
+
+#include "hw/misc/ivshmem2.h"
+
+#define PCI_VENDOR_ID_IVSHMEM   PCI_VENDOR_ID_REDHAT_QUMRANET
+#define PCI_DEVICE_ID_IVSHMEM   0x1110
+
+#define IVSHMEM_MAX_PEERS       UINT16_MAX
+#define IVSHMEM_IOEVENTFD       0
+#define IVSHMEM_MSI             1
+
+#define IVSHMEM_REG_BAR_SIZE    0x1000
+
+#define IVSHMEM_REG_ID          0x00
+#define IVSHMEM_REG_MAX_PEERS   0x04
+#define IVSHMEM_REG_FEATURES    0x08
+#define IVSHMEM_REG_INT_CTRL    0x0c
+#define IVSHMEM_REG_DOORBELL    0x10
+#define IVSHMEM_REG_STATE       0x14
+
+#define IVSHMEM_INT_ENABLE      0x1
+
+#define IVSHMEM_ONESHOT_MODE    0x1
+
+#define IVSHMEM_DEBUG 0
+#define IVSHMEM_DPRINTF(fmt, ...)                    \
+    do {                                             \
+        if (IVSHMEM_DEBUG) {                         \
+            printf("IVSHMEM: " fmt, ## __VA_ARGS__); \
+        }                                            \
+    } while (0)
+
+#define TYPE_IVSHMEM "ivshmem"
+#define IVSHMEM(obj) \
+    OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM)
+
+typedef struct Peer {
+    int nb_eventfds;
+    EventNotifier *eventfds;
+} Peer;
+
+typedef struct MSIVector {
+    PCIDevice *pdev;
+    int virq;
+    bool unmasked;
+} MSIVector;
+
+typedef struct IVShmemVndrCap {
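For illustration, a guest could drive the register block defined above roughly as follows. This is a hypothetical sketch: the helper names are invented, and the `(peer << 16) | vector` doorbell encoding is an assumption carried over from ivshmem v1 — the spec in patch 2 is authoritative. The self-check substitutes an ordinary memory buffer for the mmap()ed register BAR.

```c
#include <stdint.h>

/* Register offsets mirroring the model above (guest-side view). */
#define REG_INT_CTRL 0x0c
#define REG_DOORBELL 0x10
#define INT_ENABLE   0x1

/* A guest would obtain 'bar' by mmap()ing the register BAR, e.g. via
 * the uio_ivshmem driver; here it is just a pointer to 32-bit MMIO. */
static void ivshmem_write_reg(volatile uint32_t *bar, unsigned off,
                              uint32_t val)
{
    bar[off / sizeof(uint32_t)] = val;
}

/* Ring a peer's doorbell. The (peer << 16) | vector encoding is an
 * assumption based on ivshmem v1; check the v2 spec for the real layout. */
static void ivshmem_ring(volatile uint32_t *bar, uint16_t peer,
                         uint16_t vector)
{
    ivshmem_write_reg(bar, REG_DOORBELL, ((uint32_t)peer << 16) | vector);
}

/* Tiny self-check against a fake BAR backed by ordinary memory:
 * enable interrupts, ring peer 3 / vector 1, return the doorbell value. */
static uint32_t ivshmem_demo_doorbell(void)
{
    static uint32_t fake_bar[0x1000 / sizeof(uint32_t)];

    ivshmem_write_reg(fake_bar, REG_INT_CTRL, INT_ENABLE);
    ivshmem_ring(fake_bar, 3, 1);
    return fake_bar[REG_DOORBELL / sizeof(uint32_t)];
}
```

On real hardware the write to REG_DOORBELL traps into the device model (or an ioeventfd), which forwards the event to the target peer.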
[RFC][PATCH 0/3] IVSHMEM version 2 device for QEMU
To get the ball rolling after my presentation of the topic at KVM Forum [1] and many fruitful discussions around it, this is a first concrete code series. As discussed, I'm starting with the IVSHMEM implementation of a QEMU device and server. It's RFC because, besides specification and implementation details, there will still be some decisions needed about how to integrate the new version best into the existing code bases.

If you want to play with this, the basic setup of the shared memory device is described in patches 1 and 3. The UIO driver and also the virtio-ivshmem prototype can be found at

http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2

Accessing the device via UIO is trivial enough. If you want to use it for virtio, this is needed on the virtio console backend side, in addition to the description in patch 3:

modprobe uio_ivshmem
echo "1af4 1110 1af4 1100 ffc003 ff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
linux/tools/virtio/virtio-ivshmem-console /dev/uio0

And for virtio block:

echo "1af4 1110 1af4 1100 ffc002 ff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
linux/tools/virtio/virtio-ivshmem-console /dev/uio0 /path/to/disk.img

After that, you can start the QEMU frontend instance with the virtio-ivshmem driver installed, which can use the new /dev/hvc* or /dev/vda* as usual.

Any feedback welcome!

Jan

PS: Let me know if I missed someone potentially interested in this topic on CC - or if you would like to be dropped from the list.

PPS: The Jailhouse queues are currently out of sync wrt minor details of this one, primarily the device ID. Will update them when the general direction is clear.
[1] https://kvmforum2019.sched.com/event/TmxI

Jan Kiszka (3):
  hw/misc: Add implementation of ivshmem revision 2 device
  docs/specs: Add specification of ivshmem device revision 2
  contrib: Add server for ivshmem revision 2

 Makefile                                  |    3 +
 Makefile.objs                             |    1 +
 configure                                 |    1 +
 contrib/ivshmem2-server/Makefile.objs     |    1 +
 contrib/ivshmem2-server/ivshmem2-server.c |  462 ++++
 contrib/ivshmem2-server/ivshmem2-server.h |  158 +
 contrib/ivshmem2-server/main.c            |  313 +
 docs/specs/ivshmem-2-device-spec.md       |  333 +
 hw/misc/Makefile.objs                     |    2 +-
 hw/misc/ivshmem2.c                        | 1091 +
 include/hw/misc/ivshmem2.h                |   48 ++
 11 files changed, 2412 insertions(+), 1 deletion(-)
 create mode 100644 contrib/ivshmem2-server/Makefile.objs
 create mode 100644 contrib/ivshmem2-server/ivshmem2-server.c
 create mode 100644 contrib/ivshmem2-server/ivshmem2-server.h
 create mode 100644 contrib/ivshmem2-server/main.c
 create mode 100644 docs/specs/ivshmem-2-device-spec.md
 create mode 100644 hw/misc/ivshmem2.c
 create mode 100644 include/hw/misc/ivshmem2.h

-- 
2.16.4
[RFC][PATCH 2/3] docs/specs: Add specification of ivshmem device revision 2
From: Jan Kiszka

This imports the ivshmem v2 specification draft from Jailhouse. Its final home is to be decided; this shall just simplify the review process at this stage.

Note that specifically the Features register (offset 08h) is still under consideration. In particular, its bit 0 seems useless now, as its benefit to guests, specifically when they want to be portable, is close to zero. Maybe the register should still be kept, with all bits RsvdZ, for easier extensibility.

The rest appears now rather mature and reasonably implementable, as proven also by a version for Jailhouse.

Signed-off-by: Jan Kiszka
---
 docs/specs/ivshmem-2-device-spec.md | 333 ++++++++++++++++++++++++++++
 1 file changed, 333 insertions(+)
 create mode 100644 docs/specs/ivshmem-2-device-spec.md

diff --git a/docs/specs/ivshmem-2-device-spec.md b/docs/specs/ivshmem-2-device-spec.md
new file mode 100644
index 00..98cfde585a
--- /dev/null
+++ b/docs/specs/ivshmem-2-device-spec.md
@@ -0,0 +1,333 @@
+IVSHMEM Device Specification
+============================
+
+** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION!
**
+
+The Inter-VM Shared Memory device provides the following features to its users:
+
+- Interconnection between up to 65536 peers
+
+- Multi-purpose shared memory region
+
+  - common read/writable section
+
+  - unidirectional sections that are read/writable for one peer and only
+    readable for the others
+
+  - section with peer states
+
+- Event signaling via interrupt to remote sides
+
+- Support for life-cycle management via state value exchange and interrupt
+  notification on changes, backed by a shared memory section
+
+- Free choice of protocol to be used on top
+
+  - Protocol type declaration
+
+- Unprivileged access to memory-mapped or I/O registers feasible
+
+- Discoverable and configurable via standard PCI mechanisms
+
+
+Hypervisor Model
+----------------
+
+In order to provide a consistent link between peers, all connected instances of
+IVSHMEM devices need to be configured, created and run by the hypervisor
+according to the following requirements:
+
+- The instances of the device need to be accessible via PCI programming
+  interfaces on all sides.
+
+- The read/write shared memory section has to be of the same size for all
+  peers and, if non-zero, has to reflect the same memory content for them.
+
+- If output sections are present (non-zero section size), there must be one
+  reserved for each peer with exclusive write access. All output sections
+  must have the same size and must be readable for all peers. They have to
+  reflect the same memory content for all peers.
+
+- The State Table must have the same size for all peers, must be large enough
+  to hold the state values of all peers, and must be read-only for the user.
+
+- State register changes (explicit writes, peer resets) have to be propagated
+  to the other peers by updating the corresponding State Table entry and
+  issuing an interrupt to all other peers if they enabled reception.
+
+- Interrupt events triggered by a peer have to be delivered to the target
+  peer, provided the receiving side is valid and has enabled the reception.
+
+- All peers must have the same interrupt delivery features available, i.e.
+  MSI-X with the same maximum number of vectors on platforms supporting this
+  mechanism, otherwise INTx with one vector.
+
+
+Guest-side Programming Model
+----------------------------
+
+An IVSHMEM device appears as a PCI device to its users. Unless otherwise
+noted, it conforms to the PCI Local Bus Specification, Revision 3.0. As such,
+it is discoverable via the PCI configuration space and provides a number of
+standard and custom PCI configuration registers.
+
+### Shared Memory Region Layout
+
+The shared memory region is divided into several sections.
+
+    +-----------------------------+ -
+    |                             | :
+    | Output Section for peer n-1 | : Output Section Size
+    |     (n = Maximum Peers)     | :
+    +-----------------------------+ -
+    :                             :
+    :                             :
+    :                             :
+    +-----------------------------+ -
+    |                             | :
+    | Output Section for peer 1   | : Output Section Size
+    |                             | :
+    +-----------------------------+ -
+    |                             | :
+    | Output Section for peer 0   | : Output Section Size
+    |                             | :
+    +-----------------------------+ -
+    |                             | :
+    | Read/Write Section          | : R/W Section Size
+    |                             | :
+    +-----------------------------+ -
+    |                             | :
+    | State Table                 | : State Table Size
+    |                             | :
+    +-----------------------------+ <-- Shared memory base address
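Given the layout above, the offsets of the individual sections follow directly from the sizes the server communicates: the State Table sits at the base, followed by the read/write section, followed by one output section per peer. A sketch of the arithmetic (the helper names are invented; all sizes are assumed to come from the server's welcome message):

```c
#include <stdint.h>

/* Offset of the common read/write section: right after the State Table. */
static uint64_t ivshmem_rw_offset(uint64_t state_table_size)
{
    return state_table_size;
}

/* Offset of peer 'peer's output section: after the State Table, the R/W
 * section, and the output sections of all lower-numbered peers. */
static uint64_t ivshmem_output_offset(uint64_t state_table_size,
                                      uint64_t rw_size,
                                      uint64_t output_size,
                                      unsigned peer)
{
    return state_table_size + rw_size + (uint64_t)peer * output_size;
}

/* Total size of the shared memory region for 'max_peers' peers. */
static uint64_t ivshmem_total_size(uint64_t state_table_size,
                                   uint64_t rw_size,
                                   uint64_t output_size,
                                   unsigned max_peers)
{
    return state_table_size + rw_size + (uint64_t)output_size * max_peers;
}
```

For example, with a 4 KiB State Table, a 1 MiB R/W section, and 64 KiB output sections, peer 2's output section starts at 4096 + 1048576 + 2 * 65536 bytes from the base.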
[RFC][PATCH 3/3] contrib: Add server for ivshmem revision 2
From: Jan Kiszka This implements the server process for ivshmem v2 device models of QEMU. Again, no effort has been spent yet on sharing code with the v1 server. Parts have been copied, others were rewritten. In addition to parameters of v1, this server now also specifies - the maximum number of peers to be connected (required to know in advance because of v2's state table) - the size of the output sections (can be 0) - the protocol ID to be published to all peers When a virtio protocol ID is chosen, only 2 peers can be connected. Furthermore, the server will signal the backend variant of the ID to the master instance and the frontend ID to the slave peer. To start, e.g., a server that allows virtio console over ivshmem, call ivshmem2-server -F -l 64K -n 2 -V 3 -P 0x8003 TODO: specify the new server protocol. Signed-off-by: Jan Kiszka --- Makefile | 3 + Makefile.objs | 1 + configure | 1 + contrib/ivshmem2-server/Makefile.objs | 1 + contrib/ivshmem2-server/ivshmem2-server.c | 462 ++ contrib/ivshmem2-server/ivshmem2-server.h | 158 ++ contrib/ivshmem2-server/main.c| 313 7 files changed, 939 insertions(+) create mode 100644 contrib/ivshmem2-server/Makefile.objs create mode 100644 contrib/ivshmem2-server/ivshmem2-server.c create mode 100644 contrib/ivshmem2-server/ivshmem2-server.h create mode 100644 contrib/ivshmem2-server/main.c diff --git a/Makefile b/Makefile index aa9d1a42aa..85b5e985ac 100644 --- a/Makefile +++ b/Makefile @@ -430,6 +430,7 @@ dummy := $(call unnest-vars,, \ elf2dmp-obj-y \ ivshmem-client-obj-y \ ivshmem-server-obj-y \ +ivshmem2-server-obj-y \ rdmacm-mux-obj-y \ libvhost-user-obj-y \ vhost-user-scsi-obj-y \ @@ -662,6 +663,8 @@ ivshmem-client$(EXESUF): $(ivshmem-client-obj-y) $(COMMON_LDADDS) $(call LINK, $^) ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) $(COMMON_LDADDS) $(call LINK, $^) +ivshmem2-server$(EXESUF): $(ivshmem2-server-obj-y) $(COMMON_LDADDS) + $(call LINK, $^) endif vhost-user-scsi$(EXESUF): $(vhost-user-scsi-obj-y) libvhost-user.a 
$(call LINK, $^) diff --git a/Makefile.objs b/Makefile.objs index 11ba1a36bd..a4a729d808 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -117,6 +117,7 @@ qga-vss-dll-obj-y = qga/ elf2dmp-obj-y = contrib/elf2dmp/ ivshmem-client-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem-client/ ivshmem-server-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem-server/ +ivshmem2-server-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem2-server/ libvhost-user-obj-y = contrib/libvhost-user/ vhost-user-scsi.o-cflags := $(LIBISCSI_CFLAGS) vhost-user-scsi.o-libs := $(LIBISCSI_LIBS) diff --git a/configure b/configure index efe165edf9..1b466b2461 100755 --- a/configure +++ b/configure @@ -6184,6 +6184,7 @@ if test "$want_tools" = "yes" ; then fi if [ "$ivshmem" = "yes" ]; then tools="ivshmem-client\$(EXESUF) ivshmem-server\$(EXESUF) $tools" +tools="ivshmem2-server\$(EXESUF) $tools" fi if [ "$curl" = "yes" ]; then tools="elf2dmp\$(EXESUF) $tools" diff --git a/contrib/ivshmem2-server/Makefile.objs b/contrib/ivshmem2-server/Makefile.objs new file mode 100644 index 00..d233e18ec8 --- /dev/null +++ b/contrib/ivshmem2-server/Makefile.objs @@ -0,0 +1 @@ +ivshmem2-server-obj-y = ivshmem2-server.o main.o diff --git a/contrib/ivshmem2-server/ivshmem2-server.c b/contrib/ivshmem2-server/ivshmem2-server.c new file mode 100644 index 00..b341f1fcd0 --- /dev/null +++ b/contrib/ivshmem2-server/ivshmem2-server.c @@ -0,0 +1,462 @@ +/* + * Copyright 6WIND S.A., 2014 + * Copyright (c) Siemens AG, 2019 + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * (at your option) any later version. See the COPYING file in the + * top-level directory. + */ +#include "qemu/osdep.h" +#include "qemu/host-utils.h" +#include "qemu/sockets.h" +#include "qemu/atomic.h" + +#include +#include + +#include "ivshmem2-server.h" + +/* log a message on stdout if verbose=1 */ +#define IVSHMEM_SERVER_DEBUG(server, fmt, ...) 
do { \ +if ((server)->args.verbose) { \ +printf(fmt, ## __VA_ARGS__); \ +}\ +} while (0) + +/** maximum size of a huge page, used by ivshmem_server_ftruncate() */ +#define IVSHMEM_SERVER_MAX_HUGEPAGE_SIZE (1024 * 1024 * 1024) + +/** default listen backlog (number of sockets not accepted) */ +#define IVSHMEM_SERVER_LISTEN_BACKLOG 10 + +/* send message to a client unix socket */ +static int ivshmem_server_send_msg(int sock_fd, void *payload, int len, int fd) +{ +in
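The `ivshmem_server_send_msg()` function whose beginning appears above takes a file descriptor parameter alongside the payload — the classic pattern of passing descriptors over a Unix-domain socket via SCM_RIGHTS ancillary data, which is how the server hands shared-memory and eventfd descriptors to its clients. A minimal standalone sketch of that pattern (not the server's actual code; names are invented):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send a payload plus one file descriptor over a Unix-domain socket
 * using SCM_RIGHTS ancillary data; pass fd = -1 to send data only. */
static int send_with_fd(int sock, const void *payload, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)payload, .iov_len = len };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
    char ctrl[CMSG_SPACE(sizeof(int))];

    if (fd >= 0) {
        msg.msg_control = ctrl;
        msg.msg_controllen = sizeof(ctrl);
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    }
    return sendmsg(sock, &msg, 0) == (ssize_t)len ? 0 : -1;
}

/* Round-trip self-check: pass one end of a pipe across a socketpair and
 * write through the received descriptor. Returns 0 on success. */
static int demo_fd_passing(void)
{
    int sv[2], pipefd[2], received_fd = -1;
    char buf[8] = "";

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) || pipe(pipefd))
        return -1;
    if (send_with_fd(sv[0], "hi", 2, pipefd[1]))
        return -1;

    struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
    char ctrl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl, .msg_controllen = sizeof(ctrl) };
    if (recvmsg(sv[1], &msg, 0) != 2)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&received_fd, CMSG_DATA(cmsg), sizeof(int));

    /* The received descriptor refers to the same pipe write end. */
    if (write(received_fd, "x", 1) != 1 || read(pipefd[0], buf, 1) != 1)
        return -1;
    return buf[0] == 'x' ? 0 : -1;
}
```

The kernel duplicates the descriptor into the receiving process, so the received fd number generally differs from the sent one while referring to the same open file description.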
Re: [PATCH] MAINTAINERS: slirp: Remove myself as maintainer
On 27.07.19 12:00, Marc-André Lureau wrote: Hi On Sat, Jul 27, 2019 at 10:16 AM Jan Kiszka wrote: From: Jan Kiszka I haven't been doing anything here for a long time, nor does my git repo still play a role. Signed-off-by: Jan Kiszka too bad, we could still use some help ;) thanks anyway! Reviewed-by: Marc-André Lureau --- MAINTAINERS | 2 -- 1 file changed, 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index d6de200453..238feb965f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2089,13 +2089,11 @@ F: include/hw/registerfields.h SLIRP M: Samuel Thibault -M: Jan Kiszka S: Maintained F: slirp/ F: net/slirp.c F: include/net/slirp.h T: git https://people.debian.org/~sthibault/qemu.git slirp -T: git git://git.kiszka.org/qemu.git queues/slirp Stubs M: Paolo Bonzini -- 2.16.4 May I point out that this one was never merged? Sorry, I really can't help in this area anymore. Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [PATCH v2] ivshmem-server: Terminate also on SIGINT
On 03.08.19 15:22, Jan Kiszka wrote: From: Jan Kiszka Allows to shutdown a foreground session via ctrl-c. Signed-off-by: Jan Kiszka --- Changes in v2: - adjust error message contrib/ivshmem-server/main.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/contrib/ivshmem-server/main.c b/contrib/ivshmem-server/main.c index 197c79c57e..e4cd35f74c 100644 --- a/contrib/ivshmem-server/main.c +++ b/contrib/ivshmem-server/main.c @@ -223,8 +223,9 @@ main(int argc, char *argv[]) sa_quit.sa_handler = ivshmem_server_quit_cb; sa_quit.sa_flags = 0; if (sigemptyset(_quit.sa_mask) == -1 || -sigaction(SIGTERM, _quit, 0) == -1) { -perror("failed to add SIGTERM handler; sigaction"); +sigaction(SIGTERM, _quit, 0) == -1 || +sigaction(SIGINT, _quit, 0) == -1) { +perror("failed to add signal handler; sigaction"); goto err; } -- 2.16.4 ...and this one for you as well, Markus? Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [PATCH v2] ivshmem-server: Clean up shmem on shutdown
On 06.08.19 15:01, Claudio Fontana wrote:
> On 8/5/19 7:54 AM, Jan Kiszka wrote:
>> From: Jan Kiszka
>>
>> So far, the server leaves the posix shared memory object behind when
>> terminating, requiring the user to explicitly remove it in order to
>> start a new instance.
>>
>> Signed-off-by: Jan Kiszka
>> ---
>>
>> Changes in v2:
>>  - respect use_shm_open
>>  - also clean up in ivshmem_server_start error path
>>
>>  contrib/ivshmem-server/ivshmem-server.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/contrib/ivshmem-server/ivshmem-server.c b/contrib/ivshmem-server/ivshmem-server.c
>> index 77f97b209c..88daee812d 100644
>> --- a/contrib/ivshmem-server/ivshmem-server.c
>> +++ b/contrib/ivshmem-server/ivshmem-server.c
>> @@ -353,6 +353,9 @@ ivshmem_server_start(IvshmemServer *server)
>>  err_close_sock:
>>      close(sock_fd);
>>  err_close_shm:
>> +    if (server->use_shm_open) {
>> +        shm_unlink(server->shm_path);
>> +    }
>>      close(shm_fd);
>>      return -1;
>>  }
>> @@ -370,6 +373,9 @@ ivshmem_server_close(IvshmemServer *server)
>>      }
>>
>>      unlink(server->unix_sock_path);
>> +    if (server->use_shm_open) {
>> +        shm_unlink(server->shm_path);
>> +    }
>>      close(server->sock_fd);
>>      close(server->shm_fd);
>>      server->sock_fd = -1;
>> --
>> 2.16.4
>
> Reviewed-by: Claudio Fontana

Markus, would you take this?

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH] i386/vmmouse: Properly reset state
On 29.08.19 20:00, Philippe Mathieu-Daudé wrote: Hi Jan, On 8/27/19 9:49 PM, Eduardo Habkost wrote: On Sun, Aug 25, 2019 at 04:58:18PM +0200, Jan Kiszka wrote: On 21.07.19 10:58, Jan Kiszka wrote: From: Jan Kiszka nb_queue was not zeroed so that we no longer delivered events if a previous guest left the device in an overflow state. The state of absolute does not matter as the next vmmouse_update_handler call will align it again. Signed-off-by: Jan Kiszka --- hw/i386/vmmouse.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/i386/vmmouse.c b/hw/i386/vmmouse.c index 5d2d278be4..e335bd07da 100644 --- a/hw/i386/vmmouse.c +++ b/hw/i386/vmmouse.c @@ -257,6 +257,7 @@ static void vmmouse_reset(DeviceState *d) VMMouseState *s = VMMOUSE(d); s->queue_size = VMMOUSE_QUEUE_SIZE; +s->nb_queue = 0; Don't we also need to reset the status in case vmmouse_get_status() is called directly after reset? s->status = 0; Thanks for checking. We call vmmouse_disable() here, and that sets status to 0x anyway. Jan With it: Reviewed-by: Philippe Mathieu-Daudé vmmouse_disable(s); } -- 2.16.4 Ping - or who is looking after this? Despite being in hw/i386, I think we can say vmmouse.c doesn't have a maintainer. Last time someone changed vmmouse.c in a meaningful way (not just adapting to API changes or removing duplicate code) was in 2012. Well it does has a few: $ ./scripts/get_maintainer.pl -f hw/i386/vmmouse.c "Michael S. Tsirkin" (supporter:PC) Marcel Apfelbaum (supporter:PC) Paolo Bonzini (maintainer:X86 TCG CPUs) Richard Henderson (maintainer:X86 TCG CPUs) Eduardo Habkost (maintainer:X86 TCG CPUs) However the correct section should rather be "PC Chipset". But the change makes sense to me, so: Reviewed-by: Eduardo Habkost I'll queue it.
Re: [Qemu-devel] [PATCH] i386/vmmouse: Properly reset state
On 21.07.19 10:58, Jan Kiszka wrote:
> From: Jan Kiszka
>
> nb_queue was not zeroed so that we no longer delivered events if a
> previous guest left the device in an overflow state.
>
> The state of absolute does not matter as the next vmmouse_update_handler
> call will align it again.
>
> Signed-off-by: Jan Kiszka
> ---
>  hw/i386/vmmouse.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/i386/vmmouse.c b/hw/i386/vmmouse.c
> index 5d2d278be4..e335bd07da 100644
> --- a/hw/i386/vmmouse.c
> +++ b/hw/i386/vmmouse.c
> @@ -257,6 +257,7 @@ static void vmmouse_reset(DeviceState *d)
>      VMMouseState *s = VMMOUSE(d);
>
>      s->queue_size = VMMOUSE_QUEUE_SIZE;
> +    s->nb_queue = 0;
>
>      vmmouse_disable(s);
>  }
> --
> 2.16.4

Ping - or who is looking after this?

Jan
[Qemu-devel] [PATCH] kvm: vmxcap: Enhance with latest features
Based on SDM from May 2019.

Signed-off-by: Jan Kiszka
---
 scripts/kvm/vmxcap | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/scripts/kvm/vmxcap b/scripts/kvm/vmxcap
index 99a8146aaa..d8c7d6dfb8 100755
--- a/scripts/kvm/vmxcap
+++ b/scripts/kvm/vmxcap
@@ -178,7 +178,11 @@ controls = [
             19: 'Conceal non-root operation from PT',
             20: 'Enable XSAVES/XRSTORS',
             22: 'Mode-based execute control (XS/XU)',
+            23: 'Sub-page write permissions',
+            24: 'GPA translation for PT',
             25: 'TSC scaling',
+            26: 'User wait and pause',
+            28: 'ENCLV exiting',
         },
         cap_msr = MSR_IA32_VMX_PROCBASED_CTLS2,
     ),
@@ -197,6 +201,7 @@ controls = [
             22: 'Save VMX-preemption timer value',
             23: 'Clear IA32_BNDCFGS',
             24: 'Conceal VM exits from PT',
+            25: 'Clear IA32_RTIT_CTL',
         },
         cap_msr = MSR_IA32_VMX_EXIT_CTLS,
         true_cap_msr = MSR_IA32_VMX_TRUE_EXIT_CTLS,
@@ -214,6 +219,7 @@ controls = [
             15: 'Load IA32_EFER',
             16: 'Load IA32_BNDCFGS',
             17: 'Conceal VM entries from PT',
+            18: 'Load IA32_RTIT_CTL',
         },
         cap_msr = MSR_IA32_VMX_ENTRY_CTLS,
         true_cap_msr = MSR_IA32_VMX_TRUE_ENTRY_CTLS,
@@ -227,6 +233,7 @@ controls = [
             6: 'HLT activity state',
             7: 'Shutdown activity state',
             8: 'Wait-for-SIPI activity state',
+            14: 'PT in VMX operation',
             15: 'IA32_SMBASE support',
             (16,24): 'Number of CR3-target values',
             (25,27): 'MSR-load/store count recommendation',
@@ -249,6 +256,7 @@ controls = [
             17: '1GB EPT pages',
             20: 'INVEPT supported',
             21: 'EPT accessed and dirty flags',
+            22: 'Advanced VM-exit information for EPT violations',
             25: 'Single-context INVEPT',
             26: 'All-context INVEPT',
             32: 'INVVPID supported',
--
2.16.4
Re: [Qemu-devel] [PATCH v2] ivshmem-server: Terminate also on SIGINT
On 05.08.19 10:33, Stefano Garzarella wrote:
> On Sat, Aug 03, 2019 at 03:22:04PM +0200, Jan Kiszka wrote:
>> From: Jan Kiszka
>>
>> Allows to shutdown a foreground session via ctrl-c.
>>
>> Signed-off-by: Jan Kiszka
>> ---
>>
>> Changes in v2:
>>  - adjust error message
>>
>>  contrib/ivshmem-server/main.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/contrib/ivshmem-server/main.c b/contrib/ivshmem-server/main.c
>> index 197c79c57e..e4cd35f74c 100644
>> --- a/contrib/ivshmem-server/main.c
>> +++ b/contrib/ivshmem-server/main.c
>> @@ -223,8 +223,9 @@ main(int argc, char *argv[])
>>      sa_quit.sa_handler = ivshmem_server_quit_cb;
>>      sa_quit.sa_flags = 0;
>>      if (sigemptyset(&sa_quit.sa_mask) == -1 ||
>> -        sigaction(SIGTERM, &sa_quit, 0) == -1) {
>> -        perror("failed to add SIGTERM handler; sigaction");
>> +        sigaction(SIGTERM, &sa_quit, 0) == -1 ||
>> +        sigaction(SIGINT, &sa_quit, 0) == -1) {
>> +        perror("failed to add signal handler; sigaction");
>>          goto err;
>>      }
>
> Reviewed-by: Stefano Garzarella

Thanks.

> Not related with this patch, but since I was looking at the code,
> I noticed the 'ivshmem_server_quit' variable, set in the signal handler,
> is not volatile.
> Should we define it volatile to avoid possible compiler optimizations?

Yes, would be better. I'm sure there is more, possibly also on the
current device model side. I only started to dig through it while
implementing a new revision of the protocol on top and aside of it.

Jan
[Qemu-devel] [PATCH v2] ivshmem-server: Clean up shmem on shutdown
From: Jan Kiszka

So far, the server leaves the POSIX shared memory object behind when
terminating, requiring the user to explicitly remove it in order to
start a new instance.

Signed-off-by: Jan Kiszka
---

Changes in v2:
 - respect use_shm_open
 - also clean up in ivshmem_server_start error path

 contrib/ivshmem-server/ivshmem-server.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/contrib/ivshmem-server/ivshmem-server.c b/contrib/ivshmem-server/ivshmem-server.c
index 77f97b209c..88daee812d 100644
--- a/contrib/ivshmem-server/ivshmem-server.c
+++ b/contrib/ivshmem-server/ivshmem-server.c
@@ -353,6 +353,9 @@ ivshmem_server_start(IvshmemServer *server)
 err_close_sock:
     close(sock_fd);
 err_close_shm:
+    if (server->use_shm_open) {
+        shm_unlink(server->shm_path);
+    }
     close(shm_fd);
     return -1;
 }
@@ -370,6 +373,9 @@ ivshmem_server_close(IvshmemServer *server)
     }

     unlink(server->unix_sock_path);
+    if (server->use_shm_open) {
+        shm_unlink(server->shm_path);
+    }
     close(server->sock_fd);
     close(server->shm_fd);
     server->sock_fd = -1;
--
2.16.4
[Qemu-devel] [PATCH v2] ivshmem-server: Terminate also on SIGINT
From: Jan Kiszka

Allows shutting down a foreground session via ctrl-c.

Signed-off-by: Jan Kiszka
---

Changes in v2:
 - adjust error message

 contrib/ivshmem-server/main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/contrib/ivshmem-server/main.c b/contrib/ivshmem-server/main.c
index 197c79c57e..e4cd35f74c 100644
--- a/contrib/ivshmem-server/main.c
+++ b/contrib/ivshmem-server/main.c
@@ -223,8 +223,9 @@ main(int argc, char *argv[])
     sa_quit.sa_handler = ivshmem_server_quit_cb;
     sa_quit.sa_flags = 0;
     if (sigemptyset(&sa_quit.sa_mask) == -1 ||
-        sigaction(SIGTERM, &sa_quit, 0) == -1) {
-        perror("failed to add SIGTERM handler; sigaction");
+        sigaction(SIGTERM, &sa_quit, 0) == -1 ||
+        sigaction(SIGINT, &sa_quit, 0) == -1) {
+        perror("failed to add signal handler; sigaction");
         goto err;
     }
--
2.16.4
[Qemu-devel] [PATCH] ivshmem-server: Terminate also on SIGINT
From: Jan Kiszka

Allows shutting down a foreground session via ctrl-c.

Signed-off-by: Jan Kiszka
---
 contrib/ivshmem-server/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/ivshmem-server/main.c b/contrib/ivshmem-server/main.c
index 197c79c57e..8a81cdb04c 100644
--- a/contrib/ivshmem-server/main.c
+++ b/contrib/ivshmem-server/main.c
@@ -223,7 +223,8 @@ main(int argc, char *argv[])
     sa_quit.sa_handler = ivshmem_server_quit_cb;
     sa_quit.sa_flags = 0;
     if (sigemptyset(&sa_quit.sa_mask) == -1 ||
-        sigaction(SIGTERM, &sa_quit, 0) == -1) {
+        sigaction(SIGTERM, &sa_quit, 0) == -1 ||
+        sigaction(SIGINT, &sa_quit, 0) == -1) {
         perror("failed to add SIGTERM handler; sigaction");
         goto err;
     }
--
2.16.4
[Qemu-devel] [PATCH] ivshmem-server: Clean up shmem on shutdown
From: Jan Kiszka

So far, the server leaves the POSIX shared memory object behind when
terminating, requiring the user to explicitly remove it in order to
start a new instance.

Signed-off-by: Jan Kiszka
---
 contrib/ivshmem-server/ivshmem-server.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/ivshmem-server/ivshmem-server.c b/contrib/ivshmem-server/ivshmem-server.c
index 77f97b209c..9b9dbc87ec 100644
--- a/contrib/ivshmem-server/ivshmem-server.c
+++ b/contrib/ivshmem-server/ivshmem-server.c
@@ -370,6 +370,7 @@ ivshmem_server_close(IvshmemServer *server)
     }

     unlink(server->unix_sock_path);
+    shm_unlink(server->shm_path);
     close(server->sock_fd);
     close(server->shm_fd);
     server->sock_fd = -1;
--
2.16.4
[Qemu-devel] [PATCH] MAINTAINERS: slirp: Remove myself as maintainer
From: Jan Kiszka

I haven't been doing anything here for a long time, nor does my git
repo still play a role.

Signed-off-by: Jan Kiszka
---
 MAINTAINERS | 2 --
 1 file changed, 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d6de200453..238feb965f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2089,13 +2089,11 @@ F: include/hw/registerfields.h

 SLIRP
 M: Samuel Thibault
-M: Jan Kiszka
 S: Maintained
 F: slirp/
 F: net/slirp.c
 F: include/net/slirp.h
 T: git https://people.debian.org/~sthibault/qemu.git slirp
-T: git git://git.kiszka.org/qemu.git queues/slirp

 Stubs
 M: Paolo Bonzini
--
2.16.4
Re: [Qemu-devel] [PATCH] i386/kvm: Do not sync nested state during runtime
On 22.07.19 12:31, Liran Alon wrote: >> On 22 Jul 2019, at 13:20, Jan Kiszka wrote: >> >> On 22.07.19 11:44, Liran Alon wrote: >>> >>> >>>> On 22 Jul 2019, at 7:00, Jan Kiszka wrote: >>>> >>>> Writing the nested state e.g. after a vmport access can invalidate >>>> important parts of the kernel-internal state, and it is not needed as >>>> well. So leave this out from KVM_PUT_RUNTIME_STATE. >>>> >>>> Suggested-by: Paolo Bonzini >>>> Signed-off-by: Jan Kiszka >>> >>> As QEMU never modifies vCPU nested-state in userspace besides in vmload and >>> vCPU creation, >>> shouldn’t this be under KVM_PUT_FULL_STATE? Same as the call to >>> kvm_arch_set_tsc_khz(). >> >> Reset is a relevant modification as well. If we do not write back a state >> that >> is disabling virtualization, we break. >> >> Jan > > Currently QEMU writes to userspace maintained nested-state only at > kvm_arch_init_vcpu() and > when loading vmstate_nested_state vmstate subsection. > kvm_arch_reset_vcpu() do not modify userspace maintained nested-state. Hmm, then we probably achieve that effect by clearing the related bit in CR4. If doing that implies an invalidation of the nested state by KVM, we are fine. Otherwise, I would expect userspace to reset the state to VMCLEAR and purge any traces of prior use. > > I would expect KVM internal nested-state to be reset as part of KVM’s > vmx_vcpu_reset(). vmx_vcpu_reset is not issued on userspace initiated VM reset, only on INIT/SIPI cycles. It's misleading, and I remember myself running into that when I hacked on KVM back then. Jan > Are we missing a call to vmx_leave_nested() there? > > -Liran > -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH] i386/kvm: Do not sync nested state during runtime
On 22.07.19 11:44, Liran Alon wrote:
>
>> On 22 Jul 2019, at 7:00, Jan Kiszka wrote:
>>
>> Writing the nested state e.g. after a vmport access can invalidate
>> important parts of the kernel-internal state, and it is not needed as
>> well. So leave this out from KVM_PUT_RUNTIME_STATE.
>>
>> Suggested-by: Paolo Bonzini
>> Signed-off-by: Jan Kiszka
>
> As QEMU never modifies vCPU nested-state in userspace besides in vmload
> and vCPU creation, shouldn’t this be under KVM_PUT_FULL_STATE? Same as
> the call to kvm_arch_set_tsc_khz().

Reset is a relevant modification as well. If we do not write back a state
that is disabling virtualization, we break.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH] i386/kvm: Do not sync nested state during runtime
Writing the nested state e.g. after a vmport access can invalidate
important parts of the kernel-internal state, nor is it needed there.
So leave this out from KVM_PUT_RUNTIME_STATE.

Suggested-by: Paolo Bonzini
Signed-off-by: Jan Kiszka
---
 target/i386/kvm.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index ec7870c6af..da98b2cbca 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -3577,12 +3577,12 @@ int kvm_arch_put_registers(CPUState *cpu, int level)

     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));

-    ret = kvm_put_nested_state(x86_cpu);
-    if (ret < 0) {
-        return ret;
-    }
-
     if (level >= KVM_PUT_RESET_STATE) {
+        ret = kvm_put_nested_state(x86_cpu);
+        if (ret < 0) {
+            return ret;
+        }
+
         ret = kvm_put_msr_feature_control(x86_cpu);
         if (ret < 0) {
             return ret;
--
2.16.4
[Qemu-devel] [PATCH] i386/vmmouse: Properly reset state
From: Jan Kiszka

nb_queue was not zeroed so that we no longer delivered events if a
previous guest left the device in an overflow state.

The state of absolute does not matter as the next vmmouse_update_handler
call will align it again.

Signed-off-by: Jan Kiszka
---
 hw/i386/vmmouse.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/i386/vmmouse.c b/hw/i386/vmmouse.c
index 5d2d278be4..e335bd07da 100644
--- a/hw/i386/vmmouse.c
+++ b/hw/i386/vmmouse.c
@@ -257,6 +257,7 @@ static void vmmouse_reset(DeviceState *d)
     VMMouseState *s = VMMOUSE(d);

     s->queue_size = VMMOUSE_QUEUE_SIZE;
+    s->nb_queue = 0;

     vmmouse_disable(s);
 }
--
2.16.4
Re: [Qemu-devel] [PATCH] ioapic: kvm: Skip route updates for masked pins
On 03.06.19 02:36, Michael S. Tsirkin wrote:
> On Sun, Jun 02, 2019 at 01:42:13PM +0200, Jan Kiszka wrote:
>> From: Jan Kiszka
>>
>> Masked entries will not generate interrupt messages, thus do no need to
>> be routed by KVM. This is a cosmetic cleanup, just avoiding warnings of
>> the kind
>>
>> qemu-system-x86_64: vtd_irte_get: detected non-present IRTE (index=0,
>> high=0xff00, low=0x100)
>>
>> if the masked entry happens to reference a non-present IRTE.
>>
>> Signed-off-by: Jan Kiszka
>
> Reviewed-by: Michael S. Tsirkin
>
>> ---
>>  hw/intc/ioapic.c | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
>> index 7074489fdf..2fb288a22d 100644
>> --- a/hw/intc/ioapic.c
>> +++ b/hw/intc/ioapic.c
>> @@ -197,9 +197,11 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
>>          MSIMessage msg;
>>          struct ioapic_entry_info info;
>>          ioapic_entry_parse(s->ioredtbl[i], &info);
>> -        msg.address = info.addr;
>> -        msg.data = info.data;
>> -        kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
>> +        if (!info.masked) {
>> +            msg.address = info.addr;
>> +            msg.data = info.data;
>> +            kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
>> +        }
>>      }
>>      kvm_irqchip_commit_routes(kvm_state);
>>  }
>> --
>> 2.16.4
>
Ping. Or is this queued for 4.2?

Jan
Re: [Qemu-devel] [PULL 22/25] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities
On 10.07.19 18:34, Paolo Bonzini wrote: > On 10/07/19 18:08, Jan Kiszka wrote: >> On 10.07.19 16:40, Paolo Bonzini wrote: >>> On 08/07/19 20:31, Jan Kiszka wrote: >>>>> -if (cpu_has_nested_virt(env) && !env->nested_state) { >>>>> +if (kvm_enabled() && cpu_has_vmx(env) && !env->nested_state) { >>>>> error_report("Guest enabled nested virtualization but kernel " >>>>> "does not support saving of nested state"); >>>>> return -EINVAL; >>>>> >>>> Starting with this commit latest (bisection issue...), running Jailhouse >>>> in a >>>> guest first stalls L1 (looks like we lose interrupts), and if I try to >>>> reset >>>> that VM, I lose my host as well: >>> >>> If 5.2 is still broken, can you gather a dump of KVM_SET_NESTED_STATE's >>> payload? >> >> savevm or is there a dump function in QEMU or the kernel? > > There is qemu_hexdump. > Here are two states taken during the vmport loop: 2: nested_state: : 01 00 00 00 80 10 00 00 00 00 24 3a 00 00 00 00 ..$: 2: nested_state: 0010: 00 10 24 3a 00 00 00 00 00 00 00 00 00 00 00 00 ..$: 2: nested_state: 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0080: d0 7e e5 11 00 00 00 00 01 00 00 00 00 00 00 00 .~.. 
2: nested_state: 0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 00a0: 00 00 00 00 00 00 00 00 00 70 25 3a 00 00 00 00 .p%: 2: nested_state: 00b0: 00 80 25 3a 00 00 00 00 00 f0 00 3a 00 00 00 00 ..%:...: 2: nested_state: 00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 00e0: 00 00 00 00 00 00 00 00 00 50 01 3a 00 00 00 00 .P.: 2: nested_state: 00f0: 00 00 00 00 00 00 00 00 1e e0 22 3a 00 00 00 00 ..": 2: nested_state: 0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0120: 00 00 00 00 00 00 00 00 10 00 00 00 01 00 00 00 2: nested_state: 0130: ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 2: nested_state: 0140: 06 01 07 00 06 01 07 00 01 0d 00 00 00 00 00 00 2: nested_state: 0150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0180: 06 01 07 00 00 00 00 00 00 05 00 00 00 00 00 00 2: nested_state: 0190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 01a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 01b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 01c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 01d0: 00 00 00 00 00 00 00 00 30 00 00 60 ff ff ff ff 0..` 2: nested_state: 01e0: 00 00 00 00 00 00 00 00 33 00 05 80 00 00 00 00 3... 2: nested_state: 01f0: a0 26 04 00 00 00 00 00 00 00 00 00 00 00 00 00 .&.. 2: nested_state: 0200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0220: 00 00 00 00 00 00 00 00 33 00 05 80 00 00 00 00 3... 2: nested_state: 0230: 00 20 6f 32 00 00 00 00 a0 26 04 00 00 00 00 00 . o2.&.. 
2: nested_state: 0240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0260: 00 97 c4 ec b4 7f 00 00 00 00 00 00 00 00 00 00 2: nested_state: 0270: 00 00 00 00 00 00 00 00 00 90 06 00 00 fe ff ff 2: nested_state: 0280: 00 70 06 00 00 fe ff ff 00 00 00
Re: [Qemu-devel] [PULL 22/25] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities
On 10.07.19 16:40, Paolo Bonzini wrote: > On 08/07/19 20:31, Jan Kiszka wrote: >>> -if (cpu_has_nested_virt(env) && !env->nested_state) { >>> +if (kvm_enabled() && cpu_has_vmx(env) && !env->nested_state) { >>> error_report("Guest enabled nested virtualization but kernel " >>> "does not support saving of nested state"); >>> return -EINVAL; >>> >> Starting with this commit latest (bisection issue...), running Jailhouse in a >> guest first stalls L1 (looks like we lose interrupts), and if I try to reset >> that VM, I lose my host as well: > > If 5.2 is still broken, can you gather a dump of KVM_SET_NESTED_STATE's > payload? savevm or is there a dump function in QEMU or the kernel? Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PULL 22/25] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities
On 08.07.19 20:31, Jan Kiszka wrote: > > On 21.06.19 13:30, Paolo Bonzini wrote: >> From: Liran Alon >> >> Previous commits have added support for migration of nested virtualization >> workloads. This was done by utilising two new KVM capabilities: >> KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD. Both which are >> required in order to correctly migrate such workloads. >> >> Therefore, change code to add a migration blocker for vCPUs exposed with >> Intel VMX or AMD SVM in case one of these kernel capabilities is >> missing. >> >> Signed-off-by: Liran Alon >> Reviewed-by: Maran Wilson >> Message-Id: <20190619162140.133674-11-liran.a...@oracle.com> >> Signed-off-by: Paolo Bonzini >> --- >> target/i386/kvm.c | 9 +++-- >> target/i386/machine.c | 2 +- >> 2 files changed, 8 insertions(+), 3 deletions(-) >> >> diff --git a/target/i386/kvm.c b/target/i386/kvm.c >> index c931e9d..e4b4f57 100644 >> --- a/target/i386/kvm.c >> +++ b/target/i386/kvm.c >> @@ -1640,9 +1640,14 @@ int kvm_arch_init_vcpu(CPUState *cs) >>!!(c->ecx & CPUID_EXT_SMX); >> } >> >> -if (cpu_has_nested_virt(env) && !nested_virt_mig_blocker) { >> +if (cpu_has_vmx(env) && !nested_virt_mig_blocker && >> +((kvm_max_nested_state_length() <= 0) || !has_exception_payload)) { >> error_setg(_virt_mig_blocker, >> - "Nested virtualization does not support live migration >> yet"); >> + "Kernel do not provide required capabilities for " >> + "nested virtualization migration. 
" >> + "(CAP_NESTED_STATE=%d, CAP_EXCEPTION_PAYLOAD=%d)", >> + kvm_max_nested_state_length() > 0, >> + has_exception_payload); >> r = migrate_add_blocker(nested_virt_mig_blocker, _err); >> if (local_err) { >> error_report_err(local_err); >> diff --git a/target/i386/machine.c b/target/i386/machine.c >> index fc49e5a..851b249 100644 >> --- a/target/i386/machine.c >> +++ b/target/i386/machine.c >> @@ -233,7 +233,7 @@ static int cpu_pre_save(void *opaque) >> >> #ifdef CONFIG_KVM >> /* Verify we have nested virtualization state from kernel if required */ >> -if (cpu_has_nested_virt(env) && !env->nested_state) { >> +if (kvm_enabled() && cpu_has_vmx(env) && !env->nested_state) { >> error_report("Guest enabled nested virtualization but kernel " >> "does not support saving of nested state"); >> return -EINVAL; >> > > Starting with this commit latest (bisection issue...), running Jailhouse in a > guest first stalls L1 (looks like we lose interrupts), and if I try to reset > that VM, I lose my host as well: > > kvm: vmptrld (null)/6eb9 failed > kvm: vmclear fail: (null)/6eb9 > > and then things start to lock up because we seem to lose the CPUs the guest > was > running on. Once I had this in the logs: > > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 15040 > jiffies s: 4673 root: 0x80/. > rcu: blocking rcu_node structures: > Task dump for CPU 7: > qemu-system-x86 R running task0 17413 17345 0x0008 > Call Trace: > ? x86_virt_spec_ctrl+0x7/0xe0 > ? vmx_vcpu_run.part.0+0x2a4/0xfa0 [kvm_intel] > ? vcpu_enter_guest+0x349/0xe80 [kvm] > ? kvm_arch_vcpu_ioctl_run+0xff/0x550 [kvm] > ? kvm_vcpu_ioctl+0x20d/0x590 [kvm] > ? get_futex_key+0x35d/0x3b0 > ? do_vfs_ioctl+0x447/0x640 > ? do_futex+0x157/0x1d0 > ? ksys_ioctl+0x5e/0x90 > ? __x64_sys_ioctl+0x16/0x20 > ? do_syscall_64+0x60/0x120 > ? entry_SYSCALL_64_after_hwframe+0x49/0xbe > > This was on a 5.1.16 distro kernel. Currently rebuilding 5.2 vanilla. > > Looks like we have up to two critical bugs here... 
> > Jan > It turns out it's actually patch 20 that introduces the problem. Maybe it is only the kernel, but I rather suspect a combination of userspace not providing the right state (specifically on reset) and the kernel accepting that. Continuing the search on 5.2 now. Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PULL 22/25] target/i386: kvm: Add nested migration blocker only when kernel lacks required capabilities
On 21.06.19 13:30, Paolo Bonzini wrote: > From: Liran Alon > > Previous commits have added support for migration of nested virtualization > workloads. This was done by utilising two new KVM capabilities: > KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD. Both which are > required in order to correctly migrate such workloads. > > Therefore, change code to add a migration blocker for vCPUs exposed with > Intel VMX or AMD SVM in case one of these kernel capabilities is > missing. > > Signed-off-by: Liran Alon > Reviewed-by: Maran Wilson > Message-Id: <20190619162140.133674-11-liran.a...@oracle.com> > Signed-off-by: Paolo Bonzini > --- > target/i386/kvm.c | 9 +++-- > target/i386/machine.c | 2 +- > 2 files changed, 8 insertions(+), 3 deletions(-) > > diff --git a/target/i386/kvm.c b/target/i386/kvm.c > index c931e9d..e4b4f57 100644 > --- a/target/i386/kvm.c > +++ b/target/i386/kvm.c > @@ -1640,9 +1640,14 @@ int kvm_arch_init_vcpu(CPUState *cs) >!!(c->ecx & CPUID_EXT_SMX); > } > > -if (cpu_has_nested_virt(env) && !nested_virt_mig_blocker) { > +if (cpu_has_vmx(env) && !nested_virt_mig_blocker && > +((kvm_max_nested_state_length() <= 0) || !has_exception_payload)) { > error_setg(_virt_mig_blocker, > - "Nested virtualization does not support live migration > yet"); > + "Kernel do not provide required capabilities for " > + "nested virtualization migration. 
" > + "(CAP_NESTED_STATE=%d, CAP_EXCEPTION_PAYLOAD=%d)", > + kvm_max_nested_state_length() > 0, > + has_exception_payload); > r = migrate_add_blocker(nested_virt_mig_blocker, _err); > if (local_err) { > error_report_err(local_err); > diff --git a/target/i386/machine.c b/target/i386/machine.c > index fc49e5a..851b249 100644 > --- a/target/i386/machine.c > +++ b/target/i386/machine.c > @@ -233,7 +233,7 @@ static int cpu_pre_save(void *opaque) > > #ifdef CONFIG_KVM > /* Verify we have nested virtualization state from kernel if required */ > -if (cpu_has_nested_virt(env) && !env->nested_state) { > +if (kvm_enabled() && cpu_has_vmx(env) && !env->nested_state) { > error_report("Guest enabled nested virtualization but kernel " > "does not support saving of nested state"); > return -EINVAL; > Starting with this commit latest (bisection issue...), running Jailhouse in a guest first stalls L1 (looks like we lose interrupts), and if I try to reset that VM, I lose my host as well: kvm: vmptrld (null)/6eb9 failed kvm: vmclear fail: (null)/6eb9 and then things start to lock up because we seem to lose the CPUs the guest was running on. Once I had this in the logs: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 15040 jiffies s: 4673 root: 0x80/. rcu: blocking rcu_node structures: Task dump for CPU 7: qemu-system-x86 R running task0 17413 17345 0x0008 Call Trace: ? x86_virt_spec_ctrl+0x7/0xe0 ? vmx_vcpu_run.part.0+0x2a4/0xfa0 [kvm_intel] ? vcpu_enter_guest+0x349/0xe80 [kvm] ? kvm_arch_vcpu_ioctl_run+0xff/0x550 [kvm] ? kvm_vcpu_ioctl+0x20d/0x590 [kvm] ? get_futex_key+0x35d/0x3b0 ? do_vfs_ioctl+0x447/0x640 ? do_futex+0x157/0x1d0 ? ksys_ioctl+0x5e/0x90 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x60/0x120 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe This was on a 5.1.16 distro kernel. Currently rebuilding 5.2 vanilla. Looks like we have up to two critical bugs here... 
Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PULL 17/25] target/i386: kvm: Block migration for vCPUs exposed with nested virtualization
On 21.06.19 13:30, Paolo Bonzini wrote: > From: Liran Alon > > Commit d98f26073beb ("target/i386: kvm: add VMX migration blocker") > added a migration blocker for vCPU exposed with Intel VMX. > However, migration should also be blocked for vCPU exposed with > AMD SVM. > > Both cases should be blocked because QEMU should extract additional > vCPU state from KVM that should be migrated as part of vCPU VMState. > E.g. Whether vCPU is running in guest-mode or host-mode. > > Fixes: d98f26073beb ("target/i386: kvm: add VMX migration blocker") > Reviewed-by: Maran Wilson > Signed-off-by: Liran Alon > Message-Id: <20190619162140.133674-6-liran.a...@oracle.com> > Signed-off-by: Paolo Bonzini > --- > target/i386/cpu.c | 6 -- > target/i386/cpu.h | 12 > target/i386/kvm.c | 14 +++--- > 3 files changed, 19 insertions(+), 13 deletions(-) > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > index c330fd9..61e44cb 100644 > --- a/target/i386/cpu.c > +++ b/target/i386/cpu.c > @@ -5215,12 +5215,6 @@ static int x86_cpu_filter_features(X86CPU *cpu) > return rv; > } > > -#define IS_INTEL_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_INTEL_1 && \ > - (env)->cpuid_vendor2 == CPUID_VENDOR_INTEL_2 && \ > - (env)->cpuid_vendor3 == CPUID_VENDOR_INTEL_3) > -#define IS_AMD_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_AMD_1 && \ > - (env)->cpuid_vendor2 == CPUID_VENDOR_AMD_2 && \ > - (env)->cpuid_vendor3 == CPUID_VENDOR_AMD_3) > static void x86_cpu_realizefn(DeviceState *dev, Error **errp) > { > CPUState *cs = CPU(dev); > diff --git a/target/i386/cpu.h b/target/i386/cpu.h > index 7f48136..bf0c9c2 100644 > --- a/target/i386/cpu.h > +++ b/target/i386/cpu.h > @@ -722,6 +722,13 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS]; > > #define CPUID_VENDOR_HYGON"HygonGenuine" > > +#define IS_INTEL_CPU(env) ((env)->cpuid_vendor1 == CPUID_VENDOR_INTEL_1 && \ > + (env)->cpuid_vendor2 == CPUID_VENDOR_INTEL_2 && \ > + (env)->cpuid_vendor3 == CPUID_VENDOR_INTEL_3) > +#define IS_AMD_CPU(env) 
((env)->cpuid_vendor1 == CPUID_VENDOR_AMD_1 && \ > + (env)->cpuid_vendor2 == CPUID_VENDOR_AMD_2 && \ > + (env)->cpuid_vendor3 == CPUID_VENDOR_AMD_3) > + > #define CPUID_MWAIT_IBE (1U << 1) /* Interrupts can exit capability */ > #define CPUID_MWAIT_EMX (1U << 0) /* enumeration supported */ > > @@ -1848,6 +1855,11 @@ static inline int32_t x86_get_a20_mask(CPUX86State > *env) > } > } > > +static inline bool cpu_has_vmx(CPUX86State *env) > +{ > +return env->features[FEAT_1_ECX] & CPUID_EXT_VMX; > +} > + > /* fpu_helper.c */ > void update_fp_status(CPUX86State *env); > void update_mxcsr_status(CPUX86State *env); > diff --git a/target/i386/kvm.c b/target/i386/kvm.c > index 9864aa0..f9872f1 100644 > --- a/target/i386/kvm.c > +++ b/target/i386/kvm.c > @@ -1299,7 +1299,7 @@ static int hyperv_init_vcpu(X86CPU *cpu) > } > > static Error *invtsc_mig_blocker; > -static Error *vmx_mig_blocker; > +static Error *nested_virt_mig_blocker; > > #define KVM_MAX_CPUID_ENTRIES 100 > > @@ -1597,13 +1597,13 @@ int kvm_arch_init_vcpu(CPUState *cs) >!!(c->ecx & CPUID_EXT_SMX); > } > > -if ((env->features[FEAT_1_ECX] & CPUID_EXT_VMX) && !vmx_mig_blocker) { > -error_setg(_mig_blocker, > - "Nested VMX virtualization does not support live > migration yet"); > -r = migrate_add_blocker(vmx_mig_blocker, _err); > +if (cpu_has_nested_virt(env) && !nested_virt_mig_blocker) { This broke bisection (which I currently have to do because this pull manages to lock up 5.1 host kernels): cpu_has_nested_virt will only come later in this series. 
Jan > +error_setg(&nested_virt_mig_blocker, > + "Nested virtualization does not support live migration > yet"); > +r = migrate_add_blocker(nested_virt_mig_blocker, &local_err); > if (local_err) { > error_report_err(local_err); > -error_free(vmx_mig_blocker); > +error_free(nested_virt_mig_blocker); > return r; > } > } > @@ -1674,7 +1674,7 @@ int kvm_arch_init_vcpu(CPUState *cs) > fail: > migrate_del_blocker(invtsc_mig_blocker); > fail2: > -migrate_del_blocker(vmx_mig_blocker); > +migrate_del_blocker(nested_virt_mig_blocker); > > return r; > } > -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH] hw/arm/virt: Add support for Cortex-A7
From: Jan Kiszka No reason to deny this type. Signed-off-by: Jan Kiszka --- hw/arm/virt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 431e2900fd..ed009fa447 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -176,6 +176,7 @@ static const int a15irqmap[] = { }; static const char *valid_cpus[] = { +ARM_CPU_TYPE_NAME("cortex-a7"), ARM_CPU_TYPE_NAME("cortex-a15"), ARM_CPU_TYPE_NAME("cortex-a53"), ARM_CPU_TYPE_NAME("cortex-a57"), -- 2.16.4
Re: [Qemu-devel] [PATCH] ioapic: kvm: Skip route updates for masked pins
On 02.06.19 14:10, Peter Xu wrote: On Sun, Jun 02, 2019 at 01:42:13PM +0200, Jan Kiszka wrote: From: Jan Kiszka Masked entries will not generate interrupt messages, thus do not need to be routed by KVM. This is a cosmetic cleanup, just avoiding warnings of the kind qemu-system-x86_64: vtd_irte_get: detected non-present IRTE (index=0, high=0xff00, low=0x100) if the masked entry happens to reference a non-present IRTE. Signed-off-by: Jan Kiszka Reviewed-by: Peter Xu Thanks, Jan. The "non-cosmetic" part of clearing those entries (e.g. including when the entries were not set up correctly rather than masked) was never really implemented and that task has been on my todo list for quite a while but with a very low priority (low enough to sink...). I hope I didn't overlook its importance since AFAICT general OSs should hardly trigger those paths and so far I don't see it as a very big issue. I triggered the message during the handover phase from Linux to Jailhouse. That involves reprogramming both IOAPIC and IRTEs - likely unusual sequences, I just didn't find invalid ones. Thanks, Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH] ioapic: kvm: Skip route updates for masked pins
From: Jan Kiszka Masked entries will not generate interrupt messages, thus do not need to be routed by KVM. This is a cosmetic cleanup, just avoiding warnings of the kind qemu-system-x86_64: vtd_irte_get: detected non-present IRTE (index=0, high=0xff00, low=0x100) if the masked entry happens to reference a non-present IRTE. Signed-off-by: Jan Kiszka --- hw/intc/ioapic.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c index 7074489fdf..2fb288a22d 100644 --- a/hw/intc/ioapic.c +++ b/hw/intc/ioapic.c @@ -197,9 +197,11 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s) MSIMessage msg; struct ioapic_entry_info info; ioapic_entry_parse(s->ioredtbl[i], &info); -msg.address = info.addr; -msg.data = info.data; -kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL); +if (!info.masked) { +msg.address = info.addr; +msg.data = info.data; +kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL); +} } kvm_irqchip_commit_routes(kvm_state); } -- 2.16.4
Re: [Qemu-devel] [PATCH v8 05/16] gdbstub: add multiprocess support to vCont packets
On 22.03.19 15:01, Luc Michel wrote: On 3/22/19 2:29 PM, Jan Kiszka wrote: On 07.12.18 10:01, Luc Michel wrote: Add the gdb_first_attached_cpu() and gdb_next_attached_cpu() to iterate over all the CPUs in currently attached processes. Add the gdb_first_cpu_in_process() and gdb_next_cpu_in_process() to iterate over CPUs of a given process. Use them to add multiprocess extension support to vCont packets. Signed-off-by: Luc Michel Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Edgar E. Iglesias Acked-by: Alistair Francis --- gdbstub.c | 117 +++--- 1 file changed, 102 insertions(+), 15 deletions(-) diff --git a/gdbstub.c b/gdbstub.c index 911faa225a..77b3dbb2c8 100644 --- a/gdbstub.c +++ b/gdbstub.c @@ -721,10 +721,40 @@ static CPUState *find_cpu(uint32_t thread_id) } return NULL; } +static CPUState *get_first_cpu_in_process(const GDBState *s, + GDBProcess *process) +{ +CPUState *cpu; + +CPU_FOREACH(cpu) { +if (gdb_get_cpu_pid(s, cpu) == process->pid) { +return cpu; +} +} + +return NULL; +} + +static CPUState *gdb_next_cpu_in_process(const GDBState *s, CPUState *cpu) +{ +uint32_t pid = gdb_get_cpu_pid(s, cpu); +cpu = CPU_NEXT(cpu); + +while (cpu) { +if (gdb_get_cpu_pid(s, cpu) == pid) { +break; +} + +cpu = CPU_NEXT(cpu); +} + +return cpu; +} + static CPUState *gdb_get_cpu(const GDBState *s, uint32_t pid, uint32_t tid) { GDBProcess *process; CPUState *cpu; @@ -750,10 +780,41 @@ static CPUState *gdb_get_cpu(const GDBState *s, uint32_t pid, uint32_t tid) } return cpu; } +/* Return the cpu following @cpu, while ignoring + * unattached processes. 
+ */ +static CPUState *gdb_next_attached_cpu(const GDBState *s, CPUState *cpu) +{ +cpu = CPU_NEXT(cpu); + +while (cpu) { +if (gdb_get_cpu_process(s, cpu)->attached) { +break; +} + +cpu = CPU_NEXT(cpu); +} + +return cpu; +} + +/* Return the first attached cpu */ +static CPUState *gdb_first_attached_cpu(const GDBState *s) +{ +CPUState *cpu = first_cpu; +GDBProcess *process = gdb_get_cpu_process(s, cpu); + +if (!process->attached) { +return gdb_next_attached_cpu(s, cpu); +} + +return cpu; +} + static const char *get_feature_xml(const char *p, const char **newp, CPUClass *cc) { size_t len; int i; @@ -1088,14 +1149,16 @@ static int is_query_packet(const char *p, const char *query, char separator) * returns -ENOTSUP if a command is unsupported, -EINVAL or -ERANGE if there is * a format error, 0 on success. */ static int gdb_handle_vcont(GDBState *s, const char *p) { -int res, idx, signal = 0; +int res, signal = 0; char cur_action; char *newstates; unsigned long tmp; +uint32_t pid, tid; +GDBProcess *process; CPUState *cpu; #ifdef CONFIG_USER_ONLY int max_cpus = 1; /* global variable max_cpus exists only in system mode */ CPU_FOREACH(cpu) { @@ -1134,29 +1197,52 @@ static int gdb_handle_vcont(GDBState *s, const char *p) } else if (cur_action != 'c' && cur_action != 's') { /* unknown/invalid/unsupported command */ res = -ENOTSUP; goto out; } -/* thread specification. 
special values: (none), -1 = all; 0 = any */ -if ((p[0] == ':' && p[1] == '-' && p[2] == '1') || (p[0] != ':')) { -if (*p == ':') { -p += 3; -} -for (idx = 0; idx < max_cpus; idx++) { -if (newstates[idx] == 1) { -newstates[idx] = cur_action; + +if (*p++ != ':') { +res = -ENOTSUP; +goto out; +} + +switch (read_thread_id(p, , , )) { +case GDB_READ_THREAD_ERR: +res = -EINVAL; +goto out; + +case GDB_ALL_PROCESSES: +cpu = gdb_first_attached_cpu(s); +while (cpu) { +if (newstates[cpu->cpu_index] == 1) { +newstates[cpu->cpu_index] = cur_action; } + +cpu = gdb_next_attached_cpu(s, cpu); } -} else if (*p == ':') { -p++; -res = qemu_strtoul(p, , 16, ); -if (res) { +break; + +case GDB_ALL_THREADS: +process = gdb_get_process(s, pid); + +if (!process->attached) { +res = -EINVAL; goto out; } -/* 0 means any thread, so we pick the first valid CPU */ -cpu = tmp ? find_cpu(tmp) : first_cpu; +cpu = get_first_cpu_in
Re: [Qemu-devel] [PATCH v8 05/16] gdbstub: add multiprocess support to vCont packets
On 07.12.18 10:01, Luc Michel wrote: > Add the gdb_first_attached_cpu() and gdb_next_attached_cpu() to iterate > over all the CPUs in currently attached processes. > > Add the gdb_first_cpu_in_process() and gdb_next_cpu_in_process() to > iterate over CPUs of a given process. > > Use them to add multiprocess extension support to vCont packets. > > Signed-off-by: Luc Michel > Reviewed-by: Philippe Mathieu-Daudé > Reviewed-by: Edgar E. Iglesias > Acked-by: Alistair Francis > --- > gdbstub.c | 117 +++--- > 1 file changed, 102 insertions(+), 15 deletions(-) > > diff --git a/gdbstub.c b/gdbstub.c > index 911faa225a..77b3dbb2c8 100644 > --- a/gdbstub.c > +++ b/gdbstub.c > @@ -721,10 +721,40 @@ static CPUState *find_cpu(uint32_t thread_id) > } > > return NULL; > } > > +static CPUState *get_first_cpu_in_process(const GDBState *s, > + GDBProcess *process) > +{ > +CPUState *cpu; > + > +CPU_FOREACH(cpu) { > +if (gdb_get_cpu_pid(s, cpu) == process->pid) { > +return cpu; > +} > +} > + > +return NULL; > +} > + > +static CPUState *gdb_next_cpu_in_process(const GDBState *s, CPUState *cpu) > +{ > +uint32_t pid = gdb_get_cpu_pid(s, cpu); > +cpu = CPU_NEXT(cpu); > + > +while (cpu) { > +if (gdb_get_cpu_pid(s, cpu) == pid) { > +break; > +} > + > +cpu = CPU_NEXT(cpu); > +} > + > +return cpu; > +} > + > static CPUState *gdb_get_cpu(const GDBState *s, uint32_t pid, uint32_t tid) > { > GDBProcess *process; > CPUState *cpu; > > @@ -750,10 +780,41 @@ static CPUState *gdb_get_cpu(const GDBState *s, > uint32_t pid, uint32_t tid) > } > > return cpu; > } > > +/* Return the cpu following @cpu, while ignoring > + * unattached processes. 
> + */ > +static CPUState *gdb_next_attached_cpu(const GDBState *s, CPUState *cpu) > +{ > +cpu = CPU_NEXT(cpu); > + > +while (cpu) { > +if (gdb_get_cpu_process(s, cpu)->attached) { > +break; > +} > + > +cpu = CPU_NEXT(cpu); > +} > + > +return cpu; > +} > + > +/* Return the first attached cpu */ > +static CPUState *gdb_first_attached_cpu(const GDBState *s) > +{ > +CPUState *cpu = first_cpu; > +GDBProcess *process = gdb_get_cpu_process(s, cpu); > + > +if (!process->attached) { > +return gdb_next_attached_cpu(s, cpu); > +} > + > +return cpu; > +} > + > static const char *get_feature_xml(const char *p, const char **newp, > CPUClass *cc) > { > size_t len; > int i; > @@ -1088,14 +1149,16 @@ static int is_query_packet(const char *p, const char > *query, char separator) >* returns -ENOTSUP if a command is unsupported, -EINVAL or -ERANGE if > there is >* a format error, 0 on success. >*/ > static int gdb_handle_vcont(GDBState *s, const char *p) > { > -int res, idx, signal = 0; > +int res, signal = 0; > char cur_action; > char *newstates; > unsigned long tmp; > +uint32_t pid, tid; > +GDBProcess *process; > CPUState *cpu; > #ifdef CONFIG_USER_ONLY > int max_cpus = 1; /* global variable max_cpus exists only in system > mode */ > > CPU_FOREACH(cpu) { > @@ -1134,29 +1197,52 @@ static int gdb_handle_vcont(GDBState *s, const char > *p) > } else if (cur_action != 'c' && cur_action != 's') { > /* unknown/invalid/unsupported command */ > res = -ENOTSUP; > goto out; > } > -/* thread specification. 
special values: (none), -1 = all; 0 = any */ > -if ((p[0] == ':' && p[1] == '-' && p[2] == '1') || (p[0] != ':')) { > -if (*p == ':') { > -p += 3; > -} > -for (idx = 0; idx < max_cpus; idx++) { > -if (newstates[idx] == 1) { > -newstates[idx] = cur_action; > + > +if (*p++ != ':') { > +res = -ENOTSUP; > +goto out; > +} > + > +switch (read_thread_id(p, , , )) { > +case GDB_READ_THREAD_ERR: > +res = -EINVAL; > +goto out; > + > +case GDB_ALL_PROCESSES: > +cpu = gdb_first_attached_cpu(s); > +while (cpu) { > +if (newstates[cpu->cpu_index] == 1) { > +newstates[cpu->cpu_index] = cur_action; > } > + > +cpu = gdb_next_attached_cpu(s, cpu); > } > -} else if (*p == ':') { > -p++; > -res = qemu_strtoul(p, , 16, ); > -if (res) { > +break; > + > +case GDB_ALL_THREADS: > +process = gdb_get_process(s, pid); > + > +if (!process->attached) { > +res = -EINVAL; > goto out; > } > > -
Re: [Qemu-devel] [PATCH 0/1] snip my name and email
On 21.02.19 17:48, Markus Armbruster wrote: Jan Kiszka writes: On 21.02.19 17:05, Eric Blake wrote: On 2/21/19 9:53 AM, David Kiarie wrote: the occurrence of my name and email on the files below may have led to some confusion in the reporting of a few recent bugs. i have therefore choosen to snip it. Dropping an email from the copyright line makes sense; dropping the Copyright declaration altogether is a bit odd (the GPL works only in tandem with a copyright assertion) - but as you are the author of the line and copyright holder of your contributions, I am not in a position to say you are wrong in removing it, only that it looks odd. Yeah, indeed. David, also note that you probably have been addressed because scripts/get_maintainer.pl will look into the git history of files that some patch addresses and pick up significant and/or recent contributors from there. MAINTAINERS covers these files, so the most common use of get_maintainer.pl won't list anyone not listed there: $ scripts/get_maintainer.pl -f hw/i386/amd_iommu.[ch] Paolo Bonzini (maintainer:X86) Richard Henderson (maintainer:X86) Eduardo Habkost (maintainer:X86) "Michael S. Tsirkin" (supporter:PC) Marcel Apfelbaum (supporter:PC) qemu-devel@nongnu.org (open list:All patches CC here) David's contributions have aged out of --git: $ scripts/get_maintainer.pl -f hw/i386/amd_iommu.[ch] --git "Michael S. 
Tsirkin" (supporter:PC,commit_signer:12/7=100%,commit_signer:10/5=100%) Marcel Apfelbaum (supporter:PC) Paolo Bonzini (maintainer:X86) Richard Henderson (maintainer:X86,commit_signer:1/7=14%) Eduardo Habkost (maintainer:X86) Peter Xu (commit_signer:6/7=86%,commit_signer:4/5=80%) Brijesh Singh (commit_signer:5/7=71%,commit_signer:4/5=80%) "Alex Bennée" (commit_signer:1/7=14%) Jan Kiszka (commit_signer:1/5=20%) qemu-devel@nongnu.org (open list:All patches CC here) However, --git-blame still lists him: $ scripts/get_maintainer.pl -f hw/i386/amd_iommu.[ch] --git-blame Paolo Bonzini (maintainer:X86,commits:7/24=29%) Richard Henderson (maintainer:X86) Eduardo Habkost (maintainer:X86,commits:7/24=29%) "Michael S. Tsirkin" (supporter:PC,commits:26/24=100%,commits:14/8=100%) Marcel Apfelbaum (supporter:PC) David Kiarie (authored lines:1176/1645=71%,authored lines:274/373=73%) Brijesh Singh (authored lines:403/1645=24%,authored lines:93/373=25%,commits:4/8=50%) Peter Xu (commits:10/24=42%,commits:4/8=50%) David Gibson (commits:7/24=29%) Jan Kiszka (commits:1/8=12%) Prasad J Pandit (commits:1/8=12%) qemu-devel@nongnu.org (open list:All patches CC here) --help admonishes: Using "--git-blame" is slow and may add old committers and authors that are no longer active maintainers to the output. There should be some opt-out statement from that, but I don't recall how. I don't think get_maintainer.pl supports a blacklist of people who don't want to be pestered anymore. # cat linux/.get_maintainer.ignore Christoph Hellwig That's why I remembered it vaguely. Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH 0/1] snip my name and email
On 21.02.19 17:05, Eric Blake wrote: On 2/21/19 9:53 AM, David Kiarie wrote: the occurrence of my name and email on the files below may have led to some confusion in the reporting of a few recent bugs. i have therefore choosen to snip it. Dropping an email from the copyright line makes sense; dropping the Copyright declaration altogether is a bit odd (the GPL works only in tandem with a copyright assertion) - but as you are the author of the line and copyright holder of your contributions, I am not in a position to say you are wrong in removing it, only that it looks odd. Yeah, indeed. David, also note that you probably have been addressed because scripts/get_maintainer.pl will look into the git history of files that some patch addresses and pick up significant and/or recent contributors from there. There should be some opt-out statement from that, but I don't recall how. Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH] kvm: x86: Fix kvm_arch_fixup_msi_route for remap-less case
The AMD IOMMU does not (yet) support interrupt remapping. But kvm_arch_fixup_msi_route assumes that all implementations do and crashes when the AMD IOMMU is used in KVM mode. Fixes: 8b5ed7dffa1f ("intel_iommu: add support for split irqchip") Reported-by: Christopher Goldsworthy Signed-off-by: Jan Kiszka --- target/i386/kvm.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/target/i386/kvm.c b/target/i386/kvm.c index 9313602d3d..1fe3a73a10 100644 --- a/target/i386/kvm.c +++ b/target/i386/kvm.c @@ -3677,6 +3677,10 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route, MSIMessage src, dst; X86IOMMUClass *class = X86_IOMMU_GET_CLASS(iommu); +if (!class->int_remap) { +return 0; +} + src.address = route->u.msi.address_hi; src.address <<= VTD_MSI_ADDR_HI_SHIFT; src.address |= route->u.msi.address_lo; -- 2.16.4
Re: [Qemu-devel] [PATCH v3 20/20] arm/virt: Add support for GICv2 virtualization extensions
On 2018-07-05 10:00, Jan Kiszka wrote: > On 2018-07-05 08:51, Jan Kiszka wrote: >> On 2018-06-29 15:29, Luc Michel wrote: >>> Add support for GICv2 virtualization extensions by mapping the necessary >>> I/O regions and connecting the maintenance IRQ lines. >>> >>> Declare those additions in the device tree and in the ACPI tables. >>> >>> Signed-off-by: Luc Michel >>> --- >>> hw/arm/virt-acpi-build.c | 4 >>> hw/arm/virt.c| 50 +--- >>> include/hw/arm/virt.h| 3 +++ >>> 3 files changed, 49 insertions(+), 8 deletions(-) >>> >>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c >>> index 6ea47e2588..3b74bf0372 100644 >>> --- a/hw/arm/virt-acpi-build.c >>> +++ b/hw/arm/virt-acpi-build.c >>> @@ -659,6 +659,8 @@ build_madt(GArray *table_data, BIOSLinker *linker, >>> VirtMachineState *vms) >>> gicc->length = sizeof(*gicc); >>> if (vms->gic_version == 2) { >>> gicc->base_address = cpu_to_le64(memmap[VIRT_GIC_CPU].base); >>> +gicc->gich_base_address = >>> cpu_to_le64(memmap[VIRT_GIC_HYP].base); >>> +gicc->gicv_base_address = >>> cpu_to_le64(memmap[VIRT_GIC_VCPU].base); >>> } >>> gicc->cpu_interface_number = cpu_to_le32(i); >>> gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity); >>> @@ -670,6 +672,8 @@ build_madt(GArray *table_data, BIOSLinker *linker, >>> VirtMachineState *vms) >>> } >>> if (vms->virt && vms->gic_version == 3) { >>> gicc->vgic_interrupt = cpu_to_le32(PPI(ARCH_GICV3_MAINT_IRQ)); >>> +} else if (vms->virt && vms->gic_version == 2) { >>> +gicc->vgic_interrupt = cpu_to_le32(PPI(ARCH_GICV2_MAINT_IRQ)); >>> } >>> } >>> >>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c >>> index 742f68afca..e45b9de3be 100644 >>> --- a/hw/arm/virt.c >>> +++ b/hw/arm/virt.c >>> @@ -131,6 +131,8 @@ static const MemMapEntry a15memmap[] = { >>> [VIRT_GIC_DIST] = { 0x0800, 0x0001 }, >>> [VIRT_GIC_CPU] ={ 0x0801, 0x0001 }, >>> [VIRT_GIC_V2M] ={ 0x0802, 0x1000 }, >>> +[VIRT_GIC_HYP] ={ 0x0803, 0x1000 }, >>> +[VIRT_GIC_VCPU] = { 0x0804, 0x1000 }, >>> /* The space in between here 
is reserved for GICv3 CPU/vCPU/HYP */ >>> [VIRT_GIC_ITS] ={ 0x0808, 0x0002 }, >>> /* This redistributor space allows up to 2*64kB*123 CPUs */ >>> @@ -438,11 +440,26 @@ static void fdt_add_gic_node(VirtMachineState *vms) >>> /* 'cortex-a15-gic' means 'GIC v2' */ >>> qemu_fdt_setprop_string(vms->fdt, "/intc", "compatible", >>> "arm,cortex-a15-gic"); >>> -qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", >>> - 2, vms->memmap[VIRT_GIC_DIST].base, >>> - 2, vms->memmap[VIRT_GIC_DIST].size, >>> - 2, vms->memmap[VIRT_GIC_CPU].base, >>> - 2, vms->memmap[VIRT_GIC_CPU].size); >>> +if (!vms->virt) { >>> +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", >>> + 2, >>> vms->memmap[VIRT_GIC_DIST].base, >>> + 2, >>> vms->memmap[VIRT_GIC_DIST].size, >>> + 2, vms->memmap[VIRT_GIC_CPU].base, >>> + 2, >>> vms->memmap[VIRT_GIC_CPU].size); >>> +} else { >>> +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", >>> + 2, >>> vms->memmap[VIRT_GIC_DIST].base, >>> + 2, >>> vms->memmap[VIRT_GIC_DIST].size, >>> + 2, vms->memmap[VIRT_GIC_CPU].base, >>> + 2, vms->memmap[VIRT_GIC_CPU].size, >>> +
Re: [Qemu-devel] [PATCH v3 20/20] arm/virt: Add support for GICv2 virtualization extensions
On 2018-07-05 08:51, Jan Kiszka wrote: > On 2018-06-29 15:29, Luc Michel wrote: >> Add support for GICv2 virtualization extensions by mapping the necessary >> I/O regions and connecting the maintenance IRQ lines. >> >> Declare those additions in the device tree and in the ACPI tables. >> >> Signed-off-by: Luc Michel >> --- >> hw/arm/virt-acpi-build.c | 4 >> hw/arm/virt.c| 50 +--- >> include/hw/arm/virt.h| 3 +++ >> 3 files changed, 49 insertions(+), 8 deletions(-) >> >> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c >> index 6ea47e2588..3b74bf0372 100644 >> --- a/hw/arm/virt-acpi-build.c >> +++ b/hw/arm/virt-acpi-build.c >> @@ -659,6 +659,8 @@ build_madt(GArray *table_data, BIOSLinker *linker, >> VirtMachineState *vms) >> gicc->length = sizeof(*gicc); >> if (vms->gic_version == 2) { >> gicc->base_address = cpu_to_le64(memmap[VIRT_GIC_CPU].base); >> +gicc->gich_base_address = >> cpu_to_le64(memmap[VIRT_GIC_HYP].base); >> +gicc->gicv_base_address = >> cpu_to_le64(memmap[VIRT_GIC_VCPU].base); >> } >> gicc->cpu_interface_number = cpu_to_le32(i); >> gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity); >> @@ -670,6 +672,8 @@ build_madt(GArray *table_data, BIOSLinker *linker, >> VirtMachineState *vms) >> } >> if (vms->virt && vms->gic_version == 3) { >> gicc->vgic_interrupt = cpu_to_le32(PPI(ARCH_GICV3_MAINT_IRQ)); >> +} else if (vms->virt && vms->gic_version == 2) { >> +gicc->vgic_interrupt = cpu_to_le32(PPI(ARCH_GICV2_MAINT_IRQ)); >> } >> } >> >> diff --git a/hw/arm/virt.c b/hw/arm/virt.c >> index 742f68afca..e45b9de3be 100644 >> --- a/hw/arm/virt.c >> +++ b/hw/arm/virt.c >> @@ -131,6 +131,8 @@ static const MemMapEntry a15memmap[] = { >> [VIRT_GIC_DIST] = { 0x0800, 0x0001 }, >> [VIRT_GIC_CPU] ={ 0x0801, 0x0001 }, >> [VIRT_GIC_V2M] ={ 0x0802, 0x1000 }, >> +[VIRT_GIC_HYP] ={ 0x0803, 0x1000 }, >> +[VIRT_GIC_VCPU] = { 0x0804, 0x1000 }, >> /* The space in between here is reserved for GICv3 CPU/vCPU/HYP */ >> [VIRT_GIC_ITS] ={ 0x0808, 0x0002 }, >> /* This 
redistributor space allows up to 2*64kB*123 CPUs */ >> @@ -438,11 +440,26 @@ static void fdt_add_gic_node(VirtMachineState *vms) >> /* 'cortex-a15-gic' means 'GIC v2' */ >> qemu_fdt_setprop_string(vms->fdt, "/intc", "compatible", >> "arm,cortex-a15-gic"); >> -qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", >> - 2, vms->memmap[VIRT_GIC_DIST].base, >> - 2, vms->memmap[VIRT_GIC_DIST].size, >> - 2, vms->memmap[VIRT_GIC_CPU].base, >> - 2, vms->memmap[VIRT_GIC_CPU].size); >> +if (!vms->virt) { >> +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", >> + 2, vms->memmap[VIRT_GIC_DIST].base, >> + 2, vms->memmap[VIRT_GIC_DIST].size, >> + 2, vms->memmap[VIRT_GIC_CPU].base, >> + 2, vms->memmap[VIRT_GIC_CPU].size); >> +} else { >> +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", >> + 2, vms->memmap[VIRT_GIC_DIST].base, >> + 2, vms->memmap[VIRT_GIC_DIST].size, >> + 2, vms->memmap[VIRT_GIC_CPU].base, >> + 2, vms->memmap[VIRT_GIC_CPU].size, >> + 2, vms->memmap[VIRT_GIC_HYP].base, >> + 2, vms->memmap[VIRT_GIC_HYP].size, >> + 2, vms->memmap[VIRT_GIC_VCPU].base, >> + 2, >> vms->memmap[VIRT_GIC_VCPU].size); >> +qemu_fdt_setprop_cells(vms->fdt, "/intc", "
Re: [Qemu-devel] [PATCH v3 20/20] arm/virt: Add support for GICv2 virtualization extensions
On 2018-06-29 15:29, Luc Michel wrote: > Add support for GICv2 virtualization extensions by mapping the necessary > I/O regions and connecting the maintenance IRQ lines. > > Declare those additions in the device tree and in the ACPI tables. > > Signed-off-by: Luc Michel > --- > hw/arm/virt-acpi-build.c | 4 > hw/arm/virt.c| 50 +--- > include/hw/arm/virt.h| 3 +++ > 3 files changed, 49 insertions(+), 8 deletions(-) > > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c > index 6ea47e2588..3b74bf0372 100644 > --- a/hw/arm/virt-acpi-build.c > +++ b/hw/arm/virt-acpi-build.c > @@ -659,6 +659,8 @@ build_madt(GArray *table_data, BIOSLinker *linker, > VirtMachineState *vms) > gicc->length = sizeof(*gicc); > if (vms->gic_version == 2) { > gicc->base_address = cpu_to_le64(memmap[VIRT_GIC_CPU].base); > +gicc->gich_base_address = cpu_to_le64(memmap[VIRT_GIC_HYP].base); > +gicc->gicv_base_address = > cpu_to_le64(memmap[VIRT_GIC_VCPU].base); > } > gicc->cpu_interface_number = cpu_to_le32(i); > gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity); > @@ -670,6 +672,8 @@ build_madt(GArray *table_data, BIOSLinker *linker, > VirtMachineState *vms) > } > if (vms->virt && vms->gic_version == 3) { > gicc->vgic_interrupt = cpu_to_le32(PPI(ARCH_GICV3_MAINT_IRQ)); > +} else if (vms->virt && vms->gic_version == 2) { > +gicc->vgic_interrupt = cpu_to_le32(PPI(ARCH_GICV2_MAINT_IRQ)); > } > } > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > index 742f68afca..e45b9de3be 100644 > --- a/hw/arm/virt.c > +++ b/hw/arm/virt.c > @@ -131,6 +131,8 @@ static const MemMapEntry a15memmap[] = { > [VIRT_GIC_DIST] = { 0x0800, 0x0001 }, > [VIRT_GIC_CPU] ={ 0x0801, 0x0001 }, > [VIRT_GIC_V2M] ={ 0x0802, 0x1000 }, > +[VIRT_GIC_HYP] ={ 0x0803, 0x1000 }, > +[VIRT_GIC_VCPU] = { 0x0804, 0x1000 }, > /* The space in between here is reserved for GICv3 CPU/vCPU/HYP */ > [VIRT_GIC_ITS] ={ 0x0808, 0x0002 }, > /* This redistributor space allows up to 2*64kB*123 CPUs */ > @@ -438,11 +440,26 @@ static void 
fdt_add_gic_node(VirtMachineState *vms) > /* 'cortex-a15-gic' means 'GIC v2' */ > qemu_fdt_setprop_string(vms->fdt, "/intc", "compatible", > "arm,cortex-a15-gic"); > -qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", > - 2, vms->memmap[VIRT_GIC_DIST].base, > - 2, vms->memmap[VIRT_GIC_DIST].size, > - 2, vms->memmap[VIRT_GIC_CPU].base, > - 2, vms->memmap[VIRT_GIC_CPU].size); > +if (!vms->virt) { > +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", > + 2, vms->memmap[VIRT_GIC_DIST].base, > + 2, vms->memmap[VIRT_GIC_DIST].size, > + 2, vms->memmap[VIRT_GIC_CPU].base, > + 2, vms->memmap[VIRT_GIC_CPU].size); > +} else { > +qemu_fdt_setprop_sized_cells(vms->fdt, "/intc", "reg", > + 2, vms->memmap[VIRT_GIC_DIST].base, > + 2, vms->memmap[VIRT_GIC_DIST].size, > + 2, vms->memmap[VIRT_GIC_CPU].base, > + 2, vms->memmap[VIRT_GIC_CPU].size, > + 2, vms->memmap[VIRT_GIC_HYP].base, > + 2, vms->memmap[VIRT_GIC_HYP].size, > + 2, vms->memmap[VIRT_GIC_VCPU].base, > + 2, vms->memmap[VIRT_GIC_VCPU].size); > +qemu_fdt_setprop_cells(vms->fdt, "/intc", "interrupts", > + GIC_FDT_IRQ_TYPE_PPI, > ARCH_GICV2_MAINT_IRQ, > + GIC_FDT_IRQ_FLAGS_LEVEL_HI); > +} > } > > qemu_fdt_setprop_cell(vms->fdt, "/intc", "phandle", vms->gic_phandle); > @@ -563,6 +580,11 @@ static void create_gic(VirtMachineState *vms, qemu_irq > *pic) > qdev_prop_set_uint32(gicdev, "redist-region-count[1]", > MIN(smp_cpus - redist0_count, redist1_capacity)); > } > +} else { > +if (!kvm_irqchip_in_kernel()) { > +qdev_prop_set_bit(gicdev, "has-virtualization-extensions", > + vms->virt); > +} > } > qdev_init_nofail(gicdev); > gicbusdev = SYS_BUS_DEVICE(gicdev); > @@ -574,6 +596,10 @@ static void
Re: [Qemu-devel] [PATCH v2] e1000e: Prevent MSI/MSI-X storms
On 2018-07-02 05:40, Jason Wang wrote: > > > On 2018年06月30日 14:13, Jan Kiszka wrote: >> On 2018-04-05 19:41, Jan Kiszka wrote: >>> From: Jan Kiszka >>> >>> Only signal MSI/MSI-X events on rising edges. So far we re-triggered the >>> interrupt sources even if the guest did no consumed the pending one, >>> easily causing interrupt storms. >>> >>> Issue was observable with Linux 4.16 e1000e driver when MSI-X was used. >>> Vector 2 was causing interrupt storms after the driver activated the >>> device. >>> >>> Signed-off-by: Jan Kiszka >>> --- >>> >>> Changes in v2: >>> - also update msi_causes_pending after EIAC changes (required because >>> there is no e1000e_update_interrupt_state after that >>> >>> hw/net/e1000e_core.c | 11 +++ >>> hw/net/e1000e_core.h | 2 ++ >>> 2 files changed, 13 insertions(+) >>> >>> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c >>> index c93c4661ed..d6ddd59986 100644 >>> --- a/hw/net/e1000e_core.c >>> +++ b/hw/net/e1000e_core.c >>> @@ -2027,6 +2027,7 @@ e1000e_msix_notify_one(E1000ECore *core, >>> uint32_t cause, uint32_t int_cfg) >>> } >>> core->mac[ICR] &= ~effective_eiac; >>> + core->msi_causes_pending &= ~effective_eiac; >>> if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) { >>> core->mac[IMS] &= ~effective_eiac; >>> @@ -2123,6 +2124,13 @@ e1000e_send_msi(E1000ECore *core, bool msix) >>> { >>> uint32_t causes = core->mac[ICR] & core->mac[IMS] & >>> ~E1000_ICR_ASSERTED; >>> + core->msi_causes_pending &= causes; >>> + causes ^= core->msi_causes_pending; >>> + if (causes == 0) { >>> + return; >>> + } >>> + core->msi_causes_pending |= causes; >>> + >>> if (msix) { >>> e1000e_msix_notify(core, causes); >>> } else { >>> @@ -2160,6 +2168,9 @@ e1000e_update_interrupt_state(E1000ECore *core) >>> core->mac[ICS] = core->mac[ICR]; >>> interrupts_pending = (core->mac[IMS] & core->mac[ICR]) ? 
true >>> : false; > +if (!interrupts_pending) { > +core->msi_causes_pending = 0; > +} > trace_e1000e_irq_pending_interrupts(core->mac[ICR] & >>> core->mac[IMS], >>> core->mac[ICR], >>> core->mac[IMS]); >>> diff --git a/hw/net/e1000e_core.h b/hw/net/e1000e_core.h >>> index 7d8ff41890..63a15510cc 100644 >>> --- a/hw/net/e1000e_core.h >>> +++ b/hw/net/e1000e_core.h >>> @@ -109,6 +109,8 @@ struct E1000Core { >>> NICState *owner_nic; >>> PCIDevice *owner; >>> void (*owner_start_recv)(PCIDevice *d); >>> + >>> + uint32_t msi_causes_pending; >>> }; >>> void >>> >> Ping again, this one is still open. >> >> Jan > > Sorry for the late. > > Just one thing to confirm, I think the reason that we don't need to > migrate msi_cause_pending is that it can only lead at most one more > signal of MSI on destination? Right, a cleared msi_causes_pending on the destination may lead to one spurious interrupt injection per vector. That should be tolerable. Jan
Re: [Qemu-devel] [PATCH v2] e1000e: Prevent MSI/MSI-X storms
On 2018-04-05 19:41, Jan Kiszka wrote: > From: Jan Kiszka > > Only signal MSI/MSI-X events on rising edges. So far we re-triggered the > interrupt sources even if the guest did not consume the pending one, > easily causing interrupt storms. > > Issue was observable with Linux 4.16 e1000e driver when MSI-X was used. > Vector 2 was causing interrupt storms after the driver activated the > device. > > Signed-off-by: Jan Kiszka > --- > > Changes in v2: > - also update msi_causes_pending after EIAC changes (required because >there is no e1000e_update_interrupt_state after that) > > hw/net/e1000e_core.c | 11 +++ > hw/net/e1000e_core.h | 2 ++ > 2 files changed, 13 insertions(+) > > diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c > index c93c4661ed..d6ddd59986 100644 > --- a/hw/net/e1000e_core.c > +++ b/hw/net/e1000e_core.c > @@ -2027,6 +2027,7 @@ e1000e_msix_notify_one(E1000ECore *core, uint32_t > cause, uint32_t int_cfg) > } > > core->mac[ICR] &= ~effective_eiac; > +core->msi_causes_pending &= ~effective_eiac; > > if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) { > core->mac[IMS] &= ~effective_eiac; > @@ -2123,6 +2124,13 @@ e1000e_send_msi(E1000ECore *core, bool msix) > { > uint32_t causes = core->mac[ICR] & core->mac[IMS] & ~E1000_ICR_ASSERTED; > > +core->msi_causes_pending &= causes; > +causes ^= core->msi_causes_pending; > +if (causes == 0) { > +return; > +} > +core->msi_causes_pending |= causes; > + > if (msix) { > e1000e_msix_notify(core, causes); > } else { > @@ -2160,6 +2168,9 @@ e1000e_update_interrupt_state(E1000ECore *core) > core->mac[ICS] = core->mac[ICR]; > > interrupts_pending = (core->mac[IMS] & core->mac[ICR]) ?
true : false; > +if (!interrupts_pending) { > +core->msi_causes_pending = 0; > +} > > trace_e1000e_irq_pending_interrupts(core->mac[ICR] & core->mac[IMS], > core->mac[ICR], core->mac[IMS]); > diff --git a/hw/net/e1000e_core.h b/hw/net/e1000e_core.h > index 7d8ff41890..63a15510cc 100644 > --- a/hw/net/e1000e_core.h > +++ b/hw/net/e1000e_core.h > @@ -109,6 +109,8 @@ struct E1000Core { > NICState *owner_nic; > PCIDevice *owner; > void (*owner_start_recv)(PCIDevice *d); > + > +uint32_t msi_causes_pending; > }; > > void > Ping again, this one is still open. Jan
[Qemu-devel] [PATCH v2] target-i386: Add NPT support
From: Jan Kiszka

This implements NPT support for SVM by hooking into x86_cpu_handle_mmu_fault
where it reads the stage-1 page table. Whether we need to perform this 2nd
stage translation, and how, is decided during vmrun and stored in hflags2,
along with nested_cr3 and nested_pg_mode.

As get_hphys performs a direct cpu_vmexit in case of NPT faults, we need
retaddr in that function. To avoid changing the signature of
cpu_handle_mmu_fault, this passes the value from tlb_fill to get_hphys via
the CPU state.

This was tested successfully via the Jailhouse hypervisor.

Signed-off-by: Jan Kiszka
---

Changes in v2:
 - use hflags2 instead of hflags
 - add conditional vmstate subsection

 target/i386/cpu.c         |   2 +-
 target/i386/cpu.h         |   6 ++
 target/i386/excp_helper.c | 216 +-
 target/i386/machine.c     |  21 +
 target/i386/mem_helper.c  |   6 +-
 target/i386/svm.h         |  14 +++
 target/i386/svm_helper.c  |  22 +
 7 files changed, 281 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1e6a7d0a75..6e1f180249 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -751,7 +751,7 @@ static void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
 #define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \
           CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A)
 #define TCG_EXT4_FEATURES 0
-#define TCG_SVM_FEATURES 0
+#define TCG_SVM_FEATURES CPUID_SVM_NPT
 #define TCG_KVM_FEATURES 0
 #define TCG_7_0_EBX_FEATURES (CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_SMAP | \
     CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ADX | \
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 8eaefeee3e..7f33755bf5 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -211,6 +211,7 @@ typedef enum X86Seg {
 #define HF2_VINTR_SHIFT          3 /* value of V_INTR_MASKING bit */
 #define HF2_SMM_INSIDE_NMI_SHIFT 4 /* CPU serving SMI nested inside NMI */
 #define HF2_MPX_PR_SHIFT         5 /* BNDCFGx.BNDPRESERVE */
+#define HF2_NPT_SHIFT            6 /* Nested Paging enabled */
 
 #define HF2_GIF_MASK            (1 << HF2_GIF_SHIFT)
 #define HF2_HIF_MASK            (1 << HF2_HIF_SHIFT)
@@ -218,6 +219,7 @@ typedef enum X86Seg {
 #define HF2_VINTR_MASK          (1 << HF2_VINTR_SHIFT)
 #define HF2_SMM_INSIDE_NMI_MASK (1 << HF2_SMM_INSIDE_NMI_SHIFT)
 #define HF2_MPX_PR_MASK         (1 << HF2_MPX_PR_SHIFT)
+#define HF2_NPT_MASK            (1 << HF2_NPT_SHIFT)
 
 #define CR0_PE_SHIFT 0
 #define CR0_MP_SHIFT 1
@@ -1265,12 +1267,16 @@ typedef struct CPUX86State {
     uint16_t intercept_dr_read;
     uint16_t intercept_dr_write;
     uint32_t intercept_exceptions;
+    uint64_t nested_cr3;
+    uint32_t nested_pg_mode;
     uint8_t v_tpr;
 
     /* KVM states, automatically cleared on reset */
     uint8_t nmi_injected;
     uint8_t nmi_pending;
 
+    uintptr_t retaddr;
+
     /* Fields up to this point are cleared by a CPU reset */
     struct {} end_reset_fields;
diff --git a/target/i386/excp_helper.c b/target/i386/excp_helper.c
index cb4d1b7d33..37a33d5ae0 100644
--- a/target/i386/excp_helper.c
+++ b/target/i386/excp_helper.c
@@ -157,6 +157,209 @@ int x86_cpu_handle_mmu_fault(CPUState *cs, vaddr addr, int size,
 
 #else
 
+static hwaddr get_hphys(CPUState *cs, hwaddr gphys, MMUAccessType access_type,
+                        int *prot)
+{
+    CPUX86State *env = &X86_CPU(cs)->env;
+    uint64_t rsvd_mask = PG_HI_RSVD_MASK;
+    uint64_t ptep, pte;
+    uint64_t exit_info_1 = 0;
+    target_ulong pde_addr, pte_addr;
+    uint32_t page_offset;
+    int page_size;
+
+    if (likely(!(env->hflags2 & HF2_NPT_MASK))) {
+        return gphys;
+    }
+
+    if (!(env->nested_pg_mode & SVM_NPT_NXE)) {
+        rsvd_mask |= PG_NX_MASK;
+    }
+
+    if (env->nested_pg_mode & SVM_NPT_PAE) {
+        uint64_t pde, pdpe;
+        target_ulong pdpe_addr;
+
+#ifdef TARGET_X86_64
+        if (env->nested_pg_mode & SVM_NPT_LMA) {
+            uint64_t pml5e;
+            uint64_t pml4e_addr, pml4e;
+
+            pml5e = env->nested_cr3;
+            ptep = PG_NX_MASK | PG_USER_MASK | PG_RW_MASK;
+
+            pml4e_addr = (pml5e & PG_ADDRESS_MASK) +
+                         (((gphys >> 39) & 0x1ff) << 3);
+            pml4e = x86_ldq_phys(cs, pml4e_addr);
+            if (!(pml4e & PG_PRESENT_MASK)) {
+                goto do_fault;
+            }
+            if (pml4e & (rsvd_mask | PG_PSE_MASK)) {
+                goto do_fault_rsvd;
+            }
+            if (!(pml4e & PG_ACCESSED_MASK)) {
+                pml4e |= PG_ACCESSED_MASK;
+                x86_stl_phys_notdirty(cs, pml4e_addr, pml4e);
+            }
+            ptep &= pml4e ^ PG_NX_MASK;
+            pdpe_addr = (pml4e & PG_ADDRESS_MASK) +
+                        (((gphys >> 30) & 0x1ff) << 3);
Re: [Qemu-devel] [PATCH v2 4/4] target-i386: Add NPT support
On 2018-06-30 08:05, Paolo Bonzini wrote:
> On 30/06/2018 07:25, Jan Kiszka wrote:
>> On 2018-06-27 14:14, Paolo Bonzini wrote:
>>> On 03/04/2018 17:36, Jan Kiszka wrote:
>>>>
>>>> +static hwaddr get_hphys(CPUState *cs, hwaddr gphys, MMUAccessType access_type,
>>>> +                        int *prot)
>>>> +{
>>>> +    CPUX86State *env = &X86_CPU(cs)->env;
>>>> +    uint64_t rsvd_mask = PG_HI_RSVD_MASK;
>>>> +    uint64_t ptep, pte;
>>>> +    uint64_t exit_info_1 = 0;
>>>> +    target_ulong pde_addr, pte_addr;
>>>> +    uint32_t page_offset;
>>>> +    int page_size;
>>>> +
>>>> +    if (likely(!(env->hflags & HF_NPT_MASK))) {
>>>> +        return gphys;
>>>> +    }
>>>
>>> hflags are a somewhat limited resource. Can this go in hflags2?
>>
>> Will have a look - I don't see why not. Or is there any special
>> semantic difference between both fields?
>
> Yes, hflags become flags of the translation block, while hflags2 are
> just random processor state. If translate.c uses it you must use
> hflags, but here hflags2 should be safe.

Indeed. We only change it at vmentry/exit, and that is already a
translation block barrier. v2 is on the way.

Jan
Re: [Qemu-devel] [PATCH v2 4/4] target-i386: Add NPT support
On 2018-06-27 14:14, Paolo Bonzini wrote:
> On 03/04/2018 17:36, Jan Kiszka wrote:
>>
>> +static hwaddr get_hphys(CPUState *cs, hwaddr gphys, MMUAccessType access_type,
>> +                        int *prot)
>> +{
>> +    CPUX86State *env = &X86_CPU(cs)->env;
>> +    uint64_t rsvd_mask = PG_HI_RSVD_MASK;
>> +    uint64_t ptep, pte;
>> +    uint64_t exit_info_1 = 0;
>> +    target_ulong pde_addr, pte_addr;
>> +    uint32_t page_offset;
>> +    int page_size;
>> +
>> +    if (likely(!(env->hflags & HF_NPT_MASK))) {
>> +        return gphys;
>> +    }
>
> hflags are a somewhat limited resource. Can this go in hflags2?

Will have a look - I don't see why not. Or is there any special
semantic difference between both fields?

>>
>> +
>> +    env->nested_pg_mode = 0;
>> +    if (env->cr[4] & CR4_PAE_MASK) {
>> +        env->nested_pg_mode |= SVM_NPT_PAE;
>> +    }
>> +    if (env->hflags & HF_LMA_MASK) {
>> +        env->nested_pg_mode |= SVM_NPT_LMA;
>> +    }
>> +    if (env->efer & MSR_EFER_NXE) {
>> +        env->nested_pg_mode |= SVM_NPT_NXE;
>> +    }
>> +}
>> +
>
> This needs to be migrated. You can put it in a subsection, conditional
> on hflags & HF_SVMI_MASK.

OK.

> Also, do you need to flush the TLB unconditionally, even if CR0.PG is zero?

Cannot follow you here yet. What flush are you referring to? Also, CR0.PG
would not reflect if NPT is on, which now also contributes to our TLB.

> Otherwise looks good. I have queued patches 1-3, but hopefully this one
> can go in the next release too. Sorry for the delayed review.

No problem.

Thanks,
Jan
Re: [Qemu-devel] [PATCH v2 0/7] arm_gic: add virtualization extensions support
On 2018-06-26 12:22, Jan Kiszka wrote: > On 2018-06-26 11:24, Luc Michel wrote: >> >> >> On 06/25/2018 06:55 AM, Jan Kiszka wrote: >>> On 2018-06-19 11:31, luc.mic...@greensocs.com wrote: >>>> From: Luc MICHEL >>>> >>>> This patch series add support for the virtualization extensions in the >>>> ARM GICv2 interrupt controller. >>>> >>>> The first two commits do some refactoring to prepare for the >>>> implementation. Commits 3 and 4 are the actual implementation. The last >>>> commit updates the ZynqMP implementation to support virtualization. >>> >>> Is it enabled for the virt machine as well? Seems we are missing some >>> mapping of GICV and GICH there. >>> >>> And what is the recommended way to get the ZynqMP model running? I've a >>> kernel that works on a ZCU102, but just passing it via -kernel to the >>> QEMU model seems insufficient. >> Hum, I think it should be enough. This is the way I run my tests with Xen: >> qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on \ >> -m 1024 \ >> -kernel xen-kernel -nographic -smp 1 >> > > Just debugged this further a bit, and it seems my kernel is not getting > any device tree from qemu: > > (gdb) bt > #0 cpu_relax () at ../arch/arm64/include/asm/processor.h:203 > #1 setup_machine_fdt (dt_phys=256) at ../arch/arm64/kernel/setup.c:194 > #2 setup_arch (cmdline_p=) at ../arch/arm64/kernel/setup.c:258 > #3 0xff8008e30bc4 in start_kernel () at ../init/main.c:552 > #4 0x in ?? () > Backtrace stopped: previous frame identical to this frame (corrupt stack?) 
> (gdb) lx-dmesg
> [0.00] Booting Linux on physical CPU 0x00 [0x410fd034]
> [0.00] Linux version 4.17.0-00033-g67c2a9a37d59 (jan@md1f2u6c) (gcc
> version 7.2.1 20171011 (Linaro GCC 7.2-2017.11)) #17 SMP Tue Jun 26 12:16:56
> CEST 2018
> (gdb) l setup_machine_fdt
> 178                 pr_warn("Large number of MPIDR hash buckets detected\n");
> 179 }
> 180
> 181 static void __init setup_machine_fdt(phys_addr_t dt_phys)
> 182 {
> 183         void *dt_virt = fixmap_remap_fdt(dt_phys);
> 184         const char *name;
> 185
> 186         if (!dt_virt || !early_init_dt_scan(dt_virt)) {
> 187                 pr_crit("\n"
> 188                         "Error: invalid device tree blob at physical address %pa (virtual address 0x%p)\n"
> 189                         "The dtb must be 8-byte aligned and must not exceed 2 MB in size\n"
> 190                         "\nPlease check your bootloader.",
> 191                         &dt_phys, dt_virt);
> 192
> 193                 while (true)
> 194                         cpu_relax();
> [...]
> (gdb) p dt_virt
> $1 = (void *) 0x0
>
> Jan

OK, feeding in -dtb arch/arm64/boot/dts/xilinx/zynqmp-zcu102-revB.dtb and
setting the console to ttyPS0 now gives some output. Looks like this can
take me closer towards a testable setup.

Jan
Re: [Qemu-devel] [PATCH v2 0/7] arm_gic: add virtualization extensions support
On 2018-06-26 11:24, Luc Michel wrote: > > > On 06/25/2018 06:55 AM, Jan Kiszka wrote: >> On 2018-06-19 11:31, luc.mic...@greensocs.com wrote: >>> From: Luc MICHEL >>> >>> This patch series add support for the virtualization extensions in the >>> ARM GICv2 interrupt controller. >>> >>> The first two commits do some refactoring to prepare for the >>> implementation. Commits 3 and 4 are the actual implementation. The last >>> commit updates the ZynqMP implementation to support virtualization. >> >> Is it enabled for the virt machine as well? Seems we are missing some >> mapping of GICV and GICH there. >> >> And what is the recommended way to get the ZynqMP model running? I've a >> kernel that works on a ZCU102, but just passing it via -kernel to the >> QEMU model seems insufficient. > Hum, I think it should be enough. This is the way I run my tests with Xen: > qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on \ > -m 1024 \ > -kernel xen-kernel -nographic -smp 1 > Just debugged this further a bit, and it seems my kernel is not getting any device tree from qemu: (gdb) bt #0 cpu_relax () at ../arch/arm64/include/asm/processor.h:203 #1 setup_machine_fdt (dt_phys=256) at ../arch/arm64/kernel/setup.c:194 #2 setup_arch (cmdline_p=) at ../arch/arm64/kernel/setup.c:258 #3 0xff8008e30bc4 in start_kernel () at ../init/main.c:552 #4 0x in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) 
(gdb) lx-dmesg [0.00] Booting Linux on physical CPU 0x00 [0x410fd034] [0.00] Linux version 4.17.0-00033-g67c2a9a37d59 (jan@md1f2u6c) (gcc version 7.2.1 20171011 (Linaro GCC 7.2-2017.11)) #17 SMP Tue Jun 26 12:16:56 CEST 2018 (gdb) l setup_machine_fdt 178 pr_warn("Large number of MPIDR hash buckets detected\n"); 179 } 180 181 static void __init setup_machine_fdt(phys_addr_t dt_phys) 182 { 183 void *dt_virt = fixmap_remap_fdt(dt_phys); 184 const char *name; 185 186 if (!dt_virt || !early_init_dt_scan(dt_virt)) { 187 pr_crit("\n" 188 "Error: invalid device tree blob at physical address %pa (virtual address 0x%p)\n" 189 "The dtb must be 8-byte aligned and must not exceed 2 MB in size\n" 190 "\nPlease check your bootloader.", 191 _phys, dt_virt); 192 193 while (true) 194 cpu_relax(); [...] (gdb) p dt_virt $1 = (void *) 0x0 Jan
Re: [Qemu-devel] [PATCH v2 0/7] arm_gic: add virtualization extensions support
On 2018-06-19 11:31, luc.mic...@greensocs.com wrote:
> From: Luc MICHEL
>
> This patch series add support for the virtualization extensions in the
> ARM GICv2 interrupt controller.
>
> The first two commits do some refactoring to prepare for the
> implementation. Commits 3 and 4 are the actual implementation. The last
> commit updates the ZynqMP implementation to support virtualization.

Is it enabled for the virt machine as well? Seems we are missing some
mapping of GICV and GICH there.

And what is the recommended way to get the ZynqMP model running? I've a
kernel that works on a ZCU102, but just passing it via -kernel to the
QEMU model seems insufficient.

Jan
Re: [Qemu-devel] [PATCH] e1000e: Do not auto-clear ICR bits which aren't set in EIAC
On 2018-06-15 04:07, Jason Wang wrote: > > > On 2018年06月13日 10:29, Jason Wang wrote: >> >> >> On 2018年06月13日 10:26, Philippe Mathieu-Daudé wrote: >>> Hi Jason, >>> >>> On 06/12/2018 11:18 PM, Jason Wang wrote: >>>> On 2018年06月13日 03:00, Philippe Mathieu-Daudé wrote: >>>>> Cc'ing Jason who is also listed as co-maintainer: >>>>> >>>>> ./scripts/get_maintainer.pl -f hw/net/e1000e_core.c >>>>> Dmitry Fleytman (maintainer:e1000e) >>>>> Jason Wang (odd fixer:Network devices) >>>>> >>>>> On 06/12/2018 03:43 PM, Jan Kiszka wrote: >>>>>> On 2018-06-12 20:38, Philippe Mathieu-Daudé wrote: >>>>>>> On 06/12/2018 03:30 PM, Jan Kiszka wrote: >>>>>>>> On 2018-06-12 20:11, Philippe Mathieu-Daudé wrote: >>>>>>>>> Hi Jan, >>>>>>>>> >>>>>>>>> On 06/12/2018 02:22 PM, Jan Kiszka wrote: >>>>>>>>>> On 2018-05-22 09:00, Jan Kiszka wrote: >>>>>>>>>>> On 2018-04-16 17:29, Peter Maydell wrote: >>>>>>>>>>>> On 16 April 2018 at 16:25, Jan Kiszka >>>>>>>>>>>> wrote: >>>>>>>>>>>>> On 2018-04-01 23:17, Jan Kiszka wrote: >>>>>>>>>>>>>> From: Jan Kiszka >>>>>>>>>>>>>> >>>>>>>>>>>>>> The spec does not justify clearing of any >>>>>>>>>>>>>> E1000_ICR_OTHER_CAUSES when >>>>>>>>>>>>>> E1000_ICR_OTHER is set in EIAC. In fact, removing this code >>>>>>>>>>>>>> fixes the >>>>>>>>>>>>>> issue the Linux driver runs into since 4aea7a5c5e94 ("e1000e: >>>>>>>>>>>>>> Avoid >>>>>>>>>>>>>> receiver overrun interrupt bursts") and was worked around by >>>>>>>>>>>>>> 745d0bd3af99 ("e1000e: Remove Other from EIAC"). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Signed-off-by: Jan Kiszka >>>>>>>>>>>>>> --- >>>>>>>>>>>>>> >>>>>>>>>>>>>> This resolves the issue I reported on February 18 ("e1000e: >>>>>>>>>>>>>> MSI-X >>>>>>>>>>>>>> problem with recent Linux drivers"). 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> hw/net/e1000e_core.c | 4 >>>>>>>>>>>>>> 1 file changed, 4 deletions(-) >>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c >>>>>>>>>>>>>> index ecf9b1..d38f025c0f 100644 >>>>>>>>>>>>>> --- a/hw/net/e1000e_core.c >>>>>>>>>>>>>> +++ b/hw/net/e1000e_core.c >>>>>>>>>>>>>> @@ -2022,10 +2022,6 @@ e1000e_msix_notify_one(E1000ECore >>>>>>>>>>>>>> *core, uint32_t cause, uint32_t int_cfg) >>>>>>>>>>>>>> >>>>>>>>>>>>>> effective_eiac = core->mac[EIAC] & cause; >>>>>>>>>>>>>> >>>>>>>>>>>>>> - if (effective_eiac == E1000_ICR_OTHER) { >>>>>>>>>>>>>> - effective_eiac |= E1000_ICR_OTHER_CAUSES; >>>>>>>>>>>>>> - } >>>>>>>>>>>>>> - >>>>>>>>>>>>>> core->mac[ICR] &= ~effective_eiac; >>>>>>>>>>>>>> >>>>>>>>>>>>>> if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) { >>>>>>>>>>>>>> >>>>>>>>>>>>> Ping for this - as well as >>>>>>>>>>>>> https://patchwork.ozlabs.org/patch/895476. >>>>>>>>>>>&g
Re: [Qemu-devel] [PATCH] hw/i386: Fix IVHD entry length for AMD IOMMU
On 2018-06-13 16:26, Michael S. Tsirkin wrote:
> On Tue, May 22, 2018 at 09:06:56AM +0200, Jan Kiszka wrote:
>> On 2018-03-29 14:51, Jan Kiszka wrote:
>>> From: Jan Kiszka
>>>
>>> Counting from the IVHD ID field to the all-devices entry, we have 28
>>> bytes, not 36.
>>>
>>> Signed-off-by: Jan Kiszka
>>> ---
>>>  hw/i386/acpi-build.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>> index deb440f286..a0cda71411 100644
>>> --- a/hw/i386/acpi-build.c
>>> +++ b/hw/i386/acpi-build.c
>>> @@ -2561,7 +2561,7 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
>>>                               (1UL << 7), /* PPRSup */
>>>                               1);
>>>     /* IVHD length */
>>> -    build_append_int_noprefix(table_data, 0x24, 2);
>>> +    build_append_int_noprefix(table_data, 28, 2);
>>>     /* DeviceID */
>>>     build_append_int_noprefix(table_data, s->devid, 2);
>>>     /* Capability offset */
>>
>> Waiting to be merged.
>>
>> Jan
>
> I'll queue this but you really should Cc maintainers :)

Sorry, wasn't clear to me that you are also maintaining the AMD IOMMU code.

Thanks,
Jan
Re: [Qemu-devel] [PATCH] e1000e: Do not auto-clear ICR bits which aren't set in EIAC
On 2018-06-12 20:38, Philippe Mathieu-Daudé wrote: > On 06/12/2018 03:30 PM, Jan Kiszka wrote: >> On 2018-06-12 20:11, Philippe Mathieu-Daudé wrote: >>> Hi Jan, >>> >>> On 06/12/2018 02:22 PM, Jan Kiszka wrote: >>>> On 2018-05-22 09:00, Jan Kiszka wrote: >>>>> On 2018-04-16 17:29, Peter Maydell wrote: >>>>>> On 16 April 2018 at 16:25, Jan Kiszka wrote: >>>>>>> On 2018-04-01 23:17, Jan Kiszka wrote: >>>>>>>> From: Jan Kiszka >>>>>>>> >>>>>>>> The spec does not justify clearing of any E1000_ICR_OTHER_CAUSES when >>>>>>>> E1000_ICR_OTHER is set in EIAC. In fact, removing this code fixes the >>>>>>>> issue the Linux driver runs into since 4aea7a5c5e94 ("e1000e: Avoid >>>>>>>> receiver overrun interrupt bursts") and was worked around by >>>>>>>> 745d0bd3af99 ("e1000e: Remove Other from EIAC"). >>>>>>>> >>>>>>>> Signed-off-by: Jan Kiszka >>>>>>>> --- >>>>>>>> >>>>>>>> This resolves the issue I reported on February 18 ("e1000e: MSI-X >>>>>>>> problem with recent Linux drivers"). >>>>>>>> >>>>>>>> hw/net/e1000e_core.c | 4 >>>>>>>> 1 file changed, 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c >>>>>>>> index ecf9b1..d38f025c0f 100644 >>>>>>>> --- a/hw/net/e1000e_core.c >>>>>>>> +++ b/hw/net/e1000e_core.c >>>>>>>> @@ -2022,10 +2022,6 @@ e1000e_msix_notify_one(E1000ECore *core, >>>>>>>> uint32_t cause, uint32_t int_cfg) >>>>>>>> >>>>>>>> effective_eiac = core->mac[EIAC] & cause; >>>>>>>> >>>>>>>> -if (effective_eiac == E1000_ICR_OTHER) { >>>>>>>> -effective_eiac |= E1000_ICR_OTHER_CAUSES; >>>>>>>> -} >>>>>>>> - >>>>>>>> core->mac[ICR] &= ~effective_eiac; >>>>>>>> >>>>>>>> if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) { >>>>>>>> >>>>>>> >>>>>>> Ping for this - as well as https://patchwork.ozlabs.org/patch/895476. 
>>>>>>> >>>>>>> Given that q35 uses e1000e by default and many Linux kernel versions no >>>>>>> longer work, this should likely go into upcoming and stable versions >>>>>> >>>>>> I'd rather not put it into 2.12 at this point in the release >>>>>> cycle unless it's a regression from 2.11, I think. >>>>> >>>>> Second ping - nothing hit the repo so far, nor did I receive feedback. >>>>> >>>> >>>> And another ping. For both. >>>> >>>> These days I had to help someone with a broken QEMU setup that failed >>>> installing from network. It turned out that "modprobe e1000e IntMode=0" >>>> was needed to workaround the issues my patches address. >>> >>> What about the IMS register? It is set just after. >>> >>> Looking at b38636b8372, can you test this patch? >>> >>> -- >8 -- >>> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c >>> index c93c4661ed..a484b68a5a 100644 >>> --- a/hw/net/e1000e_core.c >>> +++ b/hw/net/e1000e_core.c >>> @@ -2022,13 +2022,13 @@ e1000e_msix_notify_one(E1000ECore *core, >>> uint32_t cause, uint32_t int_cfg) >>> >>> effective_eiac = core->mac[EIAC] & cause; >>> >>> -if (effective_eiac == E1000_ICR_OTHER) { >>> -effective_eiac |= E1000_ICR_OTHER_CAUSES; >>> -} >>> - >>> core->mac[ICR] &= ~effective_eiac; >>> >>> if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) { >>> +if (effective_eiac == E1000_ICR_OTHER) { >>> +effective_eiac |= E1000_ICR_OTHER_CAUSES; >>> +} >>> + >>> core->mac[IMS] &= ~effective_eiac; >>> } >>> } >>> >> >> Before testing this: What would be the reasoning for this change? > > Not breaking the purpose of b38636b8372 :) I disagree on this expansion of bit 31 ("other causes"). I see no indication in the spec that setting this bit for autoclear has more impact than on the very same bit itself. Therefore I'm asking for a reasoning - based on the spec. Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH] e1000e: Do not auto-clear ICR bits which aren't set in EIAC
On 2018-06-12 20:11, Philippe Mathieu-Daudé wrote: > Hi Jan, > > On 06/12/2018 02:22 PM, Jan Kiszka wrote: >> On 2018-05-22 09:00, Jan Kiszka wrote: >>> On 2018-04-16 17:29, Peter Maydell wrote: >>>> On 16 April 2018 at 16:25, Jan Kiszka wrote: >>>>> On 2018-04-01 23:17, Jan Kiszka wrote: >>>>>> From: Jan Kiszka >>>>>> >>>>>> The spec does not justify clearing of any E1000_ICR_OTHER_CAUSES when >>>>>> E1000_ICR_OTHER is set in EIAC. In fact, removing this code fixes the >>>>>> issue the Linux driver runs into since 4aea7a5c5e94 ("e1000e: Avoid >>>>>> receiver overrun interrupt bursts") and was worked around by >>>>>> 745d0bd3af99 ("e1000e: Remove Other from EIAC"). >>>>>> >>>>>> Signed-off-by: Jan Kiszka >>>>>> --- >>>>>> >>>>>> This resolves the issue I reported on February 18 ("e1000e: MSI-X >>>>>> problem with recent Linux drivers"). >>>>>> >>>>>> hw/net/e1000e_core.c | 4 >>>>>> 1 file changed, 4 deletions(-) >>>>>> >>>>>> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c >>>>>> index ecf9b1..d38f025c0f 100644 >>>>>> --- a/hw/net/e1000e_core.c >>>>>> +++ b/hw/net/e1000e_core.c >>>>>> @@ -2022,10 +2022,6 @@ e1000e_msix_notify_one(E1000ECore *core, uint32_t >>>>>> cause, uint32_t int_cfg) >>>>>> >>>>>> effective_eiac = core->mac[EIAC] & cause; >>>>>> >>>>>> -if (effective_eiac == E1000_ICR_OTHER) { >>>>>> -effective_eiac |= E1000_ICR_OTHER_CAUSES; >>>>>> -} >>>>>> - >>>>>> core->mac[ICR] &= ~effective_eiac; >>>>>> >>>>>> if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) { >>>>>> >>>>> >>>>> Ping for this - as well as https://patchwork.ozlabs.org/patch/895476. >>>>> >>>>> Given that q35 uses e1000e by default and many Linux kernel versions no >>>>> longer work, this should likely go into upcoming and stable versions >>>> >>>> I'd rather not put it into 2.12 at this point in the release >>>> cycle unless it's a regression from 2.11, I think. >>> >>> Second ping - nothing hit the repo so far, nor did I receive feedback. >>> >> >> And another ping. For both. 
>> >> These days I had to help someone with a broken QEMU setup that failed >> installing from network. It turned out that "modprobe e1000e IntMode=0" >> was needed to workaround the issues my patches address. > > What about the IMS register? It is set just after. > > Looking at b38636b8372, can you test this patch? > > -- >8 -- > diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c > index c93c4661ed..a484b68a5a 100644 > --- a/hw/net/e1000e_core.c > +++ b/hw/net/e1000e_core.c > @@ -2022,13 +2022,13 @@ e1000e_msix_notify_one(E1000ECore *core, > uint32_t cause, uint32_t int_cfg) > > effective_eiac = core->mac[EIAC] & cause; > > -if (effective_eiac == E1000_ICR_OTHER) { > -effective_eiac |= E1000_ICR_OTHER_CAUSES; > -} > - > core->mac[ICR] &= ~effective_eiac; > > if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) { > +if (effective_eiac == E1000_ICR_OTHER) { > +effective_eiac |= E1000_ICR_OTHER_CAUSES; > +} > + > core->mac[IMS] &= ~effective_eiac; > } > } > Before testing this: What would be the reasoning for this change? I need to refresh my caches, the debugging session is now too long ago again. Jan -- Siemens AG, Corporate Technology, CT RDA IOT SES-DE Corporate Competence Center Embedded Linux