Re: [PATCH v7 7/9] drm/virtio: Improve DMA API usage for shmem BOs
Hi Emil, On 07/05/22 22:56, Emil Velikov wrote: > On 2022/07/05, Dmitry Osipenko wrote: >> On 7/5/22 18:45, Gerd Hoffmann wrote: >>> Hi, >>> >>>>> Also note that pci is not the only virtio transport we have. >>>> >>>> The VirtIO indeed has other transports, but only PCI is really supported >>>> in case of the VirtIO-GPU in kernel and in Qemu/crosvm, AFAICT. Hence >>>> only the PCI transport was tested. >>> >>> qemu -M microvm \ >>> -global virtio-mmio.force-legacy=false \ >>> -device virtio-gpu-device >>> >>> Gives you a functional virtio-gpu device on virtio-mmio. >>> >>> aarch64 virt machines support both pci and mmio too. >>> s390x has virtio-gpu-ccw ... >> >> Gerd, thank you very much! It's was indeed unclear to me how to test the >> MMIO GPU, but yours variant with microvm works! I was looking for trying >> aarch64 in the past, but it also was unclear how to do it since there is >> no DT support for the VirtIO-GPU, AFAICS. >> >> I booted kernel with this patchset applied and everything is okay, Xorg >> works. >> >> [drm] Initialized virtio_gpu 0.1.0 0 for LNRO0005:01 on minor 0 >> virtio-mmio LNRO0005:01: [drm] drm_plane_enable_fb_damage_clips() not >> called >> virtio-mmio LNRO0005:01: [drm] fb0: virtio_gpudrmfb frame buffer device >> >> There is no virgl support because it's a virtio-gpu-device and not >> virtio-gpu-device-gl that is PCI-only in Qemu. Hence everything seems good. >> >> I'd appreciate if you could give s390x a test.. I never touched s390x >> and it will probably take some extra effort to get into it. >> > > Adding Laszlo Ersek, who debugged and tested this the last time. > > Laszlo Ersek do ypu have some tips for Dmitry? Xorg seems to be > working on his end with the drm_drv_set_unique(... "pci:...") call > removed. > > Original patch can be found at: > https://lore.kernel.org/dri-devel/1380526d-17fb-6eb2-0fd5-5cddbdf0a...@collabora.com/T/#mbc1a1bedc91d1855007188a725c5c75bbc771cf0 thanks for recalling this, but... I've moved to different projects, and I'm already scraping the bottom of the barrel for every chunk of time I can find :( Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v7 71/72] x86/efi: Add GHCB mappings when SEV-ES is active
On 09/09/20 14:44, Laszlo Ersek wrote: > To summarize: for QemuFlashFvbServicesRuntimeDxe to allocate UEFI > Runtime Services Data type memory, for its own runtime GHCB, two > permissions are necessary (together), at OS runtime: > > - QemuFlashFvbServicesRuntimeDxe must be allowed to swap MSR_SEV_ES_GHCB > temporarily (before executing VMGEXIT), > > - QemuFlashFvbServicesRuntimeDxe must be allowed to change the OS-owned > PTE temporarily (for remapping the GHCB as plaintext, before writing > to it). Condition#2 gets worse: If the firmware-allocated runtime GHCB happens to be virt-mapped by the OS using a 2MB page (covering other UEFI runtime data areas, perhaps?), then simply flipping the encryption bit in QemuFlashFvbServicesRuntimeDxe would mark more runtime memory than just the GHCB as "plaintext". (2MB-4KB specifically.) This could result in: - firmware read accesses outside of the GHCB returning garbage - firmware write accesses ouside of the GHCB leaking information to the hypervisor, and reading back later as garbage In order to prevent those symptoms, the firmware would have to split the 2MB page to 4KB pages, and decrypt just the one (GHCB) page. But page splitting requires additional memory (for the finer granularity page table), and fw memory cannot be allocated at OS runtime. So that extra memory too would have to be pre-allocated by the firmware. Nasty. I'd recommend sticking with this kernel patch. Thanks, Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v7 71/72] x86/efi: Add GHCB mappings when SEV-ES is active
On 09/09/20 10:27, Ard Biesheuvel wrote: > (adding Laszlo and Brijesh) > > On Tue, 8 Sep 2020 at 20:46, Borislav Petkov wrote: >> >> + Ard so that he can ack the efi bits. >> >> On Mon, Sep 07, 2020 at 03:16:12PM +0200, Joerg Roedel wrote: >>> From: Tom Lendacky >>> >>> Calling down to EFI runtime services can result in the firmware >>> performing VMGEXIT calls. The firmware is likely to use the GHCB of >>> the OS (e.g., for setting EFI variables), I've had to stare at this for a while. Because, normally a VMGEXIT is supposed to occur like this: - guest does something privileged - resultant non-automatic exit (NAE) injects a #VC exception - exception handler figures out what that privileged thing was - exception handler submits request to hypervisor via GHCB contents plus VMGEXIT instruction Point being, the agent that "owns" the exception handler is supposed to pre-allocate or otherwise provide the GHCB too, for information passing. So... what is the particular NAE that occurs during the execution of UEFI runtime services (at OS runtime)? And assuming it occurs, I'm unsure if the exception handler (IDT) at that point is owned (temporarily?) by the firmware. - If the #VC handler comes from the firmware, then I don't know why it would use the OS's GHCB. - If the #VC handler comes from the OS, then I don't understand why the commit message says "firmware performing VMGEXIT", given that in this case it would be the OS's #VC handler executing VMGEXIT. So, I think the above commit message implies a VMGEXIT *without* a NAE / #VC context. (Because, I fail to interpret the commit message in a NAE / #VC context in any way; see above.) OK, so let's see where the firmware performs a VMGEXIT *outside* of an exception handler, *while* at OS runtime. There seems to be one, in file "OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlashDxe.c": > VOID > QemuFlashPtrWrite ( > INvolatile UINT8*Ptr, > INUINT8 Value > ) > { > if (MemEncryptSevEsIsEnabled ()) { > MSR_SEV_ES_GHCB_REGISTER Msr; > GHCB *Ghcb; > > Msr.GhcbPhysicalAddress = AsmReadMsr64 (MSR_SEV_ES_GHCB); > Ghcb = Msr.Ghcb; > > // > // Writing to flash is emulated by the hypervisor through the use of write > // protection. This won't work for an SEV-ES guest because the write won't > // be recognized as a true MMIO write, which would result in the required > // #VC exception. Instead, use the the VMGEXIT MMIO write support directly > // to perform the update. > // > VmgInit (Ghcb); > Ghcb->SharedBuffer[0] = Value; > Ghcb->SaveArea.SwScratch = (UINT64) (UINTN) Ghcb->SharedBuffer; > VmgExit (Ghcb, SVM_EXIT_MMIO_WRITE, (UINT64) (UINTN) Ptr, 1); > VmgDone (Ghcb); > } else { > *Ptr = Value; > } > } This function *does* run at OS runtime (as a part of non-volatile UEFI variable writes). And note that, wherever MSR_SEV_ES_GHCB points to at the moment, is used as GHCB. If the guest kernel allocates its own GHCB and writes the allocation address to MSR_SEV_ES_GHCB, then indeed the firmware will use the GHCB of the OS. I reviewed edk2 commit 437eb3f7a8db ("OvmfPkg/QemuFlashFvbServicesRuntimeDxe: Bypass flash detection with SEV-ES", 2020-08-17), but I admit I never thought of the guest OS changing MSR_SEV_ES_GHCB. I'm sorry about that. As long as this driver is running before OS runtime (i.e., during the DXE and BDS phases), MSR_SEV_ES_GHCB is supposed to carry the value we set in "OvmfPkg/PlatformPei/AmdSev.c": > STATIC > VOID > AmdSevEsInitialize ( > VOID > ) > { > VOID *GhcbBase; > PHYSICAL_ADDRESS GhcbBasePa; > UINTN GhcbPageCount, PageCount; > RETURN_STATUS PcdStatus, DecryptStatus; > IA32_DESCRIPTOR Gdtr; > VOID *Gdt; > > if (!MemEncryptSevEsIsEnabled ()) { > return; > } > > PcdStatus = PcdSetBoolS (PcdSevEsIsEnabled, TRUE); > ASSERT_RETURN_ERROR (PcdStatus); > > // > // Allocate GHCB and per-CPU variable pages. > // Since the pages must survive across the UEFI to OS transition > // make them reserved. > // > GhcbPageCount = mMaxCpuCount * 2; > GhcbBase = AllocateReservedPages (GhcbPageCount); > ASSERT (GhcbBase != NULL); > > GhcbBasePa = (PHYSICAL_ADDRESS)(UINTN) GhcbBase; > > // > // Each vCPU gets two consecutive pages, the first is the GHCB and the > // second is the per-CPU variable page. Loop through the allocation and > // only clear the encryption mask for the GHCB pages. > // > for (PageCount = 0; PageCount < GhcbPageCount; PageCount += 2) { > DecryptStatus = MemEncryptSevClearPageEncMask ( > 0, > GhcbBasePa + EFI_PAGES_TO_SIZE (PageCount), > 1, > TRUE > ); > ASSERT_RETURN_ERROR (DecryptStatus); > } > > ZeroMem (GhcbBase, EFI_PAGES_TO_SIZE (GhcbPageCount)); > > PcdStatus = PcdSet64S (PcdGhcbBase, GhcbBasePa); > ASSERT_RETURN_ERROR (PcdStatus); > PcdStatus =
Re: [PATCH v5] drm/virtio: use virtio_max_dma_size
On 08/21/19 13:12, Gerd Hoffmann wrote: > We must make sure our scatterlist segments are not too big, otherwise > we might see swiotlb failures (happens with sev, also reproducable with > swiotlb=force). > > Suggested-by: Laszlo Ersek > Signed-off-by: Gerd Hoffmann > --- > drivers/gpu/drm/virtio/virtgpu_object.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c > b/drivers/gpu/drm/virtio/virtgpu_object.c > index b2da31310d24..09b526518f5a 100644 > --- a/drivers/gpu/drm/virtio/virtgpu_object.c > +++ b/drivers/gpu/drm/virtio/virtgpu_object.c > @@ -204,6 +204,7 @@ int virtio_gpu_object_get_sg_table(struct > virtio_gpu_device *qdev, > .interruptible = false, > .no_wait_gpu = false > }; > + size_t max_segment; > > /* wtf swapping */ > if (bo->pages) > @@ -215,8 +216,13 @@ int virtio_gpu_object_get_sg_table(struct > virtio_gpu_device *qdev, > if (!bo->pages) > goto out; > > - ret = sg_alloc_table_from_pages(bo->pages, pages, nr_pages, 0, > - nr_pages << PAGE_SHIFT, GFP_KERNEL); > + max_segment = virtio_max_dma_size(qdev->vdev); > + max_segment &= PAGE_MASK; > + if (max_segment > SCATTERLIST_MAX_SEGMENT) > + max_segment = SCATTERLIST_MAX_SEGMENT; > + ret = __sg_alloc_table_from_pages(bo->pages, pages, nr_pages, 0, > + nr_pages << PAGE_SHIFT, > + max_segment, GFP_KERNEL); > if (ret) > goto out; > return 0; > Reviewed-by: Laszlo Ersek Thanks! Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
QueuePFN peculiarity in virtio-mmio
Hi, Appendix X: virtio-mmio in the virtio spec says • 0x040 | RW | QueuePFN [...] When the Guest stops using the queue it must write zero (0x0) to this register. [...] and Virtqueue Configuration [...] 2. Check if the queue is not already in use: read QueuePFN register, returned value should be zero (0x0). [...] I think this in itself is already suboptimal, because a guest that crashes and reboots (while the emulator itself survives) will not be able to use the device after said reboot (it has never re-set QueuePFN to zero). But, more importantly: I think that resetting the device (by writing 0 to its status register) should include (ie. *guarantee*) the effects of setting QueuePFN to zero for all imaginable queues of the device. This way, a defensive guest that starts up by resetting the device (*) after identifying it via MagicValue / Version / DeviceID / VendorID would be able to use the device regardless of the device's prior QueuePFN setting(s). (*) Resetting the device is the first step in 2.2.1 Device Initialization Sequence. It is not required on initial start up, but as a guest driver can never be sure whether the startup in question is the initial one, a defensive driver will always start with device reet. The question arises because Olivier has posted a series to edk2-devel that adds virtio-mmio support to TianoCore, and Mark tested it (using OVMF) with a Linux guest and found problems. Namely, OVMF itself can drive the virtio devices via virtio-mmio, but the Linux kernel booted from OVMF can not. The reason is the missing zeroing of QueuePFN when OVMF is exiting. (I'm just paraphrasing the analysis.) I think - that resetting the device (via its status register) should make the host forget *all* prior configuration, including QueuePFN, - and that the Linux driver should reset the device as first step. So: - What's the motivation for the acquire/release semantics of QueuePFN? - Am I right that device reset should force a QueuePFN release too? Thanks, Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: QueuePFN peculiarity in virtio-mmio
My apologies, I used Anthony's previous (now obsolete) email. Updated it now keeping full context below. Sorry. On 10/22/13 19:49, Laszlo Ersek wrote: Hi, Appendix X: virtio-mmio in the virtio spec says • 0x040 | RW | QueuePFN [...] When the Guest stops using the queue it must write zero (0x0) to this register. [...] and Virtqueue Configuration [...] 2. Check if the queue is not already in use: read QueuePFN register, returned value should be zero (0x0). [...] I think this in itself is already suboptimal, because a guest that crashes and reboots (while the emulator itself survives) will not be able to use the device after said reboot (it has never re-set QueuePFN to zero). But, more importantly: I think that resetting the device (by writing 0 to its status register) should include (ie. *guarantee*) the effects of setting QueuePFN to zero for all imaginable queues of the device. This way, a defensive guest that starts up by resetting the device (*) after identifying it via MagicValue / Version / DeviceID / VendorID would be able to use the device regardless of the device's prior QueuePFN setting(s). (*) Resetting the device is the first step in 2.2.1 Device Initialization Sequence. It is not required on initial start up, but as a guest driver can never be sure whether the startup in question is the initial one, a defensive driver will always start with device reet. The question arises because Olivier has posted a series to edk2-devel that adds virtio-mmio support to TianoCore, and Mark tested it (using OVMF) with a Linux guest and found problems. Namely, OVMF itself can drive the virtio devices via virtio-mmio, but the Linux kernel booted from OVMF can not. The reason is the missing zeroing of QueuePFN when OVMF is exiting. (I'm just paraphrasing the analysis.) I think - that resetting the device (via its status register) should make the host forget *all* prior configuration, including QueuePFN, - and that the Linux driver should reset the device as first step. So: - What's the motivation for the acquire/release semantics of QueuePFN? - Am I right that device reset should force a QueuePFN release too? Thanks, Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [edk2] QueuePFN peculiarity in virtio-mmio
On 10/22/13 19:55, Laszlo Ersek wrote: The question arises because Olivier has posted a series to edk2-devel that adds virtio-mmio support to TianoCore, and Mark tested it (using OVMF) with a Linux guest and found problems. Namely, OVMF itself can drive the virtio devices via virtio-mmio, but the Linux kernel booted from OVMF can not. The reason is the missing zeroing of QueuePFN when OVMF is exiting. (I'm just paraphrasing the analysis.) s/OVMF/AArch64 foundation model/g http://thread.gmane.org/gmane.comp.bios.tianocore.devel/4373/focus=4411 I should have followed my own advice, not to post when sick. I'll go now and hide in a cave. Laszlo /facepalm ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-spec PATCH 0/5] Receiving Used Buffers example code: cleanups and an extra mb()
On 06/17/13 04:17, Rusty Russell wrote: Stefan Hajnoczi stefa...@gmail.com writes: On Sat, Jun 8, 2013 at 7:39 PM, Laszlo Ersek ler...@redhat.com wrote: Patches before the last are small cleanups. In the last patch I'm trying to extract / generalize an idea from Stefan Hajnoczi's review of my virtio-net driver for OVMF. How about a single patch which just replaces the completely broken example? :) Do you want me to squash the series into one patch and repost? Receiving Used Buffers: prevent speculative load when not sequentially consistent Yes, though this only needs to be a rmb(). Probably, but the spec never qualifies the memory barriers it recommends. (Maybe their read/write types should be obvious to the reader. I didn't give their types much thought because in edk2/OVMF there's only MemoryFence().) In any case I'll take this as your approval of the 5/5 patch. Thanks! In the OASIS 1.0 spec, I'd like an appendix with tested code for doing these operations (probably based on a simplifeid version of vringh.c). ( I assume the tested code should come from one of the more sophisticated drivers (that also have a unix-y coding style). Also, I've been actively avoiding reading other virtio code (qemu, kernel, SeaBIOS, iPXE etc) except when I was stuck with an OVMF driver and (a) it looked like there was some silent requirement/assumption in qemu that had not been spelled out in the spec, (b) I was being dyslexic. Thus I can't really suggest code for the appendix (although the OVMF drivers do work). ) Thanks, Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 05/29/13 09:27, Paolo Bonzini wrote: Il 29/05/2013 06:33, Rusty Russell ha scritto: Paolo Bonzini pbonz...@redhat.com writes: Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto: + +switch (addr) { +case offsetof(struct virtio_pci_common_cfg, device_feature_select): +return proxy-device_feature_select; Oh dear no... Please use defines like the rest of QEMU. Any good reason not to use offsetof? I'm not sure it's portable to use it in case labels. IIRC, the definition ((int)(((T *)0)-field)) is not a valid C integer constant expression. Laszlo? It's defined to yield an integer constant expression in the ISO standard (and I think ANSI too, though that's not at hand): It's not in C89. It is, see 7.1.6 Common definitions stddef.h. The oldest compiler QEMU cares about is GCC 4.2. I don't know if it has a builtin offsetof, probably it does. Talking about nothing else but the ISO C standard(s), it doesn't matter how the C implementation deals with offsetof() internally. C implementation in this sense includes both compiler and standard library. If the standard library header (stddef.h or something internal included by it) expands the offsetof() macro to replacement text that does pointer black magic, magic that would be otherwise nonportable or undefined, it is alright as long as the accompanying compiler guarantees that the replacement text will work. That is, offsetof() gives the C implementor leeway to implement it in wherever distribution between compiler and standard library; the main thing is that the C program use the one offsetof() provided by the C implementation. Of course in the FLOSS world OS distros might mix and match gcc and glibc statistically randomly; normally some documentation should enable the user to put his/her system into ISO C or even SUSv[1234] mode. ( An interesting example for, say, non-unified implementation is getconf vs. c99. I'll pick SUSv3 for this example. http://pubs.opengroup.org/onlinepubs/95399/utilities/getconf.html http://pubs.opengroup.org/onlinepubs/95399/utilities/c99.html In a nutshell one can interrogate getconf for CFLAGS, LDFLAGS and LIBS so that c99 builds a program in ILP32_OFF32 / ILP32_OFFBIG / LP64_OFF64 / LPBIG_OFFBIG mode (programming environment). On my x86_64 RHEL-6.4 laptop, getconf is part of glibc (glibc-common-2.12-1.107.el6.x86_64), while c99 is part of gcc (gcc-4.4.7-3.el6.x86_64). Supposing I'd like to build a 32-bit program with 64-bit off_t: getconf _POSIX_V6_ILP32_OFFBIG 1 getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_CFLAGS -m32 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_LDFLAGS -m32 getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_LIBS [empty string] However if I try to actually build a program using these flags: #define _XOPEN_SOURCE 600 int main(void) { return 0; } c99 \ -m32 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 \ -o mytest \ -m32 \ mytest.c I get /usr/bin/ld: crt1.o: No such file: No such file or directory collect2: ld returned 1 exit status unless I install glibc-devel.i686. This problem is rooted in getconf (a glibc utility) being unaware of RHEL packaging choices: if the system can't guarantee the final c99 invocation to succeed, then the very first getconf invocation should write -1\n or undefined\n to stdout. (I'm aware that RHEL-6 is not certified UNIX 03; this is just an example for problems caused by various parts of a standard's (quasi-)implementation coming from separate projects. In that sense, caring about the offsetof() internals may have merit.) ) Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
anyone willing to review a virtio-net guest driver for OVMF?
(Resending to the virt list after the original qemu-devel posting, following MST's advice; plus adding ipxe-devel too -- apologies for spamming!) Hi, the driver in question is intended as a fallback driver when the iPXE EFI driver for virtio-net is not present as an oprom. The series starts with technical notes that should help the virtio-net expert catch any errors more easily. This review could easily take up a lot of time; my hope is in the fact that people here are both more skilled and faster than my humble self. http://thread.gmane.org/gmane.comp.bios.tianocore.devel/2804 Thanks, Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 05/28/13 19:43, Paolo Bonzini wrote: Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto: + +switch (addr) { +case offsetof(struct virtio_pci_common_cfg, device_feature_select): +return proxy-device_feature_select; Oh dear no... Please use defines like the rest of QEMU. Any good reason not to use offsetof? I'm not sure it's portable to use it in case labels. IIRC, the definition ((int)(((T *)0)-field)) is not a valid C integer constant expression. Laszlo? As long as we use the offsetof() that comes with the C implementation (ie. we don't hack it up ourselves), we should be safe; ISO C99 elegantly defines offsetof() in 7.17 Common definitions stddef.h p3 as The macros are [...] offsetof(type, member-designator) which expands to an integer constant expression that has type *size_t*, the value of which is the offset in bytes, to the structure member (designated by /member-designator/), from the beginning of its structure (designated by /type/). The type and member designator shall be such that given static type t; then the expression (t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.) (Address constant is described in 6.6p9 but quoting even that here doesn't seem necesary.) From 6.8.4.2p3, The expression of each case label shall be an integer constant expression [...]. So, (a) if we use the standard offsetof(), (b) and you can also write, for example, static struct virtio_pci_common_cfg t; static void *p = t.device_feature_select; then the construct seems valid. (Consistently with the above mention of UB if the specified member is a bit-field: the address-of operator's constraints also forbid a bit-field object as operand.) Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: suggesting wording fixes for virtio-spec 0.9.5
On 04/23/13 06:05, Rusty Russell wrote: Laszlo Ersek ler...@redhat.com writes: Hi, (I'm not subscribed to either list,) using the word descriptor is misleading in the following sections: Yes, I like the use of 'descriptor chains'. This is a definite improvement. Here's the diff I ended up with (massaged to minimize it). Thanks! Rusty. --- virtio-spec.txt-old 2013-04-23 13:22:21.339158214 +0930 +++ virtio-spec.txt 2013-04-23 13:34:14.055176464 +0930 @@ -482,10 +482,10 @@ 2.3.4 Available Ring -The available ring refers to what descriptors we are offering the -device: it refers to the head of a descriptor chain. The “flags” +The available ring refers to what descriptor chains we are offering the +device: each entry refers to the head of a descriptor chain. The “flags” field is currently 0 or 1: 1 indicating that we do not need an -interrupt when the device consumes a descriptor from the +interrupt when the device consumes a descriptor chain from the available ring. Alternatively, the guest can ask the device to delay interrupts until an entry with an index specified by the “ used_event” field is written in the used ring (equivalently, @@ -671,16 +671,16 @@ avail-ring[avail-idx % qsz] = head; -However, in general we can add many descriptors before we update -the “idx” field (at which point they become visible to the -device), so we keep a counter of how many we've added: +However, in general we can add many separate descriptor chains before we update +the “idx” field (at which point they become visible to the device), +so we keep a counter of how many we've added: avail-ring[(avail-idx + added++) % qsz] = head; 2.4.1.3 Updating The Index Field Once the idx field of the virtqueue is updated, the device will -be able to access the descriptor entries we've created and the +be able to access the descriptor chains we've created and the memory they refer to. This is why a memory barrier is generally used before the idx update, to ensure it sees the most up-to-date copy. Not sure if it's customary here or if you need it / want it, but anyway Reviewed-by: Laszlo Ersek ler...@redhat.com (Also I've fixed the OVMF driver; just reposting the patch today with a better commit message.) Thanks much! Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
suggesting wording fixes for virtio-spec 0.9.5
Hi, (I'm not subscribed to either list,) using the word descriptor is misleading in the following sections: 2.4.1.2 Updating The Available Ring [...] However, in general we can add many descriptors before we update the idx field (at which point they become visible to the device), so we keep a counter of how many we've added: [...] and 2.4.1.3 Updating The Index Field Once the idx field of the virtqueue is updated, the device will be able to access the descriptor entries we've created and the memory they refer to. [...] (The word descriptor in the above language is the reason I mis-implemented the virtio-blk guest driver in OVMF.) In fact the available ring tracks *head* descriptors only. I suggest s/many descriptors/many separate descriptor chains/ s/descriptor entries/separate descriptor chains/ for the above. Similarly, 2.3.4 Available Ring should start with something like The available ring describes what descriptor chains we are offering the device: each entry of the available ring refers to the head descriptor of a separate descriptor chain. Thanks Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
miscalculated vring size in virtio-0.9.5.pdf?
Hi, the vring_size() formula in section 2.3 seems to miss (a) vring_avail.used_event in the first ALIGN(), and (b) vring_used.flags, vring_used.idx, vring_used.avail_event in the second ALIGN(). vring_size() in Appendix A fixes (b) (no rounding up to page size though), but (a) is wrong there too, I believe. Thanks, Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH net-next v4] xen-netfront: delay gARP until backend switches to Connected
After a guest is live migrated, the xen-netfront driver emits a gratuitous ARP message, so that networking hardware on the target host's subnet can take notice, and public routing to the guest is re-established. However, if the packet appears on the backend interface before the backend is added to the target host's bridge, the packet is lost, and the migrated guest's peers become unable to talk to the guest. A sufficient two-parts condition to prevent the above is: (1) ensure that the backend only moves to Connected xenbus state after its hotplug scripts completed, ie. the netback interface got added to the bridge; and (2) ensure the frontend only queues the gARP when it sees the backend move to Connected. These two together provide complete ordering. Sub-condition (1) is already satisfied by commit f942dc2552b8 in Linus' tree, based on commit 6b0b80ca7165 from [1]. In general, the full condition is sufficient, not necessary, because, according to [2], live migration has been working for a long time without satisfying sub-condition (2). However, after 6b0b80ca7165 was backported to the RHEL-5 host to ensure (1), (2) still proved necessary in the RHEL-6 guest. This patch intends to provide (2) for upstream. The Reviewed-by line comes from [3]. [1] git://xenbits.xen.org/people/ianc/linux-2.6.git#upstream/dom0/backend/netback-history [2] http://old-list-archives.xen.org/xen-devel/2011-06/msg01969.html [3] http://old-list-archives.xen.org/xen-devel/2011-07/msg00484.html Signed-off-by: Laszlo Ersek ler...@redhat.com Reviewed-by: Ian Campbell ian.campb...@citrix.com --- drivers/net/xen-netfront.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index d29365a..f033656 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1646,7 +1646,6 @@ static void netback_changed(struct xenbus_device *dev, case XenbusStateInitialised: case XenbusStateReconfiguring: case XenbusStateReconfigured: - case XenbusStateConnected: case XenbusStateUnknown: case XenbusStateClosed: break; @@ -1657,6 +1656,9 @@ static void netback_changed(struct xenbus_device *dev, if (xennet_connect(netdev) != 0) break; xenbus_switch_state(dev, XenbusStateConnected); + break; + + case XenbusStateConnected: netif_notify_peers(netdev); break; -- 1.7.4.4 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v3 REPOST] xen-netfront: delay gARP until backend switches to Connected
After a guest is live migrated, the xen-netfront driver emits a gratuitous ARP message, so that networking hardware on the target host's subnet can take notice, and public routing to the guest is re-established. However, if the packet appears on the backend interface before the backend is added to the target host's bridge, the packet is lost, and the migrated guest's peers become unable to talk to the guest. A sufficient two-parts condition to prevent the above is: (1) ensure that the backend only moves to Connected xenbus state after its hotplug scripts completed, ie. the netback interface got added to the bridge; and (2) ensure the frontend only queues the gARP when it sees the backend move to Connected. These two together provide complete ordering. Sub-condition (1) is satisfied by pvops commit 43223efd9bfd. In general, the full condition is sufficient, not necessary, because, according to [1], live migration has been working for a long time without satisfying sub-condition (2). However, after 43223efd9bfd was backported to the RHEL-5 host to ensure (1), (2) still proved necessary in the RHEL-6 guest. This patch intends to provide (2) for upstream. The Reviewed-by line comes from [2]. [1] http://old-list-archives.xen.org/xen-devel/2011-06/msg01969.html [2] http://old-list-archives.xen.org/xen-devel/2011-07/msg00484.html Signed-off-by: Laszlo Ersek ler...@redhat.com Reviewed-by: Ian Campbell ian.campb...@citrix.com --- drivers/net/xen-netfront.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index d29365a..f033656 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1646,7 +1646,6 @@ static void netback_changed(struct xenbus_device *dev, case XenbusStateInitialised: case XenbusStateReconfiguring: case XenbusStateReconfigured: - case XenbusStateConnected: case XenbusStateUnknown: case XenbusStateClosed: break; @@ -1657,6 +1656,9 @@ static void netback_changed(struct xenbus_device *dev, if (xennet_connect(netdev) != 0) break; xenbus_switch_state(dev, XenbusStateConnected); + break; + + case XenbusStateConnected: netif_notify_peers(netdev); break; -- 1.7.4.4 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization