Re: [PATCH v7 7/9] drm/virtio: Improve DMA API usage for shmem BOs

2022-07-06 Thread Laszlo Ersek
Hi Emil,

On 07/05/22 22:56, Emil Velikov wrote:
> On 2022/07/05, Dmitry Osipenko wrote:
>> On 7/5/22 18:45, Gerd Hoffmann wrote:
>>>   Hi,
>>>
>>>>> Also note that pci is not the only virtio transport we have.
>>>>
>>>> The VirtIO indeed has other transports, but only PCI is really supported
>>>> in case of the VirtIO-GPU in kernel and in Qemu/crosvm, AFAICT. Hence
>>>> only the PCI transport was tested.
>>>
>>> qemu -M microvm \
>>>   -global virtio-mmio.force-legacy=false \
>>>   -device virtio-gpu-device
>>>
>>> Gives you a functional virtio-gpu device on virtio-mmio.
>>>
>>> aarch64 virt machines support both pci and mmio too.
>>> s390x has virtio-gpu-ccw ...
>>
>> Gerd, thank you very much! It's was indeed unclear to me how to test the
>> MMIO GPU, but yours variant with microvm works! I was looking for trying
>> aarch64 in the past, but it also was unclear how to do it since there is
>> no DT support for the VirtIO-GPU, AFAICS.
>>
>> I booted kernel with this patchset applied and everything is okay, Xorg
>> works.
>>
>>  [drm] Initialized virtio_gpu 0.1.0 0 for LNRO0005:01 on minor 0
>>  virtio-mmio LNRO0005:01: [drm] drm_plane_enable_fb_damage_clips() not
>> called
>>  virtio-mmio LNRO0005:01: [drm] fb0: virtio_gpudrmfb frame buffer device
>>
>> There is no virgl support because it's a virtio-gpu-device and not
>> virtio-gpu-device-gl that is PCI-only in Qemu. Hence everything seems good.
>>
>> I'd appreciate if you could give s390x a test.. I never touched s390x
>> and it will probably take some extra effort to get into it.
>>
> 
> Adding Laszlo Ersek, who debugged and tested this the last time.
> 
> Laszlo Ersek do ypu have some tips for Dmitry? Xorg seems to be
> working on his end with the drm_drv_set_unique(... "pci:...") call
> removed.
> 
> Original patch can be found at:
> https://lore.kernel.org/dri-devel/1380526d-17fb-6eb2-0fd5-5cddbdf0a...@collabora.com/T/#mbc1a1bedc91d1855007188a725c5c75bbc771cf0

thanks for recalling this, but... I've moved to different projects, and
I'm already scraping the bottom of the barrel for every chunk of time I
can find :(

Laszlo

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v7 71/72] x86/efi: Add GHCB mappings when SEV-ES is active

2020-09-09 Thread Laszlo Ersek
On 09/09/20 14:44, Laszlo Ersek wrote:

> To summarize: for QemuFlashFvbServicesRuntimeDxe to allocate UEFI
> Runtime Services Data type memory, for its own runtime GHCB, two
> permissions are necessary (together), at OS runtime:
> 
> - QemuFlashFvbServicesRuntimeDxe must be allowed to swap MSR_SEV_ES_GHCB
>   temporarily (before executing VMGEXIT),
> 
> - QemuFlashFvbServicesRuntimeDxe must be allowed to change the OS-owned
>   PTE temporarily (for remapping the GHCB as plaintext, before writing
>   to it).

Condition#2 gets worse:

If the firmware-allocated runtime GHCB happens to be virt-mapped by the
OS using a 2MB page (covering other UEFI runtime data areas, perhaps?),
then simply flipping the encryption bit in
QemuFlashFvbServicesRuntimeDxe would mark more runtime memory than just
the GHCB as "plaintext". (2MB-4KB specifically.)

This could result in:
- firmware read accesses outside of the GHCB returning garbage
- firmware write accesses ouside of the GHCB leaking information to the
hypervisor, and reading back later as garbage

In order to prevent those symptoms, the firmware would have to split the
2MB page to 4KB pages, and decrypt just the one (GHCB) page.

But page splitting requires additional memory (for the finer granularity
page table), and fw memory cannot be allocated at OS runtime. So that
extra memory too would have to be pre-allocated by the firmware. Nasty.

I'd recommend sticking with this kernel patch.

Thanks,
Laszlo

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v7 71/72] x86/efi: Add GHCB mappings when SEV-ES is active

2020-09-09 Thread Laszlo Ersek
On 09/09/20 10:27, Ard Biesheuvel wrote:
> (adding Laszlo and Brijesh)
>
> On Tue, 8 Sep 2020 at 20:46, Borislav Petkov  wrote:
>>
>> + Ard so that he can ack the efi bits.
>>
>> On Mon, Sep 07, 2020 at 03:16:12PM +0200, Joerg Roedel wrote:
>>> From: Tom Lendacky 
>>>
>>> Calling down to EFI runtime services can result in the firmware
>>> performing VMGEXIT calls. The firmware is likely to use the GHCB of
>>> the OS (e.g., for setting EFI variables),

I've had to stare at this for a while.

Because, normally a VMGEXIT is supposed to occur like this:

- guest does something privileged
- resultant non-automatic exit (NAE) injects a #VC exception
- exception handler figures out what that privileged thing was
- exception handler submits request to hypervisor via GHCB contents plus
  VMGEXIT instruction

Point being, the agent that "owns" the exception handler is supposed to
pre-allocate or otherwise provide the GHCB too, for information passing.

So... what is the particular NAE that occurs during the execution of
UEFI runtime services (at OS runtime)?

And assuming it occurs, I'm unsure if the exception handler (IDT) at
that point is owned (temporarily?) by the firmware.

- If the #VC handler comes from the firmware, then I don't know why it
would use the OS's GHCB.

- If the #VC handler comes from the OS, then I don't understand why the
commit message says "firmware performing VMGEXIT", given that in this
case it would be the OS's #VC handler executing VMGEXIT.

So, I think the above commit message implies a VMGEXIT *without* a NAE /
#VC context. (Because, I fail to interpret the commit message in a NAE /
#VC context in any way; see above.)

OK, so let's see where the firmware performs a VMGEXIT *outside* of an
exception handler, *while* at OS runtime. There seems to be one, in file
"OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlashDxe.c":

> VOID
> QemuFlashPtrWrite (
>   INvolatile UINT8*Ptr,
>   INUINT8 Value
>   )
> {
>   if (MemEncryptSevEsIsEnabled ()) {
> MSR_SEV_ES_GHCB_REGISTER  Msr;
> GHCB  *Ghcb;
>
> Msr.GhcbPhysicalAddress = AsmReadMsr64 (MSR_SEV_ES_GHCB);
> Ghcb = Msr.Ghcb;
>
> //
> // Writing to flash is emulated by the hypervisor through the use of write
> // protection. This won't work for an SEV-ES guest because the write won't
> // be recognized as a true MMIO write, which would result in the required
> // #VC exception. Instead, use the the VMGEXIT MMIO write support directly
> // to perform the update.
> //
> VmgInit (Ghcb);
> Ghcb->SharedBuffer[0] = Value;
> Ghcb->SaveArea.SwScratch = (UINT64) (UINTN) Ghcb->SharedBuffer;
> VmgExit (Ghcb, SVM_EXIT_MMIO_WRITE, (UINT64) (UINTN) Ptr, 1);
> VmgDone (Ghcb);
>   } else {
> *Ptr = Value;
>   }
> }

This function *does* run at OS runtime (as a part of non-volatile UEFI
variable writes).

And note that, wherever MSR_SEV_ES_GHCB points to at the moment, is used
as GHCB.

If the guest kernel allocates its own GHCB and writes the allocation
address to MSR_SEV_ES_GHCB, then indeed the firmware will use the GHCB
of the OS.

I reviewed edk2 commit 437eb3f7a8db
("OvmfPkg/QemuFlashFvbServicesRuntimeDxe: Bypass flash detection with
SEV-ES", 2020-08-17), but I admit I never thought of the guest OS
changing MSR_SEV_ES_GHCB. I'm sorry about that.

As long as this driver is running before OS runtime (i.e., during the
DXE and BDS phases), MSR_SEV_ES_GHCB is supposed to carry the value we
set in "OvmfPkg/PlatformPei/AmdSev.c":

> STATIC
> VOID
> AmdSevEsInitialize (
>   VOID
>   )
> {
>   VOID  *GhcbBase;
>   PHYSICAL_ADDRESS  GhcbBasePa;
>   UINTN GhcbPageCount, PageCount;
>   RETURN_STATUS PcdStatus, DecryptStatus;
>   IA32_DESCRIPTOR   Gdtr;
>   VOID  *Gdt;
>
>   if (!MemEncryptSevEsIsEnabled ()) {
> return;
>   }
>
>   PcdStatus = PcdSetBoolS (PcdSevEsIsEnabled, TRUE);
>   ASSERT_RETURN_ERROR (PcdStatus);
>
>   //
>   // Allocate GHCB and per-CPU variable pages.
>   //   Since the pages must survive across the UEFI to OS transition
>   //   make them reserved.
>   //
>   GhcbPageCount = mMaxCpuCount * 2;
>   GhcbBase = AllocateReservedPages (GhcbPageCount);
>   ASSERT (GhcbBase != NULL);
>
>   GhcbBasePa = (PHYSICAL_ADDRESS)(UINTN) GhcbBase;
>
>   //
>   // Each vCPU gets two consecutive pages, the first is the GHCB and the
>   // second is the per-CPU variable page. Loop through the allocation and
>   // only clear the encryption mask for the GHCB pages.
>   //
>   for (PageCount = 0; PageCount < GhcbPageCount; PageCount += 2) {
> DecryptStatus = MemEncryptSevClearPageEncMask (
>   0,
>   GhcbBasePa + EFI_PAGES_TO_SIZE (PageCount),
>   1,
>   TRUE
>   );
> ASSERT_RETURN_ERROR (DecryptStatus);
>   }
>
>   ZeroMem (GhcbBase, EFI_PAGES_TO_SIZE (GhcbPageCount));
>
>   PcdStatus = PcdSet64S (PcdGhcbBase, GhcbBasePa);
>   ASSERT_RETURN_ERROR (PcdStatus);
>   PcdStatus = 

Re: [PATCH v5] drm/virtio: use virtio_max_dma_size

2019-08-22 Thread Laszlo Ersek
On 08/21/19 13:12, Gerd Hoffmann wrote:
> We must make sure our scatterlist segments are not too big, otherwise
> we might see swiotlb failures (happens with sev, also reproducable with
> swiotlb=force).
> 
> Suggested-by: Laszlo Ersek 
> Signed-off-by: Gerd Hoffmann 
> ---
>  drivers/gpu/drm/virtio/virtgpu_object.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c 
> b/drivers/gpu/drm/virtio/virtgpu_object.c
> index b2da31310d24..09b526518f5a 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_object.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_object.c
> @@ -204,6 +204,7 @@ int virtio_gpu_object_get_sg_table(struct 
> virtio_gpu_device *qdev,
>   .interruptible = false,
>   .no_wait_gpu = false
>   };
> + size_t max_segment;
>  
>   /* wtf swapping */
>   if (bo->pages)
> @@ -215,8 +216,13 @@ int virtio_gpu_object_get_sg_table(struct 
> virtio_gpu_device *qdev,
>   if (!bo->pages)
>   goto out;
>  
> - ret = sg_alloc_table_from_pages(bo->pages, pages, nr_pages, 0,
> - nr_pages << PAGE_SHIFT, GFP_KERNEL);
> + max_segment = virtio_max_dma_size(qdev->vdev);
> + max_segment &= PAGE_MASK;
> + if (max_segment > SCATTERLIST_MAX_SEGMENT)
> + max_segment = SCATTERLIST_MAX_SEGMENT;
> + ret = __sg_alloc_table_from_pages(bo->pages, pages, nr_pages, 0,
> +   nr_pages << PAGE_SHIFT,
> +   max_segment, GFP_KERNEL);
>   if (ret)
>   goto out;
>   return 0;
> 

Reviewed-by: Laszlo Ersek 

Thanks!
Laszlo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


QueuePFN peculiarity in virtio-mmio

2013-10-22 Thread Laszlo Ersek
Hi,

Appendix X: virtio-mmio in the virtio spec says

• 0x040 | RW | QueuePFN
  [...] When the Guest stops using the queue it must write zero
  (0x0) to this register.
  [...]

and

Virtqueue Configuration

[...]
2. Check if the queue is not already in use: read QueuePFN
register, returned value should be zero (0x0).
[...]

I think this in itself is already suboptimal, because a guest that
crashes and reboots (while the emulator itself survives) will not be
able to use the device after said reboot (it has never re-set QueuePFN
to zero).

But, more importantly: I think that resetting the device (by writing 0
to its status register) should include (ie. *guarantee*) the effects of
setting QueuePFN to zero for all imaginable queues of the device.

This way, a defensive guest that starts up by resetting the device (*)
after identifying it via MagicValue / Version / DeviceID / VendorID
would be able to use the device regardless of the device's prior
QueuePFN setting(s).

(*) Resetting the device is the first step in 2.2.1 Device
Initialization Sequence. It is not required on initial start up, but
as a guest driver can never be sure whether the startup in question is
the initial one, a defensive driver will always start with device reet.


The question arises because Olivier has posted a series to edk2-devel
that adds virtio-mmio support to TianoCore, and Mark tested it (using
OVMF) with a Linux guest and found problems. Namely, OVMF itself can
drive the virtio devices via virtio-mmio, but the Linux kernel booted
from OVMF can not. The reason is the missing zeroing of QueuePFN when
OVMF is exiting. (I'm just paraphrasing the analysis.)

I think
- that resetting the device (via its status register) should make the
host forget *all* prior configuration, including QueuePFN,
- and that the Linux driver should reset the device as first step.

So:
- What's the motivation for the acquire/release semantics of QueuePFN?
- Am I right that device reset should force a QueuePFN release too?

Thanks,
Laszlo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: QueuePFN peculiarity in virtio-mmio

2013-10-22 Thread Laszlo Ersek
My apologies, I used Anthony's previous (now obsolete) email. Updated it
now  keeping full context below. Sorry.


On 10/22/13 19:49, Laszlo Ersek wrote:
 Hi,
 
 Appendix X: virtio-mmio in the virtio spec says
 
 • 0x040 | RW | QueuePFN
   [...] When the Guest stops using the queue it must write zero
   (0x0) to this register.
   [...]
 
 and
 
 Virtqueue Configuration
 
 [...]
 2. Check if the queue is not already in use: read QueuePFN
 register, returned value should be zero (0x0).
 [...]
 
 I think this in itself is already suboptimal, because a guest that
 crashes and reboots (while the emulator itself survives) will not be
 able to use the device after said reboot (it has never re-set QueuePFN
 to zero).
 
 But, more importantly: I think that resetting the device (by writing 0
 to its status register) should include (ie. *guarantee*) the effects of
 setting QueuePFN to zero for all imaginable queues of the device.
 
 This way, a defensive guest that starts up by resetting the device (*)
 after identifying it via MagicValue / Version / DeviceID / VendorID
 would be able to use the device regardless of the device's prior
 QueuePFN setting(s).
 
 (*) Resetting the device is the first step in 2.2.1 Device
 Initialization Sequence. It is not required on initial start up, but
 as a guest driver can never be sure whether the startup in question is
 the initial one, a defensive driver will always start with device reet.
 
 
 The question arises because Olivier has posted a series to edk2-devel
 that adds virtio-mmio support to TianoCore, and Mark tested it (using
 OVMF) with a Linux guest and found problems. Namely, OVMF itself can
 drive the virtio devices via virtio-mmio, but the Linux kernel booted
 from OVMF can not. The reason is the missing zeroing of QueuePFN when
 OVMF is exiting. (I'm just paraphrasing the analysis.)
 
 I think
 - that resetting the device (via its status register) should make the
 host forget *all* prior configuration, including QueuePFN,
 - and that the Linux driver should reset the device as first step.
 
 So:
 - What's the motivation for the acquire/release semantics of QueuePFN?
 - Am I right that device reset should force a QueuePFN release too?
 
 Thanks,
 Laszlo
 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [edk2] QueuePFN peculiarity in virtio-mmio

2013-10-22 Thread Laszlo Ersek
On 10/22/13 19:55, Laszlo Ersek wrote:

 The question arises because Olivier has posted a series to edk2-devel
 that adds virtio-mmio support to TianoCore, and Mark tested it (using
 OVMF) with a Linux guest and found problems. Namely, OVMF itself can
 drive the virtio devices via virtio-mmio, but the Linux kernel booted
 from OVMF can not. The reason is the missing zeroing of QueuePFN when
 OVMF is exiting. (I'm just paraphrasing the analysis.)

s/OVMF/AArch64 foundation model/g

http://thread.gmane.org/gmane.comp.bios.tianocore.devel/4373/focus=4411


I should have followed my own advice, not to post when sick. I'll go now
and hide in a cave.

Laszlo
/facepalm

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-spec PATCH 0/5] Receiving Used Buffers example code: cleanups and an extra mb()

2013-06-17 Thread Laszlo Ersek
On 06/17/13 04:17, Rusty Russell wrote:
 Stefan Hajnoczi stefa...@gmail.com writes:
 On Sat, Jun 8, 2013 at 7:39 PM, Laszlo Ersek ler...@redhat.com wrote:
 Patches before the last are small cleanups.

 In the last patch I'm trying to extract / generalize an idea from Stefan
 Hajnoczi's review of my virtio-net driver for OVMF.
 
 How about a single patch which just replaces the completely broken
 example? :)

Do you want me to squash the series into one patch and repost?

   Receiving Used Buffers: prevent speculative load when not
 sequentially consistent
 
 Yes, though this only needs to be a rmb().

Probably, but the spec never qualifies the memory barriers it
recommends. (Maybe their read/write types should be obvious to the
reader. I didn't give their types much thought because in edk2/OVMF
there's only MemoryFence().)

In any case I'll take this as your approval of the 5/5 patch. Thanks!

 In the OASIS 1.0 spec, I'd like an appendix with tested code for doing
 these operations (probably based on a simplifeid version of vringh.c).

(
I assume the tested code should come from one of the more sophisticated
drivers (that also have a unix-y coding style). Also, I've been actively
avoiding reading other virtio code (qemu, kernel, SeaBIOS, iPXE etc)
except when I was stuck with an OVMF driver and (a) it looked like there
was some silent requirement/assumption in qemu that had not been spelled
out in the spec, (b) I was being dyslexic.

Thus I can't really suggest code for the appendix (although the OVMF
drivers do work).
)

Thanks,
Laszlo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Laszlo Ersek
On 05/29/13 09:27, Paolo Bonzini wrote:
 Il 29/05/2013 06:33, Rusty Russell ha scritto:
 Paolo Bonzini pbonz...@redhat.com writes:
 Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
 +
 +switch (addr) {
 +case offsetof(struct virtio_pci_common_cfg, 
 device_feature_select):
 +return proxy-device_feature_select;

 Oh dear no...  Please use defines like the rest of QEMU.
 Any good reason not to use offsetof?

 I'm not sure it's portable to use it in case labels.  IIRC, the
 definition ((int)(((T *)0)-field)) is not a valid C integer constant
 expression.  Laszlo?

 It's defined to yield an integer constant expression in the ISO standard
 (and I think ANSI too, though that's not at hand):
 
 It's not in C89.

It is, see 7.1.6 Common definitions stddef.h.


 The oldest compiler QEMU cares about is GCC 4.2.  I
 don't know if it has a builtin offsetof, probably it does.

Talking about nothing else but the ISO C standard(s), it doesn't matter
how the C implementation deals with offsetof() internally. C
implementation in this sense includes both compiler and standard library.

If the standard library header (stddef.h or something internal
included by it) expands the offsetof() macro to replacement text that
does pointer black magic, magic that would be otherwise nonportable or
undefined, it is alright as long as the accompanying compiler guarantees
that the replacement text will work.

That is, offsetof() gives the C implementor leeway to implement it in
wherever distribution between compiler and standard library; the main
thing is that the C program use the one offsetof() provided by the C
implementation.

Of course in the FLOSS world OS distros might mix and match gcc and
glibc statistically randomly; normally some documentation should enable
the user to put his/her system into ISO C or even SUSv[1234] mode.

(

An interesting example for, say, non-unified implementation is
getconf vs. c99. I'll pick SUSv3 for this example.

http://pubs.opengroup.org/onlinepubs/95399/utilities/getconf.html
http://pubs.opengroup.org/onlinepubs/95399/utilities/c99.html

In a nutshell one can interrogate getconf for CFLAGS, LDFLAGS and LIBS
so that c99 builds a program in ILP32_OFF32 / ILP32_OFFBIG / LP64_OFF64
/ LPBIG_OFFBIG mode (programming environment).

On my x86_64 RHEL-6.4 laptop, getconf is part of glibc
(glibc-common-2.12-1.107.el6.x86_64), while c99 is part of gcc
(gcc-4.4.7-3.el6.x86_64).

Supposing I'd like to build a 32-bit program with 64-bit off_t:

  getconf _POSIX_V6_ILP32_OFFBIG
  1

  getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_CFLAGS
  -m32 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64

  getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_LDFLAGS
  -m32

  getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_LIBS
  [empty string]

However if I try to actually build a program using these flags:

  #define _XOPEN_SOURCE 600
  int main(void) { return 0; }

  c99 \
  -m32 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 \
  -o mytest \
  -m32 \
  mytest.c

I get

  /usr/bin/ld: crt1.o: No such file: No such file or directory
  collect2: ld returned 1 exit status

unless I install glibc-devel.i686.

This problem is rooted in getconf (a glibc utility) being unaware of
RHEL packaging choices: if the system can't guarantee the final c99
invocation to succeed, then the very first getconf invocation should
write -1\n or undefined\n to stdout.

(I'm aware that RHEL-6 is not certified UNIX 03; this is just an example
for problems caused by various parts of a standard's
(quasi-)implementation coming from separate projects. In that sense,
caring about the offsetof() internals may have merit.)

)

Laszlo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


anyone willing to review a virtio-net guest driver for OVMF?

2013-05-29 Thread Laszlo Ersek
(Resending to the virt list after the original qemu-devel posting,
following MST's advice; plus adding ipxe-devel too -- apologies for
spamming!)

Hi,

the driver in question is intended as a fallback driver when the iPXE
EFI driver for virtio-net is not present as an oprom. The series starts
with technical notes that should help the virtio-net expert catch any
errors more easily.

This review could easily take up a lot of time; my hope is in the fact
that people here are both more skilled and faster than my humble self.

http://thread.gmane.org/gmane.comp.bios.tianocore.devel/2804

Thanks,
Laszlo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Laszlo Ersek
On 05/28/13 19:43, Paolo Bonzini wrote:
 Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
 +
 +switch (addr) {
 +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +return proxy-device_feature_select;

 Oh dear no...  Please use defines like the rest of QEMU.
 Any good reason not to use offsetof?
 
 I'm not sure it's portable to use it in case labels.  IIRC, the
 definition ((int)(((T *)0)-field)) is not a valid C integer constant
 expression.  Laszlo?

As long as we use the offsetof() that comes with the C implementation
(ie. we don't hack it up ourselves), we should be safe; ISO C99
elegantly defines offsetof() in 7.17 Common definitions stddef.h p3
as

  The macros are [...]

offsetof(type, member-designator)

  which expands to an integer constant expression that has type
  *size_t*, the value of which is the offset in bytes, to the structure
  member (designated by /member-designator/), from the beginning of its
  structure (designated by /type/). The type and member designator shall
  be such that given

static type t;

  then the expression (t.member-designator) evaluates to an address
  constant. (If the specified member is a bit-field, the behavior is
  undefined.)

(Address constant is described in 6.6p9 but quoting even that here
doesn't seem necesary.)

From 6.8.4.2p3, The expression of each case label shall be an integer
constant expression [...].

So,
(a) if we use the standard offsetof(),
(b) and you can also write, for example,

  static struct virtio_pci_common_cfg t;
  static void *p = t.device_feature_select;

then the construct seems valid.

(Consistently with the above mention of UB if the specified member is a
bit-field: the address-of operator's constraints also forbid a bit-field
object as operand.)

Laszlo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: suggesting wording fixes for virtio-spec 0.9.5

2013-04-23 Thread Laszlo Ersek
On 04/23/13 06:05, Rusty Russell wrote:
 Laszlo Ersek ler...@redhat.com writes:
 Hi,

 (I'm not subscribed to either list,)

 using the word descriptor is misleading in the following sections:
 
 Yes, I like the use of 'descriptor chains'.  This is a definite
 improvement.
 
 Here's the diff I ended up with (massaged to minimize it).
 
 Thanks!
 Rusty.
 
 --- virtio-spec.txt-old   2013-04-23 13:22:21.339158214 +0930
 +++ virtio-spec.txt   2013-04-23 13:34:14.055176464 +0930
 @@ -482,10 +482,10 @@
  
  2.3.4 Available Ring
  
 -The available ring refers to what descriptors we are offering the 
 -device: it refers to the head of a descriptor chain. The “flags” 
 +The available ring refers to what descriptor chains we are offering the
 +device: each entry refers to the head of a descriptor chain. The “flags”
  field is currently 0 or 1: 1 indicating that we do not need an 
 -interrupt when the device consumes a descriptor from the 
 +interrupt when the device consumes a descriptor chain from the
  available ring. Alternatively, the guest can ask the device to 
  delay interrupts until an entry with an index specified by the “
  used_event” field is written in the used ring (equivalently, 
 @@ -671,16 +671,16 @@
  
  avail-ring[avail-idx % qsz] = head;
  
 -However, in general we can add many descriptors before we update 
 -the “idx” field (at which point they become visible to the 
 -device), so we keep a counter of how many we've added:
 +However, in general we can add many separate descriptor chains before we 
 update
 +the “idx” field (at which point they become visible to the device),
 +so we keep a counter of how many we've added:
  
  avail-ring[(avail-idx + added++) % qsz] = head;
  
  2.4.1.3 Updating The Index Field
  
  Once the idx field of the virtqueue is updated, the device will 
 -be able to access the descriptor entries we've created and the 
 +be able to access the descriptor chains we've created and the 
  memory they refer to. This is why a memory barrier is generally 
  used before the idx update, to ensure it sees the most up-to-date 
  copy.
 

Not sure if it's customary here or if you need it / want it, but anyway

Reviewed-by: Laszlo Ersek ler...@redhat.com

(Also I've fixed the OVMF driver; just reposting the patch today with a
better commit message.)

Thanks much!
Laszlo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

suggesting wording fixes for virtio-spec 0.9.5

2013-04-22 Thread Laszlo Ersek
Hi,

(I'm not subscribed to either list,)

using the word descriptor is misleading in the following sections:

  2.4.1.2 Updating The Available Ring

  [...] However, in general we can add many descriptors before we
  update the idx field (at which point they become visible to the
  device), so we keep a counter of how many we've added: [...]

and

  2.4.1.3 Updating The Index Field

  Once the idx field of the virtqueue is updated, the device will be
  able to access the descriptor entries we've created and the memory
  they refer to. [...]

(The word descriptor in the above language is the reason I
mis-implemented the virtio-blk guest driver in OVMF.)

In fact the available ring tracks *head* descriptors only. I suggest

  s/many descriptors/many separate descriptor chains/
  s/descriptor entries/separate descriptor chains/

for the above.

Similarly, 2.3.4 Available Ring should start with something like

  The available ring describes what descriptor chains we are offering
  the device: each entry of the available ring refers to the head
  descriptor of a separate descriptor chain.

Thanks
Laszlo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


miscalculated vring size in virtio-0.9.5.pdf?

2012-09-18 Thread Laszlo Ersek
Hi,

the vring_size() formula in section 2.3 seems to miss

(a) vring_avail.used_event in the first ALIGN(), and

(b) vring_used.flags, vring_used.idx, vring_used.avail_event in
the second ALIGN().

vring_size() in Appendix A fixes (b) (no rounding up to page size
though), but (a) is wrong there too, I believe.

Thanks,
Laszlo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH net-next v4] xen-netfront: delay gARP until backend switches to Connected

2011-12-11 Thread Laszlo Ersek
After a guest is live migrated, the xen-netfront driver emits a gratuitous
ARP message, so that networking hardware on the target host's subnet can
take notice, and public routing to the guest is re-established. However,
if the packet appears on the backend interface before the backend is added
to the target host's bridge, the packet is lost, and the migrated guest's
peers become unable to talk to the guest.

A sufficient two-parts condition to prevent the above is:

(1) ensure that the backend only moves to Connected xenbus state after its
hotplug scripts completed, ie. the netback interface got added to the
bridge; and

(2) ensure the frontend only queues the gARP when it sees the backend move
to Connected.

These two together provide complete ordering. Sub-condition (1) is already
satisfied by commit f942dc2552b8 in Linus' tree, based on commit
6b0b80ca7165 from [1].

In general, the full condition is sufficient, not necessary, because,
according to [2], live migration has been working for a long time without
satisfying sub-condition (2). However, after 6b0b80ca7165 was backported
to the RHEL-5 host to ensure (1), (2) still proved necessary in the RHEL-6
guest. This patch intends to provide (2) for upstream.

The Reviewed-by line comes from [3].

[1] 
git://xenbits.xen.org/people/ianc/linux-2.6.git#upstream/dom0/backend/netback-history
[2] http://old-list-archives.xen.org/xen-devel/2011-06/msg01969.html
[3] http://old-list-archives.xen.org/xen-devel/2011-07/msg00484.html

Signed-off-by: Laszlo Ersek ler...@redhat.com
Reviewed-by: Ian Campbell ian.campb...@citrix.com
---
 drivers/net/xen-netfront.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index d29365a..f033656 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1646,7 +1646,6 @@ static void netback_changed(struct xenbus_device *dev,
case XenbusStateInitialised:
case XenbusStateReconfiguring:
case XenbusStateReconfigured:
-   case XenbusStateConnected:
case XenbusStateUnknown:
case XenbusStateClosed:
break;
@@ -1657,6 +1656,9 @@ static void netback_changed(struct xenbus_device *dev,
if (xennet_connect(netdev) != 0)
break;
xenbus_switch_state(dev, XenbusStateConnected);
+   break;
+
+   case XenbusStateConnected:
netif_notify_peers(netdev);
break;
 
-- 
1.7.4.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v3 REPOST] xen-netfront: delay gARP until backend switches to Connected

2011-12-09 Thread Laszlo Ersek
After a guest is live migrated, the xen-netfront driver emits a gratuitous
ARP message, so that networking hardware on the target host's subnet can
take notice, and public routing to the guest is re-established. However,
if the packet appears on the backend interface before the backend is added
to the target host's bridge, the packet is lost, and the migrated guest's
peers become unable to talk to the guest.

A sufficient two-parts condition to prevent the above is:

(1) ensure that the backend only moves to Connected xenbus state after its
hotplug scripts completed, ie. the netback interface got added to the
bridge; and

(2) ensure the frontend only queues the gARP when it sees the backend move
to Connected.

These two together provide complete ordering. Sub-condition (1) is
satisfied by pvops commit 43223efd9bfd.

In general, the full condition is sufficient, not necessary, because,
according to [1], live migration has been working for a long time without
satisfying sub-condition (2). However, after 43223efd9bfd was backported
to the RHEL-5 host to ensure (1), (2) still proved necessary in the RHEL-6
guest. This patch intends to provide (2) for upstream.

The Reviewed-by line comes from [2].

[1] http://old-list-archives.xen.org/xen-devel/2011-06/msg01969.html
[2] http://old-list-archives.xen.org/xen-devel/2011-07/msg00484.html

Signed-off-by: Laszlo Ersek ler...@redhat.com
Reviewed-by: Ian Campbell ian.campb...@citrix.com
---
 drivers/net/xen-netfront.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index d29365a..f033656 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1646,7 +1646,6 @@ static void netback_changed(struct xenbus_device *dev,
case XenbusStateInitialised:
case XenbusStateReconfiguring:
case XenbusStateReconfigured:
-   case XenbusStateConnected:
case XenbusStateUnknown:
case XenbusStateClosed:
break;
@@ -1657,6 +1656,9 @@ static void netback_changed(struct xenbus_device *dev,
if (xennet_connect(netdev) != 0)
break;
xenbus_switch_state(dev, XenbusStateConnected);
+   break;
+
+   case XenbusStateConnected:
netif_notify_peers(netdev);
break;
 
-- 
1.7.4.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization