Re: [Xen-devel] [PATCH 0/2] Remove 32-bit Xen PV guest support
On 15.07.2019 19:28, Andy Lutomirski wrote: > On Mon, Jul 15, 2019 at 9:34 AM Andi Kleen wrote: >> >> Juergen Gross writes: >> >>> The long term plan has been to replace Xen PV guests by PVH. The first >>> victim of that plan are now 32-bit PV guests, as those are used only >>> rather seldom these days. Xen on x86 requires 64-bit support and with >>> Grub2 now supporting PVH officially since version 2.04 there is no >>> need to keep 32-bit PV guest support alive in the Linux kernel. >>> Additionally Meltdown mitigation is not available in the kernel running >>> as 32-bit PV guest, so dropping this mode makes sense from security >>> point of view, too. >> >> Normally we have a deprecation period for feature removals like this. >> You would make the kernel print a warning for some releases, and when >> no user complains you can then remove. If a user complains you can't. >> > > As I understand it, the kernel rules do allow changes like this even > if there's a complaint: this is a patch that removes what is > effectively hardware support. If the maintenance cost exceeds the > value, then removal is fair game. (Obviously we weight the value to > preserving compatibility quite highly, but in this case, Xen dropped > 32-bit hardware support a long time ago. If the Xen hypervisor says > that 32-bit PV guest support is deprecated, it's deprecated.) Since it was implied but not explicit from Andrew's reply, just to make it explicit: So far 32-bit PV guest support has not been deprecated in Xen itself. Jan ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks
On 15.07.19 17:16, Andrew Cooper wrote: There is a lot of infrastructure for functionality which is used exclusively in __{save,restore}_processor_state() on the suspend/resume path. cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by lapic_{suspend,resume}(). Saving and restoring cr8 independently of the rest of the Local APIC state isn't a clever thing to be doing. Delete the suspend/resume cr8 handling, which shrinks the size of struct saved_context, and allows for the removal of both PVOPS. Signed-off-by: Andrew Cooper Reviewed-by: Juergen Gross Juergen ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] [PATCH 0/2] Remove 32-bit Xen PV guest support
On 15.07.19 19:39, Andrew Cooper wrote: On 15/07/2019 18:28, Andy Lutomirski wrote: On Mon, Jul 15, 2019 at 9:34 AM Andi Kleen wrote: Juergen Gross writes: The long term plan has been to replace Xen PV guests by PVH. The first victim of that plan are now 32-bit PV guests, as those are used only rather seldom these days. Xen on x86 requires 64-bit support and with Grub2 now supporting PVH officially since version 2.04 there is no need to keep 32-bit PV guest support alive in the Linux kernel. Additionally Meltdown mitigation is not available in the kernel running as 32-bit PV guest, so dropping this mode makes sense from security point of view, too. Normally we have a deprecation period for feature removals like this. You would make the kernel print a warning for some releases, and when no user complains you can then remove. If a user complains you can't. As I understand it, the kernel rules do allow changes like this even if there's a complaint: this is a patch that removes what is effectively hardware support. If the maintenance cost exceeds the value, then removal is fair game. (Obviously we weight the value to preserving compatibility quite highly, but in this case, Xen dropped 32-bit hardware support a long time ago. If the Xen hypervisor says that 32-bit PV guest support is deprecated, it's deprecated.) That being said, a warning might not be a bad idea. What's the current status of this in upstream Xen? So personally, I'd prefer to see support stay, but at the end of the day it is Juergen's choice as the maintainer of the code. Especially on the security front we are unsafe with 32-bit PV Linux. And making it safe will make it so slow that the needed effort is not spent very well. Juergen ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks
> On Jul 15, 2019, at 4:30 PM, Andrew Cooper wrote: > > On 15/07/2019 19:17, Nadav Amit wrote: >>> On Jul 15, 2019, at 8:16 AM, Andrew Cooper >>> wrote: >>> >>> There is a lot of infrastructure for functionality which is used >>> exclusively in __{save,restore}_processor_state() on the suspend/resume >>> path. >>> >>> cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by >>> lapic_{suspend,resume}(). Saving and restoring cr8 independently of the >>> rest of the Local APIC state isn't a clever thing to be doing. >>> >>> Delete the suspend/resume cr8 handling, which shrinks the size of struct >>> saved_context, and allows for the removal of both PVOPS. >> I think removing the interface for CR8 writes is also good to avoid >> potential correctness issues, as the SDM says (10.8.6.1 "Interaction of Task >> Priorities between CR8 and APIC”): >> >> "Operating software should implement either direct APIC TPR updates or CR8 >> style TPR updates but not mix them. Software can use a serializing >> instruction (for example, CPUID) to serialize updates between MOV CR8 and >> stores to the APIC.” >> >> And native_write_cr8() did not even issue a serializing instruction. > > Given its location, the one write_cr8() is bounded by two serialising > operations, so is safe in practice. That’s what the “potential” in "potential correctness issues” means :) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks
On Mon, Jul 15, 2019 at 4:30 PM Andrew Cooper wrote: > > On 15/07/2019 19:17, Nadav Amit wrote: > >> On Jul 15, 2019, at 8:16 AM, Andrew Cooper > >> wrote: > >> > >> There is a lot of infrastructure for functionality which is used > >> exclusively in __{save,restore}_processor_state() on the suspend/resume > >> path. > >> > >> cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by > >> lapic_{suspend,resume}(). Saving and restoring cr8 independently of the > >> rest of the Local APIC state isn't a clever thing to be doing. > >> > >> Delete the suspend/resume cr8 handling, which shrinks the size of struct > >> saved_context, and allows for the removal of both PVOPS. > > I think removing the interface for CR8 writes is also good to avoid > > potential correctness issues, as the SDM says (10.8.6.1 "Interaction of Task > > Priorities between CR8 and APIC”): > > > > "Operating software should implement either direct APIC TPR updates or CR8 > > style TPR updates but not mix them. Software can use a serializing > > instruction (for example, CPUID) to serialize updates between MOV CR8 and > > stores to the APIC.” > > > > And native_write_cr8() did not even issue a serializing instruction. > > > > Given its location, the one write_cr8() is bounded by two serialising > operations, so is safe in practice. > > However, I agree with the statement in the manual. I could submit a v3 > with an updated commit message, or let it be fixed on commit. Whichever > is easiest. > I don't see anything wrong with the message. If we actually used CR8 for interrupt priorities, we wouldn't want it to serialize. The bug is that the code that did the write_cr8() should have had a comment as to how it serialized against lapic_restore(). But that doesn't seem worth mentioning in the message, since, as noted, the real problem was that it nonsensically restored just TPR without restoring everything else. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
On Mon, 2019-07-15 at 19:03 -0300, Thiago Jung Bauermann wrote: > > > Indeed. The idea is that QEMU can offer the flag, old guests can > > > reject > > > it (or even new guests can reject it, if they decide not to > > > convert into > > > secure VMs) and the feature negotiation will succeed with the > > > flag > > > unset. > > > > OK. And then what does QEMU do? Assume guest is not encrypted I > > guess? > > There's nothing different that QEMU needs to do, with or without the > flag. the perspective of the host, a secure guest and a regular guest > work the same way with respect to virtio. This is *precisely* why I was against adding a flag and touch the protocol negociation with qemu in the first place, back when I cared about that stuff... Guys, this has gone in circles over and over again. This has nothing to do with qemu. Qemu doesn't need to know about this. It's entirely guest local. This is why the one-liner in virtio was a far better and simpler solution. This is something the guest does to itself (with the participation of a ultravisor but that's not something qemu cares about at this stage, at least not as far as virtio is concerned). Basically, the guest "hides" its memory from the host using a HW secure memory facility. As a result, it needs to ensure that all of its DMA pages are bounced through insecure pages that aren't hidden. That's it, it's all guest side. Qemu shouldn't have to care about it at all. Cheers, Ben. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks
On 15/07/2019 19:17, Nadav Amit wrote: >> On Jul 15, 2019, at 8:16 AM, Andrew Cooper wrote: >> >> There is a lot of infrastructure for functionality which is used >> exclusively in __{save,restore}_processor_state() on the suspend/resume >> path. >> >> cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by >> lapic_{suspend,resume}(). Saving and restoring cr8 independently of the >> rest of the Local APIC state isn't a clever thing to be doing. >> >> Delete the suspend/resume cr8 handling, which shrinks the size of struct >> saved_context, and allows for the removal of both PVOPS. > I think removing the interface for CR8 writes is also good to avoid > potential correctness issues, as the SDM says (10.8.6.1 "Interaction of Task > Priorities between CR8 and APIC”): > > "Operating software should implement either direct APIC TPR updates or CR8 > style TPR updates but not mix them. Software can use a serializing > instruction (for example, CPUID) to serialize updates between MOV CR8 and > stores to the APIC.” > > And native_write_cr8() did not even issue a serializing instruction. > Given its location, the one write_cr8() is bounded by two serialising operations, so is safe in practice. However, I agree with the statement in the manual. I could submit a v3 with an updated commit message, or let it be fixed on commit. Whichever is easiest. ~Andrew ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
Michael S. Tsirkin writes: > On Mon, Jul 15, 2019 at 07:03:03PM -0300, Thiago Jung Bauermann wrote: >> >> Michael S. Tsirkin writes: >> >> > On Mon, Jul 15, 2019 at 05:29:06PM -0300, Thiago Jung Bauermann wrote: >> >> >> >> Michael S. Tsirkin writes: >> >> >> >> > On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote: >> >> >> >> >> >> >> >> >> Michael S. Tsirkin writes: >> >> >> >> >> >> > So this is what I would call this option: >> >> >> > >> >> >> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS >> >> >> > >> >> >> > and the explanation should state that all device >> >> >> > addresses are translated by the platform to identical >> >> >> > addresses. >> >> >> > >> >> >> > In fact this option then becomes more, not less restrictive >> >> >> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise >> >> >> > by guest to only create identity mappings, >> >> >> > and only before driver_ok is set. >> >> >> > This option then would always be negotiated together with >> >> >> > VIRTIO_F_ACCESS_PLATFORM. >> >> >> > >> >> >> > Host then must verify that >> >> >> > 1. full 1:1 mappings are created before driver_ok >> >> >> > or can we make sure this happens before features_ok? >> >> >> > that would be ideal as we could require that features_ok fails >> >> >> > 2. mappings are not modified between driver_ok and reset >> >> >> > i guess attempts to change them will fail - >> >> >> > possibly by causing a guest crash >> >> >> > or some other kind of platform-specific error >> >> >> >> >> >> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but >> >> >> requiring >> >> >> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is >> >> >> SLOF as I mentioned above, another is that we would be requiring all >> >> >> guests running on the machine (secure guests or not, since we would use >> >> >> the same configuration for all guests) to support it. But >> >> >> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For >> >> >> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know >> >> >> about >> >> >> it and wouldn't be able to use the device. >> >> > >> >> > OK and your target is to enable use with kernel drivers within >> >> > guests, right? >> >> >> >> Right. >> >> >> >> > My question is, we are defining a new flag here, I guess old guests >> >> > then do not set it. How does it help old guests? Or maybe it's >> >> > not designed to ... >> >> >> >> Indeed. The idea is that QEMU can offer the flag, old guests can reject >> >> it (or even new guests can reject it, if they decide not to convert into >> >> secure VMs) and the feature negotiation will succeed with the flag >> >> unset. >> > >> > OK. And then what does QEMU do? Assume guest is not encrypted I guess? >> >> There's nothing different that QEMU needs to do, with or without the >> flag. the perspective of the host, a secure guest and a regular guest >> work the same way with respect to virtio. > > OK. So now let's get back to implementation. What will > Linux guest driver do? It can't activate DMA API blindly since that > will assume translation also works, right? It can on pseries, because we always have a 1:1 window mapping the whole guest memory. > Or do we somehow limit it to just a specific platform? Yes, we want to accept the new flag only on secure pseries guests. -- Thiago Jung Bauermann IBM Linux Technology Center ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
On Mon, Jul 15, 2019 at 07:03:03PM -0300, Thiago Jung Bauermann wrote: > > Michael S. Tsirkin writes: > > > On Mon, Jul 15, 2019 at 05:29:06PM -0300, Thiago Jung Bauermann wrote: > >> > >> Michael S. Tsirkin writes: > >> > >> > On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote: > >> >> > >> >> > >> >> Michael S. Tsirkin writes: > >> >> > >> >> > So this is what I would call this option: > >> >> > > >> >> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS > >> >> > > >> >> > and the explanation should state that all device > >> >> > addresses are translated by the platform to identical > >> >> > addresses. > >> >> > > >> >> > In fact this option then becomes more, not less restrictive > >> >> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise > >> >> > by guest to only create identity mappings, > >> >> > and only before driver_ok is set. > >> >> > This option then would always be negotiated together with > >> >> > VIRTIO_F_ACCESS_PLATFORM. > >> >> > > >> >> > Host then must verify that > >> >> > 1. full 1:1 mappings are created before driver_ok > >> >> > or can we make sure this happens before features_ok? > >> >> > that would be ideal as we could require that features_ok fails > >> >> > 2. mappings are not modified between driver_ok and reset > >> >> > i guess attempts to change them will fail - > >> >> > possibly by causing a guest crash > >> >> > or some other kind of platform-specific error > >> >> > >> >> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring > >> >> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is > >> >> SLOF as I mentioned above, another is that we would be requiring all > >> >> guests running on the machine (secure guests or not, since we would use > >> >> the same configuration for all guests) to support it. But > >> >> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For > >> >> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know about > >> >> it and wouldn't be able to use the device. > >> > > >> > OK and your target is to enable use with kernel drivers within > >> > guests, right? > >> > >> Right. > >> > >> > My question is, we are defining a new flag here, I guess old guests > >> > then do not set it. How does it help old guests? Or maybe it's > >> > not designed to ... > >> > >> Indeed. The idea is that QEMU can offer the flag, old guests can reject > >> it (or even new guests can reject it, if they decide not to convert into > >> secure VMs) and the feature negotiation will succeed with the flag > >> unset. > > > > OK. And then what does QEMU do? Assume guest is not encrypted I guess? > > There's nothing different that QEMU needs to do, with or without the > flag. the perspective of the host, a secure guest and a regular guest > work the same way with respect to virtio. OK. So now let's get back to implementation. What will Linux guest driver do? It can't activate DMA API blindly since that will assume translation also works, right? Or do we somehow limit it to just a specific platform? > -- > Thiago Jung Bauermann > IBM Linux Technology Center ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
Michael S. Tsirkin writes: > On Mon, Jul 15, 2019 at 05:29:06PM -0300, Thiago Jung Bauermann wrote: >> >> Michael S. Tsirkin writes: >> >> > On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote: >> >> >> >> >> >> Michael S. Tsirkin writes: >> >> >> >> > So this is what I would call this option: >> >> > >> >> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS >> >> > >> >> > and the explanation should state that all device >> >> > addresses are translated by the platform to identical >> >> > addresses. >> >> > >> >> > In fact this option then becomes more, not less restrictive >> >> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise >> >> > by guest to only create identity mappings, >> >> > and only before driver_ok is set. >> >> > This option then would always be negotiated together with >> >> > VIRTIO_F_ACCESS_PLATFORM. >> >> > >> >> > Host then must verify that >> >> > 1. full 1:1 mappings are created before driver_ok >> >> > or can we make sure this happens before features_ok? >> >> > that would be ideal as we could require that features_ok fails >> >> > 2. mappings are not modified between driver_ok and reset >> >> > i guess attempts to change them will fail - >> >> > possibly by causing a guest crash >> >> > or some other kind of platform-specific error >> >> >> >> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring >> >> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is >> >> SLOF as I mentioned above, another is that we would be requiring all >> >> guests running on the machine (secure guests or not, since we would use >> >> the same configuration for all guests) to support it. But >> >> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For >> >> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know about >> >> it and wouldn't be able to use the device. >> > >> > OK and your target is to enable use with kernel drivers within >> > guests, right? >> >> Right. >> >> > My question is, we are defining a new flag here, I guess old guests >> > then do not set it. How does it help old guests? Or maybe it's >> > not designed to ... >> >> Indeed. The idea is that QEMU can offer the flag, old guests can reject >> it (or even new guests can reject it, if they decide not to convert into >> secure VMs) and the feature negotiation will succeed with the flag >> unset. > > OK. And then what does QEMU do? Assume guest is not encrypted I guess? There's nothing different that QEMU needs to do, with or without the flag. the perspective of the host, a secure guest and a regular guest work the same way with respect to virtio. -- Thiago Jung Bauermann IBM Linux Technology Center ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
On Mon, Jul 15, 2019 at 05:29:06PM -0300, Thiago Jung Bauermann wrote: > > Michael S. Tsirkin writes: > > > On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote: > >> > >> > >> Michael S. Tsirkin writes: > >> > >> > So this is what I would call this option: > >> > > >> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS > >> > > >> > and the explanation should state that all device > >> > addresses are translated by the platform to identical > >> > addresses. > >> > > >> > In fact this option then becomes more, not less restrictive > >> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise > >> > by guest to only create identity mappings, > >> > and only before driver_ok is set. > >> > This option then would always be negotiated together with > >> > VIRTIO_F_ACCESS_PLATFORM. > >> > > >> > Host then must verify that > >> > 1. full 1:1 mappings are created before driver_ok > >> > or can we make sure this happens before features_ok? > >> > that would be ideal as we could require that features_ok fails > >> > 2. mappings are not modified between driver_ok and reset > >> > i guess attempts to change them will fail - > >> > possibly by causing a guest crash > >> > or some other kind of platform-specific error > >> > >> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring > >> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is > >> SLOF as I mentioned above, another is that we would be requiring all > >> guests running on the machine (secure guests or not, since we would use > >> the same configuration for all guests) to support it. But > >> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For > >> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know about > >> it and wouldn't be able to use the device. > > > > OK and your target is to enable use with kernel drivers within > > guests, right? > > Right. > > > My question is, we are defining a new flag here, I guess old guests > > then do not set it. How does it help old guests? Or maybe it's > > not designed to ... > > Indeed. The idea is that QEMU can offer the flag, old guests can reject > it (or even new guests can reject it, if they decide not to convert into > secure VMs) and the feature negotiation will succeed with the flag > unset. OK. And then what does QEMU do? Assume guest is not encrypted I guess? > -- > Thiago Jung Bauermann > IBM Linux Technology Center ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
Michael S. Tsirkin writes: > On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote: >> >> >> Michael S. Tsirkin writes: >> >> > So this is what I would call this option: >> > >> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS >> > >> > and the explanation should state that all device >> > addresses are translated by the platform to identical >> > addresses. >> > >> > In fact this option then becomes more, not less restrictive >> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise >> > by guest to only create identity mappings, >> > and only before driver_ok is set. >> > This option then would always be negotiated together with >> > VIRTIO_F_ACCESS_PLATFORM. >> > >> > Host then must verify that >> > 1. full 1:1 mappings are created before driver_ok >> > or can we make sure this happens before features_ok? >> > that would be ideal as we could require that features_ok fails >> > 2. mappings are not modified between driver_ok and reset >> > i guess attempts to change them will fail - >> > possibly by causing a guest crash >> > or some other kind of platform-specific error >> >> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring >> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is >> SLOF as I mentioned above, another is that we would be requiring all >> guests running on the machine (secure guests or not, since we would use >> the same configuration for all guests) to support it. But >> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For >> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know about >> it and wouldn't be able to use the device. > > OK and your target is to enable use with kernel drivers within > guests, right? Right. > My question is, we are defining a new flag here, I guess old guests > then do not set it. How does it help old guests? Or maybe it's > not designed to ... Indeed. The idea is that QEMU can offer the flag, old guests can reject it (or even new guests can reject it, if they decide not to convert into secure VMs) and the feature negotiation will succeed with the flag unset. -- Thiago Jung Bauermann IBM Linux Technology Center ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks
> On Jul 15, 2019, at 8:16 AM, Andrew Cooper wrote: > > There is a lot of infrastructure for functionality which is used > exclusively in __{save,restore}_processor_state() on the suspend/resume > path. > > cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by > lapic_{suspend,resume}(). Saving and restoring cr8 independently of the > rest of the Local APIC state isn't a clever thing to be doing. > > Delete the suspend/resume cr8 handling, which shrinks the size of struct > saved_context, and allows for the removal of both PVOPS. I think removing the interface for CR8 writes is also good to avoid potential correctness issues, as the SDM says (10.8.6.1 "Interaction of Task Priorities between CR8 and APIC”): "Operating software should implement either direct APIC TPR updates or CR8 style TPR updates but not mix them. Software can use a serializing instruction (for example, CPUID) to serialize updates between MOV CR8 and stores to the APIC.” And native_write_cr8() did not even issue a serializing instruction. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock
On Mon, Jul 15, 2019 at 09:44:16AM +0200, Stefano Garzarella wrote: > On Fri, Jul 12, 2019 at 06:14:39PM +0800, Jason Wang wrote: > > > > On 2019/7/12 下午6:00, Stefano Garzarella wrote: > > > On Thu, Jul 11, 2019 at 03:52:21PM -0400, Michael S. Tsirkin wrote: > > > > On Thu, Jul 11, 2019 at 01:41:34PM +0200, Stefano Garzarella wrote: > > > > > On Thu, Jul 11, 2019 at 03:37:00PM +0800, Jason Wang wrote: > > > > > > On 2019/7/10 下午11:37, Stefano Garzarella wrote: > > > > > > > Hi, > > > > > > > as Jason suggested some months ago, I looked better at the > > > > > > > virtio-net driver to > > > > > > > understand if we can reuse some parts also in the virtio-vsock > > > > > > > driver, since we > > > > > > > have similar challenges (mergeable buffers, page allocation, small > > > > > > > packets, etc.). > > > > > > > > > > > > > > Initially, I would add the skbuff in the virtio-vsock in order to > > > > > > > re-use > > > > > > > receive_*() functions. > > > > > > > > > > > > Yes, that will be a good step. > > > > > > > > > > > Okay, I'll go on this way. > > > > > > > > > > > > Then I would move receive_[small, big, mergeable]() and > > > > > > > add_recvbuf_[small, big, mergeable]() outside of virtio-net > > > > > > > driver, in order to > > > > > > > call them also from virtio-vsock. I need to do some refactoring > > > > > > > (e.g. leave the > > > > > > > XDP part on the virtio-net driver), but I think it is feasible. > > > > > > > > > > > > > > The idea is to create a virtio-skb.[h,c] where put these > > > > > > > functions and a new > > > > > > > object where stores some attributes needed (e.g. hdr_len ) and > > > > > > > status (e.g. > > > > > > > some fields of struct receive_queue). > > > > > > > > > > > > My understanding is we could be more ambitious here. Do you see any > > > > > > blocker > > > > > > for reusing virtio-net directly? It's better to reuse not only the > > > > > > functions > > > > > > but also the logic like NAPI to avoid re-inventing something buggy > > > > > > and > > > > > > duplicated. > > > > > > > > > > > These are my concerns: > > > > > - virtio-vsock is not a "net_device", so a lot of code related to > > > > >ethtool, net devices (MAC address, MTU, speed, VLAN, XDP, > > > > > offloading) will be > > > > >not used by virtio-vsock. > > > > > > Linux support device other than ethernet, so it should not be a problem. > > > > > > > > > > > > > > - virtio-vsock has a different header. We can consider it as part of > > > > >virtio_net payload, but it precludes the compatibility with old > > > > > hosts. This > > > > >was one of the major doubts that made me think about using only the > > > > >send/recv skbuff functions, that it shouldn't break the > > > > > compatibility. > > > > > > We can extend the current vnet header helper for it to work for vsock. > > Okay, I'll do it. > > > > > > > > > > > > > > > > > This is an idea of virtio-skb.h that > > > > > > > I have in mind: > > > > > > > struct virtskb; > > > > > > > > > > > > What fields do you want to store in virtskb? It looks to be exist > > > > > > sk_buff is > > > > > > flexible enough to us? > > > > > My idea is to store queues information, like struct receive_queue or > > > > > struct send_queue, and some device attributes (e.g. hdr_len ). > > > > > > If you reuse skb or virtnet_info, there is not necessary. > > > > Okay. > > > > > > > > > > > > > > > > > > > > > struct sk_buff *virtskb_receive_small(struct virtskb *vs, > > > > > > > ...); > > > > > > > struct sk_buff *virtskb_receive_big(struct virtskb *vs, > > > > > > > ...); > > > > > > > struct sk_buff *virtskb_receive_mergeable(struct virtskb > > > > > > > *vs, ...); > > > > > > > > > > > > > > int virtskb_add_recvbuf_small(struct virtskb*vs, ...); > > > > > > > int virtskb_add_recvbuf_big(struct virtskb *vs, ...); > > > > > > > int virtskb_add_recvbuf_mergeable(struct virtskb *vs, ...); > > > > > > > > > > > > > > For the Guest->Host path it should be easier, so maybe I can add a > > > > > > > "virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a > > > > > > > part of the code > > > > > > > of xmit_skb(). > > > > > > > > > > > > I may miss something, but I don't see any thing that prevents us > > > > > > from using > > > > > > xmit_skb() directly. > > > > > > > > > > > Yes, but my initial idea was to make it more parametric and not > > > > > related to the > > > > > virtio_net_hdr, so the 'hdr_len' could be a parameter and the > > > > > 'num_buffers' should be handled by the caller. > > > > > > > > > > > > Let me know if you have in mind better names or if I should put > > > > > > > these function > > > > > > > in another place. > > > > > > > > > > > > > > I would like to leave the control part completely separate, so, > > > > > > > for example, > > > > > > > the two drivers will negotiate the features independently and > > > > > > > they will
Re: [PATCH 0/2] Remove 32-bit Xen PV guest support
On 15.07.19 19:28, Andy Lutomirski wrote: On Mon, Jul 15, 2019 at 9:34 AM Andi Kleen wrote: Juergen Gross writes: The long term plan has been to replace Xen PV guests by PVH. The first victim of that plan are now 32-bit PV guests, as those are used only rather seldom these days. Xen on x86 requires 64-bit support and with Grub2 now supporting PVH officially since version 2.04 there is no need to keep 32-bit PV guest support alive in the Linux kernel. Additionally Meltdown mitigation is not available in the kernel running as 32-bit PV guest, so dropping this mode makes sense from security point of view, too. Normally we have a deprecation period for feature removals like this. You would make the kernel print a warning for some releases, and when no user complains you can then remove. If a user complains you can't. As I understand it, the kernel rules do allow changes like this even if there's a complaint: this is a patch that removes what is effectively hardware support. If the maintenance cost exceeds the value, then removal is fair game. (Obviously we weight the value to preserving compatibility quite highly, but in this case, Xen dropped 32-bit hardware support a long time ago. If the Xen hypervisor says that 32-bit PV guest support is deprecated, it's deprecated.) That being said, a warning might not be a bad idea. What's the current status of this in upstream Xen? Xen still supports that. We have asked downstream for their opinion about dropping 32-bit PV guest support in the kernel about 1 year ago and the common answer was: no problem, but for users still wanting 32 bit guests we should wait until PVH support is available in all related products. Grub2 was the last one missing and as grub2 has released a version with PVH support I posted this small series now. Juergen ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] [PATCH 0/2] Remove 32-bit Xen PV guest support
On 15/07/2019 18:28, Andy Lutomirski wrote: > On Mon, Jul 15, 2019 at 9:34 AM Andi Kleen wrote: >> Juergen Gross writes: >> >>> The long term plan has been to replace Xen PV guests by PVH. The first >>> victim of that plan are now 32-bit PV guests, as those are used only >>> rather seldom these days. Xen on x86 requires 64-bit support and with >>> Grub2 now supporting PVH officially since version 2.04 there is no >>> need to keep 32-bit PV guest support alive in the Linux kernel. >>> Additionally Meltdown mitigation is not available in the kernel running >>> as 32-bit PV guest, so dropping this mode makes sense from security >>> point of view, too. >> Normally we have a deprecation period for feature removals like this. >> You would make the kernel print a warning for some releases, and when >> no user complains you can then remove. If a user complains you can't. >> > As I understand it, the kernel rules do allow changes like this even > if there's a complaint: this is a patch that removes what is > effectively hardware support. If the maintenance cost exceeds the > value, then removal is fair game. (Obviously we weight the value to > preserving compatibility quite highly, but in this case, Xen dropped > 32-bit hardware support a long time ago. If the Xen hypervisor says > that 32-bit PV guest support is deprecated, it's deprecated.) > > That being said, a warning might not be a bad idea. What's the > current status of this in upstream Xen? So personally, I'd prefer to see support stay, but at the end of the day it is Juergen's choice as the maintainer of the code. Xen itself has been exclusively 64-bit since Xen 4.3 (released in 2013). Over time, various features like SMEP/SMAP have been making 32bit PV guests progressively slower, because ring 1 is supervisor rather than user. Things have got even worse with IBRS, to the point at which 32bit PV guests are starting to run like treacle. There are no current plans to remove support for 32bit PV guests from Xen, but it is very much in the category of "you shouldn't be using this mode any more". ~Andrew P.S. I don't see 64bit PV guest support going anywhere, because there are still a number of open performance questions due to the inherent differences between syscall and vmexit, and the difference EPT/NPT tables make on cross-domain mappings. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] Remove 32-bit Xen PV guest support
On Mon, Jul 15, 2019 at 9:34 AM Andi Kleen wrote: > > Juergen Gross writes: > > > The long term plan has been to replace Xen PV guests by PVH. The first > > victim of that plan are now 32-bit PV guests, as those are used only > > rather seldom these days. Xen on x86 requires 64-bit support and with > > Grub2 now supporting PVH officially since version 2.04 there is no > > need to keep 32-bit PV guest support alive in the Linux kernel. > > Additionally Meltdown mitigation is not available in the kernel running > > as 32-bit PV guest, so dropping this mode makes sense from security > > point of view, too. > > Normally we have a deprecation period for feature removals like this. > You would make the kernel print a warning for some releases, and when > no user complains you can then remove. If a user complains you can't. > As I understand it, the kernel rules do allow changes like this even if there's a complaint: this is a patch that removes what is effectively hardware support. If the maintenance cost exceeds the value, then removal is fair game. (Obviously we weight the value to preserving compatibility quite highly, but in this case, Xen dropped 32-bit hardware support a long time ago. If the Xen hypervisor says that 32-bit PV guest support is deprecated, it's deprecated.) That being said, a warning might not be a bad idea. What's the current status of this in upstream Xen? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] Remove 32-bit Xen PV guest support
Juergen Gross writes: > The long term plan has been to replace Xen PV guests by PVH. The first > victim of that plan are now 32-bit PV guests, as those are used only > rather seldom these days. Xen on x86 requires 64-bit support and with > Grub2 now supporting PVH officially since version 2.04 there is no > need to keep 32-bit PV guest support alive in the Linux kernel. > Additionally Meltdown mitigation is not available in the kernel running > as 32-bit PV guest, so dropping this mode makes sense from security > point of view, too. Normally we have a deprecation period for feature removals like this. You would make the kernel print a warning for some releases, and when no user complains you can then remove. If a user complains you can't. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v1 31/33] drm/bochs: drop use of drmP.h
On Mon, Jul 01, 2019 at 08:39:08AM +0200, Gerd Hoffmann wrote: > On Sun, Jun 30, 2019 at 08:19:20AM +0200, Sam Ravnborg wrote: > > Drop use of the deprecated drmP.h header file. > > Made bochs.h self-contained and then fixed > > fallout in remaining files. > > Several unused includes was dropped in the process. > > > > Signed-off-by: Sam Ravnborg > > Cc: Gerd Hoffmann > > Cc: David Airlie > > Cc: Daniel Vetter > > Cc: virtualization@lists.linux-foundation.org > > Acked-by: Gerd Hoffmann Thanks again, applied. Sam ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v1 27/33] drm/virtgpu: drop use of drmP.h
On Mon, Jul 01, 2019 at 08:38:56AM +0200, Gerd Hoffmann wrote: > On Sun, Jun 30, 2019 at 08:19:16AM +0200, Sam Ravnborg wrote: > > Drop use of the deprecated drmP.h header file. > > Fix fallout by adding missing include files. > > > > Signed-off-by: Sam Ravnborg > > Cc: David Airlie > > Cc: Gerd Hoffmann > > Cc: Daniel Vetter > > Cc: virtualization@lists.linux-foundation.org > > Acked-by: Gerd Hoffmann Thanks, applied. Sam ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v1 09/33] drm/qxl: drop use of drmP.h
On Mon, Jul 01, 2019 at 08:38:43AM +0200, Gerd Hoffmann wrote: > On Sun, Jun 30, 2019 at 08:18:58AM +0200, Sam Ravnborg wrote: > > Drop use of the deprecated drmP.h header file. > > While touching the files divided includes in blocks, > > and when needed sort the blocks. > > Fix fallout. > > > > Signed-off-by: Sam Ravnborg > > Cc: Dave Airlie > > Cc: Gerd Hoffmann > > Cc: virtualization@lists.linux-foundation.org > > Cc: spice-de...@lists.freedesktop.org > > Acked-by: Gerd Hoffmann Thanks, applied. Sam ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks
There is a lot of infrastructure for functionality which is used exclusively in __{save,restore}_processor_state() on the suspend/resume path. cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by lapic_{suspend,resume}(). Saving and restoring cr8 independently of the rest of the Local APIC state isn't a clever thing to be doing. Delete the suspend/resume cr8 handling, which shrinks the size of struct saved_context, and allows for the removal of both PVOPS. Signed-off-by: Andrew Cooper --- CC: x...@kernel.org CC: virtualization@lists.linux-foundation.org CC: Borislav Petkov CC: Peter Zijlstra CC: Andy Lutomirski CC: Nadav Amit CC: Stephane Eranian CC: Feng Tang CC: Juergen Gross CC: Boris Ostrovsky CC: "Rafael J. Wysocki" CC: Pavel Machek Spotted while reviewing "x86/apic: Initialize TPR to block interrupts 16-31" https://lore.kernel.org/lkml/dc04a9f8b234d7b0956a8d2560b8945bcd9c4bf7.1563117760.git.l...@kernel.org/ v2: * Drop saved_context.cr8 as well (Juergen) * Remove akata...@vmware.com from the CC list due to bounces --- arch/x86/include/asm/paravirt.h | 12 arch/x86/include/asm/paravirt_types.h | 5 - arch/x86/include/asm/special_insns.h | 24 arch/x86/include/asm/suspend_64.h | 2 +- arch/x86/kernel/asm-offsets_64.c | 1 - arch/x86/kernel/paravirt.c| 4 arch/x86/power/cpu.c | 4 arch/x86/xen/enlighten_pv.c | 15 --- 8 files changed, 1 insertion(+), 66 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index c25c38a05c1c..0e4a0539c353 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -139,18 +139,6 @@ static inline void __write_cr4(unsigned long x) PVOP_VCALL1(cpu.write_cr4, x); } -#ifdef CONFIG_X86_64 -static inline unsigned long read_cr8(void) -{ - return PVOP_CALL0(unsigned long, cpu.read_cr8); -} - -static inline void write_cr8(unsigned long x) -{ - PVOP_VCALL1(cpu.write_cr8, x); -} -#endif - static inline void arch_safe_halt(void) { PVOP_VCALL0(irq.safe_halt); diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 946f8f1f1efc..3c775fb5524b 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -119,11 +119,6 @@ struct pv_cpu_ops { void (*write_cr4)(unsigned long); -#ifdef CONFIG_X86_64 - unsigned long (*read_cr8)(void); - void (*write_cr8)(unsigned long); -#endif - /* Segment descriptor handling */ void (*load_tr_desc)(void); void (*load_gdt)(const struct desc_ptr *); diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 219be88a59d2..6d37b8fcfc77 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -73,20 +73,6 @@ static inline unsigned long native_read_cr4(void) void native_write_cr4(unsigned long val); -#ifdef CONFIG_X86_64 -static inline unsigned long native_read_cr8(void) -{ - unsigned long cr8; - asm volatile("movq %%cr8,%0" : "=r" (cr8)); - return cr8; -} - -static inline void native_write_cr8(unsigned long val) -{ - asm volatile("movq %0,%%cr8" :: "r" (val) : "memory"); -} -#endif - #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS static inline u32 rdpkru(void) { @@ -200,16 +186,6 @@ static inline void wbinvd(void) #ifdef CONFIG_X86_64 -static inline unsigned long read_cr8(void) -{ - return native_read_cr8(); -} - -static inline void write_cr8(unsigned long x) -{ - native_write_cr8(x); -} - static inline void load_gs_index(unsigned selector) { native_load_gs_index(selector); diff --git a/arch/x86/include/asm/suspend_64.h b/arch/x86/include/asm/suspend_64.h index a7af9f53c0cb..35bb35d28733 100644 --- a/arch/x86/include/asm/suspend_64.h +++ b/arch/x86/include/asm/suspend_64.h @@ -34,7 +34,7 @@ struct saved_context { */ unsigned long kernelmode_gs_base, usermode_gs_base, fs_base; - unsigned long cr0, cr2, cr3, cr4, cr8; + unsigned long cr0, cr2, cr3, cr4; u64 misc_enable; bool misc_enable_saved; struct saved_msrs saved_msrs; diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c index d3d075226c0a..8b54d8e3a561 100644 --- a/arch/x86/kernel/asm-offsets_64.c +++ b/arch/x86/kernel/asm-offsets_64.c @@ -62,7 +62,6 @@ int main(void) ENTRY(cr2); ENTRY(cr3); ENTRY(cr4); - ENTRY(cr8); ENTRY(gdt_desc); BLANK(); #undef ENTRY diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 98039d7fb998..de4d4e8a54c1 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -311,10 +311,6 @@ struct paravirt_patch_template pv_ops = { .cpu.read_cr0 = native_read_cr0, .cpu.write_cr0
Re: [PATCH] x86/paravirt: Drop {read,write}_cr8() hooks
On 15/07/2019 15:52, Juergen Gross wrote: > On 15.07.19 16:26, Andy Lutomirski wrote: >> On Mon, Jul 15, 2019 at 6:23 AM Juergen Gross wrote: >>> >>> On 15.07.19 15:00, Andrew Cooper wrote: There is a lot of infrastructure for functionality which is used exclusively in __{save,restore}_processor_state() on the suspend/resume path. cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored independently by lapic_{suspend,resume}(). >>> >>> Aren't those called only with CONFIG_PM set? >>> >> >> >> Unless I'm missing something, we only build any of the restore code >> (including the write_cr8() call) if CONFIG_PM_SLEEP is set, and that >> selects CONFIG_PM, so we should be fine, I think. >> > > Okay, in that case I'd suggest to remove "cr8" from struct saved_context > as it won't be used any longer. To be honest, saving and restoring of cr8 without the rest of the Local APIC state is conceptually broken to begin with. If there are any bugs revealed by this, then the correct fixes are elsewhere. You're right about the saved_context. I'll submit a v2 with an even larger negative diffstat. ~Andrew ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] x86/paravirt: Drop {read,write}_cr8() hooks
On 15.07.19 16:26, Andy Lutomirski wrote: On Mon, Jul 15, 2019 at 6:23 AM Juergen Gross wrote: On 15.07.19 15:00, Andrew Cooper wrote: There is a lot of infrastructure for functionality which is used exclusively in __{save,restore}_processor_state() on the suspend/resume path. cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored independently by lapic_{suspend,resume}(). Aren't those called only with CONFIG_PM set? Unless I'm missing something, we only build any of the restore code (including the write_cr8() call) if CONFIG_PM_SLEEP is set, and that selects CONFIG_PM, so we should be fine, I think. Okay, in that case I'd suggest to remove "cr8" from struct saved_context as it won't be used any longer. Juergen ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH AUTOSEL 4.4 33/53] vhost_net: disable zerocopy by default
From: Jason Wang [ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ] Vhost_net was known to suffer from HOL[1] issues which is not easy to fix. Several downstream disable the feature by default. What's more, the datapath was split and datacopy path got the support of batching and XDP support recently which makes it faster than zerocopy part for small packets transmission. It looks to me that disable zerocopy by default is more appropriate. It cold be enabled by default again in the future if we fix the above issues. [1] https://patchwork.kernel.org/patch/3787671/ Signed-off-by: Jason Wang Acked-by: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/vhost/net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 645b2197930e..f46317135224 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -30,7 +30,7 @@ #include "vhost.h" -static int experimental_zcopytx = 1; +static int experimental_zcopytx = 0; module_param(experimental_zcopytx, int, 0444); MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;" " 1 -Enable; 0 - Disable"); -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH AUTOSEL 4.9 41/73] vhost_net: disable zerocopy by default
From: Jason Wang [ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ] Vhost_net was known to suffer from HOL[1] issues which is not easy to fix. Several downstream disable the feature by default. What's more, the datapath was split and datacopy path got the support of batching and XDP support recently which makes it faster than zerocopy part for small packets transmission. It looks to me that disable zerocopy by default is more appropriate. It cold be enabled by default again in the future if we fix the above issues. [1] https://patchwork.kernel.org/patch/3787671/ Signed-off-by: Jason Wang Acked-by: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/vhost/net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 681d0eade82f..75e1089dfb01 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -30,7 +30,7 @@ #include "vhost.h" -static int experimental_zcopytx = 1; +static int experimental_zcopytx = 0; module_param(experimental_zcopytx, int, 0444); MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;" " 1 -Enable; 0 - Disable"); -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote: > > > Michael S. Tsirkin writes: > > > On Thu, Jun 27, 2019 at 10:58:40PM -0300, Thiago Jung Bauermann wrote: > >> > >> Michael S. Tsirkin writes: > >> > >> > On Mon, Jun 03, 2019 at 10:13:59PM -0300, Thiago Jung Bauermann wrote: > >> >> > >> >> > >> >> Michael S. Tsirkin writes: > >> >> > >> >> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote: > >> >> >> I rephrased it in terms of address translation. What do you think of > >> >> >> this version? The flag name is slightly different too: > >> >> >> > >> >> >> > >> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same > >> >> >> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not > >> >> >> set, > >> >> >> with the exception that address translation is guaranteed to be > >> >> >> unnecessary when accessing memory addresses supplied to the > >> >> >> device > >> >> >> by the driver. Which is to say, the device will always use > >> >> >> physical > >> >> >> addresses matching addresses used by the driver (typically > >> >> >> meaning > >> >> >> physical addresses used by the CPU) and not translated further. > >> >> >> This > >> >> >> flag should be set by the guest if offered, but to allow for > >> >> >> backward-compatibility device implementations allow for it to be > >> >> >> left unset by the guest. It is an error to set both this flag and > >> >> >> VIRTIO_F_ACCESS_PLATFORM. > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unpriveledged > >> >> > drivers. This is why devices fail when it's not negotiated. > >> >> > >> >> Just to clarify, what do you mean by unprivileged drivers? Is it drivers > >> >> implemented in guest userspace such as with VFIO? Or unprivileged in > >> >> some other sense such as needing to use bounce buffers for some reason? > >> > > >> > I had drivers in guest userspace in mind. > >> > >> Great. Thanks for clarifying. > >> > >> I don't think this flag would work for guest userspace drivers. Should I > >> add a note about that in the flag definition? > > > > I think you need to clarify access protection rules. Is it only > > translation that is bypassed or is any platform-specific > > protection mechanism bypassed too? > > It is only translation. In a secure guest, if the device tries to access > a memory address that wasn't provided by the driver then the > architecture will deny that access. If the device accesses addresses > provided to it by the driver, then there's no protection mechanism or > translation to get in the way. > > >> >> > This confuses me. > >> >> > If driver is unpriveledged then what happens with this flag? > >> >> > It can supply any address it wants. Will that corrupt kernel > >> >> > memory? > >> >> > >> >> Not needing address translation doesn't necessarily mean that there's no > >> >> IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's > >> >> always an IOMMU present. And we also support VFIO drivers. The VFIO API > >> >> for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls > >> >> to program the IOMMU. > >> >> > >> >> For our use case, we don't need address translation because we set up an > >> >> identity mapping in the IOMMU so that the device can use guest physical > >> >> addresses. > > > > OK so I think I am beginning to see it in a different light. Right now the > > specific > > platform creates an identity mapping. That in turn means DMA API can be > > fast - it does not need to do anything. What you are looking for is a > > way to tell host it's an identity mapping - just as an optimization. > > > > Is that right? > > Almost. Theoretically it is just an optimization. But in practice the > pseries boot firmware (SLOF) doesn't support IOMMU_PLATFORM so it's not > possible to boot a guest from a device with that flag set. > > > So this is what I would call this option: > > > > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS > > > > and the explanation should state that all device > > addresses are translated by the platform to identical > > addresses. > > > > In fact this option then becomes more, not less restrictive > > than VIRTIO_F_ACCESS_PLATFORM - it's a promise > > by guest to only create identity mappings, > > and only before driver_ok is set. > > This option then would always be negotiated together with > > VIRTIO_F_ACCESS_PLATFORM. > > > > Host then must verify that > > 1. full 1:1 mappings are created before driver_ok > > or can we make sure this happens before features_ok? > > that would be ideal as we could require that features_ok fails > > 2. mappings are not modified between driver_ok and reset > > i guess attempts to change them will fail - > > possibly by causing a guest crash > > or some other kind of platform-specific error > > I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring > it
[PATCH AUTOSEL 4.14 056/105] vhost_net: disable zerocopy by default
From: Jason Wang [ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ] Vhost_net was known to suffer from HOL[1] issues which is not easy to fix. Several downstream disable the feature by default. What's more, the datapath was split and datacopy path got the support of batching and XDP support recently which makes it faster than zerocopy part for small packets transmission. It looks to me that disable zerocopy by default is more appropriate. It cold be enabled by default again in the future if we fix the above issues. [1] https://patchwork.kernel.org/patch/3787671/ Signed-off-by: Jason Wang Acked-by: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/vhost/net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index b40e8ded49c6..4d11152e60c1 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -35,7 +35,7 @@ #include "vhost.h" -static int experimental_zcopytx = 1; +static int experimental_zcopytx = 0; module_param(experimental_zcopytx, int, 0444); MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;" " 1 -Enable; 0 - Disable"); -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] x86/paravirt: Drop {read,write}_cr8() hooks
On Mon, Jul 15, 2019 at 6:23 AM Juergen Gross wrote: > > On 15.07.19 15:00, Andrew Cooper wrote: > > There is a lot of infrastructure for functionality which is used > > exclusively in __{save,restore}_processor_state() on the suspend/resume > > path. > > > > cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored > > independently by lapic_{suspend,resume}(). > > Aren't those called only with CONFIG_PM set? > Unless I'm missing something, we only build any of the restore code (including the write_cr8() call) if CONFIG_PM_SLEEP is set, and that selects CONFIG_PM, so we should be fine, I think. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH AUTOSEL 4.19 079/158] vhost_net: disable zerocopy by default
From: Jason Wang [ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ] Vhost_net was known to suffer from HOL[1] issues which is not easy to fix. Several downstream disable the feature by default. What's more, the datapath was split and datacopy path got the support of batching and XDP support recently which makes it faster than zerocopy part for small packets transmission. It looks to me that disable zerocopy by default is more appropriate. It cold be enabled by default again in the future if we fix the above issues. [1] https://patchwork.kernel.org/patch/3787671/ Signed-off-by: Jason Wang Acked-by: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/vhost/net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 39155d7cc894..ae704658b528 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -36,7 +36,7 @@ #include "vhost.h" -static int experimental_zcopytx = 1; +static int experimental_zcopytx = 0; module_param(experimental_zcopytx, int, 0444); MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;" " 1 -Enable; 0 - Disable"); -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH AUTOSEL 5.1 105/219] vhost_net: disable zerocopy by default
From: Jason Wang [ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ] Vhost_net was known to suffer from HOL[1] issues which is not easy to fix. Several downstream disable the feature by default. What's more, the datapath was split and datacopy path got the support of batching and XDP support recently which makes it faster than zerocopy part for small packets transmission. It looks to me that disable zerocopy by default is more appropriate. It cold be enabled by default again in the future if we fix the above issues. [1] https://patchwork.kernel.org/patch/3787671/ Signed-off-by: Jason Wang Acked-by: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/vhost/net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index df51a35cf537..8beacbee2553 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -36,7 +36,7 @@ #include "vhost.h" -static int experimental_zcopytx = 1; +static int experimental_zcopytx = 0; module_param(experimental_zcopytx, int, 0444); MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;" " 1 -Enable; 0 - Disable"); -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH AUTOSEL 5.2 119/249] vhost_net: disable zerocopy by default
From: Jason Wang [ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ] Vhost_net was known to suffer from HOL[1] issues which is not easy to fix. Several downstream disable the feature by default. What's more, the datapath was split and datacopy path got the support of batching and XDP support recently which makes it faster than zerocopy part for small packets transmission. It looks to me that disable zerocopy by default is more appropriate. It cold be enabled by default again in the future if we fix the above issues. [1] https://patchwork.kernel.org/patch/3787671/ Signed-off-by: Jason Wang Acked-by: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin --- drivers/vhost/net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index d57ebdd616d9..247e5585af5d 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -35,7 +35,7 @@ #include "vhost.h" -static int experimental_zcopytx = 1; +static int experimental_zcopytx = 0; module_param(experimental_zcopytx, int, 0444); MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;" " 1 -Enable; 0 - Disable"); -- 2.20.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] x86/paravirt: Drop {read,write}_cr8() hooks
On 15.07.19 15:00, Andrew Cooper wrote: There is a lot of infrastructure for functionality which is used exclusively in __{save,restore}_processor_state() on the suspend/resume path. cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored independently by lapic_{suspend,resume}(). Aren't those called only with CONFIG_PM set? Juergen ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] x86/paravirt: Drop {read,write}_cr8() hooks
There is a lot of infrastructure for functionality which is used exclusively in __{save,restore}_processor_state() on the suspend/resume path. cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored independently by lapic_{suspend,resume}(). Delete the saving and restoration of cr8, which allows for the removal of both PVOPS. Signed-off-by: Andrew Cooper --- CC: x...@kernel.org CC: virtualization@lists.linux-foundation.org CC: Borislav Petkov CC: Peter Zijlstra CC: Andy Lutomirski CC: Nadav Amit CC: Stephane Eranian CC: Feng Tang CC: Juergen Gross CC: Boris Ostrovsky CC: Alok Kataria CC: "Rafael J. Wysocki" CC: Pavel Machek Spotted while reviewing "x86/apic: Initialize TPR to block interrupts 16-31" https://lore.kernel.org/lkml/dc04a9f8b234d7b0956a8d2560b8945bcd9c4bf7.1563117760.git.l...@kernel.org/ --- arch/x86/include/asm/paravirt.h | 12 arch/x86/include/asm/paravirt_types.h | 5 - arch/x86/include/asm/special_insns.h | 24 arch/x86/kernel/paravirt.c| 4 arch/x86/power/cpu.c | 4 arch/x86/xen/enlighten_pv.c | 15 --- 6 files changed, 64 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index c25c38a05c1c..0e4a0539c353 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -139,18 +139,6 @@ static inline void __write_cr4(unsigned long x) PVOP_VCALL1(cpu.write_cr4, x); } -#ifdef CONFIG_X86_64 -static inline unsigned long read_cr8(void) -{ - return PVOP_CALL0(unsigned long, cpu.read_cr8); -} - -static inline void write_cr8(unsigned long x) -{ - PVOP_VCALL1(cpu.write_cr8, x); -} -#endif - static inline void arch_safe_halt(void) { PVOP_VCALL0(irq.safe_halt); diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 946f8f1f1efc..3c775fb5524b 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -119,11 +119,6 @@ struct pv_cpu_ops { void (*write_cr4)(unsigned long); -#ifdef CONFIG_X86_64 - unsigned long (*read_cr8)(void); - void (*write_cr8)(unsigned long); -#endif - /* Segment descriptor handling */ void (*load_tr_desc)(void); void (*load_gdt)(const struct desc_ptr *); diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 219be88a59d2..6d37b8fcfc77 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -73,20 +73,6 @@ static inline unsigned long native_read_cr4(void) void native_write_cr4(unsigned long val); -#ifdef CONFIG_X86_64 -static inline unsigned long native_read_cr8(void) -{ - unsigned long cr8; - asm volatile("movq %%cr8,%0" : "=r" (cr8)); - return cr8; -} - -static inline void native_write_cr8(unsigned long val) -{ - asm volatile("movq %0,%%cr8" :: "r" (val) : "memory"); -} -#endif - #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS static inline u32 rdpkru(void) { @@ -200,16 +186,6 @@ static inline void wbinvd(void) #ifdef CONFIG_X86_64 -static inline unsigned long read_cr8(void) -{ - return native_read_cr8(); -} - -static inline void write_cr8(unsigned long x) -{ - native_write_cr8(x); -} - static inline void load_gs_index(unsigned selector) { native_load_gs_index(selector); diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 98039d7fb998..de4d4e8a54c1 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -311,10 +311,6 @@ struct paravirt_patch_template pv_ops = { .cpu.read_cr0 = native_read_cr0, .cpu.write_cr0 = native_write_cr0, .cpu.write_cr4 = native_write_cr4, -#ifdef CONFIG_X86_64 - .cpu.read_cr8 = native_read_cr8, - .cpu.write_cr8 = native_write_cr8, -#endif .cpu.wbinvd = native_wbinvd, .cpu.read_msr = native_read_msr, .cpu.write_msr = native_write_msr, diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index 24b079e94bc2..1c58d8982728 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -122,9 +122,6 @@ static void __save_processor_state(struct saved_context *ctxt) ctxt->cr2 = read_cr2(); ctxt->cr3 = __read_cr3(); ctxt->cr4 = __read_cr4(); -#ifdef CONFIG_X86_64 - ctxt->cr8 = read_cr8(); -#endif ctxt->misc_enable_saved = !rdmsrl_safe(MSR_IA32_MISC_ENABLE, &ctxt->misc_enable); msr_save_context(ctxt); @@ -207,7 +204,6 @@ static void notrace __restore_processor_state(struct saved_context *ctxt) #else /* CONFIG X86_64 */ wrmsrl(MSR_EFER, ctxt->efer); - write_cr8(ctxt->cr8); __write_cr4(ctxt->cr4); #endif write_cr3(ctxt->cr3); diff --git a/arch/x86/xen/enlighten
Re: [PATCH 0/2] Remove 32-bit Xen PV guest support
On 15.07.19 14:32, Peter Zijlstra wrote: On Mon, Jul 15, 2019 at 01:37:37PM +0200, Juergen Gross wrote: The long term plan has been to replace Xen PV guests by PVH. The first victim of that plan are now 32-bit PV guests, as those are used only rather seldom these days. Xen on x86 requires 64-bit support and with Grub2 now supporting PVH officially since version 2.04 there is no need to keep 32-bit PV guest support alive in the Linux kernel. Additionally Meltdown mitigation is not available in the kernel running as 32-bit PV guest, so dropping this mode makes sense from security point of view, too. Juergen Gross (2): x86/xen: remove 32-bit Xen PV guest support x86/paravirt: remove 32-bit support from PARAVIRT_XXL Hooray! Always a pleasure to cheer the community up by sending Xen patches. :-D Juergen ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/2] Remove 32-bit Xen PV guest support
On Mon, Jul 15, 2019 at 01:37:37PM +0200, Juergen Gross wrote: > The long term plan has been to replace Xen PV guests by PVH. The first > victim of that plan are now 32-bit PV guests, as those are used only > rather seldom these days. Xen on x86 requires 64-bit support and with > Grub2 now supporting PVH officially since version 2.04 there is no > need to keep 32-bit PV guest support alive in the Linux kernel. > Additionally Meltdown mitigation is not available in the kernel running > as 32-bit PV guest, so dropping this mode makes sense from security > point of view, too. > > Juergen Gross (2): > x86/xen: remove 32-bit Xen PV guest support > x86/paravirt: remove 32-bit support from PARAVIRT_XXL Hooray! ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 0/2] Remove 32-bit Xen PV guest support
The long term plan has been to replace Xen PV guests by PVH. The first victim of that plan are now 32-bit PV guests, as those are used only rather seldom these days. Xen on x86 requires 64-bit support and with Grub2 now supporting PVH officially since version 2.04 there is no need to keep 32-bit PV guest support alive in the Linux kernel. Additionally Meltdown mitigation is not available in the kernel running as 32-bit PV guest, so dropping this mode makes sense from security point of view, too. Juergen Gross (2): x86/xen: remove 32-bit Xen PV guest support x86/paravirt: remove 32-bit support from PARAVIRT_XXL arch/x86/entry/entry_32.S | 93 arch/x86/entry/vdso/vdso32/vclock_gettime.c | 1 + arch/x86/include/asm/paravirt.h | 105 + arch/x86/include/asm/paravirt_types.h | 20 -- arch/x86/include/asm/pgtable-3level_types.h | 5 - arch/x86/include/asm/proto.h| 2 +- arch/x86/include/asm/segment.h | 2 +- arch/x86/include/asm/traps.h| 2 +- arch/x86/kernel/cpu/common.c| 8 - arch/x86/kernel/paravirt.c | 17 -- arch/x86/kernel/paravirt_patch_32.c | 36 +-- arch/x86/xen/Kconfig| 3 +- arch/x86/xen/Makefile | 4 +- arch/x86/xen/apic.c | 17 -- arch/x86/xen/enlighten_pv.c | 45 +--- arch/x86/xen/mmu_pv.c | 326 +++- arch/x86/xen/p2m.c | 4 - arch/x86/xen/setup.c| 44 +--- arch/x86/xen/smp_pv.c | 19 +- arch/x86/xen/xen-asm.S | 14 -- arch/x86/xen/xen-asm_32.S | 207 -- arch/x86/xen/xen-head.S | 6 - arch/x86/xen/xen-ops.h | 5 - drivers/xen/Kconfig | 4 +- 24 files changed, 57 insertions(+), 932 deletions(-) delete mode 100644 arch/x86/xen/xen-asm_32.S -- 2.16.4 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 2/2] x86/paravirt: remove 32-bit support from PARAVIRT_XXL
The last 32-bit user of stuff under CONFIG_PARAVIRT_XXL is gone. Remove 32-bit specific parts. Signed-off-by: Juergen Gross --- arch/x86/entry/vdso/vdso32/vclock_gettime.c | 1 + arch/x86/include/asm/paravirt.h | 105 arch/x86/include/asm/paravirt_types.h | 20 -- arch/x86/include/asm/pgtable-3level_types.h | 5 -- arch/x86/kernel/cpu/common.c| 8 --- arch/x86/kernel/paravirt.c | 17 - arch/x86/kernel/paravirt_patch_32.c | 36 +- 7 files changed, 15 insertions(+), 177 deletions(-) diff --git a/arch/x86/entry/vdso/vdso32/vclock_gettime.c b/arch/x86/entry/vdso/vdso32/vclock_gettime.c index 9242b28418d5..36f4ce1405cb 100644 --- a/arch/x86/entry/vdso/vdso32/vclock_gettime.c +++ b/arch/x86/entry/vdso/vdso32/vclock_gettime.c @@ -17,6 +17,7 @@ #undef CONFIG_ILLEGAL_POINTER_VALUE #undef CONFIG_SPARSEMEM_VMEMMAP #undef CONFIG_NR_CPUS +#undef CONFIG_PARAVIRT_XXL #define CONFIG_X86_32 1 #define CONFIG_PGTABLE_LEVELS 2 diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index c25c38a05c1c..60dfa93313a9 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -139,7 +139,6 @@ static inline void __write_cr4(unsigned long x) PVOP_VCALL1(cpu.write_cr4, x); } -#ifdef CONFIG_X86_64 static inline unsigned long read_cr8(void) { return PVOP_CALL0(unsigned long, cpu.read_cr8); @@ -149,7 +148,6 @@ static inline void write_cr8(unsigned long x) { PVOP_VCALL1(cpu.write_cr8, x); } -#endif static inline void arch_safe_halt(void) { @@ -283,12 +281,10 @@ static inline void load_TLS(struct thread_struct *t, unsigned cpu) PVOP_VCALL2(cpu.load_tls, t, cpu); } -#ifdef CONFIG_X86_64 static inline void load_gs_index(unsigned int gs) { PVOP_VCALL1(cpu.load_gs_index, gs); } -#endif static inline void write_ldt_entry(struct desc_struct *dt, int entry, const void *desc) @@ -375,50 +371,28 @@ static inline pte_t __pte(pteval_t val) { pteval_t ret; - if (sizeof(pteval_t) > sizeof(long)) - ret = PVOP_CALLEE2(pteval_t, mmu.make_pte, val, (u64)val >> 32); - else - ret = PVOP_CALLEE1(pteval_t, mmu.make_pte, val); + ret = PVOP_CALLEE1(pteval_t, mmu.make_pte, val); return (pte_t) { .pte = ret }; } static inline pteval_t pte_val(pte_t pte) { - pteval_t ret; - - if (sizeof(pteval_t) > sizeof(long)) - ret = PVOP_CALLEE2(pteval_t, mmu.pte_val, - pte.pte, (u64)pte.pte >> 32); - else - ret = PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte); - - return ret; + return PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte); } static inline pgd_t __pgd(pgdval_t val) { pgdval_t ret; - if (sizeof(pgdval_t) > sizeof(long)) - ret = PVOP_CALLEE2(pgdval_t, mmu.make_pgd, val, (u64)val >> 32); - else - ret = PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val); + ret = PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val); return (pgd_t) { ret }; } static inline pgdval_t pgd_val(pgd_t pgd) { - pgdval_t ret; - - if (sizeof(pgdval_t) > sizeof(long)) - ret = PVOP_CALLEE2(pgdval_t, mmu.pgd_val, - pgd.pgd, (u64)pgd.pgd >> 32); - else - ret = PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd); - - return ret; + return PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd); } #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION @@ -435,79 +409,48 @@ static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t old_pte, pte_t pte) { - - if (sizeof(pteval_t) > sizeof(long)) - /* 5 arg words */ - pv_ops.mmu.ptep_modify_prot_commit(vma, addr, ptep, pte); - else - PVOP_VCALL4(mmu.ptep_modify_prot_commit, - vma, addr, ptep, pte.pte); + PVOP_VCALL4(mmu.ptep_modify_prot_commit, vma, addr, ptep, pte.pte); } static inline void set_pte(pte_t *ptep, pte_t pte) { - if (sizeof(pteval_t) > sizeof(long)) - PVOP_VCALL3(mmu.set_pte, ptep, pte.pte, (u64)pte.pte >> 32); - else - PVOP_VCALL2(mmu.set_pte, ptep, pte.pte); + PVOP_VCALL2(mmu.set_pte, ptep, pte.pte); } static inline void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) { - if (sizeof(pteval_t) > sizeof(long)) - /* 5 arg words */ - pv_ops.mmu.set_pte_at(mm, addr, ptep, pte); - else - PVOP_VCALL4(mmu.set_pte_at, mm, addr, ptep, pte.pte); + PVOP_VCALL4(mmu.set_pte_
Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock
On Mon, Jul 15, 2019 at 05:16:09PM +0800, Jason Wang wrote: > > > > > > > > >struct sk_buff *virtskb_receive_small(struct virtskb > > > > > > > > *vs, ...); > > > > > > > >struct sk_buff *virtskb_receive_big(struct virtskb *vs, > > > > > > > > ...); > > > > > > > >struct sk_buff *virtskb_receive_mergeable(struct virtskb > > > > > > > > *vs, ...); > > > > > > > > > > > > > > > >int virtskb_add_recvbuf_small(struct virtskb*vs, ...); > > > > > > > >int virtskb_add_recvbuf_big(struct virtskb *vs, ...); > > > > > > > >int virtskb_add_recvbuf_mergeable(struct virtskb *vs, > > > > > > > > ...); > > > > > > > > > > > > > > > > For the Guest->Host path it should be easier, so maybe I can > > > > > > > > add a > > > > > > > > "virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a > > > > > > > > part of the code > > > > > > > > of xmit_skb(). > > > > > > > I may miss something, but I don't see any thing that prevents us > > > > > > > from using > > > > > > > xmit_skb() directly. > > > > > > > > > > > > > Yes, but my initial idea was to make it more parametric and not > > > > > > related to the > > > > > > virtio_net_hdr, so the 'hdr_len' could be a parameter and the > > > > > > 'num_buffers' should be handled by the caller. > > > > > > > > > > > > > > Let me know if you have in mind better names or if I should put > > > > > > > > these function > > > > > > > > in another place. > > > > > > > > > > > > > > > > I would like to leave the control part completely separate, so, > > > > > > > > for example, > > > > > > > > the two drivers will negotiate the features independently and > > > > > > > > they will call > > > > > > > > the right virtskb_receive_*() function based on the negotiation. > > > > > > > If it's one the issue of negotiation, we can simply change the > > > > > > > virtnet_probe() to deal with different devices. > > > > > > > > > > > > > > > > > > > > > > I already started to work on it, but before to do more steps > > > > > > > > and send an RFC > > > > > > > > patch, I would like to hear your opinion. > > > > > > > > Do you think that makes sense? > > > > > > > > Do you see any issue or a better solution? > > > > > > > I still think we need to seek a way of adding some codes on > > > > > > > virtio-net.c > > > > > > > directly if there's no huge different in the processing of TX/RX. > > > > > > > That would > > > > > > > save us a lot time. > > > > > > After the reading of the buffers from the virtqueue I think the > > > > > > process > > > > > > is slightly different, because virtio-net will interface with the > > > > > > network > > > > > > stack, while virtio-vsock will interface with the vsock-core > > > > > > (socket). > > > > > > So the virtio-vsock implements the following: > > > > > > - control flow mechanism to avoid to loose packets, informing the > > > > > > peer > > > > > > about the amount of memory available in the receive queue using > > > > > > some > > > > > > fields in the virtio_vsock_hdr > > > > > > - de-multiplexing parsing the virtio_vsock_hdr and choosing the > > > > > > right > > > > > > socket depending on the port > > > > > > - socket state handling > > > > > > I think it's just a branch, for ethernet, go for networking stack. > > > otherwise > > > go for vsock core? > > > > > Yes, that should work. > > > > So, I should refactor the functions that can be called also from the vsock > > core, in order to remove "struct net_device *dev" parameter. > > Maybe creating some wrappers for the network stack. > > > > Otherwise I should create a fake net_device for vsock_core. > > > > What do you suggest? > > > I'm not quite sure I get the question. Can you just use the one that created > by virtio_net? Sure, sorry but I missed that it is allocated in the virtnet_probe()! Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock
struct sk_buff *virtskb_receive_small(struct virtskb *vs, ...); struct sk_buff *virtskb_receive_big(struct virtskb *vs, ...); struct sk_buff *virtskb_receive_mergeable(struct virtskb *vs, ...); int virtskb_add_recvbuf_small(struct virtskb*vs, ...); int virtskb_add_recvbuf_big(struct virtskb *vs, ...); int virtskb_add_recvbuf_mergeable(struct virtskb *vs, ...); For the Guest->Host path it should be easier, so maybe I can add a "virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a part of the code of xmit_skb(). I may miss something, but I don't see any thing that prevents us from using xmit_skb() directly. Yes, but my initial idea was to make it more parametric and not related to the virtio_net_hdr, so the 'hdr_len' could be a parameter and the 'num_buffers' should be handled by the caller. Let me know if you have in mind better names or if I should put these function in another place. I would like to leave the control part completely separate, so, for example, the two drivers will negotiate the features independently and they will call the right virtskb_receive_*() function based on the negotiation. If it's one the issue of negotiation, we can simply change the virtnet_probe() to deal with different devices. I already started to work on it, but before to do more steps and send an RFC patch, I would like to hear your opinion. Do you think that makes sense? Do you see any issue or a better solution? I still think we need to seek a way of adding some codes on virtio-net.c directly if there's no huge different in the processing of TX/RX. That would save us a lot time. After the reading of the buffers from the virtqueue I think the process is slightly different, because virtio-net will interface with the network stack, while virtio-vsock will interface with the vsock-core (socket). So the virtio-vsock implements the following: - control flow mechanism to avoid to loose packets, informing the peer about the amount of memory available in the receive queue using some fields in the virtio_vsock_hdr - de-multiplexing parsing the virtio_vsock_hdr and choosing the right socket depending on the port - socket state handling I think it's just a branch, for ethernet, go for networking stack. otherwise go for vsock core? Yes, that should work. So, I should refactor the functions that can be called also from the vsock core, in order to remove "struct net_device *dev" parameter. Maybe creating some wrappers for the network stack. Otherwise I should create a fake net_device for vsock_core. What do you suggest? I'm not quite sure I get the question. Can you just use the one that created by virtio_net? Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock
On Fri, Jul 12, 2019 at 06:14:39PM +0800, Jason Wang wrote: > > On 2019/7/12 下午6:00, Stefano Garzarella wrote: > > On Thu, Jul 11, 2019 at 03:52:21PM -0400, Michael S. Tsirkin wrote: > > > On Thu, Jul 11, 2019 at 01:41:34PM +0200, Stefano Garzarella wrote: > > > > On Thu, Jul 11, 2019 at 03:37:00PM +0800, Jason Wang wrote: > > > > > On 2019/7/10 下午11:37, Stefano Garzarella wrote: > > > > > > Hi, > > > > > > as Jason suggested some months ago, I looked better at the > > > > > > virtio-net driver to > > > > > > understand if we can reuse some parts also in the virtio-vsock > > > > > > driver, since we > > > > > > have similar challenges (mergeable buffers, page allocation, small > > > > > > packets, etc.). > > > > > > > > > > > > Initially, I would add the skbuff in the virtio-vsock in order to > > > > > > re-use > > > > > > receive_*() functions. > > > > > > > > > > Yes, that will be a good step. > > > > > > > > > Okay, I'll go on this way. > > > > > > > > > > Then I would move receive_[small, big, mergeable]() and > > > > > > add_recvbuf_[small, big, mergeable]() outside of virtio-net driver, > > > > > > in order to > > > > > > call them also from virtio-vsock. I need to do some refactoring > > > > > > (e.g. leave the > > > > > > XDP part on the virtio-net driver), but I think it is feasible. > > > > > > > > > > > > The idea is to create a virtio-skb.[h,c] where put these functions > > > > > > and a new > > > > > > object where stores some attributes needed (e.g. hdr_len ) and > > > > > > status (e.g. > > > > > > some fields of struct receive_queue). > > > > > > > > > > My understanding is we could be more ambitious here. Do you see any > > > > > blocker > > > > > for reusing virtio-net directly? It's better to reuse not only the > > > > > functions > > > > > but also the logic like NAPI to avoid re-inventing something buggy and > > > > > duplicated. > > > > > > > > > These are my concerns: > > > > - virtio-vsock is not a "net_device", so a lot of code related to > > > >ethtool, net devices (MAC address, MTU, speed, VLAN, XDP, > > > > offloading) will be > > > >not used by virtio-vsock. > > > Linux support device other than ethernet, so it should not be a problem. > > > > > > > > > > - virtio-vsock has a different header. We can consider it as part of > > > >virtio_net payload, but it precludes the compatibility with old > > > > hosts. This > > > >was one of the major doubts that made me think about using only the > > > >send/recv skbuff functions, that it shouldn't break the > > > > compatibility. > > > We can extend the current vnet header helper for it to work for vsock. Okay, I'll do it. > > > > > > > > > > > > This is an idea of virtio-skb.h that > > > > > > I have in mind: > > > > > > struct virtskb; > > > > > > > > > > What fields do you want to store in virtskb? It looks to be exist > > > > > sk_buff is > > > > > flexible enough to us? > > > > My idea is to store queues information, like struct receive_queue or > > > > struct send_queue, and some device attributes (e.g. hdr_len ). > > > If you reuse skb or virtnet_info, there is not necessary. > Okay. > > > > > > > > > > > > > > > > struct sk_buff *virtskb_receive_small(struct virtskb *vs, > > > > > > ...); > > > > > > struct sk_buff *virtskb_receive_big(struct virtskb *vs, ...); > > > > > > struct sk_buff *virtskb_receive_mergeable(struct virtskb *vs, > > > > > > ...); > > > > > > > > > > > > int virtskb_add_recvbuf_small(struct virtskb*vs, ...); > > > > > > int virtskb_add_recvbuf_big(struct virtskb *vs, ...); > > > > > > int virtskb_add_recvbuf_mergeable(struct virtskb *vs, ...); > > > > > > > > > > > > For the Guest->Host path it should be easier, so maybe I can add a > > > > > > "virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a part > > > > > > of the code > > > > > > of xmit_skb(). > > > > > > > > > > I may miss something, but I don't see any thing that prevents us from > > > > > using > > > > > xmit_skb() directly. > > > > > > > > > Yes, but my initial idea was to make it more parametric and not related > > > > to the > > > > virtio_net_hdr, so the 'hdr_len' could be a parameter and the > > > > 'num_buffers' should be handled by the caller. > > > > > > > > > > Let me know if you have in mind better names or if I should put > > > > > > these function > > > > > > in another place. > > > > > > > > > > > > I would like to leave the control part completely separate, so, for > > > > > > example, > > > > > > the two drivers will negotiate the features independently and they > > > > > > will call > > > > > > the right virtskb_receive_*() function based on the negotiation. > > > > > > > > > > If it's one the issue of negotiation, we can simply change the > > > > > virtnet_probe() to deal with different devices. > > > > > > > > > > > > > > > > I already started to work on it, but before to do more steps and > > > > > >