Re: [Xen-devel] [PATCH 0/2] Remove 32-bit Xen PV guest support

2019-07-15 Thread Jan Beulich
On 15.07.2019 19:28, Andy Lutomirski wrote:
> On Mon, Jul 15, 2019 at 9:34 AM Andi Kleen  wrote:
>>
>> Juergen Gross  writes:
>>
>>> The long term plan has been to replace Xen PV guests by PVH. The first
>>> victim of that plan are now 32-bit PV guests, as those are used only
>>> rather seldom these days. Xen on x86 requires 64-bit support and with
>>> Grub2 now supporting PVH officially since version 2.04 there is no
>>> need to keep 32-bit PV guest support alive in the Linux kernel.
>>> Additionally Meltdown mitigation is not available in the kernel running
>>> as 32-bit PV guest, so dropping this mode makes sense from security
>>> point of view, too.
>>
>> Normally we have a deprecation period for feature removals like this.
>> You would make the kernel print a warning for some releases, and when
>> no user complains you can then remove. If a user complains you can't.
>>
> 
> As I understand it, the kernel rules do allow changes like this even
> if there's a complaint: this is a patch that removes what is
> effectively hardware support.  If the maintenance cost exceeds the
> value, then removal is fair game.  (Obviously we weight the value to
> preserving compatibility quite highly, but in this case, Xen dropped
> 32-bit hardware support a long time ago.  If the Xen hypervisor says
> that 32-bit PV guest support is deprecated, it's deprecated.)

Since it was implied but not explicit from Andrew's reply, just to
make it explicit: So far 32-bit PV guest support has not been
deprecated in Xen itself.

Jan
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Juergen Gross

On 15.07.19 17:16, Andrew Cooper wrote:

There is a lot of infrastructure for functionality which is used
exclusively in __{save,restore}_processor_state() on the suspend/resume
path.

cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by
lapic_{suspend,resume}().  Saving and restoring cr8 independently of the
rest of the Local APIC state isn't a clever thing to be doing.

Delete the suspend/resume cr8 handling, which shrinks the size of struct
saved_context, and allows for the removal of both PVOPS.

Signed-off-by: Andrew Cooper 


Reviewed-by: Juergen Gross 


Juergen
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] [PATCH 0/2] Remove 32-bit Xen PV guest support

2019-07-15 Thread Juergen Gross

On 15.07.19 19:39, Andrew Cooper wrote:

On 15/07/2019 18:28, Andy Lutomirski wrote:

On Mon, Jul 15, 2019 at 9:34 AM Andi Kleen  wrote:

Juergen Gross  writes:


The long term plan has been to replace Xen PV guests by PVH. The first
victim of that plan are now 32-bit PV guests, as those are used only
rather seldom these days. Xen on x86 requires 64-bit support and with
Grub2 now supporting PVH officially since version 2.04 there is no
need to keep 32-bit PV guest support alive in the Linux kernel.
Additionally Meltdown mitigation is not available in the kernel running
as 32-bit PV guest, so dropping this mode makes sense from security
point of view, too.

Normally we have a deprecation period for feature removals like this.
You would make the kernel print a warning for some releases, and when
no user complains you can then remove. If a user complains you can't.


As I understand it, the kernel rules do allow changes like this even
if there's a complaint: this is a patch that removes what is
effectively hardware support.  If the maintenance cost exceeds the
value, then removal is fair game.  (Obviously we weight the value to
preserving compatibility quite highly, but in this case, Xen dropped
32-bit hardware support a long time ago.  If the Xen hypervisor says
that 32-bit PV guest support is deprecated, it's deprecated.)

That being said, a warning might not be a bad idea.  What's the
current status of this in upstream Xen?


So personally, I'd prefer to see support stay, but at the end of the day
it is Juergen's choice as the maintainer of the code.


Especially on the security front we are unsafe with 32-bit PV Linux.
And making it safe will make it so slow that the needed effort is not
spent very well.


Juergen
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Nadav Amit via Virtualization
> On Jul 15, 2019, at 4:30 PM, Andrew Cooper  wrote:
> 
> On 15/07/2019 19:17, Nadav Amit wrote:
>>> On Jul 15, 2019, at 8:16 AM, Andrew Cooper  
>>> wrote:
>>> 
>>> There is a lot of infrastructure for functionality which is used
>>> exclusively in __{save,restore}_processor_state() on the suspend/resume
>>> path.
>>> 
>>> cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by
>>> lapic_{suspend,resume}().  Saving and restoring cr8 independently of the
>>> rest of the Local APIC state isn't a clever thing to be doing.
>>> 
>>> Delete the suspend/resume cr8 handling, which shrinks the size of struct
>>> saved_context, and allows for the removal of both PVOPS.
>> I think removing the interface for CR8 writes is also good to avoid
>> potential correctness issues, as the SDM says (10.8.6.1 "Interaction of Task
>> Priorities between CR8 and APIC”):
>> 
>> "Operating software should implement either direct APIC TPR updates or CR8
>> style TPR updates but not mix them. Software can use a serializing
>> instruction (for example, CPUID) to serialize updates between MOV CR8 and
>> stores to the APIC.”
>> 
>> And native_write_cr8() did not even issue a serializing instruction.
> 
> Given its location, the one write_cr8() is bounded by two serialising
> operations, so is safe in practice.

That’s what the “potential” in "potential correctness issues” means :)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Andy Lutomirski
On Mon, Jul 15, 2019 at 4:30 PM Andrew Cooper  wrote:
>
> On 15/07/2019 19:17, Nadav Amit wrote:
> >> On Jul 15, 2019, at 8:16 AM, Andrew Cooper  
> >> wrote:
> >>
> >> There is a lot of infrastructure for functionality which is used
> >> exclusively in __{save,restore}_processor_state() on the suspend/resume
> >> path.
> >>
> >> cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by
> >> lapic_{suspend,resume}().  Saving and restoring cr8 independently of the
> >> rest of the Local APIC state isn't a clever thing to be doing.
> >>
> >> Delete the suspend/resume cr8 handling, which shrinks the size of struct
> >> saved_context, and allows for the removal of both PVOPS.
> > I think removing the interface for CR8 writes is also good to avoid
> > potential correctness issues, as the SDM says (10.8.6.1 "Interaction of Task
> > Priorities between CR8 and APIC”):
> >
> > "Operating software should implement either direct APIC TPR updates or CR8
> > style TPR updates but not mix them. Software can use a serializing
> > instruction (for example, CPUID) to serialize updates between MOV CR8 and
> > stores to the APIC.”
> >
> > And native_write_cr8() did not even issue a serializing instruction.
> >
>
> Given its location, the one write_cr8() is bounded by two serialising
> operations, so is safe in practice.
>
> However, I agree with the statement in the manual.  I could submit a v3
> with an updated commit message, or let it be fixed on commit.  Whichever
> is easiest.
>

I don't see anything wrong with the message.  If we actually used CR8
for interrupt priorities, we wouldn't want it to serialize.  The bug
is that the code that did the write_cr8() should have had a comment as
to how it serialized against lapic_restore().  But that doesn't seem
worth mentioning in the message, since, as noted, the real problem was
that it nonsensically restored just TPR without restoring everything
else.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-15 Thread Benjamin Herrenschmidt
On Mon, 2019-07-15 at 19:03 -0300, Thiago Jung Bauermann wrote:
> > > Indeed. The idea is that QEMU can offer the flag, old guests can
> > > reject
> > > it (or even new guests can reject it, if they decide not to
> > > convert into
> > > secure VMs) and the feature negotiation will succeed with the
> > > flag
> > > unset.
> > 
> > OK. And then what does QEMU do? Assume guest is not encrypted I
> > guess?
> 
> There's nothing different that QEMU needs to do, with or without the
> flag. the perspective of the host, a secure guest and a regular guest
> work the same way with respect to virtio.

This is *precisely* why I was against adding a flag and touch the
protocol negociation with qemu in the first place, back when I cared
about that stuff...

Guys, this has gone in circles over and over again.

This has nothing to do with qemu. Qemu doesn't need to know about this.
It's entirely guest local. This is why the one-liner in virtio was a
far better and simpler solution.

This is something the guest does to itself (with the participation of a
ultravisor but that's not something qemu cares about at this stage, at
least not as far as virtio is concerned).

Basically, the guest "hides" its memory from the host using a HW secure
memory facility. As a result, it needs to ensure that all of its DMA
pages are bounced through insecure pages that aren't hidden. That's it,
it's all guest side. Qemu shouldn't have to care about it at all.

Cheers,
Ben.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Andrew Cooper
On 15/07/2019 19:17, Nadav Amit wrote:
>> On Jul 15, 2019, at 8:16 AM, Andrew Cooper  wrote:
>>
>> There is a lot of infrastructure for functionality which is used
>> exclusively in __{save,restore}_processor_state() on the suspend/resume
>> path.
>>
>> cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by
>> lapic_{suspend,resume}().  Saving and restoring cr8 independently of the
>> rest of the Local APIC state isn't a clever thing to be doing.
>>
>> Delete the suspend/resume cr8 handling, which shrinks the size of struct
>> saved_context, and allows for the removal of both PVOPS.
> I think removing the interface for CR8 writes is also good to avoid
> potential correctness issues, as the SDM says (10.8.6.1 "Interaction of Task
> Priorities between CR8 and APIC”):
>
> "Operating software should implement either direct APIC TPR updates or CR8
> style TPR updates but not mix them. Software can use a serializing
> instruction (for example, CPUID) to serialize updates between MOV CR8 and
> stores to the APIC.”
>
> And native_write_cr8() did not even issue a serializing instruction.
>

Given its location, the one write_cr8() is bounded by two serialising
operations, so is safe in practice.

However, I agree with the statement in the manual.  I could submit a v3
with an updated commit message, or let it be fixed on commit.  Whichever
is easiest.

~Andrew
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-15 Thread Thiago Jung Bauermann


Michael S. Tsirkin  writes:

> On Mon, Jul 15, 2019 at 07:03:03PM -0300, Thiago Jung Bauermann wrote:
>> 
>> Michael S. Tsirkin  writes:
>> 
>> > On Mon, Jul 15, 2019 at 05:29:06PM -0300, Thiago Jung Bauermann wrote:
>> >>
>> >> Michael S. Tsirkin  writes:
>> >>
>> >> > On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote:
>> >> >>
>> >> >>
>> >> >> Michael S. Tsirkin  writes:
>> >> >>
>> >> >> > So this is what I would call this option:
>> >> >> >
>> >> >> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS
>> >> >> >
>> >> >> > and the explanation should state that all device
>> >> >> > addresses are translated by the platform to identical
>> >> >> > addresses.
>> >> >> >
>> >> >> > In fact this option then becomes more, not less restrictive
>> >> >> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise
>> >> >> > by guest to only create identity mappings,
>> >> >> > and only before driver_ok is set.
>> >> >> > This option then would always be negotiated together with
>> >> >> > VIRTIO_F_ACCESS_PLATFORM.
>> >> >> >
>> >> >> > Host then must verify that
>> >> >> > 1. full 1:1 mappings are created before driver_ok
>> >> >> > or can we make sure this happens before features_ok?
>> >> >> > that would be ideal as we could require that features_ok fails
>> >> >> > 2. mappings are not modified between driver_ok and reset
>> >> >> > i guess attempts to change them will fail -
>> >> >> > possibly by causing a guest crash
>> >> >> > or some other kind of platform-specific error
>> >> >>
>> >> >> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but 
>> >> >> requiring
>> >> >> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is
>> >> >> SLOF as I mentioned above, another is that we would be requiring all
>> >> >> guests running on the machine (secure guests or not, since we would use
>> >> >> the same configuration for all guests) to support it. But
>> >> >> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For
>> >> >> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know 
>> >> >> about
>> >> >> it and wouldn't be able to use the device.
>> >> >
>> >> > OK and your target is to enable use with kernel drivers within
>> >> > guests, right?
>> >>
>> >> Right.
>> >>
>> >> > My question is, we are defining a new flag here, I guess old guests
>> >> > then do not set it. How does it help old guests? Or maybe it's
>> >> > not designed to ...
>> >>
>> >> Indeed. The idea is that QEMU can offer the flag, old guests can reject
>> >> it (or even new guests can reject it, if they decide not to convert into
>> >> secure VMs) and the feature negotiation will succeed with the flag
>> >> unset.
>> >
>> > OK. And then what does QEMU do? Assume guest is not encrypted I guess?
>> 
>> There's nothing different that QEMU needs to do, with or without the
>> flag. the perspective of the host, a secure guest and a regular guest
>> work the same way with respect to virtio.
>
> OK. So now let's get back to implementation. What will
> Linux guest driver do? It can't activate DMA API blindly since that
> will assume translation also works, right?

It can on pseries, because we always have a 1:1 window mapping the whole
guest memory.

> Or do we somehow limit it to just a specific platform?

Yes, we want to accept the new flag only on secure pseries guests.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-15 Thread Michael S. Tsirkin
On Mon, Jul 15, 2019 at 07:03:03PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Mon, Jul 15, 2019 at 05:29:06PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote:
> >> >>
> >> >>
> >> >> Michael S. Tsirkin  writes:
> >> >>
> >> >> > So this is what I would call this option:
> >> >> >
> >> >> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS
> >> >> >
> >> >> > and the explanation should state that all device
> >> >> > addresses are translated by the platform to identical
> >> >> > addresses.
> >> >> >
> >> >> > In fact this option then becomes more, not less restrictive
> >> >> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise
> >> >> > by guest to only create identity mappings,
> >> >> > and only before driver_ok is set.
> >> >> > This option then would always be negotiated together with
> >> >> > VIRTIO_F_ACCESS_PLATFORM.
> >> >> >
> >> >> > Host then must verify that
> >> >> > 1. full 1:1 mappings are created before driver_ok
> >> >> > or can we make sure this happens before features_ok?
> >> >> > that would be ideal as we could require that features_ok fails
> >> >> > 2. mappings are not modified between driver_ok and reset
> >> >> > i guess attempts to change them will fail -
> >> >> > possibly by causing a guest crash
> >> >> > or some other kind of platform-specific error
> >> >>
> >> >> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring
> >> >> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is
> >> >> SLOF as I mentioned above, another is that we would be requiring all
> >> >> guests running on the machine (secure guests or not, since we would use
> >> >> the same configuration for all guests) to support it. But
> >> >> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For
> >> >> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know about
> >> >> it and wouldn't be able to use the device.
> >> >
> >> > OK and your target is to enable use with kernel drivers within
> >> > guests, right?
> >>
> >> Right.
> >>
> >> > My question is, we are defining a new flag here, I guess old guests
> >> > then do not set it. How does it help old guests? Or maybe it's
> >> > not designed to ...
> >>
> >> Indeed. The idea is that QEMU can offer the flag, old guests can reject
> >> it (or even new guests can reject it, if they decide not to convert into
> >> secure VMs) and the feature negotiation will succeed with the flag
> >> unset.
> >
> > OK. And then what does QEMU do? Assume guest is not encrypted I guess?
> 
> There's nothing different that QEMU needs to do, with or without the
> flag. the perspective of the host, a secure guest and a regular guest
> work the same way with respect to virtio.

OK. So now let's get back to implementation. What will
Linux guest driver do? It can't activate DMA API blindly since that
will assume translation also works, right?
Or do we somehow limit it to just a specific platform?

> --
> Thiago Jung Bauermann
> IBM Linux Technology Center
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-15 Thread Thiago Jung Bauermann


Michael S. Tsirkin  writes:

> On Mon, Jul 15, 2019 at 05:29:06PM -0300, Thiago Jung Bauermann wrote:
>>
>> Michael S. Tsirkin  writes:
>>
>> > On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote:
>> >>
>> >>
>> >> Michael S. Tsirkin  writes:
>> >>
>> >> > So this is what I would call this option:
>> >> >
>> >> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS
>> >> >
>> >> > and the explanation should state that all device
>> >> > addresses are translated by the platform to identical
>> >> > addresses.
>> >> >
>> >> > In fact this option then becomes more, not less restrictive
>> >> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise
>> >> > by guest to only create identity mappings,
>> >> > and only before driver_ok is set.
>> >> > This option then would always be negotiated together with
>> >> > VIRTIO_F_ACCESS_PLATFORM.
>> >> >
>> >> > Host then must verify that
>> >> > 1. full 1:1 mappings are created before driver_ok
>> >> > or can we make sure this happens before features_ok?
>> >> > that would be ideal as we could require that features_ok fails
>> >> > 2. mappings are not modified between driver_ok and reset
>> >> > i guess attempts to change them will fail -
>> >> > possibly by causing a guest crash
>> >> > or some other kind of platform-specific error
>> >>
>> >> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring
>> >> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is
>> >> SLOF as I mentioned above, another is that we would be requiring all
>> >> guests running on the machine (secure guests or not, since we would use
>> >> the same configuration for all guests) to support it. But
>> >> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For
>> >> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know about
>> >> it and wouldn't be able to use the device.
>> >
>> > OK and your target is to enable use with kernel drivers within
>> > guests, right?
>>
>> Right.
>>
>> > My question is, we are defining a new flag here, I guess old guests
>> > then do not set it. How does it help old guests? Or maybe it's
>> > not designed to ...
>>
>> Indeed. The idea is that QEMU can offer the flag, old guests can reject
>> it (or even new guests can reject it, if they decide not to convert into
>> secure VMs) and the feature negotiation will succeed with the flag
>> unset.
>
> OK. And then what does QEMU do? Assume guest is not encrypted I guess?

There's nothing different that QEMU needs to do, with or without the
flag. the perspective of the host, a secure guest and a regular guest
work the same way with respect to virtio.

--
Thiago Jung Bauermann
IBM Linux Technology Center
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-15 Thread Michael S. Tsirkin
On Mon, Jul 15, 2019 at 05:29:06PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote:
> >>
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > So this is what I would call this option:
> >> >
> >> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS
> >> >
> >> > and the explanation should state that all device
> >> > addresses are translated by the platform to identical
> >> > addresses.
> >> >
> >> > In fact this option then becomes more, not less restrictive
> >> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise
> >> > by guest to only create identity mappings,
> >> > and only before driver_ok is set.
> >> > This option then would always be negotiated together with
> >> > VIRTIO_F_ACCESS_PLATFORM.
> >> >
> >> > Host then must verify that
> >> > 1. full 1:1 mappings are created before driver_ok
> >> > or can we make sure this happens before features_ok?
> >> > that would be ideal as we could require that features_ok fails
> >> > 2. mappings are not modified between driver_ok and reset
> >> > i guess attempts to change them will fail -
> >> > possibly by causing a guest crash
> >> > or some other kind of platform-specific error
> >>
> >> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring
> >> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is
> >> SLOF as I mentioned above, another is that we would be requiring all
> >> guests running on the machine (secure guests or not, since we would use
> >> the same configuration for all guests) to support it. But
> >> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For
> >> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know about
> >> it and wouldn't be able to use the device.
> >
> > OK and your target is to enable use with kernel drivers within
> > guests, right?
> 
> Right.
> 
> > My question is, we are defining a new flag here, I guess old guests
> > then do not set it. How does it help old guests? Or maybe it's
> > not designed to ...
> 
> Indeed. The idea is that QEMU can offer the flag, old guests can reject
> it (or even new guests can reject it, if they decide not to convert into
> secure VMs) and the feature negotiation will succeed with the flag
> unset.

OK. And then what does QEMU do? Assume guest is not encrypted I guess?

> --
> Thiago Jung Bauermann
> IBM Linux Technology Center
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-15 Thread Thiago Jung Bauermann


Michael S. Tsirkin  writes:

> On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote:
>>
>>
>> Michael S. Tsirkin  writes:
>>
>> > So this is what I would call this option:
>> >
>> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS
>> >
>> > and the explanation should state that all device
>> > addresses are translated by the platform to identical
>> > addresses.
>> >
>> > In fact this option then becomes more, not less restrictive
>> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise
>> > by guest to only create identity mappings,
>> > and only before driver_ok is set.
>> > This option then would always be negotiated together with
>> > VIRTIO_F_ACCESS_PLATFORM.
>> >
>> > Host then must verify that
>> > 1. full 1:1 mappings are created before driver_ok
>> > or can we make sure this happens before features_ok?
>> > that would be ideal as we could require that features_ok fails
>> > 2. mappings are not modified between driver_ok and reset
>> > i guess attempts to change them will fail -
>> > possibly by causing a guest crash
>> > or some other kind of platform-specific error
>>
>> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring
>> it to be accompanied by ACCESS_PLATFORM can be a problem. One reason is
>> SLOF as I mentioned above, another is that we would be requiring all
>> guests running on the machine (secure guests or not, since we would use
>> the same configuration for all guests) to support it. But
>> ACCESS_PLATFORM is relatively recent so it's a bit early for that. For
>> instance, Ubuntu 16.04 LTS (which is still supported) doesn't know about
>> it and wouldn't be able to use the device.
>
> OK and your target is to enable use with kernel drivers within
> guests, right?

Right.

> My question is, we are defining a new flag here, I guess old guests
> then do not set it. How does it help old guests? Or maybe it's
> not designed to ...

Indeed. The idea is that QEMU can offer the flag, old guests can reject
it (or even new guests can reject it, if they decide not to convert into
secure VMs) and the feature negotiation will succeed with the flag
unset.

--
Thiago Jung Bauermann
IBM Linux Technology Center

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Nadav Amit via Virtualization
> On Jul 15, 2019, at 8:16 AM, Andrew Cooper  wrote:
> 
> There is a lot of infrastructure for functionality which is used
> exclusively in __{save,restore}_processor_state() on the suspend/resume
> path.
> 
> cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by
> lapic_{suspend,resume}().  Saving and restoring cr8 independently of the
> rest of the Local APIC state isn't a clever thing to be doing.
> 
> Delete the suspend/resume cr8 handling, which shrinks the size of struct
> saved_context, and allows for the removal of both PVOPS.

I think removing the interface for CR8 writes is also good to avoid
potential correctness issues, as the SDM says (10.8.6.1 "Interaction of Task
Priorities between CR8 and APIC”):

"Operating software should implement either direct APIC TPR updates or CR8
style TPR updates but not mix them. Software can use a serializing
instruction (for example, CPUID) to serialize updates between MOV CR8 and
stores to the APIC.”

And native_write_cr8() did not even issue a serializing instruction.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock

2019-07-15 Thread Michael S. Tsirkin
On Mon, Jul 15, 2019 at 09:44:16AM +0200, Stefano Garzarella wrote:
> On Fri, Jul 12, 2019 at 06:14:39PM +0800, Jason Wang wrote:
> > 
> > On 2019/7/12 下午6:00, Stefano Garzarella wrote:
> > > On Thu, Jul 11, 2019 at 03:52:21PM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Jul 11, 2019 at 01:41:34PM +0200, Stefano Garzarella wrote:
> > > > > On Thu, Jul 11, 2019 at 03:37:00PM +0800, Jason Wang wrote:
> > > > > > On 2019/7/10 下午11:37, Stefano Garzarella wrote:
> > > > > > > Hi,
> > > > > > > as Jason suggested some months ago, I looked better at the 
> > > > > > > virtio-net driver to
> > > > > > > understand if we can reuse some parts also in the virtio-vsock 
> > > > > > > driver, since we
> > > > > > > have similar challenges (mergeable buffers, page allocation, small
> > > > > > > packets, etc.).
> > > > > > > 
> > > > > > > Initially, I would add the skbuff in the virtio-vsock in order to 
> > > > > > > re-use
> > > > > > > receive_*() functions.
> > > > > > 
> > > > > > Yes, that will be a good step.
> > > > > > 
> > > > > Okay, I'll go on this way.
> > > > > 
> > > > > > > Then I would move receive_[small, big, mergeable]() and
> > > > > > > add_recvbuf_[small, big, mergeable]() outside of virtio-net 
> > > > > > > driver, in order to
> > > > > > > call them also from virtio-vsock. I need to do some refactoring 
> > > > > > > (e.g. leave the
> > > > > > > XDP part on the virtio-net driver), but I think it is feasible.
> > > > > > > 
> > > > > > > The idea is to create a virtio-skb.[h,c] where put these 
> > > > > > > functions and a new
> > > > > > > object where stores some attributes needed (e.g. hdr_len ) and 
> > > > > > > status (e.g.
> > > > > > > some fields of struct receive_queue).
> > > > > > 
> > > > > > My understanding is we could be more ambitious here. Do you see any 
> > > > > > blocker
> > > > > > for reusing virtio-net directly? It's better to reuse not only the 
> > > > > > functions
> > > > > > but also the logic like NAPI to avoid re-inventing something buggy 
> > > > > > and
> > > > > > duplicated.
> > > > > > 
> > > > > These are my concerns:
> > > > > - virtio-vsock is not a "net_device", so a lot of code related to
> > > > >ethtool, net devices (MAC address, MTU, speed, VLAN, XDP, 
> > > > > offloading) will be
> > > > >not used by virtio-vsock.
> > 
> > 
> > Linux support device other than ethernet, so it should not be a problem.
> > 
> > 
> > > > > 
> > > > > - virtio-vsock has a different header. We can consider it as part of
> > > > >virtio_net payload, but it precludes the compatibility with old 
> > > > > hosts. This
> > > > >was one of the major doubts that made me think about using only the
> > > > >send/recv skbuff functions, that it shouldn't break the 
> > > > > compatibility.
> > 
> > 
> > We can extend the current vnet header helper for it to work for vsock.
> 
> Okay, I'll do it.
> 
> > 
> > 
> > > > > 
> > > > > > > This is an idea of virtio-skb.h that
> > > > > > > I have in mind:
> > > > > > >   struct virtskb;
> > > > > > 
> > > > > > What fields do you want to store in virtskb? It looks to be exist 
> > > > > > sk_buff is
> > > > > > flexible enough to us?
> > > > > My idea is to store queues information, like struct receive_queue or
> > > > > struct send_queue, and some device attributes (e.g. hdr_len ).
> > 
> > 
> > If you reuse skb or virtnet_info, there is not necessary.
> > 
> 
> Okay.
> 
> > 
> > > > > 
> > > > > > 
> > > > > > >   struct sk_buff *virtskb_receive_small(struct virtskb *vs, 
> > > > > > > ...);
> > > > > > >   struct sk_buff *virtskb_receive_big(struct virtskb *vs, 
> > > > > > > ...);
> > > > > > >   struct sk_buff *virtskb_receive_mergeable(struct virtskb 
> > > > > > > *vs, ...);
> > > > > > > 
> > > > > > >   int virtskb_add_recvbuf_small(struct virtskb*vs, ...);
> > > > > > >   int virtskb_add_recvbuf_big(struct virtskb *vs, ...);
> > > > > > >   int virtskb_add_recvbuf_mergeable(struct virtskb *vs, ...);
> > > > > > > 
> > > > > > > For the Guest->Host path it should be easier, so maybe I can add a
> > > > > > > "virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a 
> > > > > > > part of the code
> > > > > > > of xmit_skb().
> > > > > > 
> > > > > > I may miss something, but I don't see any thing that prevents us 
> > > > > > from using
> > > > > > xmit_skb() directly.
> > > > > > 
> > > > > Yes, but my initial idea was to make it more parametric and not 
> > > > > related to the
> > > > > virtio_net_hdr, so the 'hdr_len' could be a parameter and the
> > > > > 'num_buffers' should be handled by the caller.
> > > > > 
> > > > > > > Let me know if you have in mind better names or if I should put 
> > > > > > > these function
> > > > > > > in another place.
> > > > > > > 
> > > > > > > I would like to leave the control part completely separate, so, 
> > > > > > > for example,
> > > > > > > the two drivers will negotiate the features independently and 
> > > > > > > they will

Re: [PATCH 0/2] Remove 32-bit Xen PV guest support

2019-07-15 Thread Juergen Gross

On 15.07.19 19:28, Andy Lutomirski wrote:

On Mon, Jul 15, 2019 at 9:34 AM Andi Kleen  wrote:


Juergen Gross  writes:


The long term plan has been to replace Xen PV guests by PVH. The first
victim of that plan are now 32-bit PV guests, as those are used only
rather seldom these days. Xen on x86 requires 64-bit support and with
Grub2 now supporting PVH officially since version 2.04 there is no
need to keep 32-bit PV guest support alive in the Linux kernel.
Additionally Meltdown mitigation is not available in the kernel running
as 32-bit PV guest, so dropping this mode makes sense from security
point of view, too.


Normally we have a deprecation period for feature removals like this.
You would make the kernel print a warning for some releases, and when
no user complains you can then remove. If a user complains you can't.



As I understand it, the kernel rules do allow changes like this even
if there's a complaint: this is a patch that removes what is
effectively hardware support.  If the maintenance cost exceeds the
value, then removal is fair game.  (Obviously we weight the value to
preserving compatibility quite highly, but in this case, Xen dropped
32-bit hardware support a long time ago.  If the Xen hypervisor says
that 32-bit PV guest support is deprecated, it's deprecated.)

That being said, a warning might not be a bad idea.  What's the
current status of this in upstream Xen?


Xen still supports that.

We have asked downstream for their opinion about dropping 32-bit PV
guest support in the kernel about 1 year ago and the common answer was:
no problem, but for users still wanting 32 bit guests we should wait
until PVH support is available in all related products. Grub2 was the
last one missing and as grub2 has released a version with PVH support
I posted this small series now.


Juergen
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] [PATCH 0/2] Remove 32-bit Xen PV guest support

2019-07-15 Thread Andrew Cooper
On 15/07/2019 18:28, Andy Lutomirski wrote:
> On Mon, Jul 15, 2019 at 9:34 AM Andi Kleen  wrote:
>> Juergen Gross  writes:
>>
>>> The long term plan has been to replace Xen PV guests by PVH. The first
>>> victim of that plan are now 32-bit PV guests, as those are used only
>>> rather seldom these days. Xen on x86 requires 64-bit support and with
>>> Grub2 now supporting PVH officially since version 2.04 there is no
>>> need to keep 32-bit PV guest support alive in the Linux kernel.
>>> Additionally Meltdown mitigation is not available in the kernel running
>>> as 32-bit PV guest, so dropping this mode makes sense from security
>>> point of view, too.
>> Normally we have a deprecation period for feature removals like this.
>> You would make the kernel print a warning for some releases, and when
>> no user complains you can then remove. If a user complains you can't.
>>
> As I understand it, the kernel rules do allow changes like this even
> if there's a complaint: this is a patch that removes what is
> effectively hardware support.  If the maintenance cost exceeds the
> value, then removal is fair game.  (Obviously we weight the value to
> preserving compatibility quite highly, but in this case, Xen dropped
> 32-bit hardware support a long time ago.  If the Xen hypervisor says
> that 32-bit PV guest support is deprecated, it's deprecated.)
>
> That being said, a warning might not be a bad idea.  What's the
> current status of this in upstream Xen?

So personally, I'd prefer to see support stay, but at the end of the day
it is Juergen's choice as the maintainer of the code.

Xen itself has been exclusively 64-bit since Xen 4.3 (released in 2013).

Over time, various features like SMEP/SMAP have been making 32bit PV
guests progressively slower, because ring 1 is supervisor rather than
user.  Things have got even worse with IBRS, to the point at which 32bit
PV guests are starting to run like treacle.

There are no current plans to remove support for 32bit PV guests from
Xen, but it is very much in the category of "you shouldn't be using this
mode any more".

~Andrew

P.S. I don't see 64bit PV guest support going anywhere, because there
are still a number of open performance questions due to the inherent
differences between syscall and vmexit, and the difference EPT/NPT
tables make on cross-domain mappings.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 0/2] Remove 32-bit Xen PV guest support

2019-07-15 Thread Andy Lutomirski
On Mon, Jul 15, 2019 at 9:34 AM Andi Kleen  wrote:
>
> Juergen Gross  writes:
>
> > The long term plan has been to replace Xen PV guests by PVH. The first
> > victim of that plan are now 32-bit PV guests, as those are used only
> > rather seldom these days. Xen on x86 requires 64-bit support and with
> > Grub2 now supporting PVH officially since version 2.04 there is no
> > need to keep 32-bit PV guest support alive in the Linux kernel.
> > Additionally Meltdown mitigation is not available in the kernel running
> > as 32-bit PV guest, so dropping this mode makes sense from security
> > point of view, too.
>
> Normally we have a deprecation period for feature removals like this.
> You would make the kernel print a warning for some releases, and when
> no user complains you can then remove. If a user complains you can't.
>

As I understand it, the kernel rules do allow changes like this even
if there's a complaint: this is a patch that removes what is
effectively hardware support.  If the maintenance cost exceeds the
value, then removal is fair game.  (Obviously we weight the value to
preserving compatibility quite highly, but in this case, Xen dropped
32-bit hardware support a long time ago.  If the Xen hypervisor says
that 32-bit PV guest support is deprecated, it's deprecated.)

That being said, a warning might not be a bad idea.  What's the
current status of this in upstream Xen?
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] Remove 32-bit Xen PV guest support

2019-07-15 Thread Andi Kleen
Juergen Gross  writes:

> The long term plan has been to replace Xen PV guests by PVH. The first
> victim of that plan are now 32-bit PV guests, as those are used only
> rather seldom these days. Xen on x86 requires 64-bit support and with
> Grub2 now supporting PVH officially since version 2.04 there is no
> need to keep 32-bit PV guest support alive in the Linux kernel.
> Additionally Meltdown mitigation is not available in the kernel running
> as 32-bit PV guest, so dropping this mode makes sense from security
> point of view, too.

Normally we have a deprecation period for feature removals like this.
You would make the kernel print a warning for some releases, and when
no user complains you can then remove. If a user complains you can't.

-Andi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1 31/33] drm/bochs: drop use of drmP.h

2019-07-15 Thread Sam Ravnborg
On Mon, Jul 01, 2019 at 08:39:08AM +0200, Gerd Hoffmann wrote:
> On Sun, Jun 30, 2019 at 08:19:20AM +0200, Sam Ravnborg wrote:
> > Drop use of the deprecated drmP.h header file.
> > Made bochs.h self-contained and then fixed
> > fallout in remaining files.
> > Several unused includes was dropped in the process.
> > 
> > Signed-off-by: Sam Ravnborg 
> > Cc: Gerd Hoffmann 
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Cc: virtualization@lists.linux-foundation.org
> 
> Acked-by: Gerd Hoffmann 

Thanks again, applied.

Sam
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1 27/33] drm/virtgpu: drop use of drmP.h

2019-07-15 Thread Sam Ravnborg
On Mon, Jul 01, 2019 at 08:38:56AM +0200, Gerd Hoffmann wrote:
> On Sun, Jun 30, 2019 at 08:19:16AM +0200, Sam Ravnborg wrote:
> > Drop use of the deprecated drmP.h header file.
> > Fix fallout by adding missing include files.
> > 
> > Signed-off-by: Sam Ravnborg 
> > Cc: David Airlie 
> > Cc: Gerd Hoffmann 
> > Cc: Daniel Vetter 
> > Cc: virtualization@lists.linux-foundation.org
> 
> Acked-by: Gerd Hoffmann 

Thanks, applied.

Sam
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1 09/33] drm/qxl: drop use of drmP.h

2019-07-15 Thread Sam Ravnborg
On Mon, Jul 01, 2019 at 08:38:43AM +0200, Gerd Hoffmann wrote:
> On Sun, Jun 30, 2019 at 08:18:58AM +0200, Sam Ravnborg wrote:
> > Drop use of the deprecated drmP.h header file.
> > While touching the files divided includes in blocks,
> > and when needed sort the blocks.
> > Fix fallout.
> > 
> > Signed-off-by: Sam Ravnborg 
> > Cc: Dave Airlie 
> > Cc: Gerd Hoffmann 
> > Cc: virtualization@lists.linux-foundation.org
> > Cc: spice-de...@lists.freedesktop.org
> 
> Acked-by: Gerd Hoffmann 

Thanks, applied.

Sam
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Andrew Cooper
There is a lot of infrastructure for functionality which is used
exclusively in __{save,restore}_processor_state() on the suspend/resume
path.

cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by
lapic_{suspend,resume}().  Saving and restoring cr8 independently of the
rest of the Local APIC state isn't a clever thing to be doing.

Delete the suspend/resume cr8 handling, which shrinks the size of struct
saved_context, and allows for the removal of both PVOPS.

Signed-off-by: Andrew Cooper 
---
CC: x...@kernel.org
CC: virtualization@lists.linux-foundation.org
CC: Borislav Petkov 
CC: Peter Zijlstra 
CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Stephane Eranian 
CC: Feng Tang 
CC: Juergen Gross 
CC: Boris Ostrovsky 
CC: "Rafael J. Wysocki" 
CC: Pavel Machek 

Spotted while reviewing "x86/apic: Initialize TPR to block interrupts 16-31"

https://lore.kernel.org/lkml/dc04a9f8b234d7b0956a8d2560b8945bcd9c4bf7.1563117760.git.l...@kernel.org/

v2:
 * Drop saved_context.cr8 as well (Juergen)
 * Remove akata...@vmware.com from the CC list due to bounces
---
 arch/x86/include/asm/paravirt.h   | 12 
 arch/x86/include/asm/paravirt_types.h |  5 -
 arch/x86/include/asm/special_insns.h  | 24 
 arch/x86/include/asm/suspend_64.h |  2 +-
 arch/x86/kernel/asm-offsets_64.c  |  1 -
 arch/x86/kernel/paravirt.c|  4 
 arch/x86/power/cpu.c  |  4 
 arch/x86/xen/enlighten_pv.c   | 15 ---
 8 files changed, 1 insertion(+), 66 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c25c38a05c1c..0e4a0539c353 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -139,18 +139,6 @@ static inline void __write_cr4(unsigned long x)
PVOP_VCALL1(cpu.write_cr4, x);
 }
 
-#ifdef CONFIG_X86_64
-static inline unsigned long read_cr8(void)
-{
-   return PVOP_CALL0(unsigned long, cpu.read_cr8);
-}
-
-static inline void write_cr8(unsigned long x)
-{
-   PVOP_VCALL1(cpu.write_cr8, x);
-}
-#endif
-
 static inline void arch_safe_halt(void)
 {
PVOP_VCALL0(irq.safe_halt);
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 946f8f1f1efc..3c775fb5524b 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -119,11 +119,6 @@ struct pv_cpu_ops {
 
void (*write_cr4)(unsigned long);
 
-#ifdef CONFIG_X86_64
-   unsigned long (*read_cr8)(void);
-   void (*write_cr8)(unsigned long);
-#endif
-
/* Segment descriptor handling */
void (*load_tr_desc)(void);
void (*load_gdt)(const struct desc_ptr *);
diff --git a/arch/x86/include/asm/special_insns.h 
b/arch/x86/include/asm/special_insns.h
index 219be88a59d2..6d37b8fcfc77 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -73,20 +73,6 @@ static inline unsigned long native_read_cr4(void)
 
 void native_write_cr4(unsigned long val);
 
-#ifdef CONFIG_X86_64
-static inline unsigned long native_read_cr8(void)
-{
-   unsigned long cr8;
-   asm volatile("movq %%cr8,%0" : "=r" (cr8));
-   return cr8;
-}
-
-static inline void native_write_cr8(unsigned long val)
-{
-   asm volatile("movq %0,%%cr8" :: "r" (val) : "memory");
-}
-#endif
-
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 static inline u32 rdpkru(void)
 {
@@ -200,16 +186,6 @@ static inline void wbinvd(void)
 
 #ifdef CONFIG_X86_64
 
-static inline unsigned long read_cr8(void)
-{
-   return native_read_cr8();
-}
-
-static inline void write_cr8(unsigned long x)
-{
-   native_write_cr8(x);
-}
-
 static inline void load_gs_index(unsigned selector)
 {
native_load_gs_index(selector);
diff --git a/arch/x86/include/asm/suspend_64.h 
b/arch/x86/include/asm/suspend_64.h
index a7af9f53c0cb..35bb35d28733 100644
--- a/arch/x86/include/asm/suspend_64.h
+++ b/arch/x86/include/asm/suspend_64.h
@@ -34,7 +34,7 @@ struct saved_context {
 */
unsigned long kernelmode_gs_base, usermode_gs_base, fs_base;
 
-   unsigned long cr0, cr2, cr3, cr4, cr8;
+   unsigned long cr0, cr2, cr3, cr4;
u64 misc_enable;
bool misc_enable_saved;
struct saved_msrs saved_msrs;
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index d3d075226c0a..8b54d8e3a561 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -62,7 +62,6 @@ int main(void)
ENTRY(cr2);
ENTRY(cr3);
ENTRY(cr4);
-   ENTRY(cr8);
ENTRY(gdt_desc);
BLANK();
 #undef ENTRY
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 98039d7fb998..de4d4e8a54c1 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -311,10 +311,6 @@ struct paravirt_patch_template pv_ops = {
.cpu.read_cr0   = native_read_cr0,
.cpu.write_cr0   

Re: [PATCH] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Andrew Cooper
On 15/07/2019 15:52, Juergen Gross wrote:
> On 15.07.19 16:26, Andy Lutomirski wrote:
>> On Mon, Jul 15, 2019 at 6:23 AM Juergen Gross  wrote:
>>>
>>> On 15.07.19 15:00, Andrew Cooper wrote:
 There is a lot of infrastructure for functionality which is used
 exclusively in __{save,restore}_processor_state() on the
 suspend/resume
 path.

 cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored
 independently by lapic_{suspend,resume}().
>>>
>>> Aren't those called only with CONFIG_PM set?
>>>
>>
>>
>> Unless I'm missing something, we only build any of the restore code
>> (including the write_cr8() call) if CONFIG_PM_SLEEP is set, and that
>> selects CONFIG_PM, so we should be fine, I think.
>>
>
> Okay, in that case I'd suggest to remove "cr8" from struct saved_context
> as it won't be used any longer.

To be honest, saving and restoring of cr8 without the rest of the Local
APIC state is conceptually broken to begin with.

If there are any bugs revealed by this, then the correct fixes are
elsewhere.

You're right about the saved_context.  I'll submit a v2 with an even
larger negative diffstat.

~Andrew
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Juergen Gross

On 15.07.19 16:26, Andy Lutomirski wrote:

On Mon, Jul 15, 2019 at 6:23 AM Juergen Gross  wrote:


On 15.07.19 15:00, Andrew Cooper wrote:

There is a lot of infrastructure for functionality which is used
exclusively in __{save,restore}_processor_state() on the suspend/resume
path.

cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored
independently by lapic_{suspend,resume}().


Aren't those called only with CONFIG_PM set?




Unless I'm missing something, we only build any of the restore code
(including the write_cr8() call) if CONFIG_PM_SLEEP is set, and that
selects CONFIG_PM, so we should be fine, I think.



Okay, in that case I'd suggest to remove "cr8" from struct saved_context
as it won't be used any longer.


Juergen
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH AUTOSEL 4.4 33/53] vhost_net: disable zerocopy by default

2019-07-15 Thread Sasha Levin
From: Jason Wang 

[ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ]

Vhost_net was known to suffer from HOL[1] issues which is not easy to
fix. Several downstream disable the feature by default. What's more,
the datapath was split and datacopy path got the support of batching
and XDP support recently which makes it faster than zerocopy part for
small packets transmission.

It looks to me that disable zerocopy by default is more
appropriate. It cold be enabled by default again in the future if we
fix the above issues.

[1] https://patchwork.kernel.org/patch/3787671/

Signed-off-by: Jason Wang 
Acked-by: Michael S. Tsirkin 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/vhost/net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 645b2197930e..f46317135224 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -30,7 +30,7 @@
 
 #include "vhost.h"
 
-static int experimental_zcopytx = 1;
+static int experimental_zcopytx = 0;
 module_param(experimental_zcopytx, int, 0444);
 MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
   " 1 -Enable; 0 - Disable");
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH AUTOSEL 4.9 41/73] vhost_net: disable zerocopy by default

2019-07-15 Thread Sasha Levin
From: Jason Wang 

[ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ]

Vhost_net was known to suffer from HOL[1] issues which is not easy to
fix. Several downstream disable the feature by default. What's more,
the datapath was split and datacopy path got the support of batching
and XDP support recently which makes it faster than zerocopy part for
small packets transmission.

It looks to me that disable zerocopy by default is more
appropriate. It cold be enabled by default again in the future if we
fix the above issues.

[1] https://patchwork.kernel.org/patch/3787671/

Signed-off-by: Jason Wang 
Acked-by: Michael S. Tsirkin 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/vhost/net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 681d0eade82f..75e1089dfb01 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -30,7 +30,7 @@
 
 #include "vhost.h"
 
-static int experimental_zcopytx = 1;
+static int experimental_zcopytx = 0;
 module_param(experimental_zcopytx, int, 0444);
 MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
   " 1 -Enable; 0 - Disable");
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-07-15 Thread Michael S. Tsirkin
On Sun, Jul 14, 2019 at 02:51:18AM -0300, Thiago Jung Bauermann wrote:
> 
> 
> Michael S. Tsirkin  writes:
> 
> > On Thu, Jun 27, 2019 at 10:58:40PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Mon, Jun 03, 2019 at 10:13:59PM -0300, Thiago Jung Bauermann wrote:
> >> >>
> >> >>
> >> >> Michael S. Tsirkin  writes:
> >> >>
> >> >> > On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> >> >> >> I rephrased it in terms of address translation. What do you think of
> >> >> >> this version? The flag name is slightly different too:
> >> >> >>
> >> >> >>
> >> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
> >> >> >> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not 
> >> >> >> set,
> >> >> >> with the exception that address translation is guaranteed to be
> >> >> >> unnecessary when accessing memory addresses supplied to the 
> >> >> >> device
> >> >> >> by the driver. Which is to say, the device will always use 
> >> >> >> physical
> >> >> >> addresses matching addresses used by the driver (typically 
> >> >> >> meaning
> >> >> >> physical addresses used by the CPU) and not translated further. 
> >> >> >> This
> >> >> >> flag should be set by the guest if offered, but to allow for
> >> >> >> backward-compatibility device implementations allow for it to be
> >> >> >> left unset by the guest. It is an error to set both this flag and
> >> >> >> VIRTIO_F_ACCESS_PLATFORM.
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > OK so VIRTIO_F_ACCESS_PLATFORM is designed to allow unpriveledged
> >> >> > drivers. This is why devices fail when it's not negotiated.
> >> >>
> >> >> Just to clarify, what do you mean by unprivileged drivers? Is it drivers
> >> >> implemented in guest userspace such as with VFIO? Or unprivileged in
> >> >> some other sense such as needing to use bounce buffers for some reason?
> >> >
> >> > I had drivers in guest userspace in mind.
> >>
> >> Great. Thanks for clarifying.
> >>
> >> I don't think this flag would work for guest userspace drivers. Should I
> >> add a note about that in the flag definition?
> >
> > I think you need to clarify access protection rules. Is it only
> > translation that is bypassed or is any platform-specific
> > protection mechanism bypassed too?
> 
> It is only translation. In a secure guest, if the device tries to access
> a memory address that wasn't provided by the driver then the
> architecture will deny that access. If the device accesses addresses
> provided to it by the driver, then there's no protection mechanism or
> translation to get in the way.
> 
> >> >> > This confuses me.
> >> >> > If driver is unpriveledged then what happens with this flag?
> >> >> > It can supply any address it wants. Will that corrupt kernel
> >> >> > memory?
> >> >>
> >> >> Not needing address translation doesn't necessarily mean that there's no
> >> >> IOMMU. On powerpc we don't use VIRTIO_F_ACCESS_PLATFORM but there's
> >> >> always an IOMMU present. And we also support VFIO drivers. The VFIO API
> >> >> for pseries (sPAPR section in Documentation/vfio.txt) has extra ioctls
> >> >> to program the IOMMU.
> >> >>
> >> >> For our use case, we don't need address translation because we set up an
> >> >> identity mapping in the IOMMU so that the device can use guest physical
> >> >> addresses.
> >
> > OK so I think I am beginning to see it in a different light.  Right now the 
> > specific
> > platform creates an identity mapping. That in turn means DMA API can be
> > fast - it does not need to do anything.  What you are looking for is a
> > way to tell host it's an identity mapping - just as an optimization.
> >
> > Is that right?
> 
> Almost. Theoretically it is just an optimization. But in practice the
> pseries boot firmware (SLOF) doesn't support IOMMU_PLATFORM so it's not
> possible to boot a guest from a device with that flag set.
> 
> > So this is what I would call this option:
> >
> > VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS
> >
> > and the explanation should state that all device
> > addresses are translated by the platform to identical
> > addresses.
> >
> > In fact this option then becomes more, not less restrictive
> > than VIRTIO_F_ACCESS_PLATFORM - it's a promise
> > by guest to only create identity mappings,
> > and only before driver_ok is set.
> > This option then would always be negotiated together with
> > VIRTIO_F_ACCESS_PLATFORM.
> >
> > Host then must verify that
> > 1. full 1:1 mappings are created before driver_ok
> > or can we make sure this happens before features_ok?
> > that would be ideal as we could require that features_ok fails
> > 2. mappings are not modified between driver_ok and reset
> > i guess attempts to change them will fail -
> > possibly by causing a guest crash
> > or some other kind of platform-specific error
> 
> I think VIRTIO_F_ACCESS_PLATFORM_IDENTITY_ADDRESS is good, but requiring
> it

[PATCH AUTOSEL 4.14 056/105] vhost_net: disable zerocopy by default

2019-07-15 Thread Sasha Levin
From: Jason Wang 

[ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ]

Vhost_net was known to suffer from HOL[1] issues which is not easy to
fix. Several downstream disable the feature by default. What's more,
the datapath was split and datacopy path got the support of batching
and XDP support recently which makes it faster than zerocopy part for
small packets transmission.

It looks to me that disable zerocopy by default is more
appropriate. It cold be enabled by default again in the future if we
fix the above issues.

[1] https://patchwork.kernel.org/patch/3787671/

Signed-off-by: Jason Wang 
Acked-by: Michael S. Tsirkin 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/vhost/net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index b40e8ded49c6..4d11152e60c1 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -35,7 +35,7 @@
 
 #include "vhost.h"
 
-static int experimental_zcopytx = 1;
+static int experimental_zcopytx = 0;
 module_param(experimental_zcopytx, int, 0444);
 MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
   " 1 -Enable; 0 - Disable");
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Andy Lutomirski
On Mon, Jul 15, 2019 at 6:23 AM Juergen Gross  wrote:
>
> On 15.07.19 15:00, Andrew Cooper wrote:
> > There is a lot of infrastructure for functionality which is used
> > exclusively in __{save,restore}_processor_state() on the suspend/resume
> > path.
> >
> > cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored
> > independently by lapic_{suspend,resume}().
>
> Aren't those called only with CONFIG_PM set?
>


Unless I'm missing something, we only build any of the restore code
(including the write_cr8() call) if CONFIG_PM_SLEEP is set, and that
selects CONFIG_PM, so we should be fine, I think.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH AUTOSEL 4.19 079/158] vhost_net: disable zerocopy by default

2019-07-15 Thread Sasha Levin
From: Jason Wang 

[ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ]

Vhost_net was known to suffer from HOL[1] issues which is not easy to
fix. Several downstream disable the feature by default. What's more,
the datapath was split and datacopy path got the support of batching
and XDP support recently which makes it faster than zerocopy part for
small packets transmission.

It looks to me that disable zerocopy by default is more
appropriate. It cold be enabled by default again in the future if we
fix the above issues.

[1] https://patchwork.kernel.org/patch/3787671/

Signed-off-by: Jason Wang 
Acked-by: Michael S. Tsirkin 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/vhost/net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 39155d7cc894..ae704658b528 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -36,7 +36,7 @@
 
 #include "vhost.h"
 
-static int experimental_zcopytx = 1;
+static int experimental_zcopytx = 0;
 module_param(experimental_zcopytx, int, 0444);
 MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
   " 1 -Enable; 0 - Disable");
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH AUTOSEL 5.1 105/219] vhost_net: disable zerocopy by default

2019-07-15 Thread Sasha Levin
From: Jason Wang 

[ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ]

Vhost_net was known to suffer from HOL[1] issues which is not easy to
fix. Several downstream disable the feature by default. What's more,
the datapath was split and datacopy path got the support of batching
and XDP support recently which makes it faster than zerocopy part for
small packets transmission.

It looks to me that disable zerocopy by default is more
appropriate. It cold be enabled by default again in the future if we
fix the above issues.

[1] https://patchwork.kernel.org/patch/3787671/

Signed-off-by: Jason Wang 
Acked-by: Michael S. Tsirkin 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/vhost/net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index df51a35cf537..8beacbee2553 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -36,7 +36,7 @@
 
 #include "vhost.h"
 
-static int experimental_zcopytx = 1;
+static int experimental_zcopytx = 0;
 module_param(experimental_zcopytx, int, 0444);
 MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
   " 1 -Enable; 0 - Disable");
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH AUTOSEL 5.2 119/249] vhost_net: disable zerocopy by default

2019-07-15 Thread Sasha Levin
From: Jason Wang 

[ Upstream commit 098eadce3c622c07b328d0a43dda379b38cf7c5e ]

Vhost_net was known to suffer from HOL[1] issues which is not easy to
fix. Several downstream disable the feature by default. What's more,
the datapath was split and datacopy path got the support of batching
and XDP support recently which makes it faster than zerocopy part for
small packets transmission.

It looks to me that disable zerocopy by default is more
appropriate. It cold be enabled by default again in the future if we
fix the above issues.

[1] https://patchwork.kernel.org/patch/3787671/

Signed-off-by: Jason Wang 
Acked-by: Michael S. Tsirkin 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/vhost/net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index d57ebdd616d9..247e5585af5d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -35,7 +35,7 @@
 
 #include "vhost.h"
 
-static int experimental_zcopytx = 1;
+static int experimental_zcopytx = 0;
 module_param(experimental_zcopytx, int, 0444);
 MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
   " 1 -Enable; 0 - Disable");
-- 
2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Juergen Gross

On 15.07.19 15:00, Andrew Cooper wrote:

There is a lot of infrastructure for functionality which is used
exclusively in __{save,restore}_processor_state() on the suspend/resume
path.

cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored
independently by lapic_{suspend,resume}().


Aren't those called only with CONFIG_PM set?


Juergen
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Andrew Cooper
There is a lot of infrastructure for functionality which is used
exclusively in __{save,restore}_processor_state() on the suspend/resume
path.

cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored
independently by lapic_{suspend,resume}().

Delete the saving and restoration of cr8, which allows for the removal
of both PVOPS.

Signed-off-by: Andrew Cooper 
---
CC: x...@kernel.org
CC: virtualization@lists.linux-foundation.org
CC: Borislav Petkov 
CC: Peter Zijlstra 
CC: Andy Lutomirski 
CC: Nadav Amit 
CC: Stephane Eranian 
CC: Feng Tang 
CC: Juergen Gross 
CC: Boris Ostrovsky 
CC: Alok Kataria 
CC: "Rafael J. Wysocki" 
CC: Pavel Machek 

Spotted while reviewing "x86/apic: Initialize TPR to block interrupts 16-31"

https://lore.kernel.org/lkml/dc04a9f8b234d7b0956a8d2560b8945bcd9c4bf7.1563117760.git.l...@kernel.org/
---
 arch/x86/include/asm/paravirt.h   | 12 
 arch/x86/include/asm/paravirt_types.h |  5 -
 arch/x86/include/asm/special_insns.h  | 24 
 arch/x86/kernel/paravirt.c|  4 
 arch/x86/power/cpu.c  |  4 
 arch/x86/xen/enlighten_pv.c   | 15 ---
 6 files changed, 64 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c25c38a05c1c..0e4a0539c353 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -139,18 +139,6 @@ static inline void __write_cr4(unsigned long x)
PVOP_VCALL1(cpu.write_cr4, x);
 }
 
-#ifdef CONFIG_X86_64
-static inline unsigned long read_cr8(void)
-{
-   return PVOP_CALL0(unsigned long, cpu.read_cr8);
-}
-
-static inline void write_cr8(unsigned long x)
-{
-   PVOP_VCALL1(cpu.write_cr8, x);
-}
-#endif
-
 static inline void arch_safe_halt(void)
 {
PVOP_VCALL0(irq.safe_halt);
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 946f8f1f1efc..3c775fb5524b 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -119,11 +119,6 @@ struct pv_cpu_ops {
 
void (*write_cr4)(unsigned long);
 
-#ifdef CONFIG_X86_64
-   unsigned long (*read_cr8)(void);
-   void (*write_cr8)(unsigned long);
-#endif
-
/* Segment descriptor handling */
void (*load_tr_desc)(void);
void (*load_gdt)(const struct desc_ptr *);
diff --git a/arch/x86/include/asm/special_insns.h 
b/arch/x86/include/asm/special_insns.h
index 219be88a59d2..6d37b8fcfc77 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -73,20 +73,6 @@ static inline unsigned long native_read_cr4(void)
 
 void native_write_cr4(unsigned long val);
 
-#ifdef CONFIG_X86_64
-static inline unsigned long native_read_cr8(void)
-{
-   unsigned long cr8;
-   asm volatile("movq %%cr8,%0" : "=r" (cr8));
-   return cr8;
-}
-
-static inline void native_write_cr8(unsigned long val)
-{
-   asm volatile("movq %0,%%cr8" :: "r" (val) : "memory");
-}
-#endif
-
 #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 static inline u32 rdpkru(void)
 {
@@ -200,16 +186,6 @@ static inline void wbinvd(void)
 
 #ifdef CONFIG_X86_64
 
-static inline unsigned long read_cr8(void)
-{
-   return native_read_cr8();
-}
-
-static inline void write_cr8(unsigned long x)
-{
-   native_write_cr8(x);
-}
-
 static inline void load_gs_index(unsigned selector)
 {
native_load_gs_index(selector);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 98039d7fb998..de4d4e8a54c1 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -311,10 +311,6 @@ struct paravirt_patch_template pv_ops = {
.cpu.read_cr0   = native_read_cr0,
.cpu.write_cr0  = native_write_cr0,
.cpu.write_cr4  = native_write_cr4,
-#ifdef CONFIG_X86_64
-   .cpu.read_cr8   = native_read_cr8,
-   .cpu.write_cr8  = native_write_cr8,
-#endif
.cpu.wbinvd = native_wbinvd,
.cpu.read_msr   = native_read_msr,
.cpu.write_msr  = native_write_msr,
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 24b079e94bc2..1c58d8982728 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -122,9 +122,6 @@ static void __save_processor_state(struct saved_context 
*ctxt)
ctxt->cr2 = read_cr2();
ctxt->cr3 = __read_cr3();
ctxt->cr4 = __read_cr4();
-#ifdef CONFIG_X86_64
-   ctxt->cr8 = read_cr8();
-#endif
ctxt->misc_enable_saved = !rdmsrl_safe(MSR_IA32_MISC_ENABLE,
   &ctxt->misc_enable);
msr_save_context(ctxt);
@@ -207,7 +204,6 @@ static void notrace __restore_processor_state(struct 
saved_context *ctxt)
 #else
 /* CONFIG X86_64 */
wrmsrl(MSR_EFER, ctxt->efer);
-   write_cr8(ctxt->cr8);
__write_cr4(ctxt->cr4);
 #endif
write_cr3(ctxt->cr3);
diff --git a/arch/x86/xen/enlighten

Re: [PATCH 0/2] Remove 32-bit Xen PV guest support

2019-07-15 Thread Juergen Gross

On 15.07.19 14:32, Peter Zijlstra wrote:

On Mon, Jul 15, 2019 at 01:37:37PM +0200, Juergen Gross wrote:

The long term plan has been to replace Xen PV guests by PVH. The first
victim of that plan are now 32-bit PV guests, as those are used only
rather seldom these days. Xen on x86 requires 64-bit support and with
Grub2 now supporting PVH officially since version 2.04 there is no
need to keep 32-bit PV guest support alive in the Linux kernel.
Additionally Meltdown mitigation is not available in the kernel running
as 32-bit PV guest, so dropping this mode makes sense from security
point of view, too.

Juergen Gross (2):
   x86/xen: remove 32-bit Xen PV guest support
   x86/paravirt: remove 32-bit support from PARAVIRT_XXL


Hooray!



Always a pleasure to cheer the community up by sending Xen patches. :-D


Juergen
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/2] Remove 32-bit Xen PV guest support

2019-07-15 Thread Peter Zijlstra
On Mon, Jul 15, 2019 at 01:37:37PM +0200, Juergen Gross wrote:
> The long term plan has been to replace Xen PV guests by PVH. The first
> victim of that plan are now 32-bit PV guests, as those are used only
> rather seldom these days. Xen on x86 requires 64-bit support and with
> Grub2 now supporting PVH officially since version 2.04 there is no
> need to keep 32-bit PV guest support alive in the Linux kernel.
> Additionally Meltdown mitigation is not available in the kernel running
> as 32-bit PV guest, so dropping this mode makes sense from security
> point of view, too.
> 
> Juergen Gross (2):
>   x86/xen: remove 32-bit Xen PV guest support
>   x86/paravirt: remove 32-bit support from PARAVIRT_XXL

Hooray!
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 0/2] Remove 32-bit Xen PV guest support

2019-07-15 Thread Juergen Gross
The long term plan has been to replace Xen PV guests by PVH. The first
victim of that plan are now 32-bit PV guests, as those are used only
rather seldom these days. Xen on x86 requires 64-bit support and with
Grub2 now supporting PVH officially since version 2.04 there is no
need to keep 32-bit PV guest support alive in the Linux kernel.
Additionally Meltdown mitigation is not available in the kernel running
as 32-bit PV guest, so dropping this mode makes sense from security
point of view, too.

Juergen Gross (2):
  x86/xen: remove 32-bit Xen PV guest support
  x86/paravirt: remove 32-bit support from PARAVIRT_XXL

 arch/x86/entry/entry_32.S   |  93 
 arch/x86/entry/vdso/vdso32/vclock_gettime.c |   1 +
 arch/x86/include/asm/paravirt.h | 105 +
 arch/x86/include/asm/paravirt_types.h   |  20 --
 arch/x86/include/asm/pgtable-3level_types.h |   5 -
 arch/x86/include/asm/proto.h|   2 +-
 arch/x86/include/asm/segment.h  |   2 +-
 arch/x86/include/asm/traps.h|   2 +-
 arch/x86/kernel/cpu/common.c|   8 -
 arch/x86/kernel/paravirt.c  |  17 --
 arch/x86/kernel/paravirt_patch_32.c |  36 +--
 arch/x86/xen/Kconfig|   3 +-
 arch/x86/xen/Makefile   |   4 +-
 arch/x86/xen/apic.c |  17 --
 arch/x86/xen/enlighten_pv.c |  45 +---
 arch/x86/xen/mmu_pv.c   | 326 +++-
 arch/x86/xen/p2m.c  |   4 -
 arch/x86/xen/setup.c|  44 +---
 arch/x86/xen/smp_pv.c   |  19 +-
 arch/x86/xen/xen-asm.S  |  14 --
 arch/x86/xen/xen-asm_32.S   | 207 --
 arch/x86/xen/xen-head.S |   6 -
 arch/x86/xen/xen-ops.h  |   5 -
 drivers/xen/Kconfig |   4 +-
 24 files changed, 57 insertions(+), 932 deletions(-)
 delete mode 100644 arch/x86/xen/xen-asm_32.S

-- 
2.16.4

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 2/2] x86/paravirt: remove 32-bit support from PARAVIRT_XXL

2019-07-15 Thread Juergen Gross
The last 32-bit user of stuff under CONFIG_PARAVIRT_XXL is gone.

Remove 32-bit specific parts.

Signed-off-by: Juergen Gross 
---
 arch/x86/entry/vdso/vdso32/vclock_gettime.c |   1 +
 arch/x86/include/asm/paravirt.h | 105 
 arch/x86/include/asm/paravirt_types.h   |  20 --
 arch/x86/include/asm/pgtable-3level_types.h |   5 --
 arch/x86/kernel/cpu/common.c|   8 ---
 arch/x86/kernel/paravirt.c  |  17 -
 arch/x86/kernel/paravirt_patch_32.c |  36 +-
 7 files changed, 15 insertions(+), 177 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso32/vclock_gettime.c 
b/arch/x86/entry/vdso/vdso32/vclock_gettime.c
index 9242b28418d5..36f4ce1405cb 100644
--- a/arch/x86/entry/vdso/vdso32/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vdso32/vclock_gettime.c
@@ -17,6 +17,7 @@
 #undef CONFIG_ILLEGAL_POINTER_VALUE
 #undef CONFIG_SPARSEMEM_VMEMMAP
 #undef CONFIG_NR_CPUS
+#undef CONFIG_PARAVIRT_XXL
 
 #define CONFIG_X86_32 1
 #define CONFIG_PGTABLE_LEVELS 2
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c25c38a05c1c..60dfa93313a9 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -139,7 +139,6 @@ static inline void __write_cr4(unsigned long x)
PVOP_VCALL1(cpu.write_cr4, x);
 }
 
-#ifdef CONFIG_X86_64
 static inline unsigned long read_cr8(void)
 {
return PVOP_CALL0(unsigned long, cpu.read_cr8);
@@ -149,7 +148,6 @@ static inline void write_cr8(unsigned long x)
 {
PVOP_VCALL1(cpu.write_cr8, x);
 }
-#endif
 
 static inline void arch_safe_halt(void)
 {
@@ -283,12 +281,10 @@ static inline void load_TLS(struct thread_struct *t, 
unsigned cpu)
PVOP_VCALL2(cpu.load_tls, t, cpu);
 }
 
-#ifdef CONFIG_X86_64
 static inline void load_gs_index(unsigned int gs)
 {
PVOP_VCALL1(cpu.load_gs_index, gs);
 }
-#endif
 
 static inline void write_ldt_entry(struct desc_struct *dt, int entry,
   const void *desc)
@@ -375,50 +371,28 @@ static inline pte_t __pte(pteval_t val)
 {
pteval_t ret;
 
-   if (sizeof(pteval_t) > sizeof(long))
-   ret = PVOP_CALLEE2(pteval_t, mmu.make_pte, val, (u64)val >> 32);
-   else
-   ret = PVOP_CALLEE1(pteval_t, mmu.make_pte, val);
+   ret = PVOP_CALLEE1(pteval_t, mmu.make_pte, val);
 
return (pte_t) { .pte = ret };
 }
 
 static inline pteval_t pte_val(pte_t pte)
 {
-   pteval_t ret;
-
-   if (sizeof(pteval_t) > sizeof(long))
-   ret = PVOP_CALLEE2(pteval_t, mmu.pte_val,
-  pte.pte, (u64)pte.pte >> 32);
-   else
-   ret = PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte);
-
-   return ret;
+   return PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte);
 }
 
 static inline pgd_t __pgd(pgdval_t val)
 {
pgdval_t ret;
 
-   if (sizeof(pgdval_t) > sizeof(long))
-   ret = PVOP_CALLEE2(pgdval_t, mmu.make_pgd, val, (u64)val >> 32);
-   else
-   ret = PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val);
+   ret = PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val);
 
return (pgd_t) { ret };
 }
 
 static inline pgdval_t pgd_val(pgd_t pgd)
 {
-   pgdval_t ret;
-
-   if (sizeof(pgdval_t) > sizeof(long))
-   ret =  PVOP_CALLEE2(pgdval_t, mmu.pgd_val,
-   pgd.pgd, (u64)pgd.pgd >> 32);
-   else
-   ret =  PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd);
-
-   return ret;
+   return PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd);
 }
 
 #define  __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
@@ -435,79 +409,48 @@ static inline pte_t ptep_modify_prot_start(struct 
vm_area_struct *vma, unsigned
 static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, 
unsigned long addr,
   pte_t *ptep, pte_t old_pte, pte_t 
pte)
 {
-
-   if (sizeof(pteval_t) > sizeof(long))
-   /* 5 arg words */
-   pv_ops.mmu.ptep_modify_prot_commit(vma, addr, ptep, pte);
-   else
-   PVOP_VCALL4(mmu.ptep_modify_prot_commit,
-   vma, addr, ptep, pte.pte);
+   PVOP_VCALL4(mmu.ptep_modify_prot_commit, vma, addr, ptep, pte.pte);
 }
 
 static inline void set_pte(pte_t *ptep, pte_t pte)
 {
-   if (sizeof(pteval_t) > sizeof(long))
-   PVOP_VCALL3(mmu.set_pte, ptep, pte.pte, (u64)pte.pte >> 32);
-   else
-   PVOP_VCALL2(mmu.set_pte, ptep, pte.pte);
+   PVOP_VCALL2(mmu.set_pte, ptep, pte.pte);
 }
 
 static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
  pte_t *ptep, pte_t pte)
 {
-   if (sizeof(pteval_t) > sizeof(long))
-   /* 5 arg words */
-   pv_ops.mmu.set_pte_at(mm, addr, ptep, pte);
-   else
-   PVOP_VCALL4(mmu.set_pte_at, mm, addr, ptep, pte.pte);
+   PVOP_VCALL4(mmu.set_pte_

Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock

2019-07-15 Thread Stefano Garzarella
On Mon, Jul 15, 2019 at 05:16:09PM +0800, Jason Wang wrote:
> 
> > > > > > > >struct sk_buff *virtskb_receive_small(struct virtskb 
> > > > > > > > *vs, ...);
> > > > > > > >struct sk_buff *virtskb_receive_big(struct virtskb *vs, 
> > > > > > > > ...);
> > > > > > > >struct sk_buff *virtskb_receive_mergeable(struct virtskb 
> > > > > > > > *vs, ...);
> > > > > > > > 
> > > > > > > >int virtskb_add_recvbuf_small(struct virtskb*vs, ...);
> > > > > > > >int virtskb_add_recvbuf_big(struct virtskb *vs, ...);
> > > > > > > >int virtskb_add_recvbuf_mergeable(struct virtskb *vs, 
> > > > > > > > ...);
> > > > > > > > 
> > > > > > > > For the Guest->Host path it should be easier, so maybe I can 
> > > > > > > > add a
> > > > > > > > "virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a 
> > > > > > > > part of the code
> > > > > > > > of xmit_skb().
> > > > > > > I may miss something, but I don't see any thing that prevents us 
> > > > > > > from using
> > > > > > > xmit_skb() directly.
> > > > > > > 
> > > > > > Yes, but my initial idea was to make it more parametric and not 
> > > > > > related to the
> > > > > > virtio_net_hdr, so the 'hdr_len' could be a parameter and the
> > > > > > 'num_buffers' should be handled by the caller.
> > > > > > 
> > > > > > > > Let me know if you have in mind better names or if I should put 
> > > > > > > > these function
> > > > > > > > in another place.
> > > > > > > > 
> > > > > > > > I would like to leave the control part completely separate, so, 
> > > > > > > > for example,
> > > > > > > > the two drivers will negotiate the features independently and 
> > > > > > > > they will call
> > > > > > > > the right virtskb_receive_*() function based on the negotiation.
> > > > > > > If it's one the issue of negotiation, we can simply change the
> > > > > > > virtnet_probe() to deal with different devices.
> > > > > > > 
> > > > > > > 
> > > > > > > > I already started to work on it, but before to do more steps 
> > > > > > > > and send an RFC
> > > > > > > > patch, I would like to hear your opinion.
> > > > > > > > Do you think that makes sense?
> > > > > > > > Do you see any issue or a better solution?
> > > > > > > I still think we need to seek a way of adding some codes on 
> > > > > > > virtio-net.c
> > > > > > > directly if there's no huge different in the processing of TX/RX. 
> > > > > > > That would
> > > > > > > save us a lot time.
> > > > > > After the reading of the buffers from the virtqueue I think the 
> > > > > > process
> > > > > > is slightly different, because virtio-net will interface with the 
> > > > > > network
> > > > > > stack, while virtio-vsock will interface with the vsock-core 
> > > > > > (socket).
> > > > > > So the virtio-vsock implements the following:
> > > > > > - control flow mechanism to avoid to loose packets, informing the 
> > > > > > peer
> > > > > > about the amount of memory available in the receive queue using 
> > > > > > some
> > > > > > fields in the virtio_vsock_hdr
> > > > > > - de-multiplexing parsing the virtio_vsock_hdr and choosing the 
> > > > > > right
> > > > > > socket depending on the port
> > > > > > - socket state handling
> > > 
> > > I think it's just a branch, for ethernet, go for networking stack. 
> > > otherwise
> > > go for vsock core?
> > > 
> > Yes, that should work.
> > 
> > So, I should refactor the functions that can be called also from the vsock
> > core, in order to remove "struct net_device *dev" parameter.
> > Maybe creating some wrappers for the network stack.
> > 
> > Otherwise I should create a fake net_device for vsock_core.
> > 
> > What do you suggest?
> 
> 
> I'm not quite sure I get the question. Can you just use the one that created
> by virtio_net?

Sure, sorry but I missed that it is allocated in the virtnet_probe()!

Thanks,
Stefano
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock

2019-07-15 Thread Jason Wang




   struct sk_buff *virtskb_receive_small(struct virtskb *vs, ...);
   struct sk_buff *virtskb_receive_big(struct virtskb *vs, ...);
   struct sk_buff *virtskb_receive_mergeable(struct virtskb *vs, ...);

   int virtskb_add_recvbuf_small(struct virtskb*vs, ...);
   int virtskb_add_recvbuf_big(struct virtskb *vs, ...);
   int virtskb_add_recvbuf_mergeable(struct virtskb *vs, ...);

For the Guest->Host path it should be easier, so maybe I can add a
"virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a part of the code
of xmit_skb().

I may miss something, but I don't see any thing that prevents us from using
xmit_skb() directly.


Yes, but my initial idea was to make it more parametric and not related to the
virtio_net_hdr, so the 'hdr_len' could be a parameter and the
'num_buffers' should be handled by the caller.


Let me know if you have in mind better names or if I should put these function
in another place.

I would like to leave the control part completely separate, so, for example,
the two drivers will negotiate the features independently and they will call
the right virtskb_receive_*() function based on the negotiation.

If it's one the issue of negotiation, we can simply change the
virtnet_probe() to deal with different devices.



I already started to work on it, but before to do more steps and send an RFC
patch, I would like to hear your opinion.
Do you think that makes sense?
Do you see any issue or a better solution?

I still think we need to seek a way of adding some codes on virtio-net.c
directly if there's no huge different in the processing of TX/RX. That would
save us a lot time.

After the reading of the buffers from the virtqueue I think the process
is slightly different, because virtio-net will interface with the network
stack, while virtio-vsock will interface with the vsock-core (socket).
So the virtio-vsock implements the following:
- control flow mechanism to avoid to loose packets, informing the peer
about the amount of memory available in the receive queue using some
fields in the virtio_vsock_hdr
- de-multiplexing parsing the virtio_vsock_hdr and choosing the right
socket depending on the port
- socket state handling


I think it's just a branch, for ethernet, go for networking stack. otherwise
go for vsock core?


Yes, that should work.

So, I should refactor the functions that can be called also from the vsock
core, in order to remove "struct net_device *dev" parameter.
Maybe creating some wrappers for the network stack.

Otherwise I should create a fake net_device for vsock_core.

What do you suggest?



I'm not quite sure I get the question. Can you just use the one that 
created by virtio_net?



Thanks
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock

2019-07-15 Thread Stefano Garzarella
On Fri, Jul 12, 2019 at 06:14:39PM +0800, Jason Wang wrote:
> 
> On 2019/7/12 下午6:00, Stefano Garzarella wrote:
> > On Thu, Jul 11, 2019 at 03:52:21PM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jul 11, 2019 at 01:41:34PM +0200, Stefano Garzarella wrote:
> > > > On Thu, Jul 11, 2019 at 03:37:00PM +0800, Jason Wang wrote:
> > > > > On 2019/7/10 下午11:37, Stefano Garzarella wrote:
> > > > > > Hi,
> > > > > > as Jason suggested some months ago, I looked better at the 
> > > > > > virtio-net driver to
> > > > > > understand if we can reuse some parts also in the virtio-vsock 
> > > > > > driver, since we
> > > > > > have similar challenges (mergeable buffers, page allocation, small
> > > > > > packets, etc.).
> > > > > > 
> > > > > > Initially, I would add the skbuff in the virtio-vsock in order to 
> > > > > > re-use
> > > > > > receive_*() functions.
> > > > > 
> > > > > Yes, that will be a good step.
> > > > > 
> > > > Okay, I'll go on this way.
> > > > 
> > > > > > Then I would move receive_[small, big, mergeable]() and
> > > > > > add_recvbuf_[small, big, mergeable]() outside of virtio-net driver, 
> > > > > > in order to
> > > > > > call them also from virtio-vsock. I need to do some refactoring 
> > > > > > (e.g. leave the
> > > > > > XDP part on the virtio-net driver), but I think it is feasible.
> > > > > > 
> > > > > > The idea is to create a virtio-skb.[h,c] where put these functions 
> > > > > > and a new
> > > > > > object where stores some attributes needed (e.g. hdr_len ) and 
> > > > > > status (e.g.
> > > > > > some fields of struct receive_queue).
> > > > > 
> > > > > My understanding is we could be more ambitious here. Do you see any 
> > > > > blocker
> > > > > for reusing virtio-net directly? It's better to reuse not only the 
> > > > > functions
> > > > > but also the logic like NAPI to avoid re-inventing something buggy and
> > > > > duplicated.
> > > > > 
> > > > These are my concerns:
> > > > - virtio-vsock is not a "net_device", so a lot of code related to
> > > >ethtool, net devices (MAC address, MTU, speed, VLAN, XDP, 
> > > > offloading) will be
> > > >not used by virtio-vsock.
> 
> 
> Linux support device other than ethernet, so it should not be a problem.
> 
> 
> > > > 
> > > > - virtio-vsock has a different header. We can consider it as part of
> > > >virtio_net payload, but it precludes the compatibility with old 
> > > > hosts. This
> > > >was one of the major doubts that made me think about using only the
> > > >send/recv skbuff functions, that it shouldn't break the 
> > > > compatibility.
> 
> 
> We can extend the current vnet header helper for it to work for vsock.

Okay, I'll do it.

> 
> 
> > > > 
> > > > > > This is an idea of virtio-skb.h that
> > > > > > I have in mind:
> > > > > >   struct virtskb;
> > > > > 
> > > > > What fields do you want to store in virtskb? It looks to be exist 
> > > > > sk_buff is
> > > > > flexible enough to us?
> > > > My idea is to store queues information, like struct receive_queue or
> > > > struct send_queue, and some device attributes (e.g. hdr_len ).
> 
> 
> If you reuse skb or virtnet_info, there is not necessary.
> 

Okay.

> 
> > > > 
> > > > > 
> > > > > >   struct sk_buff *virtskb_receive_small(struct virtskb *vs, 
> > > > > > ...);
> > > > > >   struct sk_buff *virtskb_receive_big(struct virtskb *vs, ...);
> > > > > >   struct sk_buff *virtskb_receive_mergeable(struct virtskb *vs, 
> > > > > > ...);
> > > > > > 
> > > > > >   int virtskb_add_recvbuf_small(struct virtskb*vs, ...);
> > > > > >   int virtskb_add_recvbuf_big(struct virtskb *vs, ...);
> > > > > >   int virtskb_add_recvbuf_mergeable(struct virtskb *vs, ...);
> > > > > > 
> > > > > > For the Guest->Host path it should be easier, so maybe I can add a
> > > > > > "virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a part 
> > > > > > of the code
> > > > > > of xmit_skb().
> > > > > 
> > > > > I may miss something, but I don't see any thing that prevents us from 
> > > > > using
> > > > > xmit_skb() directly.
> > > > > 
> > > > Yes, but my initial idea was to make it more parametric and not related 
> > > > to the
> > > > virtio_net_hdr, so the 'hdr_len' could be a parameter and the
> > > > 'num_buffers' should be handled by the caller.
> > > > 
> > > > > > Let me know if you have in mind better names or if I should put 
> > > > > > these function
> > > > > > in another place.
> > > > > > 
> > > > > > I would like to leave the control part completely separate, so, for 
> > > > > > example,
> > > > > > the two drivers will negotiate the features independently and they 
> > > > > > will call
> > > > > > the right virtskb_receive_*() function based on the negotiation.
> > > > > 
> > > > > If it's one the issue of negotiation, we can simply change the
> > > > > virtnet_probe() to deal with different devices.
> > > > > 
> > > > > 
> > > > > > I already started to work on it, but before to do more steps and 
> > > > > >