Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Wei Wang

On 06/27/2018 11:58 AM, Michael S. Tsirkin wrote:

On Wed, Jun 27, 2018 at 11:00:05AM +0800, Wei Wang wrote:

On 06/27/2018 10:41 AM, Michael S. Tsirkin wrote:

On Wed, Jun 27, 2018 at 09:24:18AM +0800, Wei Wang wrote:

On 06/26/2018 09:34 PM, Michael S. Tsirkin wrote:

On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:

On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:

On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:


+   if (!arrays)
+   return NULL;
+
+   for (i = 0; i < max_array_num; i++) {

So we are getting a ton of memory here just to free it up a bit later.
Why doesn't get_from_free_page_list get the pages from free list for us?
We could also avoid the 1st allocation then - just build a list
of these.

That wouldn't be a good choice for us. If we check how the regular
allocation works, there are many many things we need to consider when pages
are allocated to users.
For example, we need to take care of the nr_free
counter, we need to check the watermark and perform the related actions.
Also the folks working on arch_alloc_page to monitor page allocation
activities would get a surprise..if page allocation is allowed to work in
this way.


mm/ code is well positioned to handle all this correctly.

I'm afraid that would be a re-implementation of the alloc functions,

A re-factoring - you can share code. The main difference is locking.


and
that would be much more complex than what we have. I think your idea of
passing a list of pages is better.

Best,
Wei

How much memory is this allocating anyway?


For every 2TB memory that the guest has, we allocate 4MB.

Hmm I guess I'm missing something, I don't see it:


+   max_entries = max_free_page_blocks(ARRAY_ALLOC_ORDER);
+   entries_per_page = PAGE_SIZE / sizeof(__le64);
+   entries_per_array = entries_per_page * (1 << ARRAY_ALLOC_ORDER);
+   max_array_num = max_entries / entries_per_array +
+   !!(max_entries % entries_per_array);

Looks like you always allocate the max number?

Yes. We allocate the max number and then free what's not used.
For example, for a 16TB guest we allocate four 4MB buffers and pass the 4
buffers to get_from_free_page_list. If it uses 3, then the remaining
4MB buffer will end up being freed.

For today's guests, max_array_num is usually 1.

Best,
Wei

I see, it's based on total ram pages. It's reasonable but might
get out of sync if memory is onlined quickly. So you want to
detect that there's more free memory than can fit and
retry the reporting.




- AFAIK, memory hotplug isn't expected to happen during live migration 
today. Hypervisors (e.g. QEMU) explicitly forbid this.


- Allocating buffers based on total ram pages already gives some 
headroom for newly plugged memory if that could happen in any case. 
Also, we can think about why people plug in more memory - usually 
because the existing memory isn't enough, which implies that the free 
page list is very likely to be close to empty.


- This method could be easily scaled if people really need more headroom 
for hot-plugged memory. For example, calculation based on "X * 
total_ram_pages", X could be a number passed from the hypervisor.


- This is an optimization feature, and reporting less free memory in 
that rare case doesn't hurt anything.


So I think it is good to start with a simple, fundamental implementation 
that doesn't confuse people; complexity can be added when there is a 
real need in the future.


Best,
Wei


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Michael S. Tsirkin
On Wed, Jun 27, 2018 at 11:00:05AM +0800, Wei Wang wrote:
> On 06/27/2018 10:41 AM, Michael S. Tsirkin wrote:
> > On Wed, Jun 27, 2018 at 09:24:18AM +0800, Wei Wang wrote:
> > > On 06/26/2018 09:34 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:
> > > > > On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:
> > > > > > 
> > > > > > > > > + if (!arrays)
> > > > > > > > > + return NULL;
> > > > > > > > > +
> > > > > > > > > + for (i = 0; i < max_array_num; i++) {
> > > > > > > > So we are getting a ton of memory here just to free it up a bit 
> > > > > > > > later.
> > > > > > > > Why doesn't get_from_free_page_list get the pages from free 
> > > > > > > > list for us?
> > > > > > > > We could also avoid the 1st allocation then - just build a list
> > > > > > > > of these.
> > > > > > > That wouldn't be a good choice for us. If we check how the regular
> > > > > > > allocation works, there are many many things we need to consider 
> > > > > > > when pages
> > > > > > > are allocated to users.
> > > > > > > For example, we need to take care of the nr_free
> > > > > > > counter, we need to check the watermark and perform the related 
> > > > > > > actions.
> > > > > > > Also the folks working on arch_alloc_page to monitor page 
> > > > > > > allocation
> > > > > > > activities would get a surprise..if page allocation is allowed to 
> > > > > > > work in
> > > > > > > this way.
> > > > > > > 
> > > > > > mm/ code is well positioned to handle all this correctly.
> > > > > I'm afraid that would be a re-implementation of the alloc functions,
> > > > A re-factoring - you can share code. The main difference is locking.
> > > > 
> > > > > and
> > > > > that would be much more complex than what we have. I think your idea 
> > > > > of
> > > > > passing a list of pages is better.
> > > > > 
> > > > > Best,
> > > > > Wei
> > > > How much memory is this allocating anyway?
> > > > 
> > > For every 2TB memory that the guest has, we allocate 4MB.
> > Hmm I guess I'm missing something, I don't see it:
> > 
> > 
> > +   max_entries = max_free_page_blocks(ARRAY_ALLOC_ORDER);
> > +   entries_per_page = PAGE_SIZE / sizeof(__le64);
> > +   entries_per_array = entries_per_page * (1 << ARRAY_ALLOC_ORDER);
> > +   max_array_num = max_entries / entries_per_array +
> > +   !!(max_entries % entries_per_array);
> > 
> > Looks like you always allocate the max number?
> 
> Yes. We allocate the max number and then free what's not used.
> For example, for a 16TB guest we allocate four 4MB buffers and pass the 4
> buffers to get_from_free_page_list. If it uses 3, then the remaining
> 4MB buffer will end up being freed.
> 
> For today's guests, max_array_num is usually 1.
> 
> Best,
> Wei

I see, it's based on total ram pages. It's reasonable but might
get out of sync if memory is onlined quickly. So you want to
detect that there's more free memory than can fit and
retry the reporting.

> 
> 
> 


Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Wei Wang

On 06/27/2018 10:41 AM, Michael S. Tsirkin wrote:

On Wed, Jun 27, 2018 at 09:24:18AM +0800, Wei Wang wrote:

On 06/26/2018 09:34 PM, Michael S. Tsirkin wrote:

On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:

On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:

On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:


+   if (!arrays)
+   return NULL;
+
+   for (i = 0; i < max_array_num; i++) {

So we are getting a ton of memory here just to free it up a bit later.
Why doesn't get_from_free_page_list get the pages from free list for us?
We could also avoid the 1st allocation then - just build a list
of these.

That wouldn't be a good choice for us. If we check how the regular
allocation works, there are many many things we need to consider when pages
are allocated to users.
For example, we need to take care of the nr_free
counter, we need to check the watermark and perform the related actions.
Also the folks working on arch_alloc_page to monitor page allocation
activities would get a surprise..if page allocation is allowed to work in
this way.


mm/ code is well positioned to handle all this correctly.

I'm afraid that would be a re-implementation of the alloc functions,

A re-factoring - you can share code. The main difference is locking.


and
that would be much more complex than what we have. I think your idea of
passing a list of pages is better.

Best,
Wei

How much memory is this allocating anyway?


For every 2TB memory that the guest has, we allocate 4MB.

Hmm I guess I'm missing something, I don't see it:


+   max_entries = max_free_page_blocks(ARRAY_ALLOC_ORDER);
+   entries_per_page = PAGE_SIZE / sizeof(__le64);
+   entries_per_array = entries_per_page * (1 << ARRAY_ALLOC_ORDER);
+   max_array_num = max_entries / entries_per_array +
+   !!(max_entries % entries_per_array);

Looks like you always allocate the max number?


Yes. We allocate the max number and then free what's not used.
For example, for a 16TB guest we allocate four 4MB buffers and pass the 4 
buffers to get_from_free_page_list. If it uses 3, then the remaining 
4MB buffer will end up being freed.


For today's guests, max_array_num is usually 1.

Best,
Wei







Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Michael S. Tsirkin
On Wed, Jun 27, 2018 at 09:24:18AM +0800, Wei Wang wrote:
> On 06/26/2018 09:34 PM, Michael S. Tsirkin wrote:
> > On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:
> > > On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:
> > > > 
> > > > > > > + if (!arrays)
> > > > > > > + return NULL;
> > > > > > > +
> > > > > > > + for (i = 0; i < max_array_num; i++) {
> > > > > > So we are getting a ton of memory here just to free it up a bit 
> > > > > > later.
> > > > > > Why doesn't get_from_free_page_list get the pages from free list 
> > > > > > for us?
> > > > > > We could also avoid the 1st allocation then - just build a list
> > > > > > of these.
> > > > > That wouldn't be a good choice for us. If we check how the regular
> > > > > allocation works, there are many many things we need to consider when 
> > > > > pages
> > > > > are allocated to users.
> > > > > For example, we need to take care of the nr_free
> > > > > counter, we need to check the watermark and perform the related 
> > > > > actions.
> > > > > Also the folks working on arch_alloc_page to monitor page allocation
> > > > > activities would get a surprise..if page allocation is allowed to 
> > > > > work in
> > > > > this way.
> > > > > 
> > > > mm/ code is well positioned to handle all this correctly.
> > > I'm afraid that would be a re-implementation of the alloc functions,
> > A re-factoring - you can share code. The main difference is locking.
> > 
> > > and
> > > that would be much more complex than what we have. I think your idea of
> > > passing a list of pages is better.
> > > 
> > > Best,
> > > Wei
> > How much memory is this allocating anyway?
> > 
> 
> For every 2TB memory that the guest has, we allocate 4MB.

Hmm I guess I'm missing something, I don't see it:


+   max_entries = max_free_page_blocks(ARRAY_ALLOC_ORDER);
+   entries_per_page = PAGE_SIZE / sizeof(__le64);
+   entries_per_array = entries_per_page * (1 << ARRAY_ALLOC_ORDER);
+   max_array_num = max_entries / entries_per_array +
+   !!(max_entries % entries_per_array);

Looks like you always allocate the max number?


> This is the same
> for both cases.
> For today's guests, usually there will be only one 4MB allocated and passed
> to get_from_free_page_list.
> 
> Best,
> Wei
> 
> 


Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Wei Wang

On 06/26/2018 09:34 PM, Michael S. Tsirkin wrote:

On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:

On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:

On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:


+   if (!arrays)
+   return NULL;
+
+   for (i = 0; i < max_array_num; i++) {

So we are getting a ton of memory here just to free it up a bit later.
Why doesn't get_from_free_page_list get the pages from free list for us?
We could also avoid the 1st allocation then - just build a list
of these.

That wouldn't be a good choice for us. If we check how the regular
allocation works, there are many many things we need to consider when pages
are allocated to users.
For example, we need to take care of the nr_free
counter, we need to check the watermark and perform the related actions.
Also the folks working on arch_alloc_page to monitor page allocation
activities would get a surprise..if page allocation is allowed to work in
this way.


mm/ code is well positioned to handle all this correctly.

I'm afraid that would be a re-implementation of the alloc functions,

A re-factoring - you can share code. The main difference is locking.


and
that would be much more complex than what we have. I think your idea of
passing a list of pages is better.

Best,
Wei

How much memory is this allocating anyway?



For every 2TB memory that the guest has, we allocate 4MB. This is the 
same for both cases.
For today's guests, usually there will be only one 4MB allocated and 
passed to get_from_free_page_list.


Best,
Wei





Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-26 Thread Michael S. Tsirkin
On Tue, Jun 26, 2018 at 04:38:26PM -0700, Siwei Liu wrote:
> On Mon, Jun 25, 2018 at 6:50 PM, Michael S. Tsirkin  wrote:
> > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:
> >> > > > > Might not necessarily be something wrong, but it's very limited to
> >> > > > > prohibit the MAC of VF from changing when enslaved by failover.
> >> > > > You mean guest changing MAC? I'm not sure why we prohibit that.
> >> > > I think Sridhar and Jiri might be better person to answer it. My
> >> > > impression was that sync'ing the MAC address change between all 3
> >> > > devices is challenging, as the failover driver uses MAC address to
> >> > > match net_device internally.
> >>
> >> Yes. The MAC address is assigned by the hypervisor and it needs to manage 
> >> the movement
> >> of the MAC between the PF and VF.  Allowing the guest to change the MAC 
> >> will require
> >> synchronization between the hypervisor and the PF/VF drivers. Most of the 
> >> VF drivers
> >> don't allow changing guest MAC unless it is a trusted VF.
> >
> > OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
> > For example I can see host just
> > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
> > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
> 
> That's why I think pairing using the MAC is fragile, IMHO. If the VF's
> MAC gets changed before virtio attempts to match and pair the device, it
> ends up with no pairing found at all.

Guest seems to match on the hardware mac and ignore whatever
is set by user. Makes sense to me and should not be fragile.


> UUID is better.
> 
> -Siwei
> 
> >
> > --
> > MST


Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-26 Thread Siwei Liu
On Mon, Jun 25, 2018 at 6:50 PM, Michael S. Tsirkin  wrote:
> On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:
>> > > > > Might not necessarily be something wrong, but it's very limited to
>> > > > > prohibit the MAC of VF from changing when enslaved by failover.
>> > > > You mean guest changing MAC? I'm not sure why we prohibit that.
>> > > I think Sridhar and Jiri might be better person to answer it. My
>> > > impression was that sync'ing the MAC address change between all 3
>> > > devices is challenging, as the failover driver uses MAC address to
>> > > match net_device internally.
>>
>> Yes. The MAC address is assigned by the hypervisor and it needs to manage 
>> the movement
>> of the MAC between the PF and VF.  Allowing the guest to change the MAC will 
>> require
>> synchronization between the hypervisor and the PF/VF drivers. Most of the VF 
>> drivers
>> don't allow changing guest MAC unless it is a trusted VF.
>
> OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
> For example I can see host just
> failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
> I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.

That's why I think pairing using the MAC is fragile, IMHO. If the VF's
MAC gets changed before virtio attempts to match and pair the device, it
ends up with no pairing found at all. UUID is better.

-Siwei

>
> --
> MST


Re: [PATCH v2 0/5] Add virtio-iommu driver

2018-06-26 Thread Michael S. Tsirkin
On Thu, Jun 21, 2018 at 08:06:50PM +0100, Jean-Philippe Brucker wrote:
> Implement the base virtio-iommu driver, following version 0.7 of the
> specification [1].
> 
> Changes since last version [2]:
> * Address comments, thanks again for the review.
> * As suggested, add a DT binding description in patch 1.
> * Depend on VIRTIO_MMIO=y to fix a build failure¹
> * Switch to v0.7 of the spec, which changes resv_mem parameters and adds
>   an MMIO flag. These are trivial but not backward compatible. Once
>   device or driver is upstream, updates to the spec will rely on feature
>   bits to stay compatible with this code.
> * Implement the new tlb_sync interface, by splitting add_req() and
>   sync_req(). I noticed a small improvement on netperf stream because
>   the synchronous iommu_unmap() also benefits from this. Other
>   experiments, such as using kmem_cache for requests instead of kmalloc,
>   didn't show any improvement.
> 
> Driver is available on branch virtio-iommu/v0.7 [3], and the x86+ACPI
> prototype is on branch virtio-iommu/devel. That x86 code hasn't changed,
> it still requires the DMA ifdeffery and I lack the expertise to tidy it
> up. The kvmtool example device has been cleaned up and is available on
> branch virtio-iommu/v0.7 [4].
> 
> Feedback welcome!
> 
> Thanks,
> Jean

So as I pointed out, a new virtio 0 device isn't really welcome ;)
No one bothered implementing virtio 1 in MMIO for all the work
that was put in defining it. The fact that the MMIO part of the
spec doesn't seem to allow for transitional devices might
or might not have something to do with this.

So why make it an MMIO device at all? A system with an IOMMU
will have a PCI bus, won't it? And it's pretty common to
have the IOMMU be a PCI device on the root bus.
That would also remove the need to bother with DT bindings etc.


> [1] virtio-iommu specification v0.7
> https://www.spinics.net/lists/linux-virtualization/msg34127.html
> [2] virtio-iommu driver v1
> https://www.spinics.net/lists/kvm/msg164322.html
> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.7
> 
> http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/v0.7
> [4] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.7 
> 
>   ---
> ¹ A word on the module story. Because of complex dependencies IOMMU
> drivers cannot yet be .ko modules. Enabling it is outside the scope of
> this series but I have a small prototype on branch virtio-iommu/
> module-devel. It seems desirable since some distros currently ship the
> transport code as module and are unlikely to change this on our account.
> Note that this series works fine with arm64 defconfig, which selects
> VIRTIO_MMIO=y.
> 
> I could use some help to clean this up. Currently my solution is to split
> virtio-iommu into a module and a 3-lines built-in stub, which isn't
> graceful but could have been worse.
> 
> Keeping the whole virtio-iommu driver as builtin would require accessing
> any virtio utility through get_symbol. So far we only have seven of
> those and could keep a list of pointer ops, but I find this solution
> terrible. If virtio or virtio_mmio is a module, virtio-iommu also needs
> to be one.
> 
> The minimal set of changes to make a module out of an OF-based IOMMU
> driver seems to be:
> * Export IOMMU symbols used by drivers
> * Don't give up deferring probe in of_iommu
> * Move IOMMU OF tables to .rodata
> * Create a static virtio-iommu stub that declares the virtio-iommu OF
>   table entry. The build system doesn't pick up IOMMU_OF_DECLARE when
>   done from an object destined to be built as module :(
> * Create a device_link between endpoint and IOMMU, to ensure that
>   removing the IOMMU driver also unbinds the endpoint driver. Putting
>   this in IOMMU core seems like a better approach since even when not
>   built as module, unbinding an IOMMU device via sysfs will cause
>   crashes.
> 
> With this, as long as virtio-mmio isn't loaded, the OF code defers probe
> of endpoints that depend on virtio-iommu. Once virtio-mmio is loaded,
> the probe is still deferred until virtio-iommu registers itself to the
> virtio bus. Once virtio-iommu is loaded, probe of endpoints managed by
> the IOMMU follows.
> 
> I'll investigate ACPI IORT when I find some time, though I don't expect
> much complication and suggest we tackle one problem at a time. Since it
> is a luxury that requires changes to the IOMMU core, module support is
> left as a future improvement.
>   ---
> 
> Jean-Philippe Brucker (5):
>   dt-bindings: virtio: Specify #iommu-cells value for a virtio-iommu
>   iommu: Add virtio-iommu driver
>   iommu/virtio: Add probe request
>   iommu/virtio: Add event queue
>   vfio: Allow type-1 IOMMU instantiation for ARM
> 
>  .../devicetree/bindings/virtio/mmio.txt   |8 +
>  MAINTAINERS   |6 +
>  drivers/iommu/Kconfig |   11 +
>  

Re: [virtio-dev] [PATCH v2 3/5] iommu/virtio: Add probe request

2018-06-26 Thread Jean-Philippe Brucker
On 21/06/18 20:06, Jean-Philippe Brucker wrote:
> +static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
> +struct virtio_iommu_probe_resv_mem *mem,
> +size_t len)
> +{
> + struct iommu_resv_region *region = NULL;
> + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> +
> + u64 start = le64_to_cpu(mem->start);
> + u64 end = le64_to_cpu(mem->end);
> + size_t size = end - start + 1;
> +
> + if (len < sizeof(*mem))
> + return -EINVAL;
> +
> + switch (mem->subtype) {
> + default:
> + if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
> + mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI)

Hm, I messed it up while rebasing. I'll remove this condition.

Thanks,
Jean

> + dev_warn(vdev->dev, "unknown resv mem subtype 0x%x\n",
> +  mem->subtype);
> + /* Fall-through */
> + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
> + region = iommu_alloc_resv_region(start, size, 0,
> +  IOMMU_RESV_RESERVED);
> + break;
> + case VIRTIO_IOMMU_RESV_MEM_T_MSI:
> + region = iommu_alloc_resv_region(start, size, prot,
> +  IOMMU_RESV_MSI);
> + break;
> + }
> +
> + list_add(&region->list, &vdev->resv_regions);
> + return 0;
> +}


Re: [PATCH v2 1/5] dt-bindings: virtio: Specify #iommu-cells value for a virtio-iommu

2018-06-26 Thread Jean-Philippe Brucker
On 25/06/18 20:27, Rob Herring wrote:
> On Thu, Jun 21, 2018 at 08:06:51PM +0100, Jean-Philippe Brucker wrote:
>> A virtio-mmio node may represent a virtio-iommu device. This is discovered
>> by the virtio driver at probe time, but the DMA topology isn't
>> discoverable and must be described by firmware. For DT the standard IOMMU
>> description is used, as specified in bindings/iommu/iommu.txt and
>> bindings/pci/pci-iommu.txt. Like many other IOMMUs, virtio-iommu
>> distinguishes masters by their endpoint IDs, which requires one IOMMU cell
>> in the "iommus" property.
>>
>> Signed-off-by: Jean-Philippe Brucker 
>> ---
>>  Documentation/devicetree/bindings/virtio/mmio.txt | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/virtio/mmio.txt 
>> b/Documentation/devicetree/bindings/virtio/mmio.txt
>> index 5069c1b8e193..337da0e3a87f 100644
>> --- a/Documentation/devicetree/bindings/virtio/mmio.txt
>> +++ b/Documentation/devicetree/bindings/virtio/mmio.txt
>> @@ -8,6 +8,14 @@ Required properties:
>>  - reg:  control registers base address and size including 
>> configuration space
>>  - interrupts:   interrupt generated by the device
>>  
>> +Required properties for virtio-iommu:
>> +
>> +- #iommu-cells: When the node describes a virtio-iommu device, it is
>> +linked to DMA masters using the "iommus" property as
>> +described in devicetree/bindings/iommu/iommu.txt. For
>> +virtio-iommu #iommu-cells must be 1, each cell describing
>> +a single endpoint ID.
> 
> The iommus property should also be documented for the client side.

Isn't section "IOMMU master node" of iommu.txt sufficient? Since the
iommus property applies to any DMA master, not only virtio-mmio devices,
the canonical description in iommu.txt seems the best place for it, and
I'm not sure what to add in this file. Maybe a short example below the
virtio_block one?

Thanks,
Jean
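Such a client-side example might look roughly like the sketch below; the node names, addresses, interrupt numbers, and the endpoint ID 0x10 are made up for illustration — only #iommu-cells = <1> is fixed by the binding:

```dts
/* The virtio-mmio node acting as a virtio-iommu device. */
viommu: virtio@10013000 {
	compatible = "virtio,mmio";
	reg = <0x10013000 0x200>;
	interrupts = <0x2c>;
	#iommu-cells = <1>;
};

/* A DMA master linked to it; the single cell is the endpoint ID. */
virtio@10013200 {
	compatible = "virtio,mmio";
	reg = <0x10013200 0x200>;
	interrupts = <0x2d>;
	iommus = <&viommu 0x10>;
};
```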




Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-26 Thread Michael S. Tsirkin
On Tue, Jun 26, 2018 at 05:08:13PM +0200, Cornelia Huck wrote:
> On Fri, 22 Jun 2018 17:05:04 -0700
> Siwei Liu  wrote:
> 
> > On Fri, Jun 22, 2018 at 3:33 PM, Michael S. Tsirkin  wrote:
> > > I suspect the diveregence will be lost on most users though
> > > simply because they don't even care about vfio. They just
> > > want things to go fast.  
> > 
> > Like Jason said, VF isn't faster than virtio-net in all cases. It
> > depends on the workload and performance metrics: throughput, latency,
> > or packet per second.
> 
> So, will it be guest/admin-controllable then where the traffic flows
> through? Just because we do have a vf available after negotiation of
> the feature bit, it does not necessarily mean we want to use it? Do we
> (the guest) even want to make it visible in that case?

I think these ideas belong to what Alex Duyck wanted to do:
some kind of advanced device that isn't tied to
any network interfaces and allows workload and performance
specific tuning.

Way out of scope for a simple failover, and more importantly,
no one is looking at even enumerating the problems involved,
much less solving them.

-- 
MST


Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-26 Thread Michael S. Tsirkin
On Tue, Jun 26, 2018 at 06:03:16PM +0200, Cornelia Huck wrote:
> Ok, that makes me conclude that we definitely need to involve the
> libvirt folks before we proceed further with defining QEMU interfaces.

That's always a wise thing to do.

-- 
MST


Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-26 Thread Cornelia Huck
On Tue, 26 Jun 2018 18:38:51 +0300
"Michael S. Tsirkin"  wrote:

> On Tue, Jun 26, 2018 at 05:17:32PM +0200, Cornelia Huck wrote:
> > On Tue, 26 Jun 2018 04:50:25 +0300
> > "Michael S. Tsirkin"  wrote:
> >   
> > > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:  
> > > > > > > > Might not necessarily be something wrong, but it's very 
> > > > > > > > limited to
> > > > > > > > prohibit the MAC of VF from changing when enslaved by failover. 
> > > > > > > >
> > > > > > > You mean guest changing MAC? I'm not sure why we prohibit that.   
> > > > > > >  
> > > > > > I think Sridhar and Jiri might be better person to answer it. My
> > > > > > impression was that sync'ing the MAC address change between all 3
> > > > > > devices is challenging, as the failover driver uses MAC address to
> > > > > > match net_device internally.
> > > > 
> > > > Yes. The MAC address is assigned by the hypervisor and it needs to 
> > > > manage the movement
> > > > of the MAC between the PF and VF.  Allowing the guest to change the MAC 
> > > > will require
> > > > synchronization between the hypervisor and the PF/VF drivers. Most of 
> > > > the VF drivers
> > > > don't allow changing guest MAC unless it is a trusted VF.
> > > 
> > > OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
> > > For example I can see host just
> > > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
> > > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
> > >   
> > 
> > So, what I get from this is that QEMU needs to be able to control all
> > of standby, uuid, and mac to accommodate the different setups
> > (or have libvirt/management software set it up). Is the host
> > able to find out, or define, whether a VF is trusted?  
> 
> You do it with ip link I think but QEMU doesn't normally do this,
> it relies on libvirt to poke at host kernel and supply the info.
> 

Ok, that makes me conclude that we definitely need to involve the
libvirt folks before we proceed further with defining QEMU interfaces.

Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-26 Thread Michael S. Tsirkin
On Tue, Jun 26, 2018 at 05:17:32PM +0200, Cornelia Huck wrote:
> On Tue, 26 Jun 2018 04:50:25 +0300
> "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:
> > > > > > > Might not necessarily be something wrong, but it's very limited 
> > > > > > > to
> > > > > > > prohibit the MAC of VF from changing when enslaved by failover.  
> > > > > > You mean guest changing MAC? I'm not sure why we prohibit that.  
> > > > > I think Sridhar and Jiri might be better person to answer it. My
> > > > > impression was that sync'ing the MAC address change between all 3
> > > > > devices is challenging, as the failover driver uses MAC address to
> > > > > match net_device internally.  
> > > 
> > > Yes. The MAC address is assigned by the hypervisor and it needs to manage 
> > > the movement
> > > of the MAC between the PF and VF.  Allowing the guest to change the MAC 
> > > will require
> > > synchronization between the hypervisor and the PF/VF drivers. Most of the 
> > > VF drivers
> > > don't allow changing guest MAC unless it is a trusted VF.  
> > 
> > OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
> > For example I can see host just
> > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
> > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
> > 
> 
> So, what I get from this is that QEMU needs to be able to control all
> of standby, uuid, and mac to accommodate the different setups
> (or have libvirt/management software set it up). Is the host
> able to find out, or define, whether a VF is trusted?

You do it with ip link I think but QEMU doesn't normally do this,
it relies on libvirt to poke at host kernel and supply the info.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-26 Thread Cornelia Huck
On Tue, 26 Jun 2018 04:50:25 +0300
"Michael S. Tsirkin"  wrote:

> On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:
> > > > > > Might not necessarily be something wrong, but it's very limited to
> > > > > > prohibit the MAC of VF from changing when enslaved by failover.  
> > > > > You mean guest changing MAC? I'm not sure why we prohibit that.  
> > > > I think Sridhar and Jiri might be the better people to answer it. My
> > > > impression was that sync'ing the MAC address change between all 3
> > > > devices is challenging, as the failover driver uses MAC address to
> > > > match net_device internally.  
> > 
> > Yes. The MAC address is assigned by the hypervisor and it needs to manage 
> > the movement
> > of the MAC between the PF and VF.  Allowing the guest to change the MAC 
> > will require
> > synchronization between the hypervisor and the PF/VF drivers. Most of the 
> > VF drivers
> > don't allow changing guest MAC unless it is a trusted VF.  
> 
> OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
> For example I can see host just
> failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
> I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
> 

So, what I get from this is that QEMU needs to be able to control all
of standby, uuid, and mac to accommodate the different setups
(respectively have libvirt/management software set it up). Is the host
able to find out respectively define whether a VF is trusted?


Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-26 Thread Cornelia Huck
On Fri, 22 Jun 2018 17:05:04 -0700
Siwei Liu  wrote:

> On Fri, Jun 22, 2018 at 3:33 PM, Michael S. Tsirkin  wrote:
> > I suspect the diveregence will be lost on most users though
> > simply because they don't even care about vfio. They just
> > want things to go fast.  
> 
> Like Jason said, VF isn't faster than virtio-net in all cases. It
> depends on the workload and performance metrics: throughput, latency,
> or packet per second.

So, will it be guest/admin-controllable then where the traffic flows
through? Just because we do have a vf available after negotiation of
the feature bit, it does not necessarily mean we want to use it? Do we
(the guest) even want to make it visible in that case?


Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-26 Thread Michael S. Tsirkin
On Tue, Jun 26, 2018 at 01:55:09PM +0200, Cornelia Huck wrote:
> On Tue, 26 Jun 2018 04:46:03 +0300
> "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Jun 25, 2018 at 11:55:12AM +0200, Cornelia Huck wrote:
> > > On Fri, 22 Jun 2018 22:05:50 +0300
> > > "Michael S. Tsirkin"  wrote:
> > >   
> > > > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote:  
> > > > > On Thu, 21 Jun 2018 21:20:13 +0300
> > > > > "Michael S. Tsirkin"  wrote:
> > > > > 
> > > > > > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote:
> > > > > > > OK, so what about the following:
> > > > > > > 
> > > > > > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that 
> > > > > > > indicates
> > > > > > >   that we have a new uuid field in the virtio-net config space
> > > > > > > - in QEMU, add a property for virtio-net that allows to specify a 
> > > > > > > uuid,
> > > > > > >   offer VIRTIO_NET_F_STANDBY_UUID if set
> > > > > > > - when configuring, set the property to the group UUID of the 
> > > > > > > vfio-pci
> > > > > > >   device
> > > > > > > - in the guest, use the uuid from the virtio-net device's config 
> > > > > > > space
> > > > > > >   if applicable; else, fall back to matching by MAC as done today
> > > > > > > 
> > > > > > > That should work for all virtio transports.  
> > > > > > 
> > > > > > True. I'm a bit unhappy that it's virtio net specific though
> > > > > > since down the road I expect we'll have a very similar feature
> > > > > > for scsi (and maybe others).
> > > > > > 
> > > > > > But we do not have a way to have fields that are portable
> > > > > > both across devices and transports, and I think it would
> > > > > > be a useful addition. How would this work though? Any idea?
> > > > > 
> > > > > Can we introduce some kind of device-independent config space area?
> > > > > Pushing back the device-specific config space by a certain value if 
> > > > > the
> > > > > appropriate feature is negotiated and use that for things like the 
> > > > > uuid?
> > > > 
> > > > So config moves back and forth?
> > > > Reminds me of the msi vector mess we had with pci.  
> > > 
> > > Yes, that would be a bit unfortunate.
> > >   
> > > > I'd rather have every transport add a new config.  
> > > 
> > > You mean via different mechanisms?  
> > 
> > I guess so.
> 
> Is there an alternate mechanism for pci to use? (Not so familiar with
> it.)

We have a device and transport config capability.
We could add a generic config capability too.

> For ccw, this needs more thought. We already introduced two commands
> for reading/writing the config space (a concept that does not really
> exist on s390). There's the generic read configuration data command,
> but the data returned by it is not really generic enough. So we would
> need one new command (or two, if we need to write as well). I'm not
> sure about that yet.
> 
> > 
> > > >   
> > > > > But regardless of that, I'm not sure whether extending this approach 
> > > > > to
> > > > > other device types is the way to go. Tying together two different
> > > > > devices is creating complicated situations at least in the hypervisor
> > > > > (even if it's fairly straightforward in the guest). [I have not come
> > > > > around again to look at the "how to handle visibility in QEMU"
> > > > > questions due to lack of cycles, sorry about that.]
> > > > > 
> > > > > So, what's the goal of this approach? Only to allow migration with
> > > > > vfio-pci, or also to plug in a faster device and use it instead of an
> > > > > already attached paravirtualized device?
> > > > 
> > > > These are two sides of the same coin, I think the second approach
> > > > is closer to what we are doing here.  
> > > 
> > > Thinking about it, do we need any knob to keep the vfio device
> > > invisible if the virtio device is not present? IOW, how does the
> > > hypervisor know that the vfio device is supposed to be paired with a
> > > virtio device? It seems we need an explicit tie-in.  
> > 
> > If we are going the way of the bridge, both bridge and
> > virtio would have some kind of id.
> 
> So the presence of the id would indicate "this is one part of a pair"?

I guess so, yes.

> > 
> > When pairing using mac, I'm less sure. Pass vfio device mac to qemu
> > as a property?
> 
> That feels a bit odd. "This is the vfio device's mac, use this instead
> of your usual mac property"? As we have not designed the QEMU interface
> yet, just go with the id in any case? The guest can still match by mac.

OK

> > > > > What about migration of vfio devices that are not easily replaced by a
> > > > > paravirtualized device? I'm thinking of vfio-ccw, where our main (and
> > > > > currently only) supported device is dasd (disks) -- which can do a lot
> > > > > of specialized things that virtio-blk does not support (and should not
> > > > > or even cannot support).
> > > > 
> > > > But maybe virtio-scsi can?  
> > > 
> > > I don't think so. Dasds have some channel commands that don't map
> > > easily to scsi commands.

Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Michael S. Tsirkin
On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:
> On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:
> > On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:
> > 
> 
> > > 
> > > > 
> > > > > + if (!arrays)
> > > > > + return NULL;
> > > > > +
> > > > > + for (i = 0; i < max_array_num; i++) {
> > > > So we are getting a ton of memory here just to free it up a bit later.
> > > > Why doesn't get_from_free_page_list get the pages from free list for us?
> > > > We could also avoid the 1st allocation then - just build a list
> > > > of these.
> > > That wouldn't be a good choice for us. If we check how the regular
> > > allocation works, there are many many things we need to consider when 
> > > pages
> > > are allocated to users.
> > > For example, we need to take care of the nr_free
> > > counter, we need to check the watermark and perform the related actions.
> > > Also the folks working on arch_alloc_page to monitor page allocation
> > > activities would get a surprise... if page allocation is allowed to work in
> > > this way.
> > > 
> > mm/ code is well positioned to handle all this correctly.
> 
> I'm afraid that would be a re-implementation of the alloc functions,

A re-factoring - you can share code. The main difference is locking.

> and
> that would be much more complex than what we have. I think your idea of
> passing a list of pages is better.
> 
> Best,
> Wei

How much memory is this allocating anyway?

-- 
MST


Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-06-26 Thread Wei Wang

On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:
> On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:
> > > > +   if (!arrays)
> > > > +   return NULL;
> > > > +
> > > > +   for (i = 0; i < max_array_num; i++) {
> > > So we are getting a ton of memory here just to free it up a bit later.
> > > Why doesn't get_from_free_page_list get the pages from free list for us?
> > > We could also avoid the 1st allocation then - just build a list
> > > of these.
> > That wouldn't be a good choice for us. If we check how the regular
> > allocation works, there are many many things we need to consider when pages
> > are allocated to users. For example, we need to take care of the nr_free
> > counter, we need to check the watermark and perform the related actions.
> > Also the folks working on arch_alloc_page to monitor page allocation
> > activities would get a surprise... if page allocation is allowed to work in
> > this way.
> mm/ code is well positioned to handle all this correctly.

I'm afraid that would be a re-implementation of the alloc functions, and
that would be much more complex than what we have. I think your idea of
passing a list of pages is better.

Best,
Wei








Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-26 Thread Cornelia Huck
On Tue, 26 Jun 2018 04:46:03 +0300
"Michael S. Tsirkin"  wrote:

> On Mon, Jun 25, 2018 at 11:55:12AM +0200, Cornelia Huck wrote:
> > On Fri, 22 Jun 2018 22:05:50 +0300
> > "Michael S. Tsirkin"  wrote:
> >   
> > > On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote:  
> > > > On Thu, 21 Jun 2018 21:20:13 +0300
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote:
> > > > > > OK, so what about the following:
> > > > > > 
> > > > > > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that 
> > > > > > indicates
> > > > > >   that we have a new uuid field in the virtio-net config space
> > > > > > - in QEMU, add a property for virtio-net that allows to specify a 
> > > > > > uuid,
> > > > > >   offer VIRTIO_NET_F_STANDBY_UUID if set
> > > > > > - when configuring, set the property to the group UUID of the 
> > > > > > vfio-pci
> > > > > >   device
> > > > > > - in the guest, use the uuid from the virtio-net device's config 
> > > > > > space
> > > > > >   if applicable; else, fall back to matching by MAC as done today
> > > > > > 
> > > > > > That should work for all virtio transports.  
> > > > > 
> > > > > True. I'm a bit unhappy that it's virtio net specific though
> > > > > since down the road I expect we'll have a very similar feature
> > > > > for scsi (and maybe others).
> > > > > 
> > > > > But we do not have a way to have fields that are portable
> > > > > both across devices and transports, and I think it would
> > > > > be a useful addition. How would this work though? Any idea?
> > > > 
> > > > Can we introduce some kind of device-independent config space area?
> > > > Pushing back the device-specific config space by a certain value if the
> > > > appropriate feature is negotiated and use that for things like the 
> > > > uuid?
> > > 
> > > So config moves back and forth?
> > > Reminds me of the msi vector mess we had with pci.  
> > 
> > Yes, that would be a bit unfortunate.
> >   
> > > I'd rather have every transport add a new config.  
> > 
> > You mean via different mechanisms?  
> 
> I guess so.

Is there an alternate mechanism for pci to use? (Not so familiar with
it.)

For ccw, this needs more thought. We already introduced two commands
for reading/writing the config space (a concept that does not really
exist on s390). There's the generic read configuration data command,
but the data returned by it is not really generic enough. So we would
need one new command (or two, if we need to write as well). I'm not
sure about that yet.

> 
> > >   
> > > > But regardless of that, I'm not sure whether extending this approach to
> > > > other device types is the way to go. Tying together two different
> > > > devices is creating complicated situations at least in the hypervisor
> > > > (even if it's fairly straightforward in the guest). [I have not come
> > > > around again to look at the "how to handle visibility in QEMU"
> > > > questions due to lack of cycles, sorry about that.]
> > > > 
> > > > So, what's the goal of this approach? Only to allow migration with
> > > > vfio-pci, or also to plug in a faster device and use it instead of an
> > > > already attached paravirtualized device?
> > > 
> > > These are two sides of the same coin, I think the second approach
> > > is closer to what we are doing here.  
> > 
> > Thinking about it, do we need any knob to keep the vfio device
> > invisible if the virtio device is not present? IOW, how does the
> > hypervisor know that the vfio device is supposed to be paired with a
> > virtio device? It seems we need an explicit tie-in.  
> 
> If we are going the way of the bridge, both bridge and
> virtio would have some kind of id.

So the presence of the id would indicate "this is one part of a pair"?

> 
> When pairing using mac, I'm less sure. Pass vfio device mac to qemu
> as a property?

That feels a bit odd. "This is the vfio device's mac, use this instead
of your usual mac property"? As we have not designed the QEMU interface
yet, just go with the id in any case? The guest can still match by mac.

> > > > What about migration of vfio devices that are not easily replaced by a
> > > > paravirtualized device? I'm thinking of vfio-ccw, where our main (and
> > > > currently only) supported device is dasd (disks) -- which can do a lot
> > > > of specialized things that virtio-blk does not support (and should not
> > > > or even cannot support).
> > > 
> > > But maybe virtio-scsi can?  
> > 
> > I don't think so. Dasds have some channel commands that don't map
> > easily to scsi commands.  
> 
> There's always a choice of adding these to the spec.
> E.g. FC extensions were proposed, I don't remember why they
> are still stuck.

FC extensions are a completely different kind of enhancements, though.
For a start, they are not unique to a certain transport.

Also, we have a whole list of special dasd issues. Weird disk layout
for eckd, low-level disk 

Re: [PATCH v6 3/3] x86: paravirt: make native_save_fl extern inline

2018-06-26 Thread Ingo Molnar


* Nick Desaulniers  wrote:

> On Thu, Jun 21, 2018 at 7:24 PM Ingo Molnar  wrote:
> > * Nick Desaulniers  wrote:
> >
> > > native_save_fl() is marked static inline, but by using it as
> > > a function pointer in arch/x86/kernel/paravirt.c, it MUST be outlined.
> >
> > > --- a/arch/x86/include/asm/irqflags.h
> > > +++ b/arch/x86/include/asm/irqflags.h
> > > @@ -13,7 +13,7 @@
> > >   * Interrupt control:
> > >   */
> > >
> > > -static inline unsigned long native_save_fl(void)
> > > +extern inline unsigned long native_save_fl(void)
> > >  {
> > >   unsigned long flags;
> > >
> >
> > What's the code generation effect of this on say a defconfig kernel vmlinux 
> > with
> > paravirt enabled?
> 
> Starting with this patch set applied:
> $ make CC=gcc-8 -j46
> $ objdump -d vmlinux | grep native_save_fl --context=3
> 81059140 :
> 81059140: 9cpushfq
> 81059141: 58pop%rax
> 81059142: c3retq
> $ git checkout HEAD~3
> $ make CC=gcc-8 -j46
> $ objdump -d vmlinux | grep native_save_fl --context=3
> 81079410 :
> 81079410: 9cpushfq
> 81079411: 58pop%rax
> 81079412: c3retq
> 
> Mainly, this is to prevent the compiler from adding a stack protector
> to the outlined version, as the stack protector clobbers %rcx, but
> paravirt expects %rcx to be preserved. More info can be found:
> https://lkml.org/lkml/2018/5/24/1242

Ok!

Acked-by: Ingo Molnar 

What's the planned upstreaming route for these patches/fixes?

Thanks,

Ingo