Re: [RFC/PATCH 1/1] virtio: Introduce MMIO ops

2020-04-30 Thread Jan Kiszka

On 30.04.20 13:11, Srivatsa Vaddagiri wrote:

* Will Deacon  [2020-04-30 11:41:50]:


On Thu, Apr 30, 2020 at 04:04:46PM +0530, Srivatsa Vaddagiri wrote:

If CONFIG_VIRTIO_MMIO_OPS is defined, then I expect this to be unconditionally
set to 'magic_qcom_ops', which uses a hypervisor-supported interface for IO (for
example: message_queue_send() and message_queue_receive() hypercalls).
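
For illustration, the kind of ops table being discussed might look roughly
like this. This is only a sketch: the struct layout is invented here, and
message_queue_send()/message_queue_receive() are just the hypercall names
mentioned above, not an existing interface.

struct virtio_mmio_ops {
        u32  (*readl)(struct virtio_mmio_device *vm_dev, unsigned long offset);
        void (*writel)(struct virtio_mmio_device *vm_dev, u32 val,
                       unsigned long offset);
};

static u32 qcom_mmio_readl(struct virtio_mmio_device *vm_dev,
                           unsigned long offset)
{
        u32 val;

        /* Ask the hypervisor to perform the register read on our behalf. */
        message_queue_send(vm_dev->msgq, MSG_MMIO_READ, offset);
        message_queue_receive(vm_dev->msgq, &val);
        return val;
}

static const struct virtio_mmio_ops magic_qcom_ops = {
        .readl  = qcom_mmio_readl,
        /* .writel would wrap an MSG_MMIO_WRITE message the same way */
};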


Hmm, but then how would such a kernel work as a guest under all the
spec-compliant hypervisors out there?


OK, I see your point, and yes, for better binary compatibility the ops have to
be set based on runtime detection of hypervisor capabilities.
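
That is, something along these lines at probe time instead of a compile-time
switch. Again just a sketch: hypervisor_has_msgq_transport() and
virtio_mmio_default_ops stand in for whatever detection hook and default
accessors a real implementation would use.

static int virtio_mmio_probe(struct platform_device *pdev)
{
        struct virtio_mmio_device *vm_dev;

        vm_dev = devm_kzalloc(&pdev->dev, sizeof(*vm_dev), GFP_KERNEL);
        if (!vm_dev)
                return -ENOMEM;

        /* Default: spec-compliant trapped MMIO, works on any hypervisor. */
        vm_dev->ops = &virtio_mmio_default_ops;

        /* Divert to the hypercall path only if the host advertises it. */
        if (hypervisor_has_msgq_transport())
                vm_dev->ops = &magic_qcom_ops;

        /* ...rest of the normal probe path... */
        return 0;
}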


Ok. I guess the other option is to standardize on a new virtio transport (like
ivshmem2-virtio)?


I haven't looked at that, but I suppose it depends on what your hypervisor
folks are willing to accommodate.


I believe ivshmem2_virtio requires the hypervisor to support PCI device
emulation (for life-cycle management of VMs), which our hypervisor may not
support. A simple shared-memory-and-doorbell or message-queue based transport
will work for us.
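
In its simplest form, the guest-side notify path of such a transport is a
single doorbell write. A sketch, with all names made up; only the virtqueue
notify hook signature is taken from the existing virtio core:

static bool shmem_virtio_notify(struct virtqueue *vq)
{
        struct shmem_vdev *sdev = to_shmem_vdev(vq->vdev);

        /*
         * Rings and buffers already live in the statically shared memory
         * region; only the kick needs hypervisor involvement.
         */
        writel(vq->index, sdev->doorbell);  /* or a message-queue hypercall */
        return true;
}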


As written in our private conversation, a mapping of the ivshmem2 device 
discovery to a platform mechanism (device tree etc.), and maybe even of the 
register access for doorbell and life-cycle management to something 
hypercall-like, would be imaginable. What would count more from the virtio 
perspective is a common mapping on a shared memory transport.
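
On the guest kernel side, such platform-based discovery could be as small as
an of_device_id match. A sketch: the compatible string is made up for
illustration, and ivshmem2_probe() (which would map the shared memory and
doorbell registers) is left out.

static const struct of_device_id ivshmem2_of_match[] = {
        { .compatible = "siemens,ivshmem2" },   /* hypothetical binding */
        { /* sentinel */ }
};

static struct platform_driver ivshmem2_driver = {
        .driver = {
                .name           = "ivshmem2",
                .of_match_table = ivshmem2_of_match,
        },
        .probe  = ivshmem2_probe,
};
module_platform_driver(ivshmem2_driver);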


That said, I also warned about all the features that PCI already defines 
(such as message-based interrupts), which you may have to re-add when going 
a different way for the shared memory device.


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


Re: [virtio-dev] Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Jan Kiszka

On 29.04.20 12:45, Michael S. Tsirkin wrote:

On Wed, Apr 29, 2020 at 12:26:43PM +0200, Jan Kiszka wrote:

On 29.04.20 12:20, Michael S. Tsirkin wrote:

On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:

That would still not work, I think, where swiotlb is used for pass-through
devices (when private memory is fine) as well as for virtio devices (when
shared memory is required).


So that is a separate question. When there are multiple untrusted
devices, at the moment it looks like a single bounce buffer is used.

Which to me seems like a security problem; I think we should protect
untrusted devices from each other.
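
In concrete terms that would mean something like one bounce pool per
untrusted device instead of the single global swiotlb. A purely conceptual
sketch, not an existing kernel interface (bounce_alloc() and the pool wiring
are made up):

/* Per-device pool: device A can never observe data bounced for device B. */
struct bounce_pool {
        void            *vaddr;         /* backing memory, device-visible */
        dma_addr_t      dma_base;
        size_t          size;
        unsigned long   *bitmap;        /* allocation state */
};

static dma_addr_t bounce_map(struct bounce_pool *pool, void *buf,
                             size_t len, enum dma_data_direction dir)
{
        size_t off = bounce_alloc(pool, len);   /* hypothetical allocator */

        if (dir != DMA_FROM_DEVICE)
                memcpy(pool->vaddr + off, buf, len);
        return pool->dma_base + off;
}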



Definitely. That's the model we have for ivshmem-virtio as well.

Jan


Want to try implementing that?



The desire is definitely there, currently "just" not the time.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH 5/5] virtio: Add bounce DMA ops

2020-04-29 Thread Jan Kiszka

On 29.04.20 12:20, Michael S. Tsirkin wrote:

On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:

That would still not work, I think, where swiotlb is used for pass-through
devices (when private memory is fine) as well as for virtio devices (when
shared memory is required).


So that is a separate question. When there are multiple untrusted
devices, at the moment it looks like a single bounce buffer is used.

Which to me seems like a security problem; I think we should protect
untrusted devices from each other.



Definitely. That's the model we have for ivshmem-virtio as well.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


Re: VIRTIO adoption in other hypervisors

2020-02-28 Thread Jan Kiszka

On 28.02.20 17:47, Alex Bennée wrote:


Jan Kiszka  writes:


On 28.02.20 11:30, Jan Kiszka wrote:

On 28.02.20 11:16, Alex Bennée wrote:

Hi,




I believe there has been some development work for supporting VIRTIO on
Xen although it seems to have stalled according to:

https://wiki.xenproject.org/wiki/Virtio_On_Xen

Recently at KVM Forum there was Jan's talk about Inter-VM shared memory
which proposed ivshmemv2 as a VIRTIO transport:

https://events19.linuxfoundation.org/events/kvm-forum-2019/program/schedule/


As I understood it, this would allow Xen (and other hypervisors) a simple
way to carry virtio traffic between guest and endpoint.


And to clarify the scope of this effort: virtio-over-ivshmem is not
the fastest option to offer virtio to a guest (static "DMA" window),
but it is the simplest one from the hypervisor PoV and, thus, also
likely the easiest one to argue over when it comes to security and
safety.


So to drill down on this: is this a particular problem with type-1
hypervisors?


Well, this typing doesn't help here (like it rarely does). There are 
kvm-based setups that are stripped down and hardened in a way that would 
make other folks rather think of "type 1". I just had a discussion 
around such a model for a cloud scenario that runs on kvm.




It seems to me any KVM-like run loop trivially supports a range of
virtio devices by virtue of trapping accesses to the signalling area of
a virtqueue and allowing the VMM to handle the transaction whichever
way it sees fit.
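
Stripped of error handling, the userspace half of such a loop is indeed
tiny. A sketch against the KVM API; VIRTIO_NOTIFY_ADDR and
handle_queue_notify() are placeholders for the VMM's own device model:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define VIRTIO_NOTIFY_ADDR 0x0a000050UL /* hypothetical QueueNotify address */

static void handle_queue_notify(uint32_t queue);        /* VMM device model */

static void vcpu_loop(int vcpu_fd, struct kvm_run *run)
{
        for (;;) {
                ioctl(vcpu_fd, KVM_RUN, NULL);

                if (run->exit_reason == KVM_EXIT_MMIO && run->mmio.is_write &&
                    run->mmio.phys_addr == VIRTIO_NOTIFY_ADDR) {
                        uint32_t queue;

                        /* The guest kicked a virtqueue: run the backend. */
                        memcpy(&queue, run->mmio.data, sizeof(queue));
                        handle_queue_notify(queue);
                }
        }
}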

I've not quite understood the way Xen interfaces to QEMU, aside from the fact
that it's different from everything else. Moreover, it seems the type-1
hypervisors are more interested in providing better isolation between segments
of a system, whereas VIRTIO currently assumes either the VMM or the hypervisor
has full access to the guest address space. I've seen quite a lot of
slides that want to isolate sections of device emulation into separate
processes or even separate guest VMs.


The point is in fact not only whether to trap IO accesses or to ask the 
guest to rather target something like ivshmem (in fact, that is where the 
use cases I have in mind deviated from those of that cloud operator). It 
is specifically the question of how the backend should be able to transfer 
data to/from the frontend. If you want to isolate both from each 
other (driver VMs/domains/etc.), you either need a complex virtual IOMMU 
(or "grant tables") or a static DMA window (like ivshmem). The former 
is more efficient with large transfers; the latter is much simpler and 
therefore more robust.
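
To make the trade-off concrete: with a static window, the frontend side of
a transfer boils down to a copy plus a doorbell, while the vIOMMU variant
would map and unmap guest pages per request. A sketch with made-up names:

static int shmem_send(struct shmem_ring *ring, const void *buf, size_t len)
{
        size_t off;

        if (len > ring->window_size)
                return -EMSGSIZE;

        off = ring_alloc(ring, len);            /* hypothetical allocator */
        memcpy(ring->window + off, buf, len);   /* bounce into shared window */
        ring_push_desc(ring, off, len);         /* publish the descriptor */
        ring_kick(ring);                        /* doorbell to the backend */
        return 0;
}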


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

Re: VIRTIO adoption in other hypervisors

2020-02-28 Thread Jan Kiszka

On 28.02.20 11:30, Jan Kiszka wrote:

On 28.02.20 11:16, Alex Bennée wrote:

Hi,

I'm currently trying to get my head around virtio and was wondering how
widespread adoption of virtio is amongst the various hypervisors and
emulators out there.

Obviously I'm familiar with QEMU both via KVM and even when just doing
plain emulation (although with some restrictions). As far as I'm aware
the various Rust-based VMMs have varying degrees of support for virtio
devices over KVM as well. CrosVM specifically is embracing virtio for
multi-process device emulation.

I believe there has been some development work for supporting VIRTIO on
Xen although it seems to have stalled according to:

   https://wiki.xenproject.org/wiki/Virtio_On_Xen

Recently at KVM Forum there was Jan's talk about Inter-VM shared memory
which proposed ivshmemv2 as a VIRTIO transport:

   
https://events19.linuxfoundation.org/events/kvm-forum-2019/program/schedule/ 



As I understood it, this would allow Xen (and other hypervisors) a simple
way to carry virtio traffic between guest and endpoint.


And to clarify the scope of this effort: virtio-over-ivshmem is not the 
fastest option to offer virtio to a guest (static "DMA" window), but it 
is the simplest one from the hypervisor PoV and, thus, also likely the 
easiest one to argue over when it comes to security and safety.


Jan



So some questions:

   - Am I missing anything out in that summary?
   - How about Hyper-V and the OS X equivalent?
   - Do any other type-1 hypervisors support virtio?


Off the top of my head, some other hypervisors with virtio support 
(irrespective of any classification):


https://wiki.freebsd.org/bhyve
https://projectacrn.org/
http://www.xhypervisor.org/
https://www.opensynergy.com/automotive-hypervisor/

But there are likely more.

Jan



--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

Re: VIRTIO adoption in other hypervisors

2020-02-28 Thread Jan Kiszka

On 28.02.20 11:16, Alex Bennée wrote:

Hi,

I'm currently trying to get my head around virtio and was wondering how
widespread adoption of virtio is amongst the various hypervisors and
emulators out there.

Obviously I'm familiar with QEMU both via KVM and even when just doing
plain emulation (although with some restrictions). As far as I'm aware
the various Rust-based VMMs have varying degrees of support for virtio
devices over KVM as well. CrosVM specifically is embracing virtio for
multi-process device emulation.

I believe there has been some development work for supporting VIRTIO on
Xen although it seems to have stalled according to:

   https://wiki.xenproject.org/wiki/Virtio_On_Xen

Recently at KVM Forum there was Jan's talk about Inter-VM shared memory
which proposed ivshmemv2 as a VIRTIO transport:

   https://events19.linuxfoundation.org/events/kvm-forum-2019/program/schedule/

As I understood it, this would allow Xen (and other hypervisors) a simple
way to carry virtio traffic between guest and endpoint.

So some questions:

   - Am I missing anything out in that summary?
   - How about Hyper-V and the OS X equivalent?
   - Do any other type-1 hypervisors support virtio?


Off the top of my head, some other hypervisors with virtio support 
(irrespective of any classification):


https://wiki.freebsd.org/bhyve
https://projectacrn.org/
http://www.xhypervisor.org/
https://www.opensynergy.com/automotive-hypervisor/

But there are likely more.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

Re: [PATCH] tools/virtio: Fix build

2019-10-13 Thread Jan Kiszka
On 13.10.19 14:20, Michael S. Tsirkin wrote:
> On Sun, Oct 13, 2019 at 02:01:03PM +0200, Jan Kiszka wrote:
>> On 13.10.19 13:52, Michael S. Tsirkin wrote:
>>> On Sun, Oct 13, 2019 at 11:03:30AM +0200, Jan Kiszka wrote:
>>>> From: Jan Kiszka 
>>>>
>>>> Various changes in the recent kernel versions broke the build due to
>>>> missing function and header stubs.
>>>>
>>>> Signed-off-by: Jan Kiszka 
>>>
>>> Thanks!
>>> I think it's already fixed in the vhost tree.
>>> That tree also includes a bugfix for the test.
>>> Can you pls give it a spin and report?
>>
>> Mostly fixed: the xen_domain stub is missing.
>>
>> Jan
>
> That's in xen/xen.h. Do you still see any build errors?

ca16cf7b30ca79eeca4d612af121e664ee7d8737 lacks this - forgot to add it to
some commit?

Jan


Re: [PATCH] tools/virtio: Fix build

2019-10-13 Thread Jan Kiszka
On 13.10.19 13:52, Michael S. Tsirkin wrote:
> On Sun, Oct 13, 2019 at 11:03:30AM +0200, Jan Kiszka wrote:
>> From: Jan Kiszka 
>>
>> Various changes in the recent kernel versions broke the build due to
>> missing function and header stubs.
>>
>> Signed-off-by: Jan Kiszka 
>
> Thanks!
> I think it's already fixed in the vhost tree.
> That tree also includes a bugfix for the test.
> Can you pls give it a spin and report?

Mostly fixed: the xen_domain stub is missing.

Jan

> Thanks!
>
>> ---
>>  tools/virtio/crypto/hash.h   | 0
>>  tools/virtio/linux/dma-mapping.h | 2 ++
>>  tools/virtio/linux/kernel.h  | 2 ++
>>  3 files changed, 4 insertions(+)
>>  create mode 100644 tools/virtio/crypto/hash.h
>>
>> diff --git a/tools/virtio/crypto/hash.h b/tools/virtio/crypto/hash.h
>> new file mode 100644
>> index 000000000000..e69de29bb2d1
>> diff --git a/tools/virtio/linux/dma-mapping.h 
>> b/tools/virtio/linux/dma-mapping.h
>> index f91aeb5fe571..db96cb4bf877 100644
>> --- a/tools/virtio/linux/dma-mapping.h
>> +++ b/tools/virtio/linux/dma-mapping.h
>> @@ -29,4 +29,6 @@ enum dma_data_direction {
>>  #define dma_unmap_single(...) do { } while (0)
>>  #define dma_unmap_page(...) do { } while (0)
>>
>> +#define dma_max_mapping_size(d) 0
>> +
>>  #endif
>> diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h
>> index 6683b4a70b05..ccf321173210 100644
>> --- a/tools/virtio/linux/kernel.h
>> +++ b/tools/virtio/linux/kernel.h
>> @@ -141,4 +141,6 @@ static inline void free_page(unsigned long addr)
>>  #define list_for_each_entry(a, b, c) while (0)
>>  /* end of stubs */
>>
>> +#define xen_domain() 0
>> +
>>  #endif /* KERNEL_H */
>> --
>> 2.16.4


[PATCH] tools/virtio: Fix build

2019-10-13 Thread Jan Kiszka
From: Jan Kiszka 

Various changes in the recent kernel versions broke the build due to
missing function and header stubs.

Signed-off-by: Jan Kiszka 
---
 tools/virtio/crypto/hash.h   | 0
 tools/virtio/linux/dma-mapping.h | 2 ++
 tools/virtio/linux/kernel.h  | 2 ++
 3 files changed, 4 insertions(+)
 create mode 100644 tools/virtio/crypto/hash.h

diff --git a/tools/virtio/crypto/hash.h b/tools/virtio/crypto/hash.h
new file mode 100644
index 000000000000..e69de29bb2d1
diff --git a/tools/virtio/linux/dma-mapping.h b/tools/virtio/linux/dma-mapping.h
index f91aeb5fe571..db96cb4bf877 100644
--- a/tools/virtio/linux/dma-mapping.h
+++ b/tools/virtio/linux/dma-mapping.h
@@ -29,4 +29,6 @@ enum dma_data_direction {
 #define dma_unmap_single(...) do { } while (0)
 #define dma_unmap_page(...) do { } while (0)

+#define dma_max_mapping_size(d) 0
+
 #endif
diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h
index 6683b4a70b05..ccf321173210 100644
--- a/tools/virtio/linux/kernel.h
+++ b/tools/virtio/linux/kernel.h
@@ -141,4 +141,6 @@ static inline void free_page(unsigned long addr)
 #define list_for_each_entry(a, b, c) while (0)
 /* end of stubs */

+#define xen_domain() 0
+
 #endif /* KERNEL_H */
--
2.16.4



[PATCH v5 3/7] x86/jailhouse: Enable PCI mmconfig access in inmates

2018-03-06 Thread Jan Kiszka
From: Otavio Pontes <otavio.pon...@intel.com>

Use the PCI mmconfig base address exported by jailhouse in boot
parameters in order to access the memory mapped PCI configuration space.

Signed-off-by: Otavio Pontes <otavio.pon...@intel.com>
[Jan: rebased, fixed !CONFIG_PCI_MMCONFIG, used pcibios_last_bus]
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevche...@gmail.com>
---
 arch/x86/include/asm/pci_x86.h | 2 ++
 arch/x86/kernel/jailhouse.c    | 8 ++++++++
 arch/x86/pci/mmconfig-shared.c | 4 ++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index eb66fa9cd0fc..959d618dbb17 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -151,6 +151,8 @@ extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
   phys_addr_t addr);
 extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+   int end, u64 addr);
 
 extern struct list_head pci_mmcfg_list;
 
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index b68fd895235a..fa183a131edc 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -124,6 +124,14 @@ static int __init jailhouse_pci_arch_init(void)
if (pcibios_last_bus < 0)
pcibios_last_bus = 0xff;
 
+#ifdef CONFIG_PCI_MMCONFIG
+   if (setup_data.pci_mmconfig_base) {
+   pci_mmconfig_add(0, 0, pcibios_last_bus,
+setup_data.pci_mmconfig_base);
+   pci_mmcfg_arch_init();
+   }
+#endif
+
return 0;
 }
 
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 96684d0adcf9..0e590272366b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -94,8 +94,8 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
return new;
 }
 
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
-   int end, u64 addr)
+struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+int end, u64 addr)
 {
struct pci_mmcfg_region *new;
 
-- 
2.13.6



[PATCH v5 5/7] x86: Consolidate PCI_MMCONFIG configs

2018-03-06 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Since e279b6c1d329 ("x86: start unification of arch/x86/Kconfig.*"), we
have two PCI_MMCONFIG entries, one from the original i386 and another
from x86_64. This consolidates both entries into a single one.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c19f5342ec2b..8986a6b6e3df 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2641,8 +2641,10 @@ config PCI_DIRECT
depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))
 
 config PCI_MMCONFIG
-   def_bool y
-   depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
+   bool "Support mmconfig PCI config space access" if X86_64
+   default y
+   depends on PCI && (ACPI || SFI)
+   depends on X86_64 || (PCI_GOANY || PCI_GOMMCONFIG)
 
 config PCI_OLPC
def_bool y
@@ -2657,11 +2659,6 @@ config PCI_DOMAINS
def_bool y
depends on PCI
 
-config PCI_MMCONFIG
-   bool "Support mmconfig PCI config space access"
-   default y
-   depends on X86_64 && PCI && (ACPI || SFI)
-
 config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
-- 
2.13.6



[PATCH v5 0/7] jailhouse: Enhance secondary Jailhouse guest support /wrt PCI

2018-03-06 Thread Jan Kiszka
Basic x86 support [1] for running Linux as a secondary Jailhouse [2] guest
is currently pending in the tip tree. This builds on top of that and enhances
the PCI support for x86 and also ARM guests (ARM[64] does not require
platform patches and works already).

Key elements of this series are:
 - detection of Jailhouse via device tree hypervisor node
 - function-level PCI scan if Jailhouse is detected
 - MMCONFIG support for x86 guests

As most changes affect x86, I would suggest routing the series also via
tip after the necessary acks are collected.

Changes in v5:
 - fix build breakage of patch 6 on i386

Changes in v4:
 - split up Kconfig changes
 - respect pcibios_last_bus during mmconfig setup
 - cosmetic changes requested by Andy

Changes in v3:
 - avoided duplicate scans of PCI functions under Jailhouse
 - reformatted PCI_MMCONFIG condition and rephrased related commit log

Changes in v2:
 - adjusted commit log and include ordering in patch 2
 - rebased over Linus master

Jan

[1] https://lkml.org/lkml/2017/11/27/125
[2] http://jailhouse-project.org

CC: Benedikt Spranger <b.spran...@linutronix.de>
CC: Juergen Gross <jgr...@suse.com>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Otavio Pontes <otavio.pon...@intel.com>
CC: Rob Herring <robh...@kernel.org>

Jan Kiszka (6):
  jailhouse: Provide detection for non-x86 systems
  PCI: Scan all functions when running over Jailhouse
  x86: Align x86_64 PCI_MMCONFIG with 32-bit variant
  x86: Consolidate PCI_MMCONFIG configs
  x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
  MAINTAINERS: Add entry for Jailhouse

Otavio Pontes (1):
  x86/jailhouse: Enable PCI mmconfig access in inmates

 Documentation/devicetree/bindings/jailhouse.txt |  8 
 MAINTAINERS |  7 +++
 arch/x86/Kconfig| 12 +++-
 arch/x86/include/asm/jailhouse_para.h   |  2 +-
 arch/x86/include/asm/pci_x86.h  |  2 ++
 arch/x86/kernel/Makefile|  2 +-
 arch/x86/kernel/cpu/amd.c   |  2 +-
 arch/x86/kernel/jailhouse.c |  8 
 arch/x86/pci/legacy.c   |  4 +++-
 arch/x86/pci/mmconfig-shared.c  |  4 ++--
 drivers/pci/probe.c | 22 +++---
 include/linux/hypervisor.h  | 17 +++--
 12 files changed, 74 insertions(+), 16 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

-- 
2.13.6



[PATCH v5 6/7] x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI

2018-03-06 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure the
latter can be built without having to enable ACPI as well. Primarily, we
need to make the AMD mmconf-fam10h_64 depend upon MMCONFIG and ACPI,
instead of just the former.

Saves some bytes in the Jailhouse non-root kernel.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig  | 6 +-
 arch/x86/kernel/Makefile  | 2 +-
 arch/x86/kernel/cpu/amd.c | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8986a6b6e3df..b53340e71f84 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2643,7 +2643,7 @@ config PCI_DIRECT
 config PCI_MMCONFIG
bool "Support mmconfig PCI config space access" if X86_64
default y
-   depends on PCI && (ACPI || SFI)
+   depends on PCI && (ACPI || SFI || JAILHOUSE_GUEST)
depends on X86_64 || (PCI_GOANY || PCI_GOMMCONFIG)
 
 config PCI_OLPC
@@ -2659,6 +2659,10 @@ config PCI_DOMAINS
def_bool y
depends on PCI
 
+config MMCONF_FAM10H
+   def_bool y
+   depends on X86_64 && PCI_MMCONFIG && ACPI
+
 config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 29786c87e864..73ccf80c09a2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -146,6 +146,6 @@ ifeq ($(CONFIG_X86_64),y)
obj-$(CONFIG_GART_IOMMU)+= amd_gart_64.o aperture_64.o
obj-$(CONFIG_CALGARY_IOMMU) += pci-calgary_64.o tce_64.o
 
-   obj-$(CONFIG_PCI_MMCONFIG)  += mmconf-fam10h_64.o
+   obj-$(CONFIG_MMCONF_FAM10H) += mmconf-fam10h_64.o
obj-y   += vsmp_64.o
 endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f0e6456ca7d3..12bc0a1139da 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -716,7 +716,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
 
 static void init_amd_gh(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_MMCONF_FAM10H
/* do this for boot cpu */
if (c == &boot_cpu_data)
check_enable_amd_mmconf_dmi();
-- 
2.13.6



[PATCH v5 4/7] x86: Align x86_64 PCI_MMCONFIG with 32-bit variant

2018-03-06 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Allow PCI_MMCONFIG to be enabled when only SFI is present and make this
option default to on. This will help consolidate both into one Kconfig
statement.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb7f43f23521..c19f5342ec2b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2659,7 +2659,8 @@ config PCI_DOMAINS
 
 config PCI_MMCONFIG
bool "Support mmconfig PCI config space access"
-   depends on X86_64 && PCI && ACPI
+   default y
+   depends on X86_64 && PCI && (ACPI || SFI)
 
 config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
-- 
2.13.6



[PATCH v5 2/7] PCI: Scan all functions when running over Jailhouse

2018-03-06 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
have a function 0.  Therefore, Linux scans for devices at function 0
(devfn 0/8/16/...) and only scans for other functions if function 0
has its Multi-Function Device bit set or ARI or SR-IOV indicate
there are more functions.

The Jailhouse hypervisor may pass individual functions of a
multi-function device to a guest without passing function 0, which
means a Linux guest won't find them.

Change Linux PCI probing so it scans all function numbers when
running as a guest over Jailhouse.

This is technically prohibited by the spec, so it is possible that
PCI devices without the Multi-Function Device bit set may have
unexpected behavior in response to this probe.

Derived from original patch by Benedikt Spranger.

CC: Benedikt Spranger <b.spran...@linutronix.de>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Acked-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Andy Shevchenko <andy.shevche...@gmail.com>
---
 arch/x86/pci/legacy.c |  4 +++-
 drivers/pci/probe.c   | 22 +++++++++++++++++++---
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 1cb01abcb1be..dfbe6ac38830 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -34,13 +35,14 @@ int __init pci_legacy_init(void)
 
 void pcibios_scan_specific_bus(int busn)
 {
+   int stride = jailhouse_paravirt() ? 1 : 8;
int devfn;
u32 l;
 
if (pci_find_bus(0, busn))
return;
 
-   for (devfn = 0; devfn < 256; devfn += 8) {
+   for (devfn = 0; devfn < 256; devfn += stride) {
		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
		    l != 0x0000 && l != 0xffff) {
			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ef5377438a1e..3c365dc996e7 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "pci.h"
@@ -2518,14 +2519,29 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
 {
unsigned int used_buses, normal_bridges = 0, hotplug_bridges = 0;
unsigned int start = bus->busn_res.start;
-   unsigned int devfn, cmax, max = start;
+   unsigned int devfn, fn, cmax, max = start;
struct pci_dev *dev;
+   int nr_devs;
 
dev_dbg(&bus->dev, "scanning bus\n");
 
/* Go find them, Rover! */
-   for (devfn = 0; devfn < 0x100; devfn += 8)
-   pci_scan_slot(bus, devfn);
+   for (devfn = 0; devfn < 256; devfn += 8) {
+   nr_devs = pci_scan_slot(bus, devfn);
+
+   /*
+* The Jailhouse hypervisor may pass individual functions of a
+* multi-function device to a guest without passing function 0.
+* Look for them as well.
+*/
+   if (jailhouse_paravirt() && nr_devs == 0) {
+   for (fn = 1; fn < 8; fn++) {
+   dev = pci_scan_single_device(bus, devfn + fn);
+   if (dev)
+   dev->multifunction = 1;
+   }
+   }
+   }
 
/* Reserve buses for SR-IOV capability */
used_buses = pci_iov_bus_range(bus);
-- 
2.13.6



[PATCH v5 1/7] jailhouse: Provide detection for non-x86 systems

2018-03-06 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Implement jailhouse_paravirt() via device tree probing on architectures
!= x86. Will be used by the PCI core.

CC: Rob Herring <robh...@kernel.org>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Juergen Gross <jgr...@suse.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Reviewed-by: Juergen Gross <jgr...@suse.com>
---
 Documentation/devicetree/bindings/jailhouse.txt |  8 ++++++++
 arch/x86/include/asm/jailhouse_para.h   |  2 +-
 include/linux/hypervisor.h  | 17 +++++++++++++++--
 3 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

diff --git a/Documentation/devicetree/bindings/jailhouse.txt 
b/Documentation/devicetree/bindings/jailhouse.txt
new file mode 100644
index 000000000000..2901c25ff340
--- /dev/null
+++ b/Documentation/devicetree/bindings/jailhouse.txt
@@ -0,0 +1,8 @@
+Jailhouse non-root cell device tree bindings
+--------------------------------------------
+
+When running in a non-root Jailhouse cell (partition), the device tree of this
+platform shall have a top-level "hypervisor" node with the following
+properties:
+
+- compatible = "jailhouse,cell"
diff --git a/arch/x86/include/asm/jailhouse_para.h 
b/arch/x86/include/asm/jailhouse_para.h
index 875b54376689..b885a961a150 100644
--- a/arch/x86/include/asm/jailhouse_para.h
+++ b/arch/x86/include/asm/jailhouse_para.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL2.0 */
 
 /*
- * Jailhouse paravirt_ops implementation
+ * Jailhouse paravirt detection
  *
  * Copyright (c) Siemens AG, 2015-2017
  *
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index b19563f9a8eb..fc08b433c856 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -8,15 +8,28 @@
  */
 
 #ifdef CONFIG_X86
+
+#include 
 #include 
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
x86_platform.hyper.pin_vcpu(cpu);
 }
-#else
+
+#else /* !CONFIG_X86 */
+
+#include 
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 }
-#endif
+
+static inline bool jailhouse_paravirt(void)
+{
+   return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
+}
+
+#endif /* !CONFIG_X86 */
 
 #endif /* __LINUX_HYPEVISOR_H */
-- 
2.13.6



[PATCH v5 7/7] MAINTAINERS: Add entry for Jailhouse

2018-03-06 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4623caf8d72d..6dc0b8f3ae0e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7523,6 +7523,13 @@ Q: http://patchwork.linuxtv.org/project/linux-media/list/
 S: Maintained
 F: drivers/media/dvb-frontends/ix2505v*
 
+JAILHOUSE HYPERVISOR INTERFACE
+M:     Jan Kiszka <jan.kis...@siemens.com>
+L: jailhouse-...@googlegroups.com
+S: Maintained
+F: arch/x86/kernel/jailhouse.c
+F: arch/x86/include/asm/jailhouse_para.h
+
 JC42.4 TEMPERATURE SENSOR DRIVER
 M: Guenter Roeck <li...@roeck-us.net>
 L: linux-hw...@vger.kernel.org
-- 
2.13.6



[PATCH v4 2/7] PCI: Scan all functions when running over Jailhouse

2018-03-04 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
have a function 0.  Therefore, Linux scans for devices at function 0
(devfn 0/8/16/...) and only scans for other functions if function 0
has its Multi-Function Device bit set or ARI or SR-IOV indicate
there are more functions.

The Jailhouse hypervisor may pass individual functions of a
multi-function device to a guest without passing function 0, which
means a Linux guest won't find them.

Change Linux PCI probing so it scans all function numbers when
running as a guest over Jailhouse.

This is technically prohibited by the spec, so it is possible that
PCI devices without the Multi-Function Device bit set may have
unexpected behavior in response to this probe.

Derived from original patch by Benedikt Spranger.

CC: Benedikt Spranger <b.spran...@linutronix.de>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Acked-by: Bjorn Helgaas <bhelg...@google.com>
Reviewed-by: Andy Shevchenko <andy.shevche...@gmail.com>
---
 arch/x86/pci/legacy.c |  4 +++-
 drivers/pci/probe.c   | 22 +++---
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 1cb01abcb1be..dfbe6ac38830 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -34,13 +35,14 @@ int __init pci_legacy_init(void)
 
 void pcibios_scan_specific_bus(int busn)
 {
+   int stride = jailhouse_paravirt() ? 1 : 8;
int devfn;
u32 l;
 
if (pci_find_bus(0, busn))
return;
 
-   for (devfn = 0; devfn < 256; devfn += 8) {
+   for (devfn = 0; devfn < 256; devfn += stride) {
		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
		    l != 0x0000 && l != 0xffff) {
			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ef5377438a1e..3c365dc996e7 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "pci.h"
@@ -2518,14 +2519,29 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
 {
unsigned int used_buses, normal_bridges = 0, hotplug_bridges = 0;
unsigned int start = bus->busn_res.start;
-   unsigned int devfn, cmax, max = start;
+   unsigned int devfn, fn, cmax, max = start;
struct pci_dev *dev;
+   int nr_devs;
 
dev_dbg(&bus->dev, "scanning bus\n");
 
/* Go find them, Rover! */
-   for (devfn = 0; devfn < 0x100; devfn += 8)
-   pci_scan_slot(bus, devfn);
+   for (devfn = 0; devfn < 256; devfn += 8) {
+   nr_devs = pci_scan_slot(bus, devfn);
+
+   /*
+* The Jailhouse hypervisor may pass individual functions of a
+* multi-function device to a guest without passing function 0.
+* Look for them as well.
+*/
+   if (jailhouse_paravirt() && nr_devs == 0) {
+   for (fn = 1; fn < 8; fn++) {
+   dev = pci_scan_single_device(bus, devfn + fn);
+   if (dev)
+   dev->multifunction = 1;
+   }
+   }
+   }
 
/* Reserve buses for SR-IOV capability */
used_buses = pci_iov_bus_range(bus);
-- 
2.13.6



[PATCH v4 0/7] jailhouse: Enhance secondary Jailhouse guest support /wrt PCI

2018-03-04 Thread Jan Kiszka
Basic x86 support [1] for running Linux as a secondary Jailhouse [2] guest
is currently pending in the tip tree. This builds on top of that and enhances
the PCI support for x86 and also ARM guests (ARM[64] does not require
platform patches and works already).

Key elements of this series are:
 - detection of Jailhouse via device tree hypervisor node
 - function-level PCI scan if Jailhouse is detected
 - MMCONFIG support for x86 guests

As most changes affect x86, I would suggest routing the series also via
tip after the necessary acks are collected.

Changes in v4:
 - split up Kconfig changes
 - respect pcibios_last_bus during mmconfig setup
 - cosmetic changes requested by Andy

Changes in v3:
 - avoided duplicate scans of PCI functions under Jailhouse
 - reformatted PCI_MMCONFIG condition and rephrased related commit log

Changes in v2:
 - adjusted commit log and include ordering in patch 2
 - rebased over Linus master

Jan

[1] https://lkml.org/lkml/2017/11/27/125
[2] http://jailhouse-project.org

CC: Benedikt Spranger <b.spran...@linutronix.de>
CC: Juergen Gross <jgr...@suse.com>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Otavio Pontes <otavio.pon...@intel.com>
CC: Rob Herring <robh...@kernel.org>

Jan Kiszka (6):
  jailhouse: Provide detection for non-x86 systems
  PCI: Scan all functions when running over Jailhouse
  x86: Align x86_64 PCI_MMCONFIG with 32-bit variant
  x86: Consolidate PCI_MMCONFIG configs
  x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
  MAINTAINERS: Add entry for Jailhouse

Otavio Pontes (1):
  x86/jailhouse: Enable PCI mmconfig access in inmates

 Documentation/devicetree/bindings/jailhouse.txt |  8 
 MAINTAINERS |  7 +++
 arch/x86/Kconfig| 12 +++-
 arch/x86/include/asm/jailhouse_para.h   |  2 +-
 arch/x86/include/asm/pci_x86.h  |  2 ++
 arch/x86/kernel/Makefile|  2 +-
 arch/x86/kernel/cpu/amd.c   |  2 +-
 arch/x86/kernel/jailhouse.c |  8 
 arch/x86/pci/legacy.c   |  4 +++-
 arch/x86/pci/mmconfig-shared.c  |  4 ++--
 drivers/pci/probe.c | 22 +++---
 include/linux/hypervisor.h  | 17 +++--
 12 files changed, 74 insertions(+), 16 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

-- 
2.13.6



[PATCH v4 3/7] x86/jailhouse: Enable PCI mmconfig access in inmates

2018-03-04 Thread Jan Kiszka
From: Otavio Pontes <otavio.pon...@intel.com>

Use the PCI mmconfig base address exported by jailhouse in boot
parameters in order to access the memory mapped PCI configuration space.

Signed-off-by: Otavio Pontes <otavio.pon...@intel.com>
[Jan: rebased, fixed !CONFIG_PCI_MMCONFIG, used pcibios_last_bus]
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/include/asm/pci_x86.h | 2 ++
 arch/x86/kernel/jailhouse.c    | 8 ++++++++
 arch/x86/pci/mmconfig-shared.c | 4 ++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index eb66fa9cd0fc..959d618dbb17 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -151,6 +151,8 @@ extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
   phys_addr_t addr);
 extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+   int end, u64 addr);
 
 extern struct list_head pci_mmcfg_list;
 
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index b68fd895235a..fa183a131edc 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -124,6 +124,14 @@ static int __init jailhouse_pci_arch_init(void)
if (pcibios_last_bus < 0)
pcibios_last_bus = 0xff;
 
+#ifdef CONFIG_PCI_MMCONFIG
+   if (setup_data.pci_mmconfig_base) {
+   pci_mmconfig_add(0, 0, pcibios_last_bus,
+setup_data.pci_mmconfig_base);
+   pci_mmcfg_arch_init();
+   }
+#endif
+
return 0;
 }
 
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 96684d0adcf9..0e590272366b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -94,8 +94,8 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
return new;
 }
 
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
-   int end, u64 addr)
+struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+int end, u64 addr)
 {
struct pci_mmcfg_region *new;
 
-- 
2.13.6



[PATCH v4 4/7] x86: Align x86_64 PCI_MMCONFIG with 32-bit variant

2018-03-04 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Allow PCI_MMCONFIG to be enabled when only SFI is present and make this
option default to on. This will help consolidate both into one Kconfig
statement.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb7f43f23521..c19f5342ec2b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2659,7 +2659,8 @@ config PCI_DOMAINS
 
 config PCI_MMCONFIG
bool "Support mmconfig PCI config space access"
-   depends on X86_64 && PCI && ACPI
+   default y
+   depends on X86_64 && PCI && (ACPI || SFI)
 
 config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
-- 
2.13.6



[PATCH v4 6/7] x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI

2018-03-04 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure the
latter can be built without having to enable ACPI as well. Primarily, we
need to make the AMD mmconf-fam10h_64 depend upon MMCONFIG and ACPI,
instead of just the former.

Saves some bytes in the Jailhouse non-root kernel.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig  | 6 +-
 arch/x86/kernel/Makefile  | 2 +-
 arch/x86/kernel/cpu/amd.c | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8986a6b6e3df..08a3236cb6f2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2643,7 +2643,7 @@ config PCI_DIRECT
 config PCI_MMCONFIG
bool "Support mmconfig PCI config space access" if X86_64
default y
-   depends on PCI && (ACPI || SFI)
+   depends on PCI && (ACPI || SFI || JAILHOUSE_GUEST)
depends on X86_64 || (PCI_GOANY || PCI_GOMMCONFIG)
 
 config PCI_OLPC
@@ -2659,6 +2659,10 @@ config PCI_DOMAINS
def_bool y
depends on PCI
 
+config MMCONF_FAM10H
+   def_bool y
+   depends on PCI_MMCONFIG && ACPI
+
 config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 29786c87e864..73ccf80c09a2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -146,6 +146,6 @@ ifeq ($(CONFIG_X86_64),y)
obj-$(CONFIG_GART_IOMMU)+= amd_gart_64.o aperture_64.o
obj-$(CONFIG_CALGARY_IOMMU) += pci-calgary_64.o tce_64.o
 
-   obj-$(CONFIG_PCI_MMCONFIG)  += mmconf-fam10h_64.o
+   obj-$(CONFIG_MMCONF_FAM10H) += mmconf-fam10h_64.o
obj-y   += vsmp_64.o
 endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f0e6456ca7d3..12bc0a1139da 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -716,7 +716,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
 
 static void init_amd_gh(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_MMCONF_FAM10H
/* do this for boot cpu */
if (c == &boot_cpu_data)
check_enable_amd_mmconf_dmi();
-- 
2.13.6



[PATCH v4 5/7] x86: Consolidate PCI_MMCONFIG configs

2018-03-04 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Since e279b6c1d329 ("x86: start unification of arch/x86/Kconfig.*"), we
have two PCI_MMCONFIG entries, one from the original i386 and another
from x86_64. This consolidates both entries into a single one.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c19f5342ec2b..8986a6b6e3df 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2641,8 +2641,10 @@ config PCI_DIRECT
depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))
 
 config PCI_MMCONFIG
-   def_bool y
-   depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
+   bool "Support mmconfig PCI config space access" if X86_64
+   default y
+   depends on PCI && (ACPI || SFI)
+   depends on X86_64 || (PCI_GOANY || PCI_GOMMCONFIG)
 
 config PCI_OLPC
def_bool y
@@ -2657,11 +2659,6 @@ config PCI_DOMAINS
def_bool y
depends on PCI
 
-config PCI_MMCONFIG
-   bool "Support mmconfig PCI config space access"
-   default y
-   depends on X86_64 && PCI && (ACPI || SFI)
-
 config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
-- 
2.13.6



[PATCH v4 1/7] jailhouse: Provide detection for non-x86 systems

2018-03-04 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Implement jailhouse_paravirt() via device tree probing on architectures
!= x86. Will be used by the PCI core.

CC: Rob Herring <robh...@kernel.org>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Juergen Gross <jgr...@suse.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Reviewed-by: Juergen Gross <jgr...@suse.com>
---
 Documentation/devicetree/bindings/jailhouse.txt |  8 ++++++++
 arch/x86/include/asm/jailhouse_para.h   |  2 +-
 include/linux/hypervisor.h  | 17 +++++++++++++++--
 3 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

diff --git a/Documentation/devicetree/bindings/jailhouse.txt 
b/Documentation/devicetree/bindings/jailhouse.txt
new file mode 100644
index 000000000000..2901c25ff340
--- /dev/null
+++ b/Documentation/devicetree/bindings/jailhouse.txt
@@ -0,0 +1,8 @@
+Jailhouse non-root cell device tree bindings
+--------------------------------------------
+
+When running in a non-root Jailhouse cell (partition), the device tree of this
+platform shall have a top-level "hypervisor" node with the following
+properties:
+
+- compatible = "jailhouse,cell"
diff --git a/arch/x86/include/asm/jailhouse_para.h 
b/arch/x86/include/asm/jailhouse_para.h
index 875b54376689..b885a961a150 100644
--- a/arch/x86/include/asm/jailhouse_para.h
+++ b/arch/x86/include/asm/jailhouse_para.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL2.0 */
 
 /*
- * Jailhouse paravirt_ops implementation
+ * Jailhouse paravirt detection
  *
  * Copyright (c) Siemens AG, 2015-2017
  *
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index b19563f9a8eb..fc08b433c856 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -8,15 +8,28 @@
  */
 
 #ifdef CONFIG_X86
+
+#include 
 #include 
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
x86_platform.hyper.pin_vcpu(cpu);
 }
-#else
+
+#else /* !CONFIG_X86 */
+
+#include 
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 }
-#endif
+
+static inline bool jailhouse_paravirt(void)
+{
+   return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
+}
+
+#endif /* !CONFIG_X86 */
 
 #endif /* __LINUX_HYPEVISOR_H */
-- 
2.13.6



[PATCH v4 7/7] MAINTAINERS: Add entry for Jailhouse

2018-03-04 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4623caf8d72d..6dc0b8f3ae0e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7523,6 +7523,13 @@ Q: http://patchwork.linuxtv.org/project/linux-media/list/
 S: Maintained
 F: drivers/media/dvb-frontends/ix2505v*
 
+JAILHOUSE HYPERVISOR INTERFACE
+M:     Jan Kiszka <jan.kis...@siemens.com>
+L: jailhouse-...@googlegroups.com
+S: Maintained
+F: arch/x86/kernel/jailhouse.c
+F: arch/x86/include/asm/jailhouse_para.h
+
 JC42.4 TEMPERATURE SENSOR DRIVER
 M: Guenter Roeck <li...@roeck-us.net>
 L: linux-hw...@vger.kernel.org
-- 
2.13.6



Re: [PATCH v3 3/6] x86/jailhouse: Enable PCI mmconfig access in inmates

2018-03-02 Thread Jan Kiszka
On 2018-03-01 11:31, Andy Shevchenko wrote:
> On Thu, Mar 1, 2018 at 7:40 AM, Jan Kiszka <jan.kis...@siemens.com> wrote:
> 
>> Use the PCI mmconfig base address exported by jailhouse in boot
>> parameters in order to access the memory mapped PCI configuration space.
> 
> 
>> --- a/arch/x86/kernel/jailhouse.c
>> +++ b/arch/x86/kernel/jailhouse.c
>> @@ -124,6 +124,13 @@ static int __init jailhouse_pci_arch_init(void)
>> if (pcibios_last_bus < 0)
>> pcibios_last_bus = 0xff;
>>
>> +#ifdef CONFIG_PCI_MMCONFIG
>> +   if (setup_data.pci_mmconfig_base) {
> 
>> +   pci_mmconfig_add(0, 0, 0xff, setup_data.pci_mmconfig_base);
> 
> Hmm... Shouldn't that be pcibios_last_bus instead of 0xff?
> 

Indeed.

Thanks,
Jan

>> +   pci_mmcfg_arch_init();
>> +   }
>> +#endif
> 
-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


[PATCH v3 5/6] x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI

2018-02-28 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure the
latter can be built without having to enable ACPI as well. Primarily, we
need to make the AMD mmconf-fam10h_64 depend upon MMCONFIG and ACPI,
instead of just the former.

Saves some bytes in the Jailhouse non-root kernel.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig  | 6 +-
 arch/x86/kernel/Makefile  | 2 +-
 arch/x86/kernel/cpu/amd.c | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index aef9d67ac186..b8e73e748acc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2643,7 +2643,7 @@ config PCI_DIRECT
 config PCI_MMCONFIG
bool "Support mmconfig PCI config space access" if X86_64
default y
-   depends on PCI && (ACPI || SFI) && (X86_64 || (PCI_GOANY || PCI_GOMMCONFIG))
+   depends on PCI && (ACPI || SFI || JAILHOUSE_GUEST) && (X86_64 || (PCI_GOANY || PCI_GOMMCONFIG))
 
 config PCI_OLPC
def_bool y
@@ -2658,6 +2658,10 @@ config PCI_DOMAINS
def_bool y
depends on PCI
 
+config MMCONF_FAM10H
+   def_bool y
+   depends on PCI_MMCONFIG && ACPI
+
 config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 29786c87e864..73ccf80c09a2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -146,6 +146,6 @@ ifeq ($(CONFIG_X86_64),y)
obj-$(CONFIG_GART_IOMMU)+= amd_gart_64.o aperture_64.o
obj-$(CONFIG_CALGARY_IOMMU) += pci-calgary_64.o tce_64.o
 
-   obj-$(CONFIG_PCI_MMCONFIG)  += mmconf-fam10h_64.o
+   obj-$(CONFIG_MMCONF_FAM10H) += mmconf-fam10h_64.o
obj-y   += vsmp_64.o
 endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f0e6456ca7d3..12bc0a1139da 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -716,7 +716,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
 
 static void init_amd_gh(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_MMCONF_FAM10H
/* do this for boot cpu */
if (c == &boot_cpu_data)
check_enable_amd_mmconf_dmi();
-- 
2.13.6



[PATCH v3 0/6] jailhouse: Enhance secondary Jailhouse guest support /wrt PCI

2018-02-28 Thread Jan Kiszka
Basic x86 support [1] for running Linux as a secondary Jailhouse [2] guest
is currently pending in the tip tree. This builds on top of that and enhances
the PCI support for x86 and also ARM guests (ARM[64] does not require
platform patches and works already).

Key elements of this series are:
 - detection of Jailhouse via device tree hypervisor node
 - function-level PCI scan if Jailhouse is detected
 - MMCONFIG support for x86 guests

As most changes affect x86, I would suggest routing the series also via
tip after the necessary acks are collected.

Changes in v3:
 - avoided duplicate scans of PCI functions under Jailhouse
 - reformatted PCI_MMCONFIG condition and rephrased related commit log

Changes in v2:
 - adjusted commit log and include ordering in patch 2
 - rebased over Linus master

Jan

[1] https://lkml.org/lkml/2017/11/27/125
[2] http://jailhouse-project.org

CC: Benedikt Spranger <b.spran...@linutronix.de>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Otavio Pontes <otavio.pon...@intel.com>
CC: Rob Herring <robh...@kernel.org>

Jan Kiszka (5):
  jailhouse: Provide detection for non-x86 systems
  PCI: Scan all functions when running over Jailhouse
  x86: Consolidate PCI_MMCONFIG configs
  x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
  MAINTAINERS: Add entry for Jailhouse

Otavio Pontes (1):
  x86/jailhouse: Enable PCI mmconfig access in inmates

 Documentation/devicetree/bindings/jailhouse.txt |  8 
 MAINTAINERS |  7 +++
 arch/x86/Kconfig| 11 ++-
 arch/x86/include/asm/jailhouse_para.h   |  2 +-
 arch/x86/include/asm/pci_x86.h  |  2 ++
 arch/x86/kernel/Makefile|  2 +-
 arch/x86/kernel/cpu/amd.c   |  2 +-
 arch/x86/kernel/jailhouse.c |  7 +++
 arch/x86/pci/legacy.c   |  4 +++-
 arch/x86/pci/mmconfig-shared.c  |  4 ++--
 drivers/pci/probe.c | 22 +++---
 include/linux/hypervisor.h  | 17 +++--
 12 files changed, 72 insertions(+), 16 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

-- 
2.13.6



[PATCH v3 4/6] x86: Consolidate PCI_MMCONFIG configs

2018-02-28 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Since e279b6c1d329 ("x86: start unification of arch/x86/Kconfig.*"), we
have two PCI_MMCONFIG entries, one from the original i386 and another
from x86_64. This consolidates both entries into a single one.

The logic for x86_32, where this option was not under user control,
remains identical. On x86_64, PCI_MMCONFIG becomes additionally
configurable for SFI systems even if ACPI was disabled. This just
simplifies the logic without restricting the configurability in any way.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb7f43f23521..aef9d67ac186 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2641,8 +2641,9 @@ config PCI_DIRECT
depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))
 
 config PCI_MMCONFIG
-   def_bool y
-   depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
+   bool "Support mmconfig PCI config space access" if X86_64
+   default y
+   depends on PCI && (ACPI || SFI) && (X86_64 || (PCI_GOANY || PCI_GOMMCONFIG))
 
 config PCI_OLPC
def_bool y
@@ -2657,10 +2658,6 @@ config PCI_DOMAINS
def_bool y
depends on PCI
 
-config PCI_MMCONFIG
-   bool "Support mmconfig PCI config space access"
-   depends on X86_64 && PCI && ACPI
-
 config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
-- 
2.13.6



[PATCH v3 1/6] jailhouse: Provide detection for non-x86 systems

2018-02-28 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Implement jailhouse_paravirt() via device tree probing on architectures
!= x86. Will be used by the PCI core.

CC: Rob Herring <robh...@kernel.org>
CC: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Documentation/devicetree/bindings/jailhouse.txt |  8 ++++++++
 arch/x86/include/asm/jailhouse_para.h   |  2 +-
 include/linux/hypervisor.h  | 17 +++++++++++++++--
 3 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

diff --git a/Documentation/devicetree/bindings/jailhouse.txt 
b/Documentation/devicetree/bindings/jailhouse.txt
new file mode 100644
index 000000000000..2901c25ff340
--- /dev/null
+++ b/Documentation/devicetree/bindings/jailhouse.txt
@@ -0,0 +1,8 @@
+Jailhouse non-root cell device tree bindings
+--------------------------------------------
+
+When running in a non-root Jailhouse cell (partition), the device tree of this
+platform shall have a top-level "hypervisor" node with the following
+properties:
+
+- compatible = "jailhouse,cell"
diff --git a/arch/x86/include/asm/jailhouse_para.h 
b/arch/x86/include/asm/jailhouse_para.h
index 875b54376689..b885a961a150 100644
--- a/arch/x86/include/asm/jailhouse_para.h
+++ b/arch/x86/include/asm/jailhouse_para.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL2.0 */
 
 /*
- * Jailhouse paravirt_ops implementation
+ * Jailhouse paravirt detection
  *
  * Copyright (c) Siemens AG, 2015-2017
  *
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index b19563f9a8eb..fc08b433c856 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -8,15 +8,28 @@
  */
 
 #ifdef CONFIG_X86
+
+#include <asm/jailhouse_para.h>
 #include <asm/x86_init.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
x86_platform.hyper.pin_vcpu(cpu);
 }
-#else
+
+#else /* !CONFIG_X86 */
+
+#include <linux/of.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 }
-#endif
+
+static inline bool jailhouse_paravirt(void)
+{
+   return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
+}
+
+#endif /* !CONFIG_X86 */
 
 #endif /* __LINUX_HYPEVISOR_H */
-- 
2.13.6
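
For illustration, a caller on the consumer side can reuse the same probing
directly; a minimal sketch (hypothetical helper name; unlike the inline
above, it also drops the reference taken by of_find_compatible_node()):

#include <linux/of.h>

static bool jailhouse_cell_detected(void)
{
	struct device_node *np;

	np = of_find_compatible_node(NULL, NULL, "jailhouse,cell");
	of_node_put(np);	/* of_find_compatible_node() returns a reference */
	return np != NULL;
}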



[PATCH v3 3/6] x86/jailhouse: Enable PCI mmconfig access in inmates

2018-02-28 Thread Jan Kiszka
From: Otavio Pontes <otavio.pon...@intel.com>

Use the PCI mmconfig base address exported by jailhouse in boot
parameters in order to access the memory mapped PCI configuration space.

Signed-off-by: Otavio Pontes <otavio.pon...@intel.com>
[Jan: rebased, fixed !CONFIG_PCI_MMCONFIG]
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/include/asm/pci_x86.h | 2 ++
 arch/x86/kernel/jailhouse.c| 7 +++
 arch/x86/pci/mmconfig-shared.c | 4 ++--
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index eb66fa9cd0fc..959d618dbb17 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -151,6 +151,8 @@ extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
   phys_addr_t addr);
 extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+   int end, u64 addr);
 
 extern struct list_head pci_mmcfg_list;
 
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index b68fd895235a..7fe2a73da0b3 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -124,6 +124,13 @@ static int __init jailhouse_pci_arch_init(void)
if (pcibios_last_bus < 0)
pcibios_last_bus = 0xff;
 
+#ifdef CONFIG_PCI_MMCONFIG
+   if (setup_data.pci_mmconfig_base) {
+   pci_mmconfig_add(0, 0, 0xff, setup_data.pci_mmconfig_base);
+   pci_mmcfg_arch_init();
+   }
+#endif
+
return 0;
 }
 
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 96684d0adcf9..0e590272366b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -94,8 +94,8 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
return new;
 }
 
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
-   int end, u64 addr)
+struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+int end, u64 addr)
 {
struct pci_mmcfg_region *new;
 
-- 
2.13.6
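
For context, pci_mmconfig_add(0, 0, 0xff, base) registers ECAM space for
segment 0, buses 0-255. The standard ECAM layout (not code from this patch)
gives each function a 4 KiB config page at a fixed offset below the base:

#include <stdint.h>

/* PCIe ECAM: offset of (bus, devfn, reg) below the mmconfig base */
static inline uint64_t ecam_offset(uint8_t bus, uint8_t devfn, uint16_t reg)
{
	return ((uint64_t)bus << 20) | ((uint64_t)devfn << 12) | (reg & 0xfff);
}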



[PATCH v3 2/6] PCI: Scan all functions when running over Jailhouse

2018-02-28 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
have a function 0.  Therefore, Linux scans for devices at function 0
(devfn 0/8/16/...) and only scans for other functions if function 0
has its Multi-Function Device bit set or ARI or SR-IOV indicate
there are more functions.

The Jailhouse hypervisor may pass individual functions of a
multi-function device to a guest without passing function 0, which
means a Linux guest won't find them.

Change Linux PCI probing so it scans all function numbers when
running as a guest over Jailhouse.

This is technically prohibited by the spec, so it is possible that
PCI devices without the Multi-Function Device bit set may have
unexpected behavior in response to this probe.

Derived from original patch by Benedikt Spranger.

CC: Benedikt Spranger <b.spran...@linutronix.de>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Acked-by: Bjorn Helgaas <bhelg...@google.com>
---
 arch/x86/pci/legacy.c |  4 +++-
 drivers/pci/probe.c   | 22 +++---
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 1cb01abcb1be..dfbe6ac38830 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -4,6 +4,7 @@
 #include <linux/init.h>
 #include <linux/export.h>
 #include <linux/pci.h>
+#include <asm/jailhouse_para.h>
 #include <asm/pci_x86.h>
 
 /*
@@ -34,13 +35,14 @@ int __init pci_legacy_init(void)
 
 void pcibios_scan_specific_bus(int busn)
 {
+   int stride = jailhouse_paravirt() ? 1 : 8;
int devfn;
u32 l;
 
if (pci_find_bus(0, busn))
return;
 
-   for (devfn = 0; devfn < 256; devfn += 8) {
+   for (devfn = 0; devfn < 256; devfn += stride) {
		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
		    l != 0xffffffff && l != 0x00000000) {
			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ef5377438a1e..da22d6d216f8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -16,6 +16,7 @@
 #include <linux/cpumask.h>
 #include <linux/aer.h>
 #include <linux/acpi.h>
+#include <linux/hypervisor.h>
 #include <linux/irqdomain.h>
 #include <linux/pm_runtime.h>
 #include "pci.h"
@@ -2518,14 +2519,29 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
 {
unsigned int used_buses, normal_bridges = 0, hotplug_bridges = 0;
unsigned int start = bus->busn_res.start;
-   unsigned int devfn, cmax, max = start;
+   unsigned int devfn, fn, cmax, max = start;
struct pci_dev *dev;
+   int nr_devs;
 
	dev_dbg(&bus->dev, "scanning bus\n");
 
/* Go find them, Rover! */
-   for (devfn = 0; devfn < 0x100; devfn += 8)
-   pci_scan_slot(bus, devfn);
+   for (devfn = 0; devfn < 0x100; devfn += 8) {
+   nr_devs = pci_scan_slot(bus, devfn);
+
+   /*
+* The Jailhouse hypervisor may pass individual functions of a
+* multi-function device to a guest without passing function 0.
+* Look for them as well.
+*/
+   if (jailhouse_paravirt() && nr_devs == 0) {
+   for (fn = 1; fn < 8; fn++) {
+   dev = pci_scan_single_device(bus, devfn + fn);
+   if (dev)
+   dev->multifunction = 1;
+   }
+   }
+   }
 
/* Reserve buses for SR-IOV capability */
used_buses = pci_iov_bus_range(bus);
-- 
2.13.6
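
For illustration, why a stride of 8 only sees function 0: devfn packs a
5-bit slot and a 3-bit function number. A small stand-alone program (not
part of the patch) makes the encoding visible:

#include <stdio.h>

#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

int main(void)
{
	/* stride 8 visits only fn 0 of each slot; stride 1 visits all */
	for (int devfn = 0; devfn < 16; devfn++)
		printf("devfn %2d -> slot %d, fn %d\n",
		       devfn, PCI_SLOT(devfn), PCI_FUNC(devfn));
	return 0;
}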



[PATCH v3 6/6] MAINTAINERS: Add entry for Jailhouse

2018-02-28 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 93a12af4f180..4b889f282c77 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7521,6 +7521,13 @@ Q:   http://patchwork.linuxtv.org/project/linux-media/list/
 S: Maintained
 F: drivers/media/dvb-frontends/ix2505v*
 
+JAILHOUSE HYPERVISOR INTERFACE
+M:     Jan Kiszka <jan.kis...@siemens.com>
+L: jailhouse-...@googlegroups.com
+S: Maintained
+F: arch/x86/kernel/jailhouse.c
+F: arch/x86/include/asm/jailhouse_para.h
+
 JC42.4 TEMPERATURE SENSOR DRIVER
 M: Guenter Roeck <li...@roeck-us.net>
 L: linux-hw...@vger.kernel.org
-- 
2.13.6



Re: [PATCH v2 2/6] PCI: Scan all functions when running over Jailhouse

2018-02-28 Thread Jan Kiszka
On 2018-02-28 09:44, Thomas Gleixner wrote:
> On Wed, 28 Feb 2018, Jan Kiszka wrote:
> 
>> From: Jan Kiszka <jan.kis...@siemens.com>
>>
>> Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
>> have a function 0.  Therefore, Linux scans for devices at function 0
>> (devfn 0/8/16/...) and only scans for other functions if function 0
>> has its Multi-Function Device bit set or ARI or SR-IOV indicate
>> there are more functions.
>>
>> The Jailhouse hypervisor may pass individual functions of a
>> multi-function device to a guest without passing function 0, which
>> means a Linux guest won't find them.
>>
>> Change Linux PCI probing so it scans all function numbers when
>> running as a guest over Jailhouse.
> 
>>  void pcibios_scan_specific_bus(int busn)
>>  {
>> +int stride = jailhouse_paravirt() ? 1 : 8;
>>  int devfn;
>>  u32 l;
>>  
>>  if (pci_find_bus(0, busn))
>>  return;
>>  
>> -for (devfn = 0; devfn < 256; devfn += 8) {
>> +for (devfn = 0; devfn < 256; devfn += stride) {
>>  		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
>>  		    l != 0xffffffff && l != 0x00000000) {
>>  			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
> 
> Shouldn't that take the situation into account where the MFD bit is set on
> a regular devfn, i.e. (devfn % 8) == 0? In that case you'd scan the
> subfunctions twice.

Good point, and it also applies to pci_scan_child_bus_extend. Will add
some filters.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
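
The filter landed in v3 (see the probe.c hunk earlier in this archive):
condensed, the loop keeps the stride of 8 and only probes functions 1-7
separately when function 0 turned up nothing, so multi-function devices
are not scanned twice:

	for (devfn = 0; devfn < 0x100; devfn += 8) {
		nr_devs = pci_scan_slot(bus, devfn);	/* fn 0 and, if MFD, fn 1-7 */

		if (jailhouse_paravirt() && nr_devs == 0)
			for (fn = 1; fn < 8; fn++) {
				dev = pci_scan_single_device(bus, devfn + fn);
				if (dev)
					dev->multifunction = 1;
			}
	}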


[PATCH v2 1/6] jailhouse: Provide detection for non-x86 systems

2018-02-27 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Implement jailhouse_paravirt() via device tree probing on architectures
!= x86. Will be used by the PCI core.

CC: Rob Herring <robh...@kernel.org>
CC: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Documentation/devicetree/bindings/jailhouse.txt |  8 
 arch/x86/include/asm/jailhouse_para.h   |  2 +-
 include/linux/hypervisor.h  | 17 +++--
 3 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

diff --git a/Documentation/devicetree/bindings/jailhouse.txt 
b/Documentation/devicetree/bindings/jailhouse.txt
new file mode 100644
index ..2901c25ff340
--- /dev/null
+++ b/Documentation/devicetree/bindings/jailhouse.txt
@@ -0,0 +1,8 @@
+Jailhouse non-root cell device tree bindings
+
+
+When running in a non-root Jailhouse cell (partition), the device tree of this
+platform shall have a top-level "hypervisor" node with the following
+properties:
+
+- compatible = "jailhouse,cell"
diff --git a/arch/x86/include/asm/jailhouse_para.h 
b/arch/x86/include/asm/jailhouse_para.h
index 875b54376689..b885a961a150 100644
--- a/arch/x86/include/asm/jailhouse_para.h
+++ b/arch/x86/include/asm/jailhouse_para.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL2.0 */
 
 /*
- * Jailhouse paravirt_ops implementation
+ * Jailhouse paravirt detection
  *
  * Copyright (c) Siemens AG, 2015-2017
  *
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index b19563f9a8eb..fc08b433c856 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -8,15 +8,28 @@
  */
 
 #ifdef CONFIG_X86
+
+#include <asm/jailhouse_para.h>
 #include <asm/x86_init.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
x86_platform.hyper.pin_vcpu(cpu);
 }
-#else
+
+#else /* !CONFIG_X86 */
+
+#include <linux/of.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 }
-#endif
+
+static inline bool jailhouse_paravirt(void)
+{
+   return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
+}
+
+#endif /* !CONFIG_X86 */
 
 #endif /* __LINUX_HYPEVISOR_H */
-- 
2.13.6



[PATCH v2 0/6] jailhouse: Enhance secondary Jailhouse guest support /wrt PCI

2018-02-27 Thread Jan Kiszka
Basic x86 support [1] for running Linux as secondary Jailhouse [2] guest
is currently pending in the tip tree. This builds on top and enhances
the PCI support for x86 and also ARM guests (ARM[64] does not require
platform patches and works already).

Key elements of this series are:
 - detection of Jailhouse via device tree hypervisor node
 - function-level PCI scan if Jailhouse is detected
 - MMCONFIG support for x86 guests

As most changes affect x86, I would suggest to route the series also via
tip after the necessary acks are collected.

Changes in v2:
 - adjusted commit log and include ordering in patch 2
 - rebased over Linus master

Jan

[1] https://lkml.org/lkml/2017/11/27/125
[2] http://jailhouse-project.org

CC: Benedikt Spranger <b.spran...@linutronix.de>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Otavio Pontes <otavio.pon...@intel.com>
CC: Rob Herring <robh...@kernel.org>

Jan Kiszka (5):
  jailhouse: Provide detection for non-x86 systems
  PCI: Scan all functions when running over Jailhouse
  x86: Consolidate PCI_MMCONFIG configs
  x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
  MAINTAINERS: Add entry for Jailhouse

Otavio Pontes (1):
  x86/jailhouse: Enable PCI mmconfig access in inmates

 Documentation/devicetree/bindings/jailhouse.txt |  8 
 MAINTAINERS |  7 +++
 arch/x86/Kconfig| 11 ++-
 arch/x86/include/asm/jailhouse_para.h   |  2 +-
 arch/x86/include/asm/pci_x86.h  |  2 ++
 arch/x86/kernel/Makefile|  2 +-
 arch/x86/kernel/cpu/amd.c   |  2 +-
 arch/x86/kernel/jailhouse.c |  7 +++
 arch/x86/pci/legacy.c   |  4 +++-
 arch/x86/pci/mmconfig-shared.c  |  4 ++--
 drivers/pci/probe.c |  4 +++-
 include/linux/hypervisor.h  | 17 +++--
 12 files changed, 56 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

-- 
2.13.6



[PATCH v2 3/6] x86/jailhouse: Enable PCI mmconfig access in inmates

2018-02-27 Thread Jan Kiszka
From: Otavio Pontes <otavio.pon...@intel.com>

Use the PCI mmconfig base address exported by jailhouse in boot
parameters in order to access the memory mapped PCI configuration space.

Signed-off-by: Otavio Pontes <otavio.pon...@intel.com>
[Jan: rebased, fixed !CONFIG_PCI_MMCONFIG]
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/include/asm/pci_x86.h | 2 ++
 arch/x86/kernel/jailhouse.c| 7 +++
 arch/x86/pci/mmconfig-shared.c | 4 ++--
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index eb66fa9cd0fc..959d618dbb17 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -151,6 +151,8 @@ extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
   phys_addr_t addr);
 extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+   int end, u64 addr);
 
 extern struct list_head pci_mmcfg_list;
 
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index b68fd895235a..7fe2a73da0b3 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -124,6 +124,13 @@ static int __init jailhouse_pci_arch_init(void)
if (pcibios_last_bus < 0)
pcibios_last_bus = 0xff;
 
+#ifdef CONFIG_PCI_MMCONFIG
+   if (setup_data.pci_mmconfig_base) {
+   pci_mmconfig_add(0, 0, 0xff, setup_data.pci_mmconfig_base);
+   pci_mmcfg_arch_init();
+   }
+#endif
+
return 0;
 }
 
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 96684d0adcf9..0e590272366b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -94,8 +94,8 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
return new;
 }
 
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
-   int end, u64 addr)
+struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+int end, u64 addr)
 {
struct pci_mmcfg_region *new;
 
-- 
2.13.6



[PATCH v2 4/6] x86: Consolidate PCI_MMCONFIG configs

2018-02-27 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Not sure if those two worked by design or just by chance so far. In any
case, it's at least cleaner and clearer to express this in a single
config statement.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb7f43f23521..63e85e7da12e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2641,8 +2641,9 @@ config PCI_DIRECT
	depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))
 
 config PCI_MMCONFIG
-   def_bool y
-   depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
+   bool "Support mmconfig PCI config space access" if X86_64
+   default y
+   depends on PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
 
 config PCI_OLPC
def_bool y
@@ -2657,10 +2658,6 @@ config PCI_DOMAINS
def_bool y
depends on PCI
 
-config PCI_MMCONFIG
-   bool "Support mmconfig PCI config space access"
-   depends on X86_64 && PCI && ACPI
-
 config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
-- 
2.13.6



[PATCH v2 2/6] PCI: Scan all functions when running over Jailhouse

2018-02-27 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
have a function 0.  Therefore, Linux scans for devices at function 0
(devfn 0/8/16/...) and only scans for other functions if function 0
has its Multi-Function Device bit set or ARI or SR-IOV indicate
there are more functions.

The Jailhouse hypervisor may pass individual functions of a
multi-function device to a guest without passing function 0, which
means a Linux guest won't find them.

Change Linux PCI probing so it scans all function numbers when
running as a guest over Jailhouse.

This is technically prohibited by the spec, so it is possible that
PCI devices without the Multi-Function Device bit set may have
unexpected behavior in response to this probe.

Based on patch by Benedikt Spranger, adding Jailhouse probing to avoid
changing the behavior in the absence of the hypervisor.

CC: Benedikt Spranger <b.spran...@linutronix.de>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Acked-by: Bjorn Helgaas <bhelg...@google.com>
---
 arch/x86/pci/legacy.c | 4 +++-
 drivers/pci/probe.c   | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 1cb01abcb1be..dfbe6ac38830 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -4,6 +4,7 @@
 #include <linux/init.h>
 #include <linux/export.h>
 #include <linux/pci.h>
+#include <asm/jailhouse_para.h>
 #include <asm/pci_x86.h>
 
 /*
@@ -34,13 +35,14 @@ int __init pci_legacy_init(void)
 
 void pcibios_scan_specific_bus(int busn)
 {
+   int stride = jailhouse_paravirt() ? 1 : 8;
int devfn;
u32 l;
 
if (pci_find_bus(0, busn))
return;
 
-   for (devfn = 0; devfn < 256; devfn += 8) {
+   for (devfn = 0; devfn < 256; devfn += stride) {
		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
		    l != 0xffffffff && l != 0x00000000) {
			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ef5377438a1e..ce728251ae36 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -16,6 +16,7 @@
 #include <linux/cpumask.h>
 #include <linux/aer.h>
 #include <linux/acpi.h>
+#include <linux/hypervisor.h>
 #include <linux/irqdomain.h>
 #include <linux/pm_runtime.h>
 #include "pci.h"
@@ -2517,6 +2518,7 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
  unsigned int available_buses)
 {
unsigned int used_buses, normal_bridges = 0, hotplug_bridges = 0;
+   unsigned int stride = jailhouse_paravirt() ? 1 : 8;
unsigned int start = bus->busn_res.start;
unsigned int devfn, cmax, max = start;
struct pci_dev *dev;
@@ -2524,7 +2526,7 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
	dev_dbg(&bus->dev, "scanning bus\n");
 
/* Go find them, Rover! */
-   for (devfn = 0; devfn < 0x100; devfn += 8)
+   for (devfn = 0; devfn < 0x100; devfn += stride)
pci_scan_slot(bus, devfn);
 
/* Reserve buses for SR-IOV capability */
-- 
2.13.6



[PATCH v2 5/6] x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI

2018-02-27 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure the
latter can be built without having to enable ACPI as well. Primarily, we
need to make the AMD mmconf-fam10h_64 depend upon MMCONFIG and ACPI,
instead of just the former.

Saves some bytes in the Jailhouse non-root kernel.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig  | 6 +-
 arch/x86/kernel/Makefile  | 2 +-
 arch/x86/kernel/cpu/amd.c | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 63e85e7da12e..5b0ac52e357a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2643,7 +2643,7 @@ config PCI_DIRECT
 config PCI_MMCONFIG
bool "Support mmconfig PCI config space access" if X86_64
default y
-   depends on PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
+   depends on PCI && (ACPI || SFI || JAILHOUSE_GUEST) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
 
 config PCI_OLPC
def_bool y
@@ -2658,6 +2658,10 @@ config PCI_DOMAINS
def_bool y
depends on PCI
 
+config MMCONF_FAM10H
+   def_bool y
+   depends on PCI_MMCONFIG && ACPI
+
 config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 29786c87e864..73ccf80c09a2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -146,6 +146,6 @@ ifeq ($(CONFIG_X86_64),y)
obj-$(CONFIG_GART_IOMMU)+= amd_gart_64.o aperture_64.o
obj-$(CONFIG_CALGARY_IOMMU) += pci-calgary_64.o tce_64.o
 
-   obj-$(CONFIG_PCI_MMCONFIG)  += mmconf-fam10h_64.o
+   obj-$(CONFIG_MMCONF_FAM10H) += mmconf-fam10h_64.o
obj-y   += vsmp_64.o
 endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f0e6456ca7d3..12bc0a1139da 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -716,7 +716,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
 
 static void init_amd_gh(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_MMCONF_FAM10H
/* do this for boot cpu */
	if (c == &boot_cpu_data)
check_enable_amd_mmconf_dmi();
-- 
2.13.6



[PATCH v2 6/6] MAINTAINERS: Add entry for Jailhouse

2018-02-27 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 93a12af4f180..4b889f282c77 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7521,6 +7521,13 @@ Q:   http://patchwork.linuxtv.org/project/linux-media/list/
 S: Maintained
 F: drivers/media/dvb-frontends/ix2505v*
 
+JAILHOUSE HYPERVISOR INTERFACE
+M:     Jan Kiszka <jan.kis...@siemens.com>
+L: jailhouse-...@googlegroups.com
+S: Maintained
+F: arch/x86/kernel/jailhouse.c
+F: arch/x86/include/asm/jailhouse_para.h
+
 JC42.4 TEMPERATURE SENSOR DRIVER
 M: Guenter Roeck <li...@roeck-us.net>
 L: linux-hw...@vger.kernel.org
-- 
2.13.6



Re: [PATCH 2/6] pci: Scan all functions when probing while running over Jailhouse

2018-02-26 Thread Jan Kiszka
On 2018-02-22 21:57, Bjorn Helgaas wrote:
> On Mon, Jan 22, 2018 at 07:12:46AM +0100, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kis...@siemens.com>
>>
>> PCI and PCIBIOS probing only scans devices at function number 0/8/16/...
>> Subdevices (e.g. multiqueue) have function numbers which are not a
>> multiple of 8.
> 
> Suggested text:
> 
>   Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to
>   have a function 0.  Therefore, Linux scans for devices at function 0
>   (devfn 0/8/16/...) and only scans for other functions if function 0
>   has its Multi-Function Device bit set or ARI or SR-IOV indicate
>   there are more functions.
>   
>   The Jailhouse hypervisor may pass individual functions of a
>   multi-function device to a guest without passing function 0, which
>   means a Linux guest won't find them.
> 
>   Change Linux PCI probing so it scans all function numbers when
>   running as a guest over Jailhouse.
>   
>   This is technically prohibited by the spec, so it is possible that
>   PCI devices without the Multi-Function Device bit set may have
>   unexpected behavior in response to this probe.
> 
>> The simple hypervisor Jailhouse passes subdevices directly w/o providing
>> a virtual PCI topology like KVM. As a consequence a PCI passthrough from
>> Jailhouse to a guest will not be detected by Linux.
>>
>> Based on patch by Benedikt Spranger, adding Jailhouse probing to avoid
>> changing the behavior in the absence of the hypervisor.
>>
>> CC: Benedikt Spranger <b.spran...@linutronix.de>
>> Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
> 
> With subject change to:
> 
>   PCI: Scan all functions when running over Jailhouse
> 
> Acked-by: Bjorn Helgaas <bhelg...@google.com>
> 

Thanks, all suggestions picked up for next round.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH 2/6] pci: Scan all functions when probing while running over Jailhouse

2018-02-26 Thread Jan Kiszka
On 2018-02-23 14:23, Andy Shevchenko wrote:
> On Mon, Jan 22, 2018 at 8:12 AM, Jan Kiszka <jan.kis...@siemens.com> wrote:
> 
>>  #include 
>>  #include 
>>  #include 
>> +#include 
> 
> Keep it in order?
> 

Done.

> 
>>  #include 
>>  #include 
>>  #include 
>> +#include 
> 
> Ditto.
> 

Despite the context suggesting it, this file has no ordering.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH 4/6] x86: Consolidate PCI_MMCONFIG configs

2018-02-26 Thread Jan Kiszka
On 2018-01-28 18:26, Andy Shevchenko wrote:
> On Mon, Jan 22, 2018 at 8:12 AM, Jan Kiszka <jan.kis...@siemens.com> wrote:
>> From: Jan Kiszka <jan.kis...@siemens.com>
>>
>> Not sure if those two worked by design or just by chance so far. In any
>> case, it's at least cleaner and clearer to express this in a single
>> config statement.
> 
> Congrats! You found by the way a bug in
> 
> commit e279b6c1d329e50b766bce96aacc197eae8a053b
> Author: Sam Ravnborg <s...@ravnborg.org>
> Date:   Tue Nov 6 20:41:05 2007 +0100
> 
>x86: start unification of arch/x86/Kconfig.*
> 
> ...and proper fix seems to split PCI stuff to common + X86_32 only + X86_64 
> only
> 

Hmm, is that a change request on this patch?

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


[PATCH 6/6] MAINTAINERS: Add entry for Jailhouse

2018-01-21 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 426ba037d943..dd51a2012b36 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7468,6 +7468,13 @@ Q:   http://patchwork.linuxtv.org/project/linux-media/list/
 S: Maintained
 F: drivers/media/dvb-frontends/ix2505v*
 
+JAILHOUSE HYPERVISOR INTERFACE
+M:     Jan Kiszka <jan.kis...@siemens.com>
+L: jailhouse-...@googlegroups.com
+S: Maintained
+F: arch/x86/kernel/jailhouse.c
+F: arch/x86/include/asm/jailhouse_para.h
+
 JC42.4 TEMPERATURE SENSOR DRIVER
 M: Guenter Roeck <li...@roeck-us.net>
 L: linux-hw...@vger.kernel.org
-- 
2.13.6



[PATCH 0/6] jailhouse: Enhance secondary Jailhouse guest support /wrt PCI

2018-01-21 Thread Jan Kiszka
Basic x86 support [1] for running Linux as secondary Jailhouse [2] guest
is currently pending in the tip tree. This builds on top and enhances
the PCI support for x86 and also ARM guests (ARM[64] does not require
platform patches and works already).

Key elements of this series are:
 - detection of Jailhouse via device tree hypervisor node
 - function-level PCI scan if Jailhouse is detected
 - MMCONFIG support for x86 guests

As most changes affect x86, I would suggest to route the series also via
tip after the necessary acks are collected.

Jan

[1] https://lkml.org/lkml/2017/11/27/125
[2] http://jailhouse-project.org

CC: Benedikt Spranger <b.spran...@linutronix.de>
CC: Mark Rutland <mark.rutl...@arm.com>
CC: Otavio Pontes <otavio.pon...@intel.com>
CC: Rob Herring <robh...@kernel.org>

Jan Kiszka (5):
  jailhouse: Provide detection for non-x86 systems
  pci: Scan all functions when probing while running over Jailhouse
  x86: Consolidate PCI_MMCONFIG configs
  x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
  MAINTAINERS: Add entry for Jailhouse

Otavio Pontes (1):
  x86/jailhouse: Enable PCI mmconfig access in inmates

 Documentation/devicetree/bindings/jailhouse.txt |  8 
 MAINTAINERS |  7 +++
 arch/x86/Kconfig| 11 ++-
 arch/x86/include/asm/jailhouse_para.h   |  2 +-
 arch/x86/include/asm/pci_x86.h  |  2 ++
 arch/x86/kernel/Makefile|  2 +-
 arch/x86/kernel/cpu/amd.c   |  2 +-
 arch/x86/kernel/jailhouse.c |  7 +++
 arch/x86/pci/legacy.c   |  4 +++-
 arch/x86/pci/mmconfig-shared.c  |  4 ++--
 drivers/pci/probe.c |  4 +++-
 include/linux/hypervisor.h  | 17 +++--
 12 files changed, 56 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

-- 
2.13.6



[PATCH 5/6] x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI

2018-01-21 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Jailhouse does not use ACPI, but it does support MMCONFIG. Make sure the
latter can be built without having to enable ACPI as well. Primarily, we
need to make the AMD mmconf-fam10h_64 depend upon MMCONFIG and ACPI,
instead of just the former.

Saves some bytes in the Jailhouse non-root kernel.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig  | 6 +-
 arch/x86/kernel/Makefile  | 2 +-
 arch/x86/kernel/cpu/amd.c | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f2038417a590..77ba0eb0a258 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2597,7 +2597,7 @@ config PCI_DIRECT
 config PCI_MMCONFIG
bool "Support mmconfig PCI config space access" if X86_64
default y
-   depends on PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
+   depends on PCI && (ACPI || SFI || JAILHOUSE_GUEST) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
 
 config PCI_OLPC
def_bool y
@@ -2612,6 +2612,10 @@ config PCI_DOMAINS
def_bool y
depends on PCI
 
+config MMCONF_FAM10H
+   def_bool y
+   depends on PCI_MMCONFIG && ACPI
+
 config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index aed9296dccd3..b2c9e230e2fe 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -143,6 +143,6 @@ ifeq ($(CONFIG_X86_64),y)
obj-$(CONFIG_GART_IOMMU)+= amd_gart_64.o aperture_64.o
obj-$(CONFIG_CALGARY_IOMMU) += pci-calgary_64.o tce_64.o
 
-   obj-$(CONFIG_PCI_MMCONFIG)  += mmconf-fam10h_64.o
+   obj-$(CONFIG_MMCONF_FAM10H) += mmconf-fam10h_64.o
obj-y   += vsmp_64.o
 endif
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index ea831c858195..47edf599f6fd 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -690,7 +690,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c)
 
 static void init_amd_gh(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_MMCONF_FAM10H
/* do this for boot cpu */
	if (c == &boot_cpu_data)
check_enable_amd_mmconf_dmi();
-- 
2.13.6



[PATCH 4/6] x86: Consolidate PCI_MMCONFIG configs

2018-01-21 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Not sure if those two worked by design or just by chance so far. In any
case, it's at least cleaner and clearer to express this in a single
config statement.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/Kconfig | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 423e4b64e683..f2038417a590 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2595,8 +2595,9 @@ config PCI_DIRECT
	depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))
 
 config PCI_MMCONFIG
-   def_bool y
-   depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
+   bool "Support mmconfig PCI config space access" if X86_64
+   default y
+   depends on PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY || X86_64)
 
 config PCI_OLPC
def_bool y
@@ -2611,10 +2612,6 @@ config PCI_DOMAINS
def_bool y
depends on PCI
 
-config PCI_MMCONFIG
-   bool "Support mmconfig PCI config space access"
-   depends on X86_64 && PCI && ACPI
-
 config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
-- 
2.13.6



[PATCH 3/6] x86/jailhouse: Enable PCI mmconfig access in inmates

2018-01-21 Thread Jan Kiszka
From: Otavio Pontes <otavio.pon...@intel.com>

Use the PCI mmconfig base address exported by jailhouse in boot
parameters in order to access the memory mapped PCI configuration space.

Signed-off-by: Otavio Pontes <otavio.pon...@intel.com>
[Jan: rebased, fixed !CONFIG_PCI_MMCONFIG]
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/include/asm/pci_x86.h | 2 ++
 arch/x86/kernel/jailhouse.c| 7 +++
 arch/x86/pci/mmconfig-shared.c | 4 ++--
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index eb66fa9cd0fc..959d618dbb17 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -151,6 +151,8 @@ extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
   phys_addr_t addr);
 extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+   int end, u64 addr);
 
 extern struct list_head pci_mmcfg_list;
 
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index b68fd895235a..7fe2a73da0b3 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -124,6 +124,13 @@ static int __init jailhouse_pci_arch_init(void)
if (pcibios_last_bus < 0)
pcibios_last_bus = 0xff;
 
+#ifdef CONFIG_PCI_MMCONFIG
+   if (setup_data.pci_mmconfig_base) {
+   pci_mmconfig_add(0, 0, 0xff, setup_data.pci_mmconfig_base);
+   pci_mmcfg_arch_init();
+   }
+#endif
+
return 0;
 }
 
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 96684d0adcf9..0e590272366b 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -94,8 +94,8 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
return new;
 }
 
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
-   int end, u64 addr)
+struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
+int end, u64 addr)
 {
struct pci_mmcfg_region *new;
 
-- 
2.13.6



[PATCH 1/6] jailhouse: Provide detection for non-x86 systems

2018-01-21 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

Implement jailhouse_paravirt() via device tree probing on architectures
!= x86. Will be used by the PCI core.

CC: Rob Herring <robh...@kernel.org>
CC: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 Documentation/devicetree/bindings/jailhouse.txt |  8 
 arch/x86/include/asm/jailhouse_para.h   |  2 +-
 include/linux/hypervisor.h  | 17 +++--
 3 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/jailhouse.txt

diff --git a/Documentation/devicetree/bindings/jailhouse.txt 
b/Documentation/devicetree/bindings/jailhouse.txt
new file mode 100644
index ..2901c25ff340
--- /dev/null
+++ b/Documentation/devicetree/bindings/jailhouse.txt
@@ -0,0 +1,8 @@
+Jailhouse non-root cell device tree bindings
+
+
+When running in a non-root Jailhouse cell (partition), the device tree of this
+platform shall have a top-level "hypervisor" node with the following
+properties:
+
+- compatible = "jailhouse,cell"
diff --git a/arch/x86/include/asm/jailhouse_para.h 
b/arch/x86/include/asm/jailhouse_para.h
index 875b54376689..b885a961a150 100644
--- a/arch/x86/include/asm/jailhouse_para.h
+++ b/arch/x86/include/asm/jailhouse_para.h
@@ -1,7 +1,7 @@
 /* SPDX-License-Identifier: GPL2.0 */
 
 /*
- * Jailhouse paravirt_ops implementation
+ * Jailhouse paravirt detection
  *
  * Copyright (c) Siemens AG, 2015-2017
  *
diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h
index b19563f9a8eb..fc08b433c856 100644
--- a/include/linux/hypervisor.h
+++ b/include/linux/hypervisor.h
@@ -8,15 +8,28 @@
  */
 
 #ifdef CONFIG_X86
+
+#include <asm/jailhouse_para.h>
 #include <asm/x86_init.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
x86_platform.hyper.pin_vcpu(cpu);
 }
-#else
+
+#else /* !CONFIG_X86 */
+
+#include <linux/of.h>
+
 static inline void hypervisor_pin_vcpu(int cpu)
 {
 }
-#endif
+
+static inline bool jailhouse_paravirt(void)
+{
+   return of_find_compatible_node(NULL, NULL, "jailhouse,cell");
+}
+
+#endif /* !CONFIG_X86 */
 
 #endif /* __LINUX_HYPEVISOR_H */
-- 
2.13.6



[PATCH 2/6] pci: Scan all functions when probing while running over Jailhouse

2018-01-21 Thread Jan Kiszka
From: Jan Kiszka <jan.kis...@siemens.com>

PCI and PCIBIOS probing only scans devices at function number 0/8/16/...
Subdevices (e.g. multiqueue) have function numbers which are not a
multiple of 8.

The simple hypervisor Jailhouse passes subdevices directly w/o providing
a virtual PCI topology like KVM. As a consequence a PCI passthrough from
Jailhouse to a guest will not be detected by Linux.

Based on patch by Benedikt Spranger, adding Jailhouse probing to avoid
changing the behavior in the absence of the hypervisor.

CC: Benedikt Spranger <b.spran...@linutronix.de>
Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
---
 arch/x86/pci/legacy.c | 4 +++-
 drivers/pci/probe.c   | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 1cb01abcb1be..a7b0476b4f44 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -5,6 +5,7 @@
 #include <linux/export.h>
 #include <linux/pci.h>
 #include <asm/pci_x86.h>
+#include <asm/jailhouse_para.h>
 
 /*
  * Discover remaining PCI buses in case there are peer host bridges.
@@ -34,13 +35,14 @@ int __init pci_legacy_init(void)
 
 void pcibios_scan_specific_bus(int busn)
 {
+   int stride = jailhouse_paravirt() ? 1 : 8;
int devfn;
u32 l;
 
if (pci_find_bus(0, busn))
return;
 
-   for (devfn = 0; devfn < 256; devfn += 8) {
+   for (devfn = 0; devfn < 256; devfn += stride) {
		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
		    l != 0xffffffff && l != 0x00000000) {
			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 14e0ea1ff38b..60ad14c8245f 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -17,6 +17,7 @@
 #include <linux/acpi.h>
 #include <linux/irqdomain.h>
 #include <linux/pm_runtime.h>
+#include <linux/hypervisor.h>
 #include "pci.h"
 
 #define CARDBUS_LATENCY_TIMER  176 /* secondary latency timer */
@@ -2454,6 +2455,7 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
  unsigned int available_buses)
 {
unsigned int used_buses, normal_bridges = 0, hotplug_bridges = 0;
+   unsigned int stride = jailhouse_paravirt() ? 1 : 8;
unsigned int start = bus->busn_res.start;
unsigned int devfn, cmax, max = start;
struct pci_dev *dev;
@@ -2461,7 +2463,7 @@ static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
	dev_dbg(&bus->dev, "scanning bus\n");
 
/* Go find them, Rover! */
-   for (devfn = 0; devfn < 0x100; devfn += 8)
+   for (devfn = 0; devfn < 0x100; devfn += stride)
pci_scan_slot(bus, devfn);
 
/* Reserve buses for SR-IOV capability. */
-- 
2.13.6



Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-09 Thread Jan Kiszka
On 2015-11-10 03:18, Andy Lutomirski wrote:
> On Mon, Nov 9, 2015 at 6:04 PM, Benjamin Herrenschmidt
>> I thus go back to my original statement, it's a LOT easier to handle if
>> the device itself is self describing, indicating whether it is set to
>> bypass a host iommu or not. For L1->L2, well, that wouldn't be the
>> first time qemu/VFIO plays tricks with the passed through device
>> configuration space...
> 
> Which leaves the special case of Xen, where even preexisting devices
> don't bypass the IOMMU.  Can we keep this specific to powerpc and
> sparc?  On x86, this problem is basically nonexistent, since the IOMMU
> is properly self-describing.
> 
> IOW, I think that on x86 we should assume that all virtio devices
> honor the IOMMU.

From the guest driver POV, that is OK because either there is no IOMMU
to program (the current situation with qemu), there can be one that
doesn't need it (the current situation with qemu and iommu=on), or there
is (Xen) or will be (future qemu) one that requires it.

> 
>>
>> Note that the above can be solved via some kind of compromise: The
>> device self describes the ability to honor the iommu, along with the
>> property (or ACPI table entry) that indicates whether or not it does.
>>
>> IE. We could use the revision or ProgIf field of the config space for
>> example. Or something in virtio config. If it's an "old" device, we
>> know it always bypass. If it's a new device, we know it only bypasses
>> if the corresponding property is in. I still would have to sort out the
>> openbios case for mac among others but it's at least a workable
>> direction.
>>
>> BTW. Don't you have a similar problem on x86 that today qemu claims
>> that everything honors the iommu in ACPI ?
> 
> Only on a single experimental configuration, and that can apparently
> just be fixed going forward without any real problems being caused.

BTW, I once tried to describe the current situation on QEMU x86 with
IOMMU enabled via ACPI. While you can easily add IOMMU device exceptions
to the static tables, the fun starts when considering device hotplug for
virtio. Unless I missed some trick, ACPI doesn't seem to be designed
for that level of flexibility.

You would have to reserve a complete PCI bus, declare that one as not
being IOMMU-governed, and then only add new virtio devices to that bus.
Possible, but that imposes a lot of restrictions that existing
management software would have to be aware of as well.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: RFC: virtio-peer shared memory based peer communication device

2015-09-21 Thread Jan Kiszka
On 2015-09-18 23:11, Paolo Bonzini wrote:
> On 18/09/2015 18:29, Claudio Fontana wrote:
>>
>> this is a first RFC for virtio-peer 0.1, which is still very much a work in 
>> progress:
>>
>> https://github.com/hw-claudio/virtio-peer/wiki
>>
>> It is also available as PDF there, but the text is reproduced here for 
>> commenting:
>>
>> Peer shared memory communication device (virtio-peer)
> 
> Apart from the windows idea, how does virtio-peer compare to virtio-rpmsg?

rpmsg is a very specialized thing. It targets single AMP cores, assuming
that those have full access to the main memory. And it is also a
centralized approach where all messages go through the main Linux
instance. I suspect we could cover that use case as well with a generic
inter-VM shared memory device, but I didn't think about all details yet.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: RFC: virtio-peer shared memory based peer communication device

2015-09-21 Thread Jan Kiszka
On 2015-09-21 14:13, Michael S. Tsirkin wrote:
> On Fri, Sep 18, 2015 at 06:29:27PM +0200, Claudio Fontana wrote:
>> Hello,
>>
>> this is a first RFC for virtio-peer 0.1, which is still very much a work in 
>> progress:
>>
>> https://github.com/hw-claudio/virtio-peer/wiki
>>
>> It is also available as PDF there, but the text is reproduced here for 
>> commenting:
>>
>> Peer shared memory communication device (virtio-peer)
>>
>> General Overview
>>
>> (I recommend looking at the PDF for some clarifying pictures)
>>
>> The Virtio Peer shared memory communication device (virtio-peer) is a
>> virtual device which allows high performance low latency guest to
>> guest communication. It uses a new queue extension feature tentatively
>> called VIRTIO_F_WINDOW which indicates that descriptor tables,
>> available and used rings and Queue Data reside in physical memory
>> ranges called Windows, each identified with an unique identifier
>> called WindowID.
> 
> So if I had to summarize the difference from regular virtio,
> I'd say the main one is that this uses window id + offset
> instead of the physical address.
> 
> 
> My question is - why do it?
> 
> All windows are in memory space, are they not?
> 
> How about guest using full physical addresses,
> and hypervisor sending the window physical address
> to VM2?
> 
> VM2 can uses that to find both window id and offset.
> 
> 
> This way at least VM1 can use regular virtio without changes.

What would be the value of having different drivers in VM1 and VM2,
specifically if both run Linux?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
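
For illustration, the addressing difference under discussion: with
VIRTIO_F_WINDOW, a buffer is named by (window id, offset) rather than by a
guest-physical address. A sketch with invented types (field names are not
from the virtio-peer draft):

#include <stdint.h>

struct window_ref {
	uint16_t window_id;	/* which shared-memory window */
	uint64_t offset;	/* offset within that window */
	uint32_t len;
};

/* each side resolves a reference against wherever it mapped that window */
static inline void *window_deref(void *const window_base[],
				 const struct window_ref *ref)
{
	return (char *)window_base[ref->window_id] + ref->offset;
}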


Re: rfc: vhost user enhancements for vm2vm communication

2015-09-03 Thread Jan Kiszka
On 2015-09-03 10:08, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 06:28:28PM +0200, Jan Kiszka wrote:
>> On 2015-09-01 18:02, Michael S. Tsirkin wrote:
>>> On Tue, Sep 01, 2015 at 05:34:37PM +0200, Jan Kiszka wrote:
>>>> On 2015-09-01 16:34, Michael S. Tsirkin wrote:
>>>>> On Tue, Sep 01, 2015 at 04:09:44PM +0200, Jan Kiszka wrote:
>>>>>> On 2015-09-01 11:24, Michael S. Tsirkin wrote:
>>>>>>> On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote:
>>>>>>>> On 2015-09-01 10:01, Michael S. Tsirkin wrote:
>>>>>>>>> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
>>>>>>>>>> Leaving all the implementation and interface details aside, this
>>>>>>>>>> discussion is first of all about two fundamentally different 
>>>>>>>>>> approaches:
>>>>>>>>>> static shared memory windows vs. dynamically remapped shared windows 
>>>>>>>>>> (a
>>>>>>>>>> third one would be copying in the hypervisor, but I suppose we all 
>>>>>>>>>> agree
>>>>>>>>>> that the whole exercise is about avoiding that). Which way do we 
>>>>>>>>>> want or
>>>>>>>>>> have to go?
>>>>>>>>>>
>>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>> Dynamic is a superset of static: you can always make it static if you
>>>>>>>>> wish. Static has the advantage of simplicity, but that's lost once you
>>>>>>>>> realize you need to invent interfaces to make it work.  Since we can 
>>>>>>>>> use
>>>>>>>>> existing IOMMU interfaces for the dynamic one, what's the 
>>>>>>>>> disadvantage?
>>>>>>>>
>>>>>>>> Complexity. Having to emulate even more of an IOMMU in the hypervisor
>>>>>>>> (we already have to do a bit for VT-d IR in Jailhouse) and doing this
>>>>>>>> per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that
>>>>>>>> sense, generic grant tables would be more appealing.
>>>>>>>
>>>>>>> That's not how we do things for KVM, PV features need to be
>>>>>>> modular and interchangeable with emulation.
>>>>>>
>>>>>> I know, and we may have to make some compromise for Jailhouse if that
>>>>>> brings us valuable standardization and broad guest support. But we will
>>>>>> surely not support an arbitrary amount of IOMMU models for that reason.
>>>>>>
>>>>>>>
>>>>>>> If you just want something that's cross-platform and easy to
>>>>>>> implement, just build a PV IOMMU. Maybe use virtio for this.
>>>>>>
>>>>>> That is likely required to keep the complexity manageable and to allow
>>>>>> static preconfiguration.
>>>>>
>>>>> Real IOMMU allow static configuration just fine. This is exactly
>>>>> what VFIO uses.
>>>>
>>>> Please specify more precisely which feature in which IOMMU you are
>>>> referring to. Also, given that you refer to VFIO, I suspect we have
>>>> different thing in mind. I'm talking about an IOMMU device model, like
>>>> the one we have in QEMU now for VT-d. That one is not at all
>>>> preconfigured by the host for VFIO.
>>>
>>> I really just mean that VFIO creates a mostly static IOMMU configuration.
>>>
>>> It's configured by the guest, not the host.
>>
>> OK, that resolves my confusion.
>>
>>>
>>> I don't see host control over configuration as being particularly important.
>>
>> We do, see below.
>>
>>>
>>>
>>>>>
>>>>>> Well, we could declare our virtio-shmem device to be an IOMMU device
>>>>>> that controls access of a remote VM to RAM of the one that owns the
>>>>>> device. In the static case, this access may at most be enabled/disabled
>>>>>> but not moved around. The static regions would have to be discoverable
>>>>>> for the VM (register read-back), and the guest's firmware will likely
>>>>>> have to declare those ranges 

Re: rfc: vhost user enhancements for vm2vm communication

2015-09-03 Thread Jan Kiszka
On 2015-09-03 10:37, Michael S. Tsirkin wrote:
> On Thu, Sep 03, 2015 at 10:21:28AM +0200, Jan Kiszka wrote:
>> On 2015-09-03 10:08, Michael S. Tsirkin wrote:
>>>
>>> IOW if you wish, you actually can create a shared memory device,
>>> make it accessible to the IOMMU and place some or all
>>> data there.
>>>
>>
>> Actually, that could also be something more sophisticated, including
>> virtio-net, IF that device will be able to express its DMA window
>> restrictions (a bit like 32-bit PCI devices being restricted to <4G
>> addresses or ISA devices <1M).
>>
>> Jan
> 
> Actually, it's the bus restriction, not the device restriction.
> 
> So if you want to use bounce buffers in the name of security or
> real-time requirements, you should be able to do this if virtio uses the
> DMA API.

A bounce buffer will only be the simplest option (though fine for low-rate
traffic that we also have in mind, like virtual consoles). Given
properly-sized regions, even if fixed, and the right communication
stacks, you can directly allocate application buffers in those regions
and avoid most if not all copying.

In any case, if we manage to address this variation along with your
proposal, that would help tremendously.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
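
For illustration, the "allocate application buffers in the region" idea can
be as simple as a bump allocator over a fixed, properly-sized window (no
locking or freeing here; purely a sketch with invented names):

#include <stddef.h>
#include <stdint.h>

struct shm_region {
	uint8_t *base;	/* mapped shared-memory window */
	size_t   size;
	size_t   next;	/* bump pointer */
};

static void *shm_alloc(struct shm_region *r, size_t len, size_t align)
{
	size_t off = (r->next + align - 1) & ~(align - 1);

	if (off > r->size || len > r->size - off)
		return NULL;	/* exhausted: fall back to bounce buffer + copy */
	r->next = off + len;
	return r->base + off;
}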


Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Jan Kiszka
On 2015-08-31 16:11, Michael S. Tsirkin wrote:
> Hello!
> During the KVM forum, we discussed supporting virtio on top
> of ivshmem.

No, not on top of ivshmem. On top of shared memory. Our model is
different from the simplistic ivshmem.

> I have considered it, and came up with an alternative
> that has several advantages over that - please see below.
> Comments welcome.
> 
> -
> 
> Existing solutions to userspace switching between VMs on the
> same host are vhost-user and ivshmem.
> 
> vhost-user works by mapping memory of all VMs being bridged into the
> switch memory space.
> 
> By comparison, ivshmem works by exposing a shared region of memory to all VMs.
> VMs are required to use this region to store packets. The switch only
> needs access to this region.
> 
> Another difference between vhost-user and ivshmem surfaces when polling
> is used. With vhost-user, the switch is required to handle
> data movement between VMs, if using polling, this means that 1 host CPU
> needs to be sacrificed for this task.
> 
> This is easiest to understand when one of the VMs is
> used with VF pass-through. This can be schematically shown below:
> 
> +-- VM1 ---------+            +-- VM2 ------------+
> | virtio-pci     +-vhost-user-+ virtio-pci -- VF  | -- VFIO -- IOMMU -- NIC
> +----------------+            +-------------------+
> 
> 
> With ivshmem in theory communication can happen directly, with two VMs
> polling the shared memory region.
> 
> 
> I won't spend time listing advantages of vhost-user over ivshmem.
> Instead, having identified two advantages of ivshmem over vhost-user,
> below is a proposal to extend vhost-user to gain the advantages
> of ivshmem.
> 
> 
> 1: virtio in guest can be extended to allow support
> for IOMMUs. This provides guest with full flexibility
> about memory which is readable or write able by each device.
> By setting up a virtio device for each other VM we need to
> communicate to, guest gets full control of its security, from
> mapping all memory (like with current vhost-user) to only
> mapping buffers used for networking (like ivshmem) to
> transient mappings for the duration of data transfer only.
> This also allows use of VFIO within guests, for improved
> security.
> 
> vhost user would need to be extended to send the
> mappings programmed by guest IOMMU.
> 
> 2. qemu can be extended to serve as a vhost-user client:
> remote VM mappings over the vhost-user protocol, and
> map them into another VM's memory.
> This mapping can take, for example, the form of
> a BAR of a pci device, which I'll call here vhost-pci - 
> with bus address allowed
> by VM1's IOMMU mappings being translated into
> offsets within this BAR within VM2's physical
> memory space.
> 
> Since the translation can be a simple one, VM2
> can perform it within its vhost-pci device driver.
> 
> While this setup would be the most useful with polling,
> VM1's ioeventfd can also be mapped to
> another VM2's irqfd, and vice versa, such that VMs
> can trigger interrupts to each other without need
> for a helper thread on the host.
> 
> 
> The resulting channel might look something like the following:
> 
> +-- VM1 --------------+  +-- VM2 ----------+
> | virtio-pci -- iommu +--+ vhost-pci -- VF | -- VFIO -- IOMMU -- NIC
> +---------------------+  +-----------------+
> 
> comparing the two diagrams, a vhost-user thread on the host is
> no longer required, reducing the host CPU utilization when
> polling is active.  At the same time, VM2 can not access all of VM1's
> memory - it is limited by the iommu configuration setup by VM1.
> 
> 
> Advantages over ivshmem:
> 
> - more flexibility, endpoint VMs do not have to place data at any
>   specific locations to use the device, in practice this likely
>   means less data copies.
> - better standardization/code reuse
>   virtio changes within guests would be fairly easy to implement
>   and would also benefit other backends, besides vhost-user
>   standard hotplug interfaces can be used to add and remove these
>   channels as VMs are added or removed.
> - migration support
>   It's easy to implement since ownership of memory is well defined.
>   For example, during migration VM2 can notify the hypervisor of VM1
>   by updating a dirty bitmap each time it writes into VM1 memory.
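
(The bookkeeping itself is cheap; a sketch with one bit per VM1 page,
names made up:)

#include <stdint.h>

#define PAGE_SHIFT 12

/* Mark one of VM1's guest-physical pages dirty in a migration bitmap
 * after VM2 wrote to it through its vhost-pci BAR. */
static void mark_page_dirty(uint64_t *bitmap, uint64_t vm1_gpa)
{
    uint64_t pfn = vm1_gpa >> PAGE_SHIFT;

    bitmap[pfn / 64] |= 1ULL << (pfn % 64);
}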
> 
> Thanks,
> 

This sounds like a different interface to a concept very similar to
Xen's grant table, no? Well, there might be benefits for some use cases;
for ours this is too dynamic, in fact. We'd like to avoid remappings
during runtime that are controlled by guest activities, which this model
clearly requires.

Another shortcoming: If VM1 does not trust (security- or safety-wise)
VM2 while preparing a message for it, it has to keep the buffer invisible
to VM2 until it is completed and signed, hashed etc. That means it has
to reprogram the IOMMU frequently. With the concept we discussed at KVM
Forum, there would be shared memory mapped read-only to VM2 while being
R/W for VM1. That would avoid such frequent remappings.

Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Jan Kiszka
On 2015-09-01 10:01, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
>> Leaving all the implementation and interface details aside, this
>> discussion is first of all about two fundamentally different approaches:
>> static shared memory windows vs. dynamically remapped shared windows (a
>> third one would be copying in the hypervisor, but I suppose we all agree
>> that the whole exercise is about avoiding that). Which way do we want or
>> have to go?
>>
>> Jan
> 
> Dynamic is a superset of static: you can always make it static if you
> wish. Static has the advantage of simplicity, but that's lost once you
> realize you need to invent interfaces to make it work.  Since we can use
> existing IOMMU interfaces for the dynamic one, what's the disadvantage?

Complexity. Having to emulate even more of an IOMMU in the hypervisor
(we already have to do a bit for VT-d IR in Jailhouse) and doing this
per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that
sense, generic grant tables would be more appealing. But what we would
actually need is an interface that is only *optionally* configured by a
guest for dynamic scenarios, otherwise preconfigured by the hypervisor
for static setups. And we need guests that support both. That's the
challenge.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Jan Kiszka
On 2015-09-01 18:02, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 05:34:37PM +0200, Jan Kiszka wrote:
>> On 2015-09-01 16:34, Michael S. Tsirkin wrote:
>>> On Tue, Sep 01, 2015 at 04:09:44PM +0200, Jan Kiszka wrote:
>>>> On 2015-09-01 11:24, Michael S. Tsirkin wrote:
>>>>> On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote:
>>>>>> On 2015-09-01 10:01, Michael S. Tsirkin wrote:
>>>>>>> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
>>>>>>>> Leaving all the implementation and interface details aside, this
>>>>>>>> discussion is first of all about two fundamentally different approaches:
>>>>>>>> static shared memory windows vs. dynamically remapped shared windows (a
>>>>>>>> third one would be copying in the hypervisor, but I suppose we all agree
>>>>>>>> that the whole exercise is about avoiding that). Which way do we want or
>>>>>>>> have to go?
>>>>>>>>
>>>>>>>> Jan
>>>>>>>
>>>>>>> Dynamic is a superset of static: you can always make it static if you
>>>>>>> wish. Static has the advantage of simplicity, but that's lost once you
>>>>>>> realize you need to invent interfaces to make it work.  Since we can use
>>>>>>> existing IOMMU interfaces for the dynamic one, what's the disadvantage?
>>>>>>
>>>>>> Complexity. Having to emulate even more of an IOMMU in the hypervisor
>>>>>> (we already have to do a bit for VT-d IR in Jailhouse) and doing this
>>>>>> per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that
>>>>>> sense, generic grant tables would be more appealing.
>>>>>
>>>>> That's not how we do things for KVM, PV features need to be
>>>>> modular and interchangeable with emulation.
>>>>
>>>> I know, and we may have to make some compromise for Jailhouse if that
>>>> brings us valuable standardization and broad guest support. But we will
>>>> surely not support an arbitrary amount of IOMMU models for that reason.
>>>>
>>>>>
>>>>> If you just want something that's cross-platform and easy to
>>>>> implement, just build a PV IOMMU. Maybe use virtio for this.
>>>>
>>>> That is likely required to keep the complexity manageable and to allow
>>>> static preconfiguration.
>>>
>>> Real IOMMU allow static configuration just fine. This is exactly
>>> what VFIO uses.
>>
>> Please specify more precisely which feature in which IOMMU you are
>> referring to. Also, given that you refer to VFIO, I suspect we have
>> different things in mind. I'm talking about an IOMMU device model, like
>> the one we have in QEMU now for VT-d. That one is not at all
>> preconfigured by the host for VFIO.
> 
> I really just mean that VFIO creates a mostly static IOMMU configuration.
> 
> It's configured by the guest, not the host.

OK, that resolves my confusion.

> 
> I don't see host control over configuration as being particularly important.

We do, see below.

> 
> 
>>>
>>>> Well, we could declare our virtio-shmem device to be an IOMMU device
>>>> that controls access of a remote VM to RAM of the one that owns the
>>>> device. In the static case, this access may at most be enabled/disabled
>>>> but not moved around. The static regions would have to be discoverable
>>>> for the VM (register read-back), and the guest's firmware will likely
>>>> have to declare those ranges reserved to the guest OS.
>>>> In the dynamic case, the guest would be able to create an alternative
>>>> mapping.
>>>
>>>
>>> I don't think we want a special device just to support the
>>> static case. It might be a bit less code to write, but
>>> eventually it should be up to the guest.
>>> Fundamentally, it's policy that host has no business
>>> dictating.
>>
>> "A bit less" is to be validated, and I doubt its just "a bit". But if
>> KVM and its guests will also support some PV-IOMMU that we can reuse for
>> our scenarios, than that is fine. KVM would not have to mandate support
>> for it while we would, that's all.
> 
> Someone will have to do this work.
> 
>

Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Jan Kiszka
On 2015-09-01 11:24, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote:
>> On 2015-09-01 10:01, Michael S. Tsirkin wrote:
>>> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
>>>> Leaving all the implementation and interface details aside, this
>>>> discussion is first of all about two fundamentally different approaches:
>>>> static shared memory windows vs. dynamically remapped shared windows (a
>>>> third one would be copying in the hypervisor, but I suppose we all agree
>>>> that the whole exercise is about avoiding that). Which way do we want or
>>>> have to go?
>>>>
>>>> Jan
>>>
>>> Dynamic is a superset of static: you can always make it static if you
>>> wish. Static has the advantage of simplicity, but that's lost once you
>>> realize you need to invent interfaces to make it work.  Since we can use
>>> existing IOMMU interfaces for the dynamic one, what's the disadvantage?
>>
>> Complexity. Having to emulate even more of an IOMMU in the hypervisor
>> (we already have to do a bit for VT-d IR in Jailhouse) and doing this
>> per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that
>> sense, generic grant tables would be more appealing.
> 
> That's not how we do things for KVM, PV features need to be
> modular and interchangeable with emulation.

I know, and we may have to make some compromise for Jailhouse if that
brings us valuable standardization and broad guest support. But we will
surely not support an arbitrary amount of IOMMU models for that reason.

> 
> If you just want something that's cross-platform and easy to
> implement, just build a PV IOMMU. Maybe use virtio for this.

That is likely required to keep the complexity manageable and to allow
static preconfiguration.

Well, we could declare our virtio-shmem device to be an IOMMU device
that controls access of a remote VM to RAM of the one that owns the
device. In the static case, this access may at most be enabled/disabled
but not moved around. The static regions would have to be discoverable
for the VM (register read-back), and the guest's firmware will likely
have to declare those ranges reserved to the guest OS.

In the dynamic case, the guest would be able to create an alternative
mapping. We would probably have to define a generic page table structure
for that. Or do you rather have some MPU-like control structure in mind,
more similar to the memory region descriptions vhost is already using?
Also not yet clear to me is how the vhost-pci device and the
translations it will have to do should look for VM2.
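
For the static case, the read-back could be as simple as a fixed register
block per region. A purely illustrative layout, nothing specified anywhere:

#include <stdint.h>

/* Hypothetical per-region register block of a virtio-shmem device. In the
 * static case the range is fixed by the hypervisor; the guest reads it
 * back and may at most toggle the enable bit. */
struct shmem_region_regs {
    uint64_t base;   /* guest-physical start of the window (read-only) */
    uint64_t size;   /* window length in bytes (read-only) */
    uint32_t perm;   /* bit 0: remote may read, bit 1: remote may write */
    uint32_t enable; /* 0: remote access disabled, 1: enabled */
};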

> 
>> But what we would
>> actually need is an interface that is only *optionally* configured by a
>> guest for dynamic scenarios, otherwise preconfigured by the hypervisor
>> for static setups. And we need guests that support both. That's the
>> challenge.
>>
>> Jan
> 
> That's already there for IOMMUs: vfio does the static setup by default,
> enabling iommu by guests is optional.

Cannot follow yet how vfio comes into play regarding some preconfigured
virtual IOMMU.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: rfc: vhost user enhancements for vm2vm communication

2015-09-01 Thread Jan Kiszka
On 2015-09-01 16:34, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 04:09:44PM +0200, Jan Kiszka wrote:
>> On 2015-09-01 11:24, Michael S. Tsirkin wrote:
>>> On Tue, Sep 01, 2015 at 11:11:52AM +0200, Jan Kiszka wrote:
>>>> On 2015-09-01 10:01, Michael S. Tsirkin wrote:
>>>>> On Tue, Sep 01, 2015 at 09:35:21AM +0200, Jan Kiszka wrote:
>>>>>> Leaving all the implementation and interface details aside, this
>>>>>> discussion is first of all about two fundamentally different approaches:
>>>>>> static shared memory windows vs. dynamically remapped shared windows (a
>>>>>> third one would be copying in the hypervisor, but I suppose we all agree
>>>>>> that the whole exercise is about avoiding that). Which way do we want or
>>>>>> have to go?
>>>>>>
>>>>>> Jan
>>>>>
>>>>> Dynamic is a superset of static: you can always make it static if you
>>>>> wish. Static has the advantage of simplicity, but that's lost once you
>>>>> realize you need to invent interfaces to make it work.  Since we can use
>>>>> existing IOMMU interfaces for the dynamic one, what's the disadvantage?
>>>>
>>>> Complexity. Having to emulate even more of an IOMMU in the hypervisor
>>>> (we already have to do a bit for VT-d IR in Jailhouse) and doing this
>>>> per platform (AMD IOMMU, ARM SMMU, ...) is out of scope for us. In that
>>>> sense, generic grant tables would be more appealing.
>>>
>>> That's not how we do things for KVM, PV features need to be
>>> modular and interchangeable with emulation.
>>
>> I know, and we may have to make some compromise for Jailhouse if that
>> brings us valuable standardization and broad guest support. But we will
>> surely not support an arbitrary amount of IOMMU models for that reason.
>>
>>>
>>> If you just want something that's cross-platform and easy to
>>> implement, just build a PV IOMMU. Maybe use virtio for this.
>>
>> That is likely required to keep the complexity manageable and to allow
>> static preconfiguration.
> 
> Real IOMMU allow static configuration just fine. This is exactly
> what VFIO uses.

Please specify more precisely which feature in which IOMMU you are
referring to. Also, given that you refer to VFIO, I suspect we have
different things in mind. I'm talking about an IOMMU device model, like
the one we have in QEMU now for VT-d. That one is not at all
preconfigured by the host for VFIO.

> 
>> Well, we could declare our virtio-shmem device to be an IOMMU device
>> that controls access of a remote VM to RAM of the one that owns the
>> device. In the static case, this access may at most be enabled/disabled
>> but not moved around. The static regions would have to be discoverable
>> for the VM (register read-back), and the guest's firmware will likely
>> have to declare those ranges reserved to the guest OS.
>> In the dynamic case, the guest would be able to create an alternative
>> mapping.
> 
> 
> I don't think we want a special device just to support the
> static case. It might be a bit less code to write, but
> eventually it should be up to the guest.
> Fundamentally, it's policy that host has no business
> dictating.

"A bit less" is to be validated, and I doubt its just "a bit". But if
KVM and its guests will also support some PV-IOMMU that we can reuse for
our scenarios, than that is fine. KVM would not have to mandate support
for it while we would, that's all.

> 
>> We would probably have to define a generic page table structure
>> for that. Or do you rather have some MPU-like control structure in mind,
>> more similar to the memory region descriptions vhost is already using?
> 
> I don't care much. Page tables use less memory if a lot of memory needs
> to be covered. OTOH if you want to use virtio (e.g. to allow command
> batching) that likely means commands to manipulate the IOMMU, and
> maintaining it all on the host. You decide.

I don't care very much about the dynamic case as we won't support it
anyway. However, if the configuration concept used for it is applicable
to static mode as well, then we could reuse it. But preconfiguration
will require a register-based region description, I suspect.
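
To make the virtio option concrete: such a PV-IOMMU would batch map and
unmap requests on a virtqueue, roughly like below (a hypothetical layout,
not an existing specification):

#include <stdint.h>

enum viommu_op {
    VIOMMU_OP_MAP   = 1,
    VIOMMU_OP_UNMAP = 2,
};

/* Hypothetical PV-IOMMU request, queued and batched on a virtqueue. */
struct viommu_request {
    uint32_t op;    /* VIOMMU_OP_MAP or VIOMMU_OP_UNMAP */
    uint32_t flags; /* e.g. read/write permission bits for MAP */
    uint64_t iova;  /* bus address to (un)map */
    uint64_t paddr; /* guest-physical backing, MAP only */
    uint64_t size;  /* length in bytes */
};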

> 
>> Also not yet clear to me is how the vhost-pci device and the
>> translations it will have to do should look for VM2.
> 
> I think we can use vhost-pci BAR + VM1 bus address as the
> VM2 physical address. In other words, all memory exposed to
> virtio-pci by VM1 through it's IOMMU is 

Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-29 Thread Jan Kiszka
On 2015-07-29 01:21, Benjamin Herrenschmidt wrote:
 On Tue, 2015-07-28 at 15:43 -0700, Andy Lutomirski wrote:
  New QEMU always advertises this feature flag.  If iommu=on, QEMU's
  virtio devices refuse to work unless the driver acknowledges the flag.
 
 This should be configurable.

Advertisement of that flag must be configurable, or we won't be able to
run older guests anymore, which don't know it and would thus reject it. The
only precondition: there must be no IOMMU if we turn it off.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-29 Thread Jan Kiszka
On 2015-07-29 10:17, Paolo Bonzini wrote:
 
 
 On 29/07/2015 02:47, Andy Lutomirski wrote:
 If new kernels ignore the IOMMU for devices that don't set the flag
 and there are physical devices that already exist and don't set the
 flag, then those devices won't work reliably on most modern
 non-virtual platforms, PPC included.

 Are there many virtio physical devices out there ? We are talking about
 a virtio flag right ? Or have you been considering something else ?

 Yes, virtio flag.  I dislike having a virtio flag at all, but so far
 no one has come up with any better ideas.  If there was a reliable,
 cross-platform mechanism for per-device PCI bus properties, I'd be all
 for using that instead.
 
 No, a virtio flag doesn't make sense.

That will create the risk of subtly breaking old guests over new setups.
I wouldn't suggest this.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-28 Thread Jan Kiszka
On 2015-07-28 15:06, Michael S. Tsirkin wrote:
 On Tue, Jul 28, 2015 at 02:46:20PM +0200, Paolo Bonzini wrote:


 On 28/07/2015 12:12, Benjamin Herrenschmidt wrote:
 That is an experimental feature (it's x-iommu), so it can change.

 The plan was:

 - for PPC, virtio never honors IOMMU

 - for non-PPC, either have virtio always honor IOMMU, or enforce that
 virtio is not under IOMMU.

 I dislike having PPC special cased.

 In fact, today x86 guests also assume that virtio bypasses IOMMU I
 believe. In fact *all* guests do.

 This doesn't matter much, since the only guests that implement an IOMMU
 in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind
 of stability.
 
 Hmm I think Jan (cc) said it was already used out there.

Yes, no known issues with vt-d emulation for almost a year now. Error
reporting could be improved, and interrupt remapping is still missing,
but those are minor issues in this context.

In my testing setups, I also have virtio devices in use, passed through
to an L2 guest, but only in 1:1 mapping so that their broken IOMMU
support causes no practical problems.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-28 Thread Jan Kiszka
On 2015-07-28 19:15, Paolo Bonzini wrote:
 
 
 On 28/07/2015 18:42, Jan Kiszka wrote:
 On the other hand interrupt remapping is absolutely necessary for
 production use, hence my point that x86 does not promise API stability.

 Well, we currently implement the features that the Q35 used to expose.
 Adding interrupt remapping will require a new chipset and/or a hack
 switch to ignore compatibility.
 
 Isn't the VT-d register space separate from other Q35 features and
 backwards-compatible?  You could even add it to PIIX in theory just by
 adding a DMAR.

Yes, it's practically working, but it's not accurate /wrt how that
hardware looked in reality.

 
 It's not like for example SMRAM, where the registers are in the
 northbridge configuration space and move around in every chipset generation.
 
 (Any kind of stability actually didn't include crashes; those are not
 expected :))

 The Google patches for userspace PIC and IOAPIC are proceeding well, so
 hopefully we can have interrupt remapping soon.

 If the day had 48 hours... I'd love to look into this, first adding QEMU
 support for the new irqchip architecture.
 
 I hope I can squeeze in some time for that...  Google also had an intern
 that was looking at it.

Great!

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-28 Thread Jan Kiszka
On 2015-07-28 18:36, Paolo Bonzini wrote:
 On 28/07/2015 15:11, Jan Kiszka wrote:

 This doesn't matter much, since the only guests that implement an IOMMU
 in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind
 of stability.

 Hmm I think Jan (cc) said it was already used out there.
 Yes, no known issues with vt-d emulation for almost a year now. Error
 reporting could be improved, and interrupt remapping is still missing,
 but those are minor issues in this context.
 
 On the other hand interrupt remapping is absolutely necessary for
 production use, hence my point that x86 does not promise API stability.

Well, we currently implement the features that the Q35 used to expose.
Adding interrupt remapping will require a new chipset and/or a hack
switch to ignore compatibility.

 
 (Any kind of stability actually didn't include crashes; those are not
 expected :))
 
 The Google patches for userspace PIC and IOAPIC are proceeding well, so
 hopefully we can have interrupt remapping soon.

If the day had 48 hours... I'd love to look into this, first adding QEMU
support for the new irqchip architecture.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-28 Thread Jan Kiszka
On 2015-07-28 19:10, Andy Lutomirski wrote:
 On Tue, Jul 28, 2015 at 9:44 AM, Jan Kiszka jan.kis...@siemens.com wrote:
 The ability to have virtio on systems with IOMMU in place makes testing
 much more efficient for us. Ideally, we would have it in non-identity
 mapping scenarios as well, e.g. to start secondary Linux instances in
 the test VMs, giving them their own virtio devices. And we will
 eventually have this need on ARM as well.

 Virtio needs to be backward compatible, so the change to put these
 devices under IOMMU control could be advertised during feature
 negotiations and controlled on QEMU side via a device property. Newer
 guest drivers would have to acknowledge that they support virtio via
 IOMMUs. Older ones would refuse to work, and the admin could instead
 spawn VMs with this feature disabled.

 
 The trouble is that this is really a property of the bus and not of
 the device.  If you build a virtio device that physically plugs into a
 PCIe slot, the device has no concept of an IOMMU in the first place.

If one were to build a real virtio device today, it would be broken
because every IOMMU would start to translate its requests. Already from
that POV, we really need to introduce a feature flag "I will be
IOMMU-translated" so that a potential physical implementation can carry
it unconditionally.
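
As a sketch, with a made-up bit number, the driver-side check could look
like this:

#include <stdint.h>

#define VIRTIO_F_IOMMU_TRANSLATED 33 /* hypothetical feature bit number */

/* Driver side: only accept a device whose DMA statements match reality.
 * A device behind an IOMMU that does not announce "I will be
 * IOMMU-translated" cannot be driven safely, so refuse it. */
static int virtio_check_iommu_flag(uint64_t device_features, int behind_iommu)
{
    int translated = !!(device_features &
                        (1ULL << VIRTIO_F_IOMMU_TRANSLATED));

    if (behind_iommu && !translated)
        return -1; /* device would DMA with untranslated addresses */
    return 0;
}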

 Similarly, if you take an L0-provided IOMMU-supporting device and pass
 it through to L2 using current QEMU on L1 (with Q35 emulation and
 iommu enabled), then, from L2's perspective, the device is 1:1 no
 matter what the device thinks.
 
 IOW, I think the original design was wrong and now we have to deal
 with it.  I think the best solution would be to teach QEMU to fix its
 ACPI tables so that 1:1 virtio devices are actually exposed as 1:1.

Only the current drivers are broken. And we can easily tell them apart
from newer ones via feature flags. Sorry, don't get the problem.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-28 Thread Jan Kiszka
On 2015-07-28 18:11, Andy Lutomirski wrote:
 On Jul 28, 2015 6:11 AM, Jan Kiszka jan.kis...@siemens.com wrote:

 On 2015-07-28 15:06, Michael S. Tsirkin wrote:
 On Tue, Jul 28, 2015 at 02:46:20PM +0200, Paolo Bonzini wrote:


 On 28/07/2015 12:12, Benjamin Herrenschmidt wrote:
 That is an experimental feature (it's x-iommu), so it can change.

 The plan was:

 - for PPC, virtio never honors IOMMU

 - for non-PPC, either have virtio always honor IOMMU, or enforce that
 virtio is not under IOMMU.

 I dislike having PPC special cased.

 In fact, today x86 guests also assume that virtio bypasses IOMMU I
 believe. In fact *all* guests do.

 This doesn't matter much, since the only guests that implement an IOMMU
 in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind
 of stability.

 Hmm I think Jan (cc) said it was already used out there.

 Yes, no known issues with vt-d emulation for almost a year now. Error
 reporting could be improved, and interrupt remapping is still missing,
 but those are minor issues in this context.

 In my testing setups, I also have virtio devices in use, passed through
 to an L2 guest, but only in 1:1 mapping so that their broken IOMMU
 support causes no practical problems.

 
 How are you getting 1:1 to work?  Is it something that L0 QEMU can
 advertise to L1?  If so, can we just do that unconditionally, which
 would make my patch work?

The guest hypervisor is Jailhouse and the guest is the root cell that
loaded the hypervisor, and thus continues with identity mappings. You
usually don't have 1:1 mapping with other setups - maybe with some Xen
configuration? Dunno.

 
 I have no objection to 1:1 devices in general.  It's only devices that
 the PCI code on the guest identifies as not 1:1 but that are
 nonetheless 1:1 that cause problems.

The ability to have virtio on systems with IOMMU in place makes testing
much more efficient for us. Ideally, we would have it in non-identity
mapping scenarios as well, e.g. to start secondary Linux instances in
the test VMs, giving them their own virtio devices. And we will
eventually have this need on ARM as well.

Virtio needs to be backward compatible, so the change to put these
devices under IOMMU control could be advertised during feature
negotiations and controlled on QEMU side via a device property. Newer
guest drivers would have to acknowledge that they support virtio via
IOMMUs. Older ones would refuse to work, and the admin could instead
spawn VMs with this feature disabled.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-28 Thread Jan Kiszka
On 2015-07-28 20:22, Andy Lutomirski wrote:
 On Tue, Jul 28, 2015 at 10:17 AM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2015-07-28 19:10, Andy Lutomirski wrote:
 The trouble is that this is really a property of the bus and not of
 the device.  If you build a virtio device that physically plugs into a
 PCIe slot, the device has no concept of an IOMMU in the first place.

 If one would build a real virtio device today, it would be broken
 because every IOMMU would start to translate its requests. Already from
 that POV, we really need to introduce a feature flag "I will be
 IOMMU-translated" so that a potential physical implementation can carry
 it unconditionally.

 
 Except that, with my patches, it would work correctly.  ISTM the thing

I haven't looked at your patches yet - they make the virtio PCI driver
in Linux IOMMU-compatible? Perfect - except for a compatibility check,
right?

 that's broken right now is QEMU and the virtio_pci driver.  My patches
 fix the driver.  Last year that would have been the end of the story
 except for PPC.  Now we have to deal with QEMU.
 
 Similarly, if you take an L0-provided IOMMU-supporting device and pass
 it through to L2 using current QEMU on L1 (with Q35 emulation and
 iommu enabled), then, from L2's perspective, the device is 1:1 no
 matter what the device thinks.

 IOW, I think the original design was wrong and now we have to deal
 with it.  I think the best solution would be to teach QEMU to fix its
 ACPI tables so that 1:1 virtio devices are actually exposed as 1:1.

 Only the current drivers are broken. And we can easily tell them apart
 from newer ones via feature flags. Sorry, don't get the problem.
 
 I still don't see how feature flags solve the problem.  Suppose we
  added a feature flag meaning "respects IOMMU".
 
 Bad case 1:  Build a malicious device that advertises
 non-IOMMU-respecting virtio.  Plug it in behind an IOMMU.  Host starts
 leaking physical addresses to the device (and the device doesn't work,
 of course).  Maybe that's only barely a security problem, but still...

I don't see right now how critical such a hypothetical case could be.
But the OS / its drivers could still decide to refuse to talk to such a
device.

 
 Bad case 2:  Use current QEMU w/ IOMMU enabled.  Assign a virtio
 device provided by L0 QEMU to L2.  L1 crashes.  I consider *that* to
 be a security problem, although in practice no one will configure
 their system that way because it has zero chance of actually working.
 Nonetheless, the device does work if L1 accesses it directly?  The
 issue is vfio doesn't notice that the device doesn't respect the IOMMU
 because respects-IOMMU is a property of the PCI bus and the platform
 IOMMU, and vfio assumes it works correctly.

I would have no problem with rejecting configurations in future QEMU
that try to expose unconfined virtio devices in the presence of IOMMU
emulation. Once we can do better, it's just about letting the guest know
about the difference.

The current situation is indeed just broken; we don't need to discuss
this, as we can't change history to prevent it.

 
 Bad case 3: Some hypothetical well-behaved new QEMU provides a virtio
 device that *does* respect the IOMMU and sets the feature flag.  They
 emulate Q35 with an IOMMU.  They boot Linux 4.1.  Data corruption in
 the guest.

No. In that case, the feature negotiation of virtio-with-iommu-support
would have failed for older drivers, and the device would have never
been used by the guest.

 
 We could make the rule that *all* virtio-pci devices (except on PPC)
 respect the bus rules.  We'd have to fix QEMU so that virtio devices
 on Q35 iommu=on systems set up a PCI topology where the devices
 *aren't* behind the IOMMU or are protected by RMRRs or whatever.  Then
 old kernels would work correctly on new hosts, new kernels would work
 correctly except on old iommu-providing hosts, and Xen would work.

I don't see a point in doing anything about old QEMU with IOMMU enabled
and virtio devices plugged except declaring such setups broken. No one
should have configured this for production purposes, only for test
setups (like us, with knowledge of the limitations).

 
 In fact, on Xen, it's impossible without colossal hacks to support
 non-IOMMU-respecting virtio devices because Xen acts as an
 intermediate IOMMU between the Linux dom0 guest and the actual host.
 The QEMU host doesn't even know that Xen is involved.  This is why Xen
 and virtio don't currently work together (without my patches): the
 device thinks it doesn't respect the IOMMU, the driver thinks the
 device doesn't respect the IOMMU, and they're both wrong.
 
 TL;DR: I think there are only two cases.  Either a device respects the
 IOMMU or a device doesn't know whether it respects the IOMMU.  The
 latter case is problematic.

See above, the latter is only problematic on setups that actually use an
IOMMU. If that includes Xen, then no one should use it until virtio can
declare itself IOMMU-aware.

Re: [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-28 Thread Jan Kiszka
On 2015-07-28 21:24, Andy Lutomirski wrote:
 On Tue, Jul 28, 2015 at 12:06 PM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2015-07-28 20:22, Andy Lutomirski wrote:
 On Tue, Jul 28, 2015 at 10:17 AM, Jan Kiszka jan.kis...@siemens.com wrote:
 On 2015-07-28 19:10, Andy Lutomirski wrote:
 The trouble is that this is really a property of the bus and not of
 the device.  If you build a virtio device that physically plugs into a
 PCIe slot, the device has no concept of an IOMMU in the first place.

 If one would build a real virtio device today, it would be broken
 because every IOMMU would start to translate its requests. Already from
 that POV, we really need to introduce a feature flag "I will be
 IOMMU-translated" so that a potential physical implementation can carry
 it unconditionally.


 Except that, with my patches, it would work correctly.  ISTM the thing

 I haven't looked at your patches yet - they make the virtio PCI driver
 in Linux IOMMU-compatible? Perfect - except for a compatibility check,
 right?
 
 Yes.  (virtio_pci_legacy, anyway.  Presumably virtio_pci_modern is
 easy to adapt, too.)
 

 that's broken right now is QEMU and the virtio_pci driver.  My patches
 fix the driver.  Last year that would have been the end of the story
 except for PPC.  Now we have to deal with QEMU.

 Similarly, if you take an L0-provided IOMMU-supporting device and pass
 it through to L2 using current QEMU on L1 (with Q35 emulation and
 iommu enabled), then, from L2's perspective, the device is 1:1 no
 matter what the device thinks.

 IOW, I think the original design was wrong and now we have to deal
 with it.  I think the best solution would be to teach QEMU to fix its
 ACPI tables so that 1:1 virtio devices are actually exposed as 1:1.

 Only the current drivers are broken. And we can easily tell them apart
 from newer ones via feature flags. Sorry, don't get the problem.

 I still don't see how feature flags solve the problem.  Suppose we
  added a feature flag meaning "respects IOMMU".

 Bad case 1:  Build a malicious device that advertises
 non-IOMMU-respecting virtio.  Plug it in behind an IOMMU.  Host starts
 leaking physical addresses to the device (and the device doesn't work,
 of course).  Maybe that's only barely a security problem, but still...

 I don't see right now how critical such a hypothetical case could be.
 But the OS / its drivers could still decide to refuse talking to such a
 device.

 
 How does OS know it's such a device as opposed to a QEMU-supplied thing?

It can restrict itself to virtio devices exposing the feature if it
feels uncomfortable that it might be talking to some evil piece of
silicon (instead of the hypervisor, which has to be trusted anyway).

 

  Bad case 3: Some hypothetical well-behaved new QEMU provides a virtio
 device that *does* respect the IOMMU and sets the feature flag.  They
 emulate Q35 with an IOMMU.  They boot Linux 4.1.  Data corruption in
 the guest.

 No. In that case, the feature negotiation of virtio-with-iommu-support
 would have failed for older drivers, and the device would have never
 been used by the guest.
 
 So are you suggesting that newer virtio devices always provide this
 feature flag and, if supplied by QEMU with iommu=on, simply refuse to
  operate if the driver doesn't support that flag?

Exactly.

 
 That could work as long as QEMU with the current (broken?) iommu=on
 never exposes such a device.

QEMU would have to be adjusted first so that all its virtio-pci device
models take IOMMUs into account - whether they exist or not. Only then
could it expose the feature and expect the guest to acknowledge it.

For compat reasons, QEMU should still be able to expose virtio devices
without the flag set - but then without any IOMMU emulation enabled as
well. That would prevent the current setup we are using today, but it's
trivial to update the guest kernel to a newer virtio driver which would
restore our scenario again.
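
Taken together, the device-side rules could sketch out like this (again
with a hypothetical feature bit):

#include <stdint.h>

#define VIRTIO_F_IOMMU_TRANSLATED (1ULL << 33) /* hypothetical bit */

/* Device side: refuse configurations that cannot work. QEMU would apply
 * this when the driver finishes feature negotiation. */
static int virtio_device_check(int iommu_emulated,
                               uint64_t offered, uint64_t acked)
{
    if (iommu_emulated && !(offered & VIRTIO_F_IOMMU_TRANSLATED))
        return -1; /* refuse: device would bypass the emulated IOMMU */

    if ((offered & VIRTIO_F_IOMMU_TRANSLATED) &&
        !(acked & VIRTIO_F_IOMMU_TRANSLATED))
        return -1; /* old driver: fail FEATURES_OK */

    return 0;
}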

 


 We could make the rule that *all* virtio-pci devices (except on PPC)
 respect the bus rules.  We'd have to fix QEMU so that virtio devices
 on Q35 iommu=on systems set up a PCI topology where the devices
 *aren't* behind the IOMMU or are protected by RMRRs or whatever.  Then
 old kernels would work correctly on new hosts, new kernels would work
 correctly except on old iommu-providing hosts, and Xen would work.

 I don't see a point in doing anything about old QEMU with IOMMU enabled
 and virtio devices plugged except declaring such setups broken. No one
 should have configured this for production purposes, only for test
 setups (like we, with the knowledge about the limitations).

 
 I'm fine with that.  In fact, I proposed these patches before QEMU had
 this feature in the first place.
 

 In fact, on Xen, it's impossible without colossal hacks to support
 non-IOMMU-respecting virtio devices because Xen acts as an
 intermediate IOMMU between the Linux dom0 guest and the actual host.
 The QEMU host doesn't even know that Xen is involved

Re: [virtio-dev] Zerocopy VM-to-VM networking using virtio-net

2015-04-27 Thread Jan Kiszka
On 2015-04-27 14:35, Jan Kiszka wrote:
 On 2015-04-27 12:17, Stefan Hajnoczi wrote:
 On Sun, Apr 26, 2015 at 2:24 PM, Luke Gorrie l...@snabb.co wrote:
 On 24 April 2015 at 15:22, Stefan Hajnoczi stefa...@gmail.com wrote:

 The motivation for making VM-to-VM fast is that while software
 switches on the host are efficient today (thanks to vhost-user), there
 is no efficient solution if the software switch is a VM.


 I see. This sounds like a noble goal indeed. I would love to run the
 software switch as just another VM in the long term. It would make it much
 easier for the various software switches to coexist in the world.

 The main technical risk I see in this proposal is that eliminating the
 memory copies might not have the desired effect. I might be tempted to keep
 the copies but prevent the kernel from having to inspect the vrings (more
 like vhost-user). But that is just a hunch and I suppose the first step
 would be a prototype to check the performance anyway.

 For what it is worth here is my view of networking performance on x86 in the
 Haswell+ era:
 https://groups.google.com/forum/#!topic/snabb-devel/aez4pEnd4ow

 Thanks.

  I've been thinking about how to eliminate the VM -> host -> VM
  switching and instead achieve just VM -> VM.

 The holy grail of VM-to-VM networking is an exitless I/O path.  In
 other words, packets can be transferred between VMs without any
 vmexits (this requires a polling driver).

 Here is how it works.  QEMU gets -device vhost-user so that a VM can
 act as the vhost-user server:

  VM1 (virtio-net guest driver) <-> VM2 (vhost-user device)

 VM1 has a regular virtio-net PCI device.  VM2 has a vhost-user device
 and plays the host role instead of the normal virtio-net guest driver
 role.

 The ugly thing about this is that VM2 needs to map all of VM1's guest
 RAM so it can access the vrings and packet data.  The solution to this
 is something like the Shared Buffers BAR but this time it contains not
 just the packet data but also the vring, let's call it the Shared
 Virtqueues BAR.
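
(As a sketch, a descriptor in such a BAR could carry offsets instead of
guest-physical addresses; the layout below is made up:)

#include <stdint.h>

/* Hypothetical descriptor for a shared virtqueue: 'offset' addresses the
 * buffer relative to the Shared Virtqueues BAR, so VM2 never needs a
 * mapping of all of VM1's RAM. */
struct shared_vring_desc {
    uint64_t offset; /* buffer location within the shared BAR */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* next/write flags, as in a normal vring */
    uint16_t next;   /* descriptor chaining, as in a normal vring */
};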

 The Shared Virtqueues BAR eliminates the need for vhost-net on the
 host because VM1 and VM2 communicate directly using virtqueue notify
 or polling vring memory.  Virtqueue notify works by connecting an
 eventfd as ioeventfd in VM1 and irqfd in VM2.  And VM2 would also have
 an ioeventfd that is irqfd for VM1 to signal completions.
 
 We had such a discussion before:
 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658
 
 Would be great to get this ball rolling again.
 
 Jan
 

But one challenge would remain even then (unless both sides only poll):
exit-free inter-VM signaling, no? But that's a hardware issue first of all.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [virtio-dev] Zerocopy VM-to-VM networking using virtio-net

2015-04-27 Thread Jan Kiszka
On 2015-04-27 12:17, Stefan Hajnoczi wrote:
 On Sun, Apr 26, 2015 at 2:24 PM, Luke Gorrie l...@snabb.co wrote:
 On 24 April 2015 at 15:22, Stefan Hajnoczi stefa...@gmail.com wrote:

 The motivation for making VM-to-VM fast is that while software
 switches on the host are efficient today (thanks to vhost-user), there
 is no efficient solution if the software switch is a VM.


 I see. This sounds like a noble goal indeed. I would love to run the
 software switch as just another VM in the long term. It would make it much
 easier for the various software switches to coexist in the world.

 The main technical risk I see in this proposal is that eliminating the
 memory copies might not have the desired effect. I might be tempted to keep
 the copies but prevent the kernel from having to inspect the vrings (more
 like vhost-user). But that is just a hunch and I suppose the first step
 would be a prototype to check the performance anyway.

 For what it is worth here is my view of networking performance on x86 in the
 Haswell+ era:
 https://groups.google.com/forum/#!topic/snabb-devel/aez4pEnd4ow
 
 Thanks.
 
 I've been thinking about how to eliminate the VM -> host -> VM
 switching and instead achieve just VM -> VM.
 
 The holy grail of VM-to-VM networking is an exitless I/O path.  In
 other words, packets can be transferred between VMs without any
 vmexits (this requires a polling driver).
 
 Here is how it works.  QEMU gets -device vhost-user so that a VM can
 act as the vhost-user server:
 
 VM1 (virtio-net guest driver) <-> VM2 (vhost-user device)
 
 VM1 has a regular virtio-net PCI device.  VM2 has a vhost-user device
 and plays the host role instead of the normal virtio-net guest driver
 role.
 
 The ugly thing about this is that VM2 needs to map all of VM1's guest
 RAM so it can access the vrings and packet data.  The solution to this
 is something like the Shared Buffers BAR but this time it contains not
 just the packet data but also the vring, let's call it the Shared
 Virtqueues BAR.
 
 The Shared Virtqueues BAR eliminates the need for vhost-net on the
 host because VM1 and VM2 communicate directly using virtqueue notify
 or polling vring memory.  Virtqueue notify works by connecting an
 eventfd as ioeventfd in VM1 and irqfd in VM2.  And VM2 would also have
 an ioeventfd that is irqfd for VM1 to signal completions.

We had such a discussion before:
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658

Would be great to get this ball rolling again.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [virtio-dev] Zerocopy VM-to-VM networking using virtio-net

2015-04-27 Thread Jan Kiszka
On 2015-04-27 15:01, Stefan Hajnoczi wrote:
 On Mon, Apr 27, 2015 at 1:55 PM, Jan Kiszka jan.kis...@siemens.com wrote:
  On 2015-04-27 14:35, Jan Kiszka wrote:
  On 2015-04-27 12:17, Stefan Hajnoczi wrote:
 On Sun, Apr 26, 2015 at 2:24 PM, Luke Gorrie l...@snabb.co wrote:
 On 24 April 2015 at 15:22, Stefan Hajnoczi stefa...@gmail.com wrote:

 The motivation for making VM-to-VM fast is that while software
 switches on the host are efficient today (thanks to vhost-user), there
 is no efficient solution if the software switch is a VM.


 I see. This sounds like a noble goal indeed. I would love to run the
 software switch as just another VM in the long term. It would make it much
 easier for the various software switches to coexist in the world.

 The main technical risk I see in this proposal is that eliminating the
  memory copies might not have the desired effect. I might be tempted to
  keep the copies but prevent the kernel from having to inspect the vrings
  (more like vhost-user). But that is just a hunch and I suppose the first
  step would be a prototype to check the performance anyway.

  For what it is worth here is my view of networking performance on x86 in
  the Haswell+ era:
 https://groups.google.com/forum/#!topic/snabb-devel/aez4pEnd4ow

 Thanks.

  I've been thinking about how to eliminate the VM -> host -> VM
  switching and instead achieve just VM -> VM.

 The holy grail of VM-to-VM networking is an exitless I/O path.  In
 other words, packets can be transferred between VMs without any
 vmexits (this requires a polling driver).

 Here is how it works.  QEMU gets -device vhost-user so that a VM can
 act as the vhost-user server:

  VM1 (virtio-net guest driver) <-> VM2 (vhost-user device)

 VM1 has a regular virtio-net PCI device.  VM2 has a vhost-user device
 and plays the host role instead of the normal virtio-net guest driver
 role.

 The ugly thing about this is that VM2 needs to map all of VM1's guest
 RAM so it can access the vrings and packet data.  The solution to this
 is something like the Shared Buffers BAR but this time it contains not
 just the packet data but also the vring, let's call it the Shared
 Virtqueues BAR.

 The Shared Virtqueues BAR eliminates the need for vhost-net on the
 host because VM1 and VM2 communicate directly using virtqueue notify
 or polling vring memory.  Virtqueue notify works by connecting an
 eventfd as ioeventfd in VM1 and irqfd in VM2.  And VM2 would also have
 an ioeventfd that is irqfd for VM1 to signal completions.

 We had such a discussion before:
 http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658

 Would be great to get this ball rolling again.

 Jan


 But one challenge would remain even then (unless both sides only poll):
 exit-free inter-VM signaling, no? But that's a hardware issue first of all.
 
 To start with ioeventfd-irqfd can be used.  It incurs a light-weight
 exit in VM1 and interrupt injection in VM2.
 
 For networking the cost is mitigated by NAPI drivers which switch
 between interrupts and polling.  During notification-heavy periods the
 guests would use polling anyway.
 
 A hardware solution would be some kind of inter-guest interrupt
 injection.  I don't know VMX well enough to know whether that is
 possible on Intel CPUs.

Today, we have posted interrupts to avoid the vm-exit on the target CPU,
but there is nothing yet (to my best knowledge) to avoid the exit on the
sender side (unless we ignore security). That's the same problem with
intra-guest IPIs, BTW.

For throughput and given NAPI patterns, that's probably not an issue as
you noted. It may be for latency, though, when almost every cycle counts.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [virtio-dev] Zerocopy VM-to-VM networking using virtio-net

2015-04-27 Thread Jan Kiszka
On 2015-04-27 16:36, Luke Gorrie wrote:
 On 27 April 2015 at 16:30, Jan Kiszka jan.kis...@siemens.com wrote:
 
 Today, we have posted interrupts to avoid the vm-exit on the target CPU,
 but there is nothing yet (to my best knowledge) to avoid the exit on the
 sender side (unless we ignore security). That's the same problem with
 intra-guest IPIs, BTW.

 For throughput and given NAPI patterns, that's probably not an issue as
 you noted. It may be for latency, though, when almost every cycle counts.

 
 Poll-mode networking applications (DPDK, Snabb Switch, etc) are typically
 busy-looping to poll the vring. They may have a very short usleep() between
 checks to save power but they don't wait on their eventfd. So for those
 particular applications latency is on the order of tens of microseconds
 even without guest exits.

That's one side, don't forget the others (the normal guests).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH RFC 00/11] qemu: towards virtio-1 host support

2014-10-23 Thread Jan Kiszka
On 2014-10-22 22:34, Benjamin Herrenschmidt wrote:
 On Wed, 2014-10-22 at 16:17 +0200, Jan Kiszka wrote:
 I thought about this again, and I'm not sure anymore if we can use ACPI
 to black-list the incompatible virtio devices. Reason: hotplug. To my
 understanding, the ACPI DRHD tables won't change during runtime when a
 device shows up or disappears. We would have to isolate virtio devices
 from the rest of the system by using separate buses for it (and avoid
 listing those in any DRHD table) and enforce that they only get plugged
 into those buses. I suppose that is not desirable.

 Maybe it's better to fix virtio /wrt IOMMUs.
 
 I always go back to my initial proposal which is to define that current
 virtio always bypass any iommu (which is what it does really) and have
 it expose via a new capability if that isn't the case. That means fixing
 that Xen thingy to allow qemu to know what to expose I assume but that
 seems to be the less bad approach.

Just one thing to consider: feature negotiation happens after guest
startup. If we run a virtio device under IOMMU control, what will we
have to do when the guest says it does not support such devices? Simply
reject operation?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH RFC 00/11] qemu: towards virtio-1 host support

2014-10-22 Thread Jan Kiszka
On 2014-10-22 10:44, Michael S. Tsirkin wrote:
 On Wed, Oct 08, 2014 at 11:04:28AM +0200, Cornelia Huck wrote:
 On Tue, 07 Oct 2014 18:24:22 -0700
 Andy Lutomirski l...@amacapital.net wrote:

 On 10/07/2014 07:39 AM, Cornelia Huck wrote:
 This patchset aims to get us some way to implement virtio-1 compliant
 and transitional devices in qemu. Branch available at

 git://github.com/cohuck/qemu virtio-1

 I've mainly focused on:
 - endianness handling
 - extended feature bits
 - virtio-ccw new/changed commands

 At the risk of some distraction, would it be worth thinking about a
 solution to the IOMMU bypassing mess as part of this?

 I think that is a whole different issue. virtio-1 is basically done - we
 just need to implement it - while the IOMMU/DMA stuff certainly needs
 more discussion. Therefore, I'd like to defer to the other discussion
 thread here.
 
 I agree, let's do a separate thread for this.
 I also think it's up to the hypervisors at this point.
 People talked about using ACPI to report IOMMU bypass
 to guest.
 If that happens, we don't need a feature bit.

I thought about this again, and I'm not sure anymore if we can use ACPI
to black-list the incompatible virtio devices. Reason: hotplug. To my
understanding, the ACPI DRHD tables won't change during runtime when a
device shows up or disappears. We would have to isolate virtio devices
from the rest of the system by using separate buses for it (and avoid
listing those in any DRHD table) and enforce that they only get plugged
into those buses. I suppose that is not desirable.

Maybe it's better to fix virtio /wrt IOMMUs.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: Using virtio for inter-VM communication

2014-06-17 Thread Jan Kiszka
On 2014-06-17 07:24, Paolo Bonzini wrote:
 On 15/06/2014 08:20, Jan Kiszka wrote:
  I think implementing Xen hypercalls in jailhouse for grant table and
  event channels would actually make a lot of sense.  The Xen
  implementation is 2.5kLOC and I think it should be possible to compact
  it noticeably, especially if you limit yourself to 64-bit guests.
 At least the grant table model seems unsuited for Jailhouse. It allows a
 guest to influence the mapping of another guest during runtime. This we
 want (or even have) to avoid in Jailhouse.
 
 IIRC implementing the grant table hypercalls with copies is inefficient
 but valid.

Back to #1: This is what Rusty is suggesting for virtio. Nothing to win
with grant tables then. And if we really have to copy, I would prefer to
use a standard.

I guess we need to play with prototypes to assess feasibility and impact
on existing code.

Jan





Re: Using virtio for inter-VM communication

2014-06-16 Thread Jan Kiszka
On 2014-06-13 10:45, Paolo Bonzini wrote:
 On 13/06/2014 08:23, Jan Kiszka wrote:
 That would preserve zero-copy capabilities (as long as you can work
 against the shared mem directly, e.g. doing DMA from a physical NIC or
 storage device into it) and keep the hypervisor out of the loop.
 
  This seems ill thought out.  How will you program a NIC via the virtio
  protocol without a hypervisor?  And how will you make it safe?  You'll
  need an IOMMU.  But if you have an IOMMU you don't need shared memory.

 Scenarios behind this are things like driver VMs: You pass through the
 physical hardware to a driver guest that talks to the hardware and
 relays data via one or more virtual channels to other VMs. This confines
 a certain set of security and stability risks to the driver VM.
 
 I think implementing Xen hypercalls in jailhouse for grant table and
 event channels would actually make a lot of sense.  The Xen
 implementation is 2.5kLOC and I think it should be possible to compact
 it noticeably, especially if you limit yourself to 64-bit guests.

At least the grant table model seems unsuited for Jailhouse. It allows a
guest to influence the mapping of another guest during runtime. This we
want (or even have) to avoid in Jailhouse.

I'm therefore more in favor of a model where the shared memory region is
defined on cell (guest) creation by adding a virtual device that comes
with such a region.

Jan

 
 It should also be almost enough to run Xen PVH guests as jailhouse
 partitions.
 
 If later Xen starts to support virtio, you will get that for free.
 
 Paolo





Re: Using virtio for inter-VM communication

2014-06-13 Thread Jan Kiszka
On 2014-06-13 02:47, Rusty Russell wrote:
 Jan Kiszka jan.kis...@siemens.com writes:
 On 2014-06-12 04:27, Rusty Russell wrote:
 Henning Schild henning.sch...@siemens.com writes:
 It was also never implemented, and remains a thought experiment.
 However, implementing it in lguest should be fairly easy.

 The reason why a trusted helper, i.e. additional logic in the
 hypervisor, is not our favorite solution is that we'd like to keep the
 hypervisor as small as possible. I wouldn't exclude such an approach
 categorically, but we have to weigh the costs (lines of code, additional
 hypervisor interface) carefully against the gain (existing
 specifications and guest driver infrastructure).
 
 Reasonable, but I think you'll find it is about the minimal
 implementation in practice.  Unfortunately, I don't have time during the
 next 6 months to implement it myself :(
 
 Back to VIRTIO_F_RING_SHMEM_ADDR (which you once brought up in an MCA
 working group discussion): What speaks against introducing an
 alternative encoding of addresses inside virtio data structures? The
 idea of this flag was to replace guest-physical addresses with offsets
 into a shared memory region associated with or part of a virtio
 device.
 
 We would also need a way of defining the shared memory region.  But
 that's not the problem.  If such a feature is not accepted by the guest?
 How do you fall back?

Depends on the hypervisor and its scope, but it should be quite
straightforward: full-featured ones like KVM could fall back to slow
copying, specialized ones like Jailhouse would clear FEATURES_OK if the
guest driver does not accept it (because there would be no ring walking
or copying code in Jailhouse), thus refuse the activate the device. That
would be absolutely fine for application domains of specialized
hypervisors (often embedded, customized guests etc.).

The shared memory regions could be exposed as BARs (PCI) or additional
address ranges (device tree) and addressed in the redefined guest
address fields via some region index and offset.
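
One possible encoding, purely as a sketch: a few high bits select the
region, the remaining bits carry the offset.

#include <stdint.h>

#define SHMEM_REGION_SHIFT 56 /* hypothetical split: 8-bit region index */

static uint64_t shmem_addr_encode(uint8_t region, uint64_t offset)
{
    return ((uint64_t)region << SHMEM_REGION_SHIFT) | offset;
}

static uint8_t shmem_addr_region(uint64_t addr)
{
    return addr >> SHMEM_REGION_SHIFT;
}

static uint64_t shmem_addr_offset(uint64_t addr)
{
    return addr & ((1ULL << SHMEM_REGION_SHIFT) - 1);
}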

 
 We don't add features which unmake the standard.
 
 That would preserve zero-copy capabilities (as long as you can work
 against the shared mem directly, e.g. doing DMA from a physical NIC or
 storage device into it) and keep the hypervisor out of the loop.
 
 This seems ill thought out.  How will you program a NIC via the virtio
 protocol without a hypervisor?  And how will you make it safe?  You'll
 need an IOMMU.  But if you have an IOMMU you don't need shared memory.

Scenarios behind this are things like driver VMs: You pass through the
physical hardware to a driver guest that talks to the hardware and
relays data via one or more virtual channels to other VMs. This confines
a certain set of security and stability risks to the driver VM.

 
 Is it
 too invasive to existing infrastructure or does it have some other pitfalls?
 
 You'll have to convince every vendor to implement your addition to the
 standard.  Which is easier than inventing a completely new system, but
 it's not quite virtio.

It would be an optional addition, a feature all three sides (host and
the communicating guests) would have to agree on. I think we would only
have to agree on extending the spec to enable this - after demonstrating
it via an implementation, of course.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: Using virtio for inter-VM communication

2014-06-11 Thread Jan Kiszka
On 2014-06-12 04:27, Rusty Russell wrote:
 Henning Schild henning.sch...@siemens.com writes:
 Hi,

 i am working on the jailhouse[1] project and am currently looking at
 inter-VM communication. We want to connect guests directly with virtual
 consoles based on shared memory. The code complexity in the hypervisor
 should be minimal, it should just make the shared memory discoverable
 and provide a signaling mechanism.
 
 Hi Henning,
 
 The virtio assumption was that the host can see all of guest
 memory.  This simplifies things significantly, and makes it efficient.
 
 If you don't have this, *someone* needs to do a copy.  Usually the guest
 OS does a bounce buffer into your shared region.  Goodbye performance.
 Or you can play remapping tricks.  Goodbye performance again.
 
 My preferred model is to have a trusted helper (ie. host) which
 understands how to copy between virtio rings.  The backend guest (to
 steal Xen vocab) R/O maps the descriptor, avail ring and used rings in
  the guest.  It then asks the trusted helper to do various operations
 (copy into writable descriptor, copy out of readable descriptor, mark
 used).  The virtio ring itself acts as a grant table.
 
 Note: that helper mechanism is completely protocol agnostic.  It was
 also explicitly designed into the virtio mechanism (with its 4k
 boundaries for data structures and its 'len' field to indicate how much
 was written into the descriptor). 
 
 It was also never implemented, and remains a thought experiment.
 However, implementing it in lguest should be fairly easy.
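
For concreteness, the helper interface sketched above might look roughly like
this (operation names are invented; as noted, no implementation exists):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical hypercalls provided by the trusted helper. The backend
 * only maps the rings read-only; the descriptor entries themselves act
 * as the grant table that bounds every copy. */
int helper_copy_to_desc(uint32_t queue, uint16_t desc_idx,
                        const void *src, size_t len);  /* into writable desc */
int helper_copy_from_desc(uint32_t queue, uint16_t desc_idx,
                          void *dst, size_t len);      /* out of readable desc */
int helper_mark_used(uint32_t queue, uint16_t desc_idx, uint32_t written);

/* Backend side of one request under this model: read the payload into a
 * private buffer, process it, write the reply, mark the descriptor used.
 * All guest-memory access goes through the helper. */
static int serve_request(uint32_t queue, uint16_t head, size_t len)
{
    uint8_t buf[512];

    if (len > sizeof(buf))
        return -1;
    if (helper_copy_from_desc(queue, head, buf, len) < 0)
        return -1;
    /* ... process buf ... */
    if (helper_copy_to_desc(queue, head, buf, len) < 0)
        return -1;
    return helper_mark_used(queue, head, (uint32_t)len);
}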

The reason why a trusted helper, i.e. additional logic in the
hypervisor, is not our favorite solution is that we'd like to keep the
hypervisor as small as possible. I wouldn't exclude such an approach
categorically, but we have to weigh the costs (lines of code, additional
hypervisor interface) carefully against the gain (existing
specifications and guest driver infrastructure).

Back to VIRTIO_F_RING_SHMEM_ADDR (which you once brought up in an MCA
working group discussion): What speaks against introducing an
alternative encoding of addresses inside virtio data structures? The
idea of this flag was to replace guest-physical addresses with offsets
into a shared memory region associated with or part of a virtio device.
That would preserve zero-copy capabilities (as long as you can work
against the shared mem directly, e.g. doing DMA from a physical NIC or
storage device into it) and keep the hypervisor out of the loop. Is it
too invasive for existing infrastructure, or does it have some other pitfalls?
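
For illustration only, the device-backend side of such offset-based addressing
could translate and bounds-check like this (the region table and the layout
are assumptions, matching no existing code):

#include <stddef.h>
#include <stdint.h>

struct shmem_region {
    uint8_t *base;  /* backend mapping of the shared region */
    uint64_t size;
};

/* Resolve a (region index, offset) virtio address to a backend pointer,
 * rejecting anything that falls outside the shared window. Assumes the
 * top byte carries the region index. */
static void *shmem_resolve(const struct shmem_region *regions,
                           unsigned nregions, uint64_t addr, uint64_t len)
{
    unsigned region = addr >> 56;
    uint64_t offset = addr & ((1ULL << 56) - 1);

    if (region >= nregions)
        return NULL;
    if (offset > regions[region].size ||
        len > regions[region].size - offset)
        return NULL;
    return regions[region].base + offset;
}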

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: virtio PCI on KVM without IO BARs

2013-02-28 Thread Jan Kiszka
On 2013-02-28 16:24, Michael S. Tsirkin wrote:
 Another problem with PIO is support for physical virtio devices,
 and nested virt: KVM currently programs all PIO accesses
 to cause vm exit, so using this device in a VM will be slow.

Not answering your question, but support for programming direct PIO
access into KVM's I/O bitmap would be feasible. Such a feature may have
some value for assigned devices that use PIO more heavily; those accesses
currently cause lengthy user-space exits.
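
Mechanically (a sketch, not actual KVM code): on VMX each of the 65536 I/O
ports has one bit in the I/O bitmaps, and only set bits cause a VM exit, so
granting direct access amounts to clearing the bits of the assigned range:

#include <stdint.h>

#define IO_BITMAP_BITS 65536  /* one bit per x86 I/O port */

/* Clear the exit bits for [port, port + count) so guest accesses to
 * that range go straight to hardware instead of exiting to user space. */
static void io_bitmap_allow_direct(uint8_t bitmap[IO_BITMAP_BITS / 8],
                                   uint16_t port, uint32_t count)
{
    while (count--) {
        bitmap[port / 8] &= (uint8_t)~(1u << (port % 8));
        port++;
    }
}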

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [RFC-v2 1/6] msix: Work-around for vhost-scsi with KVM in-kernel MSI injection

2012-08-13 Thread Jan Kiszka
On 2012-08-13 10:35, Nicholas A. Bellinger wrote:
 From: Nicholas Bellinger n...@linux-iscsi.org
 
 This is required to get past the following assert with:
 
 commit 1523ed9e1d46b0b54540049d491475ccac7e6421
 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Thu May 17 10:32:39 2012 -0300
 
 virtio/vhost: Add support for KVM in-kernel MSI injection
 
 Cc: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 Cc: Jan Kiszka jan.kis...@siemens.com
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Anthony Liguori aligu...@us.ibm.com
 Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org
 ---
  hw/msix.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/hw/msix.c b/hw/msix.c
 index 800fc32..c1e6dc3 100644
 --- a/hw/msix.c
 +++ b/hw/msix.c
 @@ -544,6 +544,9 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
  {
      int vector;
 
 +    if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier)
 +        return;
 +
      assert(dev->msix_vector_use_notifier &&
             dev->msix_vector_release_notifier);
  
 

I seem to remember pointing out that there is a bug somewhere in the
reset code which deactivates a non-active vhost instance, no?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [RFC-v2 1/6] msix: Work-around for vhost-scsi with KVM in-kernel MSI injection

2012-08-13 Thread Jan Kiszka
On 2012-08-13 20:03, Michael S. Tsirkin wrote:
 On Mon, Aug 13, 2012 at 02:06:10PM +0200, Jan Kiszka wrote:
 On 2012-08-13 10:35, Nicholas A. Bellinger wrote:
 From: Nicholas Bellinger n...@linux-iscsi.org

 This is required to get past the following assert with:

 commit 1523ed9e1d46b0b54540049d491475ccac7e6421
 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Thu May 17 10:32:39 2012 -0300

 virtio/vhost: Add support for KVM in-kernel MSI injection

 Cc: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 Cc: Jan Kiszka jan.kis...@siemens.com
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Anthony Liguori aligu...@us.ibm.com
 Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org
 ---
  hw/msix.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

 diff --git a/hw/msix.c b/hw/msix.c
 index 800fc32..c1e6dc3 100644
 --- a/hw/msix.c
 +++ b/hw/msix.c
 @@ -544,6 +544,9 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
  {
      int vector;
 
 +    if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier)
 +        return;
 +
      assert(dev->msix_vector_use_notifier &&
             dev->msix_vector_release_notifier);
  


 I seem to remember pointing out that there is a bug somewhere in the
 reset code which deactivates a non-active vhost instance, no?

 Jan
 
 Could not find it. Could you dig it up pls?

http://thread.gmane.org/gmane.linux.scsi.target.devel/2277/focus=2309

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [PATCH RFC V8 17/17] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

2012-05-30 Thread Jan Kiszka
On 2012-05-02 12:09, Raghavendra K T wrote:
 From: Raghavendra K T raghavendra...@linux.vnet.ibm.com 
 
 KVM_HC_KICK_CPU hypercall added to wake up a halted vcpu in a paravirtual
 spinlock enabled guest.
 
 KVM_FEATURE_PV_UNHALT enables the guest to check whether pv spinlock can be
 enabled in the guest.
 
 Thanks Alex for KVM_HC_FEATURES inputs and Vatsa for rewriting KVM_HC_KICK_CPU

This contains valuable documentation for features that are already
supported. Can you break them out and post them as a separate patch already?
One comment on them below.

 
 Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
  Documentation/virtual/kvm/cpuid.txt      |    4 ++
  Documentation/virtual/kvm/hypercalls.txt |   60 ++++++++++++++++++++++++++++++
  2 files changed, 64 insertions(+), 0 deletions(-)
 diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
 index 8820685..062dff9 100644
 --- a/Documentation/virtual/kvm/cpuid.txt
 +++ b/Documentation/virtual/kvm/cpuid.txt
 @@ -39,6 +39,10 @@ KVM_FEATURE_CLOCKSOURCE2           || 3 || kvmclock available at msrs
  KVM_FEATURE_ASYNC_PF               || 4 || async pf can be enabled by
                                     ||   || writing to msr 0x4b564d02
  ------------------------------------------------------------------------------
 +KVM_FEATURE_PV_UNHALT              || 6 || guest checks this feature bit
 +                                   ||   || before enabling paravirtualized
 +                                   ||   || spinlock support.
 +------------------------------------------------------------------------------
  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
                                     ||   || per-cpu warps are expected in
                                     ||   || kvmclock.
 diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
 new file mode 100644
 index 0000000..bc3f14a
 --- /dev/null
 +++ b/Documentation/virtual/kvm/hypercalls.txt
 @@ -0,0 +1,60 @@
 +KVM Hypercalls Documentation
 +============================
 +The template for each hypercall is:
 +1. Hypercall name, value.
 +2. Architecture(s)
 +3. Status (deprecated, obsolete, active)
 +4. Purpose
 +
 +1. KVM_HC_VAPIC_POLL_IRQ
 +------------------------
 +Value: 1
 +Architecture: x86
 +Purpose: None

Purpose: Trigger guest exit so that the host can check for pending
interrupts on reentry.

 +
 +2. KVM_HC_MMU_OP
 +----------------
 +Value: 2
 +Architecture: x86
 +Status: deprecated.
 +Purpose: Support MMU operations such as writing to PTE,
 +flushing TLB, and releasing PT.
 +
 +3. KVM_HC_FEATURES
 +-------------------
 +Value: 3
 +Architecture: PPC
 +Status: active
 +Purpose: Expose hypercall availability to the guest. On x86 platforms, cpuid
 +is used to enumerate which hypercalls are available. On PPC, either a device
 +tree based lookup (which is also what ePAPR dictates) or a KVM-specific
 +enumeration mechanism (this hypercall) can be used.
 +
 +4. KVM_HC_PPC_MAP_MAGIC_PAGE
 +----------------------------
 +Value: 4
 +Architecture: PPC
 +Status: active
 +Purpose: To enable communication between the hypervisor and guest, there is a
 +shared page that contains parts of supervisor-visible register state.
 +The guest can map this shared page to access its supervisor registers through
 +memory, using this hypercall.
 +
 +5. KVM_HC_KICK_CPU
 +------------------
 +Value: 5
 +Architecture: x86
 +Status: active
 +Purpose: Hypercall used to wake up a vcpu from HLT state
 +
 +Usage example: A vcpu of a paravirtualized guest that is busy-waiting in guest
 +kernel mode for an event to occur (e.g. a spinlock to become available) can
 +execute the HLT instruction once it has busy-waited for more than a threshold
 +time interval. Execution of the HLT instruction would cause the hypervisor to
 +put the vcpu to sleep until an appropriate event occurs. Another vcpu of the
 +same guest can wake up the sleeping vcpu by issuing the KVM_HC_KICK_CPU
 +hypercall, specifying the APIC ID of the vcpu to be woken up.
 +
 +TODO:
 +1. more information on input and output needed?
 +2. Add more detail to purpose of hypercalls.
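
A guest-side sketch of the wait/kick pairing described in the usage example
(the threshold, the lock representation, and the single-argument hypercall
convention are simplifications, not the patch's actual code):

#define KVM_HC_KICK_CPU 5

/* Minimal x86 guest-side hypercall wrapper (GCC inline asm). */
static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
{
    long ret;
    asm volatile("vmcall" : "=a"(ret) : "a"(nr), "b"(p1) : "memory");
    return ret;
}

/* Waiter: busy-wait up to a threshold, then HLT until kicked (or any
 * other interrupt arrives) and re-check. Runs in guest kernel mode. */
static void pv_wait(volatile int *lock_available)
{
    unsigned int spins = 0;

    while (!*lock_available) {
        if (++spins > 1024) {      /* placeholder threshold */
            asm volatile("hlt");
            spins = 0;
        }
    }
}

/* Kicker: wake the halted vcpu identified by its APIC ID. */
static void pv_kick(unsigned long apicid)
{
    kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
}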

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH (repost) RFC 2/2] virtio-pci: recall and return msix notifications on ISR read

2011-11-03 Thread Jan Kiszka
On 2011-11-02 21:11, Michael S. Tsirkin wrote:
 MSIX spec requires that a device can be operated with
 all vectors masked, by polling pending bits.
 Add APIs to recall an msix notification, and make polling
 mode possible in virtio-pci by clearing the
 pending bits and setting ISR appropriately on ISR read.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/msix.c       |   26 ++++++++++++++++++++++++++
  hw/msix.h       |    3 +++
  hw/virtio-pci.c |   11 ++++++++++-
  3 files changed, 39 insertions(+), 1 deletions(-)
 
 diff --git a/hw/msix.c b/hw/msix.c
 index 63b41b9..fe967c9 100644
 --- a/hw/msix.c
 +++ b/hw/msix.c
 @@ -349,6 +349,32 @@ void msix_notify(PCIDevice *dev, unsigned vector)
      stl_le_phys(address, data);
  }
 
 +/* Recall outstanding MSI-X notifications for a vector, if possible.
 + * Return true if any were outstanding. */
 +bool msix_recall(PCIDevice *dev, unsigned vector)
 +{
 +    bool ret;
 +    if (vector >= dev->msix_entries_nr)
 +        return false;
 +    ret = msix_is_pending(dev, vector);
 +    msix_clr_pending(dev, vector);
 +    return ret;
 +}

I would prefer to have a single API instead to clarify the tight relation:

bool msi[x]_set_notify(PCIDevice *dev, unsigned vector, unsigned level)

It would return true for level=1 if the message was either sent directly or
queued (we could deliver false if it was already queued, but I see no
use case for this yet).

Also, I don't see the generic value of some msix_recall_all. I think
it's better handled in a single loop over all vectors at the caller site,
clearing the individual interrupt reason bits on a per-vector basis
there. msix_recall_all is only useful in the virtio case where you have
one vector of reason A and all the rest of B. Once you had multiple
reason C vectors as well, it would not help anymore.
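
In caller terms, the per-vector loop suggested here might read as follows
(the msix_set_notify semantics are as proposed above; the per-vector reason
bookkeeping is hypothetical):

#include <stdbool.h>
#include <stdint.h>

#define NVECTORS 8

/* Proposed single API: level=1 sends or queues a message, level=0
 * recalls one; the return value reports whether a message was
 * outstanding for that vector. */
bool msix_set_notify(void *dev, unsigned vector, unsigned level);

/* Caller-site recall on ISR read: walk all vectors and fold each
 * recalled vector's interrupt reason into the ISR value. */
static uint8_t virtio_isr_read(void *dev, const uint8_t reason[NVECTORS])
{
    uint8_t isr = 0;
    unsigned vector;

    for (vector = 0; vector < NVECTORS; vector++) {
        if (msix_set_notify(dev, vector, 0))   /* recall pending message */
            isr |= reason[vector];
    }
    return isr;
}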

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: IO APIC emulation failure with qemu-kvm

2011-02-04 Thread Jan Kiszka
On 2011-02-04 14:35, Ravi Kumar Kulkarni wrote:
 Hi all,
 I'm initializing the local and IO APIC for a proprietary operating
 system running in a virtualized environment.
 I'm facing a problem with qemu-kvm, but the code runs fine with qemu.

Does it also run fine with qemu-kvm and -no-kvm-irqchip? What versions
of the kernel and qemu-kvm are you using? If not the latest git, does
updating change the picture?

 When I run my kernel image with qemu-kvm, it fails with an emulation error
 ("trying to execute code outside ROM or RAM") at 0xfec00000 (the IO APIC base
 address), but the same code runs fine with qemu. Can anyone please point me to
 where the problem might be, or how to track it down?

Start by capturing the activity of your guest via ftrace, enabling all
kvm:* events. You may also try attaching gdb to qemu and analyzing the
different code paths in both versions (specifically if you have debugging
symbols for your guest).

BTW, is your OS doing any fancy [IO]APIC relocations?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux