[PATCH v3 2/2] PCI: hv: Propagate coherence from VMbus device to PCI device

2022-03-24 Thread Michael Kelley via iommu
PCI pass-thru devices in a Hyper-V VM are represented as a VMBus
device and as a PCI device.  The coherence of the VMbus device is
set based on the VMbus node in ACPI, but the PCI device has no
ACPI node and defaults to not hardware coherent.  This results
in extra software coherence management overhead on ARM64 when
devices are hardware coherent.

Fix this by setting up the PCI host bus so that normal
PCI mechanisms will propagate the coherence of the VMbus
device to the PCI device. There's no effect on x86/x64 where
devices are always hardware coherent.

Signed-off-by: Michael Kelley 
Acked-by: Boqun Feng 
Acked-by: Robin Murphy 
---
 drivers/pci/controller/pci-hyperv.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index ae0bc2f..88b3b56 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3404,6 +3404,15 @@ static int hv_pci_probe(struct hv_device *hdev,
hbus->bridge->domain_nr = dom;
 #ifdef CONFIG_X86
hbus->sysdata.domain = dom;
+#elif defined(CONFIG_ARM64)
+   /*
+* Set the PCI bus parent to be the corresponding VMbus
+* device. Then the VMbus device will be assigned as the
+* ACPI companion in pcibios_root_bridge_prepare() and
+* pci_dma_configure() will propagate device coherence
+* information to devices created on the bus.
+*/
+   hbus->sysdata.parent = hdev->device.parent;
 #endif
 
hbus->hdev = hdev;
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/2] Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device

2022-03-24 Thread Michael Kelley via iommu
VMbus synthetic devices are not represented in the ACPI DSDT -- only
the top level VMbus device is represented. As a result, on ARM64
coherence information in the _CCA method is not specified for
synthetic devices, so they default to not hardware coherent.
Drivers for some of these synthetic devices have been recently
updated to use the standard DMA APIs, and they are incurring extra
overhead of unneeded software coherence management.

Fix this by propagating coherence information from the VMbus node
in ACPI to the individual synthetic devices. There's no effect on
x86/x64 where devices are always hardware coherent.

Signed-off-by: Michael Kelley 
Acked-by: Robin Murphy 
---
 drivers/hv/hv_common.c | 11 +++
 drivers/hv/vmbus_drv.c | 31 +++
 include/asm-generic/mshyperv.h |  1 +
 3 files changed, 43 insertions(+)

diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 181d16b..820e814 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -216,6 +217,16 @@ bool hv_query_ext_cap(u64 cap_query)
 }
 EXPORT_SYMBOL_GPL(hv_query_ext_cap);
 
+void hv_setup_dma_ops(struct device *dev, bool coherent)
+{
+   /*
+* Hyper-V does not offer a vIOMMU in the guest
+* VM, so pass 0/NULL for the IOMMU settings
+*/
+   arch_setup_dma_ops(dev, 0, 0, NULL, coherent);
+}
+EXPORT_SYMBOL_GPL(hv_setup_dma_ops);
+
 bool hv_is_hibernation_supported(void)
 {
return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 12a2b37..5c3b29a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -921,6 +921,21 @@ static int vmbus_probe(struct device *child_device)
 }
 
 /*
+ * vmbus_dma_configure -- Configure DMA coherence for VMbus device
+ */
+static int vmbus_dma_configure(struct device *child_device)
+{
+   /*
+* On ARM64, propagate the DMA coherence setting from the top level
+* VMbus ACPI device to the child VMbus device being added here.
+* On x86/x64 coherence is assumed and these calls have no effect.
+*/
+   hv_setup_dma_ops(child_device,
+   device_get_dma_attr(_acpi_dev->dev) == DEV_DMA_COHERENT);
+   return 0;
+}
+
+/*
  * vmbus_remove - Remove a vmbus device
  */
 static void vmbus_remove(struct device *child_device)
@@ -1040,6 +1055,7 @@ static void vmbus_device_release(struct device *device)
.remove =   vmbus_remove,
.probe =vmbus_probe,
.uevent =   vmbus_uevent,
+   .dma_configure =vmbus_dma_configure,
.dev_groups =   vmbus_dev_groups,
.drv_groups =   vmbus_drv_groups,
.bus_groups =   vmbus_bus_groups,
@@ -2428,6 +2444,21 @@ static int vmbus_acpi_add(struct acpi_device *device)
 
hv_acpi_dev = device;
 
+   /*
+* Older versions of Hyper-V for ARM64 fail to include the _CCA
+* method on the top level VMbus device in the DSDT. But devices
+* are hardware coherent in all current Hyper-V use cases, so fix
+* up the ACPI device to behave as if _CCA is present and indicates
+* hardware coherence.
+*/
+   ACPI_COMPANION_SET(>dev, device);
+   if (IS_ENABLED(CONFIG_ACPI_CCA_REQUIRED) &&
+   device_get_dma_attr(>dev) == DEV_DMA_NOT_SUPPORTED) {
+   pr_info("No ACPI _CCA found; assuming coherent device I/O\n");
+   device->flags.cca_seen = true;
+   device->flags.coherent_dma = true;
+   }
+
result = acpi_walk_resources(device->handle, METHOD_NAME__CRS,
vmbus_walk_resources, NULL);
 
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c08758b..c05d2ce 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -269,6 +269,7 @@ static inline int cpumask_to_vpset_noself(struct hv_vpset 
*vpset,
 u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 void hyperv_cleanup(void);
 bool hv_query_ext_cap(u64 cap_query);
+void hv_setup_dma_ops(struct device *dev, bool coherent);
 void *hv_map_memory(void *addr, unsigned long size);
 void hv_unmap_memory(void *addr);
 #else /* CONFIG_HYPERV */
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 0/2] Fix coherence for VMbus and PCI pass-thru devices in Hyper-V VM

2022-03-24 Thread Michael Kelley via iommu
Hyper-V VMs have VMbus synthetic devices and PCI pass-thru devices that are 
added
dynamically via the VMbus protocol and are not represented in the ACPI DSDT. 
Only
the top level VMbus node exists in the DSDT. As such, on ARM64 these devices 
don't
pick up coherence information and default to not hardware coherent.  This 
results
in extra software coherence management overhead since the synthetic devices are
always hardware coherent. PCI pass-thru devices are also hardware coherent in 
all
current usage scenarios.

Fix this by propagating coherence information from the top level VMbus node in
the DSDT to all VMbus synthetic devices and PCI pass-thru devices. While smaller
granularity of control would be better, basing on the VMbus node in the DSDT
gives an escape path if a future scenario arises with devices that are not
hardware coherent.

Changes since v2:
* Move coherence propagation for VMbus synthetic devices to a separate
  .dma_configure function instead of the .probe function [Robin Murphy]

Changes since v1:
* Use device_get_dma_attr() instead of acpi_get_dma_attr(), eliminating the
  need to export acpi_get_dma_attr() [Robin Murphy]
* Use arch_setup_dma_ops() to set device coherence [Robin Murphy]
* Move handling of missing _CCA to vmbus_acpi_add() so it is only done once
* Rework handling of PCI devices so existing code in pci_dma_configure()
  just works

Michael Kelley (2):
  Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device
  PCI: hv: Propagate coherence from VMbus device to PCI device

 drivers/hv/hv_common.c  | 11 +++
 drivers/hv/vmbus_drv.c  | 31 +++
 drivers/pci/controller/pci-hyperv.c |  9 +
 include/asm-generic/mshyperv.h  |  1 +
 4 files changed, 52 insertions(+)

-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/2] PCI: hv: Propagate coherence from VMbus device to PCI device

2022-03-23 Thread Michael Kelley via iommu
PCI pass-thru devices in a Hyper-V VM are represented as a VMBus
device and as a PCI device.  The coherence of the VMbus device is
set based on the VMbus node in ACPI, but the PCI device has no
ACPI node and defaults to not hardware coherent.  This results
in extra software coherence management overhead on ARM64 when
devices are hardware coherent.

Fix this by setting up the PCI host bus so that normal
PCI mechanisms will propagate the coherence of the VMbus
device to the PCI device. There's no effect on x86/x64 where
devices are always hardware coherent.

Signed-off-by: Michael Kelley 
---
 drivers/pci/controller/pci-hyperv.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index ae0bc2f..88b3b56 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3404,6 +3404,15 @@ static int hv_pci_probe(struct hv_device *hdev,
hbus->bridge->domain_nr = dom;
 #ifdef CONFIG_X86
hbus->sysdata.domain = dom;
+#elif defined(CONFIG_ARM64)
+   /*
+* Set the PCI bus parent to be the corresponding VMbus
+* device. Then the VMbus device will be assigned as the
+* ACPI companion in pcibios_root_bridge_prepare() and
+* pci_dma_configure() will propagate device coherence
+* information to devices created on the bus.
+*/
+   hbus->sysdata.parent = hdev->device.parent;
 #endif
 
hbus->hdev = hdev;
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/2] Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device

2022-03-23 Thread Michael Kelley via iommu
VMbus synthetic devices are not represented in the ACPI DSDT -- only
the top level VMbus device is represented. As a result, on ARM64
coherence information in the _CCA method is not specified for
synthetic devices, so they default to not hardware coherent.
Drivers for some of these synthetic devices have been recently
updated to use the standard DMA APIs, and they are incurring extra
overhead of unneeded software coherence management.

Fix this by propagating coherence information from the VMbus node
in ACPI to the individual synthetic devices. There's no effect on
x86/x64 where devices are always hardware coherent.

Signed-off-by: Michael Kelley 
---
 drivers/hv/hv_common.c | 11 +++
 drivers/hv/vmbus_drv.c | 23 +++
 include/asm-generic/mshyperv.h |  1 +
 3 files changed, 35 insertions(+)

diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 181d16b..820e814 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -216,6 +217,16 @@ bool hv_query_ext_cap(u64 cap_query)
 }
 EXPORT_SYMBOL_GPL(hv_query_ext_cap);
 
+void hv_setup_dma_ops(struct device *dev, bool coherent)
+{
+   /*
+* Hyper-V does not offer a vIOMMU in the guest
+* VM, so pass 0/NULL for the IOMMU settings
+*/
+   arch_setup_dma_ops(dev, 0, 0, NULL, coherent);
+}
+EXPORT_SYMBOL_GPL(hv_setup_dma_ops);
+
 bool hv_is_hibernation_supported(void)
 {
return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 12a2b37..2d2c54c 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -905,6 +905,14 @@ static int vmbus_probe(struct device *child_device)
struct hv_device *dev = device_to_hv_device(child_device);
const struct hv_vmbus_device_id *dev_id;
 
+   /*
+* On ARM64, propagate the DMA coherence setting from the top level
+* VMbus ACPI device to the child VMbus device being added here.
+* On x86/x64 coherence is assumed and these calls have no effect.
+*/
+   hv_setup_dma_ops(child_device,
+   device_get_dma_attr(_acpi_dev->dev) == DEV_DMA_COHERENT);
+
dev_id = hv_vmbus_get_id(drv, dev);
if (drv->probe) {
ret = drv->probe(dev, dev_id);
@@ -2428,6 +2436,21 @@ static int vmbus_acpi_add(struct acpi_device *device)
 
hv_acpi_dev = device;
 
+   /*
+* Older versions of Hyper-V for ARM64 fail to include the _CCA
+* method on the top level VMbus device in the DSDT. But devices
+* are hardware coherent in all current Hyper-V use cases, so fix
+* up the ACPI device to behave as if _CCA is present and indicates
+* hardware coherence.
+*/
+   ACPI_COMPANION_SET(>dev, device);
+   if (IS_ENABLED(CONFIG_ACPI_CCA_REQUIRED) &&
+   device_get_dma_attr(>dev) == DEV_DMA_NOT_SUPPORTED) {
+   pr_info("No ACPI _CCA found; assuming coherent device I/O\n");
+   device->flags.cca_seen = true;
+   device->flags.coherent_dma = true;
+   }
+
result = acpi_walk_resources(device->handle, METHOD_NAME__CRS,
vmbus_walk_resources, NULL);
 
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c08758b..c05d2ce 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -269,6 +269,7 @@ static inline int cpumask_to_vpset_noself(struct hv_vpset 
*vpset,
 u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 void hyperv_cleanup(void);
 bool hv_query_ext_cap(u64 cap_query);
+void hv_setup_dma_ops(struct device *dev, bool coherent);
 void *hv_map_memory(void *addr, unsigned long size);
 void hv_unmap_memory(void *addr);
 #else /* CONFIG_HYPERV */
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/2] Fix coherence for VMbus and PCI pass-thru devices in Hyper-V VM

2022-03-23 Thread Michael Kelley via iommu
Hyper-V VMs have VMbus synthetic devices and PCI pass-thru devices that are 
added
dynamically via the VMbus protocol and are not represented in the ACPI DSDT. 
Only
the top level VMbus node exists in the DSDT. As such, on ARM64 these devices 
don't
pick up coherence information and default to not hardware coherent.  This 
results
in extra software coherence management overhead since the synthetic devices are
always hardware coherent. PCI pass-thru devices are also hardware coherent in 
all
current usage scenarios.

Fix this by propagating coherence information from the top level VMbus node in
the DSDT to all VMbus synthetic devices and PCI pass-thru devices. While smaller
granularity of control would be better, basing on the VMbus node in the DSDT
gives an escape path if a future scenario arises with devices that are not
hardware coherent.

Robin Murphy -- I'm not ignoring your feedback about pci_dma_configure(), but
I wanted to try this alternate approach where pci_dma_configure() works as is.
If reviewers prefer modifying pci_dma_configure() to handle the Hyper-V
specifics, I can go back to that.

Changes since v1:
* Use device_get_dma_attr() instead of acpi_get_dma_attr(), eliminating the
  need to export acpi_get_dma_attr() [Robin Murphy]
* Use arch_setup_dma_ops() to set device coherence [Robin Murphy]
* Move handling of missing _CCA to vmbus_acpi_add() so it is only done once
* Rework handling of PCI devices so existing code in pci_dma_configure()
  just works

Michael Kelley (2):
  Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device
  PCI: hv: Propagate coherence from VMbus device to PCI device

 drivers/hv/hv_common.c  | 11 +++
 drivers/hv/vmbus_drv.c  | 23 +++
 drivers/pci/controller/pci-hyperv.c |  9 +
 include/asm-generic/mshyperv.h  |  1 +
 4 files changed, 44 insertions(+)

-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3/4 RESEND] Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device

2022-03-17 Thread Michael Kelley via iommu
VMbus synthetic devices are not represented in the ACPI DSDT -- only
the top level VMbus device is represented. As a result, on ARM64
coherence information in the _CCA method is not specified for
synthetic devices, so they default to not hardware coherent.
Drivers for some of these synthetic devices have been recently
updated to use the standard DMA APIs, and they are incurring extra
overhead of unneeded software coherence management.

Fix this by propagating coherence information from the VMbus node
in ACPI to the individual synthetic devices. There's no effect on
x86/x64 where devices are always hardware coherent.

Signed-off-by: Michael Kelley 
---
 drivers/hv/vmbus_drv.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 12a2b37..c0e993ad 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -904,6 +904,21 @@ static int vmbus_probe(struct device *child_device)
drv_to_hv_drv(child_device->driver);
struct hv_device *dev = device_to_hv_device(child_device);
const struct hv_vmbus_device_id *dev_id;
+   enum dev_dma_attr coherent;
+
+   /*
+* On ARM64, propagate the DMA coherence setting from the top level
+* VMbus ACPI device to the child VMbus device being added here.
+* Older Hyper-V ARM64 versions don't set the _CCA method on the
+* top level VMbus ACPI device as they should.  Treat these cases
+* as DMA coherent since that's the assumption made by Hyper-V.
+*
+* On x86/x64 these calls assume coherence and have no effect.
+*/
+   coherent = acpi_get_dma_attr(hv_acpi_dev);
+   if (coherent == DEV_DMA_NOT_SUPPORTED)
+   coherent = DEV_DMA_COHERENT;
+   acpi_dma_configure(child_device, coherent);
 
dev_id = hv_vmbus_get_id(drv, dev);
if (drv->probe) {
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 4/4 RESEND] PCI: hv: Propagate coherence from VMbus device to PCI device

2022-03-17 Thread Michael Kelley via iommu
PCI pass-thru devices in a Hyper-V VM are represented as a VMBus
device and as a PCI device.  The coherence of the VMbus device is
set based on the VMbus node in ACPI, but the PCI device has no
ACPI node and defaults to not hardware coherent.  This results
in extra software coherence management overhead on ARM64 when
devices are hardware coherent.

Fix this by propagating the coherence of the VMbus device to the
PCI device.  There's no effect on x86/x64 where devices are
always hardware coherent.

Signed-off-by: Michael Kelley 
---
 drivers/pci/controller/pci-hyperv.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index ae0bc2f..14276f5 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -2142,9 +2143,9 @@ static void hv_pci_remove_slots(struct hv_pcibus_device 
*hbus)
 }
 
 /*
- * Set NUMA node for the devices on the bus
+ * Set NUMA node and DMA coherence for the devices on the bus
  */
-static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
+static void hv_pci_assign_properties(struct hv_pcibus_device *hbus)
 {
struct pci_dev *dev;
struct pci_bus *bus = hbus->bridge->bus;
@@ -2167,6 +2168,14 @@ static void hv_pci_assign_numa_node(struct 
hv_pcibus_device *hbus)
 numa_map_to_online_node(
 hv_dev->desc.virtual_numa_node));
 
+   /*
+* On ARM64, propagate the DMA coherence from the VMbus device
+* to the corresponding PCI device. On x86/x64, these calls
+* have no effect because DMA is always hardware coherent.
+*/
+   dev_set_dma_coherent(>dev,
+   dev_is_dma_coherent(>hdev->device));
+
put_pcichild(hv_dev);
}
 }
@@ -2191,7 +2200,7 @@ static int create_root_hv_pci_bus(struct hv_pcibus_device 
*hbus)
return error;
 
pci_lock_rescan_remove();
-   hv_pci_assign_numa_node(hbus);
+   hv_pci_assign_properties(hbus);
pci_bus_assign_resources(bridge->bus);
hv_pci_assign_slots(hbus);
pci_bus_add_devices(bridge->bus);
@@ -2458,7 +2467,7 @@ static void pci_devices_present_work(struct work_struct 
*work)
 */
pci_lock_rescan_remove();
pci_scan_child_bus(hbus->bridge->bus);
-   hv_pci_assign_numa_node(hbus);
+   hv_pci_assign_properties(hbus);
hv_pci_assign_slots(hbus);
pci_unlock_rescan_remove();
break;
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/4 RESEND] ACPI: scan: Export acpi_get_dma_attr()

2022-03-17 Thread Michael Kelley via iommu
Export acpi_get_dma_attr() so that it can be used by the Hyper-V
VMbus driver, which may be built as a module. The related function
acpi_dma_configure_id() is already exported.

Signed-off-by: Michael Kelley 
---
 drivers/acpi/scan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 1331756..9f3c88f 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1489,6 +1489,7 @@ enum dev_dma_attr acpi_get_dma_attr(struct acpi_device 
*adev)
else
return DEV_DMA_NON_COHERENT;
 }
+EXPORT_SYMBOL_GPL(acpi_get_dma_attr);
 
 /**
  * acpi_dma_get_range() - Get device DMA parameters.
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/4 RESEND] dma-mapping: Add wrapper function to set dma_coherent

2022-03-17 Thread Michael Kelley via iommu
Add a wrapper function to set dma_coherent, avoiding the need for
complex #ifdef's when setting it in architecture independent code.

Signed-off-by: Michael Kelley 
---
 include/linux/dma-map-ops.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d5b06b..3350e7a 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -254,11 +254,20 @@ static inline bool dev_is_dma_coherent(struct device *dev)
 {
return dev->dma_coherent;
 }
+static inline void dev_set_dma_coherent(struct device *dev,
+   bool coherent)
+{
+   dev->dma_coherent = coherent;
+}
 #else
 static inline bool dev_is_dma_coherent(struct device *dev)
 {
return true;
 }
+static inline void dev_set_dma_coherent(struct device *dev,
+   bool coherent)
+{
+}
 #endif /* CONFIG_ARCH_HAS_DMA_COHERENCE_H */
 
 void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/4 RESEND] Fix coherence for VMbus and PCI pass-thru devices in Hyper-V VM

2022-03-17 Thread Michael Kelley via iommu
[Resend to fix an email address typo for Bjorn Helgaas]

Hyper-V VMs have VMbus synthetic devices and PCI pass-thru devices that are 
added
dynamically via the VMbus protocol and are not represented in the ACPI DSDT. 
Only
the top level VMbus node exists in the DSDT. As such, on ARM64 these devices 
don't
pick up coherence information and default to not hardware coherent.  This 
results
in extra software coherence management overhead since the synthetic devices are
always hardware coherent. PCI pass-thru devices are also hardware coherent in 
all
current usage scenarios.

Fix this by propagating coherence information from the top level VMbus node in
the DSDT to all VMbus synthetic devices and PCI pass-thru devices. While smaller
granularity of control would be better, basing on the VMbus node in the DSDT
gives an escape path if a future scenario arises with devices that are not
hardware coherent.

The first two patches are prep to allow manipulating device coherence from a
module (since the VMbus driver can be built as a module) and from architecture
independent code without having a bunch of #ifdef's.

The third patch propagates the VMbus node coherence to VMbus synthetic devices.

The fourth patch propagates the coherence to PCI pass-thru devices.

Michael Kelley (4):
  ACPI: scan: Export acpi_get_dma_attr()
  dma-mapping: Add wrapper function to set dma_coherent
  Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device
  PCI: hv: Propagate coherence from VMbus device to PCI device

 drivers/acpi/scan.c |  1 +
 drivers/hv/vmbus_drv.c  | 15 +++
 drivers/pci/controller/pci-hyperv.c | 17 +
 include/linux/dma-map-ops.h |  9 +
 4 files changed, 38 insertions(+), 4 deletions(-)

-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/4] dma-mapping: Add wrapper function to set dma_coherent

2022-03-17 Thread Michael Kelley via iommu
Add a wrapper function to set dma_coherent, avoiding the need for
complex #ifdef's when setting it in architecture independent code.

Signed-off-by: Michael Kelley 
---
 include/linux/dma-map-ops.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d5b06b..3350e7a 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -254,11 +254,20 @@ static inline bool dev_is_dma_coherent(struct device *dev)
 {
return dev->dma_coherent;
 }
+static inline void dev_set_dma_coherent(struct device *dev,
+   bool coherent)
+{
+   dev->dma_coherent = coherent;
+}
 #else
 static inline bool dev_is_dma_coherent(struct device *dev)
 {
return true;
 }
+static inline void dev_set_dma_coherent(struct device *dev,
+   bool coherent)
+{
+}
 #endif /* CONFIG_ARCH_HAS_DMA_COHERENCE_H */
 
 void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 4/4] PCI: hv: Propagate coherence from VMbus device to PCI device

2022-03-17 Thread Michael Kelley via iommu
PCI pass-thru devices in a Hyper-V VM are represented as a VMBus
device and as a PCI device.  The coherence of the VMbus device is
set based on the VMbus node in ACPI, but the PCI device has no
ACPI node and defaults to not hardware coherent.  This results
in extra software coherence management overhead on ARM64 when
devices are hardware coherent.

Fix this by propagating the coherence of the VMbus device to the
PCI device.  There's no effect on x86/x64 where devices are
always hardware coherent.

Signed-off-by: Michael Kelley 
---
 drivers/pci/controller/pci-hyperv.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index ae0bc2f..14276f5 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -2142,9 +2143,9 @@ static void hv_pci_remove_slots(struct hv_pcibus_device 
*hbus)
 }
 
 /*
- * Set NUMA node for the devices on the bus
+ * Set NUMA node and DMA coherence for the devices on the bus
  */
-static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
+static void hv_pci_assign_properties(struct hv_pcibus_device *hbus)
 {
struct pci_dev *dev;
struct pci_bus *bus = hbus->bridge->bus;
@@ -2167,6 +2168,14 @@ static void hv_pci_assign_numa_node(struct 
hv_pcibus_device *hbus)
 numa_map_to_online_node(
 hv_dev->desc.virtual_numa_node));
 
+   /*
+* On ARM64, propagate the DMA coherence from the VMbus device
+* to the corresponding PCI device. On x86/x64, these calls
+* have no effect because DMA is always hardware coherent.
+*/
+   dev_set_dma_coherent(>dev,
+   dev_is_dma_coherent(>hdev->device));
+
put_pcichild(hv_dev);
}
 }
@@ -2191,7 +2200,7 @@ static int create_root_hv_pci_bus(struct hv_pcibus_device 
*hbus)
return error;
 
pci_lock_rescan_remove();
-   hv_pci_assign_numa_node(hbus);
+   hv_pci_assign_properties(hbus);
pci_bus_assign_resources(bridge->bus);
hv_pci_assign_slots(hbus);
pci_bus_add_devices(bridge->bus);
@@ -2458,7 +2467,7 @@ static void pci_devices_present_work(struct work_struct 
*work)
 */
pci_lock_rescan_remove();
pci_scan_child_bus(hbus->bridge->bus);
-   hv_pci_assign_numa_node(hbus);
+   hv_pci_assign_properties(hbus);
hv_pci_assign_slots(hbus);
pci_unlock_rescan_remove();
break;
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/4] ACPI: scan: Export acpi_get_dma_attr()

2022-03-17 Thread Michael Kelley via iommu
Export acpi_get_dma_attr() so that it can be used by the Hyper-V
VMbus driver, which may be built as a module. The related function
acpi_dma_configure_id() is already exported.

Signed-off-by: Michael Kelley 
---
 drivers/acpi/scan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 1331756..9f3c88f 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1489,6 +1489,7 @@ enum dev_dma_attr acpi_get_dma_attr(struct acpi_device 
*adev)
else
return DEV_DMA_NON_COHERENT;
 }
+EXPORT_SYMBOL_GPL(acpi_get_dma_attr);
 
 /**
  * acpi_dma_get_range() - Get device DMA parameters.
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/4] Fix coherence for VMbus and PCI pass-thru devices in Hyper-V VM

2022-03-17 Thread Michael Kelley via iommu
Hyper-V VMs have VMbus synthetic devices and PCI pass-thru devices that are 
added
dynamically via the VMbus protocol and are not represented in the ACPI DSDT. 
Only
the top level VMbus node exists in the DSDT. As such, on ARM64 these devices 
don't
pick up coherence information and default to not hardware coherent.  This 
results
in extra software coherence management overhead since the synthetic devices are
always hardware coherent. PCI pass-thru devices are also hardware coherent in 
all
current usage scenarios.

Fix this by propagating coherence information from the top level VMbus node in
the DSDT to all VMbus synthetic devices and PCI pass-thru devices. While smaller
granularity of control would be better, basing on the VMbus node in the DSDT
gives an escape path if a future scenario arises with devices that are not
hardware coherent.

The first two patches are prep to allow manipulating device coherence from a
module (since the VMbus driver can be built as a module) and from architecture
independent code without having a bunch of #ifdef's.

The third patch propagates the VMbus node coherence to VMbus synthetic devices.

The fourth patch propagates the coherence to PCI pass-thru devices.

Michael Kelley (4):
  ACPI: scan: Export acpi_get_dma_attr()
  dma-mapping: Add wrapper function to set dma_coherent
  Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device
  PCI: hv: Propagate coherence from VMbus device to PCI device

 drivers/acpi/scan.c |  1 +
 drivers/hv/vmbus_drv.c  | 15 +++
 drivers/pci/controller/pci-hyperv.c | 17 +
 include/linux/dma-map-ops.h |  9 +
 4 files changed, 38 insertions(+), 4 deletions(-)

-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3/4] Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device

2022-03-17 Thread Michael Kelley via iommu
VMbus synthetic devices are not represented in the ACPI DSDT -- only
the top level VMbus device is represented. As a result, on ARM64
coherence information in the _CCA method is not specified for
synthetic devices, so they default to not hardware coherent.
Drivers for some of these synthetic devices have been recently
updated to use the standard DMA APIs, and they are incurring extra
overhead of unneeded software coherence management.

Fix this by propagating coherence information from the VMbus node
in ACPI to the individual synthetic devices. There's no effect on
x86/x64 where devices are always hardware coherent.

Signed-off-by: Michael Kelley 
---
 drivers/hv/vmbus_drv.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 12a2b37..c0e993ad 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -904,6 +904,21 @@ static int vmbus_probe(struct device *child_device)
drv_to_hv_drv(child_device->driver);
struct hv_device *dev = device_to_hv_device(child_device);
const struct hv_vmbus_device_id *dev_id;
+   enum dev_dma_attr coherent;
+
+   /*
+* On ARM64, propagate the DMA coherence setting from the top level
+* VMbus ACPI device to the child VMbus device being added here.
+* Older Hyper-V ARM64 versions don't set the _CCA method on the
+* top level VMbus ACPI device as they should.  Treat these cases
+* as DMA coherent since that's the assumption made by Hyper-V.
+*
+* On x86/x64 these calls assume coherence and have no effect.
+*/
+   coherent = acpi_get_dma_attr(hv_acpi_dev);
+   if (coherent == DEV_DMA_NOT_SUPPORTED)
+   coherent = DEV_DMA_COHERENT;
+   acpi_dma_configure(child_device, coherent);
 
dev_id = hv_vmbus_get_id(drv, dev);
if (drv->probe) {
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V5 12/12] net: netvsc: Add Isolation VM support for netvsc driver

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan   Sent: Tuesday, September 14, 2021 6:39 
AM
> 
> In Isolation VM, all shared memory with host needs to mark visible
> to host via hvcall. vmbus_establish_gpadl() has already done it for
> netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> pagebuffer() still needs to be handled. Use DMA API to map/unmap
> these memory during sending/receiving packet and Hyper-V swiotlb
> bounce buffer dma address will be returned. The swiotlb bounce buffer
> has been masked to be visible to host during boot up.
> 
> Allocate rx/tx ring buffer via alloc_pages() in Isolation VM and map
> these pages via vmap(). After calling vmbus_establish_gpadl() which
> marks these pages visible to host, unmap these pages to release the
> virtual address mapped with physical address below shared_gpa_boundary
> and map them in the extra address space via vmap_pfn().
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4:
>   * Allocate rx/tx ring buffer via alloc_pages() in Isolation VM
>   * Map pages after calling vmbus_establish_gpadl().
>   * set dma_set_min_align_mask for netvsc driver.
> 
> Change since v3:
>   * Add comment to explain why not to use dma_map_sg()
>   * Fix some error handle.
> ---
>  drivers/net/hyperv/hyperv_net.h   |   7 +
>  drivers/net/hyperv/netvsc.c   | 287 +-
>  drivers/net/hyperv/netvsc_drv.c   |   1 +
>  drivers/net/hyperv/rndis_filter.c |   2 +
>  include/linux/hyperv.h|   5 +
>  5 files changed, 296 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index 315278a7cf88..87e8c74398a5 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -164,6 +164,7 @@ struct hv_netvsc_packet {
>   u32 total_bytes;
>   u32 send_buf_index;
>   u32 total_data_buflen;
> + struct hv_dma_range *dma_range;
>  };
> 
>  #define NETVSC_HASH_KEYLEN 40
> @@ -1074,6 +1075,8 @@ struct netvsc_device {
> 
>   /* Receive buffer allocated by us but manages by NetVSP */
>   void *recv_buf;
> + struct page **recv_pages;
> + u32 recv_page_count;
>   u32 recv_buf_size; /* allocated bytes */
>   struct vmbus_gpadl recv_buf_gpadl_handle;
>   u32 recv_section_cnt;
> @@ -1082,6 +1085,8 @@ struct netvsc_device {
> 
>   /* Send buffer allocated by us */
>   void *send_buf;
> + struct page **send_pages;
> + u32 send_page_count;
>   u32 send_buf_size;
>   struct vmbus_gpadl send_buf_gpadl_handle;
>   u32 send_section_cnt;
> @@ -1731,4 +1736,6 @@ struct rndis_message {
>  #define RETRY_US_HI  1
>  #define RETRY_MAX2000/* >10 sec */
> 
> +void netvsc_dma_unmap(struct hv_device *hv_dev,
> +   struct hv_netvsc_packet *packet);
>  #endif /* _HYPERV_NET_H */
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 1f87e570ed2b..7d5254bf043e 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include 
>  #include 
> @@ -150,11 +151,33 @@ static void free_netvsc_device(struct rcu_head *head)
>  {
>   struct netvsc_device *nvdev
>   = container_of(head, struct netvsc_device, rcu);
> + unsigned int alloc_unit;
>   int i;
> 
>   kfree(nvdev->extension);
> - vfree(nvdev->recv_buf);
> - vfree(nvdev->send_buf);
> +
> + if (nvdev->recv_pages) {
> + alloc_unit = (nvdev->recv_buf_size /
> + nvdev->recv_page_count) >> PAGE_SHIFT;
> +
> + vunmap(nvdev->recv_buf);
> + for (i = 0; i < nvdev->recv_page_count; i++)
> + __free_pages(nvdev->recv_pages[i], alloc_unit);
> + } else {
> + vfree(nvdev->recv_buf);
> + }
> +
> + if (nvdev->send_pages) {
> + alloc_unit = (nvdev->send_buf_size /
> + nvdev->send_page_count) >> PAGE_SHIFT;
> +
> + vunmap(nvdev->send_buf);
> + for (i = 0; i < nvdev->send_page_count; i++)
> + __free_pages(nvdev->send_pages[i], alloc_unit);
> + } else {
> + vfree(nvdev->send_buf);
> + }
> +
>   kfree(nvdev->send_section_map);
> 
>   for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
> @@ -330,6 +353,108 @@ int netvsc_alloc_recv_comp_ring(struct netvsc_device 
> *net_device, u32 q_idx)
>   return nvchan->mrc.slots ? 0 : -ENOMEM;
>  }
> 
> +void *netvsc_alloc_pages(struct page ***pages_array, unsigned int *array_len,
> +  unsigned long size)
> +{
> + struct page *page, **pages, **vmap_pages;
> + unsigned long pg_count = size >> PAGE_SHIFT;
> + int alloc_unit = MAX_ORDER_NR_PAGES;
> + int i, j, vmap_page_index = 0;
> + void *vaddr;
> +
> + if (pg_count < alloc_unit)
> + alloc_unit = 1;
> +
> + /* vmap() accepts page array with PAGE_SIZE as 

RE: [PATCH V5 11/12] scsi: storvsc: Add Isolation VM support for storvsc driver

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Tuesday, September 14, 2021 6:39 AM
> 
> In Isolation VM, all shared memory with host needs to mark visible
> to host via hvcall. vmbus_establish_gpadl() has already done it for
> storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> mpb_desc() still needs to be handled. Use DMA API(scsi_dma_map/unmap)
> to map these memory during sending/receiving packet and return swiotlb
> bounce buffer dma address. In Isolation VM, swiotlb  bounce buffer is
> marked to be visible to host and the swiotlb force mode is enabled.
> 
> Set device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to
> keep the original data offset in the bounce buffer.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4:
>   * use scsi_dma_map/unmap() instead of dma_map/unmap_sg()
>   * Add deleted comments back.
>   * Fix error calculation of hvpfns_to_add
> 
> Change since v3:
>   * Replace dma_map_page with dma_map_sg()
>   * Use for_each_sg() to populate payload->range.pfn_array.
>   * Remove storvsc_dma_map macro
> ---
>  drivers/hv/vmbus_drv.c |  1 +
>  drivers/scsi/storvsc_drv.c | 24 +++-
>  include/linux/hyperv.h |  1 +
>  3 files changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index b0be287e9a32..9c53f823cde1 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2121,6 +2121,7 @@ int vmbus_device_register(struct hv_device 
> *child_device_obj)
>   hv_debug_add_dev_dir(child_device_obj);
> 
> + child_device_obj->device.dma_mask = &vmbus_dma_mask;
> + child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
>   return 0;
> 
>  err_kset_unregister:
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index ebbbc1299c62..d10b450bcf0c 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -21,6 +21,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +
>  #include 
>  #include 
>  #include 
> @@ -1322,6 +1324,7 @@ static void storvsc_on_channel_callback(void *context)
>   continue;
>   }
>   request = (struct storvsc_cmd_request 
> *)scsi_cmd_priv(scmnd);
> + scsi_dma_unmap(scmnd);
>   }
> 
>   storvsc_on_receive(stor_device, packet, request);
> @@ -1735,7 +1738,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
> struct scsi_cmnd *scmnd)
>   struct hv_host_device *host_dev = shost_priv(host);
>   struct hv_device *dev = host_dev->dev;
>   struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
> - int i;
>   struct scatterlist *sgl;
>   unsigned int sg_count;
>   struct vmscsi_request *vm_srb;
> @@ -1817,10 +1819,11 @@ static int storvsc_queuecommand(struct Scsi_Host 
> *host, struct scsi_cmnd *scmnd)
>   payload_sz = sizeof(cmd_request->mpb);
> 
>   if (sg_count) {
> - unsigned int hvpgoff, hvpfns_to_add;
>   unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
>   unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
> - u64 hvpfn;
> + struct scatterlist *sg;
> + unsigned long hvpfn, hvpfns_to_add;
> + int j, i = 0;
> 
>   if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> 
> @@ -1834,8 +1837,11 @@ static int storvsc_queuecommand(struct Scsi_Host 
> *host, struct scsi_cmnd *scmnd)
>   payload->range.len = length;
>   payload->range.offset = offset_in_hvpg;
> 
> + sg_count = scsi_dma_map(scmnd);
> + if (sg_count < 0)
> + return SCSI_MLQUEUE_DEVICE_BUSY;
> 
> - for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
> + for_each_sg(sgl, sg, sg_count, j) {
>   /*
>* Init values for the current sgl entry. hvpgoff
>* and hvpfns_to_add are in units of Hyper-V size

Nit:  The above comment is now out-of-date because hvpgoff has
been removed.

> @@ -1845,10 +1851,9 @@ static int storvsc_queuecommand(struct Scsi_Host 
> *host, struct scsi_cmnd *scmnd)
>* even on other than the first sgl entry, provided
>* they are a multiple of PAGE_SIZE.
>*/
> - hvpgoff = HVPFN_DOWN(sgl->offset);
> - hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
> - hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
> - hvpgoff;
> + hvpfn = HVPFN_DOWN(sg_dma_address(sg));
> + hvpfns_to_add = HVPFN_UP(sg_dma_address(sg) +
> +  sg_dma_len(sg)) - hvpfn;

Good.  This looks correct now.

> 
>  

RE: [PATCH V5 10/12] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Tuesday, September 14, 2021 6:39 AM
> 
> hyperv Isolation VM requires bounce buffer support to copy
> data from/to encrypted memory and so enable swiotlb force
> mode to use swiotlb bounce buffer for DMA transaction.
> 
> In Isolation VM with AMD SEV, the bounce buffer needs to be
> accessed via extra address space which is above shared_gpa_boundary
> (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
> The access physical address will be original physical address +
> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
> spec is called virtual top of memory(vTOM). Memory addresses below
> vTOM are automatically treated as private while memory above
> vTOM is treated as shared.
> 
> Hyper-V initalizes swiotlb bounce buffer and default swiotlb
> needs to be disabled. pci_swiotlb_detect_override() and
> pci_swiotlb_detect_4gb() enable the default one. To override
> the setting, hyperv_swiotlb_detect() needs to run before
> these detect functions which depends on the pci_xen_swiotlb_
> init(). Make pci_xen_swiotlb_init() depends on the hyperv_swiotlb
> _detect() to keep the order.
> 
> Swiotlb bounce buffer code calls set_memory_decrypted()
> to mark bounce buffer visible to host and map it in extra
> address space via memremap. Populate the shared_gpa_boundary
> (vTOM) via swiotlb_unencrypted_base variable.
> 
> The map function memremap() can't work in the early place
> hyperv_iommu_swiotlb_init() and so initialize swiotlb bounce
> buffer in the hyperv_iommu_swiotlb_later_init().
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4:
>* Use swiotlb_unencrypted_base variable to pass shared_gpa_
>  boundary and map bounce buffer inside swiotlb code.
> 
> Change since v3:
>* Get hyperv bounce buffer size via default swiotlb
>bounce buffer size function and keep default size as
>same as the one in the AMD SEV VM.
> ---
>  arch/x86/include/asm/mshyperv.h |  2 ++
>  arch/x86/mm/mem_encrypt.c   |  3 +-
>  arch/x86/xen/pci-swiotlb-xen.c  |  3 +-
>  drivers/hv/vmbus_drv.c  |  3 ++
>  drivers/iommu/hyperv-iommu.c| 60 +
>  include/linux/hyperv.h  |  1 +
>  6 files changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 165423e8b67a..2d22f29f90c9 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -182,6 +182,8 @@ int hv_map_ioapic_interrupt(int ioapic_id, bool level, 
> int vcpu, int vector,
>   struct hv_interrupt_entry *entry);
>  int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> *entry);
>  int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool 
> visible);
> +void *hv_map_memory(void *addr, unsigned long size);
> +void hv_unmap_memory(void *addr);

Aren't these two declarations now spurious?

>  void hv_ghcb_msr_write(u64 msr, u64 value);
>  void hv_ghcb_msr_read(u64 msr, u64 *value);
>  #else /* CONFIG_HYPERV */
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index ff08dc463634..e2db0b8ed938 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "mm_internal.h"
> 
> @@ -202,7 +203,7 @@ void __init sev_setup_arch(void)
>   phys_addr_t total_mem = memblock_phys_mem_size();
>   unsigned long size;
> 
> - if (!sev_active())
> + if (!sev_active() && !hv_is_isolation_supported())
>   return;
> 
>   /*
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 54f9aa7e8457..43bd031aa332 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -4,6 +4,7 @@
> 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include 
> @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
>  EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
> 
>  IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
> -   NULL,
> +   hyperv_swiotlb_detect,
> pci_xen_swiotlb_init,
> NULL);
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 392c1ac4f819..b0be287e9a32 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t 
> *type,
>   return child_device_obj;
>  }
> 
> +static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
>  /*
>   * vmbus_device_register - Register the child device
>   */
> @@ -2118,6 +2120,7 @@ int vmbus_device_register(struct hv_device 
> *child_device_obj)
>   }
>   hv_debug_add_dev_dir(child_device_obj);
> 
> + child_device_obj->device.dma_mask = &vmbus_dma_mask;
>   return 0;
> 
>  err_kset_unregister:
> diff --git 

RE: [PATCH V5 09/12] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Tuesday, September 14, 2021 6:39 AM
> 
> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
> extra address space which is above shared_gpa_boundary
> (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
> The access physical address will be original physical address +
> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
> spec is called virtual top of memory(vTOM). Memory addresses below
> vTOM are automatically treated as private while memory above
> vTOM is treated as shared.
> 
> Expose swiotlb_unencrypted_base for platforms to set unencrypted
> memory base offset and call memremap() to map bounce buffer in the
> swiotlb code, store map address and use the address to copy data
> from/to swiotlb bounce buffer.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4:
>   * Expose swiotlb_unencrypted_base to set unencrypted memory
> offset.
>   * Use memremap() to map bounce buffer if swiotlb_unencrypted_
> base is set.
> 
> Change since v1:
>   * Make swiotlb_init_io_tlb_mem() return error code and return
>   error when dma_map_decrypted() fails.
> ---
>  include/linux/swiotlb.h |  6 ++
>  kernel/dma/swiotlb.c| 41 +++--
>  2 files changed, 41 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index b0cb2a9973f4..4998ed44ae3d 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -72,6 +72,9 @@ extern enum swiotlb_force swiotlb_force;
>   * @end: The end address of the swiotlb memory pool. Used to do a quick
>   *   range check to see if the memory was in fact allocated by this
>   *   API.
> + * @vaddr:   The vaddr of the swiotlb memory pool. The swiotlb
> + *   memory pool may be remapped in the memory encrypted case and 
> store
> + *   virtual address for bounce buffer operation.
>   * @nslabs:  The number of IO TLB blocks (in groups of 64) between @start and
>   *   @end. For default swiotlb, this is command line adjustable via
>   *   setup_io_tlb_npages.
> @@ -91,6 +94,7 @@ extern enum swiotlb_force swiotlb_force;
>  struct io_tlb_mem {
>   phys_addr_t start;
>   phys_addr_t end;
> + void *vaddr;
>   unsigned long nslabs;
>   unsigned long used;
>   unsigned int index;
> @@ -185,4 +189,6 @@ static inline bool is_swiotlb_for_alloc(struct device 
> *dev)
>  }
>  #endif /* CONFIG_DMA_RESTRICTED_POOL */
> 
> +extern phys_addr_t swiotlb_unencrypted_base;
> +
>  #endif /* __LINUX_SWIOTLB_H */
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 87c40517e822..9e30cc4bd872 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -50,6 +50,7 @@
>  #include 
>  #include 
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
> 
>  struct io_tlb_mem io_tlb_default_mem;
> 
> +phys_addr_t swiotlb_unencrypted_base;
> +
>  /*
>   * Max segment that we can provide which (if pages are contiguous) will
>   * not be bounced (unless SWIOTLB_FORCE is set).
> @@ -175,7 +178,7 @@ void __init swiotlb_update_mem_attributes(void)
>   memset(vaddr, 0, bytes);
>  }
> 
> -static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t 
> start,
> +static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
>   unsigned long nslabs, bool late_alloc)
>  {
>   void *vaddr = phys_to_virt(start);
> @@ -196,13 +199,34 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
> *mem, phys_addr_t start,
>   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>   mem->slots[i].alloc_size = 0;
>   }
> +
> + if (set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT))
> + return -EFAULT;
> +
> + /*
> +  * Map memory in the unencrypted physical address space when requested
> +  * (e.g. for Hyper-V AMD SEV-SNP Isolation VMs).
> +  */
> + if (swiotlb_unencrypted_base) {
> + phys_addr_t paddr = __pa(vaddr) + swiotlb_unencrypted_base;

Nit:  Use "start" instead of "__pa(vaddr)" since "start" is already the needed
physical address.

> +
> + vaddr = memremap(paddr, bytes, MEMREMAP_WB);
> + if (!vaddr) {
> + pr_err("Failed to map the unencrypted memory.\n");
> + return -ENOMEM;
> + }
> + }
> +
>   memset(vaddr, 0, bytes);
> + mem->vaddr = vaddr;
> + return 0;
>  }
> 
>  int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
> verbose)
>  {
>   struct io_tlb_mem *mem = &io_tlb_default_mem;
>   size_t alloc_size;
> + int ret;
> 
>   if (swiotlb_force == SWIOTLB_NO_FORCE)
>   return 0;
> @@ -217,7 +241,11 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned 
> long nslabs, int verbose)
>   

RE: [PATCH V5 07/12] Drivers: hv: vmbus: Add SNP support for VMbus channel initiate message

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Tuesday, September 14, 2021 6:39 AM
> 
> The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared
> with host in Isolation VM and so it's necessary to use hvcall to set
> them visible to host. In Isolation VM with AMD SEV SNP, the access
> address should be in the extra space which is above shared gpa
> boundary. So remap these pages into the extra address(pa +
> shared_gpa_boundary).
> 
> Introduce monitor_pages_original[] in the struct vmbus_connection
> to store monitor page virtual address returned by hv_alloc_hyperv_
> zeroed_page() and free monitor page via monitor_pages_original in
> the vmbus_disconnect(). The monitor_pages[] is to used to access
> monitor page and it is initialized to be equal with monitor_pages_
> original. The monitor_pages[] will be overridden in the isolation VM
> with va of extra address. Introduce monitor_pages_pa[] to store
> monitor pages' physical address and use it to populate pa in the
> initiate msg.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4:
>   * Introduce monitor_pages_pa[] to store monitor pages' physical
> address and use it to populate pa in the initiate msg.
>   * Move code of mapping monitor pages in extra address into
> vmbus_connect().
> 
> Change since v3:
>   * Rename monitor_pages_va with monitor_pages_original
>   * free monitor page via monitor_pages_original and
> monitor_pages is used to access monitor page.
> 
> Change since v1:
> * Not remap monitor pages in the non-SNP isolation VM.
> ---
>  drivers/hv/connection.c   | 90 ---
>  drivers/hv/hyperv_vmbus.h |  2 +
>  2 files changed, 86 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 8820ae68f20f..edd8f7dd169f 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -19,6 +19,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
> 
>  #include "hyperv_vmbus.h"
> @@ -102,8 +104,9 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
> *msginfo, u32 version)
>   vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID;
>   }
> 
> - msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
> - msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
> + msg->monitor_page1 = vmbus_connection.monitor_pages_pa[0];
> + msg->monitor_page2 = vmbus_connection.monitor_pages_pa[1];
> +
>   msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
> 
>   /*
> @@ -216,6 +219,65 @@ int vmbus_connect(void)
>   goto cleanup;
>   }
> 
> + vmbus_connection.monitor_pages_original[0]
> + = vmbus_connection.monitor_pages[0];
> + vmbus_connection.monitor_pages_original[1]
> + = vmbus_connection.monitor_pages[1];
> + vmbus_connection.monitor_pages_pa[0]
> + = virt_to_phys(vmbus_connection.monitor_pages[0]);
> + vmbus_connection.monitor_pages_pa[1]
> + = virt_to_phys(vmbus_connection.monitor_pages[1]);
> +
> + if (hv_is_isolation_supported()) {
> + vmbus_connection.monitor_pages_pa[0] +=
> + ms_hyperv.shared_gpa_boundary;
> + vmbus_connection.monitor_pages_pa[1] +=
> + ms_hyperv.shared_gpa_boundary;
> +
> + ret = set_memory_decrypted((unsigned long)
> +vmbus_connection.monitor_pages[0],
> +1);
> + ret |= set_memory_decrypted((unsigned long)
> + vmbus_connection.monitor_pages[1],
> + 1);
> + if (ret)
> + goto cleanup;
> +
> + /*
> +  * Isolation VM with AMD SNP needs to access monitor page via
> +  * address space above shared gpa boundary.
> +  */
> + if (hv_isolation_type_snp()) {
> + vmbus_connection.monitor_pages[0]
> + = memremap(vmbus_connection.monitor_pages_pa[0],
> +HV_HYP_PAGE_SIZE,
> +MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[0]) {
> + ret = -ENOMEM;
> + goto cleanup;
> + }
> +
> + vmbus_connection.monitor_pages[1]
> + = memremap(vmbus_connection.monitor_pages_pa[1],
> +HV_HYP_PAGE_SIZE,
> +MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[1]) {
> + ret = -ENOMEM;
> + goto cleanup;
> + }
> + }
> +
> + /*
> 

RE: [PATCH V5 05/12] x86/hyperv: Add Write/Read MSR registers via ghcb page

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Tuesday, September 14, 2021 6:39 AM
> 
> Hyperv provides GHCB protocol to write Synthetic Interrupt
> Controller MSR registers in Isolation VM with AMD SEV SNP
> and these registers are emulated by hypervisor directly.
> Hyperv requires to write SINTx MSR registers twice. First
> writes MSR via GHCB page to communicate with hypervisor
> and then writes wrmsr instruction to talk with paravisor
> which runs in VMPL0. Guest OS ID MSR also needs to be set
> via GHCB page.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4:
>* Remove hv_get_simp(), hv_get_siefp()  hv_get_synint_*()
>  helper function. Move the logic into hv_get/set_register().
> 
> Change since v3:
>  * Pass old_msg_type to hv_signal_eom() as parameter.
>* Use HV_REGISTER_* macro instead of HV_X64_MSR_*
>* Add hv_isolation_type_snp() weak function.
>* Add macros to set synic register in ARM code.
> 
> Change since v1:
>  * Introduce sev_es_ghcb_hv_call_simple() and share code
>  between SEV and Hyper-V code.
> 
> Fix for hyperv: Add Write/Read MSR registers via ghcb page
> ---
>  arch/x86/hyperv/hv_init.c   |  36 +++
>  arch/x86/hyperv/ivm.c   | 103 
>  arch/x86/include/asm/mshyperv.h |  56 -
>  arch/x86/include/asm/sev.h  |   6 ++
>  arch/x86/kernel/sev-shared.c|  63 +++
>  drivers/hv/hv.c |  77 +++-
>  drivers/hv/hv_common.c  |   6 ++
>  include/asm-generic/mshyperv.h  |   2 +
>  8 files changed, 266 insertions(+), 83 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index d57df6825527..a16a83e46a30 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -37,7 +37,7 @@ EXPORT_SYMBOL_GPL(hv_current_partition_id);
>  void *hv_hypercall_pg;
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> 
> -void __percpu **hv_ghcb_pg;
> +union hv_ghcb __percpu **hv_ghcb_pg;
> 
>  /* Storage to save the hypercall page temporarily for hibernation */
>  static void *hv_hypercall_pg_saved;
> @@ -406,7 +406,7 @@ void __init hyperv_init(void)
>   }
> 
>   if (hv_isolation_type_snp()) {
> - hv_ghcb_pg = alloc_percpu(void *);
> + hv_ghcb_pg = alloc_percpu(union hv_ghcb *);
>   if (!hv_ghcb_pg)
>   goto free_vp_assist_page;
>   }
> @@ -424,6 +424,9 @@ void __init hyperv_init(void)
>   guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> 
> + /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
> +
>   hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
>   VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> @@ -501,6 +504,7 @@ void __init hyperv_init(void)
> 
>  clean_guest_os_id:
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
>   cpuhp_remove_state(cpuhp);
>  free_ghcb_page:
>   free_percpu(hv_ghcb_pg);
> @@ -522,6 +526,7 @@ void hyperv_cleanup(void)
> 
>   /* Reset our OS id */
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
> 
>   /*
>* Reset hypercall page reference before reset the page,
> @@ -592,30 +597,3 @@ bool hv_is_hyperv_initialized(void)
>   return hypercall_msr.enable;
>  }
>  EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
> -
> -enum hv_isolation_type hv_get_isolation_type(void)
> -{
> - if (!(ms_hyperv.priv_high & HV_ISOLATION))
> - return HV_ISOLATION_TYPE_NONE;
> - return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
> -}
> -EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> -
> -bool hv_is_isolation_supported(void)
> -{
> - if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> - return false;
> -
> - if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> - return false;
> -
> - return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> -}
> -
> -DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> -
> -bool hv_isolation_type_snp(void)
> -{
> - return static_branch_unlikely(&isolation_type_snp);
> -}
> -EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 79e7fb83472a..5439723446c9 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -6,12 +6,115 @@
>   *  Tianyu Lan 
>   */
> 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
> +#include 
> +
> +union hv_ghcb {
> + struct ghcb ghcb;
> +} __packed __aligned(HV_HYP_PAGE_SIZE);
> +
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long 

RE: [PATCH V5 04/12] Drivers: hv: vmbus: Mark vmbus ring buffer visible to host in Isolation VM

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Tuesday, September 14, 2021 6:39 AM
> 
> Mark vmbus ring buffer visible with set_memory_decrypted() when
> establish gpadl handle.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4:
>   * Change gpadl handle in netvsc and uio driver from u32 to
> struct vmbus_gpadl.
>   * Change vmbus_establish_gpadl()'s gpadl_handle parameter
> to vmbus_gpadl data structure.
> 
> Change since v3:
>   * Change vmbus_teardown_gpadl() parameter and put gpadl handle,
> buffer and buffer size in the struct vmbus_gpadl.
> ---
>  drivers/hv/channel.c| 54 -
>  drivers/net/hyperv/hyperv_net.h |  5 +--
>  drivers/net/hyperv/netvsc.c | 17 ++-
>  drivers/uio/uio_hv_generic.c| 20 ++--
>  include/linux/hyperv.h  | 12 ++--
>  5 files changed, 71 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index f3761c73b074..cf419eb1de77 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> 
> @@ -456,7 +457,7 @@ static int create_gpadl_header(enum hv_gpadl_type type, 
> void *kbuffer,
>  static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>  enum hv_gpadl_type type, void *kbuffer,
>  u32 size, u32 send_offset,
> -u32 *gpadl_handle)
> +struct vmbus_gpadl *gpadl)
>  {
>   struct vmbus_channel_gpadl_header *gpadlmsg;
>   struct vmbus_channel_gpadl_body *gpadl_body;
> @@ -474,6 +475,15 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
> *channel,
>   if (ret)
>   return ret;
> 
> + ret = set_memory_decrypted((unsigned long)kbuffer,
> +HVPFN_UP(size));

This should be PFN_UP, not HVPFN_UP.  The numpages parameter to
set_memory_decrypted() is in guest size pages, not Hyper-V size pages.

> + if (ret) {
> + dev_warn(&channel->device_obj->device,
> +  "Failed to set host visibility for new GPADL %d.\n",
> +  ret);
> + return ret;
> + }
> +
>   init_completion(>waitevent);
>   msginfo->waiting_channel = channel;
> 
> @@ -537,7 +547,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
> *channel,
>   }
> 
>   /* At this point, we received the gpadl created msg */
> - *gpadl_handle = gpadlmsg->gpadl;
> + gpadl->gpadl_handle = gpadlmsg->gpadl;
> + gpadl->buffer = kbuffer;
> + gpadl->size = size;
> +
> 
>  cleanup:
>   spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
> @@ -549,6 +562,11 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
> *channel,
>   }
> 
>   kfree(msginfo);
> +
> + if (ret)
> + set_memory_encrypted((unsigned long)kbuffer,
> +  HVPFN_UP(size));

Should be PFN_UP as noted on the previous call to set_memory_decrypted().

> +
>   return ret;
>  }
> 
> @@ -561,10 +579,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
> *channel,
>   * @gpadl_handle: some funky thing
>   */
>  int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
> -   u32 size, u32 *gpadl_handle)
> +   u32 size, struct vmbus_gpadl *gpadl)
>  {
>   return __vmbus_establish_gpadl(channel, HV_GPADL_BUFFER, kbuffer, size,
> -0U, gpadl_handle);
> +0U, gpadl);
>  }
>  EXPORT_SYMBOL_GPL(vmbus_establish_gpadl);
> 
> @@ -639,6 +657,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   struct vmbus_channel_open_channel *open_msg;
>   struct vmbus_channel_msginfo *open_info = NULL;
>   struct page *page = newchannel->ringbuffer_page;
> + struct vmbus_gpadl gpadl;

I think this local variable was needed in a previous version of the patch, but
is now unused and should be deleted.

>   u32 send_pages, recv_pages;
>   unsigned long flags;
>   int err;
> @@ -675,7 +694,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   goto error_clean_ring;
> 
>   /* Establish the gpadl for the ring buffer */
> - newchannel->ringbuffer_gpadlhandle = 0;
> + newchannel->ringbuffer_gpadlhandle.gpadl_handle = 0;
> 
>   err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
> page_address(newchannel->ringbuffer_page),
> @@ -701,7 +720,8 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   open_msg->header.msgtype = CHANNELMSG_OPENCHANNEL;
>   open_msg->openid = newchannel->offermsg.child_relid;
>   open_msg->child_relid = newchannel->offermsg.child_relid;
> - open_msg->ringbuffer_gpadlhandle = newchannel->ringbuffer_gpadlhandle;
> + 

RE: [PATCH V4 08/13] hyperv/vmbus: Initialize VMbus ring buffer for Isolation VM

2021-09-02 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Thursday, September 2, 2021 6:36 AM
> 
> On 9/2/2021 8:23 AM, Michael Kelley wrote:
> >> +  } else {
> >> +  pages_wraparound = kcalloc(page_cnt * 2 - 1,
> >> + sizeof(struct page *),
> >> + GFP_KERNEL);
> >> +
> >> +  pages_wraparound[0] = pages;
> >> +  for (i = 0; i < 2 * (page_cnt - 1); i++)
> >> +  pages_wraparound[i + 1] =
> >> +  [i % (page_cnt - 1) + 1];
> >> +
> >> +  ring_info->ring_buffer = (struct hv_ring_buffer *)
> >> +  vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
> >> +  PAGE_KERNEL);
> >> +
> >> +  kfree(pages_wraparound);
> >> +  if (!ring_info->ring_buffer)
> >> +  return -ENOMEM;
> >> +  }
> > With this patch, the code is a big "if" statement with two halves -- one
> > when SNP isolation is in effect, and the other when not.  The SNP isolation
> > case does the work using PFNs with the shared_gpa_boundary added,
> > while the other case does the same work but using struct page.  Perhaps
> > I'm missing something, but can both halves be combined and always
> > do the work using PFNs?  The only difference is whether to add the
> > shared_gpa_boundary, and whether to zero the memory when done.
> > So get the starting PFN, then have an "if" statement for whether to
> > add the shared_gpa_boundary.  Then everything else is the same.
> > At the end, use an "if" statement to decide whether to zero the
> > memory.  It would really be better to have the logic in this algorithm
> > coded only once.
> >
> 
> Hi Michael:
>   I have tried this before. But vmap_pfn() only works for those pfns out
> of normal memory. Please see vmap_pfn_apply() for detail and
> return error when the PFN is valid.
> 

Indeed.  This ties into the discussion with Christoph about coming up
with generalized helper functions to assist in handling the
shared_gpa_boundary.   Having a single implementation here in
hv_ringbuffer_init() would be a good goal as well.

Michael

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

2021-09-02 Thread Michael Kelley via iommu
From: Christoph Hellwig  Sent: Thursday, September 2, 2021 1:00 AM
> 
> On Tue, Aug 31, 2021 at 05:16:19PM +, Michael Kelley wrote:
> > As a quick overview, I think there are four places where the
> > shared_gpa_boundary must be applied to adjust the guest physical
> > address that is used.  Each requires mapping a corresponding
> > virtual address range.  Here are the four places:
> >
> > 1)  The so-called "monitor pages" that are a core communication
> > mechanism between the guest and Hyper-V.  These are two single
> > pages, and the mapping is handled by calling memremap() for
> > each of the two pages.  See Patch 7 of Tianyu's series.
> 
> Ah, interesting.
> 
> > 3)  The network driver send and receive buffers.  vmap_phys_range()
> > should work here.
> 
> Actually it won't.  The problem with these buffers is that they are
> physically non-contiguous allocations.  

Indeed you are right.  These buffers are allocated with vzalloc().

> We really have two sensible options:
> 
>  1) use vmap_pfn as in the current series.  But in that case I think
> we should get rid of the other mapping created by vmalloc.  I
> though a bit about finding a way to apply the offset in vmalloc
> itself, but I think it would be too invasive to the normal fast
> path.  So the other sub-option would be to allocate the pages
> manually (maybe even using high order allocations to reduce TLB
> pressure) and then remap them

What's the benefit of getting rid of the other mapping created by
vmalloc if it isn't referenced?  Just page table space?  The default sizes
are a 16 Meg receive buffer and a 1 Meg send buffer for each VMbus
channel used by netvsc, and usually the max number of channels
is 8.  So there's 128 Meg of virtual space to be saved on the receive
buffers,  which could be worth it.

Allocating the pages manually is also an option, but we have to
be careful about high order allocations.  While typically these buffers
are allocated during system boot, these synthetic NICs can be hot
added and removed while the VM is running.   The channel count
can also be changed while the VM is running.  So multiple 16 Meg
receive buffer allocations may need to be done after the system has
been running a long time.

>  2) do away with the contiguous kernel mapping entirely.  This means
> the simple memcpy calls become loops over kmap_local_pfn.  As
> I just found out for the send side that would be pretty easy,
> but the receive side would be more work.  We'd also need to check
> the performance implications.

Doing away with the contiguous kernel mapping entirely seems like
it would result in fairly messy code to access the buffer.  What's the
benefit of doing away with the mapping?  I'm not an expert on the
netvsc driver, but decoding the incoming packets is already fraught
with complexities because of the nature of the protocol with Hyper-V.
The contiguous kernel mapping at least keeps the basics sane.

> 
> > 4) The swiotlb memory used for bounce buffers.  vmap_phys_range()
> > should work here as well.
> 
> Or memremap if it works for 1.
> 
> > Case #2 above does unusual mapping.  The ring buffer consists of a ring
> > buffer header page, followed by one or more pages that are the actual
> > ring buffer.  The pages making up the actual ring buffer are mapped
> > twice in succession.  For example, if the ring buffer has 4 pages
> > (one header page and three ring buffer pages), the contiguous
> > virtual mapping must cover these seven pages:  0, 1, 2, 3, 1, 2, 3.
> > The duplicate contiguous mapping allows the code that is reading
> > or writing the actual ring buffer to not be concerned about wrap-around
> > because writing off the end of the ring buffer is automatically
> > wrapped-around by the mapping.  The amount of data read or
> > written in one batch never exceeds the size of the ring buffer, and
> > after a batch is read or written, the read or write indices are adjusted
> > to put them back into the range of the first mapping of the actual
> > ring buffer pages.  So there's method to the madness, and the
> > technique works pretty well.  But this kind of mapping is not
> > amenable to using vmap_phys_range().
> 
> Hmm.  Can you point me to where this is mapped?  Especially for the
> classic non-isolated case where no vmap/vmalloc mapping is involved
> at all?

The existing code is in hv_ringbuffer_init() in drivers/hv/ring_buffer.c.
The code hasn't changed in a while, so any recent upstream code tree
is valid to look at.  The memory pages are typically allocated
in vmbus_alloc_ring() in drivers/hv/channel.c.

Michael
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V4 12/13] hv_netvsc: Add Isolation VM support for netvsc driver

2021-09-01 Thread Michael Kelley via iommu
From: Michael Kelley  Sent: Wednesday, September 1, 
2021 7:34 PM

[snip]

> > +int netvsc_dma_map(struct hv_device *hv_dev,
> > +  struct hv_netvsc_packet *packet,
> > +  struct hv_page_buffer *pb)
> > +{
> > +   u32 page_count =  packet->cp_partial ?
> > +   packet->page_buf_cnt - packet->rmsg_pgcnt :
> > +   packet->page_buf_cnt;
> > +   dma_addr_t dma;
> > +   int i;
> > +
> > +   if (!hv_is_isolation_supported())
> > +   return 0;
> > +
> > +   packet->dma_range = kcalloc(page_count,
> > +   sizeof(*packet->dma_range),
> > +   GFP_KERNEL);
> > +   if (!packet->dma_range)
> > +   return -ENOMEM;
> > +
> > +   for (i = 0; i < page_count; i++) {
> > +   char *src = phys_to_virt((pb[i].pfn << HV_HYP_PAGE_SHIFT)
> > ++ pb[i].offset);
> > +   u32 len = pb[i].len;
> > +
> > +   dma = dma_map_single(_dev->device, src, len,
> > +DMA_TO_DEVICE);
> > +   if (dma_mapping_error(_dev->device, dma)) {
> > +   kfree(packet->dma_range);
> > +   return -ENOMEM;
> > +   }
> > +
> > +   packet->dma_range[i].dma = dma;
> > +   packet->dma_range[i].mapping_size = len;
> > +   pb[i].pfn = dma >> HV_HYP_PAGE_SHIFT;
> > +   pb[i].offset = offset_in_hvpage(dma);
> > +   pb[i].len = len;
> > +   }
> 
> Just to confirm, this driver does *not* set the DMA min_align_mask
> like storvsc does.  So after the call to dma_map_single(), the offset
> in the page could be different.  That's why you are updating
> the pb[i].offset value.  Alternatively, you could set the DMA
> min_align_mask, which would ensure the offset is unchanged.
> I'm OK with either approach, though perhaps a comment is
> warranted to explain, as this is a subtle issue.
> 

On second thought, I don't think either approach is OK.  The default
alignment in the swiotlb is 2K, and if the length of the data in the
buffer was 3K, the data could cross a page boundary in the bounce
buffer when it originally did not.  This would break the above code
which can only deal with one page at a time.  So I think the netvsc
driver also must set the DMA min_align_mask to 4K, which will
preserve the offset.

Michael
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V4 05/13] hyperv: Add Write/Read MSR registers via ghcb page

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> Hyperv provides GHCB protocol to write Synthetic Interrupt
> Controller MSR registers in Isolation VM with AMD SEV SNP
> and these registers are emulated by hypervisor directly.
> Hyperv requires to write SINTx MSR registers twice. First
> writes MSR via GHCB page to communicate with hypervisor
> and then writes wrmsr instruction to talk with paravisor
> which runs in VMPL0. Guest OS ID MSR also needs to be set
> via GHCB page.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v1:
>  * Introduce sev_es_ghcb_hv_call_simple() and share code
>between SEV and Hyper-V code.
> Change since v3:
>  * Pass old_msg_type to hv_signal_eom() as parameter.
>* Use HV_REGISTER_* marcro instead of HV_X64_MSR_*
>* Add hv_isolation_type_snp() weak function.
>* Add maros to set syinc register in ARM code.
> ---
>  arch/arm64/include/asm/mshyperv.h |  23 ++
>  arch/x86/hyperv/hv_init.c |  36 ++
>  arch/x86/hyperv/ivm.c | 112 ++
>  arch/x86/include/asm/mshyperv.h   |  80 -
>  arch/x86/include/asm/sev.h|   3 +
>  arch/x86/kernel/sev-shared.c  |  63 ++---
>  drivers/hv/hv.c   | 112 --
>  drivers/hv/hv_common.c|   6 ++
>  include/asm-generic/mshyperv.h|   4 +-
>  9 files changed, 345 insertions(+), 94 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/mshyperv.h 
> b/arch/arm64/include/asm/mshyperv.h
> index 20070a847304..ced83297e009 100644
> --- a/arch/arm64/include/asm/mshyperv.h
> +++ b/arch/arm64/include/asm/mshyperv.h
> @@ -41,6 +41,29 @@ static inline u64 hv_get_register(unsigned int reg)
>   return hv_get_vpreg(reg);
>  }
> 
> +#define hv_get_simp(val) { val = hv_get_register(HV_REGISTER_SIMP); }
> +#define hv_set_simp(val) hv_set_register(HV_REGISTER_SIMP, val)
> +
> +#define hv_get_siefp(val){ val = hv_get_register(HV_REGISTER_SIEFP); }
> +#define hv_set_siefp(val)hv_set_register(HV_REGISTER_SIEFP, val)
> +
> +#define hv_get_synint_state(int_num, val) {  \
> + val = hv_get_register(HV_REGISTER_SINT0 + int_num); \
> + }
> +
> +#define hv_set_synint_state(int_num, val)\
> + hv_set_register(HV_REGISTER_SINT0 + int_num, val)
> +
> +#define hv_get_synic_state(val) {\
> + val = hv_get_register(HV_REGISTER_SCONTROL);\
> + }
> +
> +#define hv_set_synic_state(val)  \
> + hv_set_register(HV_REGISTER_SCONTROL, val)
> +
> +#define hv_signal_eom(old_msg_type)   \
> + hv_set_register(HV_REGISTER_EOM, 0)
> +
>  /* SMCCC hypercall parameters */
>  #define HV_SMCCC_FUNC_NUMBER 1
>  #define HV_FUNC_ID   ARM_SMCCC_CALL_VAL( \
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index b1aa42f60faa..be6210a3fd2f 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -37,7 +37,7 @@ EXPORT_SYMBOL_GPL(hv_current_partition_id);
>  void *hv_hypercall_pg;
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> 
> -void __percpu **hv_ghcb_pg;
> +union hv_ghcb __percpu **hv_ghcb_pg;
> 
>  /* Storage to save the hypercall page temporarily for hibernation */
>  static void *hv_hypercall_pg_saved;
> @@ -406,7 +406,7 @@ void __init hyperv_init(void)
>   }
> 
>   if (hv_isolation_type_snp()) {
> - hv_ghcb_pg = alloc_percpu(void *);
> + hv_ghcb_pg = alloc_percpu(union hv_ghcb *);
>   if (!hv_ghcb_pg)
>   goto free_vp_assist_page;
>   }
> @@ -424,6 +424,9 @@ void __init hyperv_init(void)
>   guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> 
> + /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
> +
>   hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
>   VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> @@ -501,6 +504,7 @@ void __init hyperv_init(void)
> 
>  clean_guest_os_id:
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
>   cpuhp_remove_state(cpuhp);
>  free_ghcb_page:
>   free_percpu(hv_ghcb_pg);
> @@ -522,6 +526,7 @@ void hyperv_cleanup(void)
> 
>   /* Reset our OS id */
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
> 
>   /*
>* Reset hypercall page reference before reset the page,
> @@ -592,30 +597,3 @@ bool hv_is_hyperv_initialized(void)
>   return hypercall_msr.enable;
>  }
>  EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
> -
> -enum hv_isolation_type hv_get_isolation_type(void)
> -{
> - if (!(ms_hyperv.priv_high & HV_ISOLATION))
> - return 

RE: [PATCH V4 12/13] hv_netvsc: Add Isolation VM support for netvsc driver

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> In Isolation VM, all shared memory with host needs to mark visible
> to host via hvcall. vmbus_establish_gpadl() has already done it for
> netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> pagebuffer() stills need to be handled. Use DMA API to map/umap
> these memory during sending/receiving packet and Hyper-V swiotlb
> bounce buffer dma adress will be returned. The swiotlb bounce buffer
> has been masked to be visible to host during boot up.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * Add comment to explain why not to use dma_map_sg()
>   * Fix some error handle.
> ---
>  arch/x86/hyperv/ivm.c |   1 +
>  drivers/net/hyperv/hyperv_net.h   |   5 ++
>  drivers/net/hyperv/netvsc.c   | 135 +-
>  drivers/net/hyperv/rndis_filter.c |   2 +
>  include/linux/hyperv.h|   5 ++
>  5 files changed, 145 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 84563b3c9f3a..08d8e01de017 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -317,6 +317,7 @@ void *hv_map_memory(void *addr, unsigned long size)
> 
>   return vaddr;
>  }
> +EXPORT_SYMBOL_GPL(hv_map_memory);
> 
>  void hv_unmap_memory(void *addr)
>  {
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index aa7c9962dbd8..862419912bfb 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -164,6 +164,7 @@ struct hv_netvsc_packet {
>   u32 total_bytes;
>   u32 send_buf_index;
>   u32 total_data_buflen;
> + struct hv_dma_range *dma_range;
>  };
> 
>  #define NETVSC_HASH_KEYLEN 40
> @@ -1074,6 +1075,7 @@ struct netvsc_device {
> 
>   /* Receive buffer allocated by us but manages by NetVSP */
>   void *recv_buf;
> + void *recv_original_buf;
>   u32 recv_buf_size; /* allocated bytes */
>   u32 recv_buf_gpadl_handle;
>   u32 recv_section_cnt;
> @@ -1082,6 +1084,7 @@ struct netvsc_device {
> 
>   /* Send buffer allocated by us */
>   void *send_buf;
> + void *send_original_buf;
>   u32 send_buf_size;
>   u32 send_buf_gpadl_handle;
>   u32 send_section_cnt;
> @@ -1731,4 +1734,6 @@ struct rndis_message {
>  #define RETRY_US_HI  1
>  #define RETRY_MAX2000/* >10 sec */
> 
> +void netvsc_dma_unmap(struct hv_device *hv_dev,
> +   struct hv_netvsc_packet *packet);
>  #endif /* _HYPERV_NET_H */
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index f19b6a63..edd336b08c2c 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
>   int i;
> 
>   kfree(nvdev->extension);
> - vfree(nvdev->recv_buf);
> - vfree(nvdev->send_buf);
> +
> + if (nvdev->recv_original_buf) {
> + vunmap(nvdev->recv_buf);

In patch 11, you have added a hv_unmap_memory()
function as the inverse of hv_map_memory().  Since this
buffer was mapped with hv_map_memory() and you have
added that function, the cleanup should use
hv_unmap_memory() rather than calling vunmap() directly.

> + vfree(nvdev->recv_original_buf);
> + } else {
> + vfree(nvdev->recv_buf);
> + }
> +
> + if (nvdev->send_original_buf) {
> + vunmap(nvdev->send_buf);

Same here.

> + vfree(nvdev->send_original_buf);
> + } else {
> + vfree(nvdev->send_buf);
> + }
> +
>   kfree(nvdev->send_section_map);
> 
>   for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
> @@ -347,6 +360,7 @@ static int netvsc_init_buf(struct hv_device *device,
>   unsigned int buf_size;
>   size_t map_words;
>   int i, ret = 0;
> + void *vaddr;
> 
>   /* Get receive buffer area. */
>   buf_size = device_info->recv_sections * device_info->recv_section_size;
> @@ -382,6 +396,17 @@ static int netvsc_init_buf(struct hv_device *device,
>   goto cleanup;
>   }
> 
> + if (hv_isolation_type_snp()) {
> + vaddr = hv_map_memory(net_device->recv_buf, buf_size);

Since the netvsc driver is architecture neutral, this code also needs
to compile for ARM64.  A stub will be needed for hv_map_memory()
on the ARM64 side.  Same for hv_unmap_memory() as suggested
above.  Or better, move hv_map_memory() and hv_unmap_memory()
to an architecture neutral module such as hv_common.c.

Or if Christoph's approach of creating the vmap_phys_addr() helper
comes to fruition, that's an even better approach since it will already
handle multiple architectures.

> + if (!vaddr) {
> + ret = -ENOMEM;
> + goto cleanup;
> + }
> +
> + net_device->recv_original_buf = net_device->recv_buf;
> + net_device->recv_buf = vaddr;
> + }
> +
>   /* Notify the 

RE: [PATCH V4 13/13] hv_storvsc: Add Isolation VM support for storvsc driver

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 

Per previous comment, the Subject line tag should be "scsi: storvsc: "

> In Isolation VM, all shared memory with host needs to mark visible
> to host via hvcall. vmbus_establish_gpadl() has already done it for
> storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> mpb_desc() still needs to be handled. Use DMA API(dma_map_sg) to map
> these memory during sending/receiving packet and return swiotlb bounce
> buffer dma address. In Isolation VM, swiotlb  bounce buffer is marked
> to be visible to host and the swiotlb force mode is enabled.
> 
> Set device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to
> keep the original data offset in the bounce buffer.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * Rplace dma_map_page with dma_map_sg()
>   * Use for_each_sg() to populate payload->range.pfn_array.
>   * Remove storvsc_dma_map macro
> ---
>  drivers/hv/vmbus_drv.c |  1 +
>  drivers/scsi/storvsc_drv.c | 41 +++---
>  include/linux/hyperv.h |  1 +
>  3 files changed, 18 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index f068e22a5636..270d526fd9de 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2124,6 +2124,7 @@ int vmbus_device_register(struct hv_device 
> *child_device_obj)
>   hv_debug_add_dev_dir(child_device_obj);
> 
>   child_device_obj->device.dma_mask = _dma_mask;
> + child_device_obj->device.dma_parms = _device_obj->dma_parms;
>   return 0;
> 
>  err_kset_unregister:
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 328bb961c281..4f1793be1fdc 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -21,6 +21,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +
>  #include 
>  #include 
>  #include 
> @@ -1312,6 +1314,9 @@ static void storvsc_on_channel_callback(void *context)
>   continue;
>   }
>   request = (struct storvsc_cmd_request 
> *)scsi_cmd_priv(scmnd);
> + if (scsi_sg_count(scmnd))
> + dma_unmap_sg(>device, 
> scsi_sglist(scmnd),
> +  scsi_sg_count(scmnd), 
> scmnd->sc_data_direction);

Use scsi_dma_unmap(), which does exactly what you have written
above. :-)

>   }
> 
>   storvsc_on_receive(stor_device, packet, request);
> @@ -1725,7 +1730,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
> struct scsi_cmnd *scmnd)
>   struct hv_host_device *host_dev = shost_priv(host);
>   struct hv_device *dev = host_dev->dev;
>   struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
> - int i;
>   struct scatterlist *sgl;
>   unsigned int sg_count;
>   struct vmscsi_request *vm_srb;
> @@ -1807,10 +1811,11 @@ static int storvsc_queuecommand(struct Scsi_Host 
> *host, struct scsi_cmnd *scmnd)
>   payload_sz = sizeof(cmd_request->mpb);
> 
>   if (sg_count) {
> - unsigned int hvpgoff, hvpfns_to_add;
>   unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
>   unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
> - u64 hvpfn;
> + struct scatterlist *sg;
> + unsigned long hvpfn, hvpfns_to_add;
> + int j, i = 0;
> 
>   if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> 
> @@ -1824,31 +1829,16 @@ static int storvsc_queuecommand(struct Scsi_Host 
> *host, struct scsi_cmnd *scmnd)
>   payload->range.len = length;
>   payload->range.offset = offset_in_hvpg;
> 
> + if (dma_map_sg(>device, sgl, sg_count,
> + scmnd->sc_data_direction) == 0)
> + return SCSI_MLQUEUE_DEVICE_BUSY;
> 
> - for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
> - /*
> -  * Init values for the current sgl entry. hvpgoff
> -  * and hvpfns_to_add are in units of Hyper-V size
> -  * pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
> -  * case also handles values of sgl->offset that are
> -  * larger than PAGE_SIZE. Such offsets are handled
> -  * even on other than the first sgl entry, provided
> -  * they are a multiple of PAGE_SIZE.
> -  */

Any reason not to keep this comment?  It's still correct and
mentions important cases that must be handled.

> - hvpgoff = HVPFN_DOWN(sgl->offset);
> - hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
> - hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
> - 

RE: [PATCH V4 11/13] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> hyperv Isolation VM requires bounce buffer support to copy
> data from/to encrypted memory and so enable swiotlb force
> mode to use swiotlb bounce buffer for DMA transaction.
> 
> In Isolation VM with AMD SEV, the bounce buffer needs to be
> accessed via extra address space which is above shared_gpa_boundary
> (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
> The access physical address will be original physical address +
> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
> spec is called virtual top of memory(vTOM). Memory addresses below
> vTOM are automatically treated as private while memory above
> vTOM is treated as shared.
> 
> Swiotlb bounce buffer code calls dma_map_decrypted()
> to mark bounce buffer visible to host and map it in extra
> address space. Populate dma memory decrypted ops with hv
> map/unmap function.
> 
> Hyper-V initalizes swiotlb bounce buffer and default swiotlb
> needs to be disabled. pci_swiotlb_detect_override() and
> pci_swiotlb_detect_4gb() enable the default one. To override
> the setting, hyperv_swiotlb_detect() needs to run before
> these detect functions which depends on the pci_xen_swiotlb_
> init(). Make pci_xen_swiotlb_init() depends on the hyperv_swiotlb
> _detect() to keep the order.
> 
> The map function vmap_pfn() can't work in the early place
> hyperv_iommu_swiotlb_init() and so initialize swiotlb bounce
> buffer in the hyperv_iommu_swiotlb_later_init().
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>* Get hyperv bounce bufffer size via default swiotlb
>bounce buffer size function and keep default size as
>same as the one in the AMD SEV VM.
> ---
>  arch/x86/hyperv/ivm.c   | 28 +++
>  arch/x86/include/asm/mshyperv.h |  2 ++
>  arch/x86/mm/mem_encrypt.c   |  3 +-
>  arch/x86/xen/pci-swiotlb-xen.c  |  3 +-
>  drivers/hv/vmbus_drv.c  |  3 ++
>  drivers/iommu/hyperv-iommu.c| 61 +
>  include/linux/hyperv.h  |  1 +
>  7 files changed, 99 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index e761c67e2218..84563b3c9f3a 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -294,3 +294,31 @@ int hv_set_mem_host_visibility(unsigned long addr, int 
> numpages, bool visible)
> 
>   return __hv_set_mem_host_visibility((void *)addr, numpages, visibility);
>  }
> +
> +/*
> + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
> + */
> +void *hv_map_memory(void *addr, unsigned long size)
> +{
> + unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
> +   sizeof(unsigned long), GFP_KERNEL);

Should be PAGE_SIZE, not HV_HYP_PAGE_SIZE, since this code
only manipulates guest page tables.  There's no communication with
Hyper-V that requires HV_HYP_PAGE_SIZE.

> + void *vaddr;
> + int i;
> +
> + if (!pfns)
> + return NULL;
> +
> + for (i = 0; i < size / PAGE_SIZE; i++)
> + pfns[i] = virt_to_hvpfn(addr + i * PAGE_SIZE) +

Use virt_to_pfn(), not virt_to_hvpfn(), for the same reason.

> + (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
> +
> + vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
> + kfree(pfns);
> +
> + return vaddr;
> +}
> +
> +void hv_unmap_memory(void *addr)
> +{
> + vunmap(addr);
> +}
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index b77f4caee3ee..627fcf8d443c 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -252,6 +252,8 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct 
> hv_interrupt_entry *entry);
>  int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
>  enum hv_mem_host_visibility visibility);
>  int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool 
> visible);
> +void *hv_map_memory(void *addr, unsigned long size);
> +void hv_unmap_memory(void *addr);
>  void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
>  void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
>  void hv_signal_eom_ghcb(void);
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index ff08dc463634..e2db0b8ed938 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "mm_internal.h"
> 
> @@ -202,7 +203,7 @@ void __init sev_setup_arch(void)
>   phys_addr_t total_mem = memblock_phys_mem_size();
>   unsigned long size;
> 
> - if (!sev_active())
> + if (!sev_active() && !hv_is_isolation_supported())
>   return;
> 
>   /*
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 54f9aa7e8457..43bd031aa332 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ 

RE: [PATCH V4 08/13] hyperv/vmbus: Initialize VMbus ring buffer for Isolation VM

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 

Subject tag should be "Drivers: hv: vmbus: "

> VMbus ring buffer are shared with host and it's need to
> be accessed via extra address space of Isolation VM with
> AMD SNP support. This patch is to map the ring buffer
> address in extra address space via vmap_pfn(). Hyperv set
> memory host visibility hvcall smears data in the ring buffer
> and so reset the ring buffer memory to zero after mapping.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * Remove hv_ringbuffer_post_init(), merge map
>   operation for Isolation VM into hv_ringbuffer_init()
>   * Call hv_ringbuffer_init() after __vmbus_establish_gpadl().
> ---
>  drivers/hv/Kconfig   |  1 +
>  drivers/hv/channel.c | 19 +++---
>  drivers/hv/ring_buffer.c | 56 ++--
>  3 files changed, 54 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index d1123ceb38f3..dd12af20e467 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -8,6 +8,7 @@ config HYPERV
>   || (ARM64 && !CPU_BIG_ENDIAN))
>   select PARAVIRT
>   select X86_HV_CALLBACK_VECTOR if X86
> + select VMAP_PFN
>   help
> Select this option to run Linux as a Hyper-V client operating
> system.
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 82650beb3af0..81f8629e4491 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -679,15 +679,6 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   if (!newchannel->max_pkt_size)
>   newchannel->max_pkt_size = VMBUS_DEFAULT_MAX_PKT_SIZE;
> 
> - err = hv_ringbuffer_init(>outbound, page, send_pages, 0);
> - if (err)
> - goto error_clean_ring;
> -
> - err = hv_ringbuffer_init(>inbound, [send_pages],
> -  recv_pages, newchannel->max_pkt_size);
> - if (err)
> - goto error_clean_ring;
> -
>   /* Establish the gpadl for the ring buffer */
>   newchannel->ringbuffer_gpadlhandle = 0;
> 
> @@ -699,6 +690,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   if (err)
>   goto error_clean_ring;
> 
> + err = hv_ringbuffer_init(>outbound,
> +  page, send_pages, 0);
> + if (err)
> + goto error_free_gpadl;
> +
> + err = hv_ringbuffer_init(>inbound, [send_pages],
> +  recv_pages, newchannel->max_pkt_size);
> + if (err)
> + goto error_free_gpadl;
> +
>   /* Create and init the channel open message */
>   open_info = kzalloc(sizeof(*open_info) +
>  sizeof(struct vmbus_channel_open_channel),
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 2aee356840a2..24d64d18eb65 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -17,6 +17,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> 
>  #include "hyperv_vmbus.h"
> 
> @@ -183,8 +185,10 @@ void hv_ringbuffer_pre_init(struct vmbus_channel 
> *channel)
>  int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>  struct page *pages, u32 page_cnt, u32 max_pkt_size)
>  {
> - int i;
>   struct page **pages_wraparound;
> + unsigned long *pfns_wraparound;
> + u64 pfn;
> + int i;
> 
>   BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
> 
> @@ -192,23 +196,49 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info 
> *ring_info,
>* First page holds struct hv_ring_buffer, do wraparound mapping for
>* the rest.
>*/
> - pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
> -GFP_KERNEL);
> - if (!pages_wraparound)
> - return -ENOMEM;
> + if (hv_isolation_type_snp()) {
> + pfn = page_to_pfn(pages) +
> + HVPFN_DOWN(ms_hyperv.shared_gpa_boundary);

Use PFN_DOWN, not HVPFN_DOWN.  This is all done in units of guest page
size, not Hyper-V page size.

> 
> - pages_wraparound[0] = pages;
> - for (i = 0; i < 2 * (page_cnt - 1); i++)
> - pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
> + pfns_wraparound = kcalloc(page_cnt * 2 - 1,
> + sizeof(unsigned long), GFP_KERNEL);
> + if (!pfns_wraparound)
> + return -ENOMEM;
> 
> - ring_info->ring_buffer = (struct hv_ring_buffer *)
> - vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
> + pfns_wraparound[0] = pfn;
> + for (i = 0; i < 2 * (page_cnt - 1); i++)
> + pfns_wraparound[i + 1] = pfn + i % (page_cnt - 1) + 1;
> 
> - kfree(pages_wraparound);
> + ring_info->ring_buffer = (struct hv_ring_buffer *)
> + vmap_pfn(pfns_wraparound, page_cnt * 2 - 1,
> +  

RE: [PATCH V4 07/13] hyperv/Vmbus: Add SNP support for VMbus channel initiate message

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 

Subject line tag should be "Drivers: hv: vmbus:"

> The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared
> with host in Isolation VM and so it's necessary to use hvcall to set
> them visible to host. In Isolation VM with AMD SEV SNP, the access
> address should be in the extra space which is above shared gpa
> boundary. So remap these pages into the extra address(pa +
> shared_gpa_boundary).
> 
> Introduce monitor_pages_original[] in the struct vmbus_connection
> to store monitor page virtual address returned by hv_alloc_hyperv_
> zeroed_page() and free monitor page via monitor_pages_original in
> the vmbus_disconnect(). The monitor_pages[] is to used to access
> monitor page and it is initialized to be equal with monitor_pages_
> original. The monitor_pages[] will be overridden in the isolation VM
> with va of extra address.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * Rename monitor_pages_va with monitor_pages_original
>   * free monitor page via monitor_pages_original and
> monitor_pages is used to access monitor page.
> 
> Change since v1:
> * Not remap monitor pages in the non-SNP isolation VM.
> ---
>  drivers/hv/connection.c   | 75 ---
>  drivers/hv/hyperv_vmbus.h |  1 +
>  2 files changed, 72 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 6d315c1465e0..9a48d8115c87 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include "hyperv_vmbus.h"
> @@ -104,6 +105,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
> *msginfo, u32 version)
> 
>   msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
>   msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
> +
> + if (hv_isolation_type_snp()) {
> + msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
> + msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
> + }
> +
>   msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
> 
>   /*
> @@ -148,6 +155,35 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
> *msginfo, u32 version)
>   return -ECONNREFUSED;
>   }
> 
> +
> + if (hv_is_isolation_supported()) {
> + if (hv_isolation_type_snp()) {
> + vmbus_connection.monitor_pages[0]
> + = memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
> +MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[0])
> + return -ENOMEM;
> +
> + vmbus_connection.monitor_pages[1]
> + = memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
> +MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[1]) {
> + memunmap(vmbus_connection.monitor_pages[0]);
> + return -ENOMEM;
> + }
> + }
> +
> + /*
> +  * Set memory host visibility hvcall smears memory
> +  * and so zero monitor pages here.
> +  */
> + memset(vmbus_connection.monitor_pages[0], 0x00,
> +HV_HYP_PAGE_SIZE);
> + memset(vmbus_connection.monitor_pages[1], 0x00,
> +HV_HYP_PAGE_SIZE);
> +
> + }

I still find it somewhat confusing to have the handling of the
shared_gpa_boundary and memory mapping in the function for
negotiating the VMbus version.  I think the code works as written,
but it would seem cleaner and easier to understand to precompute
the physical addresses and do all the mapping and memory zero'ing
in a single place in vmbus_connect().  Then the negotiate version
function can focus on doing only the version negotiation.

> +
>   return ret;
>  }
> 
> @@ -159,6 +195,7 @@ int vmbus_connect(void)
>   struct vmbus_channel_msginfo *msginfo = NULL;
>   int i, ret = 0;
>   __u32 version;
> + u64 pfn[2];
> 
>   /* Initialize the vmbus connection */
>   vmbus_connection.conn_state = CONNECTING;
> @@ -216,6 +253,21 @@ int vmbus_connect(void)
>   goto cleanup;
>   }
> 
> + vmbus_connection.monitor_pages_original[0]
> + = vmbus_connection.monitor_pages[0];
> + vmbus_connection.monitor_pages_original[1]
> + = vmbus_connection.monitor_pages[1];
> +
> + if (hv_is_isolation_supported()) {
> + pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
> + pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
> + if (hv_mark_gpa_visibility(2, pfn,
> + VMBUS_PAGE_VISIBLE_READ_WRITE)) {

In Patch 4 of 

RE: [PATCH V4 06/13] hyperv: Add ghcb hvcall support for SNP VM

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 

Subject line tag should probably be "x86/hyperv:" since the majority
of the code added is under arch/x86.

> hyperv provides ghcb hvcall to handle VMBus
> HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
> msg in SNP Isolation VM. Add such support.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * Add hv_ghcb_hypercall() stub function to avoid
> compile error for ARM.
> ---
>  arch/x86/hyperv/ivm.c  | 71 ++
>  drivers/hv/connection.c|  6 ++-
>  drivers/hv/hv.c|  8 +++-
>  drivers/hv/hv_common.c |  6 +++
>  include/asm-generic/mshyperv.h |  1 +
>  5 files changed, 90 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index f56fe4f73000..e761c67e2218 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -17,10 +17,81 @@
>  #include 
>  #include 
> 
> +#define GHCB_USAGE_HYPERV_CALL   1
> +
>  union hv_ghcb {
>   struct ghcb ghcb;
> + struct {
> + u64 hypercalldata[509];
> + u64 outputgpa;
> + union {
> + union {
> + struct {
> + u32 callcode: 16;
> + u32 isfast  : 1;
> + u32 reserved1   : 14;
> + u32 isnested: 1;
> + u32 countofelements : 12;
> + u32 reserved2   : 4;
> + u32 repstartindex   : 12;
> + u32 reserved3   : 4;
> + };
> + u64 asuint64;
> + } hypercallinput;
> + union {
> + struct {
> + u16 callstatus;
> + u16 reserved1;
> + u32 elementsprocessed : 12;
> + u32 reserved2 : 20;
> + };
> + u64 asunit64;
> + } hypercalloutput;
> + };
> + u64 reserved2;
> + } hypercall;
>  } __packed __aligned(HV_HYP_PAGE_SIZE);
> 
> +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long flags;
> +
> + if (!hv_ghcb_pg)
> + return -EFAULT;
> +
> + WARN_ON(in_nmi());
> +
> + local_irq_save(flags);
> + ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
> + hv_ghcb = (union hv_ghcb *)*ghcb_base;
> + if (!hv_ghcb) {
> + local_irq_restore(flags);
> + return -EFAULT;
> + }
> +
> + hv_ghcb->ghcb.protocol_version = GHCB_PROTOCOL_MAX;
> + hv_ghcb->ghcb.ghcb_usage = GHCB_USAGE_HYPERV_CALL;
> +
> + hv_ghcb->hypercall.outputgpa = (u64)output;
> + hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
> + hv_ghcb->hypercall.hypercallinput.callcode = control;
> +
> + if (input_size)
> + memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
> +
> + VMGEXIT();
> +
> + hv_ghcb->ghcb.ghcb_usage = 0x;
> + memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
> +sizeof(hv_ghcb->ghcb.save.valid_bitmap));
> +
> + local_irq_restore(flags);
> +
> + return hv_ghcb->hypercall.hypercalloutput.callstatus;

The hypercall.hypercalloutput.callstatus value must be saved
in a local variable *before* the call to local_irq_restore().  Then
the local variable is the return value.  Once local_irq_restore()
is called, the GHCB page could get reused.

> +}
> +
>  void hv_ghcb_msr_write(u64 msr, u64 value)
>  {
>   union hv_ghcb *hv_ghcb;
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 5e479d54918c..6d315c1465e0 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -447,6 +447,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
> 
>   ++channel->sig_events;
> 
> - hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
> + if (hv_isolation_type_snp())
> + hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
> + NULL, sizeof(u64));

Better to use "sizeof(channel->sig_event)" instead of explicitly coding
the type.

> + else
> + hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
>  }
>  EXPORT_SYMBOL_GPL(vmbus_set_event);
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 97b21256a9db..d4531c64d9d3 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -98,7 +98,13 @@ int hv_post_message(union hv_connection_id connection_id,
>   aligned_msg->payload_size = payload_size;
>   

RE: [PATCH V4 04/13] hyperv: Mark vmbus ring buffer visible to host in Isolation VM

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> Mark vmbus ring buffer visible with set_memory_decrypted() when
> establish gpadl handle.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>* Change vmbus_teardown_gpadl() parameter and put gpadl handle,
>buffer and buffer size in the struct vmbus_gpadl.
> ---
>  drivers/hv/channel.c| 36 -
>  drivers/net/hyperv/hyperv_net.h |  1 +
>  drivers/net/hyperv/netvsc.c | 16 +++
>  drivers/uio/uio_hv_generic.c| 14 +++--
>  include/linux/hyperv.h  |  8 +++-
>  5 files changed, 63 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index f3761c73b074..82650beb3af0 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> 
> @@ -474,6 +475,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
> *channel,
>   if (ret)
>   return ret;
> 
> + ret = set_memory_decrypted((unsigned long)kbuffer,
> +HVPFN_UP(size));
> + if (ret) {
> + pr_warn("Failed to set host visibility for new GPADL %d.\n", 
> ret);
> + return ret;
> + }
> +
>   init_completion(&msginfo->waitevent);
>   msginfo->waiting_channel = channel;
> 
> @@ -549,6 +557,11 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
> *channel,
>   }
> 
>   kfree(msginfo);
> +
> + if (ret)
> + set_memory_encrypted((unsigned long)kbuffer,
> +  HVPFN_UP(size));
> +
>   return ret;
>  }
> 
> @@ -639,6 +652,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   struct vmbus_channel_open_channel *open_msg;
>   struct vmbus_channel_msginfo *open_info = NULL;
>   struct page *page = newchannel->ringbuffer_page;
> + struct vmbus_gpadl gpadl;
>   u32 send_pages, recv_pages;
>   unsigned long flags;
>   int err;
> @@ -759,7 +773,10 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  error_free_info:
>   kfree(open_info);
>  error_free_gpadl:
> - vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle);
> + gpadl.gpadl_handle = newchannel->ringbuffer_gpadlhandle;
> + gpadl.buffer = page_address(newchannel->ringbuffer_page);
> + gpadl.size = (send_pages + recv_pages) << PAGE_SHIFT;
> + vmbus_teardown_gpadl(newchannel, &gpadl);
>   newchannel->ringbuffer_gpadlhandle = 0;
>  error_clean_ring:
>   hv_ringbuffer_cleanup(&newchannel->outbound);
> @@ -806,7 +823,7 @@ EXPORT_SYMBOL_GPL(vmbus_open);
>  /*
>   * vmbus_teardown_gpadl -Teardown the specified GPADL handle
>   */
> -int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
> +int vmbus_teardown_gpadl(struct vmbus_channel *channel, struct vmbus_gpadl 
> *gpadl)
>  {
>   struct vmbus_channel_gpadl_teardown *msg;
>   struct vmbus_channel_msginfo *info;
> @@ -825,7 +842,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, 
> u32 gpadl_handle)
> 
>   msg->header.msgtype = CHANNELMSG_GPADL_TEARDOWN;
>   msg->child_relid = channel->offermsg.child_relid;
> - msg->gpadl = gpadl_handle;
> + msg->gpadl = gpadl->gpadl_handle;
> 
>   spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
>   list_add_tail(&info->msglistentry,
> @@ -859,6 +876,12 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, 
> u32 gpadl_handle)
>   spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
> 
>   kfree(info);
> +
> + ret = set_memory_encrypted((unsigned long)gpadl->buffer,
> +HVPFN_UP(gpadl->size));
> + if (ret)
> + pr_warn("Fail to set mem host visibility in GPADL teardown 
> %d.\n", ret);
> +
>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
> @@ -896,6 +919,7 @@ void vmbus_reset_channel_cb(struct vmbus_channel *channel)
>  static int vmbus_close_internal(struct vmbus_channel *channel)
>  {
>   struct vmbus_channel_close_channel *msg;
> + struct vmbus_gpadl gpadl;
>   int ret;
> 
>   vmbus_reset_channel_cb(channel);
> @@ -934,8 +958,10 @@ static int vmbus_close_internal(struct vmbus_channel 
> *channel)
> 
>   /* Tear down the gpadl for the channel's ring buffer */
>   else if (channel->ringbuffer_gpadlhandle) {
> - ret = vmbus_teardown_gpadl(channel,
> -channel->ringbuffer_gpadlhandle);
> + gpadl.gpadl_handle = channel->ringbuffer_gpadlhandle;
> + gpadl.buffer = page_address(channel->ringbuffer_page);
> + gpadl.size = channel->ringbuffer_pagecount;
> + ret = vmbus_teardown_gpadl(channel, &gpadl);
>   if (ret) {
>   pr_err("Close failed: teardown gpadl return %d\n", ret);
>   /*
> diff --git 

RE: [PATCH V4 03/13] x86/hyperv: Add new hvcall guest address host visibility support

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> Add new hvcall guest address host visibility support to mark
> memory visible to host. Call it inside set_memory_decrypted
> /encrypted(). Add HYPERVISOR feature check in the
> hv_is_isolation_supported() to optimize in non-virtualization
> environment.
> 
> Acked-by: Dave Hansen 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * Fix error code handle in the __hv_set_mem_host_visibility().
>   * Move HvCallModifySparseGpaPageHostVisibility near to enum
> hv_mem_host_visibility.
> 
> Change since v2:
>* Rework __set_memory_enc_dec() and call Hyper-V and AMD function
>  according to platform check.
> 
> Change since v1:
>* Use new staic call x86_set_memory_enc to avoid add Hyper-V
>  specific check in the set_memory code.
> ---
>  arch/x86/hyperv/Makefile   |   2 +-
>  arch/x86/hyperv/hv_init.c  |   6 ++
>  arch/x86/hyperv/ivm.c  | 113 +
>  arch/x86/include/asm/hyperv-tlfs.h |  17 +
>  arch/x86/include/asm/mshyperv.h|   4 +-
>  arch/x86/mm/pat/set_memory.c   |  19 +++--
>  include/asm-generic/hyperv-tlfs.h  |   1 +
>  include/asm-generic/mshyperv.h |   1 +
>  8 files changed, 156 insertions(+), 7 deletions(-)
>  create mode 100644 arch/x86/hyperv/ivm.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 48e2c51464e8..5d2de10809ae 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y:= hv_init.o mmu.o nested.o irqdomain.o
> +obj-y:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
>  obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o
> 
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index eba10ed4f73e..b1aa42f60faa 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -603,6 +603,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> 
>  bool hv_is_isolation_supported(void)
>  {
> + if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> + return 0;

Use "return false" per previous comment from Wei Liu.

> +
> + if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> + return 0;

Use "return false".

> +
>   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> new file mode 100644
> index ..a069c788ce3c
> --- /dev/null
> +++ b/arch/x86/hyperv/ivm.c
> @@ -0,0 +1,113 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Hyper-V Isolation VM interface with paravisor and hypervisor
> + *
> + * Author:
> + *  Tianyu Lan 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
> + *
> + * In Isolation VM, all guest memory is encripted from host and guest

s/encripted/encrypted/

> + * needs to set memory visible to host via hvcall before sharing memory
> + * with host.
> + */
> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> +enum hv_mem_host_visibility visibility)
> +{
> + struct hv_gpa_range_for_visibility **input_pcpu, *input;
> + u16 pages_processed;
> + u64 hv_status;
> + unsigned long flags;
> +
> + /* no-op if partition isolation is not enabled */
> + if (!hv_is_isolation_supported())
> + return 0;
> +
> + if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> + pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> + HV_MAX_MODIFY_GPA_REP_COUNT);
> + return -EINVAL;
> + }
> +
> + local_irq_save(flags);
> + input_pcpu = (struct hv_gpa_range_for_visibility **)
> + this_cpu_ptr(hyperv_pcpu_input_arg);
> + input = *input_pcpu;
> + if (unlikely(!input)) {
> + local_irq_restore(flags);
> + return -EINVAL;
> + }
> +
> + input->partition_id = HV_PARTITION_ID_SELF;
> + input->host_visibility = visibility;
> + input->reserved0 = 0;
> + input->reserved1 = 0;
> + memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
> + hv_status = hv_do_rep_hypercall(
> + HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> + 0, input, &pages_processed);
> + local_irq_restore(flags);
> +
> + if (hv_result_success(hv_status))
> + return 0;
> + else
> + return -EFAULT;
> +}
> +EXPORT_SYMBOL(hv_mark_gpa_visibility);

In later comments on Patch 7 of this series, I have suggested that
code in that patch should not call hv_mark_gpa_visibility() directly,
but instead should call set_memory_encrypted() and
set_memory_decrypted().  I'm thinking that those functions should
be the standard way to change the visibility of pages in the 

RE: [PATCH V4 02/13] x86/hyperv: Initialize shared memory boundary in the Isolation VM.

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> Hyper-V exposes shared memory boundary via cpuid
> HYPERV_CPUID_ISOLATION_CONFIG and store it in the
> shared_gpa_boundary of ms_hyperv struct. This prepares
> to share memory with host for SNP guest.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * user BIT_ULL to get shared_gpa_boundary
>   * Rename field Reserved* to reserved
> ---
>  arch/x86/kernel/cpu/mshyperv.c |  2 ++
>  include/asm-generic/mshyperv.h | 12 +++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 20557a9d6e25..8bb001198316 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -313,6 +313,8 @@ static void __init ms_hyperv_init_platform(void)
>   if (ms_hyperv.priv_high & HV_ISOLATION) {
>   ms_hyperv.isolation_config_a = 
> cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
>   ms_hyperv.isolation_config_b = 
> cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
> + ms_hyperv.shared_gpa_boundary =
> + BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
> 
>   pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
> 0x%x\n",
>   ms_hyperv.isolation_config_a, 
> ms_hyperv.isolation_config_b);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 0924bbd8458e..7537ae1db828 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -35,7 +35,17 @@ struct ms_hyperv_info {
>   u32 max_vp_index;
>   u32 max_lp_index;
>   u32 isolation_config_a;
> - u32 isolation_config_b;
> + union {
> + u32 isolation_config_b;
> + struct {
> + u32 cvm_type : 4;
> + u32 reserved11 : 1;
> + u32 shared_gpa_boundary_active : 1;
> + u32 shared_gpa_boundary_bits : 6;
> + u32 reserved12 : 20;

I'm still curious about the "11" and "12" in the reserved
field names.  Why not just "reserved1" and "reserved2"?
Having the "11" and "12" isn't wrong, but it makes one
wonder why since it's not usual. :-)

> + };
> + };
> + u64 shared_gpa_boundary;
>  };
>  extern struct ms_hyperv_info ms_hyperv;
> 
> --
> 2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V4 01/13] x86/hyperv: Initialize GHCB page in Isolation VM

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> Hyperv exposes GHCB page via SEV ES GHCB MSR for SNP guest
> to communicate with hypervisor. Map GHCB page for all
> cpus to read/write MSR register and submit hvcall request
> via ghcb page.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Chagne since v3:
> * Rename ghcb_base to hv_ghcb_pg and move it out of
> struct ms_hyperv_info.
>   * Allocate hv_ghcb_pg before cpuhp_setup_state() and leverage
> hv_cpu_init() to initialize ghcb page.
> ---
>  arch/x86/hyperv/hv_init.c   | 68 +
>  arch/x86/include/asm/mshyperv.h |  4 ++
>  arch/x86/kernel/cpu/mshyperv.c  |  3 ++
>  include/asm-generic/mshyperv.h  |  1 +
>  4 files changed, 69 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 708a2712a516..eba10ed4f73e 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -36,12 +37,42 @@ EXPORT_SYMBOL_GPL(hv_current_partition_id);
>  void *hv_hypercall_pg;
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> 
> +void __percpu **hv_ghcb_pg;
> +
>  /* Storage to save the hypercall page temporarily for hibernation */
>  static void *hv_hypercall_pg_saved;
> 
>  struct hv_vp_assist_page **hv_vp_assist_page;
>  EXPORT_SYMBOL_GPL(hv_vp_assist_page);
> 
> +static int hyperv_init_ghcb(void)
> +{
> + u64 ghcb_gpa;
> + void *ghcb_va;
> + void **ghcb_base;
> +
> + if (!hv_isolation_type_snp())
> + return 0;
> +
> + if (!hv_ghcb_pg)
> + return -EINVAL;
> +
> + /*
> +  * GHCB page is allocated by paravisor. The address
> +  * returned by MSR_AMD64_SEV_ES_GHCB is above shared
> +  * ghcb boundary and map it here.

I'm not sure what the "shared ghcb boundary" is.  Did you
mean "shared_gpa_boundary"?

> +  */
> + rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
> + ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
> + if (!ghcb_va)
> + return -ENOMEM;
> +
> + ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
> + *ghcb_base = ghcb_va;
> +
> + return 0;
> +}
> +
>  static int hv_cpu_init(unsigned int cpu)
>  {
>   union hv_vp_assist_msr_contents msr = { 0 };
> @@ -85,7 +116,7 @@ static int hv_cpu_init(unsigned int cpu)
>   }
>   }
> 
> - return 0;
> + return hyperv_init_ghcb();
>  }
> 
>  static void (*hv_reenlightenment_cb)(void);
> @@ -177,6 +208,14 @@ static int hv_cpu_die(unsigned int cpu)
>  {
>   struct hv_reenlightenment_control re_ctrl;
>   unsigned int new_cpu;
> + void **ghcb_va;
> +
> + if (hv_ghcb_pg) {
> + ghcb_va = (void **)this_cpu_ptr(hv_ghcb_pg);
> + if (*ghcb_va)
> + memunmap(*ghcb_va);
> + *ghcb_va = NULL;
> + }
> 
>   hv_common_cpu_die(cpu);
> 
> @@ -366,10 +405,16 @@ void __init hyperv_init(void)
>   goto common_free;
>   }
> 
> + if (hv_isolation_type_snp()) {
> + hv_ghcb_pg = alloc_percpu(void *);
> + if (!hv_ghcb_pg)
> + goto free_vp_assist_page;
> + }
> +
>   cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
> hv_cpu_init, hv_cpu_die);
>   if (cpuhp < 0)
> - goto free_vp_assist_page;
> + goto free_ghcb_page;
> 
>   /*
>* Setup the hypercall page and enable hypercalls.
> @@ -383,10 +428,8 @@ void __init hyperv_init(void)
>   VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
>   __builtin_return_address(0));
> - if (hv_hypercall_pg == NULL) {
> - wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> - goto remove_cpuhp_state;
> - }
> + if (hv_hypercall_pg == NULL)
> + goto clean_guest_os_id;
> 
>   rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>   hypercall_msr.enable = 1;
> @@ -456,8 +499,11 @@ void __init hyperv_init(void)
>   hv_query_ext_cap(0);
>   return;
> 
> -remove_cpuhp_state:
> +clean_guest_os_id:
> + wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>   cpuhp_remove_state(cpuhp);
> +free_ghcb_page:
> + free_percpu(hv_ghcb_pg);
>  free_vp_assist_page:
>   kfree(hv_vp_assist_page);
>   hv_vp_assist_page = NULL;
> @@ -559,3 +605,11 @@ bool hv_is_isolation_supported(void)
>  {
>   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
> +
> +DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> +
> +bool hv_isolation_type_snp(void)
> +{
> + return static_branch_unlikely(&isolation_type_snp);
> +}
> +EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index adccbc209169..37739a277ac6 100644
> --- 

RE: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

2021-08-31 Thread Michael Kelley via iommu
From: Christoph Hellwig  Sent: Monday, August 30, 2021 5:01 AM
> 
> Sorry for the delayed answer, but I look at the vmap_pfn usage in the
> previous version and tried to come up with a better version.  This
> mostly untested branch:
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hyperv-vmap
> 
> get us there for swiotlb and the channel infrastructure  I've started
> looking at the network driver and didn't get anywhere due to other work.
> 
> As far as I can tell the network driver does gigantic multi-megabyte
> vmalloc allocation for the send and receive buffers, which are then
> passed to the hardware, but always copied to/from when interacting
> with the networking stack.  Did I see that right?  Are these big
> buffers actually required unlike the normal buffer management schemes
> in other Linux network drivers?
> 
> If so I suspect the best way to allocate them is by not using vmalloc
> but just discontiguous pages, and then use kmap_local_pfn where the
> PFN includes the share_gpa offset when actually copying from/to the
> skbs.

As a quick overview, I think there are four places where the
shared_gpa_boundary must be applied to adjust the guest physical
address that is used.  Each requires mapping a corresponding
virtual address range.  Here are the four places:

1)  The so-called "monitor pages" that are a core communication
mechanism between the guest and Hyper-V.  These are two single
pages, and the mapping is handled by calling memremap() for
each of the two pages.  See Patch 7 of Tianyu's series.

2)  The VMbus channel ring buffers.  You have proposed using
your new  vmap_phys_range() helper, but I don't think that works
here.  More details below.

3)  The network driver send and receive buffers.  vmap_phys_range()
should work here.

4) The swiotlb memory used for bounce buffers.  vmap_phys_range()
should work here as well.

Case #2 above does unusual mapping.  The ring buffer consists of a ring
buffer header page, followed by one or more pages that are the actual
ring buffer.  The pages making up the actual ring buffer are mapped
twice in succession.  For example, if the ring buffer has 4 pages
(one header page and three ring buffer pages), the contiguous
virtual mapping must cover these seven pages:  0, 1, 2, 3, 1, 2, 3.
The duplicate contiguous mapping allows the code that is reading
or writing the actual ring buffer to not be concerned about wrap-around
because writing off the end of the ring buffer is automatically
wrapped-around by the mapping.  The amount of data read or
written in one batch never exceeds the size of the ring buffer, and
after a batch is read or written, the read or write indices are adjusted
to put them back into the range of the first mapping of the actual
ring buffer pages.  So there's method to the madness, and the
technique works pretty well.  But this kind of mapping is not
amenable to using vmap_phys_range().

Michael


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver

2021-08-20 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 20, 2021 11:04 AM
> 
> On 8/21/2021 12:08 AM, Michael Kelley wrote:
>   }
> >>> The whole approach here is to do dma remapping on each individual page
> >>> of the I/O buffer.  But wouldn't it be possible to use dma_map_sg() to map
> >>> each scatterlist entry as a unit?  Each scatterlist entry describes a 
> >>> range of
> >>> physically contiguous memory.  After dma_map_sg(), the resulting dma
> >>> address must also refer to a physically contiguous range in the swiotlb
> >>> bounce buffer memory.   So at the top of the "for" loop over the 
> >>> scatterlist
> >>> entries, do dma_map_sg() if we're in an isolated VM.  Then compute the
> >>> hvpfn value based on the dma address instead of sg_page().  But everything
> >>> else is the same, and the inner loop for populating the pfn_arry is 
> >>> unmodified.
> >>> Furthermore, the dma_range array that you've added is not needed, since
> >>> scatterlist entries already have a dma_address field for saving the mapped
> >>> address, and dma_unmap_sg() uses that field.
> >> I don't use dma_map_sg() here in order to avoid introducing one more
> >> loop(e,g dma_map_sg()). We already have a loop to populate
> >> cmd_request->dma_range[] and so do the dma map in the same loop.
> >>
> > I'm not seeing where the additional loop comes from.  Storvsc
> > already has a loop through the sgl entries.  Retain that loop and call
> > dma_map_sg() with nents set to 1.  Then the sequence is
> > dma_map_sg() --> dma_map_sg_attrs() --> dma_direct_map_sg() ->
> > dma_direct_map_page().  The latter function will call swiotlb_map()
> > to map all pages of the sgl entry as a single operation.
> 
> After dma_map_sg(), we still need to go through scatter list again to
> populate payload->rrange.pfn_array. We may just go through the scatter
> list just once if dma_map_sg() accepts a callback and run it during go
> through scatter list.

Here's some code for what I'm suggesting (not even compile tested).
The only change is what's in the "if" clause of the SNP test.  dma_map_sg()
is called with the nents parameter set to one so that it only
processes one sgl entry each time it is called, and doesn't walk the
entire sgl.  Arguably, we don't even need the SNP test and the else
clause -- just always do what's in the if clause.

The corresponding code in storvsc_on_channel_callback would also
have to be changed.   And we still have to set the min_align_mask
so swiotlb will preserve any offset.  Storsvsc already has things set up
so that higher levels ensure there are no holes between sgl entries,
and that needs to stay true.

if (sg_count) {
unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
u64 hvpfn;
int nents;

if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {

payload_sz = (hvpg_count * sizeof(u64) +
  sizeof(struct vmbus_packet_mpb_array));
payload = kzalloc(payload_sz, GFP_ATOMIC);
if (!payload)
return SCSI_MLQUEUE_DEVICE_BUSY;
}

payload->range.len = length;
payload->range.offset = offset_in_hvpg;


for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
/*
 * Init values for the current sgl entry. hvpgoff
 * and hvpfns_to_add are in units of Hyper-V size
 * pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
 * case also handles values of sgl->offset that are
 * larger than PAGE_SIZE. Such offsets are handled
 * even on other than the first sgl entry, provided
 * they are a multiple of PAGE_SIZE.
 */
hvpgoff = HVPFN_DOWN(sgl->offset);

if (hv_isolation_type_snp()) {
nents = dma_map_sg(dev->device, sgl, 1, 
scmnd->sc_data_direction);
if (nents != 1)

hvpfn = HVPFN_DOWN(sg_dma_address(sgl)) + 
hvpgoff;
} else {
hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
}

hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
hvpgoff;

/*
 * Fill the next portion of the PFN array with
 * sequential Hyper-V PFNs for the contiguous physical
 * memory described by the sgl entry. The end of the
 * last 

RE: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver

2021-08-20 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 20, 2021 8:20 AM
> 
> On 8/20/2021 2:17 AM, Michael Kelley wrote:
> > From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> >
> > I'm not clear on why payload->range.offset needs to be set again.
> > Even after the dma mapping is done, doesn't the offset in the first
> > page have to be the same?  If it wasn't the same, Hyper-V wouldn't
> > be able to process the PFN list correctly.  In fact, couldn't the above
> > code just always set offset_in_hvpg = 0?
> 
> The offset will be changed. The swiotlb bounce buffer is allocated with
> IO_TLB_SIZE(2K) as unit. So the offset here may be changed.
> 

We need to prevent the offset from changing.  The storvsc driver passes
just a PFN list to Hyper-V, plus an overall starting offset and length.  Unlike
the netvsc driver, each entry in the PFN list does *not* have its own offset
and length.  Hyper-V assumes that the list is "dense" and that there are
no holes (i.e., unused memory areas).

For example, consider an original buffer passed into storvsc_queuecommand()
of 8 Kbytes, but aligned with 1 Kbytes at the end of the first page, then
4 Kbytes in the second page, and 3 Kbytes in the beginning of the third page.
The offset of that first 1 Kbytes has to remain as 3 Kbytes.  If bounce 
buffering
moves it to a different offset, there's no way to tell Hyper-V to ignore the
remaining bytes in the first page (at least not without using a different
method to communicate with Hyper-V).   In such a case, the wrong
data will get transferred.  Presumably the easier solution is to set the
min_align_mask field as Christoph suggested.

> 
> >
> >>}
> >
> > The whole approach here is to do dma remapping on each individual page
> > of the I/O buffer.  But wouldn't it be possible to use dma_map_sg() to map
> > each scatterlist entry as a unit?  Each scatterlist entry describes a range 
> > of
> > physically contiguous memory.  After dma_map_sg(), the resulting dma
> > address must also refer to a physically contiguous range in the swiotlb
> > bounce buffer memory.   So at the top of the "for" loop over the scatterlist
> > entries, do dma_map_sg() if we're in an isolated VM.  Then compute the
> > hvpfn value based on the dma address instead of sg_page().  But everything
> > else is the same, and the inner loop for populating the pfn_arry is 
> > unmodified.
> > Furthermore, the dma_range array that you've added is not needed, since
> > scatterlist entries already have a dma_address field for saving the mapped
> > address, and dma_unmap_sg() uses that field.
> 
> I don't use dma_map_sg() here in order to avoid introducing one more
> loop(e,g dma_map_sg()). We already have a loop to populate
> cmd_request->dma_range[] and so do the dma map in the same loop.
> 

I'm not seeing where the additional loop comes from.  Storvsc
already has a loop through the sgl entries.  Retain that loop and call
dma_map_sg() with nents set to 1.  Then the sequence is
dma_map_sg() --> dma_map_sg_attrs() --> dma_direct_map_sg() ->
dma_direct_map_page().  The latter function will call swiotlb_map()
to map all pages of the sgl entry as a single operation.

Michael


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver

2021-08-20 Thread Michael Kelley via iommu
From: h...@lst.de  Sent: Thursday, August 19, 2021 9:33 PM
> 
> On Thu, Aug 19, 2021 at 06:17:40PM +, Michael Kelley wrote:
> > >
> > > @@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host 
> > > *host, struct scsi_cmnd *scmnd)
> > >   payload->range.len = length;
> > >   payload->range.offset = offset_in_hvpg;
> > >
> > > + cmd_request->dma_range = kcalloc(hvpg_count,
> > > +  sizeof(*cmd_request->dma_range),
> > > +  GFP_ATOMIC);
> >
> > With this patch, it appears that storvsc_queuecommand() is always
> > doing bounce buffering, even when running in a non-isolated VM.
> > The dma_range is always allocated, and the inner loop below does
> > the dma mapping for every I/O page.  The corresponding code in
> > storvsc_on_channel_callback() that does the dma unmap allows for
> > the dma_range to be NULL, but that never happens.
> 
> Maybe I'm missing something in the hyperv code, but I don't think
> dma_map_page would bounce buffer for the non-isolated case.  It
> will just return the physical address.

OK, right.  In the isolated VM case, the swiotlb is in force mode
and will do bounce buffering.  In the non-isolated case,
dma_map_page_attrs() -> dma_direct_map_page() does a lot of
checking but eventually just returns the physical address.  As this
patch is currently coded, it adds a fair amount of overhead
here in storvsc_queuecommand(), plus the overhead of the dma
mapping function deciding to use the identity mapping.  But if
dma_map_sg() is used and the code is simplified a bit, the overhead
will be less in general and will be per sgl entry instead of per page.

> 
> > > + if (offset_in_hvpg) {
> > > + payload->range.offset = dma & 
> > > ~HV_HYP_PAGE_MASK;
> > > + offset_in_hvpg = 0;
> > > + }
> >
> > I'm not clear on why payload->range.offset needs to be set again.
> > Even after the dma mapping is done, doesn't the offset in the first
> > page have to be the same?  If it wasn't the same, Hyper-V wouldn't
> > be able to process the PFN list correctly.  In fact, couldn't the above
> > code just always set offset_in_hvpg = 0?
> 
> Careful.  DMA mapping is supposed to keep the offset in the page, but
> for that the DMA mapping code needs to know what the device considers a
> "page".  For that the driver needs to set the min_align_mask field in
> struct device_dma_parameters.
> 

I see that the swiotlb code gets and uses the min_align_mask field.  But
the NVME driver is the only driver that ever sets it, so the value is zero
in all other cases.  Does swiotlb just use PAGE_SIZE in that case?  I
couldn't tell from a quick glance at the swiotlb code.

> >
> > The whole approach here is to do dma remapping on each individual page
> > of the I/O buffer.  But wouldn't it be possible to use dma_map_sg() to map
> > each scatterlist entry as a unit?  Each scatterlist entry describes a range 
> > of
> > physically contiguous memory.  After dma_map_sg(), the resulting dma
> > address must also refer to a physically contiguous range in the swiotlb
> > bounce buffer memory.   So at the top of the "for" loop over the scatterlist
> > entries, do dma_map_sg() if we're in an isolated VM.  Then compute the
> > hvpfn value based on the dma address instead of sg_page().  But everything
> > else is the same, and the inner loop for populating the pfn_arry is 
> > unmodified.
> > Furthermore, the dma_range array that you've added is not needed, since
> > scatterlist entries already have a dma_address field for saving the mapped
> > address, and dma_unmap_sg() uses that field.
> 
> Yes, I think dma_map_sg is the right thing to use here, probably even
> for the non-isolated case so that we can get the hv drivers out of their
> little corner and into being more like a normal kernel driver.  That
> is, use the scsi_dma_map/scsi_dma_unmap helpers, and then iterate over
> the dma addresses one page at a time using for_each_sg_dma_page.
> 

Doing some broader revisions to the Hyper-V storvsc driver is up next on
my to-do list.  Rather than significantly modifying the non-isolated case in
this patch set, I'd suggest factoring it into my broader revisions.

Michael
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver

2021-08-19 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> 

Subject line tag should be "scsi: storvsc:"

> In Isolation VM, all shared memory with host needs to mark visible
> to host via hvcall. vmbus_establish_gpadl() has already done it for
> storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> mpb_desc() still need to handle. Use DMA API to map/umap these

s/need to handle/needs to be handled/

> memory during sending/receiving packet and Hyper-V DMA ops callback
> will use swiotlb function to allocate bounce buffer and copy data
> from/to bounce buffer.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  drivers/scsi/storvsc_drv.c | 68 +++---
>  1 file changed, 63 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 328bb961c281..78320719bdd8 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -21,6 +21,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -427,6 +429,8 @@ struct storvsc_cmd_request {
>   u32 payload_sz;
> 
>   struct vstor_packet vstor_packet;
> + u32 hvpg_count;

This count is really the number of entries in the dma_range
array, right?  If so, perhaps "dma_range_count" would be
a better name so that it is more tightly associated.

> + struct hv_dma_range *dma_range;
>  };
> 
> 
> @@ -509,6 +513,14 @@ struct storvsc_scan_work {
>   u8 tgt_id;
>  };
> 
> +#define storvsc_dma_map(dev, page, offset, size, dir) \
> + dma_map_page(dev, page, offset, size, dir)
> +
> +#define storvsc_dma_unmap(dev, dma_range, dir)   \
> + dma_unmap_page(dev, dma_range.dma,  \
> +dma_range.mapping_size,  \
> +dir ? DMA_FROM_DEVICE : DMA_TO_DEVICE)
> +

Each of these macros is used only once.  IMHO, they don't
add a lot of value.  Just coding dma_map/unmap_page()
inline would be fine and eliminate these lines of code.

>  static void storvsc_device_scan(struct work_struct *work)
>  {
>   struct storvsc_scan_work *wrk;
> @@ -1260,6 +1272,7 @@ static void storvsc_on_channel_callback(void *context)
>   struct hv_device *device;
>   struct storvsc_device *stor_device;
>   struct Scsi_Host *shost;
> + int i;
> 
>   if (channel->primary_channel != NULL)
>   device = channel->primary_channel->device_obj;
> @@ -1314,6 +1327,15 @@ static void storvsc_on_channel_callback(void *context)
>   request = (struct storvsc_cmd_request 
> *)scsi_cmd_priv(scmnd);
>   }
> 
> + if (request->dma_range) {
> + for (i = 0; i < request->hvpg_count; i++)
> + storvsc_dma_unmap(>device,
> + request->dma_range[i],
> + 
> request->vstor_packet.vm_srb.data_in == READ_TYPE);

I think you can directly get the DMA direction as 
request->cmd->sc_data_direction.

> +
> + kfree(request->dma_range);
> + }
> +
>   storvsc_on_receive(stor_device, packet, request);
>   continue;
>   }
> @@ -1810,7 +1832,9 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
> struct scsi_cmnd *scmnd)
>   unsigned int hvpgoff, hvpfns_to_add;
>   unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
>   unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
> + dma_addr_t dma;
>   u64 hvpfn;
> + u32 size;
> 
>   if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> 
> @@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host 
> *host, struct scsi_cmnd *scmnd)
>   payload->range.len = length;
>   payload->range.offset = offset_in_hvpg;
> 
> + cmd_request->dma_range = kcalloc(hvpg_count,
> +  sizeof(*cmd_request->dma_range),
> +  GFP_ATOMIC);

With this patch, it appears that storvsc_queuecommand() is always
doing bounce buffering, even when running in a non-isolated VM.
The dma_range is always allocated, and the inner loop below does
the dma mapping for every I/O page.  The corresponding code in
storvsc_on_channel_callback() that does the dma unmap allows for
the dma_range to be NULL, but that never happens.

> + if (!cmd_request->dma_range) {
> + ret = -ENOMEM;

The other memory allocation failure in this function returns
SCSI_MLQUEUE_DEVICE_BUSY.   It may be debatable as to whether
that's the best approach, but that's a topic for a different patch.  I
would suggest being consistent and using the same return code
here.

> + goto free_payload;
> + }
> 
>   

RE: [PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver

2021-08-19 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> 

The Subject line tag should be "hv_netvsc:".

> In Isolation VM, all shared memory with host needs to mark visible
> to host via hvcall. vmbus_establish_gpadl() has already done it for
> netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> pagebuffer() still need to handle. Use DMA API to map/umap these
> memory during sending/receiving packet and Hyper-V DMA ops callback
> will use swiotlb function to allocate bounce buffer and copy data
> from/to bounce buffer.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  drivers/net/hyperv/hyperv_net.h   |   6 ++
>  drivers/net/hyperv/netvsc.c   | 144 +-
>  drivers/net/hyperv/rndis_filter.c |   2 +
>  include/linux/hyperv.h|   5 ++
>  4 files changed, 154 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index bc48855dff10..862419912bfb 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -164,6 +164,7 @@ struct hv_netvsc_packet {
>   u32 total_bytes;
>   u32 send_buf_index;
>   u32 total_data_buflen;
> + struct hv_dma_range *dma_range;
>  };
> 
>  #define NETVSC_HASH_KEYLEN 40
> @@ -1074,6 +1075,7 @@ struct netvsc_device {
> 
>   /* Receive buffer allocated by us but manages by NetVSP */
>   void *recv_buf;
> + void *recv_original_buf;
>   u32 recv_buf_size; /* allocated bytes */
>   u32 recv_buf_gpadl_handle;
>   u32 recv_section_cnt;
> @@ -1082,6 +1084,8 @@ struct netvsc_device {
> 
>   /* Send buffer allocated by us */
>   void *send_buf;
> + void *send_original_buf;
> + u32 send_buf_size;
>   u32 send_buf_gpadl_handle;
>   u32 send_section_cnt;
>   u32 send_section_size;
> @@ -1730,4 +1734,6 @@ struct rndis_message {
>  #define RETRY_US_HI  1
>  #define RETRY_MAX2000/* >10 sec */
> 
> +void netvsc_dma_unmap(struct hv_device *hv_dev,
> +   struct hv_netvsc_packet *packet);
>  #endif /* _HYPERV_NET_H */
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 7bd935412853..fc312e5db4d5 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
>   int i;
> 
>   kfree(nvdev->extension);
> - vfree(nvdev->recv_buf);
> - vfree(nvdev->send_buf);
> +
> + if (nvdev->recv_original_buf) {
> + vunmap(nvdev->recv_buf);
> + vfree(nvdev->recv_original_buf);
> + } else {
> + vfree(nvdev->recv_buf);
> + }
> +
> + if (nvdev->send_original_buf) {
> + vunmap(nvdev->send_buf);
> + vfree(nvdev->send_original_buf);
> + } else {
> + vfree(nvdev->send_buf);
> + }
> +
>   kfree(nvdev->send_section_map);
> 
>   for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
> @@ -330,6 +343,27 @@ int netvsc_alloc_recv_comp_ring(struct netvsc_device 
> *net_device, u32 q_idx)
>   return nvchan->mrc.slots ? 0 : -ENOMEM;
>  }
> 
> +static void *netvsc_remap_buf(void *buf, unsigned long size)
> +{
> + unsigned long *pfns;
> + void *vaddr;
> + int i;
> +
> + pfns = kcalloc(size / HV_HYP_PAGE_SIZE, sizeof(unsigned long),
> +GFP_KERNEL);

This assumes that the "size" argument is a multiple of PAGE_SIZE.  I think
that's true in all the use cases, but it would be safer to check.

> + if (!pfns)
> + return NULL;
> +
> + for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
> + pfns[i] = virt_to_hvpfn(buf + i * HV_HYP_PAGE_SIZE)
> + + (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
> +
> + vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
> + kfree(pfns);
> +
> + return vaddr;
> +}

This function appears to be a duplicate of hv_map_memory() in Patch 11 of this
series.  Is it possible to structure things so there is only one 
implementation?  In
any case, see the comment in hv_map_memory() about PAGE_SIZE vs
HV_HYP_PAGE_SIZE and similar.

> +
>  static int netvsc_init_buf(struct hv_device *device,
>  struct netvsc_device *net_device,
>  const struct netvsc_device_info *device_info)
> @@ -340,6 +374,7 @@ static int netvsc_init_buf(struct hv_device *device,
>   unsigned int buf_size;
>   size_t map_words;
>   int i, ret = 0;
> + void *vaddr;
> 
>   /* Get receive buffer area. */
>   buf_size = device_info->recv_sections * device_info->recv_section_size;
> @@ -375,6 +410,15 @@ static int netvsc_init_buf(struct hv_device *device,
>   goto cleanup;
>   }
> 
> + if (hv_isolation_type_snp()) {
> + vaddr = netvsc_remap_buf(net_device->recv_buf, buf_size);
> + if (!vaddr)
> + goto cleanup;
> +
> + 

RE: [PATCH V3 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-08-19 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> 
> Hyper-V Isolation VM requires bounce buffer support to copy
> data from/to encrypted memory and so enable swiotlb force
> mode to use swiotlb bounce buffer for DMA transaction.
> 
> In Isolation VM with AMD SEV, the bounce buffer needs to be
> accessed via extra address space which is above shared_gpa_boundary
> (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
> The access physical address will be original physical address +
> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
> spec is called virtual top of memory(vTOM). Memory addresses below
> vTOM are automatically treated as private while memory above
> vTOM is treated as shared.
> 
> Swiotlb bounce buffer code calls dma_map_decrypted()
> to mark bounce buffer visible to host and map it in extra
> address space. Populate dma memory decrypted ops with hv
> map/unmap function.
> 
> Hyper-V initalizes swiotlb bounce buffer and default swiotlb
> needs to be disabled. pci_swiotlb_detect_override() and
> pci_swiotlb_detect_4gb() enable the default one. To override
> the setting, hyperv_swiotlb_detect() needs to run before
> these detect functions which depends on the pci_xen_swiotlb_
> init(). Make pci_xen_swiotlb_init() depends on the hyperv_swiotlb
> _detect() to keep the order.
> 
> The map function vmap_pfn() can't work in the early place
> hyperv_iommu_swiotlb_init() and so initialize swiotlb bounce
> buffer in the hyperv_iommu_swiotlb_later_init().
> 
> Signed-off-by: Tianyu Lan 
> ---
>  arch/x86/hyperv/ivm.c   | 28 ++
>  arch/x86/include/asm/mshyperv.h |  2 +
>  arch/x86/xen/pci-swiotlb-xen.c  |  3 +-
>  drivers/hv/vmbus_drv.c  |  3 ++
>  drivers/iommu/hyperv-iommu.c| 65 +
>  include/linux/hyperv.h  |  1 +
>  6 files changed, 101 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index c13ec5560d73..0f05e4d6fc62 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -265,3 +265,31 @@ int hv_set_mem_host_visibility(unsigned long addr, int 
> numpages, bool visible)
> 
>   return __hv_set_mem_host_visibility((void *)addr, numpages, visibility);
>  }
> +
> +/*
> + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
> + */
> +void *hv_map_memory(void *addr, unsigned long size)
> +{
> + unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
> +   sizeof(unsigned long), GFP_KERNEL);
> + void *vaddr;
> + int i;
> +
> + if (!pfns)
> + return NULL;
> +
> + for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
> + pfns[i] = virt_to_hvpfn(addr + i * HV_HYP_PAGE_SIZE) +
> + (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
> +
> + vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
> + kfree(pfns);
> +
> + return vaddr;
> +}

This function is manipulating page tables in the guest VM.  It is not involved
in communicating with Hyper-V, or passing PFNs to Hyper-V.  The pfn array
contains guest PFNs, not Hyper-V PFNs.  So it should use PAGE_SIZE
instead of HV_HYP_PAGE_SIZE, and similarly PAGE_SHIFT and virt_to_pfn().
If this code were ever to run on ARM64 in the future with PAGE_SIZE other
than 4 Kbytes, the use of PAGE_SIZE is correct choice.

> +
> +void hv_unmap_memory(void *addr)
> +{
> + vunmap(addr);
> +}
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index a30c60f189a3..b247739f57ac 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -250,6 +250,8 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct 
> hv_interrupt_entry *entry);
>  int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
>  enum hv_mem_host_visibility visibility);
>  int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool 
> visible);
> +void *hv_map_memory(void *addr, unsigned long size);
> +void hv_unmap_memory(void *addr);
>  void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
>  void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
>  void hv_signal_eom_ghcb(void);
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 54f9aa7e8457..43bd031aa332 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -4,6 +4,7 @@
> 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include 
> @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
>  EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
> 
>  IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
> -   NULL,
> +   hyperv_swiotlb_detect,
> pci_xen_swiotlb_init,
> NULL);
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 57bbbaa4e8f7..f068e22a5636 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -23,6 +23,7 @@
>  

RE: [PATCH V3 08/13] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM

2021-08-16 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> 
> VMbus ring buffer are shared with host and it's need to

s/it's need/it needs/

> be accessed via extra address space of Isolation VM with
> SNP support. This patch is to map the ring buffer
> address in extra address space via ioremap(). HV host

It's actually using vmap_pfn(), not ioremap().

> visibility hvcall smears data in the ring buffer and
> so reset the ring buffer memory to zero after calling
> visibility hvcall.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  drivers/hv/Kconfig|  1 +
>  drivers/hv/channel.c  | 10 +
>  drivers/hv/hyperv_vmbus.h |  2 +
>  drivers/hv/ring_buffer.c  | 84 ++-
>  4 files changed, 79 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index d1123ceb38f3..dd12af20e467 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -8,6 +8,7 @@ config HYPERV
>   || (ARM64 && !CPU_BIG_ENDIAN))
>   select PARAVIRT
>   select X86_HV_CALLBACK_VECTOR if X86
> + select VMAP_PFN
>   help
> Select this option to run Linux as a Hyper-V client operating
> system.
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 4c4717c26240..60ef881a700c 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -712,6 +712,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   if (err)
>   goto error_clean_ring;
> 
> + err = hv_ringbuffer_post_init(>outbound,
> +   page, send_pages);
> + if (err)
> + goto error_free_gpadl;
> +
> + err = hv_ringbuffer_post_init(>inbound,
> +   [send_pages], recv_pages);
> + if (err)
> + goto error_free_gpadl;
> +
>   /* Create and init the channel open message */
>   open_info = kzalloc(sizeof(*open_info) +
>  sizeof(struct vmbus_channel_open_channel),
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 40bc0eff6665..15cd23a561f3 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
>  /* Interface */
> 
>  void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
> +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
> + struct page *pages, u32 page_cnt);
> 
>  int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>  struct page *pages, u32 pagecnt, u32 max_pkt_size);
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 2aee356840a2..d4f93fca1108 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -17,6 +17,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> 
>  #include "hyperv_vmbus.h"
> 
> @@ -179,43 +181,89 @@ void hv_ringbuffer_pre_init(struct vmbus_channel 
> *channel)
>   mutex_init(>outbound.ring_buffer_mutex);
>  }
> 
> -/* Initialize the ring buffer. */
> -int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
> -struct page *pages, u32 page_cnt, u32 max_pkt_size)
> +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
> +struct page *pages, u32 page_cnt)
>  {
> + u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
> + unsigned long *pfns_wraparound;
> + void *vaddr;
>   int i;
> - struct page **pages_wraparound;
> 
> - BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
> + if (!hv_isolation_type_snp())
> + return 0;
> +
> + physic_addr += ms_hyperv.shared_gpa_boundary;
> 
>   /*
>* First page holds struct hv_ring_buffer, do wraparound mapping for
>* the rest.
>*/
> - pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
> + pfns_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(unsigned long),
>  GFP_KERNEL);
> - if (!pages_wraparound)
> + if (!pfns_wraparound)
>   return -ENOMEM;
> 
> - pages_wraparound[0] = pages;
> + pfns_wraparound[0] = physic_addr >> PAGE_SHIFT;
>   for (i = 0; i < 2 * (page_cnt - 1); i++)
> - pages_wraparound[i + 1] = [i % (page_cnt - 1) + 1];
> -
> - ring_info->ring_buffer = (struct hv_ring_buffer *)
> - vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
> -
> - kfree(pages_wraparound);
> + pfns_wraparound[i + 1] = (physic_addr >> PAGE_SHIFT) +
> + i % (page_cnt - 1) + 1;
> 
> -
> - if (!ring_info->ring_buffer)
> + vaddr = vmap_pfn(pfns_wraparound, page_cnt * 2 - 1, PAGE_KERNEL_IO);
> + kfree(pfns_wraparound);
> + if (!vaddr)
>   return -ENOMEM;
> 
> - ring_info->ring_buffer->read_index =
> - ring_info->ring_buffer->write_index = 0;
> + /* Clean memory after setting 

RE: [PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

2021-08-16 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> 
> Hyper-V provides two kinds of Isolation VMs. VBS(Virtualization-based
> security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
> is to add support for these Isolation VM support in Linux.
> 

A general comment about this series:  I have not seen any statements
made about whether either type of Isolated VM is supported for 32-bit
Linux guests.   arch/x86/Kconfig has CONFIG_AMD_MEM_ENCRYPT as
64-bit only, so evidently SEV-SNP Isolated VMs would be 64-bit only.
But I don't know if VBS VMs are any different.

I didn't track down what happens if a 32-bit Linux is booted in
a VM that supports SEV-SNP.  Presumably some kind of message
is output that no encryption is being done.  But at a slightly
higher level, the Hyper-V initialization path should probably
also check for 32-bit and output a clear message that no isolation
is being provided.  At that point, I don't know if it is possible to
continue in non-isolated mode or whether the only choice is to
panic.  Continuing in non-isolated mode might be a bad idea
anyway since presumably the user has explicitly requested an
Isolated VM.

Related, I noticed usage of "unsigned long" for holding physical
addresses, which works when running 64-bit, but not when running
32-bit.  But even if Isolated VMs are always 64-bit, it would still be
better to clean this up and use phys_addr_t instead.  Unfortunately,
more generic functions like set_memory_encrypted() and
set_memory_decrypted() have physical address arguments that
are of type unsigned long.

Michael
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V3 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message

2021-08-13 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> 
> The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared
> with host in Isolation VM and so it's necessary to use hvcall to set
> them visible to host. In Isolation VM with AMD SEV SNP, the access
> address should be in the extra space which is above shared gpa
> boundary. So remap these pages into the extra address(pa +
> shared_gpa_boundary). Introduce monitor_pages_va to store
> the remap address and unmap these va when disconnect vmbus.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v1:
> * Not remap monitor pages in the non-SNP isolation VM.
> ---
>  drivers/hv/connection.c   | 65 +++
>  drivers/hv/hyperv_vmbus.h |  1 +
>  2 files changed, 66 insertions(+)
> 
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 6d315c1465e0..bf0ac3167bd2 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include "hyperv_vmbus.h"
> @@ -104,6 +105,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
> *msginfo, u32 version)
> 
>   msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
>   msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
> +
> + if (hv_isolation_type_snp()) {
> + msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
> + msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
> + }
> +
>   msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
> 
>   /*
> @@ -148,6 +155,31 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
> *msginfo, u32 version)
>   return -ECONNREFUSED;
>   }
> 
> + if (hv_isolation_type_snp()) {
> + vmbus_connection.monitor_pages_va[0]
> + = vmbus_connection.monitor_pages[0];
> + vmbus_connection.monitor_pages[0]
> + = memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
> +MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[0])
> + return -ENOMEM;

This error case causes vmbus_negotiate_version() to return with
vmbus_connection.con_state set to CONNECTED.  But the caller never checks the
returned error code except for ETIMEDOUT.  So the caller will think that
vmbus_negotiate_version() succeeded when it didn't.  There may be some
existing bugs in that error handling code. :-(

> +
> + vmbus_connection.monitor_pages_va[1]
> + = vmbus_connection.monitor_pages[1];
> + vmbus_connection.monitor_pages[1]
> + = memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
> +MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[1]) {
> + memunmap(vmbus_connection.monitor_pages[0]);
> + return -ENOMEM;
> + }
> +
> + memset(vmbus_connection.monitor_pages[0], 0x00,
> +HV_HYP_PAGE_SIZE);
> + memset(vmbus_connection.monitor_pages[1], 0x00,
> +HV_HYP_PAGE_SIZE);
> + }
> +

I don't think the memset() calls are needed.  The memory was originally
allocated with hv_alloc_hyperv_zeroed_page(), so it should already be zeroed.

>   return ret;
>  }
> 
> @@ -159,6 +191,7 @@ int vmbus_connect(void)
>   struct vmbus_channel_msginfo *msginfo = NULL;
>   int i, ret = 0;
>   __u32 version;
> + u64 pfn[2];
> 
>   /* Initialize the vmbus connection */
>   vmbus_connection.conn_state = CONNECTING;
> @@ -216,6 +249,16 @@ int vmbus_connect(void)
>   goto cleanup;
>   }
> 
> + if (hv_is_isolation_supported()) {
> + pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
> + pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
> + if (hv_mark_gpa_visibility(2, pfn,
> + VMBUS_PAGE_VISIBLE_READ_WRITE)) {

Note that hv_mark_gpa_visibility() will need an appropriate no-op stub so
that this architecture independent code will compile for ARM64.

> + ret = -EFAULT;
> + goto cleanup;
> + }
> + }
> +
>   msginfo = kzalloc(sizeof(*msginfo) +
> sizeof(struct vmbus_channel_initiate_contact),
> GFP_KERNEL);
> @@ -284,6 +327,8 @@ int vmbus_connect(void)
> 
>  void vmbus_disconnect(void)
>  {
> + u64 pfn[2];
> +
>   /*
>* First send the unload request to the host.
>*/
> @@ -303,6 +348,26 @@ void vmbus_disconnect(void)
>   vmbus_connection.int_page = NULL;
>   }
> 
> + if (hv_is_isolation_supported()) {
> + if (vmbus_connection.monitor_pages_va[0]) {
> + memunmap(vmbus_connection.monitor_pages[0]);
> + 

RE: [PATCH V3 06/13] HV: Add ghcb hvcall support for SNP VM

2021-08-13 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> 
> Hyper-V provides ghcb hvcall to handle VMBus
> HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
> msg in SNP Isolation VM. Add such support.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  arch/x86/hyperv/ivm.c   | 43 +
>  arch/x86/include/asm/mshyperv.h |  1 +
>  drivers/hv/connection.c |  6 -
>  drivers/hv/hv.c |  8 +-
>  include/asm-generic/mshyperv.h  | 29 ++
>  5 files changed, 85 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index ec0e5c259740..c13ec5560d73 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -15,6 +15,49 @@
>  #include 
>  #include 
> 
> +#define GHCB_USAGE_HYPERV_CALL   1
> +
> +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long flags;
> +
> + if (!ms_hyperv.ghcb_base)
> + return -EFAULT;
> +
> + WARN_ON(in_nmi());
> +
> + local_irq_save(flags);
> + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> + hv_ghcb = (union hv_ghcb *)*ghcb_base;
> + if (!hv_ghcb) {
> + local_irq_restore(flags);
> + return -EFAULT;
> + }
> +
> + hv_ghcb->ghcb.protocol_version = GHCB_PROTOCOL_MAX;
> + hv_ghcb->ghcb.ghcb_usage = GHCB_USAGE_HYPERV_CALL;
> +
> + hv_ghcb->hypercall.outputgpa = (u64)output;
> + hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
> + hv_ghcb->hypercall.hypercallinput.callcode = control;
> +
> + if (input_size)
> + memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
> +
> + VMGEXIT();
> +
> + hv_ghcb->ghcb.ghcb_usage = 0x;
> + memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
> +sizeof(hv_ghcb->ghcb.save.valid_bitmap));
> +
> + local_irq_restore(flags);
> +
> + return hv_ghcb->hypercall.hypercalloutput.callstatus;
> +}
> +EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);

This function is called from architecture independent code, so it needs a
default no-op stub to enable the code to compile on ARM64.  The stub should
always return failure.

> +
>  void hv_ghcb_msr_write(u64 msr, u64 value)
>  {
>   union hv_ghcb *hv_ghcb;
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 730985676ea3..a30c60f189a3 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -255,6 +255,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
>  void hv_signal_eom_ghcb(void);
>  void hv_ghcb_msr_write(u64 msr, u64 value);
>  void hv_ghcb_msr_read(u64 msr, u64 *value);
> +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 
> input_size);
> 
>  #define hv_get_synint_state_ghcb(int_num, val)   \
>   hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 5e479d54918c..6d315c1465e0 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -447,6 +447,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
> 
>   ++channel->sig_events;
> 
> - hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
> + if (hv_isolation_type_snp())
> + hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
> + NULL, sizeof(u64));
> + else
> + hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
>  }
>  EXPORT_SYMBOL_GPL(vmbus_set_event);
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 59f7173c4d9f..e5c9fc467893 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -98,7 +98,13 @@ int hv_post_message(union hv_connection_id connection_id,
>   aligned_msg->payload_size = payload_size;
>   memcpy((void *)aligned_msg->payload, payload, payload_size);
> 
> - status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
> + if (hv_isolation_type_snp())
> + status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
> + (void *)aligned_msg, NULL,
> + sizeof(struct hv_input_post_message));
> + else
> + status = hv_do_hypercall(HVCALL_POST_MESSAGE,
> + aligned_msg, NULL);
> 
>   /* Preemption must remain disabled until after the hypercall
>* so some other thread can't get scheduled onto this cpu and
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 90dac369a2dc..400181b855c1 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -31,6 +31,35 @@
> 
>  union hv_ghcb {
>   struct ghcb ghcb;
> + struct {
> + u64 hypercalldata[509];
> + u64 outputgpa;
> + union {
> + union {
> + struct {
> +

RE: [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page

2021-08-13 Thread Michael Kelley via iommu
From: Michael Kelley  Sent: Friday, August 13, 2021 
12:31 PM
> To: Tianyu Lan ; KY Srinivasan ; 
> Haiyang Zhang ;
> Stephen Hemminger ; wei@kernel.org; Dexuan Cui 
> ;
> t...@linutronix.de; mi...@redhat.com; b...@alien8.de; x...@kernel.org; 
> h...@zytor.com; dave.han...@linux.intel.com;
> l...@kernel.org; pet...@infradead.org; konrad.w...@oracle.com; 
> boris.ostrov...@oracle.com; jgr...@suse.com;
> sstabell...@kernel.org; j...@8bytes.org; w...@kernel.org; 
> da...@davemloft.net; k...@kernel.org; j...@linux.ibm.com;
> martin.peter...@oracle.com; a...@arndb.de; h...@lst.de; 
> m.szyprow...@samsung.com; robin.mur...@arm.com;
> thomas.lenda...@amd.com; brijesh.si...@amd.com; a...@kernel.org; Tianyu Lan 
> ;
> pgo...@google.com; martin.b.ra...@gmail.com; a...@linux-foundation.org; 
> kirill.shute...@linux.intel.com;
> r...@kernel.org; s...@canb.auug.org.au; saravan...@fb.com; 
> krish.sadhuk...@oracle.com;
> aneesh.ku...@linux.ibm.com; xen-de...@lists.xenproject.org; 
> rient...@google.com; han...@cmpxchg.org;
> t...@kernel.org
> Cc: iommu@lists.linux-foundation.org; linux-a...@vger.kernel.org; 
> linux-hyp...@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux-s...@vger.kernel.org; net...@vger.kernel.org; 
> vkuznets ;
> parri.and...@gmail.com; dave.han...@intel.com
> Subject: RE: [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page
> 
> From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> > Subject: [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page
> 
> See previous comments about tag in the Subject line.
> 
> > Hyper-V provides GHCB protocol to write Synthetic Interrupt
> > Controller MSR registers in Isolation VM with AMD SEV SNP
> > and these registers are emulated by hypervisor directly.
> > Hyper-V requires to write SINTx MSR registers twice. First
> > writes MSR via GHCB page to communicate with hypervisor
> > and then writes wrmsr instruction to talk with paravisor
> > which runs in VMPL0. Guest OS ID MSR also needs to be set
> > via GHCB.
> >
> > Signed-off-by: Tianyu Lan 
> > ---
> > Change since v1:
> >  * Introduce sev_es_ghcb_hv_call_simple() and share code
> >between SEV and Hyper-V code.
> > ---
> >  arch/x86/hyperv/hv_init.c   |  33 ++---
> >  arch/x86/hyperv/ivm.c   | 110 +
> >  arch/x86/include/asm/mshyperv.h |  78 +++-
> >  arch/x86/include/asm/sev.h  |   3 +
> >  arch/x86/kernel/cpu/mshyperv.c  |   3 +
> >  arch/x86/kernel/sev-shared.c|  63 ++---
> >  drivers/hv/hv.c | 121 ++--
> >  include/asm-generic/mshyperv.h  |  12 +++-
> >  8 files changed, 329 insertions(+), 94 deletions(-)
> >
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index b3683083208a..ab0b33f621e7 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -423,7 +423,7 @@ void __init hyperv_init(void)
> > goto clean_guest_os_id;
> >
> > if (hv_isolation_type_snp()) {
> > -   ms_hyperv.ghcb_base = alloc_percpu(void *);
> > +   ms_hyperv.ghcb_base = alloc_percpu(union hv_ghcb __percpu *);
> 
> union hv_ghcb isn't defined.  It is not added until patch 6 of the series.
> 

Ignore this comment.  My mistake.

Michael
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page

2021-08-13 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> Subject: [PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page

See previous comments about tag in the Subject line.

> Hyper-V provides GHCB protocol to write Synthetic Interrupt
> Controller MSR registers in Isolation VM with AMD SEV SNP
> and these registers are emulated by hypervisor directly.
> Hyper-V requires to write SINTx MSR registers twice. First
> writes MSR via GHCB page to communicate with hypervisor
> and then writes wrmsr instruction to talk with paravisor
> which runs in VMPL0. Guest OS ID MSR also needs to be set
> via GHCB.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v1:
>  * Introduce sev_es_ghcb_hv_call_simple() and share code
>between SEV and Hyper-V code.
> ---
>  arch/x86/hyperv/hv_init.c   |  33 ++---
>  arch/x86/hyperv/ivm.c   | 110 +
>  arch/x86/include/asm/mshyperv.h |  78 +++-
>  arch/x86/include/asm/sev.h  |   3 +
>  arch/x86/kernel/cpu/mshyperv.c  |   3 +
>  arch/x86/kernel/sev-shared.c|  63 ++---
>  drivers/hv/hv.c | 121 ++--
>  include/asm-generic/mshyperv.h  |  12 +++-
>  8 files changed, 329 insertions(+), 94 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index b3683083208a..ab0b33f621e7 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -423,7 +423,7 @@ void __init hyperv_init(void)
>   goto clean_guest_os_id;
> 
>   if (hv_isolation_type_snp()) {
> - ms_hyperv.ghcb_base = alloc_percpu(void *);
> + ms_hyperv.ghcb_base = alloc_percpu(union hv_ghcb __percpu *);

union hv_ghcb isn't defined.  It is not added until patch 6 of the series.

>   if (!ms_hyperv.ghcb_base)
>   goto clean_guest_os_id;
> 
> @@ -432,6 +432,9 @@ void __init hyperv_init(void)
>   ms_hyperv.ghcb_base = NULL;
>   goto clean_guest_os_id;
>   }
> +
> + /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
>   }
> 
>   rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -523,6 +526,7 @@ void hyperv_cleanup(void)
> 
>   /* Reset our OS id */
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
> 
>   /*
>* Reset hypercall page reference before reset the page,
> @@ -596,30 +600,3 @@ bool hv_is_hyperv_initialized(void)
>   return hypercall_msr.enable;
>  }
>  EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
> -
> -enum hv_isolation_type hv_get_isolation_type(void)
> -{
> - if (!(ms_hyperv.priv_high & HV_ISOLATION))
> - return HV_ISOLATION_TYPE_NONE;
> - return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
> -}
> -EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> -
> -bool hv_is_isolation_supported(void)
> -{
> - if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> - return 0;
> -
> - if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> - return 0;
> -
> - return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> -}
> -
> -DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> -
> -bool hv_isolation_type_snp(void)
> -{
> - return static_branch_unlikely(&isolation_type_snp);
> -}
> -EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 8c905ffdba7f..ec0e5c259740 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -6,6 +6,8 @@
>   *  Tianyu Lan 
>   */
> 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -13,6 +15,114 @@
>  #include 
>  #include 
> 
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long flags;
> +
> + if (!ms_hyperv.ghcb_base)
> + return;
> +
> + WARN_ON(in_nmi());
> +
> + local_irq_save(flags);
> + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> + hv_ghcb = (union hv_ghcb *)*ghcb_base;
> + if (!hv_ghcb) {
> + local_irq_restore(flags);
> + return;
> + }
> +
> + ghcb_set_rcx(&hv_ghcb->ghcb, msr);
> + ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
> + ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);

Having used lower_32_bits() in the previous line, perhaps use
upper_32_bits() here?

> +
> + if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 1, 0))
> + pr_warn("Fail to write msr via ghcb %llx.\n", msr);
> +
> + local_irq_restore(flags);
> +}
> +
> +void hv_ghcb_msr_read(u64 msr, u64 *value)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long flags;
> +
> + if (!ms_hyperv.ghcb_base)
> + return;
> +
> + WARN_ON(in_nmi());
> +
> + local_irq_save(flags);
> + 

RE: [PATCH V3 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM

2021-08-12 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> Subject: [PATCH V3 04/13] HV: Mark vmbus ring buffer visible to host in 
> Isolation VM
> 

Use tag "Drivers: hv: vmbus:" in the Subject line.

> Mark vmbus ring buffer visible with set_memory_decrypted() when
> establish gpadl handle.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  drivers/hv/channel.c   | 44 --
>  include/linux/hyperv.h | 11 +++
>  2 files changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index f3761c73b074..4c4717c26240 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> 
> @@ -465,7 +466,14 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
> *channel,
>   struct list_head *curr;
>   u32 next_gpadl_handle;
>   unsigned long flags;
> - int ret = 0;
> + int ret = 0, index;
> +
> + index = atomic_inc_return(>gpadl_index) - 1;
> +
> + if (index > VMBUS_GPADL_RANGE_COUNT - 1) {
> + pr_err("Gpadl handle position(%d) has been occupied.\n", index);
> + return -ENOSPC;
> + }
> 
>   next_gpadl_handle =
> (atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
> @@ -474,6 +482,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
> *channel,
>   if (ret)
>   return ret;
> 
> + ret = set_memory_decrypted((unsigned long)kbuffer,
> +HVPFN_UP(size));
> + if (ret) {
> + pr_warn("Failed to set host visibility.\n");

Enhance this message a bit.  "Failed to set host visibility for new GPADL\n"
and also output the value of ret.

> + return ret;
> + }
> +
>   init_completion(>waitevent);
>   msginfo->waiting_channel = channel;
> 
> @@ -539,6 +554,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
> *channel,
>   /* At this point, we received the gpadl created msg */
>   *gpadl_handle = gpadlmsg->gpadl;
> 
> + channel->gpadl_array[index].size = size;
> + channel->gpadl_array[index].buffer = kbuffer;
> + channel->gpadl_array[index].gpadlhandle = *gpadl_handle;
> +

I can see the merits of transparently stashing the memory address and size
that will be needed by vmbus_teardown_gpadl(), so that the callers of
__vmbus_establish_gpadl() don't have to worry about it.  But doing the
stashing transparently is somewhat messy.

Given that the callers already have memory allocated to save the
GPADL handle, a little refactoring would make for a much cleaner solution.
Instead of having memory allocated for the 32-bit GPADL handle, callers
should allocate the slightly larger struct vmbus_gpadl that you've
defined below.  The calling interfaces can be updated to take a pointer
to this structure instead of a pointer to the 32-bit GPADL handle, and
you can save the memory address and size right along with the GPADL
handle.  This approach touches a few more files, but I think there are
only two callers outside of the channel management code -- netvsc
and hv_uio -- so it's not a big change.

>  cleanup:
> spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
>   list_del(>msglistentry);
> @@ -549,6 +568,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
> *channel,
>   }
> 
>   kfree(msginfo);
> +
> + if (ret) {
> + set_memory_encrypted((unsigned long)kbuffer,
> +  HVPFN_UP(size));
> + atomic_dec(>gpadl_index);
> + }
> +
>   return ret;
>  }
> 
> @@ -676,6 +702,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
> 
>   /* Establish the gpadl for the ring buffer */
>   newchannel->ringbuffer_gpadlhandle = 0;
> + atomic_set(>gpadl_index, 0);
> 
>   err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
> page_address(newchannel->ringbuffer_page),
> @@ -811,7 +838,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, 
> u32 gpadl_handle)
>   struct vmbus_channel_gpadl_teardown *msg;
>   struct vmbus_channel_msginfo *info;
>   unsigned long flags;
> - int ret;
> + int ret, i;
> 
>   info = kzalloc(sizeof(*info) +
>  sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL);
> @@ -859,6 +886,19 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, 
> u32 gpadl_handle)
> spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
> 
>   kfree(info);
> +
> + /* Find gpadl buffer virtual address and size. */
> + for (i = 0; i < VMBUS_GPADL_RANGE_COUNT; i++)
> + if (channel->gpadl_array[i].gpadlhandle == gpadl_handle)
> + break;
> +
> + if (set_memory_encrypted((unsigned long)channel->gpadl_array[i].buffer,
> + HVPFN_UP(channel->gpadl_array[i].size)))
> + 

RE: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-12 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM

[snip]

> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index ad8a5c586a35..1e4a0882820a 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -29,6 +29,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> 
>  #include "../mm_internal.h"
> 
> @@ -1980,15 +1982,11 @@ int set_memory_global(unsigned long addr, int 
> numpages)
>   __pgprot(_PAGE_GLOBAL), 0);
>  }
> 
> -static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> +static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool 
> enc)
>  {
>   struct cpa_data cpa;
>   int ret;
> 
> - /* Nothing to do if memory encryption is not active */
> - if (!mem_encrypt_active())
> - return 0;
> -
>   /* Should not be working on unaligned addresses */
>   if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
>   addr &= PAGE_MASK;
> @@ -2023,6 +2021,17 @@ static int __set_memory_enc_dec(unsigned long addr, 
> int numpages, bool enc)
>   return ret;
>  }
> 
> +static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> +{
> + if (hv_is_isolation_supported())
> + return hv_set_mem_host_visibility(addr, numpages, !enc);
> +
> + if (mem_encrypt_active())
> + return __set_memory_enc_pgtable(addr, numpages, enc);
> +
> + return 0;
> +}
> +

FYI, this not-yet-accepted patch
https://lore.kernel.org/lkml/ab5a7a983a943e7ca0a7ad28275a2d094c62c371.1623421410.git.ashish.ka...@amd.com/
looks to be providing a generic hook to notify the hypervisor when the
encryption status of a memory range changes.

Michael
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-12 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> Subject: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host 
> visibility support

Use "x86/hyperv:" tag in the Subject line.

> 
> From: Tianyu Lan 
> 
> Add new hvcall guest address host visibility support to mark
> memory visible to host. Call it inside set_memory_decrypted
> /encrypted(). Add HYPERVISOR feature check in the
> hv_is_isolation_supported() to optimize in non-virtualization
> environment.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v2:
>* Rework __set_memory_enc_dec() and call Hyper-V and AMD function
>  according to platform check.
> 
> Change since v1:
>* Use new staic call x86_set_memory_enc to avoid add Hyper-V
>  specific check in the set_memory code.
> ---
>  arch/x86/hyperv/Makefile   |   2 +-
>  arch/x86/hyperv/hv_init.c  |   6 ++
>  arch/x86/hyperv/ivm.c  | 114 +
>  arch/x86/include/asm/hyperv-tlfs.h |  20 +
>  arch/x86/include/asm/mshyperv.h|   4 +-
>  arch/x86/mm/pat/set_memory.c   |  19 +++--
>  include/asm-generic/hyperv-tlfs.h  |   1 +
>  include/asm-generic/mshyperv.h |   1 +
>  8 files changed, 160 insertions(+), 7 deletions(-)
>  create mode 100644 arch/x86/hyperv/ivm.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 48e2c51464e8..5d2de10809ae 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y:= hv_init.o mmu.o nested.o irqdomain.o
> +obj-y:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
>  obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o
> 
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0bb4d9ca7a55..b3683083208a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> 
>  bool hv_is_isolation_supported(void)
>  {
> + if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> + return 0;
> +
> + if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> + return 0;
> +
>   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;

Could all of the tests in this function be run at initialization time, and
a single Boolean value pre-computed that this function returns?  I don't
think any of tests would change during the lifetime of the Linux instance,
so running the tests every time is slower than it needs to be.

>  }
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> new file mode 100644
> index ..8c905ffdba7f
> --- /dev/null
> +++ b/arch/x86/hyperv/ivm.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Hyper-V Isolation VM interface with paravisor and hypervisor
> + *
> + * Author:
> + *  Tianyu Lan 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
> + *
> + * In Isolation VM, all guest memory is encripted from host and guest
> + * needs to set memory visible to host via hvcall before sharing memory
> + * with host.
> + */
> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> +enum hv_mem_host_visibility visibility)
> +{
> + struct hv_gpa_range_for_visibility **input_pcpu, *input;
> + u16 pages_processed;
> + u64 hv_status;
> + unsigned long flags;
> +
> + /* no-op if partition isolation is not enabled */
> + if (!hv_is_isolation_supported())
> + return 0;
> +
> + if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> + pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> + HV_MAX_MODIFY_GPA_REP_COUNT);
> + return -EINVAL;
> + }
> +
> + local_irq_save(flags);
> + input_pcpu = (struct hv_gpa_range_for_visibility **)
> + this_cpu_ptr(hyperv_pcpu_input_arg);
> + input = *input_pcpu;
> + if (unlikely(!input)) {
> + local_irq_restore(flags);
> + return -EINVAL;
> + }
> +
> + input->partition_id = HV_PARTITION_ID_SELF;
> + input->host_visibility = visibility;
> + input->reserved0 = 0;
> + input->reserved1 = 0;
> + memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
> + hv_status = hv_do_rep_hypercall(
> + HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> + 0, input, &pages_processed);
> + local_irq_restore(flags);
> +
> + if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
> + return 0;

pages_processed should also be checked to ensure that it equals count.
If not, something has gone wrong in the hypercall.

> +
> + return hv_status & HV_HYPERCALL_RESULT_MASK;
> +}
> +EXPORT_SYMBOL(hv_mark_gpa_visibility);
> +
> +static int 

RE: [PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the Isolation VM.

2021-08-12 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> Subject: [PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the 
> Isolation VM.

As with Patch 1, use the "x86/hyperv:" tag in the Subject line.

> 
> From: Tianyu Lan 
> 
> Hyper-V exposes shared memory boundary via cpuid
> HYPERV_CPUID_ISOLATION_CONFIG and store it in the
> shared_gpa_boundary of ms_hyperv struct. This prepares
> to share memory with host for SNP guest.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  arch/x86/kernel/cpu/mshyperv.c |  2 ++
>  include/asm-generic/mshyperv.h | 12 +++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 6b5835a087a3..2b7f396ef1a5 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -313,6 +313,8 @@ static void __init ms_hyperv_init_platform(void)
>   if (ms_hyperv.priv_high & HV_ISOLATION) {
>   ms_hyperv.isolation_config_a = 
> cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
>   ms_hyperv.isolation_config_b = 
> cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
> + ms_hyperv.shared_gpa_boundary =
> + (u64)1 << ms_hyperv.shared_gpa_boundary_bits;

You could use BIT_ULL() here, but it's kind of a shrug.

> 
>   pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
> 0x%x\n",
>   ms_hyperv.isolation_config_a, 
> ms_hyperv.isolation_config_b);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 4269f3174e58..aa26d24a5ca9 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -35,8 +35,18 @@ struct ms_hyperv_info {
>   u32 max_vp_index;
>   u32 max_lp_index;
>   u32 isolation_config_a;
> - u32 isolation_config_b;
> + union {
> + u32 isolation_config_b;
> + struct {
> + u32 cvm_type : 4;
> + u32 Reserved11 : 1;
> + u32 shared_gpa_boundary_active : 1;
> + u32 shared_gpa_boundary_bits : 6;
> + u32 Reserved12 : 20;

Any reason to name the reserved fields as "11" and "12"?  It
just looks a bit unusual.  And I'd suggest lowercase "r".

> + };
> + };
>   void  __percpu **ghcb_base;
> + u64 shared_gpa_boundary;
>  };
>  extern struct ms_hyperv_info ms_hyperv;
> 
> --
> 2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM

2021-08-12 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Monday, August 9, 2021 10:56 AM
> Subject: [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM

The subject line tag on patches under arch/x86/hyperv is generally 
"x86/hyperv:".
There's some variation in the spelling of "hyperv", but let's go with the all
lowercase "hyperv".

> 
> Hyper-V exposes GHCB page via SEV ES GHCB MSR for SNP guest
> to communicate with hypervisor. Map GHCB page for all
> cpus to read/write MSR register and submit hvcall request
> via GHCB.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  arch/x86/hyperv/hv_init.c   | 66 +++--
>  arch/x86/include/asm/mshyperv.h |  2 +
>  include/asm-generic/mshyperv.h  |  2 +
>  3 files changed, 66 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 708a2712a516..0bb4d9ca7a55 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -42,6 +43,31 @@ static void *hv_hypercall_pg_saved;
>  struct hv_vp_assist_page **hv_vp_assist_page;
>  EXPORT_SYMBOL_GPL(hv_vp_assist_page);
> 
> +static int hyperv_init_ghcb(void)
> +{
> + u64 ghcb_gpa;
> + void *ghcb_va;
> + void **ghcb_base;
> +
> + if (!ms_hyperv.ghcb_base)
> + return -EINVAL;
> +
> + /*
> +  * GHCB page is allocated by paravisor. The address
> +  * returned by MSR_AMD64_SEV_ES_GHCB is above shared
> +  * ghcb boundary and map it here.
> +  */
> + rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
> + ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
> + if (!ghcb_va)
> + return -ENOMEM;
> +
> + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> + *ghcb_base = ghcb_va;
> +
> + return 0;
> +}
> +
>  static int hv_cpu_init(unsigned int cpu)
>  {
>   union hv_vp_assist_msr_contents msr = { 0 };
> @@ -85,6 +111,8 @@ static int hv_cpu_init(unsigned int cpu)
>   }
>   }
> 
> + hyperv_init_ghcb();
> +
>   return 0;
>  }
> 
> @@ -177,6 +205,14 @@ static int hv_cpu_die(unsigned int cpu)
>  {
>   struct hv_reenlightenment_control re_ctrl;
>   unsigned int new_cpu;
> + void **ghcb_va = NULL;

I'm not seeing any reason why this needs to be initialized.

> +
> + if (ms_hyperv.ghcb_base) {
> + ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> + if (*ghcb_va)
> + memunmap(*ghcb_va);
> + *ghcb_va = NULL;
> + }
> 
>   hv_common_cpu_die(cpu);
> 
> @@ -383,9 +419,19 @@ void __init hyperv_init(void)
>   VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
>   __builtin_return_address(0));
> - if (hv_hypercall_pg == NULL) {
> - wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> - goto remove_cpuhp_state;
> + if (hv_hypercall_pg == NULL)
> + goto clean_guest_os_id;
> +
> + if (hv_isolation_type_snp()) {
> + ms_hyperv.ghcb_base = alloc_percpu(void *);
> + if (!ms_hyperv.ghcb_base)
> + goto clean_guest_os_id;
> +
> + if (hyperv_init_ghcb()) {
> + free_percpu(ms_hyperv.ghcb_base);
> + ms_hyperv.ghcb_base = NULL;
> + goto clean_guest_os_id;
> + }

Having the GHCB setup code here splits the hypercall page setup into
two parts, which is unexpected.  First the memory is allocated
for the hypercall page, then the GHCB stuff is done, then the hypercall
MSR is setup.  Is there a need to do this split?  Also, if the GHCB stuff
fails and you goto clean_guest_os_id, the memory allocated for the
hypercall page is never freed.

It's also unexpected to have hyperv_init_ghcb() called here and called
in hv_cpu_init().  Wouldn't it be possible to setup ghcb_base *before*
cpu_setup_state() is called, so that hv_cpu_init() would take care of
calling hyperv_init_ghcb() for the boot CPU?  That's the pattern used
by the VP assist page, the percpu input page, etc.

>   }
> 
>   rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -456,7 +502,8 @@ void __init hyperv_init(void)
>   hv_query_ext_cap(0);
>   return;
> 
> -remove_cpuhp_state:
> +clean_guest_os_id:
> + wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>   cpuhp_remove_state(cpuhp);
>  free_vp_assist_page:
>   kfree(hv_vp_assist_page);
> @@ -484,6 +531,9 @@ void hyperv_cleanup(void)
>*/
>   hv_hypercall_pg = NULL;
> 
> + if (ms_hyperv.ghcb_base)
> + free_percpu(ms_hyperv.ghcb_base);
> +

I don't think this cleanup is necessary.  The primary purpose of
hyperv_cleanup() is to ensure that things like overlay pages are
properly reset in Hyper-V before doing a kexec(), or before
panic'ing and running the kdump kernel.  There's no need to do
general 

RE: [PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-02-04 Thread Michael Kelley via iommu
From: Wei Liu  Sent: Wednesday, February 3, 2021 7:05 AM
> 
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
> 
> Signed-off-by: Wei Liu 
> ---
> v6:
> 1. Simplify code due to changes in a previous patch.
> ---
>  arch/x86/hyperv/irqdomain.c |  25 +
>  arch/x86/include/asm/mshyperv.h |   4 +
>  drivers/iommu/hyperv-iommu.c| 177 +++-
>  3 files changed, 203 insertions(+), 3 deletions(-)
> 

Reviewed-by: Michael Kelley 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-02-04 Thread Michael Kelley via iommu
From: Wei Liu  Sent: Wednesday, February 3, 2021 4:47 AM
> 
> On Wed, Jan 27, 2021 at 05:47:08AM +, Michael Kelley wrote:
> > From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> > >
> > > Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> > > Hypervisor when Linux runs as the root partition. Implement an IRQ
> > > domain to handle mapping and unmapping of IO-APIC interrupts.
> > >
> > > Signed-off-by: Wei Liu 
> > > ---
> > >  arch/x86/hyperv/irqdomain.c |  54 ++
> > >  arch/x86/include/asm/mshyperv.h |   4 +
> > >  drivers/iommu/hyperv-iommu.c| 179 +++-
> > >  3 files changed, 233 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > > index 19637cd60231..8e2b4e478b70 100644
> > > --- a/arch/x86/hyperv/irqdomain.c
> > > +++ b/arch/x86/hyperv/irqdomain.c
> > > @@ -330,3 +330,57 @@ struct irq_domain * __init 
> > > hv_create_pci_msi_domain(void)
> > >  }
> > >
> > >  #endif /* CONFIG_PCI_MSI */
> > > +
> > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> > > *entry)
> > > +{
> > > + union hv_device_id device_id;
> > > +
> > > + device_id.as_uint64 = 0;
> > > + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > > + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > > +
> > > + return hv_unmap_interrupt(device_id.as_uint64, entry) &
> HV_HYPERCALL_RESULT_MASK;
> >
> > The masking is already done in hv_unmap_interrupt.
> 
> Fixed.
> 
> >
> > > +}
> > > +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> > > +
> > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int 
> > > vector,
> > > + struct hv_interrupt_entry *entry)
> > > +{
> > > + unsigned long flags;
> > > + struct hv_input_map_device_interrupt *input;
> > > + struct hv_output_map_device_interrupt *output;
> > > + union hv_device_id device_id;
> > > + struct hv_device_interrupt_descriptor *intr_desc;
> > > + u16 status;
> > > +
> > > + device_id.as_uint64 = 0;
> > > + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > > + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > > +
> > > + local_irq_save(flags);
> > > + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > > + memset(input, 0, sizeof(*input));
> > > + intr_desc = &input->interrupt_descriptor;
> > > + input->partition_id = hv_current_partition_id;
> > > + input->device_id = device_id.as_uint64;
> > > + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > > + intr_desc->target.vector = vector;
> > > + intr_desc->vector_count = 1;
> > > +
> > > + if (level)
> > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > > + else
> > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > > +
> > > + __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> > > +
> > > + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input,
> output) &
> > > +  HV_HYPERCALL_RESULT_MASK;
> > > + local_irq_restore(flags);
> > > +
> > > + *entry = output->interrupt_entry;
> > > +
> > > + return status;
> >
> > As a cross-check, I was comparing this code against hv_map_msi_interrupt(). 
> >  They are
> > mostly parallel, though some of the assignments are done in a different 
> > order.  It's a nit,
> > but making them as parallel as possible would be nice. :-)
> >
> 
> Indeed. I will see about factoring out a function.

If factoring out a separate helper function is clumsy, just having the parallel 
code
in the two functions be as similar as possible makes it easier to see what's the
same and what's different.

> 
> > Same 64 vCPU comment applies here as well.
> >
> 
> This is changed to use vpset instead. Took me a bit of time to get it
> working because document is a bit lacking.
> 
> >
> > > +}
> > > +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> > > diff --git a/arch/x86/include/asm/mshyperv.h 
> > > b/arch/x86/include/asm/mshyperv.h
> > > index ccc849e25d5e..345d7c6f8c37 100644
> > > --- a/arch/x86/include/asm/mshyperv.h
> > > +++ b/arch/x86/include/asm/mshyperv.h
> > > @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union
> > > hv_msi_entry *msi_entry,
> > >
> > >  struct irq_domain *hv_create_pci_msi_domain(void);
> > >
> > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int 
> > > vector,
> > > + struct hv_interrupt_entry *entry);
> > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> > > *entry);
> > > +
> > >  #else /* CONFIG_HYPERV */
> > >  static inline void hyperv_init(void) {}
> > >  static inline void hyperv_setup_mmu_ops(void) {}
> > > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> > > index b7db6024e65c..6d35e4c303c6 100644
> > > --- a/drivers/iommu/hyperv-iommu.c
> > > +++ b/drivers/iommu/hyperv-iommu.c
> > > @@ -116,30 +116,43 @@ static const struct irq_domain_ops 
> > > hyperv_ir_domain_ops = {
> > 

RE: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-01-26 Thread Michael Kelley via iommu
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
> 
> Signed-off-by: Wei Liu 
> ---
>  arch/x86/hyperv/irqdomain.c |  54 ++
>  arch/x86/include/asm/mshyperv.h |   4 +
>  drivers/iommu/hyperv-iommu.c| 179 +++-
>  3 files changed, 233 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 19637cd60231..8e2b4e478b70 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
>  }
> 
>  #endif /* CONFIG_PCI_MSI */
> +
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> *entry)
> +{
> + union hv_device_id device_id;
> +
> + device_id.as_uint64 = 0;
> + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> + return hv_unmap_interrupt(device_id.as_uint64, entry) & 
> HV_HYPERCALL_RESULT_MASK;

The masking is already done in hv_unmap_interrupt.

> +}
> +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> +
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> + struct hv_interrupt_entry *entry)
> +{
> + unsigned long flags;
> + struct hv_input_map_device_interrupt *input;
> + struct hv_output_map_device_interrupt *output;
> + union hv_device_id device_id;
> + struct hv_device_interrupt_descriptor *intr_desc;
> + u16 status;
> +
> + device_id.as_uint64 = 0;
> + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> + local_irq_save(flags);
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> + memset(input, 0, sizeof(*input));
> + intr_desc = &input->interrupt_descriptor;
> + input->partition_id = hv_current_partition_id;
> + input->device_id = device_id.as_uint64;
> + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> + intr_desc->target.vector = vector;
> + intr_desc->vector_count = 1;
> +
> + if (level)
> + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> + else
> + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> +
> + __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> +
> + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, 
> output) &
> +  HV_HYPERCALL_RESULT_MASK;
> + local_irq_restore(flags);
> +
> + *entry = output->interrupt_entry;
> +
> + return status;

As a cross-check, I was comparing this code against hv_map_msi_interrupt().  
They are
mostly parallel, though some of the assignments are done in a different order.  
It's a nit,
but making them as parallel as possible would be nice. :-)

Same 64 vCPU comment applies here as well.


> +}
> +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ccc849e25d5e..345d7c6f8c37 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union
> hv_msi_entry *msi_entry,
> 
>  struct irq_domain *hv_create_pci_msi_domain(void);
> 
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> + struct hv_interrupt_entry *entry);
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> *entry);
> +
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index b7db6024e65c..6d35e4c303c6 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops 
> = {
>   .free = hyperv_irq_remapping_free,
>  };
> 
> +static const struct irq_domain_ops hyperv_root_ir_domain_ops;
>  static int __init hyperv_prepare_irq_remapping(void)
>  {
>   struct fwnode_handle *fn;
>   int i;
> + const char *name;
> + const struct irq_domain_ops *ops;
> 
>   if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
>   x86_init.hyper.msi_ext_dest_id() ||
> - !x2apic_supported() || hv_root_partition)
> + !x2apic_supported())

Any reason that the check for hv_root_partition was added
in patch #4  of this series, and then removed here?  Could
patch #4 just be dropped?

>   return -ENODEV;
> 
> - fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> + if (hv_root_partition) {
> + name = "HYPERV-ROOT-IR";
> + ops = 

RE: [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root

2021-01-25 Thread Michael Kelley via iommu
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> The IOMMU code needs more work. We're sure for now the IRQ remapping
> hooks are not applicable when Linux is the root partition.
> 
> Signed-off-by: Wei Liu 
> Acked-by: Joerg Roedel 
> Reviewed-by: Vitaly Kuznetsov 
> ---
>  drivers/iommu/hyperv-iommu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index 1d21a0b5f724..b7db6024e65c 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "irq_remapping.h"
> 
> @@ -122,7 +123,7 @@ static int __init hyperv_prepare_irq_remapping(void)
> 
>   if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
>   x86_init.hyper.msi_ext_dest_id() ||
> - !x2apic_supported())
> + !x2apic_supported() || hv_root_partition)
>   return -ENODEV;
> 
>   fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> --
> 2.20.1

Reviewed-by: Michael Kelley 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 20/28] mm: remove the pgprot argument to __vmalloc

2020-04-10 Thread Michael Kelley via iommu
From: Christoph Hellwig  Sent: Wednesday, April 8, 2020 4:59 AM
> 
> The pgprot argument to __vmalloc is always PROT_KERNEL now, so remove
> it.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/hyperv/hv_init.c  |  3 +--
>  arch/x86/include/asm/kvm_host.h|  3 +--
>  arch/x86/kvm/svm.c |  3 +--
>  drivers/block/drbd/drbd_bitmap.c   |  4 +---
>  drivers/gpu/drm/etnaviv/etnaviv_dump.c |  4 ++--
>  drivers/lightnvm/pblk-init.c   |  5 ++---
>  drivers/md/dm-bufio.c  |  4 ++--
>  drivers/mtd/ubi/io.c   |  4 ++--
>  drivers/scsi/sd_zbc.c  |  3 +--
>  fs/gfs2/dir.c  |  9 -
>  fs/gfs2/quota.c|  2 +-
>  fs/nfs/blocklayout/extent_tree.c   |  2 +-
>  fs/ntfs/malloc.h   |  2 +-
>  fs/ubifs/debug.c   |  2 +-
>  fs/ubifs/lprops.c  |  2 +-
>  fs/ubifs/lpt_commit.c  |  4 ++--
>  fs/ubifs/orphan.c  |  2 +-
>  fs/xfs/kmem.c  |  2 +-
>  include/linux/vmalloc.h|  2 +-
>  kernel/bpf/core.c  |  6 +++---
>  kernel/groups.c|  2 +-
>  kernel/module.c|  3 +--
>  mm/nommu.c | 15 +++
>  mm/page_alloc.c|  2 +-
>  mm/percpu.c|  2 +-
>  mm/vmalloc.c   |  4 ++--
>  net/bridge/netfilter/ebtables.c|  6 ++
>  sound/core/memalloc.c  |  2 +-
>  sound/core/pcm_memory.c|  2 +-
>  29 files changed, 47 insertions(+), 59 deletions(-)
> 

For Hyper-V changes,

Reviewed-by: Michael Kelley 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 01/28] x86/hyperv: use vmalloc_exec for the hypercall page

2020-04-10 Thread Michael Kelley via iommu
From: Christoph Hellwig  Sent: Wednesday, April 8, 2020 4:59 AM
> 
> Use the designated helper for allocating executable kernel memory, and
> remove the now unused PAGE_KERNEL_RX define.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/hyperv/hv_init.c| 2 +-
>  arch/x86/include/asm/pgtable_types.h | 2 --
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 

Reviewed-by: Michael Kelley 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.

2019-11-01 Thread Michael Kelley via iommu
From: Wei Hu  Sent: Tuesday, October 22, 2019 4:11 AM
> 
> On Hyper-V, Generation 1 VMs can directly use VM's physical memory for
> their framebuffers. This can improve the efficiency of framebuffer and
> overall performance for VM. The physical memory assigned to framebuffer
> must be contiguous. We use CMA allocator to get contiguous physical
> memory when the framebuffer size is greater than 4MB. For size under
> 4MB, we use alloc_pages to achieve this.
> 
> To enable framebuffer memory allocation from CMA, supply a kernel
> parameter to give enough space to CMA allocator at boot time. For
> example:
> cma=130m
> This gives 130MB memory to CMA allocator that can be allocated to
> framebuffer. If this fails, we fall back to the old way of using
> mmio for framebuffer.
> 
> Signed-off-by: Wei Hu 
> ---

[snip]

> +/*
> + * Allocate enough contiguous physical memory.
> + * Return physical address if succeeded or -1 if failed.
> + */
> +static unsigned long hvfb_get_phymem(unsigned int request_size)
> +{
> + struct page *page = NULL;
> + unsigned int request_pages;
> + unsigned long paddr = 0;

Using 'unsigned long' for physical addresses is problematic on 32-bit
systems with Physical Address Extension (PAE) enabled. Linux has
phys_addr_t that handles 32-bit PAE correctly and that probably
should be used here.   This whole driver needs to be carefully
checked for the correct type for physical addresses.

> + unsigned int order = get_order(request_size);
> +
> + if (request_size == 0)
> + return -1;
> +
> + /* Try call alloc_pages if the size is less than 2^MAX_ORDER */
> + if (order < MAX_ORDER) {
> + page = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
> + if (!page)
> + return -1;
> +
> + request_pages = (1 << order);
> + goto get_phymem1;
> + }
> +
> + /* Allocate from CMA */
> + // request_pages = (request_size >> PAGE_SHIFT) + 1;
> + request_pages = (round_up(request_size, PAGE_SIZE) >> PAGE_SHIFT);
> + page = dma_alloc_from_contiguous(NULL, request_pages, 0, false);
> +
> + if (page == NULL)
> + return -1;
> +
> +get_phymem1:
> + paddr = (page_to_pfn(page) << PAGE_SHIFT);
> +
> + pr_info("Allocated %d pages starts at physical addr 0x%lx\n",
> + request_pages, paddr);
> +
> + return paddr;
> +}
> +
> +/* Release contiguous physical memory */
> +static void hvfb_release_phymem(unsigned long paddr, unsigned int size)

Same issue here with 'unsigned long paddr'.

> +{
> + unsigned int order = get_order(size);
> +
> + if (order < MAX_ORDER)
> + __free_pages(pfn_to_page(paddr >> PAGE_SHIFT), order);
> + else
> + dma_release_from_contiguous(NULL,
> + pfn_to_page(paddr >> PAGE_SHIFT),
> + (round_up(size, PAGE_SIZE) >>
> +  PAGE_SHIFT));
> + // (size >> PAGE_SHIFT) + 1);
> +}
> +
> 
>  /* Get framebuffer memory from Hyper-V video pci space */
>  static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
> @@ -947,8 +1017,58 @@ static int hvfb_getmem(struct hv_device *hdev, struct 
> fb_info
> *info)
>   void __iomem *fb_virt;
>   int gen2vm = efi_enabled(EFI_BOOT);
>   resource_size_t pot_start, pot_end;
> + unsigned long paddr;

Same issue with 'unsigned long'.

>   int ret;
> 
> + if (!gen2vm) {
> + pdev = pci_get_device(PCI_VENDOR_ID_MICROSOFT,
> + PCI_DEVICE_ID_HYPERV_VIDEO, NULL);
> + if (!pdev) {
> + pr_err("Unable to find PCI Hyper-V video\n");
> + return -ENODEV;
> + }
> + }
> +
> + info->apertures = alloc_apertures(1);
> + if (!info->apertures)
> + goto err1;
> +
> + if (gen2vm) {
> + info->apertures->ranges[0].base = screen_info.lfb_base;
> + info->apertures->ranges[0].size = screen_info.lfb_size;
> + } else {
> + info->apertures->ranges[0].base = pci_resource_start(pdev, 0);
> + info->apertures->ranges[0].size = pci_resource_len(pdev, 0);
> + }
> +
> + /*
> +  * For Gen 1 VM, we can directly use the contiguous memory
> +  * from VM. If we success, deferred IO happens directly
> +  * on this allocated framebuffer memory, avoiding extra
> +  * memory copy.
> +  */
> + if (!gen2vm) {
> + paddr = hvfb_get_phymem(screen_fb_size);
> + if (paddr != (unsigned long) -1) {

phys_addr_t

> + par->mmio_pp = paddr;

I'm not really sure what to do about the above, because mmio_pp is
declared as 'unsigned long' when it really should be phys_addr_t.
But maybe a MMIO address will always be less than 4 Gb and therefore
will fit in 32 bits?

> + 

RE: [PATCH] drivers: iommu: hyperv: Make HYPERV_IOMMU only available on x86

2019-10-17 Thread Michael Kelley via iommu
From: Boqun Feng  Sent: Wednesday, October 16, 2019 5:57 
PM
> 
> Currently hyperv-iommu is implemented in a x86 specific way, for
> example, apic is used. So make the HYPERV_IOMMU Kconfig depend on X86
> as a preparation for enabling HyperV on architecture other than x86.
> 
> Cc: Lan Tianyu 
> Cc: Michael Kelley 
> Cc: linux-hyp...@vger.kernel.org
> Signed-off-by: Boqun Feng (Microsoft) 
> ---
> 
> Without this patch, I could observe compile error:
> 
> | drivers/iommu/hyperv-iommu.c:17:10: fatal error: asm/apic.h: No such
> | file or directory
> |   17 | #include 
> |  |  ^~~~
> 
> , after apply Michael's ARM64 on HyperV enablement patchset.
> 
>  drivers/iommu/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index e3842eabcfdd..f1086eaed41c 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -467,7 +467,7 @@ config QCOM_IOMMU
> 
>  config HYPERV_IOMMU
>   bool "Hyper-V x2APIC IRQ Handling"
> - depends on HYPERV
> + depends on HYPERV && X86
>   select IOMMU_API
>   default HYPERV
>   help
> --
> 2.23.0

Reviewed-by: Michael Kelley 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V6 2/3] IOMMU/Hyper-V: Add Hyper-V stub IOMMU driver

2019-02-27 Thread Michael Kelley via iommu
From: lantianyu1...@gmail.com  Sent: Wednesday, 
February 27, 2019 6:54 AM
> 
> On the bare metal, enabling X2APIC mode requires interrupt remapping
> function which helps to deliver irq to cpu with 32-bit APIC ID.
> Hyper-V doesn't provide interrupt remapping function so far and Hyper-V
> MSI protocol already supports to deliver interrupt to the CPU whose
> virtual processor index is more than 255. IO-APIC interrupt still has
> 8-bit APIC ID limitation.
> 
> This patch is to add Hyper-V stub IOMMU driver in order to enable
> X2APIC mode successfully in Hyper-V Linux guest. The driver returns X2APIC
> interrupt remapping capability when X2APIC mode is available. Otherwise,
> it creates a Hyper-V irq domain to limit IO-APIC interrupts' affinity
> and make sure cpus assigned with IO-APIC interrupt have 8-bit APIC ID.
> 
> Define 24 IO-APIC remapping entries because Hyper-V only expose one
> single IO-APIC and one IO-APIC has 24 pins according to the IO-APIC spec(
> https://pdos.csail.mit.edu/6.828/2016/readings/ia32/ioapic.pdf).
> 
> Reviewed-by: Michael Kelley 
> Signed-off-by: Lan Tianyu 

Reconfirming my reviewed-by after the change to fix the
compile error detected by the kbuild test robot.

Reviewed-by: Michael Kelley 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V6 1/3] x86/Hyper-V: Set x2apic destination mode to physical when x2apic is available

2019-02-27 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Wednesday, February 27, 2019 
6:54 AM
> 
> Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
> set x2apic destination mode to physical mode when x2apic is available
> and Hyper-V IOMMU driver makes sure cpus assigned with IO-APIC irqs have
> 8-bit APIC id.
> 
> Reviewed-by: Thomas Gleixner 
> Reviewed-by: Michael Kelley 
> Signed-off-by: Lan Tianyu 
> ---
> Change since v5:
>- Fix comile error due to x2apic_phys
> 
> Change since v2:
>- Fix compile error due to x2apic_phys
>- Fix comment indent
> Change since v1:
>- Remove redundant extern for x2apic_phys
> ---
>  arch/x86/kernel/cpu/mshyperv.c | 12 
>  1 file changed, 12 insertions(+)

Reconfirming my reviewed-by after the change to fix the
compile error detected by the kbuild test robot.

Reviewed-by: Michael Kelley 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V5 0/3] x86/Hyper-V/IOMMU: Add Hyper-V IOMMU driver to support x2apic mode

2019-02-25 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, February 22, 2019 
4:12 AM
> 
> On the bare metal, enabling X2APIC mode requires interrupt remapping
> function which helps to deliver irq to cpu with 32-bit APIC ID.
> Hyper-V doesn't provide interrupt remapping function so far and Hyper-V
> MSI protocol already supports to deliver interrupt to the CPU whose
> virtual processor index is more than 255. IO-APIC interrupt still has
> 8-bit APIC ID limitation.
> 
> This patchset is to add Hyper-V stub IOMMU driver in order to enable
> X2APIC mode successfully in Hyper-V Linux guest. The driver returns X2APIC
> interrupt remapping capability when X2APIC mode is available. X2APIC
> destination mode is set to physical by PATCH 1 when X2APIC is available.
> Hyper-V IOMMU driver will scan cpu 0~255 and set cpu into IO-APIC MAX cpu
> affinity cpumask if its APIC ID is 8-bit. Driver creates a Hyper-V irq domain
> to limit IO-APIC interrupts' affinity and make sure cpus assigned with IO-APIC
> interrupt are in the scope of IO-APIC MAX cpu affinity.
> 
> Lan Tianyu (3):
>   x86/Hyper-V: Set x2apic destination mode to physical when x2apic is
>  available
>   HYPERV/IOMMU: Add Hyper-V stub IOMMU driver
>   MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS
> scope
> 

Joerg -- What's your take on this patch set now that it has settled down?  If
you are good with it, from the Microsoft standpoint we're hoping that it
can get into linux-next this week (given the extra week due to 5.0-rc8).

Thanks,

Michael Kelley





___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V5 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

2019-02-22 Thread Michael Kelley via iommu
From: tianyu@microsoft.com  Sent: Friday, 
February 22, 2019 4:12 AM
> 
> On the bare metal, enabling X2APIC mode requires interrupt remapping
> function which helps to deliver irq to cpu with 32-bit APIC ID.
> Hyper-V doesn't provide interrupt remapping function so far and Hyper-V
> MSI protocol already supports to deliver interrupt to the CPU whose
> virtual processor index is more than 255. IO-APIC interrupt still has
> 8-bit APIC ID limitation.
> 
> This patch is to add Hyper-V stub IOMMU driver in order to enable
> X2APIC mode successfully in Hyper-V Linux guest. The driver returns X2APIC
> interrupt remapping capability when X2APIC mode is available. Otherwise,
> it creates a Hyper-V irq domain to limit IO-APIC interrupts' affinity
> and make sure cpus assigned with IO-APIC interrupt have 8-bit APIC ID.
> 
> Define 24 IO-APIC remapping entries because Hyper-V only expose one
> single IO-APIC and one IO-APIC has 24 pins according to the IO-APIC spec(
> https://pdos.csail.mit.edu/6.828/2016/readings/ia32/ioapic.pdf).
> 
> Signed-off-by: Lan Tianyu 
> ---
> Change since v4:
>- Fix the loop of scan cpu's APIC id
> 
> Change since v3:
>- Make Hyper-V IOMMU as Hyper-V default driver
>- Fix hypervisor_is_type() input parameter
>- Check possible cpu numbers during scan 0~255 cpu's apic id.
> 
> Change since v2:
>- Improve comment about why save IO-APIC entry in the irq chip data.
>- Some code improvement.
>- Improve statement in the IOMMU Kconfig.
> 
> Change since v1:
>   - Remove unused pr_fmt
>   - Make ioapic_ir_domain as static variable
>   - Remove unused variables cfg and entry in the 
> hyperv_irq_remapping_alloc()
>   - Fix comments

Reviewed-by: Michael Kelley 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V4 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

2019-02-21 Thread Michael Kelley via iommu
From: lantianyu1...@gmail.com  Sent: Monday, February 
11, 2019 6:20 AM
> + /*
> +  * Hyper-V doesn't provide irq remapping function for
> +  * IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
> +  * Cpu's APIC ID is read from ACPI MADT table and APIC IDs
> +  * in the MADT table on Hyper-v are sorted monotonic increasingly.
> +  * APIC ID reflects cpu topology. There maybe some APIC ID
> +  * gaps when cpu number in a socket is not power of two. Prepare
> +  * max cpu affinity for IOAPIC irqs. Scan cpu 0-255 and set cpu
> +  * into ioapic_max_cpumask if its APIC ID is less than 256.
> +  */
> + for (i = min_t(unsigned int, num_possible_cpus(), 255); i >= 0; i--)

The above isn't quite right.  For example, if num_possible_cpus() is 8,
then the loop will be executed 9 times, for values 8 down through 0.
It should be executed for values 7 down through 0.

> + if (cpu_physical_id(i) < 256)
> + cpumask_set_cpu(i, _max_cpumask);
> +
> + return 0;
> +}

Michael
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V4 1/3] x86/Hyper-V: Set x2apic destination mode to physical when x2apic is available

2019-02-21 Thread Michael Kelley via iommu
From: lantianyu1...@gmail.com  Sent: Monday, February 
11, 2019 6:20 AM
> 
> Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
> set x2apic destination mode to physical mode when x2apic is available
> and Hyper-V IOMMU driver makes sure cpus assigned with IO-APIC irqs have
> 8-bit APIC id.
> 
> Reviewed-by: Thomas Gleixner 
> Signed-off-by: Lan Tianyu 
> ---
> Change since v2:
>- Fix compile error due to x2apic_phys
>- Fix comment indent
> Change since v1:
>- Remove redundant extern for x2apic_phys
> 
Reviewed-by: Michael Kelley 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V2 1/3] x86/Hyper-V: Set x2apic destination mode to physical when x2apic is available

2019-02-03 Thread Michael Kelley via iommu
From: lantianyu1...@gmail.com   Sent: Saturday, 
February 2, 2019 5:15 AM
> 
> Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
> set x2apic destination mode to physical mode when x2apic is available
> and Hyper-V IOMMU driver makes sure cpus assigned with IO-APIC irqs have
> 8-bit APIC id.
> 
> Signed-off-by: Lan Tianyu 
> ---
> Change since v1:
>- Remove redundant extern for x2apic_phys
> ---
>  arch/x86/kernel/cpu/mshyperv.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index e81a2db..4bd6d90 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -328,6 +328,16 @@ static void __init ms_hyperv_init_platform(void)
>  # ifdef CONFIG_SMP
>   smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
>  # endif
> +
> +/*
> + * Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
> + * set x2apic destination mode to physical mode when x2apic is available
> + * and Hyper-V IOMMU driver makes sure cpus assigned with IO-APIC irqs
> + * have 8-bit APIC id.
> + */

Per comment from Dan Carpenter on v1 of this patch, the above comment
block should be indented one tab to line up with the "if" statement below.

Michael

> + if (IS_ENABLED(CONFIG_HYPERV_IOMMU) && x2apic_supported())
> + x2apic_phys = 1;
> +
>  #endif
>  }
> 
> --
> 2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V2 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

2019-02-03 Thread Michael Kelley via iommu
From: lantianyu1...@gmail.com  Sent: Saturday, 
February 2, 2019 5:15 AM

I have a couple more comments 

> 
> +config HYPERV_IOMMU
> + bool "Hyper-V IRQ Remapping Support"
> + depends on HYPERV
> + select IOMMU_API
> + help
> + Hyper-V stub IOMMU driver provides IRQ Remapping capability
> + to run Linux guest with X2APIC mode on Hyper-V.
> +
> +

I'm a little concerned about the terminology here.  The comments and
commit messages for these patches all say that Hyper-V guests don't
have interrupt remapping support.  And we don't really *need* interrupt
remapping support because all the interrupts that should be nicely spread
out across all vCPUs (i.e., the MSI interrupts for PCI pass-thru devices) are
handled via a hypercall in pci-hyperv.c, and not via the virtual IOAPIC.  So
we have this stub IOMMU driver that doesn't actually do interrupt remapping.
It just prevents assigning the very small number of non-performance sensitive
IOAPIC interrupts to a CPU with an APIC ID above 255.

With that background, describing this feature as "Hyper-V IRQ Remapping
Support" seems incorrect, and similarly in the "help" description.  Finding
good terminology for this situation is hard.  But how about narrowing the
focus to x2APIC handling:

bool "Hyper-V x2APIC IRQ Handling"
...
help
   Stub IOMMU driver to handle IRQs so as to allow Hyper-V Linux
   guests to run with x2APIC mode enabled


> +static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
> +  unsigned int virq, unsigned int nr_irqs,
> +  void *arg)
> +{
> + struct irq_alloc_info *info = arg;
> + struct irq_data *irq_data;
> + struct irq_desc *desc;
> + int ret = 0;
> +
> + if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
> + return -EINVAL;
> +
> + ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
> + if (ret < 0)
> + return ret;
> +
> + irq_data = irq_domain_get_irq_data(domain, virq);
> + if (!irq_data) {
> + irq_domain_free_irqs_common(domain, virq, nr_irqs);
> + return -EINVAL;
> + }
> +
> + irq_data->chip = &hyperv_ir_chip;
> +
> + /*
> +  * IOAPIC entry pointer is saved in chip_data to allow
> +  * hyperv_irq_remappng_activate()/hyperv_ir_set_affinity() to set
> +  * vector and dest_apicid. cfg->vector and cfg->dest_apicid are
> +  * ignorred when IRQ remapping is enabled. See ioapic_configure_entry().
> +  */
> + irq_data->chip_data = info->ioapic_entry;
> +
> + /*
> +  * Hypver-V IO APIC irq affinity should be in the scope of
> +  * ioapic_max_cpumask because no irq remapping support.
> +  */
> + desc = irq_data_to_desc(irq_data);
> + cpumask_and(desc->irq_common_data.affinity,
> + desc->irq_common_data.affinity,
> + &ioapic_max_cpumask);

The intent of this cpumask_and() call is to ensure that IOAPIC interrupts
are initially assigned to a CPU with APIC ID < 256.   But do we know that
the initial value of desc->irq_common_data.affinity is such that the result
of the cpumask_and() will not be the empty set?  My impression is that
these local IOAPIC interrupts are assigned an initial affinity mask with all
CPUs set, in which case this will work just fine.  But I'm not sure if that
is guaranteed.

An alternative would be to set the affinity to ioapic_max_cpumask and
overwrite whatever might have been previously specified.  But I don't know
if that's really better.

> +
> + return 0;
> +}
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH V2 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

2019-02-03 Thread Michael Kelley via iommu
From: lantianyu1...@gmail.com  Sent: Saturday, 
February 2, 2019 5:15 AM
> 
> +/*
> + * According 82093AA IO-APIC spec , IO APIC has a 24-entry Interrupt
> + * Redirection Table.
> + */
> +#define IOAPIC_REMAPPING_ENTRY 24

The other unstated assumption here is that Hyper-V guest VMs
have only a single IOAPIC, regardless of the size of the VM.
Perhaps that should be stated in the comment explaining why
there are 24 entries?

> +
> +static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
> +static struct irq_domain *ioapic_ir_domain;
> +
> +static int hyperv_ir_set_affinity(struct irq_data *data,
> + const struct cpumask *mask, bool force)
> +{
> + struct irq_data *parent = data->parent_data;
> + struct irq_cfg *cfg = irqd_cfg(data);
> + struct IO_APIC_route_entry *entry;
> + cpumask_t cpumask;
> + int ret;
> +
> + cpumask_andnot(&cpumask, mask, &ioapic_max_cpumask);
> +
> + /* Return error If new irq affinity is out of ioapic_max_cpumask. */
> + if (!cpumask_empty(&cpumask))
> + return -EINVAL;

The above two cpumask functions can be combined in a single
call to cpumask_subset().  This has the nice property that determining
whether the cpus in "mask" are a subset of the cpus in "ioapic_max_cpumask"
is exactly what this code is trying to do. :-)  And it gets rid of the local
cpumask variable and the associated compiler warnings about stack frame
size.

> +
> + ret = parent->chip->irq_set_affinity(parent, mask, force);
> + if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
> + return ret;
> +
> + entry = data->chip_data;
> + entry->dest = cfg->dest_apicid;
> + entry->vector = cfg->vector;
> + send_cleanup_vector(cfg);
> +
> + return 0;
> +}
> +
> +static struct irq_chip hyperv_ir_chip = {
> + .name   = "HYPERV-IR",
> + .irq_ack= apic_ack_irq,
> + .irq_set_affinity   = hyperv_ir_set_affinity,
> +};
> +
> +static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
> +  unsigned int virq, unsigned int nr_irqs,
> +  void *arg)
> +{
> + struct irq_alloc_info *info = arg;
> + struct irq_data *irq_data;
> + struct irq_desc *desc;
> + int ret = 0;
> +
> + if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
> + return -EINVAL;
> +
> + ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
> + if (ret < 0)
> + return ret;
> +
> + irq_data = irq_domain_get_irq_data(domain, virq);
> + if (!irq_data) {
> + irq_domain_free_irqs_common(domain, virq, nr_irqs);
> + return -EINVAL;
> + }
> +
> + irq_data->chip = &hyperv_ir_chip;
> +
> + /*
> +  * IOAPIC entry pointer is saved in chip_data to allow
> +  * hyperv_irq_remappng_activate()/hyperv_ir_set_affinity() to set
> +  * vector and dest_apicid. cfg->vector and cfg->dest_apicid are
> +  * ignorred when IRQ remapping is enabled. See ioapic_configure_entry().
 
Spelling: "ignored".

I saw Vitaly's previous comments, and I still don't understand this comment. :-(
Is IRQ remapping considered to be enabled by this IOMMU driver, such that
cfg->vector and cfg->dest_apicid are ignored?  Or is the "when IRQ remapping
is enabled" a statement about some future enhancement?

> +  */
> + irq_data->chip_data = info->ioapic_entry;
> +
> + /*
> +  * Hypver-V IO APIC irq affinity should be in the scope of
> +  * ioapic_max_cpumask because no irq remapping support.
> +  */
> + desc = irq_data_to_desc(irq_data);
> + cpumask_and(desc->irq_common_data.affinity,
> + desc->irq_common_data.affinity,
> + &ioapic_max_cpumask);
> +
> + return 0;
> +}
> +
> +static void hyperv_irq_remapping_free(struct irq_domain *domain,
> +  unsigned int virq, unsigned int nr_irqs)
> +{
> + irq_domain_free_irqs_common(domain, virq, nr_irqs);
> +}
> +
> +static int hyperv_irq_remappng_activate(struct irq_domain *domain,
> +   struct irq_data *irq_data, bool reserve)
> +{
> + struct irq_cfg *cfg = irqd_cfg(irq_data);
> + struct IO_APIC_route_entry *entry = irq_data->chip_data;
> +
> + entry->dest = cfg->dest_apicid;
> + entry->vector = cfg->vector;
> +
> + return 0;
> +}
> +
> +static struct irq_domain_ops hyperv_ir_domain_ops = {
> + .alloc = hyperv_irq_remapping_alloc,
> + .free = hyperv_irq_remapping_free,
> + .activate = hyperv_irq_remappng_activate,
> +};
> +
> +static int __init hyperv_prepare_irq_remapping(void)
> +{
> + struct fwnode_handle *fn;
> + u32 apic_id;
> + int i;
> +
> + if (x86_hyper_type != X86_HYPER_MS_HYPERV ||

The function hypervisor_is_type() exists for doing the above test.
See include/asm/hypervisor.h

> + !x2apic_supported())
> + return -ENODEV;
> +
> + fn =