Re: [Resend RFC PATCH V2 08/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM.

2021-04-15 Thread Tianyu Lan




On 4/14/2021 11:45 PM, Greg KH wrote:

On Wed, Apr 14, 2021 at 10:49:41AM -0400, Tianyu Lan wrote:

From: Tianyu Lan 

UIO HV driver should not be loaded in an isolation VM for security reasons.
Return -ENOTSUPP from hv_uio_probe() in an isolation VM.

Signed-off-by: Tianyu Lan 
---
  drivers/uio/uio_hv_generic.c | 5 +++++
  1 file changed, 5 insertions(+)

diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 0330ba99730e..678b021d66f8 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -29,6 +29,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "../hv/hyperv_vmbus.h"
  
@@ -241,6 +242,10 @@ hv_uio_probe(struct hv_device *dev,

void *ring_buffer;
int ret;
  
+	/* UIO driver should not be loaded in the isolation VM. */
+	if (hv_is_isolation_supported())
+		return -ENOTSUPP;
+
/* Communicating with host has to be via shared memory not hypercall */
if (!channel->offermsg.monitor_allocated) {
		dev_err(&dev->device, "vmbus channel requires hypercall\n");
--
2.25.1



Again you send out known-wrong patches?

:(

Sorry for the noise. Will fix this in the next version. I also think we should
make sure the user-space driver checks data from the host. This patch will be removed.


Re: [Resend RFC PATCH V2 08/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM.

2021-04-15 Thread Tianyu Lan

Hi Stephen:
Thanks for your review.


On 4/15/2021 12:17 AM, Stephen Hemminger wrote:

On Wed, 14 Apr 2021 17:45:51 +0200
Greg KH  wrote:


On Wed, Apr 14, 2021 at 10:49:41AM -0400, Tianyu Lan wrote:

From: Tianyu Lan 

UIO HV driver should not be loaded in an isolation VM for security reasons.
Return -ENOTSUPP from hv_uio_probe() in an isolation VM.

Signed-off-by: Tianyu Lan 


This is debatable: in isolation VMs, shouldn't userspace take responsibility
for validating host communication? If that is an issue, please work with
the DPDK community (the main user of this) to make sure the netvsc userspace
driver has the required checks.



Agreed. Will report back to the security team and request that the change
be made in the userspace netvsc driver. Thanks for the advice.


Re: [Resend RFC PATCH V2 11/12] HV/Netvsc: Add Isolation VM support for netvsc driver

2021-04-15 Thread Tianyu Lan




On 4/14/2021 11:50 PM, Christoph Hellwig wrote:

+struct dma_range {
+   dma_addr_t dma;
+   u32 mapping_size;
+};


That's a rather generic name that is bound to create a conflict sooner
or later.


Good point. Will update.




  #include "hyperv_net.h"
  #include "netvsc_trace.h"
+#include "../../hv/hyperv_vmbus.h"


Please move public interfaces out of the private header rather than doing
this.


OK. Will update.




+   if (hv_isolation_type_snp()) {
+   area = get_vm_area(buf_size, VM_IOREMAP);


Err, no.  get_vm_area is private for a reason.


+   if (!area)
+   goto cleanup;
+
+   vaddr = (unsigned long)area->addr;
+   for (i = 0; i < buf_size / HV_HYP_PAGE_SIZE; i++) {
+   extra_phys = (virt_to_hvpfn(net_device->recv_buf + i * HV_HYP_PAGE_SIZE)
+   << HV_HYP_PAGE_SHIFT) + ms_hyperv.shared_gpa_boundary;
+   ret |= ioremap_page_range(vaddr + i * HV_HYP_PAGE_SIZE,
+  vaddr + (i + 1) * HV_HYP_PAGE_SIZE,
+  extra_phys, PAGE_KERNEL_IO);
+   }
+
+   if (ret)
+   goto cleanup;


And this is not something a driver should ever do.  I think you are badly
reimplementing functionality that should be in the dma coherent allocator
here.


OK. I will try hiding these in the Hyper-V dma ops callback. Thanks.


Re: [Resend RFC PATCH V2 04/12] HV: Add Write/Read MSR registers via ghcb

2021-04-15 Thread Tianyu Lan

On 4/14/2021 11:41 PM, Christoph Hellwig wrote:

+EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);


Just curious, who is going to use all these exports?  These seems like
extremely low-level functionality.  Isn't there a way to build a more
useful higher level API?



Yes, will remove it.



Re: [Resend RFC PATCH V2 03/12] x86/Hyper-V: Add new hvcall guest address host visibility support

2021-04-15 Thread Tianyu Lan

Hi Christoph:
Thanks for your review.

On 4/14/2021 11:40 PM, Christoph Hellwig wrote:

+/*
+ * hv_set_mem_host_visibility - Set host visibility for specified memory.
+ */


I don't think this comment really clarifies anything over the function
name.  What is 'host visibility'?


OK. Will update the comment.




+int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)


Should size be a size_t?
Should visibility be an enum of some kind?



Will update.


+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)


Not sure what this does either.


Will add a comment.




+   local_irq_save(flags);
+   input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)


Is there a chance we could find a shorter but still descriptive
name for this variable?  Why do we need the cast?


Sure. The cast avoids a build error due to "incompatible-pointer-types".



+#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
+#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)


pointlessly overlong line.



Re: [RFC V2 PATCH 8/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM.

2021-04-14 Thread Tianyu Lan

Hi Greg:
Thanks for your review.

On 4/14/2021 12:00 AM, Greg KH wrote:

On Tue, Apr 13, 2021 at 11:22:13AM -0400, Tianyu Lan wrote:

From: Tianyu Lan 

UIO HV driver should not load in the isolation VM for security reason.


Why?  I need a lot more excuse than that.


The reason is that the ring buffers have been marked as visible to the host.
The UIO driver exposes these buffers to user space, and the user-space
driver does not perform security checks on data from the host. This
is considered insecure in an isolation VM.



Why would the vm allow UIO devices to bind to it if it was not possible?
Shouldn't the VM be handling this type of logic and not forcing all
individual hyperv drivers to do this?

This feels wrong...


The hypervisor exposes network and storage devices but can't prohibit the
guest from binding these devices to the UIO driver.

You are right. This should not be handled in the individual driver; I will
try handling it at the vmbus driver level.





thanks,

greg k-h



[Resend RFC PATCH V2 12/12] HV/Storvsc: Add Isolation VM support for storvsc driver

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via hvcall. vmbus_establish_gpadl() has already done
this for the netvsc rx/tx ring buffer. The page buffers used by
vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA API to
map/unmap this memory when sending/receiving packets, and the Hyper-V DMA
ops callback will use swiotlb functions to allocate a bounce buffer and
copy data from/to the bounce buffer.

Signed-off-by: Tianyu Lan 
---
 drivers/scsi/storvsc_drv.c | 67 +-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 2e4fa77445fd..d271578b1811 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -414,6 +416,11 @@ static void storvsc_on_channel_callback(void *context);
 #define STORVSC_IDE_MAX_TARGETS1
 #define STORVSC_IDE_MAX_CHANNELS   1
 
+struct dma_range {
+   dma_addr_t dma;
+   u32 mapping_size;
+};
+
 struct storvsc_cmd_request {
struct scsi_cmnd *cmd;
 
@@ -427,6 +434,8 @@ struct storvsc_cmd_request {
u32 payload_sz;
 
struct vstor_packet vstor_packet;
+   u32 hvpg_count;
+   struct dma_range *dma_range;
 };
 
 
@@ -1236,6 +1245,7 @@ static void storvsc_on_channel_callback(void *context)
const struct vmpacket_descriptor *desc;
struct hv_device *device;
struct storvsc_device *stor_device;
+   int i;
 
if (channel->primary_channel != NULL)
device = channel->primary_channel->device_obj;
@@ -1249,6 +1259,8 @@ static void storvsc_on_channel_callback(void *context)
foreach_vmbus_pkt(desc, channel) {
void *packet = hv_pkt_data(desc);
struct storvsc_cmd_request *request;
+   enum dma_data_direction dir;
+   u32 attrs;
u64 cmd_rqst;
 
		cmd_rqst = vmbus_request_addr(&channel->requestor,
@@ -1261,6 +1273,22 @@ static void storvsc_on_channel_callback(void *context)
 
request = (struct storvsc_cmd_request *)(unsigned long)cmd_rqst;
 
+		if (request->vstor_packet.vm_srb.data_in == READ_TYPE)
+			dir = DMA_FROM_DEVICE;
+		else
+			dir = DMA_TO_DEVICE;
+
+		if (request->dma_range) {
+			for (i = 0; i < request->hvpg_count; i++)
+				dma_unmap_page_attrs(&device->device,
+					request->dma_range[i].dma,
+					request->dma_range[i].mapping_size,
+					request->vstor_packet.vm_srb.data_in
+						== READ_TYPE ?
+					DMA_FROM_DEVICE : DMA_TO_DEVICE,
+					attrs);
+			kfree(request->dma_range);
+		}
+
		if (request == &stor_device->init_request ||
		    request == &stor_device->reset_request) {
			memcpy(&request->vstor_packet, packet,
@@ -1682,8 +1710,10 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
struct vmscsi_request *vm_srb;
struct scatterlist *cur_sgl;
struct vmbus_packet_mpb_array  *payload;
+   enum dma_data_direction dir;
u32 payload_sz;
u32 length;
+   u32 attrs;
 
if (vmstor_proto_version <= VMSTOR_PROTO_VERSION_WIN8) {
/*
@@ -1722,14 +1752,17 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
case DMA_TO_DEVICE:
vm_srb->data_in = WRITE_TYPE;
vm_srb->win8_extension.srb_flags |= SRB_FLAGS_DATA_OUT;
+   dir = DMA_TO_DEVICE;
break;
case DMA_FROM_DEVICE:
vm_srb->data_in = READ_TYPE;
vm_srb->win8_extension.srb_flags |= SRB_FLAGS_DATA_IN;
+   dir = DMA_FROM_DEVICE;
break;
case DMA_NONE:
vm_srb->data_in = UNKNOWN_TYPE;
vm_srb->win8_extension.srb_flags |= SRB_FLAGS_NO_DATA_TRANSFER;
+   dir = DMA_NONE;
break;
default:
/*
@@ -1786,6 +1819,12 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
hvpgoff = sgl->offset >> HV_HYP_PAGE_SHIFT;
 
cur_sgl = sgl;
+
+   cmd_request->dma_range = kzalloc(sizeof(struct dma_range) * hvpg_count,
+    GFP_ATOMIC);
+   if (!cmd_request->dma_range)
+   return -ENOMEM;
+
for (i 

[Resend RFC PATCH V2 11/12] HV/Netvsc: Add Isolation VM support for netvsc driver

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via hvcall. vmbus_establish_gpadl() has already done
this for the netvsc rx/tx ring buffer. The page buffers used by
vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA API to
map/unmap this memory when sending/receiving packets, and the Hyper-V DMA
ops callback will use swiotlb functions to allocate a bounce buffer and
copy data from/to the bounce buffer.

Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/hyperv_net.h   |  11 +++
 drivers/net/hyperv/netvsc.c   | 137 --
 drivers/net/hyperv/rndis_filter.c |   3 +
 3 files changed, 144 insertions(+), 7 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 2a87cfa27ac0..d85f811238c7 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -130,6 +130,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1026,6 +1027,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
u32 recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1034,6 +1036,8 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
+   u32 send_buf_size;
u32 send_buf_gpadl_handle;
u32 send_section_cnt;
u32 send_section_size;
@@ -1715,4 +1719,11 @@ struct rndis_message {
 #define TRANSPORT_INFO_IPV6_TCP 0x10
 #define TRANSPORT_INFO_IPV6_UDP 0x20
 
+struct dma_range {
+   dma_addr_t dma;
+   u32 mapping_size;
+};
+
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 2353623259f3..1a5f5be4eeea 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -26,6 +26,7 @@
 
 #include "hyperv_net.h"
 #include "netvsc_trace.h"
+#include "../../hv/hyperv_vmbus.h"
 
 /*
  * Switch the data path from the synthetic interface to the VF
@@ -119,8 +120,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   iounmap(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   iounmap(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }
+
kfree(nvdev->send_section_map);
 
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -302,9 +316,12 @@ static int netvsc_init_buf(struct hv_device *device,
struct nvsp_1_message_send_receive_buffer_complete *resp;
struct net_device *ndev = hv_get_drvdata(device);
struct nvsp_message *init_packet;
+   struct vm_struct *area;
+   u64 extra_phys;
unsigned int buf_size;
+   unsigned long vaddr;
size_t map_words;
-   int ret = 0;
+   int ret = 0, i;
 
/* Get receive buffer area. */
buf_size = device_info->recv_sections * device_info->recv_section_size;
@@ -340,6 +357,27 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   area = get_vm_area(buf_size, VM_IOREMAP);
+   if (!area)
+   goto cleanup;
+
+   vaddr = (unsigned long)area->addr;
+   for (i = 0; i < buf_size / HV_HYP_PAGE_SIZE; i++) {
+   extra_phys = (virt_to_hvpfn(net_device->recv_buf + i * HV_HYP_PAGE_SIZE)
+   << HV_HYP_PAGE_SHIFT) + ms_hyperv.shared_gpa_boundary;
+   ret |= ioremap_page_range(vaddr + i * HV_HYP_PAGE_SIZE,
+  vaddr + (i + 1) * HV_HYP_PAGE_SIZE,
+  extra_phys, PAGE_KERNEL_IO);
+   }
+
+   if (ret)
+   goto cleanup;
+
+   net_device->recv_original_buf = net_device->recv_buf;
+   net_device->recv_buf = (void *)vaddr;
+   }
+
/* Notify the NetVsp of the gpadl handle */
	init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -432,6 +470,28 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;

[Resend RFC PATCH V2 10/12] HV/IOMMU: Add Hyper-V dma ops support

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VMs require bounce buffer support. To use the swiotlb
bounce buffer, add Hyper-V dma ops and use swiotlb functions in the map
and unmap callbacks.

Allocate the bounce buffer in the Hyper-V code because the bounce buffer
needs to be accessed via an extra address space (e.g., addresses above
bit 39) in an AMD SEV-SNP based Isolation VM.

ioremap_cache() can't be used in hyperv_iommu_swiotlb_init(), which runs
too early, so the bounce buffer is remapped later in
hyperv_iommu_swiotlb_later_init().

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/pci-swiotlb.c |   3 +-
 drivers/hv/vmbus_drv.c|   3 +
 drivers/iommu/hyperv-iommu.c  | 127 ++
 3 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index c2cfa5e7c152..caaf68c06f24 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -15,6 +15,7 @@
 #include 
 
 int swiotlb __read_mostly;
+extern int hyperv_swiotlb;
 
 /*
  * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
@@ -68,7 +69,7 @@ void __init pci_swiotlb_init(void)
 void __init pci_swiotlb_late_init(void)
 {
/* An IOMMU turned us off. */
-   if (!swiotlb)
+   if (!swiotlb && !hyperv_swiotlb)
swiotlb_exit();
else {
printk(KERN_INFO "PCI-DMA: "
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 10dce9f91216..0ee6ec3a5de6 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -2030,6 +2031,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2070,6 +2072,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_mask = &vmbus_dma_mask;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..588ba847f0cc 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,19 +13,28 @@
 #include 
 #include 
 #include 
+#include 
 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "irq_remapping.h"
 
 #ifdef CONFIG_IRQ_REMAP
 
+int hyperv_swiotlb __read_mostly;
+
 /*
  * According 82093AA IO-APIC spec , IO APIC has a 24-entry Interrupt
  * Redirection Table. Hyper-V exposes one single IO-APIC and so define
@@ -36,6 +45,10 @@
 static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
 static struct irq_domain *ioapic_ir_domain;
 
+static unsigned long hyperv_io_tlb_start, *hyperv_io_tlb_end; 
+static unsigned long hyperv_io_tlb_nslabs, hyperv_io_tlb_size;
+static void *hyperv_io_tlb_remap;
+
 static int hyperv_ir_set_affinity(struct irq_data *data,
const struct cpumask *mask, bool force)
 {
@@ -337,4 +350,118 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
.free = hyperv_root_irq_remapping_free,
 };
 
+static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
+
+   if (!hv_is_isolation_supported())
+   return phys;
+
+   map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
+attrs);
+   if (map == (phys_addr_t)DMA_MAPPING_ERROR)
+   return DMA_MAPPING_ERROR;
+
+   return map;
+}
+
+static void hyperv_unmap_page(struct device *dev, dma_addr_t dev_addr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+   if (!hv_is_isolation_supported())
+   return;
+
+   swiotlb_tbl_unmap_single(dev, dev_addr, size, HV_HYP_PAGE_SIZE, dir,
+   attrs);
+}
+
+int __init hyperv_swiotlb_init(void)
+{
+   unsigned long bytes;
+   void *vstart = 0;
+
+   bytes = 200 * 1024 * 1024;
+   vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
+   hyperv_io_tlb_nslabs = bytes >> IO_TLB_SHIFT;
+   hyperv_io_tlb_size = bytes;
+
+   if (!vstart) {
+   pr_warn("Failed to allocate swiotlb.\n");
+   return -ENOMEM;
+   }
+
+   hyperv_io_tlb_start = virt_to_phys(vstart);
+   if (!hyperv_io_tlb_start)
+   panic("%s: Failed to allocate %lu bytes align=0x%lx.\n",
+ __func__, PAGE_ALIGN(bytes), PAGE_SIZE);
+
+ 

[Resend RFC PATCH V2 09/12] swiotlb: Add bounce buffer remap address setting function

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

For a Hyper-V isolation VM with AMD SEV-SNP, the bounce buffer (shared memory)
needs to be accessed via an extra address space (e.g., addresses above bit 39).
Hyper-V code may remap the extra address space outside of swiotlb. swiotlb_bounce()
needs to use the remapped virtual address to copy data from/to the bounce buffer.
Add a new interface, swiotlb_set_bounce_remap(), to do that.

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  5 +
 kernel/dma/swiotlb.c| 13 -
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d9c9fc9ca5d2..3ccd08116683 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -82,8 +82,13 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long new_size);
+void swiotlb_set_bounce_remap(unsigned char *vaddr);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
+static inline void swiotlb_set_bounce_remap(unsigned char *vaddr)
+{
+}
+
 static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 {
return false;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7c42df6e6100..5fd2db6aa149 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -94,6 +94,7 @@ static unsigned int io_tlb_index;
  * not be bounced (unless SWIOTLB_FORCE is set).
  */
 static unsigned int max_segment;
+static unsigned char *swiotlb_bounce_remap_addr;
 
 /*
  * We need to save away the original address corresponding to a mapped entry
@@ -421,6 +422,11 @@ void __init swiotlb_exit(void)
swiotlb_cleanup();
 }
 
+void swiotlb_set_bounce_remap(unsigned char *vaddr)
+{
+   swiotlb_bounce_remap_addr = vaddr;
+}
+
 /*
  * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
@@ -428,7 +434,12 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
   size_t size, enum dma_data_direction dir)
 {
unsigned long pfn = PFN_DOWN(orig_addr);
-   unsigned char *vaddr = phys_to_virt(tlb_addr);
+   unsigned char *vaddr;
+
+   if (swiotlb_bounce_remap_addr)
+   vaddr = swiotlb_bounce_remap_addr + tlb_addr - io_tlb_start;
+   else
+   vaddr = phys_to_virt(tlb_addr);
 
if (PageHighMem(pfn_to_page(pfn))) {
/* The buffer does not have a mapping.  Map it in and copy */
-- 
2.25.1



[Resend RFC PATCH V2 08/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM.

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

UIO HV driver should not be loaded in an isolation VM for security reasons.
Return -ENOTSUPP from hv_uio_probe() in an isolation VM.

Signed-off-by: Tianyu Lan 
---
 drivers/uio/uio_hv_generic.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 0330ba99730e..678b021d66f8 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../hv/hyperv_vmbus.h"
 
@@ -241,6 +242,10 @@ hv_uio_probe(struct hv_device *dev,
void *ring_buffer;
int ret;
 
+   /* UIO driver should not be loaded in the isolation VM. */
+   if (hv_is_isolation_supported())
+   return -ENOTSUPP;
+
/* Communicating with host has to be via shared memory not hypercall */
if (!channel->offermsg.monitor_allocated) {
		dev_err(&dev->device, "vmbus channel requires hypercall\n");
-- 
2.25.1



[Resend RFC PATCH V2 07/12] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

VMbus ring buffers are shared with the host and need to be
accessed via the extra address space of an Isolation VM with
SNP support. This patch maps the ring buffer
address into the extra address space via ioremap(). The HV host
visibility hvcall smears data in the ring buffer, so
reset the ring buffer memory to zero after calling the
visibility hvcall.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/channel.c  | 10 +
 drivers/hv/hyperv_vmbus.h |  2 +
 drivers/hv/ring_buffer.c  | 83 +--
 mm/ioremap.c  |  1 +
 mm/vmalloc.c  |  1 +
 5 files changed, 76 insertions(+), 21 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 407b74d72f3f..4a9fb7ad4c72 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -634,6 +634,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
if (err)
goto error_clean_ring;
 
+   err = hv_ringbuffer_post_init(&newchannel->outbound,
+ page, send_pages);
+   if (err)
+   goto error_free_gpadl;
+
+   err = hv_ringbuffer_post_init(&newchannel->inbound,
+ &page[send_pages], recv_pages);
+   if (err)
+   goto error_free_gpadl;
+
/* Create and init the channel open message */
open_info = kzalloc(sizeof(*open_info) +
   sizeof(struct vmbus_channel_open_channel),
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 0778add21a9c..d78a04ad5490 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
 /* Interface */
 
 void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+   struct page *pages, u32 page_cnt);
 
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
   struct page *pages, u32 pagecnt);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 35833d4d1a1d..c8b0f7b45158 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "hyperv_vmbus.h"
 
@@ -188,6 +190,44 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
	mutex_init(&channel->outbound.ring_buffer_mutex);
 }
 
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+  struct page *pages, u32 page_cnt)
+{
+   struct vm_struct *area;
+   u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
+   unsigned long vaddr;
+   int err = 0;
+
+   if (!hv_isolation_type_snp())
+   return 0;
+
+   physic_addr += ms_hyperv.shared_gpa_boundary;
+   area = get_vm_area((2 * page_cnt - 1) * PAGE_SIZE, VM_IOREMAP);
+   if (!area || !area->addr)
+   return -EFAULT;
+
+   vaddr = (unsigned long)area->addr;
+   err = ioremap_page_range(vaddr, vaddr + page_cnt * PAGE_SIZE,
+  physic_addr, PAGE_KERNEL_IO);
+   err |= ioremap_page_range(vaddr + page_cnt * PAGE_SIZE,
+ vaddr + (2 * page_cnt - 1) * PAGE_SIZE,
+ physic_addr + PAGE_SIZE, PAGE_KERNEL_IO);
+   if (err) {
+   vunmap((void *)vaddr);
+   return -EFAULT;
+   }
+
+   /* Clean memory after setting host visibility. */
+   memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
+
+   ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
+   ring_info->ring_buffer->read_index = 0;
+   ring_info->ring_buffer->write_index = 0;
+   ring_info->ring_buffer->feature_bits.value = 1;
+
+   return 0;
+}
+
 /* Initialize the ring buffer. */
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
   struct page *pages, u32 page_cnt)
@@ -197,33 +237,34 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 
BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
 
-   /*
-* First page holds struct hv_ring_buffer, do wraparound mapping for
-* the rest.
-*/
-   pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
-  GFP_KERNEL);
-   if (!pages_wraparound)
-   return -ENOMEM;
-
-   pages_wraparound[0] = pages;
-   for (i = 0; i < 2 * (page_cnt - 1); i++)
-   pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
+   if (!hv_isolation_type_snp()) {
+   /*
+* First page holds struct hv_ring_buffer, do wraparound
+* mapping for the rest.
+*/
+   pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
+  GFP_KERNEL);
+   if (!p

[Resend RFC PATCH V2 06/12] HV/Vmbus: Add SNP support for VMbus channel initiate message

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

The physical addresses of the monitor pages in the CHANNELMSG_INITIATE_CONTACT
msg should be in the extra address space for SNP support, and these
pages should also be accessed via the extra address space inside the Linux
guest; remap them with the ioremap function.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/connection.c   | 62 +++
 drivers/hv/hyperv_vmbus.h |  1 +
 2 files changed, 63 insertions(+)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 79bca653dce9..a0be9c11d737 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -101,6 +101,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 
msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
+
+   if (hv_isolation_type_snp()) {
+   msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
+   msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
+   }
+
msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
 
/*
@@ -145,6 +151,29 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
return -ECONNREFUSED;
}
 
+   if (hv_isolation_type_snp()) {
+   vmbus_connection.monitor_pages_va[0]
+   = vmbus_connection.monitor_pages[0];
+   vmbus_connection.monitor_pages[0]
+   = ioremap_cache(msg->monitor_page1, HV_HYP_PAGE_SIZE);
+   if (!vmbus_connection.monitor_pages[0])
+   return -ENOMEM;
+
+   vmbus_connection.monitor_pages_va[1]
+   = vmbus_connection.monitor_pages[1];
+   vmbus_connection.monitor_pages[1]
+   = ioremap_cache(msg->monitor_page2, HV_HYP_PAGE_SIZE);
+   if (!vmbus_connection.monitor_pages[1]) {
+   vunmap(vmbus_connection.monitor_pages[0]);
+   return -ENOMEM;
+   }
+
+   memset(vmbus_connection.monitor_pages[0], 0x00,
+  HV_HYP_PAGE_SIZE);
+   memset(vmbus_connection.monitor_pages[1], 0x00,
+  HV_HYP_PAGE_SIZE);
+   }
+
return ret;
 }
 
@@ -156,6 +185,7 @@ int vmbus_connect(void)
struct vmbus_channel_msginfo *msginfo = NULL;
int i, ret = 0;
__u32 version;
+   u64 pfn[2];
 
/* Initialize the vmbus connection */
vmbus_connection.conn_state = CONNECTING;
@@ -213,6 +243,16 @@ int vmbus_connect(void)
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   if (hv_mark_gpa_visibility(2, pfn,
+   VMBUS_PAGE_VISIBLE_READ_WRITE)) {
+   ret = -EFAULT;
+   goto cleanup;
+   }
+   }
+
msginfo = kzalloc(sizeof(*msginfo) +
  sizeof(struct vmbus_channel_initiate_contact),
  GFP_KERNEL);
@@ -279,6 +319,8 @@ int vmbus_connect(void)
 
 void vmbus_disconnect(void)
 {
+   u64 pfn[2];
+
/*
 * First send the unload request to the host.
 */
@@ -298,6 +340,26 @@ void vmbus_disconnect(void)
vmbus_connection.int_page = NULL;
}
 
+   if (hv_isolation_type_snp()) {
+   if (vmbus_connection.monitor_pages_va[0]) {
+   vunmap(vmbus_connection.monitor_pages[0]);
+   vmbus_connection.monitor_pages[0]
+   = vmbus_connection.monitor_pages_va[0];
+   vmbus_connection.monitor_pages_va[0] = NULL;
+   }
+
+   if (vmbus_connection.monitor_pages_va[1]) {
+   vunmap(vmbus_connection.monitor_pages[1]);
+   vmbus_connection.monitor_pages[1]
+   = vmbus_connection.monitor_pages_va[1];
+   vmbus_connection.monitor_pages_va[1] = NULL;
+   }
+
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
+   }
+
hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
vmbus_connection.monitor_pages[0] = NULL;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 9416e09ebd58..0778add21a9c 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -240,6 +240,7 @@ struct vmbus_connection {
  

[Resend RFC PATCH V2 05/12] HV: Add ghcb hvcall support for SNP VM

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides a ghcb hvcall to handle the VMBus
HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
messages in an SNP Isolation VM. Add such support.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/ivm.c   | 69 +
 arch/x86/include/asm/mshyperv.h |  1 +
 drivers/hv/connection.c |  6 ++-
 drivers/hv/hv.c |  8 +++-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 2ec64b367aaf..0ad73ea60c8f 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -18,8 +18,77 @@
 
 union hv_ghcb {
struct ghcb ghcb;
+   struct {
+   u64 hypercalldata[509];
+   u64 outputgpa;
+   union {
+   union {
+   struct {
+   u32 callcode: 16;
+   u32 isfast  : 1;
+   u32 reserved1   : 14;
+   u32 isnested: 1;
+   u32 countofelements : 12;
+   u32 reserved2   : 4;
+   u32 repstartindex   : 12;
+   u32 reserved3   : 4;
+   };
+   u64 asuint64;
+   } hypercallinput;
+   union {
+   struct {
+   u16 callstatus;
+   u16 reserved1;
+   u32 elementsprocessed : 12;
+   u32 reserved2 : 20;
+   };
+   u64 asunit64;
+   } hypercalloutput;
+   };
+   u64 reserved2;
+   } hypercall;
 } __packed __aligned(PAGE_SIZE);
 
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EFAULT;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return -EFAULT;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 1;
+
+   hv_ghcb->hypercall.outputgpa = (u64)output;
+   hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
+   hv_ghcb->hypercall.hypercallinput.callcode = control;
+
+   if (input_size)
+   memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
+
+   VMGEXIT();
+
+   hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
+   memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
+  sizeof(hv_ghcb->ghcb.save.valid_bitmap));
+
+   local_irq_restore(flags);
+
+   return hv_ghcb->hypercall.hypercalloutput.callstatus;
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
 void hv_ghcb_msr_write(u64 msr, u64 value)
 {
union hv_ghcb *hv_ghcb;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 73501dbbc240..929504fe8654 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -318,6 +318,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
 void hv_ghcb_msr_write(u64 msr, u64 value);
 void hv_ghcb_msr_read(u64 msr, u64 *value);
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 
 #define hv_get_synint_state_ghcb(int_num, val) \
hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index c83612cddb99..79bca653dce9 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -442,6 +442,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
 
++channel->sig_events;
 
-   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
+   if (hv_isolation_type_snp())
+   hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
+   NULL, sizeof(u64));
+   else
+   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
 }
 EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 069530eeb7c6..bff7c9049ffb 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -60,7 +60,13 @@ int hv_post_message(union hv_connection_id connection_id,
aligned_msg->payload_size = payload_size;
memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-   status = hv_do_hypercall(HVCALL_POST_MESSAGE, align

[Resend RFC PATCH V2 04/12] HV: Add Write/Read MSR registers via ghcb

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides a GHCB protocol to write Synthetic Interrupt
Controller MSR registers; these registers are emulated by
the hypervisor rather than the paravisor.

Hyper-V requires writing SINTx MSR registers twice (once via
GHCB and once via the wrmsr instruction, including the proxy bit 21).
The guest OS ID MSR also needs to be set via GHCB.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/hv_init.c   |  18 +
 arch/x86/hyperv/ivm.c   | 130 
 arch/x86/include/asm/mshyperv.h |  87 +
 arch/x86/kernel/cpu/mshyperv.c  |   3 +
 drivers/hv/hv.c |  65 +++-
 include/asm-generic/mshyperv.h  |   4 +-
 6 files changed, 261 insertions(+), 46 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 90e65fbf4c58..87b1dd9c84d6 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -475,6 +475,9 @@ void __init hyperv_init(void)
 
ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
*ghcb_base = ghcb_va;
+
+   /* Hyper-V requires writing the guest OS ID via GHCB in an SNP IVM. */
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -561,6 +564,7 @@ void hyperv_cleanup(void)
 
/* Reset our OS id */
wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
/*
 * Reset hypercall page reference before reset the page,
@@ -668,17 +672,3 @@ bool hv_is_hibernation_supported(void)
return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
 }
 EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
-
-enum hv_isolation_type hv_get_isolation_type(void)
-{
-   if (!(ms_hyperv.features_b & HV_ISOLATION))
-   return HV_ISOLATION_TYPE_NONE;
-   return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-bool hv_is_isolation_supported(void)
-{
-   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
-EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index a5950b7a9214..2ec64b367aaf 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -6,12 +6,139 @@
  *  Tianyu Lan 
  */
 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
+union hv_ghcb {
+   struct ghcb ghcb;
+} __packed __aligned(PAGE_SIZE);
+
+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 0;
+
+   ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
+   ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
+   ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 1);
+   ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
+
+   VMGEXIT();
+
+   if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0xffffffff) == 1)
+   pr_warn("Fail to write msr via ghcb.\n");
+
+   local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
+
+void hv_ghcb_msr_read(u64 msr, u64 *value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   memset(hv_ghcb, 0x00, PAGE_SIZE);
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 0;
+
+   ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 0);
+   ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
+
+   VMGEXIT();
+
+   if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0xffffffff) == 1)
+   pr_warn("Fail to read msr via ghcb.\n");
+   else
+   *value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
+   | ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
+   local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_msr_read);
+
+void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
+{
+   hv_ghcb_

[Resend RFC PATCH V2 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

"Resend all patches because someone in CC list didn't receive all
patchset. Sorry for nosy."

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host visibility hvcall and
the guest needs to call this new hvcall to mark memory visible to the
host before sharing memory with the host. For security, network/storage
stack memory should not be shared with the host, so bounce
buffers are required.

The VMBus channel ring buffer already plays a bounce buffer role because
all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the VMBus channel ring buffer visible.

There are two exceptions - packets sent by vmbus_sendpacket_
pagebuffer() and vmbus_sendpacket_mpb_desc(). These packets
contain IO stack memory addresses and the host will access this memory.
So add bounce buffer allocation support in VMBus for these packets.

For an SNP Isolation VM, the guest needs to access the shared memory via
an extra address space which is specified by the Hyper-V CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access the
shared memory should be the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

Tianyu Lan (12):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in Isolation VM
  x86/Hyper-V: Add new hvcall guest address host visibility support
  HV: Add Write/Read MSR registers via ghcb
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  UIO/Hyper-V: Not load UIO HV driver in the isolation VM.
  swiotlb: Add bounce buffer remap address setting function
  HV/IOMMU: Add Hyper-V dma ops support
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver

 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/hv_init.c  |  70 +--
 arch/x86/hyperv/ivm.c  | 289 +
 arch/x86/include/asm/hyperv-tlfs.h |  22 +++
 arch/x86/include/asm/mshyperv.h|  90 +++--
 arch/x86/kernel/cpu/mshyperv.c |   5 +
 arch/x86/kernel/pci-swiotlb.c  |   3 +-
 drivers/hv/channel.c   |  44 -
 drivers/hv/connection.c|  68 ++-
 drivers/hv/hv.c|  73 ++--
 drivers/hv/hyperv_vmbus.h  |   3 +
 drivers/hv/ring_buffer.c   |  83 ++---
 drivers/hv/vmbus_drv.c |   3 +
 drivers/iommu/hyperv-iommu.c   | 127 +
 drivers/net/hyperv/hyperv_net.h|  11 ++
 drivers/net/hyperv/netvsc.c| 137 +-
 drivers/net/hyperv/rndis_filter.c  |   3 +
 drivers/scsi/storvsc_drv.c |  67 ++-
 drivers/uio/uio_hv_generic.c   |   5 +
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h |  18 +-
 include/linux/hyperv.h |  12 +-
 include/linux/swiotlb.h|   5 +
 kernel/dma/swiotlb.c   |  13 +-
 mm/ioremap.c   |   1 +
 mm/vmalloc.c   |   1 +
 26 files changed, 1068 insertions(+), 88 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

-- 
2.25.1



[Resend RFC PATCH V2 03/12] x86/Hyper-V: Add new hvcall guest address host visibility support

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

Add new hvcall guest address host visibility support. Mark the VMBus
ring buffer visible to the host when creating the GPADL buffer and mark
it not visible again when tearing down the GPADL buffer.

Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/Makefile   |  2 +-
 arch/x86/hyperv/ivm.c  | 90 ++
 arch/x86/include/asm/hyperv-tlfs.h | 22 
 arch/x86/include/asm/mshyperv.h|  2 +
 drivers/hv/channel.c   | 34 ++-
 include/asm-generic/hyperv-tlfs.h  |  1 +
 include/linux/hyperv.h | 12 +++-
 7 files changed, 159 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y  := hv_init.o mmu.o nested.o irqdomain.o
+obj-y  := hv_init.o mmu.o nested.o irqdomain.o ivm.o
 obj-$(CONFIG_X86_64)   += hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index ..a5950b7a9214
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ *  Tianyu Lan 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * hv_set_mem_host_visibility - Set host visibility for specified memory.
+ */
+int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)
+{
+   int i, pfn;
+   int pagecount = size >> HV_HYP_PAGE_SHIFT;
+   u64 *pfn_array;
+   int ret = 0;
+
+   pfn_array = vzalloc(HV_HYP_PAGE_SIZE);
+   if (!pfn_array)
+   return -ENOMEM;
+
+   for (i = 0, pfn = 0; i < pagecount; i++) {
+   pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
+   pfn++;
+
+   if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
+   ret = hv_mark_gpa_visibility(pfn, pfn_array, visibility);
+   pfn = 0;
+
+   if (ret)
+   goto err_free_pfn_array;
+   }
+   }
+
+ err_free_pfn_array:
+   vfree(pfn_array);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(hv_set_mem_host_visibility);
+
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
+{
+   struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
+   struct hv_input_modify_sparse_gpa_page_host_visibility *input;
+   u16 pages_processed;
+   u64 hv_status;
+   unsigned long flags;
+
+   /* no-op if partition isolation is not enabled */
+   if (!hv_is_isolation_supported())
+   return 0;
+
+   if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+   pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+   HV_MAX_MODIFY_GPA_REP_COUNT);
+   return -EINVAL;
+   }
+
+   local_irq_save(flags);
+   input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
+   this_cpu_ptr(hyperv_pcpu_input_arg);
+   input = *input_pcpu;
+   if (unlikely(!input)) {
+   local_irq_restore(flags);
+   return -1;
+   }
+
+   input->partition_id = HV_PARTITION_ID_SELF;
+   input->host_visibility = visibility;
+   input->reserved0 = 0;
+   input->reserved1 = 0;
+   memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+   hv_status = hv_do_rep_hypercall(
+   HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+   0, input, &pages_processed);
+   local_irq_restore(flags);
+
+   if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+   return 0;
+
+   return hv_status & HV_HYPERCALL_RESULT_MASK;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index e6cd3fee562b..1f1ce9afb6f1 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -236,6 +236,15 @@ enum hv_isolation_type {
 /* TSC invariant control */
 #define HV_X64_MSR_TSC_INVARIANT_CONTROL   0x4118
 
+/* Hyper-V GPA map flags */
+#define HV_MAP_GPA_PERMISSIONS_NONE0x0
+#define HV_MAP_GPA_READABLE0x1
+#define HV_MAP_GPA_WRITABLE0x2
+
+#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
+#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)
+#define VMBUS_PAGE_NOT_VISIBLE HV_MAP_GPA_PERMISSIONS_NONE
+
 /*
  * Declare the MSR used to setup pages used to communicate with the hypervisor.
  */
@@ -564,4 +573,17 @@ enum hv_interrupt_type {
 

[Resend RFC PATCH V2 01/12] x86/HV: Initialize GHCB page in Isolation VM

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V exposes the GHCB page via the SEV-ES GHCB MSR for SNP guests
to communicate with the hypervisor. Map the GHCB page for all
CPUs to read/write MSR registers and submit hvcall requests
via GHCB.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/hv_init.c  | 52 +++---
 include/asm-generic/mshyperv.h |  1 +
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0db5137d5b81..90e65fbf4c58 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -82,6 +82,9 @@ static int hv_cpu_init(unsigned int cpu)
struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
void **input_arg;
struct page *pg;
+   u64 ghcb_gpa;
+   void *ghcb_va;
+   void **ghcb_base;
 
/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
@@ -128,6 +131,17 @@ static int hv_cpu_init(unsigned int cpu)
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
}
 
+   if (ms_hyperv.ghcb_base) {
+   rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+
+   ghcb_va = ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);
+   if (!ghcb_va)
+   return -ENOMEM;
+
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   *ghcb_base = ghcb_va;
+   }
+
return 0;
 }
 
@@ -223,6 +237,7 @@ static int hv_cpu_die(unsigned int cpu)
unsigned long flags;
void **input_arg;
void *pg;
+   void **ghcb_va = NULL;
 
local_irq_save(flags);
input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
@@ -236,6 +251,13 @@ static int hv_cpu_die(unsigned int cpu)
*output_arg = NULL;
}
 
+   if (ms_hyperv.ghcb_base) {
+   ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   if (*ghcb_va)
+   iounmap(*ghcb_va);
+   *ghcb_va = NULL;
+   }
+
local_irq_restore(flags);
 
free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
@@ -372,6 +394,9 @@ void __init hyperv_init(void)
u64 guest_id, required_msrs;
union hv_x64_msr_hypercall_contents hypercall_msr;
int cpuhp, i;
+   u64 ghcb_gpa;
+   void *ghcb_va;
+   void **ghcb_base;
 
if (x86_hyper_type != X86_HYPER_MS_HYPERV)
return;
@@ -432,9 +457,24 @@ void __init hyperv_init(void)
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
__builtin_return_address(0));
-   if (hv_hypercall_pg == NULL) {
-   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-   goto remove_cpuhp_state;
+   if (hv_hypercall_pg == NULL)
+   goto clean_guest_os_id;
+
+   if (hv_isolation_type_snp()) {
+   ms_hyperv.ghcb_base = alloc_percpu(void *);
+   if (!ms_hyperv.ghcb_base)
+   goto clean_guest_os_id;
+
+   rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+   ghcb_va = ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);
+   if (!ghcb_va) {
+   free_percpu(ms_hyperv.ghcb_base);
+   ms_hyperv.ghcb_base = NULL;
+   goto clean_guest_os_id;
+   }
+
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   *ghcb_base = ghcb_va;
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -499,7 +539,8 @@ void __init hyperv_init(void)
 
return;
 
-remove_cpuhp_state:
+clean_guest_os_id:
+   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
cpuhp_remove_state(cpuhp);
 free_vp_assist_page:
kfree(hv_vp_assist_page);
@@ -528,6 +569,9 @@ void hyperv_cleanup(void)
 */
hv_hypercall_pg = NULL;
 
+   if (ms_hyperv.ghcb_base)
+   free_percpu(ms_hyperv.ghcb_base);
+
/* Reset the hypercall page */
hypercall_msr.as_uint64 = 0;
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index dff58a3db5d5..c6f4c5c20fb8 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -35,6 +35,7 @@ struct ms_hyperv_info {
u32 max_lp_index;
u32 isolation_config_a;
u32 isolation_config_b;
+   void  __percpu **ghcb_base;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
-- 
2.25.1



[Resend RFC PATCH V2 02/12] x86/HV: Initialize shared memory boundary in Isolation VM

2021-04-14 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V exposes the shared memory boundary via the CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG and stores it in the
shared_gpa_boundary field of the ms_hyperv struct. This prepares
for sharing memory with the host for AMD SEV-SNP guests.

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/cpu/mshyperv.c |  2 ++
 include/asm-generic/mshyperv.h | 13 -
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e88bc296afca..aeafd4017c89 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -328,6 +328,8 @@ static void __init ms_hyperv_init_platform(void)
if (ms_hyperv.features_b & HV_ISOLATION) {
ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
+   ms_hyperv.shared_gpa_boundary =
+   (u64)1 << ms_hyperv.shared_gpa_boundary_bits;
 
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
0x%x\n",
ms_hyperv.isolation_config_a, 
ms_hyperv.isolation_config_b);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c6f4c5c20fb8..b73e201abc70 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -34,8 +34,19 @@ struct ms_hyperv_info {
u32 max_vp_index;
u32 max_lp_index;
u32 isolation_config_a;
-   u32 isolation_config_b;
+   union
+   {
+   u32 isolation_config_b;
+   struct {
+   u32 cvm_type : 4;
+   u32 Reserved11 : 1;
+   u32 shared_gpa_boundary_active : 1;
+   u32 shared_gpa_boundary_bits : 6;
+   u32 Reserved12 : 20;
+   };
+   };
void  __percpu **ghcb_base;
+   u64 shared_gpa_boundary;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
-- 
2.25.1



Re: [RFC V2 PATCH 9/12] swiotlb: Add bounce buffer remap address setting function

2021-04-14 Thread Tianyu Lan

On 4/14/2021 2:43 PM, Christoph Hellwig wrote:

On Tue, Apr 13, 2021 at 11:22:14AM -0400, Tianyu Lan wrote:

From: Tianyu Lan 

For a Hyper-V isolation VM with AMD SEV-SNP, the bounce buffer (shared memory)
needs to be accessed via an extra address space (e.g. addresses above bit 39).
Hyper-V code may remap the extra address space outside of swiotlb. swiotlb_bounce()
needs to use the remapped virtual address to copy data from/to the bounce buffer.
Add a new interface, swiotlb_set_bounce_remap(), to do that.


I have no way to review what this actually doing when you only Cc me
on a single patch.  Please make sure everyone is Cced on the whole
series to enable proper review.



Sure. I will resend all patches. Thanks for the reminder.


[RFC V2 PATCH 9/12] swiotlb: Add bounce buffer remap address setting function

2021-04-13 Thread Tianyu Lan
From: Tianyu Lan 

For a Hyper-V isolation VM with AMD SEV-SNP, the bounce buffer (shared memory)
needs to be accessed via an extra address space (e.g. addresses above bit 39).
Hyper-V code may remap the extra address space outside of swiotlb. swiotlb_bounce()
needs to use the remapped virtual address to copy data from/to the bounce buffer.
Add a new interface, swiotlb_set_bounce_remap(), to do that.

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  5 +
 kernel/dma/swiotlb.c| 13 -
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d9c9fc9ca5d2..3ccd08116683 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -82,8 +82,13 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long new_size);
+void swiotlb_set_bounce_remap(unsigned char *vaddr);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
+static inline void swiotlb_set_bounce_remap(unsigned char *vaddr)
+{
+}
+
 static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 {
return false;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7c42df6e6100..5fd2db6aa149 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -94,6 +94,7 @@ static unsigned int io_tlb_index;
  * not be bounced (unless SWIOTLB_FORCE is set).
  */
 static unsigned int max_segment;
+static unsigned char *swiotlb_bounce_remap_addr;
 
 /*
  * We need to save away the original address corresponding to a mapped entry
@@ -421,6 +422,11 @@ void __init swiotlb_exit(void)
swiotlb_cleanup();
 }
 
+void swiotlb_set_bounce_remap(unsigned char *vaddr)
+{
+   swiotlb_bounce_remap_addr = vaddr;
+}
+
 /*
  * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
@@ -428,7 +434,12 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
   size_t size, enum dma_data_direction dir)
 {
unsigned long pfn = PFN_DOWN(orig_addr);
-   unsigned char *vaddr = phys_to_virt(tlb_addr);
+   unsigned char *vaddr;
+
+   if (swiotlb_bounce_remap_addr)
+   vaddr = swiotlb_bounce_remap_addr + tlb_addr - io_tlb_start;
+   else
+   vaddr = phys_to_virt(tlb_addr);
 
if (PageHighMem(pfn_to_page(pfn))) {
/* The buffer does not have a mapping.  Map it in and copy */
-- 
2.25.1



[RFC V2 PATCH 7/12] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM

2021-04-13 Thread Tianyu Lan
From: Tianyu Lan 

The VMBus ring buffer is shared with the host and needs to
be accessed via the extra address space of an Isolation VM with
SNP support. This patch maps the ring buffer
address into the extra address space via ioremap(). The HV host
visibility hvcall smears data in the ring buffer, so
reset the ring buffer memory to zero after calling the
visibility hvcall.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/channel.c  | 10 +
 drivers/hv/hyperv_vmbus.h |  2 +
 drivers/hv/ring_buffer.c  | 83 +--
 mm/ioremap.c  |  1 +
 mm/vmalloc.c  |  1 +
 5 files changed, 76 insertions(+), 21 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 407b74d72f3f..4a9fb7ad4c72 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -634,6 +634,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
if (err)
goto error_clean_ring;
 
+   err = hv_ringbuffer_post_init(&newchannel->outbound,
+ page, send_pages);
+   if (err)
+   goto error_free_gpadl;
+
+   err = hv_ringbuffer_post_init(&newchannel->inbound,
+ &page[send_pages], recv_pages);
+   if (err)
+   goto error_free_gpadl;
+
/* Create and init the channel open message */
open_info = kzalloc(sizeof(*open_info) +
   sizeof(struct vmbus_channel_open_channel),
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 0778add21a9c..d78a04ad5490 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
 /* Interface */
 
 void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+   struct page *pages, u32 page_cnt);
 
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
   struct page *pages, u32 pagecnt);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 35833d4d1a1d..c8b0f7b45158 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "hyperv_vmbus.h"
 
@@ -188,6 +190,44 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
mutex_init(>outbound.ring_buffer_mutex);
 }
 
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+  struct page *pages, u32 page_cnt)
+{
+   struct vm_struct *area;
+   u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
+   unsigned long vaddr;
+   int err = 0;
+
+   if (!hv_isolation_type_snp())
+   return 0;
+
+   physic_addr += ms_hyperv.shared_gpa_boundary;
+   area = get_vm_area((2 * page_cnt - 1) * PAGE_SIZE, VM_IOREMAP);
+   if (!area || !area->addr)
+   return -EFAULT;
+
+   vaddr = (unsigned long)area->addr;
+   err = ioremap_page_range(vaddr, vaddr + page_cnt * PAGE_SIZE,
+  physic_addr, PAGE_KERNEL_IO);
+   err |= ioremap_page_range(vaddr + page_cnt * PAGE_SIZE,
+ vaddr + (2 * page_cnt - 1) * PAGE_SIZE,
+ physic_addr + PAGE_SIZE, PAGE_KERNEL_IO);
+   if (err) {
+   vunmap((void *)vaddr);
+   return -EFAULT;
+   }
+
+   /* Clean memory after setting host visibility. */
+   memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
+
+   ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
+   ring_info->ring_buffer->read_index = 0;
+   ring_info->ring_buffer->write_index = 0;
+   ring_info->ring_buffer->feature_bits.value = 1;
+
+   return 0;
+}
+
 /* Initialize the ring buffer. */
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
   struct page *pages, u32 page_cnt)
@@ -197,33 +237,34 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 
BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
 
-   /*
-* First page holds struct hv_ring_buffer, do wraparound mapping for
-* the rest.
-*/
-   pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
-  GFP_KERNEL);
-   if (!pages_wraparound)
-   return -ENOMEM;
-
-   pages_wraparound[0] = pages;
-   for (i = 0; i < 2 * (page_cnt - 1); i++)
-   pages_wraparound[i + 1] = [i % (page_cnt - 1) + 1];
+   if (!hv_isolation_type_snp()) {
+   /*
+* First page holds struct hv_ring_buffer, do wraparound mapping for
+* the rest.
+*/
+   pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
+  GFP_KERNEL);
+   if (!p

[RFC V2 PATCH 1/12] x86/HV: Initialize GHCB page in Isolation VM

2021-04-13 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V exposes the GHCB page via the SEV-ES GHCB MSR for SNP guests
to communicate with the hypervisor. Map the GHCB page for all
CPUs to read/write MSR registers and submit hvcall requests
via GHCB.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/hv_init.c  | 52 +++---
 include/asm-generic/mshyperv.h |  1 +
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0db5137d5b81..90e65fbf4c58 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -82,6 +82,9 @@ static int hv_cpu_init(unsigned int cpu)
struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
void **input_arg;
struct page *pg;
+   u64 ghcb_gpa;
+   void *ghcb_va;
+   void **ghcb_base;
 
/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
@@ -128,6 +131,17 @@ static int hv_cpu_init(unsigned int cpu)
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
}
 
+   if (ms_hyperv.ghcb_base) {
+   rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+
+   ghcb_va = ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);
+   if (!ghcb_va)
+   return -ENOMEM;
+
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   *ghcb_base = ghcb_va;
+   }
+
return 0;
 }
 
@@ -223,6 +237,7 @@ static int hv_cpu_die(unsigned int cpu)
unsigned long flags;
void **input_arg;
void *pg;
+   void **ghcb_va = NULL;
 
local_irq_save(flags);
input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
@@ -236,6 +251,13 @@ static int hv_cpu_die(unsigned int cpu)
*output_arg = NULL;
}
 
+   if (ms_hyperv.ghcb_base) {
+   ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   if (*ghcb_va)
+   iounmap(*ghcb_va);
+   *ghcb_va = NULL;
+   }
+
local_irq_restore(flags);
 
free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
@@ -372,6 +394,9 @@ void __init hyperv_init(void)
u64 guest_id, required_msrs;
union hv_x64_msr_hypercall_contents hypercall_msr;
int cpuhp, i;
+   u64 ghcb_gpa;
+   void *ghcb_va;
+   void **ghcb_base;
 
if (x86_hyper_type != X86_HYPER_MS_HYPERV)
return;
@@ -432,9 +457,24 @@ void __init hyperv_init(void)
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
__builtin_return_address(0));
-   if (hv_hypercall_pg == NULL) {
-   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-   goto remove_cpuhp_state;
+   if (hv_hypercall_pg == NULL)
+   goto clean_guest_os_id;
+
+   if (hv_isolation_type_snp()) {
+   ms_hyperv.ghcb_base = alloc_percpu(void *);
+   if (!ms_hyperv.ghcb_base)
+   goto clean_guest_os_id;
+
+   rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+   ghcb_va = ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);
+   if (!ghcb_va) {
+   free_percpu(ms_hyperv.ghcb_base);
+   ms_hyperv.ghcb_base = NULL;
+   goto clean_guest_os_id;
+   }
+
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   *ghcb_base = ghcb_va;
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -499,7 +539,8 @@ void __init hyperv_init(void)
 
return;
 
-remove_cpuhp_state:
+clean_guest_os_id:
+   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
cpuhp_remove_state(cpuhp);
 free_vp_assist_page:
kfree(hv_vp_assist_page);
@@ -528,6 +569,9 @@ void hyperv_cleanup(void)
 */
hv_hypercall_pg = NULL;
 
+   if (ms_hyperv.ghcb_base)
+   free_percpu(ms_hyperv.ghcb_base);
+
/* Reset the hypercall page */
hypercall_msr.as_uint64 = 0;
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index dff58a3db5d5..c6f4c5c20fb8 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -35,6 +35,7 @@ struct ms_hyperv_info {
u32 max_lp_index;
u32 isolation_config_a;
u32 isolation_config_b;
+   void  __percpu **ghcb_base;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
-- 
2.25.1



[RFC V2 PATCH 6/12] HV/Vmbus: Add SNP support for VMbus channel initiate message

2021-04-13 Thread Tianyu Lan
From: Tianyu Lan 

For SNP support, the physical addresses of the monitor pages in the
CHANNELMSG_INITIATE_CONTACT message should be in the extra address
space, and these pages should also be accessed via the extra address
space inside the Linux guest; remap them with the ioremap function.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/connection.c   | 62 +++
 drivers/hv/hyperv_vmbus.h |  1 +
 2 files changed, 63 insertions(+)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 79bca653dce9..a0be9c11d737 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -101,6 +101,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 
msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
+
+   if (hv_isolation_type_snp()) {
+   msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
+   msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
+   }
+
msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
 
/*
@@ -145,6 +151,29 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
return -ECONNREFUSED;
}
 
+   if (hv_isolation_type_snp()) {
+   vmbus_connection.monitor_pages_va[0]
+   = vmbus_connection.monitor_pages[0];
+   vmbus_connection.monitor_pages[0]
+   = ioremap_cache(msg->monitor_page1, HV_HYP_PAGE_SIZE);
+   if (!vmbus_connection.monitor_pages[0])
+   return -ENOMEM;
+
+   vmbus_connection.monitor_pages_va[1]
+   = vmbus_connection.monitor_pages[1];
+   vmbus_connection.monitor_pages[1]
+   = ioremap_cache(msg->monitor_page2, HV_HYP_PAGE_SIZE);
+   if (!vmbus_connection.monitor_pages[1]) {
+   vunmap(vmbus_connection.monitor_pages[0]);
+   return -ENOMEM;
+   }
+
+   memset(vmbus_connection.monitor_pages[0], 0x00,
+  HV_HYP_PAGE_SIZE);
+   memset(vmbus_connection.monitor_pages[1], 0x00,
+  HV_HYP_PAGE_SIZE);
+   }
+
return ret;
 }
 
@@ -156,6 +185,7 @@ int vmbus_connect(void)
struct vmbus_channel_msginfo *msginfo = NULL;
int i, ret = 0;
__u32 version;
+   u64 pfn[2];
 
/* Initialize the vmbus connection */
vmbus_connection.conn_state = CONNECTING;
@@ -213,6 +243,16 @@ int vmbus_connect(void)
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   if (hv_mark_gpa_visibility(2, pfn,
+   VMBUS_PAGE_VISIBLE_READ_WRITE)) {
+   ret = -EFAULT;
+   goto cleanup;
+   }
+   }
+
msginfo = kzalloc(sizeof(*msginfo) +
  sizeof(struct vmbus_channel_initiate_contact),
  GFP_KERNEL);
@@ -279,6 +319,8 @@ int vmbus_connect(void)
 
 void vmbus_disconnect(void)
 {
+   u64 pfn[2];
+
/*
 * First send the unload request to the host.
 */
@@ -298,6 +340,26 @@ void vmbus_disconnect(void)
vmbus_connection.int_page = NULL;
}
 
+   if (hv_isolation_type_snp()) {
+   if (vmbus_connection.monitor_pages_va[0]) {
+   vunmap(vmbus_connection.monitor_pages[0]);
+   vmbus_connection.monitor_pages[0]
+   = vmbus_connection.monitor_pages_va[0];
+   vmbus_connection.monitor_pages_va[0] = NULL;
+   }
+
+   if (vmbus_connection.monitor_pages_va[1]) {
+   vunmap(vmbus_connection.monitor_pages[1]);
+   vmbus_connection.monitor_pages[1]
+   = vmbus_connection.monitor_pages_va[1];
+   vmbus_connection.monitor_pages_va[1] = NULL;
+   }
+
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
+   }
+
hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
vmbus_connection.monitor_pages[0] = NULL;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 9416e09ebd58..0778add21a9c 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -240,6 +240,7 @@ struct vmbus_connection {
  

[RFC V2 PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support

2021-04-13 Thread Tianyu Lan
From: Tianyu Lan 


Hyper-V provides two kinds of Isolation VM: VBS (virtualization-based
security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall, and
the guest needs to call it to mark memory visible to the host before
sharing that memory with the host. For security, network/storage stack
memory should not be shared with the host, so bounce buffers are
required.

The VMBus channel ring buffer already plays the bounce buffer role,
because all data to/from the host is copied between the ring buffer
and IO stack memory. So mark the VMBus channel ring buffer visible to
the host.

There are two exceptions - packets sent by
vmbus_sendpacket_pagebuffer() and vmbus_sendpacket_mpb_desc(). These
packets contain IO stack memory addresses that the host will access.
So add Hyper-V DMA ops and use the DMA API in the netvsc and storvsc
drivers to allocate bounce buffers via the swiotlb interface.

For an SNP Isolation VM, the guest needs to access shared memory via
an extra address space, which is specified by the Hyper-V CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access
shared memory is the bounce buffer GPA plus the shared_gpa_boundary
reported by CPUID.

Change since v1:
   * Add DMA API support in the netvsc and storvsc driver.
   * Add Hyper-V DMA ops.
   * Add static branch for the check of isolation type snp.
   * Fix some code style comments.

Tianyu Lan (12):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in Isolation VM
  x86/Hyper-V: Add new hvcall guest address host visibility support
  HV: Add Write/Read MSR registers via ghcb
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  UIO/Hyper-V: Not load UIO HV driver in the isolation VM.
  swiotlb: Add bounce buffer remap address setting function
  HV/IOMMU: Add Hyper-V dma ops support
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver

 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/hv_init.c  |  70 +--
 arch/x86/hyperv/ivm.c  | 289 +
 arch/x86/include/asm/hyperv-tlfs.h |  22 +++
 arch/x86/include/asm/mshyperv.h|  90 +++--
 arch/x86/kernel/cpu/mshyperv.c |   5 +
 arch/x86/kernel/pci-swiotlb.c  |   3 +-
 drivers/hv/channel.c   |  44 -
 drivers/hv/connection.c|  68 ++-
 drivers/hv/hv.c|  73 ++--
 drivers/hv/hyperv_vmbus.h  |   3 +
 drivers/hv/ring_buffer.c   |  83 ++---
 drivers/hv/vmbus_drv.c |   3 +
 drivers/iommu/hyperv-iommu.c   | 127 +
 drivers/net/hyperv/hyperv_net.h|  11 ++
 drivers/net/hyperv/netvsc.c| 137 +-
 drivers/net/hyperv/rndis_filter.c  |   3 +
 drivers/scsi/storvsc_drv.c |  67 ++-
 drivers/uio/uio_hv_generic.c   |   5 +
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h |  18 +-
 include/linux/hyperv.h |  12 +-
 include/linux/swiotlb.h|   5 +
 kernel/dma/swiotlb.c   |  13 +-
 mm/ioremap.c   |   1 +
 mm/vmalloc.c   |   1 +
 26 files changed, 1068 insertions(+), 88 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

-- 
2.25.1



[RFC V2 PATCH 4/12] HV: Add Write/Read MSR registers via ghcb

2021-04-13 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides a GHCB protocol to write the Synthetic Interrupt
Controller MSR registers, which are emulated by the hypervisor rather
than the paravisor.

Hyper-V requires the SINTx MSR registers to be written twice (once via
GHCB and once via the wrmsr instruction, including proxy bit 21). The
Guest OS ID MSR also needs to be set via GHCB.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/hv_init.c   |  18 +
 arch/x86/hyperv/ivm.c   | 130 
 arch/x86/include/asm/mshyperv.h |  87 +
 arch/x86/kernel/cpu/mshyperv.c  |   3 +
 drivers/hv/hv.c |  65 +++-
 include/asm-generic/mshyperv.h  |   4 +-
 6 files changed, 261 insertions(+), 46 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 90e65fbf4c58..87b1dd9c84d6 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -475,6 +475,9 @@ void __init hyperv_init(void)
 
ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
*ghcb_base = ghcb_va;
+
+   /* Hyper-V requires the guest OS ID to be written via GHCB in an SNP IVM. */
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -561,6 +564,7 @@ void hyperv_cleanup(void)
 
/* Reset our OS id */
wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
/*
 * Reset hypercall page reference before reset the page,
@@ -668,17 +672,3 @@ bool hv_is_hibernation_supported(void)
return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
 }
 EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
-
-enum hv_isolation_type hv_get_isolation_type(void)
-{
-   if (!(ms_hyperv.features_b & HV_ISOLATION))
-   return HV_ISOLATION_TYPE_NONE;
-   return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-bool hv_is_isolation_supported(void)
-{
-   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
-EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index a5950b7a9214..2ec64b367aaf 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -6,12 +6,139 @@
  *  Tianyu Lan 
  */
 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
+union hv_ghcb {
+   struct ghcb ghcb;
+} __packed __aligned(PAGE_SIZE);
+
+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 0;
+
+   ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
+   ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
+   ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 1);
+   ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
+
+   VMGEXIT();
+
+   if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0xffffffff) == 1)
+   pr_warn("Failed to write msr via ghcb.\n");
+
+   local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
+
+void hv_ghcb_msr_read(u64 msr, u64 *value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   memset(hv_ghcb, 0x00, PAGE_SIZE);
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 0;
+
+   ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 0);
+   ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
+
+   VMGEXIT();
+
+   if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0xffffffff) == 1)
+   pr_warn("Failed to read msr via ghcb.\n");
+   else
+   *value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
+   | ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
+   local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_msr_read);
+
+void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
+{
+   hv_ghcb_

[RFC V2 PATCH 12/12] HV/Storvsc: Add Isolation VM support for storvsc driver

2021-04-13 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all memory shared with the host needs to be marked
visible to the host via hvcall. vmbus_establish_gpadl() has already
done this for the channel rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA API
to map/unmap this memory when sending/receiving packets, and the
Hyper-V DMA ops callbacks will use the swiotlb function to allocate
bounce buffers and copy data from/to them.

Signed-off-by: Tianyu Lan 
---
 drivers/scsi/storvsc_drv.c | 67 +-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 2e4fa77445fd..d271578b1811 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -414,6 +416,11 @@ static void storvsc_on_channel_callback(void *context);
 #define STORVSC_IDE_MAX_TARGETS1
 #define STORVSC_IDE_MAX_CHANNELS   1
 
+struct dma_range {
+   dma_addr_t dma;
+   u32 mapping_size;
+};
+
 struct storvsc_cmd_request {
struct scsi_cmnd *cmd;
 
@@ -427,6 +434,8 @@ struct storvsc_cmd_request {
u32 payload_sz;
 
struct vstor_packet vstor_packet;
+   u32 hvpg_count;
+   struct dma_range *dma_range;
 };
 
 
@@ -1236,6 +1245,7 @@ static void storvsc_on_channel_callback(void *context)
const struct vmpacket_descriptor *desc;
struct hv_device *device;
struct storvsc_device *stor_device;
+   int i;
 
if (channel->primary_channel != NULL)
device = channel->primary_channel->device_obj;
@@ -1249,6 +1259,8 @@ static void storvsc_on_channel_callback(void *context)
foreach_vmbus_pkt(desc, channel) {
void *packet = hv_pkt_data(desc);
struct storvsc_cmd_request *request;
+   enum dma_data_direction dir;
+   u32 attrs;
u64 cmd_rqst;
 
		cmd_rqst = vmbus_request_addr(&channel->requestor,
@@ -1261,6 +1273,22 @@ static void storvsc_on_channel_callback(void *context)
 
request = (struct storvsc_cmd_request *)(unsigned long)cmd_rqst;
 
+   if (request->vstor_packet.vm_srb.data_in == READ_TYPE)
+   dir = DMA_FROM_DEVICE;
+   else
+   dir = DMA_TO_DEVICE;
+
+   if (request->dma_range) {
+   for (i = 0; i < request->hvpg_count; i++)
+   dma_unmap_page_attrs(&device->device,
+   request->dma_range[i].dma,
+   request->dma_range[i].mapping_size,
+   request->vstor_packet.vm_srb.data_in
+   == READ_TYPE ?
+   DMA_FROM_DEVICE : DMA_TO_DEVICE, attrs);
+   kfree(request->dma_range);
+   }
+
		if (request == &stor_device->init_request ||
		    request == &stor_device->reset_request) {
memcpy(>vstor_packet, packet,
@@ -1682,8 +1710,10 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
struct vmscsi_request *vm_srb;
struct scatterlist *cur_sgl;
struct vmbus_packet_mpb_array  *payload;
+   enum dma_data_direction dir;
u32 payload_sz;
u32 length;
+   u32 attrs;
 
if (vmstor_proto_version <= VMSTOR_PROTO_VERSION_WIN8) {
/*
@@ -1722,14 +1752,17 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
case DMA_TO_DEVICE:
vm_srb->data_in = WRITE_TYPE;
vm_srb->win8_extension.srb_flags |= SRB_FLAGS_DATA_OUT;
+   dir = DMA_TO_DEVICE;
break;
case DMA_FROM_DEVICE:
vm_srb->data_in = READ_TYPE;
vm_srb->win8_extension.srb_flags |= SRB_FLAGS_DATA_IN;
+   dir = DMA_FROM_DEVICE;
break;
case DMA_NONE:
vm_srb->data_in = UNKNOWN_TYPE;
vm_srb->win8_extension.srb_flags |= SRB_FLAGS_NO_DATA_TRANSFER;
+   dir = DMA_NONE;
break;
default:
/*
@@ -1786,6 +1819,12 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
hvpgoff = sgl->offset >> HV_HYP_PAGE_SHIFT;
 
cur_sgl = sgl;
+
+   cmd_request->dma_range = kzalloc(sizeof(struct dma_range) * hvpg_count,
+				 GFP_ATOMIC);
+   if (!cmd_request->dma_range)
+   return -ENOMEM;
+
for (i 

[RFC V2 PATCH 2/12] x86/HV: Initialize shared memory boundary in Isolation VM

2021-04-13 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V exposes the shared memory boundary via the CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG and stores it in the shared_gpa_boundary
field of the ms_hyperv struct. This prepares for sharing memory with
the host in AMD SEV-SNP guests.

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/cpu/mshyperv.c |  2 ++
 include/asm-generic/mshyperv.h | 13 -
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e88bc296afca..aeafd4017c89 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -328,6 +328,8 @@ static void __init ms_hyperv_init_platform(void)
	if (ms_hyperv.features_b & HV_ISOLATION) {
		ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
		ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
+		ms_hyperv.shared_gpa_boundary =
+			(u64)1 << ms_hyperv.shared_gpa_boundary_bits;
 
		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c6f4c5c20fb8..b73e201abc70 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -34,8 +34,19 @@ struct ms_hyperv_info {
u32 max_vp_index;
u32 max_lp_index;
u32 isolation_config_a;
-   u32 isolation_config_b;
+   union
+   {
+   u32 isolation_config_b;
+   struct {
+   u32 cvm_type : 4;
+   u32 Reserved11 : 1;
+   u32 shared_gpa_boundary_active : 1;
+   u32 shared_gpa_boundary_bits : 6;
+   u32 Reserved12 : 20;
+   };
+   };
void  __percpu **ghcb_base;
+   u64 shared_gpa_boundary;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
-- 
2.25.1



[RFC V2 PATCH 10/12] HV/IOMMU: Add Hyper-V dma ops support

2021-04-13 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VM requires bounce buffer support. To use the
swiotlb bounce buffer, add Hyper-V DMA ops and use swiotlb functions
in the map and unmap callbacks.

Allocate the bounce buffer in the Hyper-V code because the bounce
buffer needs to be accessed via the extra address space (e.g.,
addresses above bit 39) in an AMD SEV-SNP based Isolation VM.

ioremap_cache() can't be used in hyperv_iommu_swiotlb_init(), which
runs too early, so the bounce buffer is remapped later in
hyperv_iommu_swiotlb_later_init().

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/pci-swiotlb.c |   3 +-
 drivers/hv/vmbus_drv.c|   3 +
 drivers/iommu/hyperv-iommu.c  | 127 ++
 3 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index c2cfa5e7c152..caaf68c06f24 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -15,6 +15,7 @@
 #include 
 
 int swiotlb __read_mostly;
+extern int hyperv_swiotlb;
 
 /*
  * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
@@ -68,7 +69,7 @@ void __init pci_swiotlb_init(void)
 void __init pci_swiotlb_late_init(void)
 {
/* An IOMMU turned us off. */
-   if (!swiotlb)
+   if (!swiotlb && !hyperv_swiotlb)
swiotlb_exit();
else {
printk(KERN_INFO "PCI-DMA: "
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 10dce9f91216..0ee6ec3a5de6 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -2030,6 +2031,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2070,6 +2072,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_mask = &vmbus_dma_mask;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..588ba847f0cc 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,19 +13,28 @@
 #include 
 #include 
 #include 
+#include 
 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "irq_remapping.h"
 
 #ifdef CONFIG_IRQ_REMAP
 
+int hyperv_swiotlb __read_mostly;
+
 /*
  * According 82093AA IO-APIC spec , IO APIC has a 24-entry Interrupt
  * Redirection Table. Hyper-V exposes one single IO-APIC and so define
@@ -36,6 +45,10 @@
 static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
 static struct irq_domain *ioapic_ir_domain;
 
+static unsigned long hyperv_io_tlb_start, *hyperv_io_tlb_end; 
+static unsigned long hyperv_io_tlb_nslabs, hyperv_io_tlb_size;
+static void *hyperv_io_tlb_remap;
+
 static int hyperv_ir_set_affinity(struct irq_data *data,
const struct cpumask *mask, bool force)
 {
@@ -337,4 +350,118 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
.free = hyperv_root_irq_remapping_free,
 };
 
+static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
+
+   if (!hv_is_isolation_supported())
+   return phys;
+
+   map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
+attrs);
+   if (map == (phys_addr_t)DMA_MAPPING_ERROR)
+   return DMA_MAPPING_ERROR;
+
+   return map;
+}
+
+static void hyperv_unmap_page(struct device *dev, dma_addr_t dev_addr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+   if (!hv_is_isolation_supported())
+   return;
+
+   swiotlb_tbl_unmap_single(dev, dev_addr, size, HV_HYP_PAGE_SIZE, dir,
+   attrs);
+}
+
+int __init hyperv_swiotlb_init(void)
+{
+   unsigned long bytes;
+   void *vstart = 0;
+
+   bytes = 200 * 1024 * 1024;
+   vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
+   hyperv_io_tlb_nslabs = bytes >> IO_TLB_SHIFT;
+   hyperv_io_tlb_size = bytes;
+
+   if (!vstart) {
+   pr_warn("Failed to allocate swiotlb buffer.\n");
+   return -ENOMEM;
+   }
+
+   hyperv_io_tlb_start = virt_to_phys(vstart);
+   if (!hyperv_io_tlb_start)
+   panic("%s: Failed to allocate %lu bytes align=0x%lx.\n",
+ __func__, PAGE_ALIGN(bytes), PAGE_SIZE);
+
+ 

[RFC V2 PATCH 5/12] HV: Add ghcb hvcall support for SNP VM

2021-04-13 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides a GHCB hvcall to handle the VMBus
HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE messages in an SNP
Isolation VM. Add such support.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/ivm.c   | 69 +
 arch/x86/include/asm/mshyperv.h |  1 +
 drivers/hv/connection.c |  6 ++-
 drivers/hv/hv.c |  8 +++-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 2ec64b367aaf..0ad73ea60c8f 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -18,8 +18,77 @@
 
 union hv_ghcb {
struct ghcb ghcb;
+   struct {
+   u64 hypercalldata[509];
+   u64 outputgpa;
+   union {
+   union {
+   struct {
+   u32 callcode: 16;
+   u32 isfast  : 1;
+   u32 reserved1   : 14;
+   u32 isnested: 1;
+   u32 countofelements : 12;
+   u32 reserved2   : 4;
+   u32 repstartindex   : 12;
+   u32 reserved3   : 4;
+   };
+   u64 asuint64;
+   } hypercallinput;
+   union {
+   struct {
+   u16 callstatus;
+   u16 reserved1;
+   u32 elementsprocessed : 12;
+   u32 reserved2 : 20;
+   };
+   u64 asunit64;
+   } hypercalloutput;
+   };
+   u64 reserved2;
+   } hypercall;
 } __packed __aligned(PAGE_SIZE);
 
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EFAULT;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return -EFAULT;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 1;
+
+   hv_ghcb->hypercall.outputgpa = (u64)output;
+   hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
+   hv_ghcb->hypercall.hypercallinput.callcode = control;
+
+   if (input_size)
+   memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
+
+   VMGEXIT();
+
+   hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
+   memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
+  sizeof(hv_ghcb->ghcb.save.valid_bitmap));
+
+   local_irq_restore(flags);
+
+   return hv_ghcb->hypercall.hypercalloutput.callstatus;
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
 void hv_ghcb_msr_write(u64 msr, u64 value)
 {
union hv_ghcb *hv_ghcb;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 73501dbbc240..929504fe8654 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -318,6 +318,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
 void hv_ghcb_msr_write(u64 msr, u64 value);
 void hv_ghcb_msr_read(u64 msr, u64 *value);
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 
 #define hv_get_synint_state_ghcb(int_num, val) \
hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index c83612cddb99..79bca653dce9 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -442,6 +442,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
 
++channel->sig_events;
 
-   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
+   if (hv_isolation_type_snp())
+   hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
+   NULL, sizeof(u64));
+   else
+   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
 }
 EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 069530eeb7c6..bff7c9049ffb 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -60,7 +60,13 @@ int hv_post_message(union hv_connection_id connection_id,
aligned_msg->payload_size = payload_size;
memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-   status = hv_do_hypercall(HVCALL_POST_MESSAGE, align

[RFC V2 PATCH 11/12] HV/Netvsc: Add Isolation VM support for netvsc driver

2021-04-13 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all memory shared with the host needs to be marked
visible to the host via hvcall. vmbus_establish_gpadl() has already
done this for the netvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA
API to map/unmap this memory when sending/receiving packets, and the
Hyper-V DMA ops callbacks will use the swiotlb function to allocate
bounce buffers and copy data from/to them.

Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/hyperv_net.h   |  11 +++
 drivers/net/hyperv/netvsc.c   | 137 --
 drivers/net/hyperv/rndis_filter.c |   3 +
 3 files changed, 144 insertions(+), 7 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 2a87cfa27ac0..d85f811238c7 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -130,6 +130,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1026,6 +1027,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
u32 recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1034,6 +1036,8 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
+   u32 send_buf_size;
u32 send_buf_gpadl_handle;
u32 send_section_cnt;
u32 send_section_size;
@@ -1715,4 +1719,11 @@ struct rndis_message {
 #define TRANSPORT_INFO_IPV6_TCP 0x10
 #define TRANSPORT_INFO_IPV6_UDP 0x20
 
+struct dma_range {
+   dma_addr_t dma;
+   u32 mapping_size;
+};
+
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 2353623259f3..1a5f5be4eeea 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -26,6 +26,7 @@
 
 #include "hyperv_net.h"
 #include "netvsc_trace.h"
+#include "../../hv/hyperv_vmbus.h"
 
 /*
  * Switch the data path from the synthetic interface to the VF
@@ -119,8 +120,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   iounmap(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   iounmap(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }
+
kfree(nvdev->send_section_map);
 
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -302,9 +316,12 @@ static int netvsc_init_buf(struct hv_device *device,
struct nvsp_1_message_send_receive_buffer_complete *resp;
struct net_device *ndev = hv_get_drvdata(device);
struct nvsp_message *init_packet;
+   struct vm_struct *area;
+   u64 extra_phys;
unsigned int buf_size;
+   unsigned long vaddr;
size_t map_words;
-   int ret = 0;
+   int ret = 0, i;
 
/* Get receive buffer area. */
buf_size = device_info->recv_sections * device_info->recv_section_size;
@@ -340,6 +357,27 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   area = get_vm_area(buf_size, VM_IOREMAP);
+   if (!area)
+   goto cleanup;
+
+   vaddr = (unsigned long)area->addr;
+   for (i = 0; i < buf_size / HV_HYP_PAGE_SIZE; i++) {
+   extra_phys = (virt_to_hvpfn(net_device->recv_buf + i * HV_HYP_PAGE_SIZE)
+   << HV_HYP_PAGE_SHIFT) + ms_hyperv.shared_gpa_boundary;
+   ret |= ioremap_page_range(vaddr + i * HV_HYP_PAGE_SIZE,
+  vaddr + (i + 1) * HV_HYP_PAGE_SIZE,
+  extra_phys, PAGE_KERNEL_IO);
+   }
+
+   if (ret)
+   goto cleanup;
+
+   net_device->recv_original_buf = net_device->recv_buf;
+   net_device->recv_buf = (void *)vaddr;
+   }
+
/* Notify the NetVsp of the gpadl handle */
	init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -432,6 +470,28 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;

[RFC V2 PATCH 3/12] x86/Hyper-V: Add new hvcall guest address host visibility support

2021-04-13 Thread Tianyu Lan
From: Tianyu Lan 

Add new hvcall guest address host visibility support. Mark the VMBus
ring buffer visible to the host when creating the GPADL buffer, and
mark it not visible again when tearing the GPADL buffer down.

Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/Makefile   |  2 +-
 arch/x86/hyperv/ivm.c  | 90 ++
 arch/x86/include/asm/hyperv-tlfs.h | 22 
 arch/x86/include/asm/mshyperv.h|  2 +
 drivers/hv/channel.c   | 34 ++-
 include/asm-generic/hyperv-tlfs.h  |  1 +
 include/linux/hyperv.h | 12 +++-
 7 files changed, 159 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y  := hv_init.o mmu.o nested.o irqdomain.o
+obj-y  := hv_init.o mmu.o nested.o irqdomain.o ivm.o
 obj-$(CONFIG_X86_64)   += hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index ..a5950b7a9214
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ *  Tianyu Lan 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * hv_set_mem_host_visibility - Set host visibility for specified memory.
+ */
+int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)
+{
+   int i, pfn;
+   int pagecount = size >> HV_HYP_PAGE_SHIFT;
+   u64 *pfn_array;
+   int ret = 0;
+
+   pfn_array = vzalloc(HV_HYP_PAGE_SIZE);
+   if (!pfn_array)
+   return -ENOMEM;
+
+   for (i = 0, pfn = 0; i < pagecount; i++) {
+   pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
+   pfn++;
+
+   if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
+   ret = hv_mark_gpa_visibility(pfn, pfn_array, visibility);
+   pfn = 0;
+
+   if (ret)
+   goto err_free_pfn_array;
+   }
+   }
+
+ err_free_pfn_array:
+   vfree(pfn_array);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(hv_set_mem_host_visibility);
+
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
+{
+   struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
+   struct hv_input_modify_sparse_gpa_page_host_visibility *input;
+   u16 pages_processed;
+   u64 hv_status;
+   unsigned long flags;
+
+   /* no-op if partition isolation is not enabled */
+   if (!hv_is_isolation_supported())
+   return 0;
+
+   if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+   pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+   HV_MAX_MODIFY_GPA_REP_COUNT);
+   return -EINVAL;
+   }
+
+   local_irq_save(flags);
+   input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
+   this_cpu_ptr(hyperv_pcpu_input_arg);
+   input = *input_pcpu;
+   if (unlikely(!input)) {
+   local_irq_restore(flags);
+   return -1;
+   }
+
+   input->partition_id = HV_PARTITION_ID_SELF;
+   input->host_visibility = visibility;
+   input->reserved0 = 0;
+   input->reserved1 = 0;
+   memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+   hv_status = hv_do_rep_hypercall(
+   HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+   0, input, &pages_processed);
+   local_irq_restore(flags);
+
+   if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+   return 0;
+
+   return hv_status & HV_HYPERCALL_RESULT_MASK;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
b/arch/x86/include/asm/hyperv-tlfs.h
index e6cd3fee562b..1f1ce9afb6f1 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -236,6 +236,15 @@ enum hv_isolation_type {
 /* TSC invariant control */
 #define HV_X64_MSR_TSC_INVARIANT_CONTROL   0x40000118
 
+/* Hyper-V GPA map flags */
+#define HV_MAP_GPA_PERMISSIONS_NONE	0x0
+#define HV_MAP_GPA_READABLE		0x1
+#define HV_MAP_GPA_WRITABLE		0x2
+
+#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
+#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)
+#define VMBUS_PAGE_NOT_VISIBLE HV_MAP_GPA_PERMISSIONS_NONE
+
 /*
  * Declare the MSR used to setup pages used to communicate with the hypervisor.
  */
@@ -564,4 +573,17 @@ enum hv_interrupt_type {
 

[RFC V2 PATCH 8/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM.

2021-04-13 Thread Tianyu Lan
From: Tianyu Lan 

The UIO HV driver should not load in an isolation VM for security
reasons. Return -ENOTSUPP from hv_uio_probe() in the isolation VM.

Signed-off-by: Tianyu Lan 
---
 drivers/uio/uio_hv_generic.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 0330ba99730e..678b021d66f8 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../hv/hyperv_vmbus.h"
 
@@ -241,6 +242,10 @@ hv_uio_probe(struct hv_device *dev,
void *ring_buffer;
int ret;
 
+   /* UIO driver should not be loaded in the isolation VM.*/
+   if (hv_is_isolation_supported())
+   return -ENOTSUPP;
+   
/* Communicating with host has to be via shared memory not hypercall */
if (!channel->offermsg.monitor_allocated) {
	dev_err(&dev->device, "vmbus channel requires hypercall\n");
-- 
2.25.1
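As an aside, the change above is a plain capability guard: the probe function checks a platform property and bails out before allocating or mapping anything. A minimal userspace sketch of that shape (names here are invented for illustration; the real check is hv_is_isolation_supported(), and checkpatch generally prefers -EOPNOTSUPP over -ENOTSUPP for errors that may reach user space):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for hv_is_isolation_supported(); a real driver would
 * query the platform (CPUID leaf, firmware table, ...). */
static bool isolation_vm;

/* Probe-style function: refuse to bind in an isolation VM before
 * any resources are touched. */
static int demo_probe(void)
{
	if (isolation_vm)
		return -EOPNOTSUPP;
	return 0;	/* would continue with normal setup */
}
```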



Re: [RFC PATCH 5/12] HV: Add ghcb hvcall support for SNP VM

2021-03-05 Thread Tianyu Lan




On 3/4/2021 1:21 AM, Vitaly Kuznetsov wrote:

Tianyu Lan  writes:


From: Tianyu Lan 

Hyper-V provides a GHCB-based hvcall to handle the VMBus
HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE messages in an SNP
isolation VM. Add such support.

Signed-off-by: Tianyu Lan 
---
  arch/x86/hyperv/ivm.c   | 69 +
  arch/x86/include/asm/mshyperv.h |  1 +
  drivers/hv/connection.c |  6 ++-
  drivers/hv/hv.c |  8 +++-
  4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 4332bf7aaf9b..feaabcd151f5 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -14,8 +14,77 @@
  
  union hv_ghcb {

struct ghcb ghcb;
+   struct {
+   u64 hypercalldata[509];
+   u64 outputgpa;
+   union {
+   union {
+   struct {
+   u32 callcode: 16;
+   u32 isfast  : 1;
+   u32 reserved1   : 14;
+   u32 isnested: 1;
+   u32 countofelements : 12;
+   u32 reserved2   : 4;
+   u32 repstartindex   : 12;
+   u32 reserved3   : 4;
+   };
+   u64 asuint64;
+   } hypercallinput;
+   union {
+   struct {
+   u16 callstatus;
+   u16 reserved1;
+   u32 elementsprocessed : 12;
+   u32 reserved2 : 20;
+   };
+   u64 asunit64;
+   } hypercalloutput;
+   };
+   u64 reserved2;
+   } hypercall;
  } __packed __aligned(PAGE_SIZE);
  
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)

+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EFAULT;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return -EFAULT;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 1;
+
+   hv_ghcb->hypercall.outputgpa = (u64)output;
+   hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
+   hv_ghcb->hypercall.hypercallinput.callcode = control;
+
+   if (input_size)
+   memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
+
+   VMGEXIT();
+
+   hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
+   memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
+  sizeof(hv_ghcb->ghcb.save.valid_bitmap));
+
+   local_irq_restore(flags);
+
+   return hv_ghcb->hypercall.hypercalloutput.callstatus;
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
  void hv_ghcb_msr_write(u64 msr, u64 value)
  {
union hv_ghcb *hv_ghcb;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index f624d72b99d3..c8f66d269e5b 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -274,6 +274,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
  void hv_signal_eom_ghcb(void);
  void hv_ghcb_msr_write(u64 msr, u64 value);
  void hv_ghcb_msr_read(u64 msr, u64 *value);
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
  
  #define hv_get_synint_state_ghcb(int_num, val)			\

hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index c83612cddb99..79bca653dce9 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -442,6 +442,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
  
  	++channel->sig_events;
  
-	hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);

+   if (hv_isolation_type_snp())
+   hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
+   NULL, sizeof(u64));
+   else
+   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);


vmbus_set_event() is a hotpath so I'd suggest we introduce a static
branch instead of checking hv_isolation_type_snp() every time.



Good suggestion. Will add it in the next version. Thanks.
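The kernel's static branch machinery (DEFINE_STATIC_KEY_FALSE / static_branch_unlikely()) goes further than a cached flag: it patches the hot-path branch in the instruction stream at runtime. A rough userspace analogue of the underlying idea — evaluate the condition once at init, then read a cheap cached value on the hot path instead of re-probing — can be sketched as follows (all names invented for the sketch):

```c
#include <assert.h>
#include <stdbool.h>

/* Pretend this is the expensive per-call check (CPUID, MSR read, ...). */
static int probe_calls;
static bool probe_isolation_type_snp(void)
{
	probe_calls++;
	return false;
}

/* "Static key" stand-in: resolved once at init, then read as a plain
 * flag. The real kernel mechanism rewrites the branch itself. */
static bool snp_enabled;

static void isolation_init(void)
{
	snp_enabled = probe_isolation_type_snp();
}

/* Hot path: no repeated probing, just a cached-flag test. */
static int set_event(void)
{
	if (snp_enabled)
		return 1;	/* would take the GHCB hypercall path */
	return 0;		/* would take the fast hypercall path */
}
```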



Re: [RFC PATCH 4/12] HV: Add Write/Read MSR registers via ghcb

2021-03-04 Thread Tianyu Lan




On 3/4/2021 1:16 AM, Vitaly Kuznetsov wrote:

Tianyu Lan  writes:


From: Tianyu Lan 

Hyper-V provides a GHCB protocol to write the Synthetic Interrupt
Controller MSR registers; these registers are emulated by the
hypervisor rather than the paravisor.

Hyper-V requires SINTx MSR registers to be written twice (once via
the GHCB and once via the wrmsr instruction, including the proxy
bit 21). The guest OS ID MSR also needs to be set via the GHCB.

Signed-off-by: Tianyu Lan 
---
  arch/x86/hyperv/Makefile|   2 +-
  arch/x86/hyperv/hv_init.c   |  18 +--
  arch/x86/hyperv/ivm.c   | 178 ++
  arch/x86/include/asm/mshyperv.h |  21 +++-
  arch/x86/kernel/cpu/mshyperv.c  |  46 
  drivers/hv/channel.c|   2 +-
  drivers/hv/hv.c | 188 ++--
  include/asm-generic/mshyperv.h  |  10 +-
  8 files changed, 343 insertions(+), 122 deletions(-)
  create mode 100644 arch/x86/hyperv/ivm.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
  # SPDX-License-Identifier: GPL-2.0-only
-obj-y  := hv_init.o mmu.o nested.o irqdomain.o
+obj-y  := hv_init.o mmu.o nested.o irqdomain.o ivm.o
  obj-$(CONFIG_X86_64)  += hv_apic.o hv_proc.o
  
  ifdef CONFIG_X86_64

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 90e65fbf4c58..87b1dd9c84d6 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -475,6 +475,9 @@ void __init hyperv_init(void)
  
  		ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);

*ghcb_base = ghcb_va;
+
+   /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
}
  
  	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);

@@ -561,6 +564,7 @@ void hyperv_cleanup(void)
  
  	/* Reset our OS id */

wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
  
  	/*

 * Reset hypercall page reference before reset the page,
@@ -668,17 +672,3 @@ bool hv_is_hibernation_supported(void)
return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
  }
  EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
-
-enum hv_isolation_type hv_get_isolation_type(void)
-{
-   if (!(ms_hyperv.features_b & HV_ISOLATION))
-   return HV_ISOLATION_TYPE_NONE;
-   return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-bool hv_is_isolation_supported(void)
-{
-   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
-EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index ..4332bf7aaf9b
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ *  Tianyu Lan 
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+union hv_ghcb {
+   struct ghcb ghcb;
+} __packed __aligned(PAGE_SIZE);
+
+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 0;
+
+   ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
+   ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
+   ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 1);
+   ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
+
+   VMGEXIT();
+
+   if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0xffffffff) == 1)
+   pr_warn("Fail to write msr via ghcb.\n");
+
+   local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
+
+void hv_ghcb_msr_read(u64 msr, u64 *value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   memset(hv_ghcb, 0x00, PAGE_SIZE);
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->

Re: [RFC PATCH 2/12] x86/Hyper-V: Add new hvcall guest address host visibility support

2021-03-04 Thread Tianyu Lan



On 3/4/2021 12:58 AM, Vitaly Kuznetsov wrote:

Tianyu Lan  writes:


From: Tianyu Lan 

Add new hvcall guest address host visibility support. Mark vmbus
ring buffer visible to host when create gpadl buffer and mark back
to not visible when tear down gpadl buffer.

Signed-off-by: Sunil Muthuswamy 
Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Tianyu Lan 
---
  arch/x86/include/asm/hyperv-tlfs.h | 13 
  arch/x86/include/asm/mshyperv.h|  4 +--
  arch/x86/kernel/cpu/mshyperv.c | 46 ++
  drivers/hv/channel.c   | 53 --
  drivers/net/hyperv/hyperv_net.h|  1 +
  drivers/net/hyperv/netvsc.c|  9 +++--
  drivers/uio/uio_hv_generic.c   |  6 ++--
  include/asm-generic/hyperv-tlfs.h  |  1 +
  include/linux/hyperv.h |  3 +-
  9 files changed, 126 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
b/arch/x86/include/asm/hyperv-tlfs.h
index fb1893a4c32b..d22b1c3f425a 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -573,4 +573,17 @@ enum hv_interrupt_type {
  
  #include 
  
+/* All input parameters should be in single page. */

+#define HV_MAX_MODIFY_GPA_REP_COUNT \
+   ((PAGE_SIZE - 2 * sizeof(u64)) / (sizeof(u64)))


Would it be easier to express this as '((PAGE_SIZE / sizeof(u64)) - 2'

Yes, will update. Thanks.
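The two formulations are indeed equivalent whenever PAGE_SIZE is an exact multiple of sizeof(u64), since subtracting 2 * sizeof(u64) before an exact division is the same as subtracting 2 after it. A quick check (4 KiB pages assumed for the sketch):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages, as on x86 */
typedef uint64_t u64;

/* Original form: space left in the page after the two u64-sized
 * header fields (partition_id and the visibility/reserved words). */
static unsigned long rep_count_v1(void)
{
	return (DEMO_PAGE_SIZE - 2 * sizeof(u64)) / sizeof(u64);
}

/* Suggested form: total u64 slots in a page, minus the two headers. */
static unsigned long rep_count_v2(void)
{
	return DEMO_PAGE_SIZE / sizeof(u64) - 2;
}
```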


+
+/* HvCallModifySparseGpaPageHostVisibility hypercall */
+struct hv_input_modify_sparse_gpa_page_host_visibility {
+   u64 partition_id;
+   u32 host_visibility:2;
+   u32 reserved0:30;
+   u32 reserved1;
+   u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
+} __packed;
+
  #endif
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ccf60a809a17..1e8275d35c1f 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -262,13 +262,13 @@ static inline void hv_set_msi_entry_from_desc(union 
hv_msi_entry *msi_entry,
msi_entry->address.as_uint32 = msi_desc->msg.address_lo;
msi_entry->data.as_uint32 = msi_desc->msg.data;
  }
-


stray change


  struct irq_domain *hv_create_pci_msi_domain(void);
  
  int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,

struct hv_interrupt_entry *entry);
  int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
*entry);
-
+int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility);
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
  #else /* CONFIG_HYPERV */
  static inline void hyperv_init(void) {}
  static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e88bc296afca..347c32eac8fd 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -37,6 +37,8 @@
  bool hv_root_partition;
  EXPORT_SYMBOL_GPL(hv_root_partition);
  
+#define HV_PARTITION_ID_SELF ((u64)-1)

+


We seem to have this already:

include/asm-generic/hyperv-tlfs.h:#define HV_PARTITION_ID_SELF  
((u64)-1)



  struct ms_hyperv_info ms_hyperv;
  EXPORT_SYMBOL_GPL(ms_hyperv);
  
@@ -477,3 +479,47 @@ const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {

.init.msi_ext_dest_id   = ms_hyperv_msi_ext_dest_id,
.init.init_platform = ms_hyperv_init_platform,
  };
+
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
+{
+   struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
+   struct hv_input_modify_sparse_gpa_page_host_visibility *input;
+   u16 pages_processed;
+   u64 hv_status;
+   unsigned long flags;
+
+   /* no-op if partition isolation is not enabled */
+   if (!hv_is_isolation_supported())
+   return 0;
+
+   if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+   pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+   HV_MAX_MODIFY_GPA_REP_COUNT);
+   return -EINVAL;
+   }
+
+   local_irq_save(flags);
+   input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
+   this_cpu_ptr(hyperv_pcpu_input_arg);
+   input = *input_pcpu;
+   if (unlikely(!input)) {
+   local_irq_restore(flags);
+   return -1;


-EFAULT/-ENOMEM/... maybe ?


Yes, will update.



+   }
+
+   input->partition_id = HV_PARTITION_ID_SELF;
+   input->host_visibility = visibility;
+   input->reserved0 = 0;
+   input->reserved1 = 0;
+   memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+   hv_status = hv_do_rep_hypercall(
+   HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+   0, input, &pages_processed);
+   local_irq_restore(flags);
+
+   if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+ 

Re: [RFC PATCH 1/12] x86/Hyper-V: Add visibility parameter for vmbus_establish_gpadl()

2021-03-04 Thread Tianyu Lan

Hi Vitaly:
 Thanks for your review.

On 3/4/2021 12:27 AM, Vitaly Kuznetsov wrote:

Tianyu Lan  writes:


From: Tianyu Lan 

Add a visibility parameter to vmbus_establish_gpadl() and prepare
to change host visibility when creating a gpadl for a buffer.



"No functional change" as you don't actually use the parameter.


Yes, will add it into commit log.




Signed-off-by: Sunil Muthuswamy 
Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Tianyu Lan 


Nit: Sunil's SoB looks misleading because the patch is from you,
Co-Developed-by should be sufficient.



Will update.


---
  arch/x86/include/asm/hyperv-tlfs.h |  9 +
  drivers/hv/channel.c   | 20 +++-
  drivers/net/hyperv/netvsc.c|  8 ++--
  drivers/uio/uio_hv_generic.c   |  7 +--
  include/linux/hyperv.h |  3 ++-
  5 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
b/arch/x86/include/asm/hyperv-tlfs.h
index e6cd3fee562b..fb1893a4c32b 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -236,6 +236,15 @@ enum hv_isolation_type {
  /* TSC invariant control */
  #define HV_X64_MSR_TSC_INVARIANT_CONTROL  0x40000118
  
+/* Hyper-V GPA map flags */

+#define HV_MAP_GPA_PERMISSIONS_NONE	0x0
+#define HV_MAP_GPA_READABLE	0x1
+#define HV_MAP_GPA_WRITABLE	0x2
+
+#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
+#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)
+#define VMBUS_PAGE_NOT_VISIBLE HV_MAP_GPA_PERMISSIONS_NONE
+


Are these x86-only? If not, then we should probably move these defines
to include/asm-generic/hyperv-tlfs.h. In case they are, we should do
something as we're using them from arch neutral places.

Also, could you please add a comment stating that these flags define
host's visibility of a page and not guest's (this seems to be not
obvious at least to me).
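For what it's worth, the host-visibility encoding is a plain bit combination of the two GPA map permissions, so read-write visibility is simply both bits set. A minimal check (the defines are copied from the patch; the helper is illustrative only):

```c
#include <assert.h>

#define HV_MAP_GPA_PERMISSIONS_NONE 0x0
#define HV_MAP_GPA_READABLE         0x1
#define HV_MAP_GPA_WRITABLE         0x2

#define VMBUS_PAGE_VISIBLE_READ_ONLY  HV_MAP_GPA_READABLE
#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE | HV_MAP_GPA_WRITABLE)
#define VMBUS_PAGE_NOT_VISIBLE        HV_MAP_GPA_PERMISSIONS_NONE

/* Illustrative helper: can the *host* write this page? */
static int host_can_write(unsigned int vis)
{
	return !!(vis & HV_MAP_GPA_WRITABLE);
}
```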







  /*
   * Declare the MSR used to setup pages used to communicate with the 
hypervisor.
   */
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 0bd202de7960..daa21cc72beb 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -242,7 +242,7 @@ EXPORT_SYMBOL_GPL(vmbus_send_modifychannel);
   */
  static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
   u32 size, u32 send_offset,
-  struct vmbus_channel_msginfo **msginfo)
+  struct vmbus_channel_msginfo **msginfo, u32 
visibility)
  {
int i;
int pagecount;
@@ -391,7 +391,7 @@ static int create_gpadl_header(enum hv_gpadl_type type, 
void *kbuffer,
  static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
   enum hv_gpadl_type type, void *kbuffer,
   u32 size, u32 send_offset,
-  u32 *gpadl_handle)
+  u32 *gpadl_handle, u32 visibility)
  {
struct vmbus_channel_gpadl_header *gpadlmsg;
struct vmbus_channel_gpadl_body *gpadl_body;
@@ -405,7 +405,8 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
*channel,
next_gpadl_handle =
	(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
  
-	ret = create_gpadl_header(type, kbuffer, size, send_offset, &msginfo);

+   ret = create_gpadl_header(type, kbuffer, size, send_offset,
+ &msginfo, visibility);
if (ret)
return ret;
  
@@ -496,10 +497,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,

   * @gpadl_handle: some funky thing
   */
  int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
- u32 size, u32 *gpadl_handle)
+ u32 size, u32 *gpadl_handle, u32 visibility)
  {
return __vmbus_establish_gpadl(channel, HV_GPADL_BUFFER, kbuffer, size,
-  0U, gpadl_handle);
+  0U, gpadl_handle, visibility);
  }
  EXPORT_SYMBOL_GPL(vmbus_establish_gpadl);
  
@@ -610,10 +611,11 @@ static int __vmbus_open(struct vmbus_channel *newchannel,

newchannel->ringbuffer_gpadlhandle = 0;
  
  	err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,

- page_address(newchannel->ringbuffer_page),
- (send_pages + recv_pages) << PAGE_SHIFT,
- newchannel->ringbuffer_send_offset << PAGE_SHIFT,
- &newchannel->ringbuffer_gpadlhandle);
+   page_address(newchannel->ringbuffer_page),
+   (send_pages + recv_pages) << PAGE_SHIFT,
+   newchannel->ringbuffer_send_offset << PAGE_SH

Re: [EXTERNAL] Re: [RFC PATCH 12/12] HV/Storvsc: Add bounce buffer support for Storvsc

2021-03-02 Thread Tianyu Lan

Hi Sunil:
 Thanks for your review.

On 3/2/2021 3:45 AM, Sunil Muthuswamy wrote:

Hi Christoph:
   Thanks a lot for your review. There are some reasons.
   1) Vmbus drivers don't use DMA API now.

What is blocking us from making the Hyper-V drivers use the DMA API's? They
will be a null-op generally, when there is no bounce buffer support needed.


   2) The Hyper-V vmbus channel ring buffer already plays the bounce
buffer role for most vmbus drivers. Just two kinds of packets, from
netvsc/storvsc, are not covered.

How does this make a difference here?


   3) In an AMD SEV-SNP based Hyper-V guest, the physical address used
to access shared memory should be the bounce buffer physical address
plus a shared memory boundary (e.g., 48 bit) reported by Hyper-V CPUID.
It's called virtual top of memory (vTom) in the AMD spec and works as a
watermark. So the guest needs to ioremap/memremap the associated
physical address above the shared memory boundary before accessing it.
swiotlb_bounce() uses the low-end physical address to access the bounce
buffer, and this doesn't work in this scenario. If something is wrong,
please correct me.


There are alternative implementations of swiotlb on top of the core swiotlb
API's. One option is to have Hyper-V specific swiotlb wrapper DMA API's with
the custom logic above.


Agree. Hyper-V should have its own DMA ops and put Hyper-V bounce buffer
code in DMA API callback. For vmbus channel ring buffer, it doesn't need 
additional bounce buffer and there are two options. 1) Not call DMA API 
around them 2) pass a flag in DMA API to notify Hyper-V DMA callback

and not allocate bounce buffer for them.




Thanks.


On 3/1/2021 2:54 PM, Christoph Hellwig wrote:

This should be handled by the DMA mapping layer, just like for native
SEV support.

I agree with Christoph's comment that in principle, this should be handled using
the DMA API's
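The idea discussed above — Hyper-V-specific DMA ops that either bounce through shared memory or pass buffers straight through — is at heart a dispatch table chosen once per device. A generic sketch (struct and function names are illustrative, not the kernel's dma_map_ops):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative dispatch table, loosely modeled on dma_map_ops. */
struct demo_dma_ops {
	/* Returns the device-visible address for a driver buffer. */
	void *(*map)(void *buf, size_t len);
};

static char bounce_pool[256];

/* Isolation-VM flavour: copy into a (pretend) host-visible pool. */
static void *bounce_map(void *buf, size_t len)
{
	memcpy(bounce_pool, buf, len);
	return bounce_pool;
}

/* Non-isolated flavour: identity mapping, no copy (a no-op). */
static void *direct_map(void *buf, size_t len)
{
	(void)len;
	return buf;
}

static const struct demo_dma_ops bounce_ops = { .map = bounce_map };
static const struct demo_dma_ops direct_ops = { .map = direct_map };

/* Chosen once, e.g. at device setup, based on the platform. */
static const struct demo_dma_ops *pick_ops(int isolated)
{
	return isolated ? &bounce_ops : &direct_ops;
}
```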



Re: [RFC PATCH 12/12] HV/Storvsc: Add bounce buffer support for Storvsc

2021-03-01 Thread Tianyu Lan

Hi Christoph:
 Thanks a lot for your review. There are some reasons.
 1) Vmbus drivers don't use DMA API now.
 2) The Hyper-V vmbus channel ring buffer already plays the bounce
buffer role for most vmbus drivers. Just two kinds of packets, from
netvsc/storvsc, are not covered.
 3) In an AMD SEV-SNP based Hyper-V guest, the physical address used
to access shared memory should be the bounce buffer physical address
plus a shared memory boundary (e.g., 48 bit) reported by Hyper-V CPUID.
It's called virtual top of memory (vTom) in the AMD spec and works as a
watermark. So the guest needs to ioremap/memremap the associated
physical address above the shared memory boundary before accessing it.
swiotlb_bounce() uses the low-end physical address to access the bounce
buffer, and this doesn't work in this scenario. If something is wrong,
please correct me.


Thanks.


On 3/1/2021 2:54 PM, Christoph Hellwig wrote:

This should be handled by the DMA mapping layer, just like for native
SEV support.
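The vTom adjustment described in the message above is just an offset: the guest touches a shared (host-visible) page at its bounce-buffer physical address plus the shared-memory boundary reported by the HYPERV_CPUID_ISOLATION_CONFIG leaf. A sketch with an assumed 47-bit boundary (the real value comes from CPUID at runtime):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration; a real guest reads this from the
 * HYPERV_CPUID_ISOLATION_CONFIG CPUID leaf. */
#define SHARED_GPA_BOUNDARY_BITS 47

static uint64_t shared_gpa_boundary(void)
{
	return 1ULL << SHARED_GPA_BOUNDARY_BITS;
}

/* Address the guest must ioremap/memremap to reach a shared page
 * whose low-range physical address is 'pa'. */
static uint64_t shared_alias(uint64_t pa)
{
	return pa + shared_gpa_boundary();
}
```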



[RFC PATCH 12/12] HV/Storvsc: Add bounce buffer support for Storvsc

2021-02-28 Thread Tianyu Lan
From: Tianyu Lan 

The storvsc driver needs to reserve additional bounce buffers to
receive multi-page-buffer packets and to copy data from the bounce
buffer when a response message arrives from the host.

Signed-off-by: Sunil Muthuswamy 
Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Tianyu Lan 
---
 drivers/scsi/storvsc_drv.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index c5b4974eb41f..4ae8e2a427e4 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -33,6 +33,8 @@
 #include 
 #include 
 
+#include "../hv/hyperv_vmbus.h"
+
 /*
  * All wire protocol details (storage protocol between the guest and the host)
  * are consolidated here.
@@ -725,6 +727,10 @@ static void handle_sc_creation(struct vmbus_channel 
*new_sc)
/* Add the sub-channel to the array of available channels. */
stor_device->stor_chns[new_sc->target_cpu] = new_sc;
	cpumask_set_cpu(new_sc->target_cpu, &stor_device->alloced_cpus);
+
+   if (hv_bounce_resources_reserve(device->channel,
+   stor_device->max_transfer_bytes))
+   pr_warn("Fail to reserve bounce buffer\n");
 }
 
 static void  handle_multichannel_storage(struct hv_device *device, int 
max_chns)
@@ -964,6 +970,18 @@ static int storvsc_channel_init(struct hv_device *device, 
bool is_fc)
stor_device->max_transfer_bytes =
vstor_packet->storage_channel_properties.max_transfer_bytes;
 
+   /*
+* Reserve enough bounce resources to be able to support paging
+* operations under low memory conditions, that cannot rely on
+* additional resources to be allocated.
+*/
+   ret =  hv_bounce_resources_reserve(device->channel,
+   stor_device->max_transfer_bytes);
+   if (ret < 0) {
+   pr_warn("Fail to reserve bounce buffer\n");
+   goto done;
+   }
+
if (!is_fc)
goto done;
 
@@ -1263,6 +1281,11 @@ static void storvsc_on_channel_callback(void *context)
 
request = (struct storvsc_cmd_request *)(unsigned long)cmd_rqst;
 
+   if (desc->type == VM_PKT_COMP && request->bounce_pkt) {
+   hv_pkt_bounce(channel, request->bounce_pkt);
+   request->bounce_pkt = NULL;
+   }
+
	if (request == &stor_device->init_request ||
	    request == &stor_device->reset_request) {
		memcpy(&request->vstor_packet, packet,
-- 
2.25.1



[RFC PATCH 9/12] x86/Hyper-V: Add new parameter for vmbus_sendpacket_pagebuffer()/mpb_desc()

2021-02-28 Thread Tianyu Lan
From: Tianyu Lan 

Add a new parameter io_type and struct hv_bounce_pkt to
vmbus_sendpacket_pagebuffer() and vmbus_sendpacket_mpb_desc() in
order to add bounce buffer support later.

Signed-off-by: Sunil Muthuswamy 
Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Tianyu Lan 
---
 drivers/hv/channel.c|  7 +--
 drivers/hv/hyperv_vmbus.h   | 12 
 drivers/net/hyperv/hyperv_net.h |  1 +
 drivers/net/hyperv/netvsc.c |  5 -
 drivers/scsi/storvsc_drv.c  | 23 +--
 include/linux/hyperv.h  | 16 ++--
 6 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 4c05b1488649..976ef99dda28 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -1044,7 +1044,8 @@ EXPORT_SYMBOL(vmbus_sendpacket);
 int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel,
struct hv_page_buffer pagebuffers[],
u32 pagecount, void *buffer, u32 bufferlen,
-   u64 requestid)
+   u64 requestid, u8 io_type,
+   struct hv_bounce_pkt **bounce_pkt)
 {
int i;
struct vmbus_channel_packet_page_buffer desc;
@@ -1101,7 +1102,9 @@ EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer);
 int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
  struct vmbus_packet_mpb_array *desc,
  u32 desc_size,
- void *buffer, u32 bufferlen, u64 requestid)
+ void *buffer, u32 bufferlen, u64 requestid,
+ u32 pfn_count, u8 io_type,
+ struct hv_bounce_pkt **bounce_pkt)
 {
u32 packetlen;
u32 packetlen_aligned;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 7edf2be60d2c..7677f083d33a 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -57,6 +57,18 @@ union hv_monitor_trigger_state {
};
 };
 
+/*
+ * Hyper-V bounce packet. Each in-use bounce packet is mapped to a vmbus
+ * transaction and contains a list of bounce pages for that transaction.
+ */
+struct hv_bounce_pkt {
+   /* Link to the next bounce packet, when it is in the free list */
+   struct list_head link;
+   struct list_head bounce_page_head;
+   u32 flags;
+};
+
+
 /*
  * All vmbus channels initially start with zero bounce pages and are required
  * to set any non-zero size, if needed.
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index b3a43c4ec8ab..11266b92bcf0 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -130,6 +130,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_bounce_pkt *bounce_pkt;
 };
 
 #define NETVSC_HASH_KEYLEN 40
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 08d73401bb28..77657c5acc65 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -926,14 +926,17 @@ static inline int netvsc_send_pkt(
 
trace_nvsp_send_pkt(ndev, out_channel, rpkt);
 
+   packet->bounce_pkt = NULL;
if (packet->page_buf_cnt) {
if (packet->cp_partial)
pb += packet->rmsg_pgcnt;
 
+   /* The I/O type is always 'write' for netvsc */
ret = vmbus_sendpacket_pagebuffer(out_channel,
  pb, packet->page_buf_cnt,
  &nvmsg, sizeof(nvmsg),
- req_id);
+ req_id, IO_TYPE_WRITE,
+ &packet->bounce_pkt);
} else {
ret = vmbus_sendpacket(out_channel,
   &nvmsg, sizeof(nvmsg),
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 2e4fa77445fd..c5b4974eb41f 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * All wire protocol details (storage protocol between the guest and the host)
@@ -427,6 +428,7 @@ struct storvsc_cmd_request {
u32 payload_sz;
 
struct vstor_packet vstor_packet;
+   struct hv_bounce_pkt *bounce_pkt;
 };
 
 
@@ -1390,7 +1392,8 @@ static struct vmbus_channel *get_og_chn(struct 
storvsc_device *stor_device,
 
 
 static int storvsc_do_io(struct hv_device *device,
-struct storvsc_cmd_request *request, u16 q_num)
+struct storvsc_cmd_request *request, u16 q_num,
+u32 pfn_count)
 {
struct storvsc_device *stor_device;
struct vstor_packet *vstor_packet;
@@ -1493,14 +149

[RFC PATCH 10/12] HV: Add bounce buffer support for Isolation VM

2021-02-28 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP based Isolation VMs. The memory of these VMs
is encrypted and the host can't access guest memory directly. The
guest needs to call the host visibility hvcall to mark memory visible
to the host before sharing memory with the host for IO operations. So
a bounce buffer is required for IO operations that exchange data with
the host. To receive data, the host puts data into the shared memory
(bounce buffer) and the guest copies the data to private memory, and
vice versa.

For an SNP isolation VM, the guest needs to access the shared memory
via an extra address space, which is specified by the Hyper-V CPUID
leaf HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to
access the shared memory should be the bounce buffer GPA plus
shared_gpa_boundary.

The vmbus channel ring buffer has been marked host visible and works
as a bounce buffer for vmbus devices. vmbus_sendpacket_pagebuffer()
and vmbus_sendpacket_mpb_desc() send packets that reference system
memory outside the vmbus channel ring buffer. That memory still needs
an additional bounce buffer to communicate with the host. Add
vmbus_sendpacket_pagebuffer_bounce() and
vmbus_sendpacket_mpb_desc_bounce() to handle this case.

Signed-off-by: Sunil Muthuswamy 
Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Tianyu Lan 
---
 drivers/hv/channel.c  |  13 +-
 drivers/hv/channel_mgmt.c |   1 +
 drivers/hv/hv_bounce.c| 579 +-
 drivers/hv/hyperv_vmbus.h |  13 +
 include/linux/hyperv.h|   2 +
 5 files changed, 605 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 976ef99dda28..f5391a050bdc 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -1090,7 +1090,11 @@ int vmbus_sendpacket_pagebuffer(struct vmbus_channel 
*channel,
	bufferlist[2].iov_base = &aligned_data;
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
-   return hv_ringbuffer_write(channel, bufferlist, 3, requestid);
+   if (hv_is_isolation_supported())
+   return vmbus_sendpacket_pagebuffer_bounce(channel, &desc,
+   descsize, bufferlist, io_type, bounce_pkt, requestid);
+   else
+   return hv_ringbuffer_write(channel, bufferlist, 3, requestid);
 }
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer);
 
@@ -1130,7 +1134,12 @@ int vmbus_sendpacket_mpb_desc(struct vmbus_channel 
*channel,
	bufferlist[2].iov_base = &aligned_data;
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
-   return hv_ringbuffer_write(channel, bufferlist, 3, requestid);
+   if (hv_is_isolation_supported()) {
+   return vmbus_sendpacket_mpb_desc_bounce(channel, desc,
+   desc_size, bufferlist, io_type, bounce_pkt, requestid);
+   } else {
+   return hv_ringbuffer_write(channel, bufferlist, 3, requestid);
+   }
 }
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_mpb_desc);
 
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index e2846cacfd70..d8090b2e2421 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -359,6 +359,7 @@ static struct vmbus_channel *alloc_channel(void)
if (!channel)
return NULL;
 
+   spin_lock_init(&channel->bp_lock);
	spin_lock_init(&channel->sched_lock);
	init_completion(&channel->rescind_event);
 
diff --git a/drivers/hv/hv_bounce.c b/drivers/hv/hv_bounce.c
index c5898325b238..bed1a361d167 100644
--- a/drivers/hv/hv_bounce.c
+++ b/drivers/hv/hv_bounce.c
@@ -9,12 +9,589 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include "hyperv_vmbus.h"
+#include 
+#include 
+
+/* BP == Bounce Pages here */
+#define BP_LIST_MAINTENANCE_FREQ (30 * HZ)
+#define BP_MIN_TIME_IN_FREE_LIST (30 * HZ)
+#define IS_BP_MAINTENANCE_TASK_NEEDED(channel) \
+   (channel->bounce_page_alloc_count > \
+channel->min_bounce_resource_count && \
+!list_empty(&channel->bounce_page_free_head))
+#define BP_QUEUE_MAINTENANCE_WORK(channel) \
+   queue_delayed_work(system_unbound_wq,   \
+  &channel->bounce_page_list_maintain, \
+  BP_LIST_MAINTENANCE_FREQ)
+
+#define hv_copy_to_bounce(bounce_pkt) \
+   hv_copy_to_from_bounce(bounce_pkt, true)
+#define hv_copy_from_bounce(bounce_pkt)\
+   hv_copy_to_from_bounce(bounce_pkt, false)
+/*
+ * A list of bounce pages, with original va, bounce va and I/O details such as
+ * the offset and length.
+ */
+struct hv_bounce_page_list {
+   struct list_head link;
+   u32 offset;
+   u32 len;
+   unsigned long va;
+   unsigned long bounce_va;
+   unsigned long bounce_original_va;
+   unsigned long bounce_extra_pfn;
+   unsigned long last_used_jiff;
+};
+
+/*
+ * This structure can be safely used to iterate over objects of the type
+ * 'hv_page_buffer', 'hv_mpb_array' or 'hv_multipage_buffer'. The m

[RFC PATCH 11/12] HV/Netvsc: Add Isolation VM support for netvsc driver

2021-02-28 Thread Tianyu Lan
From: Tianyu Lan 

Add Isolation VM support for the netvsc driver. Map the send/receive
ring buffers in the extra address space in an SNP Isolation VM,
reserve bounce buffers for packets sent via
vmbus_sendpacket_pagebuffer(), and release the bounce buffers via
hv_pkt_bounce() when the send-complete response arrives from the host.

Signed-off-by: Sunil Muthuswamy 
Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/hyperv_net.h |  3 +
 drivers/net/hyperv/netvsc.c | 97 ++---
 2 files changed, 92 insertions(+), 8 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 11266b92bcf0..45d5838ff128 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -1027,14 +1027,17 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
u32 recv_buf_gpadl_handle;
u32 recv_section_cnt;
u32 recv_section_size;
u32 recv_completion_cnt;
 
+
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
u32 send_buf_size;
u32 send_buf_gpadl_handle;
u32 send_section_cnt;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 77657c5acc65..171af85e055d 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -26,7 +26,7 @@
 
 #include "hyperv_net.h"
 #include "netvsc_trace.h"
-
+#include "../../hv/hyperv_vmbus.h"
 /*
  * Switch the data path from the synthetic interface to the VF
  * interface.
@@ -119,8 +119,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   iounmap(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   iounmap(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }
+
kfree(nvdev->send_section_map);
 
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -241,13 +254,18 @@ static void netvsc_teardown_recv_gpadl(struct hv_device 
*device,
   struct netvsc_device *net_device,
   struct net_device *ndev)
 {
+   void *recv_buf;
int ret;
 
if (net_device->recv_buf_gpadl_handle) {
+   if (net_device->recv_original_buf)
+   recv_buf = net_device->recv_original_buf;
+   else
+   recv_buf = net_device->recv_buf;
+
ret = vmbus_teardown_gpadl(device->channel,
   net_device->recv_buf_gpadl_handle,
-  net_device->recv_buf,
-  net_device->recv_buf_size);
+  recv_buf, net_device->recv_buf_size);
 
/* If we failed here, we might as well return and have a leak
 * rather than continue and a bugchk
@@ -265,13 +283,18 @@ static void netvsc_teardown_send_gpadl(struct hv_device 
*device,
   struct netvsc_device *net_device,
   struct net_device *ndev)
 {
+   void *send_buf;
int ret;
 
if (net_device->send_buf_gpadl_handle) {
+   if (net_device->send_original_buf)
+   send_buf = net_device->send_original_buf;
+   else
+   send_buf = net_device->send_buf;
+
ret = vmbus_teardown_gpadl(device->channel,
   net_device->send_buf_gpadl_handle,
-  net_device->send_buf,
-  net_device->send_buf_size);
+  send_buf, net_device->send_buf_size);
 
/* If we failed here, we might as well return and have a leak
 * rather than continue and a bugchk
@@ -306,9 +329,19 @@ static int netvsc_init_buf(struct hv_device *device,
struct nvsp_1_message_send_receive_buffer_complete *resp;
struct net_device *ndev = hv_get_drvdata(device);
struct nvsp_message *init_packet;
+   struct vm_struct *area;
+   u64 extra_phys;
unsigned int buf_size;
+   unsigned long vaddr;
size_t map_words;
-   int ret = 0;
+   int ret = 0, i;
+
+   ret = hv_bounce_resources_reserve(device->ch

[RFC PATCH 6/12] HV/Vmbus: Add SNP support for VMbus channel initiate message

2021-02-28 Thread Tianyu Lan
From: Tianyu Lan 

For SNP support, the physical addresses of the monitor pages in the
CHANNELMSG_INITIATE_CONTACT message should be in the extra address
space. These pages should also be accessed via the extra address space
inside the Linux guest, so remap them with ioremap().

Signed-off-by: Tianyu Lan 
---
 drivers/hv/connection.c   | 62 +++
 drivers/hv/hyperv_vmbus.h |  1 +
 2 files changed, 63 insertions(+)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 79bca653dce9..a0be9c11d737 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -101,6 +101,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
*msginfo, u32 version)
 
msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
+
+   if (hv_isolation_type_snp()) {
+   msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
+   msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
+   }
+
msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
 
/*
@@ -145,6 +151,29 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
*msginfo, u32 version)
return -ECONNREFUSED;
}
 
+   if (hv_isolation_type_snp()) {
+   vmbus_connection.monitor_pages_va[0]
+   = vmbus_connection.monitor_pages[0];
+   vmbus_connection.monitor_pages[0]
+   = ioremap_cache(msg->monitor_page1, HV_HYP_PAGE_SIZE);
+   if (!vmbus_connection.monitor_pages[0])
+   return -ENOMEM;
+
+   vmbus_connection.monitor_pages_va[1]
+   = vmbus_connection.monitor_pages[1];
+   vmbus_connection.monitor_pages[1]
+   = ioremap_cache(msg->monitor_page2, HV_HYP_PAGE_SIZE);
+   if (!vmbus_connection.monitor_pages[1]) {
+   vunmap(vmbus_connection.monitor_pages[0]);
+   return -ENOMEM;
+   }
+
+   memset(vmbus_connection.monitor_pages[0], 0x00,
+  HV_HYP_PAGE_SIZE);
+   memset(vmbus_connection.monitor_pages[1], 0x00,
+  HV_HYP_PAGE_SIZE);
+   }
+
return ret;
 }
 
@@ -156,6 +185,7 @@ int vmbus_connect(void)
struct vmbus_channel_msginfo *msginfo = NULL;
int i, ret = 0;
__u32 version;
+   u64 pfn[2];
 
/* Initialize the vmbus connection */
vmbus_connection.conn_state = CONNECTING;
@@ -213,6 +243,16 @@ int vmbus_connect(void)
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   if (hv_mark_gpa_visibility(2, pfn,
+   VMBUS_PAGE_VISIBLE_READ_WRITE)) {
+   ret = -EFAULT;
+   goto cleanup;
+   }
+   }
+
msginfo = kzalloc(sizeof(*msginfo) +
  sizeof(struct vmbus_channel_initiate_contact),
  GFP_KERNEL);
@@ -279,6 +319,8 @@ int vmbus_connect(void)
 
 void vmbus_disconnect(void)
 {
+   u64 pfn[2];
+
/*
 * First send the unload request to the host.
 */
@@ -298,6 +340,26 @@ void vmbus_disconnect(void)
vmbus_connection.int_page = NULL;
}
 
+   if (hv_isolation_type_snp()) {
+   if (vmbus_connection.monitor_pages_va[0]) {
+   vunmap(vmbus_connection.monitor_pages[0]);
+   vmbus_connection.monitor_pages[0]
+   = vmbus_connection.monitor_pages_va[0];
+   vmbus_connection.monitor_pages_va[0] = NULL;
+   }
+
+   if (vmbus_connection.monitor_pages_va[1]) {
+   vunmap(vmbus_connection.monitor_pages[1]);
+   vmbus_connection.monitor_pages[1]
+   = vmbus_connection.monitor_pages_va[1];
+   vmbus_connection.monitor_pages_va[1] = NULL;
+   }
+
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
+   }
+
hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
vmbus_connection.monitor_pages[0] = NULL;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 9416e09ebd58..0778add21a9c 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -240,6 +240,7 @@ struct vmbus_connection {
  

[RFC PATCH 8/12] x86/Hyper-V: Initialize bounce buffer page cache and list

2021-02-28 Thread Tianyu Lan
From: Tianyu Lan 

Initialize/free bounce buffer resources when adding/deleting
a VMBus channel in an Isolation VM.

Signed-off-by: Sunil Muthuswamy 
Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Tianyu Lan 
---
 drivers/hv/Makefile   |  2 +-
 drivers/hv/channel_mgmt.c | 29 +--
 drivers/hv/hv_bounce.c| 42 +++
 drivers/hv/hyperv_vmbus.h | 14 +
 include/linux/hyperv.h| 22 
 5 files changed, 97 insertions(+), 12 deletions(-)
 create mode 100644 drivers/hv/hv_bounce.c

diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
index 94daf8240c95..b0c20fed9153 100644
--- a/drivers/hv/Makefile
+++ b/drivers/hv/Makefile
@@ -8,6 +8,6 @@ CFLAGS_hv_balloon.o = -I$(src)
 
 hv_vmbus-y := vmbus_drv.o \
 hv.o connection.o channel.o \
-channel_mgmt.o ring_buffer.o hv_trace.o
+channel_mgmt.o ring_buffer.o hv_trace.o hv_bounce.o
 hv_vmbus-$(CONFIG_HYPERV_TESTING)  += hv_debugfs.o
 hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_fcopy.o hv_utils_transport.o
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index f0ed730e2e4e..e2846cacfd70 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -336,6 +336,18 @@ bool vmbus_prep_negotiate_resp(struct icmsg_hdr 
*icmsghdrp, u8 *buf,
 
 EXPORT_SYMBOL_GPL(vmbus_prep_negotiate_resp);
 
+/*
+ * free_channel - Release the resources used by the vmbus channel object
+ */
+static void free_channel(struct vmbus_channel *channel)
+{
+   tasklet_kill(&channel->callback_event);
+   vmbus_remove_channel_attr_group(channel);
+
+   kobject_put(&channel->kobj);
+   hv_free_channel_ivm(channel);
+}
+
 /*
  * alloc_channel - Allocate and initialize a vmbus channel object
  */
@@ -360,17 +372,6 @@ static struct vmbus_channel *alloc_channel(void)
return channel;
 }
 
-/*
- * free_channel - Release the resources used by the vmbus channel object
- */
-static void free_channel(struct vmbus_channel *channel)
-{
-   tasklet_kill(&channel->callback_event);
-   vmbus_remove_channel_attr_group(channel);
-
-   kobject_put(&channel->kobj);
-}
-
 void vmbus_channel_map_relid(struct vmbus_channel *channel)
 {
if (WARN_ON(channel->offermsg.child_relid >= MAX_CHANNEL_RELIDS))
@@ -510,6 +511,8 @@ static void vmbus_add_channel_work(struct work_struct *work)
if (vmbus_add_channel_kobj(dev, newchannel))
goto err_deq_chan;
 
+   hv_init_channel_ivm(newchannel);
+
if (primary_channel->sc_creation_callback != NULL)
primary_channel->sc_creation_callback(newchannel);
 
@@ -543,6 +546,10 @@ static void vmbus_add_channel_work(struct work_struct 
*work)
}
 
newchannel->probe_done = true;
+
+   if (hv_init_channel_ivm(newchannel))
+   goto err_deq_chan;
+
return;
 
 err_deq_chan:
diff --git a/drivers/hv/hv_bounce.c b/drivers/hv/hv_bounce.c
new file mode 100644
index ..c5898325b238
--- /dev/null
+++ b/drivers/hv/hv_bounce.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Bounce buffer code for Hyper-V Isolation VM support.
+ *
+ * Authors:
+ *   Sunil Muthuswamy 
+ *   Tianyu Lan 
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include "hyperv_vmbus.h"
+
+int hv_init_channel_ivm(struct vmbus_channel *channel)
+{
+   if (!hv_is_isolation_supported())
+   return 0;
+
+   INIT_LIST_HEAD(>bounce_page_free_head);
+   INIT_LIST_HEAD(>bounce_pkt_free_list_head);
+
+   channel->bounce_pkt_cache = KMEM_CACHE(hv_bounce_pkt, 0);
+   if (unlikely(!channel->bounce_pkt_cache))
+   return -ENOMEM;
+   channel->bounce_page_cache = KMEM_CACHE(hv_bounce_page_list, 0);
+   if (unlikely(!channel->bounce_page_cache))
+   return -ENOMEM;
+
+   return 0;
+}
+
+void hv_free_channel_ivm(struct vmbus_channel *channel)
+{
+   if (!hv_is_isolation_supported())
+   return;
+
+
+   cancel_delayed_work_sync(>bounce_page_list_maintain);
+   hv_bounce_pkt_list_free(channel, &channel->bounce_pkt_free_list_head);
+   hv_bounce_page_list_free(channel, &channel->bounce_page_free_head);
+   kmem_cache_destroy(channel->bounce_pkt_cache);
+   kmem_cache_destroy(channel->bounce_page_cache);
+}
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index d78a04ad5490..7edf2be60d2c 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -19,6 +19,7 @@
 #include 
 #include 
 
+#include 
 #include "hv_trace.h"
 
 /*
@@ -56,6 +57,19 @@ union hv_monitor_trigger_state {
};
 };
 
+/*
+ * All vmbus channels initially start with zero bounce pages and are required
+ * to set any non-zero size, if needed.
+ */
+#define HV_DEFAULT_BOUNCE_BUFFER_PAGES  0
+
+/* MIN should be a power of 2 */
+#define HV_MIN_

[RFC PATCH 7/12] hv/vmbus: Initialize VMbus ring buffer for Isolation VM

2021-02-28 Thread Tianyu Lan
From: Tianyu Lan 

VMBus ring buffers are shared with the host and need to be
accessed via the extra address space of an Isolation VM with
SNP support. This patch maps the ring buffer address into the
extra address space via ioremap(). The Hyper-V host
visibility hvcall smears data in the ring buffer, so reset
the ring buffer memory to zero after calling the
visibility hvcall.

Signed-off-by: Sunil Muthuswamy 
Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Tianyu Lan 
---
 drivers/hv/channel.c  | 10 +
 drivers/hv/hyperv_vmbus.h |  2 +
 drivers/hv/ring_buffer.c  | 83 +--
 mm/ioremap.c  |  1 +
 mm/vmalloc.c  |  1 +
 5 files changed, 76 insertions(+), 21 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index f31b669a1ddf..4c05b1488649 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -657,6 +657,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
if (err)
goto error_clean_ring;
 
+   err = hv_ringbuffer_post_init(&newchannel->outbound,
+ page, send_pages);
+   if (err)
+   goto error_free_gpadl;
+
+   err = hv_ringbuffer_post_init(&newchannel->inbound,
+ &page[send_pages], recv_pages);
+   if (err)
+   goto error_free_gpadl;
+
/* Create and init the channel open message */
open_info = kzalloc(sizeof(*open_info) +
   sizeof(struct vmbus_channel_open_channel),
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 0778add21a9c..d78a04ad5490 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
 /* Interface */
 
 void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+   struct page *pages, u32 page_cnt);
 
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
   struct page *pages, u32 pagecnt);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 35833d4d1a1d..c8b0f7b45158 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "hyperv_vmbus.h"
 
@@ -188,6 +190,44 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
mutex_init(>outbound.ring_buffer_mutex);
 }
 
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+  struct page *pages, u32 page_cnt)
+{
+   struct vm_struct *area;
+   u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
+   unsigned long vaddr;
+   int err = 0;
+
+   if (!hv_isolation_type_snp())
+   return 0;
+
+   physic_addr += ms_hyperv.shared_gpa_boundary;
+   area = get_vm_area((2 * page_cnt - 1) * PAGE_SIZE, VM_IOREMAP);
+   if (!area || !area->addr)
+   return -EFAULT;
+
+   vaddr = (unsigned long)area->addr;
+   err = ioremap_page_range(vaddr, vaddr + page_cnt * PAGE_SIZE,
+  physic_addr, PAGE_KERNEL_IO);
+   err |= ioremap_page_range(vaddr + page_cnt * PAGE_SIZE,
+ vaddr + (2 * page_cnt - 1) * PAGE_SIZE,
+ physic_addr + PAGE_SIZE, PAGE_KERNEL_IO);
+   if (err) {
+   vunmap((void *)vaddr);
+   return -EFAULT;
+   }
+
+   /* Clean memory after setting host visibility. */
+   memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
+
+   ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
+   ring_info->ring_buffer->read_index = 0;
+   ring_info->ring_buffer->write_index = 0;
+   ring_info->ring_buffer->feature_bits.value = 1;
+
+   return 0;
+}
+
 /* Initialize the ring buffer. */
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
   struct page *pages, u32 page_cnt)
@@ -197,33 +237,34 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info 
*ring_info,
 
BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
 
-   /*
-* First page holds struct hv_ring_buffer, do wraparound mapping for
-* the rest.
-*/
-   pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
-  GFP_KERNEL);
-   if (!pages_wraparound)
-   return -ENOMEM;
-
-   pages_wraparound[0] = pages;
-   for (i = 0; i < 2 * (page_cnt - 1); i++)
	pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
+   if (!hv_isolation_type_snp()) {
+   /*
+* First page holds struct hv_ring_buffer, do wraparound 
mapping for
+* the rest.
+*/
+   pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page 
*)

[RFC PATCH 5/12] HV: Add ghcb hvcall support for SNP VM

2021-02-28 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides a GHCB-based hvcall to handle the VMBus
HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
messages in an SNP Isolation VM. Add such support.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/ivm.c   | 69 +
 arch/x86/include/asm/mshyperv.h |  1 +
 drivers/hv/connection.c |  6 ++-
 drivers/hv/hv.c |  8 +++-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 4332bf7aaf9b..feaabcd151f5 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -14,8 +14,77 @@
 
 union hv_ghcb {
struct ghcb ghcb;
+   struct {
+   u64 hypercalldata[509];
+   u64 outputgpa;
+   union {
+   union {
+   struct {
+   u32 callcode: 16;
+   u32 isfast  : 1;
+   u32 reserved1   : 14;
+   u32 isnested: 1;
+   u32 countofelements : 12;
+   u32 reserved2   : 4;
+   u32 repstartindex   : 12;
+   u32 reserved3   : 4;
+   };
+   u64 asuint64;
+   } hypercallinput;
+   union {
+   struct {
+   u16 callstatus;
+   u16 reserved1;
+   u32 elementsprocessed : 12;
+   u32 reserved2 : 20;
+   };
+   u64 asunit64;
+   } hypercalloutput;
+   };
+   u64 reserved2;
+   } hypercall;
 } __packed __aligned(PAGE_SIZE);
 
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EFAULT;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return -EFAULT;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 1;
+
+   hv_ghcb->hypercall.outputgpa = (u64)output;
+   hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
+   hv_ghcb->hypercall.hypercallinput.callcode = control;
+
+   if (input_size)
+   memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
+
+   VMGEXIT();
+
+   hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
+   memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
+  sizeof(hv_ghcb->ghcb.save.valid_bitmap));
+
+   local_irq_restore(flags);
+
+   return hv_ghcb->hypercall.hypercalloutput.callstatus;
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
 void hv_ghcb_msr_write(u64 msr, u64 value)
 {
union hv_ghcb *hv_ghcb;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index f624d72b99d3..c8f66d269e5b 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -274,6 +274,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
 void hv_ghcb_msr_write(u64 msr, u64 value);
 void hv_ghcb_msr_read(u64 msr, u64 *value);
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 
 #define hv_get_synint_state_ghcb(int_num, val) \
hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index c83612cddb99..79bca653dce9 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -442,6 +442,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
 
++channel->sig_events;
 
-   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
+   if (hv_isolation_type_snp())
+   hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
+   NULL, sizeof(u64));
+   else
+   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
 }
 EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 28e28ccc2081..6c64a7fd1ebd 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -60,7 +60,13 @@ int hv_post_message(union hv_connection_id connection_id,
aligned_msg->payload_size = payload_size;
memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-   status = hv_do_hypercall(HVCALL_POST_MESSAGE, align

[RFC PATCH 4/12] HV: Add Write/Read MSR registers via ghcb

2021-02-28 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides the GHCB protocol to write the Synthetic Interrupt
Controller MSRs; these registers are emulated by the
hypervisor rather than the paravisor.

Hyper-V requires the SINTx MSRs to be written twice (once via
GHCB and once via the wrmsr instruction, with the proxy bit 21 set).
The Guest OS ID MSR also needs to be set via GHCB.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/Makefile|   2 +-
 arch/x86/hyperv/hv_init.c   |  18 +--
 arch/x86/hyperv/ivm.c   | 178 ++
 arch/x86/include/asm/mshyperv.h |  21 +++-
 arch/x86/kernel/cpu/mshyperv.c  |  46 
 drivers/hv/channel.c|   2 +-
 drivers/hv/hv.c | 188 ++--
 include/asm-generic/mshyperv.h  |  10 +-
 8 files changed, 343 insertions(+), 122 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y  := hv_init.o mmu.o nested.o irqdomain.o
+obj-y  := hv_init.o mmu.o nested.o irqdomain.o ivm.o
 obj-$(CONFIG_X86_64)   += hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 90e65fbf4c58..87b1dd9c84d6 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -475,6 +475,9 @@ void __init hyperv_init(void)
 
ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
*ghcb_base = ghcb_va;
+
+   /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -561,6 +564,7 @@ void hyperv_cleanup(void)
 
/* Reset our OS id */
wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
/*
 * Reset hypercall page reference before reset the page,
@@ -668,17 +672,3 @@ bool hv_is_hibernation_supported(void)
return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
 }
 EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
-
-enum hv_isolation_type hv_get_isolation_type(void)
-{
-   if (!(ms_hyperv.features_b & HV_ISOLATION))
-   return HV_ISOLATION_TYPE_NONE;
-   return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-bool hv_is_isolation_supported(void)
-{
-   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
-EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index ..4332bf7aaf9b
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ *  Tianyu Lan 
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+union hv_ghcb {
+   struct ghcb ghcb;
+} __packed __aligned(PAGE_SIZE);
+
+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 0;
+
+   ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
+   ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
+   ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 1);
+   ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
+
+   VMGEXIT();
+
+   if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0x) == 1)
+   pr_warn("Failed to write MSR via GHCB.\n");
+
+   local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
+
+void hv_ghcb_msr_read(u64 msr, u64 *value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   memset(hv_ghcb, 0x00, PAGE_SIZE);
+   hv_ghcb->ghcb.protocol_version = 1;
+   hv_ghcb->ghcb.ghcb_usage = 0;
+
+   ghcb_set_sw_exit_code(

[RFC PATCH 3/12] x86/HV: Initialize GHCB page and shared memory boundary

2021-02-28 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V exposes the GHCB page via the SEV-ES GHCB MSR for the SNP
guest to communicate with the hypervisor. Map the GHCB page for all
CPUs to read/write MSRs and submit hvcall requests
via GHCB. Hyper-V also exposes the shared memory boundary via the
CPUID leaf HYPERV_CPUID_ISOLATION_CONFIG; store it in the
shared_gpa_boundary field of the ms_hyperv struct. This prepares
the SNP guest to share memory with the host.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/hv_init.c  | 52 +++---
 arch/x86/kernel/cpu/mshyperv.c |  2 ++
 include/asm-generic/mshyperv.h | 14 -
 3 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0db5137d5b81..90e65fbf4c58 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -82,6 +82,9 @@ static int hv_cpu_init(unsigned int cpu)
	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
void **input_arg;
struct page *pg;
+   u64 ghcb_gpa;
+   void *ghcb_va;
+   void **ghcb_base;
 
/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, 
hv_root_partition ? 1 : 0);
@@ -128,6 +131,17 @@ static int hv_cpu_init(unsigned int cpu)
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
}
 
+   if (ms_hyperv.ghcb_base) {
+   rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+
+   ghcb_va = ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);
+   if (!ghcb_va)
+   return -ENOMEM;
+
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   *ghcb_base = ghcb_va;
+   }
+
return 0;
 }
 
@@ -223,6 +237,7 @@ static int hv_cpu_die(unsigned int cpu)
unsigned long flags;
void **input_arg;
void *pg;
+   void **ghcb_va = NULL;
 
local_irq_save(flags);
input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
@@ -236,6 +251,13 @@ static int hv_cpu_die(unsigned int cpu)
*output_arg = NULL;
}
 
+   if (ms_hyperv.ghcb_base) {
+   ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   if (*ghcb_va)
+   iounmap(*ghcb_va);
+   *ghcb_va = NULL;
+   }
+
local_irq_restore(flags);
 
free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
@@ -372,6 +394,9 @@ void __init hyperv_init(void)
u64 guest_id, required_msrs;
union hv_x64_msr_hypercall_contents hypercall_msr;
int cpuhp, i;
+   u64 ghcb_gpa;
+   void *ghcb_va;
+   void **ghcb_base;
 
if (x86_hyper_type != X86_HYPER_MS_HYPERV)
return;
@@ -432,9 +457,24 @@ void __init hyperv_init(void)
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
__builtin_return_address(0));
-   if (hv_hypercall_pg == NULL) {
-   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-   goto remove_cpuhp_state;
+   if (hv_hypercall_pg == NULL)
+   goto clean_guest_os_id;
+
+   if (hv_isolation_type_snp()) {
+   ms_hyperv.ghcb_base = alloc_percpu(void *);
+   if (!ms_hyperv.ghcb_base)
+   goto clean_guest_os_id;
+
+   rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+   ghcb_va = ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);
+   if (!ghcb_va) {
+   free_percpu(ms_hyperv.ghcb_base);
+   ms_hyperv.ghcb_base = NULL;
+   goto clean_guest_os_id;
+   }
+
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   *ghcb_base = ghcb_va;
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -499,7 +539,8 @@ void __init hyperv_init(void)
 
return;
 
-remove_cpuhp_state:
+clean_guest_os_id:
+   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
cpuhp_remove_state(cpuhp);
 free_vp_assist_page:
kfree(hv_vp_assist_page);
@@ -528,6 +569,9 @@ void hyperv_cleanup(void)
 */
hv_hypercall_pg = NULL;
 
+   if (ms_hyperv.ghcb_base)
+   free_percpu(ms_hyperv.ghcb_base);
+
/* Reset the hypercall page */
hypercall_msr.as_uint64 = 0;
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 347c32eac8fd..d6c363456cbf 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -330,6 +330,8 @@ static void __init ms_hyperv_init_platform(void)
if (ms_hyperv.features_b & HV_ISOLATION) {
ms_hyperv.isolation_config_a = 
cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
ms_hyperv.isolation_config_b = 
cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);

[RFC PATCH 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support

2021-02-28 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall, and the
guest needs to call it to mark memory visible to the host before
sharing that memory with the host. For security, network/storage
stack memory should not be shared with the host, so bounce buffers
are required.

The VMBus channel ring buffer already plays the bounce buffer role
because all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the VMBus channel ring buffer visible.

There are two exceptions - packets sent by vmbus_sendpacket_
pagebuffer() and vmbus_sendpacket_mpb_desc(). These packets
contain IO stack memory addresses and the host will access that memory
directly. So add bounce buffer allocation support in VMBus for these packets.

For an SNP Isolation VM, the guest needs to access the shared memory via
an extra address space which is specified by the Hyper-V CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access the
shared memory is the bounce buffer memory GPA plus the shared_gpa_boundary
reported by CPUID.

Tianyu Lan (12):
  x86/Hyper-V: Add visibility parameter for vmbus_establish_gpadl()
  x86/Hyper-V: Add new hvcall guest address host visibility support
  x86/HV: Initialize GHCB page and shared memory boundary
  HV: Add Write/Read MSR registers via ghcb
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  hv/vmbus: Initialize VMbus ring buffer for Isolation VM
  x86/Hyper-V: Initialize bounce buffer page cache and list
  x86/Hyper-V: Add new parameter for
vmbus_sendpacket_pagebuffer()/mpb_desc()
  HV: Add bounce buffer support for Isolation VM
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add bounce buffer support for Storvsc

 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/hv_init.c  |  70 +++-
 arch/x86/hyperv/ivm.c  | 257 
 arch/x86/include/asm/hyperv-tlfs.h |  22 +
 arch/x86/include/asm/mshyperv.h|  26 +-
 arch/x86/kernel/cpu/mshyperv.c |   2 +
 drivers/hv/Makefile|   2 +-
 drivers/hv/channel.c   | 103 -
 drivers/hv/channel_mgmt.c  |  30 +-
 drivers/hv/connection.c|  68 +++-
 drivers/hv/hv.c| 196 ++---
 drivers/hv/hv_bounce.c | 619 +
 drivers/hv/hyperv_vmbus.h  |  42 ++
 drivers/hv/ring_buffer.c   |  83 +++-
 drivers/net/hyperv/hyperv_net.h|   5 +
 drivers/net/hyperv/netvsc.c| 111 +-
 drivers/scsi/storvsc_drv.c |  46 ++-
 drivers/uio/uio_hv_generic.c   |  13 +-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h |  24 +-
 include/linux/hyperv.h |  46 ++-
 mm/ioremap.c   |   1 +
 mm/vmalloc.c   |   1 +
 23 files changed, 1614 insertions(+), 156 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c
 create mode 100644 drivers/hv/hv_bounce.c

-- 
2.25.1



[RFC PATCH 2/12] x86/Hyper-V: Add new hvcall guest address host visibility support

2021-02-28 Thread Tianyu Lan
From: Tianyu Lan 

Add new hvcall guest address host visibility support. Mark the vmbus
ring buffer visible to the host when creating a gpadl buffer and mark
it not visible again when tearing down the gpadl buffer.

Signed-off-by: Sunil Muthuswamy 
Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Tianyu Lan 
---
 arch/x86/include/asm/hyperv-tlfs.h | 13 
 arch/x86/include/asm/mshyperv.h|  4 +--
 arch/x86/kernel/cpu/mshyperv.c | 46 ++
 drivers/hv/channel.c   | 53 --
 drivers/net/hyperv/hyperv_net.h|  1 +
 drivers/net/hyperv/netvsc.c|  9 +++--
 drivers/uio/uio_hv_generic.c   |  6 ++--
 include/asm-generic/hyperv-tlfs.h  |  1 +
 include/linux/hyperv.h |  3 +-
 9 files changed, 126 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
b/arch/x86/include/asm/hyperv-tlfs.h
index fb1893a4c32b..d22b1c3f425a 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -573,4 +573,17 @@ enum hv_interrupt_type {
 
 #include 
 
+/* All input parameters should be in single page. */
+#define HV_MAX_MODIFY_GPA_REP_COUNT\
+   ((PAGE_SIZE - 2 * sizeof(u64)) / (sizeof(u64)))
+
+/* HvCallModifySparseGpaPageHostVisibility hypercall */
+struct hv_input_modify_sparse_gpa_page_host_visibility {
+   u64 partition_id;
+   u32 host_visibility:2;
+   u32 reserved0:30;
+   u32 reserved1;
+   u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
+} __packed;
+
 #endif
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ccf60a809a17..1e8275d35c1f 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -262,13 +262,13 @@ static inline void hv_set_msi_entry_from_desc(union 
hv_msi_entry *msi_entry,
msi_entry->address.as_uint32 = msi_desc->msg.address_lo;
msi_entry->data.as_uint32 = msi_desc->msg.data;
 }
-
 struct irq_domain *hv_create_pci_msi_domain(void);
 
 int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
struct hv_interrupt_entry *entry);
 int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
-
+int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility);
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e88bc296afca..347c32eac8fd 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -37,6 +37,8 @@
 bool hv_root_partition;
 EXPORT_SYMBOL_GPL(hv_root_partition);
 
+#define HV_PARTITION_ID_SELF ((u64)-1)
+
 struct ms_hyperv_info ms_hyperv;
 EXPORT_SYMBOL_GPL(ms_hyperv);
 
@@ -477,3 +479,47 @@ const __initconst struct hypervisor_x86 
x86_hyper_ms_hyperv = {
.init.msi_ext_dest_id   = ms_hyperv_msi_ext_dest_id,
.init.init_platform = ms_hyperv_init_platform,
 };
+
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
+{
+   struct hv_input_modify_sparse_gpa_page_host_visibility **input_pcpu;
+   struct hv_input_modify_sparse_gpa_page_host_visibility *input;
+   u16 pages_processed;
+   u64 hv_status;
+   unsigned long flags;
+
+   /* no-op if partition isolation is not enabled */
+   if (!hv_is_isolation_supported())
+   return 0;
+
+   if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+   pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+   HV_MAX_MODIFY_GPA_REP_COUNT);
+   return -EINVAL;
+   }
+
+   local_irq_save(flags);
+   input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)
+   this_cpu_ptr(hyperv_pcpu_input_arg);
+   input = *input_pcpu;
+   if (unlikely(!input)) {
+   local_irq_restore(flags);
+   return -1;
+   }
+
+   input->partition_id = HV_PARTITION_ID_SELF;
+   input->host_visibility = visibility;
+   input->reserved0 = 0;
+   input->reserved1 = 0;
+   memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+   hv_status = hv_do_rep_hypercall(
+   HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+   0, input, &pages_processed);
+   local_irq_restore(flags);
+
+   if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+   return 0;
+
+   return -EFAULT;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index daa21cc72beb..204e6f3598a5 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -237,6 +237,38 @@ int vmbus_send_modifychannel(u32 child_relid, u32 
target_vp)
 }
 EXPORT_SYMBOL_GPL(vmbus_send_modifychannel);
 
+/*
+ * hv_set_mem_host_visibility

[RFC PATCH 1/12] x86/Hyper-V: Add visibility parameter for vmbus_establish_gpadl()

2021-02-28 Thread Tianyu Lan
From: Tianyu Lan 

Add a visibility parameter to vmbus_establish_gpadl() and prepare
to change host visibility when creating a gpadl for a buffer.

Signed-off-by: Sunil Muthuswamy 
Co-Developed-by: Sunil Muthuswamy 
Signed-off-by: Tianyu Lan 
---
 arch/x86/include/asm/hyperv-tlfs.h |  9 +
 drivers/hv/channel.c   | 20 +++-
 drivers/net/hyperv/netvsc.c|  8 ++--
 drivers/uio/uio_hv_generic.c   |  7 +--
 include/linux/hyperv.h |  3 ++-
 5 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
b/arch/x86/include/asm/hyperv-tlfs.h
index e6cd3fee562b..fb1893a4c32b 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -236,6 +236,15 @@ enum hv_isolation_type {
 /* TSC invariant control */
 #define HV_X64_MSR_TSC_INVARIANT_CONTROL   0x4118
 
+/* Hyper-V GPA map flags */
+#define HV_MAP_GPA_PERMISSIONS_NONE0x0
+#define HV_MAP_GPA_READABLE0x1
+#define HV_MAP_GPA_WRITABLE0x2
+
+#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
+#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)
+#define VMBUS_PAGE_NOT_VISIBLE HV_MAP_GPA_PERMISSIONS_NONE
+
 /*
  * Declare the MSR used to setup pages used to communicate with the hypervisor.
  */
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 0bd202de7960..daa21cc72beb 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -242,7 +242,7 @@ EXPORT_SYMBOL_GPL(vmbus_send_modifychannel);
  */
 static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
   u32 size, u32 send_offset,
-  struct vmbus_channel_msginfo **msginfo)
+  struct vmbus_channel_msginfo **msginfo, u32 
visibility)
 {
int i;
int pagecount;
@@ -391,7 +391,7 @@ static int create_gpadl_header(enum hv_gpadl_type type, 
void *kbuffer,
 static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
   enum hv_gpadl_type type, void *kbuffer,
   u32 size, u32 send_offset,
-  u32 *gpadl_handle)
+  u32 *gpadl_handle, u32 visibility)
 {
struct vmbus_channel_gpadl_header *gpadlmsg;
struct vmbus_channel_gpadl_body *gpadl_body;
@@ -405,7 +405,8 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
*channel,
next_gpadl_handle =
(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
 
-   ret = create_gpadl_header(type, kbuffer, size, send_offset, &msginfo);
+   ret = create_gpadl_header(type, kbuffer, size, send_offset,
+ &msginfo, visibility);
if (ret)
return ret;
 
@@ -496,10 +497,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
*channel,
  * @gpadl_handle: some funky thing
  */
 int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
- u32 size, u32 *gpadl_handle)
+ u32 size, u32 *gpadl_handle, u32 visibility)
 {
return __vmbus_establish_gpadl(channel, HV_GPADL_BUFFER, kbuffer, size,
-  0U, gpadl_handle);
+  0U, gpadl_handle, visibility);
 }
 EXPORT_SYMBOL_GPL(vmbus_establish_gpadl);
 
@@ -610,10 +611,11 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
newchannel->ringbuffer_gpadlhandle = 0;
 
err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
- page_address(newchannel->ringbuffer_page),
- (send_pages + recv_pages) << PAGE_SHIFT,
- newchannel->ringbuffer_send_offset << 
PAGE_SHIFT,
- &newchannel->ringbuffer_gpadlhandle);
+   page_address(newchannel->ringbuffer_page),
+   (send_pages + recv_pages) << PAGE_SHIFT,
+   newchannel->ringbuffer_send_offset << PAGE_SHIFT,
+   &newchannel->ringbuffer_gpadlhandle,
+   VMBUS_PAGE_VISIBLE_READ_WRITE);
if (err)
goto error_clean_ring;
 
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 2353623259f3..bb72c7578330 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -333,7 +333,8 @@ static int netvsc_init_buf(struct hv_device *device,
 */
ret = vmbus_establish_gpadl(device->channel, net_device->recv_buf,
buf_size,
-   &net_device->recv_buf_gpadl_handle);
+   &net_device->recv_buf_gpadl_handle,
+   VMBUS_PAGE_VISIBLE_READ_WRITE)

Re: [PATCH] x86/hyperv: Properly suspend/resume reenlightenment notifications

2020-05-13 Thread Tianyu Lan

On 5/13/2020 12:01 AM, Vitaly Kuznetsov wrote:

Errors during hibernation with reenlightenment notifications enabled were
reported:

  [   51.730435] PM: hibernation entry
  [   51.737435] PM: Syncing filesystems ...
  ...
  [   54.102216] Disabling non-boot CPUs ...
  [   54.106633] smpboot: CPU 1 is now offline
  [   54.110006] unchecked MSR access error: WRMSR to 0x4106 (tried to
  write 0x47c7278100ee) at rIP: 0x90062f24
  native_write_msr+0x4/0x20)
  [   54.110006] Call Trace:
  [   54.110006]  hv_cpu_die+0xd9/0xf0
  ...

Normally, hv_cpu_die() just reassigns reenlightenment notifications to some
other CPU when the CPU receiving them goes offline. Upon hibernation, there
is no other CPU which is still online so cpumask_any_but(cpu_online_mask)
returns >= nr_cpu_ids and using it as hv_vp_index index is incorrect.
Disable the feature when cpumask_any_but() fails.

Also, as we now disable reenlightenment notifications upon hibernation we
need to restore them on resume. Check if hv_reenlightenment_cb was
previously set and restore from hv_resume().

Signed-off-by: Vitaly Kuznetsov 
---
  arch/x86/hyperv/hv_init.c | 19 +--
  1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index fd51bac11b46..acf76b466db6 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -226,10 +226,18 @@ static int hv_cpu_die(unsigned int cpu)
  
  	rdmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));

if (re_ctrl.target_vp == hv_vp_index[cpu]) {
-   /* Reassign to some other online CPU */
+   /*
+* Reassign reenlightenment notifications to some other online
+* CPU or just disable the feature if there are no online CPUs
+* left (happens on hibernation).
+*/
new_cpu = cpumask_any_but(cpu_online_mask, cpu);
  
-		re_ctrl.target_vp = hv_vp_index[new_cpu];

+   if (new_cpu < nr_cpu_ids)
+   re_ctrl.target_vp = hv_vp_index[new_cpu];
+   else
+   re_ctrl.enabled = 0;
+
wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
}
  
@@ -293,6 +301,13 @@ static void hv_resume(void)
  
  	hv_hypercall_pg = hv_hypercall_pg_saved;

hv_hypercall_pg_saved = NULL;
+
+   /*
+* Reenlightenment notifications are disabled by hv_cpu_die(0),
+* reenable them here if hv_reenlightenment_cb was previously set.
+*/
+   if (hv_reenlightenment_cb)
+   set_hv_tscchange_cb(hv_reenlightenment_cb);
  }
  
  /* Note: when the ops are called, only CPU0 is online and IRQs are disabled. */




Reviewed-by: Tianyu Lan 


RE: [PATCH] mm/resource: Move child to new resource when release mem region.

2019-10-11 Thread Tianyu Lan
On 10/10/2019 10:29 PM, Dave Hansen wrote:
> On 10/10/19 12:28 AM, lantianyu1...@gmail.com wrote:
>> When release mem region, old mem region may be splited to
>> two regions. Current allocate new struct resource for high
>> end mem region but not move child resources whose ranges are
>> in the high end range to new resource. When adjust old mem
>> region's range, adjust_resource() detects child region's range
>> is out of new range and return error. Move child resources to
>> high end resource before adjusting old mem range.
> 
>  From the comment, it appears the old code intended to have the behavior
> that you are changing.  Could you explain _why_ this has become a
> problem for you?
Hi Dave:
Thanks for your review. The current code assumes that all children remain
in the lower address entry for simplicity. For memory hot-remove, selecting
the remove region via scanning system memory may hit the case of a child in
the higher address entry.

For example, the following output from /proc/iomem shows that kernel code,
data and bss are located from 3a00 to 3b5f and these resources are the
system RAM resource's children. If 3980-39ff were selected as the
remove range, the resource would be split into two ranges, 0010-397f
and 3980-b87f1fff. The current code moves the kernel image related resources
under the 0010-397f resource. This causes adjust_resource() to return an
error because the children are not in the parent's range.

0010-b87f1fff : System RAM
  3a00-3ac00e80 : Kernel code
  3ac00e81-3b33883f : Kernel data
  3b4d3000-3b5f : Kernel bss





RE: [PATCH] KVM: vmx: fix a build warning in hv_enable_direct_tlbflush() on i386

2019-09-25 Thread Tianyu Lan
There is another warning in the report.

arch/x86/kvm/vmx/vmx.c: In function 'hv_enable_direct_tlbflush':
arch/x86/kvm/vmx/vmx.c:507:20: warning: cast from pointer to integer of 
different size [-Wpointer-to-int-cast]
  evmcs->hv_vm_id = (u64)vcpu->kvm;
^
The following change can fix it.
-   evmcs->hv_vm_id = (u64)vcpu->kvm;
+   evmcs->hv_vm_id = (unsigned long)vcpu->kvm;
evmcs->hv_enlightenments_control.nested_flush_hypercall = 1;

-Original Message-
From: Vitaly Kuznetsov  
Sent: Wednesday, September 25, 2019 4:53 PM
To: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org; Paolo Bonzini ; Radim 
Krčmář ; Sean Christopherson 
; Jim Mattson ; Tianyu 
Lan 
Subject: [PATCH] KVM: vmx: fix a build warning in hv_enable_direct_tlbflush() 
on i386

The following was reported on i386:

  arch/x86/kvm/vmx/vmx.c: In function 'hv_enable_direct_tlbflush':
  arch/x86/kvm/vmx/vmx.c:503:10: warning: cast from pointer to integer of 
different size [-Wpointer-to-int-cast]

The particular pr_debug() causing it is more or less useless, let's just remove 
it. Also, simplify the condition a little bit.

Reported-by: kbuild test robot 
Signed-off-by: Vitaly Kuznetsov 
---
 arch/x86/kvm/vmx/vmx.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 
a7c9922e3905..812553b7270f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -495,13 +495,11 @@ static int hv_enable_direct_tlbflush(struct kvm_vcpu 
*vcpu)
 * Synthetic VM-Exit is not enabled in current code and so All
 * evmcs in singe VM shares same assist page.
 */
-   if (!*p_hv_pa_pg) {
+   if (!*p_hv_pa_pg)
*p_hv_pa_pg = kzalloc(PAGE_SIZE, GFP_KERNEL);
-   if (!*p_hv_pa_pg)
-   return -ENOMEM;
-   pr_debug("KVM: Hyper-V: allocated PA_PG for %llx\n",
-  (u64)&vcpu->kvm);
-   }
+
+   if (!*p_hv_pa_pg)
+   return -ENOMEM;
 
evmcs = (struct hv_enlightened_vmcs *)to_vmx(vcpu)->loaded_vmcs->vmcs;
 
--
2.20.1



[tip: x86/urgent] x86/hyper-v: Fix overflow bug in fill_gva_list()

2019-09-02 Thread tip-bot2 for Tianyu Lan
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 4030b4c585c41eeefec7bd20ce3d0e100a0f2e4d
Gitweb:
https://git.kernel.org/tip/4030b4c585c41eeefec7bd20ce3d0e100a0f2e4d
Author:Tianyu Lan 
AuthorDate:Mon, 02 Sep 2019 20:41:43 +08:00
Committer: Ingo Molnar 
CommitterDate: Mon, 02 Sep 2019 19:57:19 +02:00

x86/hyper-v: Fix overflow bug in fill_gva_list()

When the 'start' parameter is >= 0xFF000000 on 32-bit
systems, or >= 0xFFFFFFFF'FF000000 on 64-bit systems,
fill_gva_list() gets into an infinite loop.

With such inputs, 'cur' overflows after adding HV_TLB_FLUSH_UNIT
and always compares as less than end.  Memory is filled with
guest virtual addresses until the system crashes.

Fix this by never incrementing 'cur' to be larger than 'end'.

Reported-by: Jong Hyun Park 
Signed-off-by: Tianyu Lan 
Reviewed-by: Michael Kelley 
Cc: Borislav Petkov 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Fixes: 2ffd9e33ce4a ("x86/hyper-v: Use hypercall for remote TLB flush")
Signed-off-by: Ingo Molnar 
---
 arch/x86/hyperv/mmu.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index e65d7fe..5208ba4 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -37,12 +37,14 @@ static inline int fill_gva_list(u64 gva_list[], int offset,
 * Lower 12 bits encode the number of additional
 * pages to flush (in addition to the 'cur' page).
 */
-   if (diff >= HV_TLB_FLUSH_UNIT)
+   if (diff >= HV_TLB_FLUSH_UNIT) {
gva_list[gva_n] |= ~PAGE_MASK;
-   else if (diff)
+   cur += HV_TLB_FLUSH_UNIT;
+   }  else if (diff) {
gva_list[gva_n] |= (diff - 1) >> PAGE_SHIFT;
+   cur = end;
+   }
 
-   cur += HV_TLB_FLUSH_UNIT;
gva_n++;
 
} while (cur < end);


Re: [PATCH] x86/Hyper-V: Fix overflow issue in the fill_gva_list()

2019-09-02 Thread Tianyu Lan
On Sat, Aug 31, 2019 at 1:41 AM Michael Kelley  wrote:
>
> From: lantianyu1...@gmail.com  Sent: Thursday, August 29, 2019 11:16 PM
> >
> > From: Tianyu Lan 
> >
> > fill_gva_list() populates gva list and adds offset
> > HV_TLB_FLUSH_UNIT(0x100) to variable "cur"
> > in the each loop. When diff between "end" and "cur" is
> > less than HV_TLB_FLUSH_UNIT, the gva entry should
> > be the last one and the loop should be end.
> >
> > If cur is equal to or greater than 0xFF000000 on 32-bit
> > mode, "cur" will overflow after adding HV_TLB_FLUSH_UNIT.
> > Its value will be wrapped and less than "end". fill_gva_list()
> > falls into an infinite loop and fill gva list out of
> > border finally.
> >
> > Set "cur" to be "end" to make loop end when diff is
> > less than HV_TLB_FLUSH_UNIT and add HV_TLB_FLUSH_UNIT to
> > "cur" when diff is equal or greater than HV_TLB_FLUSH_UNIT.
> > Fix the overflow issue.
>
> Let me suggest simplifying the commit message a bit.  It
> doesn't need to describe every line of the code change.   I think
> it should also make clear that the same problem could occur on
> 64-bit systems with the right "start" address.  My suggestion:
>
> When the 'start' parameter is >= 0xFF000000 on 32-bit
> systems, or >= 0xFFFFFFFF'FF000000 on 64-bit systems,
> fill_gva_list gets into an infinite loop.  With such inputs,
> 'cur' overflows after adding HV_TLB_FLUSH_UNIT and always
> compares as less than end.  Memory is filled with guest virtual
> addresses until the system crashes
>
> Fix this by never incrementing 'cur' to be larger than 'end'.
>
> >
> > Reported-by: Jong Hyun Park 
> > Signed-off-by: Tianyu Lan 
> > Fixes: 2ffd9e33ce4a ("x86/hyper-v: Use hypercall for remote
> > TLB flush")
>
> The "Fixes:" line needs to not wrap.  It's exempt from the
> "wrap at 75 columns" rule in order to simplify parsing scripts.
>
> The code itself looks good.

Hi Michael:
   Thanks for the suggestion. I will update the commit log in V2.
-- 
Best regards
Tianyu Lan


[tip: timers/core] x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n

2019-08-28 Thread tip-bot2 for Tianyu Lan
The following commit has been merged into the timers/core branch of tip:

Commit-ID: 41cfe2a2a7f4fad5647031ad3a1da166452b5437
Gitweb:
https://git.kernel.org/tip/41cfe2a2a7f4fad5647031ad3a1da166452b5437
Author:Tianyu Lan 
AuthorDate:Wed, 28 Aug 2019 16:07:47 +08:00
Committer: Thomas Gleixner 
CommitterDate: Wed, 28 Aug 2019 12:25:06 +02:00

x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n

hv_setup_sched_clock() references pv_ops which is only available when
CONFIG_PARAVIRT=Y.

Wrap it into a #ifdef

Signed-off-by: Tianyu Lan 
Signed-off-by: Thomas Gleixner 
Link: https://lkml.kernel.org/r/20190828080747.204419-1-tianyu@microsoft.com

---
 arch/x86/kernel/cpu/mshyperv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 53afd33..267daad 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -346,7 +346,9 @@ static void __init ms_hyperv_init_platform(void)
 
 void hv_setup_sched_clock(void *sched_clock)
 {
+#ifdef CONFIG_PARAVIRT
pv_ops.time.sched_clock = sched_clock;
+#endif
 }
 
 const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {


Re: [PATCH V3 0/3] KVM/Hyper-V: Add Hyper-V direct tlb flush support

2019-08-27 Thread Tianyu Lan
On Tue, Aug 27, 2019 at 8:38 PM Vitaly Kuznetsov  wrote:
>
> Tianyu Lan  writes:
>
> > On Tue, Aug 27, 2019 at 2:41 PM Vitaly Kuznetsov  
> > wrote:
> >>
> >> lantianyu1...@gmail.com writes:
> >>
> >> > From: Tianyu Lan 
> >> >
> >> > This patchset is to add Hyper-V direct tlb support in KVM. Hyper-V
> >> > in L0 can delegate L1 hypervisor to handle tlb flush request from
> >> > L2 guest when direct tlb flush is enabled in L1.
> >> >
> >> > Patch 2 introduces new cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH to enable
> >> > feature from user space. User space should enable this feature only
> >> > when Hyper-V hypervisor capability is exposed to guest and KVM profile
> >> > is hided. There is a parameter conflict between KVM and Hyper-V 
> >> > hypercall.
> >> > We hope L2 guest doesn't use KVM hypercall when the feature is
> >> > enabled. Detail please see comment of new API
> >> > "KVM_CAP_HYPERV_DIRECT_TLBFLUSH"
> >>
> >> I was thinking about this for awhile and I think I have a better
> >> proposal. Instead of adding this new capability let's enable direct TLB
> >> flush when KVM guest enables Hyper-V Hypercall page (writes to
> >> HV_X64_MSR_HYPERCALL) - this guarantees that the guest doesn't need KVM
> >> hypercalls as we can't handle both KVM-style and Hyper-V-style
> >> hypercalls simultaneously and kvm_emulate_hypercall() does:
> >>
> >> if (kvm_hv_hypercall_enabled(vcpu->kvm))
> >> return kvm_hv_hypercall(vcpu);
> >>
> >> What do you think?
> >>
> >> (and instead of adding the capability we can add kvm.ko module parameter
> >> to enable direct tlb flush unconditionally, like
> >> 'hv_direct_tlbflush=-1/0/1' with '-1' being the default (autoselect
> >> based on Hyper-V hypercall enablement, '0' - permanently disabled, '1' -
> >> permanenetly enabled)).
> >>
> >
> > Hi Vitaly:
> > Actually, I had such an idea before. But user space should check
> > whether hv tlb flush is exposed to the VM before enabling direct tlb
> > flush. If not, user space should not enable direct tlb flush for the
> > guest, since Hyper-V will do more checks for each hypercall from the
> > nested VM with the feature enabled.
>
> If TLB Flush enlightenment is not exposed to the VM at all there's no
> difference if we enable direct TLB flush in eVMCS or not: the guest
> won't be using 'TLB Flush' hypercall and will do TLB flushing with
> IPIs. And, in case the guest enables Hyper-V hypercall page, it is
> definitelly not going to use KVM hypercalls so we can't break these.
>

Yes, this won't trigger a KVM/Hyper-V hypercall conflict. My point is
that if the tlb flush enlightenment is not enabled, enabling direct tlb
flush will not accelerate anything, and Hyper-V will still check each
hypercall from the nested VM in order to intercept the tlb flush
hypercall, but the guest won't use the tlb flush hypercall in this case.
The check of each hypercall in Hyper-V is redundant. We may avoid the
overhead by checking the status of the tlb flush enlightenment and only
enabling direct tlb flush when it is enabled.

---
Best regards
Tianyu Lan


Re: [PATCH V3 0/3] KVM/Hyper-V: Add Hyper-V direct tlb flush support

2019-08-27 Thread Tianyu Lan
On Tue, Aug 27, 2019 at 2:41 PM Vitaly Kuznetsov  wrote:
>
> lantianyu1...@gmail.com writes:
>
> > From: Tianyu Lan 
> >
> > This patchset is to add Hyper-V direct tlb support in KVM. Hyper-V
> > in L0 can delegate L1 hypervisor to handle tlb flush request from
> > L2 guest when direct tlb flush is enabled in L1.
> >
> > Patch 2 introduces new cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH to enable
> > feature from user space. User space should enable this feature only
> > when Hyper-V hypervisor capability is exposed to guest and KVM profile
> > is hided. There is a parameter conflict between KVM and Hyper-V hypercall.
> > We hope L2 guest doesn't use KVM hypercall when the feature is
> > enabled. Detail please see comment of new API
> > "KVM_CAP_HYPERV_DIRECT_TLBFLUSH"
>
> I was thinking about this for awhile and I think I have a better
> proposal. Instead of adding this new capability let's enable direct TLB
> flush when KVM guest enables Hyper-V Hypercall page (writes to
> HV_X64_MSR_HYPERCALL) - this guarantees that the guest doesn't need KVM
> hypercalls as we can't handle both KVM-style and Hyper-V-style
> hypercalls simultaneously and kvm_emulate_hypercall() does:
>
> if (kvm_hv_hypercall_enabled(vcpu->kvm))
> return kvm_hv_hypercall(vcpu);
>
> What do you think?
>
> (and instead of adding the capability we can add kvm.ko module parameter
> to enable direct tlb flush unconditionally, like
> 'hv_direct_tlbflush=-1/0/1' with '-1' being the default (autoselect
> based on Hyper-V hypercall enablement, '0' - permanently disabled, '1' -
> permanenetly enabled)).
>

Hi Vitaly:
 Actually, I had such an idea before. But user space should check
whether hv tlb flush is exposed to the VM before enabling direct tlb
flush. If not, user space should not enable direct tlb flush for the
guest, since Hyper-V will do more checks for each hypercall from the
nested VM with the feature enabled.

-- 
Best regards
Tianyu Lan


Re: [PATCH] x86/Hyper-V: Fix build error with CONFIG_HYPERV_TSCPAGE=N

2019-08-25 Thread Tianyu Lan
On Sun, Aug 25, 2019 at 1:52 AM Sasha Levin  wrote:
>
> On Thu, Aug 22, 2019 at 10:39:46AM +0200, Vitaly Kuznetsov wrote:
> >lantianyu1...@gmail.com writes:
> >
> >> From: Tianyu Lan 
> >>
> >> Both Hyper-V tsc page and Hyper-V tsc MSR code use variable
> >> hv_sched_clock_offset for their sched clock callback and so
> >> define the variable regardless of CONFIG_HYPERV_TSCPAGE setting.
> >
> >CONFIG_HYPERV_TSCPAGE is gone after my "x86/hyper-v: enable TSC page
> >clocksource on 32bit" patch. Do we still have an issue to fix?
>
> Yes. Let's get it fixed on older kernels (as such we need to tag this
> one for stable). The 32bit TSC patch won't come in before 5.4 anyway.
>
> Vitaly, does can you ack this patch? It might require you to re-spin
> your patch.
>
Hi Sasha:
           Thomas has folded this fix into the original patch.
https://lkml.org/lkml/2019/8/23/600
--
Best regards
Tianyu Lan


[tip: timers/core] clocksource/drivers/hyperv: Add Hyper-V specific sched clock function

2019-08-23 Thread tip-bot2 for Tianyu Lan
The following commit has been merged into the timers/core branch of tip:

Commit-ID: bd00cd52d5be655a2f217e2ed74b91a71cb2b14f
Gitweb:
https://git.kernel.org/tip/bd00cd52d5be655a2f217e2ed74b91a71cb2b14f
Author:Tianyu Lan 
AuthorDate:Wed, 14 Aug 2019 20:32:16 +08:00
Committer: Thomas Gleixner 
CommitterDate: Fri, 23 Aug 2019 16:59:54 +02:00

clocksource/drivers/hyperv: Add Hyper-V specific sched clock function

Hyper-V guests use the default native_sched_clock() in
pv_ops.time.sched_clock on x86. But native_sched_clock() directly uses the
raw TSC value, which can be discontinuous in a Hyper-V VM.

Add the generic hv_setup_sched_clock() to set the sched clock function
appropriately. On x86, this sets pv_ops.time.sched_clock to read the
Hyper-V reference TSC value that is scaled and adjusted to be continuous.

Also move the Hyper-V reference TSC initialization much earlier in the boot
process so no discontinuity is observed when pv_ops.time.sched_clock
calculates its offset.

[ tglx: Folded build fix ]

Signed-off-by: Tianyu Lan 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Michael Kelley 
Link: https://lkml.kernel.org/r/20190814123216.32245-3-tianyu@microsoft.com
---
 arch/x86/hyperv/hv_init.c  |  2 --
 arch/x86/kernel/cpu/mshyperv.c |  8 
 drivers/clocksource/hyperv_timer.c | 22 --
 include/asm-generic/mshyperv.h |  1 +
 4 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0d25868..866dfb3 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -301,8 +301,6 @@ void __init hyperv_init(void)
 
x86_init.pci.arch_init = hv_pci_init;
 
-   /* Register Hyper-V specific clocksource */
-   hv_init_clocksource();
return;
 
 remove_cpuhp_state:
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 062f772..53afd33 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct ms_hyperv_info ms_hyperv;
 EXPORT_SYMBOL_GPL(ms_hyperv);
@@ -338,9 +339,16 @@ static void __init ms_hyperv_init_platform(void)
x2apic_phys = 1;
 # endif
 
+   /* Register Hyper-V specific clocksource */
+   hv_init_clocksource();
 #endif
 }
 
+void hv_setup_sched_clock(void *sched_clock)
+{
+   pv_ops.time.sched_clock = sched_clock;
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
.name   = "Microsoft Hyper-V",
.detect = ms_hyperv_platform,
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index 432aa33..c322ab4 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -22,6 +22,7 @@
 #include 
 
 static struct clock_event_device __percpu *hv_clock_event;
+static u64 hv_sched_clock_offset __ro_after_init;
 
 /*
  * If false, we're using the old mechanism for stimer0 interrupts
@@ -222,7 +223,7 @@ struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
-static u64 notrace read_hv_sched_clock_tsc(void)
+static u64 notrace read_hv_clock_tsc(struct clocksource *arg)
 {
	u64 current_tick = hv_read_tsc_page(&tsc_pg);
 
@@ -232,9 +233,9 @@ static u64 notrace read_hv_sched_clock_tsc(void)
return current_tick;
 }
 
-static u64 read_hv_clock_tsc(struct clocksource *arg)
+static u64 read_hv_sched_clock_tsc(void)
 {
-   return read_hv_sched_clock_tsc();
+   return read_hv_clock_tsc(NULL) - hv_sched_clock_offset;
 }
 
 static struct clocksource hyperv_cs_tsc = {
@@ -246,7 +247,7 @@ static struct clocksource hyperv_cs_tsc = {
 };
 #endif
 
-static u64 notrace read_hv_sched_clock_msr(void)
+static u64 notrace read_hv_clock_msr(struct clocksource *arg)
 {
u64 current_tick;
/*
@@ -258,9 +259,9 @@ static u64 notrace read_hv_sched_clock_msr(void)
return current_tick;
 }
 
-static u64 read_hv_clock_msr(struct clocksource *arg)
+static u64 read_hv_sched_clock_msr(void)
 {
-   return read_hv_sched_clock_msr();
+   return read_hv_clock_msr(NULL) - hv_sched_clock_offset;
 }
 
 static struct clocksource hyperv_cs_msr = {
@@ -298,8 +299,9 @@ static bool __init hv_init_tsc_clocksource(void)
hv_set_clocksource_vdso(hyperv_cs_tsc);
	clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
 
-   /* sched_clock_register is needed on ARM64 but is a no-op on x86 */
-   sched_clock_register(read_hv_sched_clock_tsc, 64, HV_CLOCK_HZ);
+   hv_sched_clock_offset = hyperv_cs->read(hyperv_cs);
+   hv_setup_sched_clock(read_hv_sched_clock_tsc);
+
return true;
 }
 #else
@@ -329,7 +331,7 @@ void __init hv_init_clocksource(void)
	hyperv_cs = &hyperv_cs_msr;
	clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
 
-   /* sched_clock_register is needed on ARM64 but is a

[tip: timers/core] clocksource/drivers/hyperv: Allocate Hyper-V TSC page statically

2019-08-23 Thread tip-bot2 for Tianyu Lan
The following commit has been merged into the timers/core branch of tip:

Commit-ID: adb87ff4f96c9700718e09c97a804124d5cd61ff
Gitweb:
https://git.kernel.org/tip/adb87ff4f96c9700718e09c97a804124d5cd61ff
Author:Tianyu Lan 
AuthorDate:Wed, 14 Aug 2019 20:32:15 +08:00
Committer: Thomas Gleixner 
CommitterDate: Fri, 23 Aug 2019 16:59:53 +02:00

clocksource/drivers/hyperv: Allocate Hyper-V TSC page statically

Prepare to add Hyper-V sched clock callback and move Hyper-V Reference TSC
initialization much earlier in the boot process.  Earlier initialization is
needed so that it happens while the timestamp value is still 0 and no
discontinuity in the timestamp will occur when pv_ops.time.sched_clock
calculates its offset.

The earlier initialization requires that the Hyper-V TSC page be allocated
statically instead of with vmalloc(), so fixup the references to the TSC
page and the method of getting its physical address.

Signed-off-by: Tianyu Lan 
Signed-off-by: Thomas Gleixner 
Acked-by: Daniel Lezcano 
Link: https://lkml.kernel.org/r/20190814123216.32245-2-tianyu@microsoft.com
---
 arch/x86/entry/vdso/vma.c  |  2 +-
 drivers/clocksource/hyperv_timer.c | 12 
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 349a61d..f593774 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -122,7 +122,7 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 
if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK))
return vmf_insert_pfn(vma, vmf->address,
-   vmalloc_to_pfn(tsc_pg));
+   virt_to_phys(tsc_pg) >> PAGE_SHIFT);
}
 
return VM_FAULT_SIGBUS;
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index ba2c79e..432aa33 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -214,17 +214,17 @@ EXPORT_SYMBOL_GPL(hyperv_cs);
 
 #ifdef CONFIG_HYPERV_TSCPAGE
 
-static struct ms_hyperv_tsc_page *tsc_pg;
+static struct ms_hyperv_tsc_page tsc_pg __aligned(PAGE_SIZE);
 
 struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {
-   return tsc_pg;
-   return &tsc_pg;
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
 static u64 notrace read_hv_sched_clock_tsc(void)
 {
-   u64 current_tick = hv_read_tsc_page(tsc_pg);
-   u64 current_tick = hv_read_tsc_page(&tsc_pg);
 
if (current_tick == U64_MAX)
hv_get_time_ref_count(current_tick);
@@ -280,12 +280,8 @@ static bool __init hv_init_tsc_clocksource(void)
if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
return false;
 
-   tsc_pg = vmalloc(PAGE_SIZE);
-   if (!tsc_pg)
-   return false;
-
	hyperv_cs = &hyperv_cs_tsc;
-   phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
+   phys_addr = virt_to_phys(&tsc_pg);
 
/*
 * The Hyper-V TLFS specifies to preserve the value of reserved


Re: [PATCH] x86/Hyper-V: Fix build error with CONFIG_HYPERV_TSCPAGE=N

2019-08-22 Thread Tianyu Lan
On Thu, Aug 22, 2019 at 4:39 PM Vitaly Kuznetsov  wrote:
>
> lantianyu1...@gmail.com writes:
>
> > From: Tianyu Lan 
> >
> > Both Hyper-V tsc page and Hyper-V tsc MSR code use variable
> > hv_sched_clock_offset for their sched clock callback and so
> > define the variable regardless of CONFIG_HYPERV_TSCPAGE setting.
>
> CONFIG_HYPERV_TSCPAGE is gone after my "x86/hyper-v: enable TSC page
> clocksource on 32bit" patch. Do we still have an issue to fix?
>
Hi Vitaly:
 Your patch also fixes the build issue. If a dedicated
patch isn't necessary, please ignore this one. Thanks.

> >
> > Signed-off-by: Tianyu Lan 
> > ---
> > This patch is based on the top of 
> > "git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
> > timers/core".
> >
> >  drivers/clocksource/hyperv_timer.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
> > index dad8af198e20..c322ab4d3689 100644
> > --- a/drivers/clocksource/hyperv_timer.c
> > +++ b/drivers/clocksource/hyperv_timer.c
> > @@ -22,6 +22,7 @@
> >  #include 
> >
> >  static struct clock_event_device __percpu *hv_clock_event;
> > +static u64 hv_sched_clock_offset __ro_after_init;
> >
> >  /*
> >   * If false, we're using the old mechanism for stimer0 interrupts
> > @@ -215,7 +216,6 @@ EXPORT_SYMBOL_GPL(hyperv_cs);
> >  #ifdef CONFIG_HYPERV_TSCPAGE
> >
> >  static struct ms_hyperv_tsc_page tsc_pg __aligned(PAGE_SIZE);
> > -static u64 hv_sched_clock_offset __ro_after_init;
> >
> >  struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
> >  {
>
> --
> Vitaly



-- 
Best regards
Tianyu Lan


RE: [tip:timers/core 34/34] drivers//clocksource/hyperv_timer.c:264:35: error: 'hv_sched_clock_offset' undeclared; did you mean 'sched_clock_register'?

2019-08-21 Thread Tianyu Lan
Thanks for reporting. I will send out a fix patch.

-Original Message-
From: kbuild test robot  
Sent: Thursday, August 22, 2019 10:25 AM
To: Tianyu Lan 
Cc: kbuild-...@01.org; linux-kernel@vger.kernel.org; tipbu...@zytor.com; Thomas 
Gleixner ; Michael Kelley 
Subject: [tip:timers/core 34/34] drivers//clocksource/hyperv_timer.c:264:35: 
error: 'hv_sched_clock_offset' undeclared; did you mean 'sched_clock_register'?

tree:   https://kernel.googlesource.com/pub/scm/linux/kernel/git/tip/tip.git timers/core
head:   b74e1d61dbc614ff35ef3ad9267c61ed06b09051
commit: b74e1d61dbc614ff35ef3ad9267c61ed06b09051 [34/34] clocksource/hyperv: 
Add Hyper-V specific sched clock function
config: i386-randconfig-g002-201933 (attached as .config)
compiler: gcc-7 (Debian 7.4.0-10) 7.4.0
reproduce:
git checkout b74e1d61dbc614ff35ef3ad9267c61ed06b09051
# save the attached .config to linux build tree
make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All error/warnings (new ones prefixed by >>):

   drivers//clocksource/hyperv_timer.c: In function 'read_hv_sched_clock_msr':
>> drivers//clocksource/hyperv_timer.c:264:35: error: 'hv_sched_clock_offset' 
>> undeclared (first use in this function); did you mean 'sched_clock_register'?
 return read_hv_clock_msr(NULL) - hv_sched_clock_offset;
  ^
  sched_clock_register
   drivers//clocksource/hyperv_timer.c:264:35: note: each undeclared identifier 
is reported only once for each function it appears in
   drivers//clocksource/hyperv_timer.c: In function 'hv_init_clocksource':
   drivers//clocksource/hyperv_timer.c:334:2: error: 'hv_sched_clock_offset' 
undeclared (first use in this function); did you mean 'sched_clock_register'?
 hv_sched_clock_offset = hyperv_cs->read(hyperv_cs);
 ^
 sched_clock_register
   drivers//clocksource/hyperv_timer.c: In function 'read_hv_sched_clock_msr':
>> drivers//clocksource/hyperv_timer.c:265:1: warning: control reaches end of 
>> non-void function [-Wreturn-type]
}
^

vim +264 drivers//clocksource/hyperv_timer.c

   261  
   262  static u64 read_hv_sched_clock_msr(void)
   263  {
 > 264  return read_hv_clock_msr(NULL) - hv_sched_clock_offset;
 > 265  }
   266  

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH V3 2/3] KVM/Hyper-V: Add new KVM cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH

2019-08-20 Thread Tianyu Lan
Hi Thomas:
   Thanks for your review. Will address your comments in the
next version.

On Mon, Aug 19, 2019 at 9:27 PM Thomas Gleixner  wrote:
>
> On Mon, 19 Aug 2019, lantianyu1...@gmail.com wrote:
>
> > From: Tianyu Lan 
> >
> > This patch adds
>
> Same git grep command as before
>
> >  new KVM cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH and let
>
> baseball cap? Please do not use weird acronyms. This is text and there is
> not limitation on characters.
>
> > user space to enable direct tlb flush function when only Hyper-V
> > hypervsior capability is exposed to VM.
>
> Sorry, but I'm not understanding this sentence.
>
> > This patch also adds
>
> Once more
>
> > enable_direct_tlbflush callback in the struct kvm_x86_ops and
> > platforms may use it to implement direct tlb flush support.
>
> Please tell in the changelog WHY you are doing things not what. The what is
> obviously in the patch.
>
> So you want to explain what you are trying to achieve and why it is
> useful. Then you can add a short note about what you are adding, but not at
> the level of detail which is available from the diff itself.
>
> Thanks,
>
> tglx



--
Best regards
Tianyu Lan


Re: [PATCH V2 3/3] KVM/Hyper-V/VMX: Add direct tlb flush support

2019-08-15 Thread Tianyu Lan
Hi Paolo:
  Thanks for your review.

On Wed, Aug 14, 2019 at 9:33 PM Paolo Bonzini  wrote:
>
> On 14/08/19 09:34, lantianyu1...@gmail.com wrote:
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index c5da875f19e3..479ad76661e6 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -500,6 +500,7 @@ struct kvm {
> >   struct srcu_struct srcu;
> >   struct srcu_struct irq_srcu;
> >   pid_t userspace_pid;
> > + struct hv_partition_assist_pg *hv_pa_pg;
> >  };
> >
> >  #define kvm_err(fmt, ...) \
>
> This does not exist on non-x86 architectures.  Please move it to struct
> kvm_arch.
>
Nice catch. Will update in the next version. Thanks.
-- 
Best regards
Tianyu Lan


RE: [PATCH] MAINTAINERS: Hyper-V: Fix typo in a filepath

2019-08-13 Thread Tianyu Lan
Hi Denis:
 Thanks for the notice. I posted a fix patch earlier: 
https://lkml.org/lkml/2019/8/13/73

Hi Sasha:
Could you take care of the fix patch? Thanks.

-Original Message-
From: Denis Efremov  
Sent: Tuesday, August 13, 2019 2:04 PM
To: linux-kernel@vger.kernel.org
Cc: Denis Efremov ; j...@perches.com; Tianyu Lan 
; Sasha Levin ; 
linux-hyp...@vger.kernel.org
Subject: [PATCH] MAINTAINERS: Hyper-V: Fix typo in a filepath

Fix typo in hyperv-iommu.c filepath.

Cc: Lan Tianyu 
Cc: Sasha Levin 
Cc: linux-hyp...@vger.kernel.org
Fixes: 29217a474683 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver")
Signed-off-by: Denis Efremov 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2764e0872ebd..51ab502485ac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7452,7 +7452,7 @@ F:drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/uio/uio_hv_generic.c
 F: drivers/video/fbdev/hyperv_fb.c
-F: drivers/iommu/hyperv_iommu.c
+F: drivers/iommu/hyperv-iommu.c
 F: net/vmw_vsock/hyperv_transport.c
 F: include/clocksource/hyperv_timer.h
 F: include/linux/hyperv.h
-- 
2.21.0



Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function

2019-07-30 Thread Tianyu Lan
Hi Vitaly & Peter:
Thanks for your review.

On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov  wrote:
>
> Peter Zijlstra  writes:
>
> > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> >> lantianyu1...@gmail.com writes:
> >>
> >> > From: Tianyu Lan 
> >> >
> >> > Hyper-V guests use the default native_sched_clock() in 
> >> > pv_ops.time.sched_clock
> >> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
> >> > can be discontinuous in a Hyper-V VM.   Add the generic 
> >> > hv_setup_sched_clock()
> >> > to set the sched clock function appropriately.  On x86, this sets
> >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> >> > scaled and adjusted to be continuous.
> >>
> >> Hypervisor can, in theory, disable TSC page and then we're forced to use
> >> MSR-based clocksource but using it as sched_clock() can be very slow,
> >> I'm afraid.
> >>
> >> On the other hand, what we have now is probably worse: TSC can,
> >> actually, jump backwards (e.g. on migration) and we're breaking the
> >> requirements for sched_clock().
> >
> > That (obviously) also breaks the requirements for using TSC as
> > clocksource.
> >
> > IOW, it breaks the entire purpose of having TSC in the first place.
>
> Currently, we mark raw TSC as unstable when running on Hyper-V (see
> 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
> instead. The problem is that 'TSC page' can be disabled by the
> hypervisor and in that case the only remaining clocksource is MSR-based
> (slow).
>

Yes, that will be slow if Hyper-V doesn't expose the TSC page and
the kernel falls back to the MSR-based clocksource: each MSR read
triggers a VM exit. The same happens on other hypervisors (e.g.,
when KVM doesn't expose kvmclock). The hypervisor should take this
into account when deciding which clocksources to expose.

-- 
Best regards
Tianyu Lan


Re: [Fix PATCH] cpu/hotplug: Fix bug report when add "nosmt" parameter with CONFIG_HOTPLUG_CPU=N

2019-03-26 Thread Tianyu Lan
Y on x86 would be a quick fix and it's easy to
be backported.

>
> tglx
>
> 8<
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -564,6 +564,20 @@ static void undo_cpu_up(unsigned int cpu
> cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
>  }
>
> +static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st)
> +{
> +   if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
> +   return true;
> +   /*
> +* When CPU hotplug is disabled, then taking the CPU down is not
> +* possible because takedown_cpu() and the architecture and
> +* subsystem specific mechanisms are not available. So the CPU
> +* which would be completely unplugged again needs to stay around
> +* in the current state, i.e. <= CPUHP_AP_ONLINE_IDLE.
> +*/
> +   return st->state <= CPUHP_BRINGUP_CPU;
> +}
> +
>  static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
>   enum cpuhp_state target)
>  {
> @@ -574,8 +588,10 @@ static int cpuhp_up_callbacks(unsigned i
> st->state++;
> ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
> if (ret) {
> -   st->target = prev_state;
> -   undo_cpu_up(cpu, st);
> +   if (can_rollback_cpu(st)) {
> +   st->target = prev_state;
> +   undo_cpu_up(cpu, st);
> +   }
> break;
> }
> }
>

I have tested your patch. It resolves the crash with the "nosmt" parameter.
-- 
Best regards
Tianyu Lan


RE: Bad file pattern in MAINTAINERS section 'Hyper-V CORE AND DRIVERS'

2019-03-26 Thread Tianyu Lan
Hi Joe:
Thanks for the report. I just sent out a fix patch.

-Original Message-
From: Joe Perches  
Sent: Tuesday, March 26, 2019 5:25 AM
To: linux-kernel@vger.kernel.org
Cc: KY Srinivasan ; Haiyang Zhang ; 
Stephen Hemminger ; Sasha Levin ; 
linux-hyp...@vger.kernel.org; Michael Kelley ; Tianyu 
Lan ; Joerg Roedel 
Subject: Bad file pattern in MAINTAINERS section 'Hyper-V CORE AND DRIVERS'

A file pattern line in this section of the MAINTAINERS file in linux-next does 
not have a match in the linux source files.

This could occur because a matching filename was never added, was deleted or 
renamed in some other commit.

The commits that added and if found renamed or removed the file pattern are 
shown below.

Please fix this defect appropriately.

1: ---

linux-next MAINTAINERS section:

7164Hyper-V CORE AND DRIVERS
7165M:  "K. Y. Srinivasan" 
7166M:  Haiyang Zhang 
7167M:  Stephen Hemminger 
7168M:  Sasha Levin 
7169T:  git 
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git
7170L:  linux-hyp...@vger.kernel.org
7171S:  Supported
7172F:  
Documentation/networking/device_drivers/microsoft/netvsc.txt
7173F:  arch/x86/include/asm/mshyperv.h
7174F:  arch/x86/include/asm/trace/hyperv.h
7175F:  arch/x86/include/asm/hyperv-tlfs.h
7176F:  arch/x86/kernel/cpu/mshyperv.c
7177F:  arch/x86/hyperv
7178F:  drivers/hid/hid-hyperv.c
7179F:  drivers/hv/
7180F:  drivers/input/serio/hyperv-keyboard.c
7181F:  drivers/pci/controller/pci-hyperv.c
7182F:  drivers/net/hyperv/
7183F:  drivers/scsi/storvsc_drv.c
7184F:  drivers/uio/uio_hv_generic.c
7185F:  drivers/video/fbdev/hyperv_fb.c
--> 7186F:  drivers/iommu/hyperv_iommu.c
7187F:  net/vmw_vsock/hyperv_transport.c
7188F:  include/linux/hyperv.h
7189F:  include/uapi/linux/hyperv.h
7190F:  tools/hv/
7191F:  Documentation/ABI/stable/sysfs-bus-vmbus

2: ---

The most recent commit that added or modified file pattern 
'drivers/iommu/hyperv_iommu.c':

commit 32d5860a9e3c98b5043716fff05a7b20b15918f9
Author: Lan Tianyu 
Date:   Wed Feb 27 22:54:05 2019 +0800

MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS scope

This patch is to add Hyper-V IOMMU driver file into Hyper-V CORE and
DRIVERS scope.

Reviewed-by: Michael Kelley 
Signed-off-by: Lan Tianyu 
Signed-off-by: Joerg Roedel 

 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

3: ---

No commit with file pattern 'drivers/iommu/hyperv_iommu.c' was found


Re: [Resend PATCH V5 0/3] x86/Hyper-V/IOMMU: Add Hyper-V IOMMU driver to support x2apic mode

2019-02-26 Thread Tianyu Lan
On Tue, Feb 26, 2019 at 9:07 PM Joerg Roedel  wrote:
>
> On Tue, Feb 26, 2019 at 08:07:17PM +0800, lantianyu1...@gmail.com wrote:
> > Lan Tianyu (3):
> >   x86/Hyper-V: Set x2apic destination mode to physical when x2apic is
> >  available
> >   HYPERV/IOMMU: Add Hyper-V stub IOMMU driver
> >   MAINTAINERS: Add Hyper-V IOMMU driver into Hyper-V CORE AND DRIVERS
> > scope
>
> Applied (patch 2 with slight subject changes
> 'HYPERV/IOMMU' -> 'iommu/hyper-v'), thanks.

Great. Thanks.

-- 
Best regards
Tianyu Lan


Re: [PATCH V3 1/10] X86/Hyper-V: Add parameter offset for hyperv_fill_flush_guest_mapping_list()

2019-02-26 Thread Tianyu Lan
Hi Stephen:
   Thanks for your review.
On Sat, Feb 23, 2019 at 1:08 AM Stephen Hemminger
 wrote:
>
> int hyperv_fill_flush_guest_mapping_list(
> struct hv_guest_mapping_flush_list *flush,
> -   u64 start_gfn, u64 pages)
> +   int offset, u64 start_gfn, u64 pages)
>  {
> u64 cur = start_gfn;
> u64 additional_pages;
> -   int gpa_n = 0;
> +   int gpa_n = offset;
>
> do {
> /*
>
> Do you mean to support negative offsets here? Maybe unsigned would be better?

Yes, this makes sense. Will update. Thanks.

-- 
Best regards
Tianyu Lan


Re: [Update PATCH] x86/Hyper-V: Fix definition HV_MAX_FLUSH_REP_COUNT

2019-02-25 Thread Tianyu Lan
On Mon, Feb 25, 2019 at 10:19 PM Greg KH  wrote:
>
> On Mon, Feb 25, 2019 at 10:12:14PM +0800, lantianyu1...@gmail.com wrote:
> > From: Lan Tianyu 
> >
> > The max flush rep count of HvFlushGuestPhysicalAddressList hypercall
> > is equal with how many entries of union hv_gpa_page_range can be populated
> > into the input parameter page. The origin code lacks parenthesis around
> > PAGE_SIZE - 2 * sizeof(u64). This patch is to fix it.
> >
> > Cc: 
> > Fixs: cc4edae4b924 ("x86/hyper-v: Add HvFlushGuestAddressList hypercall 
> > support")
>
> "Fixes"

Sorry, will fix this in V2.
-- 
Best regards
Tianyu Lan


Re: [PATCH V3 00/10] X86/KVM/Hyper-V: Add HV ept tlb range list flush support in KVM

2019-02-23 Thread Tianyu Lan
On Sat, Feb 23, 2019 at 2:26 AM Paolo Bonzini  wrote:
>
> On 22/02/19 16:06, lantianyu1...@gmail.com wrote:
> > From: Lan Tianyu 
> >
> > This patchset is to introduce hv ept tlb range list flush function
> > support in the KVM MMU component. Flushing ept tlbs of several address
> > range can be done via single hypercall and new list flush function is
> > used in the kvm_mmu_commit_zap_page() and FNAME(sync_page). This patchset
> > also adds more hv ept tlb range flush support in more KVM MMU function.
> >
> > This patchset is based on the fix patch "x86/Hyper-V: Fix definition 
> > HV_MAX_FLUSH_REP_COUNT".
> > (https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1939455.html)
>
> Note that this won't make it in 5.1 unless Linus releases an -rc8.
> Otherwise, I'll get to it next week.
Hi Paolo:
  Sure. Thanks for your review.

-- 
Best regards
Tianyu Lan


Re: [PATCH] x86/Hyper-V: Fix definition HV_MAX_FLUSH_REP_COUNT

2019-02-23 Thread Tianyu Lan
On Fri, Feb 22, 2019 at 10:32 PM Greg KH  wrote:
>
> On Fri, Feb 22, 2019 at 06:48:44PM +0800, lantianyu1...@gmail.com wrote:
> > From: Lan Tianyu 
> >
> > The max flush rep count of HvFlushGuestPhysicalAddressList hypercall
> > is equal with how many entries of union hv_gpa_page_range can be populated
> > into the input parameter page. The origin code lacks parenthesis around
> > PAGE_SIZE - 2 * sizeof(u64). This patch is to fix it.
> >
> > Cc: 
> > Fixs: cc4edae4b9(x86/hyper-v: Add HvFlushGuestAddressList hypercall support)
>
> Please use this format instead:
>
> Fixes: cc4edae4b924 ("x86/hyper-v: Add HvFlushGuestAddressList hypercall 
> support")
>
> And don't type it by hand, use a git alias for it:
> git show -s --abbrev-commit --abbrev=12 --pretty=format:"%h 
> (\"%s\")%n"
>

OK. Will update. Thanks.

> You also messed up your To: line, keeping anyone from being able to
> respond to this message who do not know how to hand-edit the response
> line :(

I put all expected reviewers on the Cc line and will move them to the To line.

>
> thanks,
>
> greg k-h



-- 
Best regards
Tianyu Lan


Re: [PATCH V2 3/10] KVM/MMU: Add last_level in the struct mmu_spte_page

2019-02-22 Thread Tianyu Lan
On Fri, Feb 15, 2019 at 11:23 PM Paolo Bonzini  wrote:
>
> On 15/02/19 16:05, Tianyu Lan wrote:
> > Yes, you are right. Thanks to point out and will fix. The last_level
> > flag is to avoid adding middle page node(e.g, PGD, PMD)
> > into flush list. The address range will be duplicated if adding both
> > leaf, node and middle node into flush list.
>
> Hmm, that's not easy to track.  One kvm_mmu_page could include both leaf
> and non-leaf page (for example a huge page for 0 to 2 MB and a page
> table for 2 MB to 4 MB).
>
> Is this really needed?  First, your benchmarks so far have been done
> with sp->last_level always set to true.  Second, you would only
> encounter this optimization in kvm_mmu_commit_zap_page when zapping a 1
> GB region (which then would be invalidated twice, at both the PMD and
> PGD level) or bigger.
>
> Paolo

Hi Paolo:
 Sorry for the late response; I was tracking down a bug
caused by defining the wrong max flush count. I just sent out V3
and kept the last_level flag patch at the end of the patchset; see
the change log for details. As you said, this is an optimization
and isn't strictly required, so if you still have concerns you can
drop it and the other patches in this patchset should stand on
their own. Thanks.

-- 
Best regards
Tianyu Lan


RE: [PATCH V4 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

2019-02-22 Thread Tianyu Lan
Hi Michael:
   Thanks for your review.

-Original Message-
From: Michael Kelley  
Sent: Friday, February 22, 2019 1:28 AM
To: lantianyu1...@gmail.com
Cc: Tianyu Lan ; j...@8bytes.org; 
mchehab+sams...@kernel.org; da...@davemloft.net; gre...@linuxfoundation.org; 
nicolas.fe...@microchip.com; a...@arndb.de; linux-kernel@vger.kernel.org; 
io...@lists.linux-foundation.org; KY Srinivasan ; vkuznets 
; alex.william...@redhat.com; sas...@kernel.org; 
dan.carpen...@oracle.com; linux-hyp...@vger.kernel.org
Subject: RE: [PATCH V4 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

From: lantianyu1...@gmail.com  Sent: Monday, February 
11, 2019 6:20 AM
> + /*
> +  * Hyper-V doesn't provide irq remapping function for
> +  * IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
> +  * Cpu's APIC ID is read from ACPI MADT table and APIC IDs
> +  * in the MADT table on Hyper-v are sorted monotonic increasingly.
> +  * APIC ID reflects cpu topology. There maybe some APIC ID
> +  * gaps when cpu number in a socket is not power of two. Prepare
> +  * max cpu affinity for IOAPIC irqs. Scan cpu 0-255 and set cpu
> +  * into ioapic_max_cpumask if its APIC ID is less than 256.
> +  */
> + for (i = min_t(unsigned int, num_possible_cpus(), 255); i >= 0; i--)

The above isn't quite right.  For example, if num_possible_cpus() is 8, then 
the loop will be executed 9 times, for values 8 down through 0.
It should be executed for values 7 down through 0.

Yes, will fix this in V5. Thanks.

> + if (cpu_physical_id(i) < 256)
> + cpumask_set_cpu(i, &ioapic_max_cpumask);
> +
> + return 0;
> +}

Michael


Re: [PATCH V2 3/10] KVM/MMU: Add last_level in the struct mmu_spte_page

2019-02-15 Thread Tianyu Lan
On Fri, Feb 15, 2019 at 12:32 AM Paolo Bonzini  wrote:
>
> On 02/02/19 02:38, lantianyu1...@gmail.com wrote:
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index ce770b446238..70cafd3f95ab 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -2918,6 +2918,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> >
> >   if (level > PT_PAGE_TABLE_LEVEL)
> >   spte |= PT_PAGE_SIZE_MASK;
> > +
> > + sp->last_level = is_last_spte(spte, level);
>
> Wait, I wasn't thinking straight.  If a struct kvm_mmu_page exists, it
> is never the last level.  Page table entries for the last level do not
> have a struct kvm_mmu_page.
>
> Therefore you don't need the flag after all.  I suspect your
> calculations in patch 2 are off by one, and you actually need
>
> hlist_for_each_entry(sp, range->flush_list, flush_link) {
> int pages = KVM_PAGES_PER_HPAGE(sp->role.level + 1);
> ...
> }
>
> For example, if sp->role.level is 1 then the struct kvm_mmu_page is for
> a page containing PTEs and covers an area of 2 MiB.

Yes, you are right. Thanks for pointing that out; will fix. The
last_level flag is to avoid adding intermediate page nodes (e.g., PGD,
PMD) to the flush list. The address range would be duplicated if both
leaf and intermediate nodes were added to the flush list.

>
> Thanks,
>
> Paolo
>
> >   if (tdp_enabled)
> >   spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
> >   kvm_is_mmio_pfn(pfn));
>


-- 
Best regards
Tianyu Lan


Re: [PATCH V3 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

2019-02-11 Thread Tianyu Lan
Hi Olaf:
 Thanks for your review.

On Fri, Feb 8, 2019 at 10:52 PM Olaf Hering  wrote:
>
> On Thu, Feb 07, lantianyu1...@gmail.com wrote:
>
> > +++ b/drivers/iommu/Kconfig
> > +config HYPERV_IOMMU
> > + bool "Hyper-V x2APIC IRQ Handling"
> > + depends on HYPERV
> > + select IOMMU_API
> > + help
>
>
> Consider adding 'default HYPERV' like some other drivers already do it.
>
> Olaf

Good suggestion and will update. Thanks.

-- 
Best regards
Tianyu Lan


Re: [PATCH V3 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

2019-02-11 Thread Tianyu Lan
> > + 0, IOAPIC_REMAPPING_ENTRY, fn,
> > + &hyperv_ir_domain_ops, NULL);
> > +
> > + irq_domain_free_fwnode(fn);
> > +
> > + /*
> > +  * Hyper-V doesn't provide irq remapping function for
> > +  * IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
> > +  * Cpu's APIC ID is read from ACPI MADT table and APIC IDs
> > +  * in the MADT table on Hyper-v are sorted monotonic increasingly.
> > +  * APIC ID reflects cpu topology. There maybe some APIC ID
> > +  * gaps when cpu number in a socket is not power of two. Prepare
> > +  * max cpu affinity for IOAPIC irqs. Scan cpu 0-255 and set cpu
> > +  * into ioapic_max_cpumask if its APIC ID is less than 256.
> > +  */
> > + for (i = 0; i < 256; i++)
> > + if (cpu_physical_id(i) < 256)
> > + cpumask_set_cpu(i, &ioapic_max_cpumask);
> > +
> > + return 0;
> > +}
> > +
> > +static int __init hyperv_enable_irq_remapping(void)
> > +{
> > + return IRQ_REMAP_X2APIC_MODE;
> > +}
> > +
> > +static struct irq_domain *hyperv_get_ir_irq_domain(struct irq_alloc_info 
> > *info)
> > +{
> > + if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC)
> > + return ioapic_ir_domain;
> > + else
> > + return NULL;
> > +}
> > +
> > +struct irq_remap_ops hyperv_irq_remap_ops = {
> > + .prepare= hyperv_prepare_irq_remapping,
> > + .enable = hyperv_enable_irq_remapping,
> > + .get_ir_irq_domain  = hyperv_get_ir_irq_domain,
> > +};
> > +
> > +#endif
> > diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> > index b94ebd4..81cf290 100644
> > --- a/drivers/iommu/irq_remapping.c
> > +++ b/drivers/iommu/irq_remapping.c
> > @@ -103,6 +103,9 @@ int __init irq_remapping_prepare(void)
> >   else if (IS_ENABLED(CONFIG_AMD_IOMMU) &&
> >amd_iommu_irq_ops.prepare() == 0)
> >   remap_ops = &amd_iommu_irq_ops;
> > + else if (IS_ENABLED(CONFIG_HYPERV_IOMMU) &&
> > +  hyperv_irq_remap_ops.prepare() == 0)
> > +  remap_ops = &hyperv_irq_remap_ops;
> >   else
> >   return -ENOSYS;
> >
> > diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
> > index 0afef6e..f8609e9 100644
> > --- a/drivers/iommu/irq_remapping.h
> > +++ b/drivers/iommu/irq_remapping.h
> > @@ -64,6 +64,7 @@ struct irq_remap_ops {
> >
> >  extern struct irq_remap_ops intel_irq_remap_ops;
> >  extern struct irq_remap_ops amd_iommu_irq_ops;
> > +extern struct irq_remap_ops hyperv_irq_remap_ops;
> >
> >  #else  /* CONFIG_IRQ_REMAP */
>
> --
> Vitaly



-- 
Best regards
Tianyu Lan


Re: [PATCH V3 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

2019-02-11 Thread Tianyu Lan
Hi Alex:
Thanks for your review.

On Fri, Feb 8, 2019 at 2:15 AM Alex Williamson
 wrote:
>
> On Thu,  7 Feb 2019 23:33:48 +0800
> lantianyu1...@gmail.com wrote:
>
> > From: Lan Tianyu 
> >
> > On the bare metal, enabling X2APIC mode requires interrupt remapping
> > function which helps to deliver irq to cpu with 32-bit APIC ID.
> > Hyper-V doesn't provide interrupt remapping function so far and Hyper-V
> > MSI protocol already supports to deliver interrupt to the CPU whose
> > virtual processor index is more than 255. IO-APIC interrupt still has
> > 8-bit APIC ID limitation.
> >
> > This patch is to add Hyper-V stub IOMMU driver in order to enable
> > X2APIC mode successfully in Hyper-V Linux guest. The driver returns X2APIC
> > interrupt remapping capability when X2APIC mode is available. Otherwise,
> > it creates a Hyper-V irq domain to limit IO-APIC interrupts' affinity
> > and make sure cpus assigned with IO-APIC interrupt have 8-bit APIC ID.
> >
> > Define 24 IO-APIC remapping entries because Hyper-V only expose one
> > single IO-APIC and one IO-APIC has 24 pins according IO-APIC spec(
> > https://pdos.csail.mit.edu/6.828/2016/readings/ia32/ioapic.pdf).
> >
> > Signed-off-by: Lan Tianyu 
> > ---
> > Change since v2:
> >- Improve comment about why save IO-APIC entry in the irq chip data.
> >- Some code improvement.
> >- Improve statement in the IOMMU Kconfig.
> >
> > Change since v1:
> >   - Remove unused pr_fmt
> >   - Make ioapic_ir_domain as static variable
> >   - Remove unused variables cfg and entry in the 
> > hyperv_irq_remapping_alloc()
> >   - Fix comments
> > ---
> >  drivers/iommu/Kconfig |   8 ++
> >  drivers/iommu/Makefile|   1 +
> >  drivers/iommu/hyperv-iommu.c  | 194 
> > ++
> >  drivers/iommu/irq_remapping.c |   3 +
> >  drivers/iommu/irq_remapping.h |   1 +
> >  5 files changed, 207 insertions(+)
> >  create mode 100644 drivers/iommu/hyperv-iommu.c
> ...
> > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> > new file mode 100644
> > index 000..d8572c5
> > --- /dev/null
> > +++ b/drivers/iommu/hyperv-iommu.c
> ...
> > +static int __init hyperv_prepare_irq_remapping(void)
> > +{
> > + struct fwnode_handle *fn;
> > + int i;
> > +
> > + if (!hypervisor_is_type(x86_hyper_type) ||
> > + !x2apic_supported())
> > + return -ENODEV;
> > +
> > + fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> > + if (!fn)
> > + return -ENOMEM;
> > +
> > + ioapic_ir_domain =
> > + irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
> > + 0, IOAPIC_REMAPPING_ENTRY, fn,
> > + &hyperv_ir_domain_ops, NULL);
> > +
> > + irq_domain_free_fwnode(fn);
> > +
> > + /*
> > +  * Hyper-V doesn't provide irq remapping function for
> > +  * IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
> > +  * Cpu's APIC ID is read from ACPI MADT table and APIC IDs
> > +  * in the MADT table on Hyper-v are sorted monotonic increasingly.
> > +  * APIC ID reflects cpu topology. There maybe some APIC ID
> > +  * gaps when cpu number in a socket is not power of two. Prepare
> > +  * max cpu affinity for IOAPIC irqs. Scan cpu 0-255 and set cpu
> > +  * into ioapic_max_cpumask if its APIC ID is less than 256.
> > +  */
> > + for (i = 0; i < 256; i++)
> > + if (cpu_physical_id(i) < 256)
> > + cpumask_set_cpu(i, &ioapic_max_cpumask);
>
> This looks sketchy.  What if NR_CPUS is less than 256?  Thanks,

Nice catch. I should check NR_CPUS here. Will update. Thanks.
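Alex's NR_CPUS concern can be sketched in plain userspace C: clamp the scan bound so cpu_physical_id() is never called with an out-of-range index when NR_CPUS is less than 256. NR_CPUS, the apic_ids[] values, and the helper names below are made-up stand-ins, not the kernel's data:

```c
#include <assert.h>

/*
 * Userspace sketch of the fixed IO-APIC cpumask scan: the loop bound is
 * clamped to the number of present CPUs so cpu_physical_id() is never
 * called with an out-of-range index when NR_CPUS < 256.
 */
#define NR_CPUS 8			/* made-up, for illustration */
#define MAX_IOAPIC_APIC_ID 256

static const int apic_ids[NR_CPUS] = { 0, 1, 2, 3, 4, 5, 300, 301 };

static int cpu_physical_id(int cpu)	/* stand-in for the real helper */
{
	return apic_ids[cpu];
}

/* Returns how many CPUs were put into the simulated ioapic_max_cpumask. */
static int scan_ioapic_cpumask(unsigned char mask[NR_CPUS])
{
	int limit = NR_CPUS < MAX_IOAPIC_APIC_ID ? NR_CPUS : MAX_IOAPIC_APIC_ID;
	int count = 0;

	for (int i = 0; i < limit; i++) {
		if (cpu_physical_id(i) < MAX_IOAPIC_APIC_ID) {
			mask[i] = 1;	/* cpumask_set_cpu(i, &ioapic_max_cpumask) */
			count++;
		}
	}
	return count;
}
```

With the clamped bound, CPUs whose APIC ID is 256 or larger are simply skipped and no out-of-bounds index is ever used.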


-- 
Best regards
Tianyu Lan


Re: [PATCH V3 1/3] x86/Hyper-V: Set x2apic destination mode to physical when x2apic is available

2019-02-11 Thread Tianyu Lan
Hi Thomas:
  Thanks for your review.

On Mon, Feb 11, 2019 at 5:48 AM Thomas Gleixner  wrote:
>
> On Thu, 7 Feb 2019, lantianyu1...@gmail.com wrote:
>
> > From: Lan Tianyu 
> >
> > Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
> > set x2apic destination mode to physical mode when x2apic is available
> > and Hyper-V IOMMU driver makes sure cpus assigned with IO-APIC irqs have
> > 8-bit APIC id.
>
> This looks good now. Can that be applied independent of the IOMMU stuff or
> should this go together. If the latter:
>
>Reviewed-by: Thomas Gleixner 
>
> If not, I just queue if for 5.1. Let me know,
>

This patch can be applied independently. Thanks.
-- 
Best regards
Tianyu Lan


Re: [PATCH 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

2019-02-01 Thread Tianyu Lan
> > + fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> > + if (!fn)
> > + return -EFAULT;
> > +
> > + ioapic_ir_domain =
> > + irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
> > + 0, IOAPIC_REMAPPING_ENTRY, fn,
> > + &hyperv_ir_domain_ops, NULL);
> > +
> > + irq_domain_free_fwnode(fn);
> > +
> > + /*
> > +  * Hyper-V doesn't provide irq remapping function for
> > +  * IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
> > +  * Prepare max cpu affinity for IOAPIC irqs. Scan cpu 0-255
> > +  * and set cpu into ioapic_max_cpumask if its APIC ID is less
> > +  * than 255.
> > +  */
> > + for (i = 0; i < 256; i++) {
> > + apic_id = cpu_physical_id(i);
> > + if (apic_id > 255)
> > + continue;
> > +
> > + cpumask_set_cpu(i, &ioapic_max_cpumask);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int __init hyperv_enable_irq_remapping(void)
> > +{
> > + return IRQ_REMAP_X2APIC_MODE;
> > +}
> > +
> > +static struct irq_domain *hyperv_get_ir_irq_domain(struct irq_alloc_info 
> > *info)
> > +{
> > + if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC)
> > + return ioapic_ir_domain;
> > + else
> > + return NULL;
> > +}
> > +
> > +struct irq_remap_ops hyperv_irq_remap_ops = {
> > + .prepare= hyperv_prepare_irq_remapping,
> > + .enable = hyperv_enable_irq_remapping,
> > + .get_ir_irq_domain  = hyperv_get_ir_irq_domain,
> > +};
> > diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> > index b94ebd4..81cf290 100644
> > --- a/drivers/iommu/irq_remapping.c
> > +++ b/drivers/iommu/irq_remapping.c
> > @@ -103,6 +103,9 @@ int __init irq_remapping_prepare(void)
> >   else if (IS_ENABLED(CONFIG_AMD_IOMMU) &&
> >amd_iommu_irq_ops.prepare() == 0)
> > >   remap_ops = &amd_iommu_irq_ops;
> > + else if (IS_ENABLED(CONFIG_HYPERV_IOMMU) &&
> > +  hyperv_irq_remap_ops.prepare() == 0)
> > > + remap_ops = &hyperv_irq_remap_ops;
> >   else
> >   return -ENOSYS;
> >
> > diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
> > index 0afef6e..f8609e9 100644
> > --- a/drivers/iommu/irq_remapping.h
> > +++ b/drivers/iommu/irq_remapping.h
> > @@ -64,6 +64,7 @@ struct irq_remap_ops {
> >
> >   extern struct irq_remap_ops intel_irq_remap_ops;
> >   extern struct irq_remap_ops amd_iommu_irq_ops;
> > +extern struct irq_remap_ops hyperv_irq_remap_ops;
> >
> >   #else  /* CONFIG_IRQ_REMAP */
> >
> >



-- 
Best regards
Tianyu Lan


Re: [PATCH 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

2019-02-01 Thread Tianyu Lan
> >+   * Prepare max cpu affinity for IOAPIC irqs. Scan cpu 0-255
> >+   * and set cpu into ioapic_max_cpumask if its APIC ID is less
> >+   * than 255.
>
> Off-by-one here: it'll set the CPU in the affinity mask if it's less
> than 256, not 255.

Yes. will update.

>
> >+   */
> >+  for (i = 0; i < 256; i++) {
> >+  apic_id = cpu_physical_id(i);
> >+  if (apic_id > 255)
> >+  continue;
> >+
> >+  cpumask_set_cpu(i, &ioapic_max_cpumask);
> >+  }
>
> I'm curious here: assuming we have a large amount of CPUs, what
> guarantee do we have that this mask will have anything set? What happens
> if it remains empty?

The APIC id of BSP is always 0. The CPU's APIC ID comes from ACPI MADT table.
The APIC ID in the ACPI MADT table will be monotone increasing from Hyper-V
team.

>
> >+
> >+  return 0;
> >+}
> >+
> >+static int __init hyperv_enable_irq_remapping(void)
> >+{
> >+      return IRQ_REMAP_X2APIC_MODE;
> >+}
> >+
> >+static struct irq_domain *hyperv_get_ir_irq_domain(struct irq_alloc_info 
> >*info)
> >+{
> >+  if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC)
> >+  return ioapic_ir_domain;
> >+  else
> >+  return NULL;
> >+}
> >+
> >+struct irq_remap_ops hyperv_irq_remap_ops = {
> >+  .prepare= hyperv_prepare_irq_remapping,
> >+  .enable = hyperv_enable_irq_remapping,
> >+  .get_ir_irq_domain  = hyperv_get_ir_irq_domain,
> >+};
> >diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> >index b94ebd4..81cf290 100644
> >--- a/drivers/iommu/irq_remapping.c
> >+++ b/drivers/iommu/irq_remapping.c
> >@@ -103,6 +103,9 @@ int __init irq_remapping_prepare(void)
> >   else if (IS_ENABLED(CONFIG_AMD_IOMMU) &&
> >amd_iommu_irq_ops.prepare() == 0)
> >   remap_ops = &amd_iommu_irq_ops;
> >+  else if (IS_ENABLED(CONFIG_HYPERV_IOMMU) &&
> >+   hyperv_irq_remap_ops.prepare() == 0)
> >+  remap_ops = &hyperv_irq_remap_ops;
> >   else
> >   return -ENOSYS;
> >
> >diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
> >index 0afef6e..f8609e9 100644
> >--- a/drivers/iommu/irq_remapping.h
> >+++ b/drivers/iommu/irq_remapping.h
> >@@ -64,6 +64,7 @@ struct irq_remap_ops {
> >
> > extern struct irq_remap_ops intel_irq_remap_ops;
> > extern struct irq_remap_ops amd_iommu_irq_ops;
> >+extern struct irq_remap_ops hyperv_irq_remap_ops;
> >
> > #else  /* CONFIG_IRQ_REMAP */
> >
> >--
> >2.7.4
> >



-- 
Best regards
Tianyu Lan


Re: [PATCH 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

2019-02-01 Thread Tianyu Lan
Hi Joerg:
 Thanks for your review.

On Sat, Feb 2, 2019 at 12:34 AM Joerg Roedel  wrote:
>
> Hi,
>
> On Thu, Jan 31, 2019 at 06:17:32PM +0800, lantianyu1...@gmail.com wrote:
> > +config HYPERV_IOMMU
> > + bool "Hyper-V stub IOMMU support"
>
> This is not a real IOMMU driver, it only implements IRQ remapping
> capabilities. Please change the name to reflect that, e.g. to
> "Hyper-V IRQ Remapping Support" or something like that.

Yes, that makes sense. Will update.

>
> > +static int __init hyperv_prepare_irq_remapping(void)
> > +{
> > + struct fwnode_handle *fn;
> > + u32 apic_id;
> > + int i;
> > +
> > + if (x86_hyper_type != X86_HYPER_MS_HYPERV ||
> > + !x2apic_supported())
> > + return -ENODEV;
> > +
> > + fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> > + if (!fn)
> > + return -EFAULT;
>
> Why does this return -EFAULT? I guess there is no fault happening in
> irq_domain_alloc_named_id_fwnode()...

Yes, “-ENOMEM” should be more accurate.

-- 
Best regards
Tianyu Lan


Re: [PATCH 1/3] x86/Hyper-V: Set x2apic destination mode to physical when x2apic is available

2019-01-31 Thread Tianyu Lan
On Fri, Feb 1, 2019 at 3:07 PM Dan Carpenter  wrote:
>
> On Thu, Jan 31, 2019 at 06:17:31PM +0800, lantianyu1...@gmail.com wrote:
> >
> >
>
> This comment needs to be indented one tab or it looks like we're outside
> the function.
>
> > +/*
> > + * Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
> > + * set x2apic destination mode to physical mode when x2apic is available
> > + * and Hyper-V IOMMU driver makes sure cpus assigned with IO-APIC irqs
> > + * have 8-bit APIC id.
> > + */
> > +# if IS_ENABLED(CONFIG_HYPERV_IOMMU)
> > + if (x2apic_supported())
> > + x2apic_phys = 1;
> > +# endif
>
> The IS_ENABLED() macro is really magical.  You could write this like so:
>
> if (IS_ENABLED(CONFIG_HYPERV_IOMMU) && x2apic_supported())
> x2apic_phys = 1;
>
> It works the same and is slightly more pleasant to look at.

Yes, that will be better. Thanks for your suggestion, Dan.
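For reference, the reason the suggested form costs nothing at runtime is that IS_ENABLED() folds to a constant 0 or 1 during preprocessing. A simplified userspace re-derivation of the token-pasting trick from include/linux/kconfig.h (the real macro also handles =m module options) is:

```c
#include <assert.h>

/*
 * Simplified re-derivation of the kernel's IS_ENABLED() macro
 * (include/linux/kconfig.h).  Kconfig defines CONFIG_FOO as 1 when the
 * option is built in; token pasting turns "defined as 1" vs. "undefined"
 * into a constant 1 or 0 the compiler can fold, so
 * "if (IS_ENABLED(...) && ...)" compiles away entirely when disabled.
 */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define __is_defined(x) ___is_defined(x)
#define IS_ENABLED(option) __is_defined(option)

#define CONFIG_HYPERV_IOMMU 1		/* pretend the option is built in */

static int x2apic_supported(void)	/* stand-in for the real check */
{
	return 1;
}

static int pick_x2apic_phys(void)
{
	/* the style suggested above: no #if/#endif block needed */
	if (IS_ENABLED(CONFIG_HYPERV_IOMMU) && x2apic_supported())
		return 1;
	return 0;
}
```

If CONFIG_HYPERV_IOMMU were undefined, IS_ENABLED() would expand to 0 and the whole branch would be dead-code-eliminated, matching the behavior of the #if version.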

-- 
Best regards
Tianyu Lan


Re: [PATCH 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

2019-01-31 Thread Tianyu Lan
Hi Vitaly:
Thanks for your review.

On Thu, Jan 31, 2019 at 10:04 PM Vitaly Kuznetsov  wrote:
>
> lantianyu1...@gmail.com writes:
>
> > From: Lan Tianyu 
> >
> > On the bare metal, enabling X2APIC mode requires interrupt remapping
> > function which helps to deliver irq to cpu with 32-bit APIC ID.
> > Hyper-V doesn't provide interrupt remapping function so far and Hyper-V
> > MSI protocol already supports to deliver interrupt to the CPU whose
> > virtual processor index is more than 255. IO-APIC interrupt still has
> > 8-bit APIC ID limitation.
> >
> > This patch is to add Hyper-V stub IOMMU driver in order to enable
> > X2APIC mode successfully in Hyper-V Linux guest. The driver returns X2APIC
> > interrupt remapping capability when X2APIC mode is available. Otherwise,
> > it creates a Hyper-V irq domain to limit IO-APIC interrupts' affinity
> > and make sure cpus assigned with IO-APIC interrupt have 8-bit APIC ID.
> >
> > Signed-off-by: Lan Tianyu 
> > ---
> >  drivers/iommu/Kconfig |   7 ++
> >  drivers/iommu/Makefile|   1 +
> >  drivers/iommu/hyperv-iommu.c  | 189 
> > ++
> >  drivers/iommu/irq_remapping.c |   3 +
> >  drivers/iommu/irq_remapping.h |   1 +
> >  5 files changed, 201 insertions(+)
> >  create mode 100644 drivers/iommu/hyperv-iommu.c
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 45d7021..5c397c0 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -437,4 +437,11 @@ config QCOM_IOMMU
> >   help
> > Support for IOMMU on certain Qualcomm SoCs.
> >
> > +config HYPERV_IOMMU
> > + bool "Hyper-V stub IOMMU support"
> > + depends on HYPERV
> > + help
> > + Hyper-V stub IOMMU driver provides capability to run
> > + Linux guest with X2APIC mode enabled.
> > +
> >  endif # IOMMU_SUPPORT
> > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> > index a158a68..8c71a15 100644
> > --- a/drivers/iommu/Makefile
> > +++ b/drivers/iommu/Makefile
> > @@ -32,3 +32,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
> >  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
> >  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
> >  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
> > +obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
> > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> > new file mode 100644
> > index 000..a64b747
> > --- /dev/null
> > +++ b/drivers/iommu/hyperv-iommu.c
> > @@ -0,0 +1,189 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#define pr_fmt(fmt) "HYPERV-IR: " fmt
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "irq_remapping.h"
> > +
> > +/*
> > + * According IO-APIC spec, IO APIC has a 24-entry Interrupt
> > + * Redirection Table.
> > + */
> > +#define IOAPIC_REMAPPING_ENTRY 24
>
> KVM already defines KVM_IOAPIC_NUM_PINS - is this the same thing?

It serves the same purpose, but the IOMMU driver is out of KVM's scope
and so I define a new one. Besides, this may be changed in the future
when the interrupt remapping function is added.

>
> > +
> > +static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
> > +struct irq_domain *ioapic_ir_domain;
> > +
> > +static int hyperv_ir_set_affinity(struct irq_data *data,
> > + const struct cpumask *mask, bool force)
> > +{
> > + struct irq_data *parent = data->parent_data;
> > + struct irq_cfg *cfg = irqd_cfg(data);
> > + struct IO_APIC_route_entry *entry;
> > + cpumask_t cpumask;
> > + int ret;
> > +
> > + cpumask_andnot(&cpumask, mask, &ioapic_max_cpumask);
> > +
> > + /* Return error If new irq affinity is out of ioapic_max_cpumask. */
> > + if (!cpumask_empty(&cpumask))
> > + return -EINVAL;
> > +
> > + ret = parent->chip->irq_set_affinity(parent, mask, force);
> > + if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
> > + return ret;
> > +
> > + entry = data->chip_data;
> > + entry->dest = cfg->dest_apicid;
> > + entry->vector = cfg->vector;
> > + send_cleanup_vector(cfg);
> > +
> > + return 0;
> > +}
> > +
> > +static struct irq_chip hyperv_ir_chip = {
> > + .name   = "HYPERV-IR",
> > + .irq_ack= apic_ack_irq,
> > + .irq_set_affinity   = hyperv_ir_set_affinity,
> > +};
> > +
> > +static int hyperv_irq_remapping_alloc(struct irq_domain *domain,
> > +  unsigned int virq, unsigned int nr_irqs,
> > +  void *arg)
> > +{
> > + struct irq_alloc_info *info = arg;
> > + struct IO_APIC_route_entry *entry;
> > + struct irq_data *irq_data;
> > + struct irq_desc *desc;
> > + struct irq_cfg *cfg;
> > + int ret = 0;
> > +
> > + if (!info || info->type != X86_IRQ_ALLOC_TYPE_IOAPIC || nr_irqs > 1)
> > + return -EINVAL;
> > +
> > + ret = 

Re: [PATCH 2/3] HYPERV/IOMMU: Add Hyper-V stub IOMMU driver

2019-01-31 Thread Tianyu Lan
On Thu, Jan 31, 2019 at 7:59 PM Greg KH  wrote:
>
> On Thu, Jan 31, 2019 at 06:17:32PM +0800, lantianyu1...@gmail.com wrote:
> > --- /dev/null
> > +++ b/drivers/iommu/hyperv-iommu.c
> > @@ -0,0 +1,189 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#define pr_fmt(fmt) "HYPERV-IR: " fmt
>
> Minor nit, you never do any pr_*() calls, so this isn't needed, right?

Yes, you are right. I will remove it. Sorry, I used pr_info() during
the development stage and removed the calls before sending the patch
out. Thanks.

>
> > +static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
> > +struct irq_domain *ioapic_ir_domain;
>
> Global?  Why?

It should be "static" here.

-- 
Best regards
Tianyu Lan


Re: [PATCH 1/3] x86/Hyper-V: Set x2apic destination mode to physical when x2apic is available

2019-01-31 Thread Tianyu Lan
Hi Greg:
 Thanks for your review.

On Thu, Jan 31, 2019 at 7:57 PM Greg KH  wrote:
>
> On Thu, Jan 31, 2019 at 06:17:31PM +0800, lantianyu1...@gmail.com wrote:
> > From: Lan Tianyu 
> >
> > Hyper-V doesn't provide irq remapping for IO-APIC. To enable x2apic,
> > set x2apic destination mode to physical mode when x2apic is available
> > and Hyper-V IOMMU driver makes sure cpus assigned with IO-APIC irqs have
> > 8-bit APIC id.
> >
> > Signed-off-by: Lan Tianyu 
> > ---
> >  arch/x86/kernel/cpu/mshyperv.c | 14 ++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> > index e81a2db..9d62f33 100644
> > --- a/arch/x86/kernel/cpu/mshyperv.c
> > +++ b/arch/x86/kernel/cpu/mshyperv.c
> > @@ -36,6 +36,8 @@
> >  struct ms_hyperv_info ms_hyperv;
> >  EXPORT_SYMBOL_GPL(ms_hyperv);
> >
> > +extern int x2apic_phys;
>
> Shouldn't this be in a .h file somewhere instead?

You are right. I should use  here. Thanks.

> thanks,
>
> greg k-h



-- 
Best regards
Tianyu Lan


Re: [PATCH 9/11] KVM/MMU: Flush tlb in the kvm_mmu_write_protect_pt_masked()

2019-01-10 Thread Tianyu Lan
On Tue, Jan 8, 2019 at 12:26 AM Paolo Bonzini  wrote:
>
> On 04/01/19 09:54, lantianyu1...@gmail.com wrote:
> >   rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + 
> > __ffs(mask),
> > PT_PAGE_TABLE_LEVEL, slot);
> > - __rmap_write_protect(kvm, rmap_head, false);
> > + flush |= __rmap_write_protect(kvm, rmap_head, false);
> >
> >   /* clear the first set bit */
> >   mask &= mask - 1;
> >   }
> > +
> > + if (flush && kvm_available_flush_tlb_with_range()) {
> > + kvm_flush_remote_tlbs_with_address(kvm,
> > + slot->base_gfn + gfn_offset,
> > + hweight_long(mask));
>
> Mask is zero here, so this probably won't work.
>
> In addition, I suspect calling the hypercall once for every 64 pages is
> not very efficient.  Passing a flush list into
> kvm_mmu_write_protect_pt_masked, and flushing in
> kvm_arch_mmu_enable_log_dirty_pt_masked, isn't efficient either because
> kvm_arch_mmu_enable_log_dirty_pt_masked is also called once per word.
>
Yes, this is not efficient.
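A tiny model of the bug Paolo points out: the "clear the first set bit" loop consumes mask, so any hweight taken after the loop sees zero; the page count has to come from a copy saved beforehand. The helper below is a stand-in, not kernel code:

```c
#include <assert.h>

/*
 * Model of the "mask is zero here" problem: the loop body does
 * mask &= mask - 1 until mask is empty, so hweight_long(mask) taken
 * after the loop is always 0.  The number of pages to flush must be
 * computed from a copy saved before the loop.
 */
static int pages_to_flush(unsigned long mask)
{
	unsigned long orig_mask = mask;	/* saved before the loop eats mask */
	int processed = 0;

	while (mask) {
		processed++;		/* __rmap_write_protect() per set bit */
		mask &= mask - 1;	/* clear the first set bit */
	}

	/*
	 * Here mask == 0, so hweight_long(mask) would be 0.  The flush
	 * count is the weight of orig_mask, which equals the number of
	 * bits the loop processed.
	 */
	(void)orig_mask;
	return processed;
}
```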

> I don't have any good ideas, except for moving the whole
> kvm_clear_dirty_log_protect loop into architecture-specific code (which
> is not the direction we want---architectures should share more code, not
> less).

kvm_vm_ioctl_clear_dirty_log/get_dirty_log() get/clear the dirty log
with a memslot as the unit. We may just flush the TLBs of the affected
memslot instead of the entire page table's when range flush is
available.
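The memslot-granularity idea can be sketched as below: when a range flush is available, flush only the GFN range covered by the memslot whose dirty log is being cleared. struct memslot and the counters are stand-ins for the real KVM structures and flush primitives:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch: flush only the GFN range of the affected memslot when a
 * range flush is supported; otherwise fall back to a full flush.
 */
struct memslot {
	uint64_t base_gfn;	/* first guest frame number of the slot */
	uint64_t npages;	/* number of pages in the slot */
};

static int range_flush_available;
static uint64_t flushed_start, flushed_pages;
static int full_flushes;

static void flush_memslot(const struct memslot *slot)
{
	if (range_flush_available) {
		/* kvm_flush_remote_tlbs_with_address(kvm, base_gfn, npages) */
		flushed_start = slot->base_gfn;
		flushed_pages = slot->npages;
	} else {
		full_flushes++;	/* kvm_flush_remote_tlbs(kvm) */
	}
}
```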

>
> Paolo
>
> > + flush = false;
> > + }
> > +
>


--
Best regards
Tianyu Lan


Re: [PATCH 11/11] KVM/MMU: Flush tlb in the kvm_age_rmapp()

2019-01-07 Thread Tianyu Lan
Hi Paolo:
   Thanks for your review.

On Tue, Jan 8, 2019 at 12:31 AM Paolo Bonzini  wrote:
>
> On 07/01/19 04:42, Tianyu Lan wrote:
> >> I'm assuming you're
> >> clearing young to avoid the flush in kvm_mmu_notifier_clear_flush_young(),
> >> but keeping that flush is silly since it will never be invoked.  Just
> >> squash this patch with patch 10/11 so that you can remove the unnecessary
> >> flush in kvm_mmu_notifier_clear_flush_young() and preserve young.
> >>
> > The platform may provide tlb flush with address range as granularity. My 
> > changes
> > are to use range flush when it's available. 
> > kvm_mmu_notifier_clear_flush_young()
> > is common function for all platforms and most platforms still need the
> > flush in the
> > kvm_mmu_notifier_clear_flush_young(). I think it's better to separate
> > flush request and
> > "young" from return value of kvm_age_hva(). New flush parameter I
> > added in the patch 10
> > can be changed to a pointer and kvm_age_hva() can use it to return
> > flush request.
>
> There are two possibilities:
>
> - pass a "bool *flush".  If NULL, kvm_age_hva should not flush.  If not
> NULL, kvm_age_hva should receive a true *flush, and should change it to
> false if kvm_age_hva takes care of the flush
>
> - pass a "bool flush".  In patch 10, change all kvm_age_hva
> implementation to do the flush if they return 1.
>
> I think I prefer the latter, in this case the small code duplication is
> offset by a simpler API.
>

From my understanding, this means moving the flush in
kvm_mmu_notifier_clear_flush_young() into kvm_age_hva(), and doing the
flush in kvm_age_hva() when young is > 0 and the "flush" parameter is
true, right?
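Paolo's first option (the "bool *flush" out-parameter) can be sketched as below; all function names and counters are made-up stand-ins for the real KVM notifier paths:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the "bool *flush" contract: the caller passes flush
 * pointing at true; if the arch code does a (range) flush itself it
 * clears *flush, and the generic notifier only flushes when *flush is
 * still set and something was young.
 */
static int range_flushes, full_flushes;
static bool range_flush_available;

static int fake_kvm_age_hva(bool *flush, int young)
{
	if (young && flush && *flush && range_flush_available) {
		range_flushes++;	/* arch code flushed the exact range */
		*flush = false;		/* tell the caller not to flush again */
	}
	return young;
}

static void fake_clear_flush_young_notifier(int young_in)
{
	bool flush = true;
	int young = fake_kvm_age_hva(&flush, young_in);

	if (young && flush)
		full_flushes++;		/* generic fallback flush */
}
```

Either way, exactly one flush happens per young page set, which is the property the API change is after.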
-- 
Best regards
Tianyu Lan


Re: [PATCH 6/11] KVM/MMU: Flush tlb with range list in sync_page()

2019-01-06 Thread Tianyu Lan
On Sat, Jan 5, 2019 at 12:30 AM Sean Christopherson
 wrote:
>
> On Fri, Jan 04, 2019 at 04:54:00PM +0800, lantianyu1...@gmail.com wrote:
> > From: Lan Tianyu 
> >
> > This patch is to flush tlb via flush list function.
>
> More explanation of why this is beneficial would be nice.  Without the
> context of the overall series it's not immediately obvious what
> kvm_flush_remote_tlbs_with_list() does without a bit of digging.
>
> >
> > Signed-off-by: Lan Tianyu 
> > ---
> >  arch/x86/kvm/paging_tmpl.h | 16 ++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> > index 833e8855bbc9..866ccdea762e 100644
> > --- a/arch/x86/kvm/paging_tmpl.h
> > +++ b/arch/x86/kvm/paging_tmpl.h
> > @@ -973,6 +973,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, 
> > struct kvm_mmu_page *sp)
> >   bool host_writable;
> >   gpa_t first_pte_gpa;
> >   int set_spte_ret = 0;
> > + LIST_HEAD(flush_list);
> >
> >   /* direct kvm_mmu_page can not be unsync. */
> >   BUG_ON(sp->role.direct);
> > @@ -980,6 +981,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, 
> > struct kvm_mmu_page *sp)
> >   first_pte_gpa = FNAME(get_level1_sp_gpa)(sp);
> >
> >   for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
> > + int tmp_spte_ret = 0;
> >   unsigned pte_access;
> >   pt_element_t gpte;
> >   gpa_t pte_gpa;
> > @@ -1029,14 +1031,24 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, 
> > struct kvm_mmu_page *sp)
> >
> >   host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;
> >
> > - set_spte_ret |= set_spte(vcpu, &sp->spt[i],
> > + tmp_spte_ret = set_spte(vcpu, &sp->spt[i],
> >pte_access, PT_PAGE_TABLE_LEVEL,
> >gfn, spte_to_pfn(sp->spt[i]),
> >true, false, host_writable);
> > +
> > + if (kvm_available_flush_tlb_with_range()
> > + && (tmp_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH)) {
> > + struct kvm_mmu_page *leaf_sp = page_header(sp->spt[i]
> > + & PT64_BASE_ADDR_MASK);
> > + list_add(&leaf_sp->flush_link, &flush_list);
> > + }
> > +
> > + set_spte_ret |= tmp_spte_ret;
> > +
> >   }
> >
> >   if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH)
> > - kvm_flush_remote_tlbs(vcpu->kvm);
> > + kvm_flush_remote_tlbs_with_list(vcpu->kvm, &flush_list);
>
> This is a bit confusing and potentially fragile.  It's not obvious that
> kvm_flush_remote_tlbs_with_list() is guaranteed to call
> kvm_flush_remote_tlbs() when kvm_available_flush_tlb_with_range() is
> false, and you're relying on the kvm_flush_remote_tlbs_with_list() call
> chain to never optimize away the empty list case.  Rechecking
> kvm_available_flush_tlb_with_range() isn't expensive.

That makes sense. Will update. Thanks.
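Sean's suggestion amounts to rechecking range-flush availability at the flush site and falling back explicitly, rather than relying on the list-flush call chain to handle an empty list. Roughly, as a sketch (the counters stand in for the real flush primitives):

```c
#include <assert.h>

/*
 * Sketch of the explicit fallback: pick the flush primitive at the
 * call site instead of hoping the list-flush path degrades to a full
 * flush when range flush is unsupported or the list is empty.
 */
static int range_flush_supported;
static int full_flushes, list_flushes;

static void flush_after_sync_page(int need_flush, int list_empty)
{
	if (!need_flush)
		return;
	if (range_flush_supported && !list_empty)
		list_flushes++;	/* kvm_flush_remote_tlbs_with_list() */
	else
		full_flushes++;	/* kvm_flush_remote_tlbs() */
}
```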

>
> >
> >   return nr_present;
> >  }
> > --
> > 2.14.4
> >



-- 
Best regards
Tianyu Lan


Re: [PATCH 11/11] KVM/MMU: Flush tlb in the kvm_age_rmapp()

2019-01-06 Thread Tianyu Lan
Hi Sean:
 Thanks for your review.

On Sat, Jan 5, 2019 at 12:12 AM Sean Christopherson
 wrote:
>
> On Fri, Jan 04, 2019 at 04:54:05PM +0800, lantianyu1...@gmail.com wrote:
> > From: Lan Tianyu 
> >
> > This patch is to flush tlb in the kvm_age_rmapp() when tlb range flush
> > is available and flush request is true.
> >
> > Signed-off-by: Lan Tianyu 
> > ---
> >  arch/x86/kvm/mmu.c | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index a5728f51bf7d..bc402a72956a 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -1958,10 +1958,17 @@ static int kvm_age_rmapp(struct kvm *kvm, struct 
> > kvm_rmap_head *rmap_head,
> >   u64 *sptep;
> >   struct rmap_iterator uninitialized_var(iter);
> >   int young = 0;
> > + bool flush = (bool)data;
> >
> > + for_each_rmap_spte(rmap_head, &iter, sptep)
> >   young |= mmu_spte_age(sptep);
> >
> > + if (young && flush) {
> > + kvm_flush_remote_tlbs_with_address(kvm, gfn,
> > + KVM_PAGES_PER_HPAGE(level));
> > + young = 0;
> > + }
> > +
>
> young shouldn't be cleared, the tracing will be wrong and the caller
> might actually care about the return value.

Yes, this is wrong and will update.

> I'm assuming you're
> clearing young to avoid the flush in kvm_mmu_notifier_clear_flush_young(),
> but keeping that flush is silly since it will never be invoked.  Just
> squash this patch with patch 10/11 so that you can remove the unnecessary
> flush in kvm_mmu_notifier_clear_flush_young() and preserve young.
>

The platform may provide TLB flush with an address range as the
granularity. My changes are to use the range flush when it's available.
kvm_mmu_notifier_clear_flush_young() is a common function for all
platforms, and most platforms still need the flush in
kvm_mmu_notifier_clear_flush_young(). I think it's better to separate
the flush request and "young" in the return value of kvm_age_hva(). The
new flush parameter I added in patch 10 can be changed to a pointer,
and kvm_age_hva() can use it to return the flush request.

-- 
Best regards
Tianyu Lan


Re: [PATCH V2 1/2] KVM/VMX: Check ept_pointer before flushing ept tlb

2018-12-17 Thread Tianyu Lan
On Fri, Dec 14, 2018 at 7:00 PM Paolo Bonzini  wrote:
>
> On 06/12/18 08:34, lantianyu1...@gmail.com wrote:
> > From: Lan Tianyu 
> >
> > This patch is to initialize ept_pointer to INVALID_PAGE and check it
> > before flushing ept tlb. If ept_pointer is invalid, bypass the flush
> > request.
> >
> > Signed-off-by: Lan Tianyu 
>
> Can you explain better *why* this patch is needed?
Yes, the hypercall may still be called when the ept_pointers aren't
initialized. Such a case happens during guest boot-up, when the BP runs
first and the APs aren't activated yet.

>  Also, should vmx->ept_pointer be cleared at reset time, rather than vCPU 
> creation?
>

Yes, that makes sense. Thanks for the suggestion.
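The guard being discussed can be sketched in userspace. INVALID_PAGE and VALID_PAGE mirror KVM's definitions (all-ones hpa means "not set"); the vCPU array, loop, and counter are stand-ins:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the ept_pointer guard: initialize the pointer to
 * INVALID_PAGE at vCPU reset and skip the flush hypercall for any vCPU
 * whose pointer is still invalid (e.g. APs that have not run yet).
 */
#define INVALID_PAGE	(~(uint64_t)0)
#define VALID_PAGE(x)	((x) != INVALID_PAGE)
#define NR_VCPUS	4

static int count_flush_hypercalls(const uint64_t ept_pointer[NR_VCPUS])
{
	int calls = 0;

	for (int i = 0; i < NR_VCPUS; i++) {
		if (!VALID_PAGE(ept_pointer[i]))
			continue;	/* AP not initialized yet: bypass */
		calls++;		/* hyperv_flush_guest_mapping(...) */
	}
	return calls;
}
```

With ept_pointer initialized to INVALID_PAGE at reset, uninitialized APs are skipped and no hypercall is issued with a zero or garbage EPT base.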

> Thanks,
>
> Paolo
>
> > ---
> >  arch/x86/kvm/vmx.c | 13 +++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index c379d0bfdcba..6577ec8cbb0f 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -1582,11 +1582,18 @@ static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
> >   /*
> >* FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE hypercall needs the address of 
> > the
> >* base of EPT PML4 table, strip off EPT configuration information.
> > +  * If ept_pointer is invalid pointer, bypass the flush request.
> >*/
> >   if (to_kvm_vmx(kvm)->ept_pointers_match != EPT_POINTERS_MATCH) {
> > - kvm_for_each_vcpu(i, vcpu, kvm)
> > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > + u64 ept_pointer = to_vmx(vcpu)->ept_pointer;
> > +
> > + if (!VALID_PAGE(ept_pointer))
> > + continue;
> > +
> >   ret |= hyperv_flush_guest_mapping(
> > - to_vmx(kvm_get_vcpu(kvm, i))->ept_pointer & 
> > PAGE_MASK);
> > + ept_pointer & PAGE_MASK);
> > + }
> >   } else {
> >   ret = hyperv_flush_guest_mapping(
> >   to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer & 
> > PAGE_MASK);
> > @@ -11614,6 +11621,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm 
> > *kvm, unsigned int id)
> >   vmx->pi_desc.nv = POSTED_INTR_VECTOR;
> >   vmx->pi_desc.sn = 1;
> >
> > + vmx->ept_pointer = INVALID_PAGE;
> > +
> > + return &vmx->vcpu;
> >
> >  free_vmcs:
> >
>


-- 
Best regards
Tianyu Lan


Re: [Resend PATCH V5 7/10] KVM: Make kvm_set_spte_hva() return int

2018-12-12 Thread Tianyu Lan
Hi Paul:
 Thanks for your review.
On Wed, Dec 12, 2018 at 1:03 PM Paul Mackerras  wrote:
>
> On Thu, Dec 06, 2018 at 09:21:10PM +0800, lantianyu1...@gmail.com wrote:
> > From: Lan Tianyu 
> >
> > The patch is to make kvm_set_spte_hva() return int and caller can
> > check return value to determine flush tlb or not.
>
> It would be helpful if the patch description told the reader which
> return value(s) mean that the caller should flush the tlb.  I would
> guess that non-zero means to do the flush, but you should make that
> explicit.

OK. Thanks for the suggestion; I will update this in the next version.
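The convention Paul asks to spell out — non-zero return means the caller must flush — can be sketched as below; the function names are made-up stand-ins for the real KVM notifier paths:

```c
#include <assert.h>

/*
 * Sketch of the kvm_set_spte_hva() contract after the change: the
 * function returns non-zero when an SPTE was modified and the caller
 * must flush the TLB, and 0 when no flush is needed.
 */
static int flushes;

static int fake_kvm_set_spte_hva(int spte_changed)
{
	return spte_changed;	/* non-zero => caller should flush */
}

static void fake_set_spte_notifier(int spte_changed)
{
	if (fake_kvm_set_spte_hva(spte_changed))
		flushes++;	/* kvm_flush_remote_tlbs(kvm) */
}
```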

>
> > Signed-off-by: Lan Tianyu 
>
> For the powerpc bits:
>
> Acked-by: Paul Mackerras 



-- 
Best regards
Tianyu Lan


Re: [PATCH] KVM/VMX: Check ept_pointer before flushing ept tlb

2018-11-07 Thread Tianyu Lan




On 11/7/2018 6:49 PM, Vitaly Kuznetsov wrote:

Tianyu Lan  writes:


Hi Vitaly:
Thanks for your review.

On 11/6/2018 11:50 PM, Vitaly Kuznetsov wrote:

ltyker...@gmail.com writes:


From: Lan Tianyu 

This patch is to initialize ept_pointer to INVALID_PAGE and check it
before flushing ept tlb. If ept_pointer is invalidated, bypass the flush
request.



To be honest I fail to understand the reason behind the patch: instead
of doing one unneeded flush request with ept_pointer==0 (after vCPU is
initialized) we now do the check every time. Could you please elaborate
on why this is needed?


The reason to introduce the check here is to avoid flushing ept tlb
without valid ept table. When nested guest boots up and only BP is
active, we should not do flush for APs and L1 hypervisor hasn't set
valid EPT table for APs.


Yes, I understand that but I'm trying to avoid additional checks on
hotpath as during normal operation EPT pointer is always set.

Could we just initialize ept_pointers_match to something like
EPT_POINTERS_NOTSET and achive the same result?


vmx->ept_pointers_match represents the match status of all vcpus' EPT
tables. EPT_POINTERS_NOTSET would have to be a per-cpu status, and so I
select ept_pointer as the check condition.


BTW, I think we may remove the check for match case which is normal 
status and all ept pointers should be set at that point. Mismatch status 
should be corner case when VM runs and this will not affect a lot.


Re: [PATCH] KVM/VMX: Check ept_pointer before flushing ept tlb

2018-11-06 Thread Tianyu Lan

Hi Vitaly:
Thanks for your review.

On 11/6/2018 11:50 PM, Vitaly Kuznetsov wrote:

ltyker...@gmail.com writes:


From: Lan Tianyu 

This patch is to initialize ept_pointer to INVALID_PAGE and check it
before flushing ept tlb. If ept_pointer is invalidated, bypass the flush
request.

Signed-off-by: Lan Tianyu 
---
  arch/x86/kvm/vmx.c | 16 +---
  1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4555077d69ce..edbc96cb990a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1580,14 +1580,22 @@ static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
/*
 * FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE hypercall needs the address of the
 * base of EPT PML4 table, strip off EPT configuration information.
+* If ept_pointer is invalid pointer, bypass the flush request.
 */
if (to_kvm_vmx(kvm)->ept_pointers_match != EPT_POINTERS_MATCH) {
-   kvm_for_each_vcpu(i, vcpu, kvm)
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   if (!VALID_PAGE(to_vmx(vcpu)->ept_pointer))
+   return 0;
+


To be honest I fail to understand the reason behind the patch: instead
of doing one unneeded flush request with ept_pointer==0 (after vCPU is
initialized) we now do the check every time. Could you please elaborate
on why this is needed?


The reason to introduce the check here is to avoid flushing the EPT TLB without a valid EPT table. When a nested guest boots up and only the BP is active, we should not issue the flush for the APs, because the L1 hypervisor hasn't set up valid EPT tables for them yet.




 			ret |= hyperv_flush_guest_mapping(
-				to_vmx(kvm_get_vcpu(kvm, i))->ept_pointer &
-				PAGE_MASK);
+				to_vmx(vcpu)->ept_pointer & PAGE_MASK);


I would use a local variable for 'to_vmx(vcpu)->ept_pointer' or even
'to_vmx(vcpu)->ept_pointer & PAGE_MASK' and use it in VALID_PAGE() - as
lower bits are unrelated;


Yes, that makes sense. INVALID_PAGE has its lower bits set as well, so a local variable holding 'to_vmx(vcpu)->ept_pointer' may be better.







+   }
} else {
+   if (!VALID_PAGE(to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer))
+   return 0;


Ditto.


+
 		ret = hyperv_flush_guest_mapping(
-			to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer &
-			PAGE_MASK);
+			to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer & PAGE_MASK);


This doesn't belong to this patch.


I found the line exceeds 80 characters, so I adjusted the indentation. Maybe I should make that change in a separate patch, even though it's small.





}
  
 	spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);

@@ -11568,6 +11576,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
vmx->pi_desc.nv = POSTED_INTR_VECTOR;
vmx->pi_desc.sn = 1;
  
+	vmx->ept_pointer = INVALID_PAGE;

+
 	return &vmx->vcpu;
  
  free_vmcs:



