RE: [PATCH V4 12/13] hv_netvsc: Add Isolation VM support for netvsc driver

2021-09-01 Thread Michael Kelley via iommu
From: Michael Kelley  Sent: Wednesday, September 1, 2021 7:34 PM

[snip]

> > +int netvsc_dma_map(struct hv_device *hv_dev,
> > +  struct hv_netvsc_packet *packet,
> > +  struct hv_page_buffer *pb)
> > +{
> > +   u32 page_count =  packet->cp_partial ?
> > +   packet->page_buf_cnt - packet->rmsg_pgcnt :
> > +   packet->page_buf_cnt;
> > +   dma_addr_t dma;
> > +   int i;
> > +
> > +   if (!hv_is_isolation_supported())
> > +   return 0;
> > +
> > +   packet->dma_range = kcalloc(page_count,
> > +   sizeof(*packet->dma_range),
> > +   GFP_KERNEL);
> > +   if (!packet->dma_range)
> > +   return -ENOMEM;
> > +
> > +   for (i = 0; i < page_count; i++) {
> > +   char *src = phys_to_virt((pb[i].pfn << HV_HYP_PAGE_SHIFT)
> > ++ pb[i].offset);
> > +   u32 len = pb[i].len;
> > +
> > +   dma = dma_map_single(&hv_dev->device, src, len,
> > +DMA_TO_DEVICE);
> > +   if (dma_mapping_error(&hv_dev->device, dma)) {
> > +   kfree(packet->dma_range);
> > +   return -ENOMEM;
> > +   }
> > +
> > +   packet->dma_range[i].dma = dma;
> > +   packet->dma_range[i].mapping_size = len;
> > +   pb[i].pfn = dma >> HV_HYP_PAGE_SHIFT;
> > +   pb[i].offset = offset_in_hvpage(dma);
> > +   pb[i].len = len;
> > +   }
> 
> Just to confirm, this driver does *not* set the DMA min_align_mask
> like storvsc does.  So after the call to dma_map_single(), the offset
> in the page could be different.  That's why you are updating
> the pb[i].offset value.  Alternatively, you could set the DMA
> min_align_mask, which would ensure the offset is unchanged.
> I'm OK with either approach, though perhaps a comment is
> warranted to explain, as this is a subtle issue.
> 

On second thought, I don't think either approach is OK.  The default
alignment in the swiotlb is 2K, and if the length of the data in the
buffer was 3K, the data could cross a page boundary in the bounce
buffer when it originally did not.  This would break the above code
which can only deal with one page at a time.  So I think the netvsc
driver also must set the DMA min_align_mask to 4K, which will
preserve the offset.
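
For illustration, setting that mask would be a one-liner at device probe
time, analogous to what patch 13 does for storvsc (sketch only; the
exact placement in netvsc_probe() is an assumption, not part of the
posted patch, and it relies on dev->dma_parms being set as patch 13's
vmbus_device_register() change provides):

	/* Tell swiotlb to preserve the offset within a 4K Hyper-V page */
	dma_set_min_align_mask(&hv_dev->device, HV_HYP_PAGE_SIZE - 1);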

Michael


RE: [PATCH V4 05/13] hyperv: Add Write/Read MSR registers via ghcb page

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> Hyper-V provides a GHCB protocol to write Synthetic Interrupt
> Controller MSR registers in Isolation VMs with AMD SEV-SNP,
> and these registers are emulated by the hypervisor directly.
> Hyper-V requires the SINTx MSR registers to be written twice:
> first via the GHCB page to communicate with the hypervisor,
> and then with a wrmsr instruction to talk with the paravisor
> which runs in VMPL0. The guest OS ID MSR also needs to be set
> via the GHCB page.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v1:
>  * Introduce sev_es_ghcb_hv_call_simple() and share code
>between SEV and Hyper-V code.
> Change since v3:
>  * Pass old_msg_type to hv_signal_eom() as parameter.
>  * Use HV_REGISTER_* macros instead of HV_X64_MSR_*.
>  * Add hv_isolation_type_snp() weak function.
>  * Add macros to set SynIC registers in ARM code.
> ---
>  arch/arm64/include/asm/mshyperv.h |  23 ++
>  arch/x86/hyperv/hv_init.c |  36 ++
>  arch/x86/hyperv/ivm.c | 112 ++
>  arch/x86/include/asm/mshyperv.h   |  80 -
>  arch/x86/include/asm/sev.h|   3 +
>  arch/x86/kernel/sev-shared.c  |  63 ++---
>  drivers/hv/hv.c   | 112 --
>  drivers/hv/hv_common.c|   6 ++
>  include/asm-generic/mshyperv.h|   4 +-
>  9 files changed, 345 insertions(+), 94 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
> index 20070a847304..ced83297e009 100644
> --- a/arch/arm64/include/asm/mshyperv.h
> +++ b/arch/arm64/include/asm/mshyperv.h
> @@ -41,6 +41,29 @@ static inline u64 hv_get_register(unsigned int reg)
>   return hv_get_vpreg(reg);
>  }
> 
> +#define hv_get_simp(val) { val = hv_get_register(HV_REGISTER_SIMP); }
> +#define hv_set_simp(val) hv_set_register(HV_REGISTER_SIMP, val)
> +
> +#define hv_get_siefp(val){ val = hv_get_register(HV_REGISTER_SIEFP); }
> +#define hv_set_siefp(val)hv_set_register(HV_REGISTER_SIEFP, val)
> +
> +#define hv_get_synint_state(int_num, val) {  \
> + val = hv_get_register(HV_REGISTER_SINT0 + int_num); \
> + }
> +
> +#define hv_set_synint_state(int_num, val)\
> + hv_set_register(HV_REGISTER_SINT0 + int_num, val)
> +
> +#define hv_get_synic_state(val) {\
> + val = hv_get_register(HV_REGISTER_SCONTROL);\
> + }
> +
> +#define hv_set_synic_state(val)  \
> + hv_set_register(HV_REGISTER_SCONTROL, val)
> +
> +#define hv_signal_eom(old_msg_type)   \
> + hv_set_register(HV_REGISTER_EOM, 0)
> +
>  /* SMCCC hypercall parameters */
>  #define HV_SMCCC_FUNC_NUMBER 1
>  #define HV_FUNC_ID   ARM_SMCCC_CALL_VAL( \
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index b1aa42f60faa..be6210a3fd2f 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -37,7 +37,7 @@ EXPORT_SYMBOL_GPL(hv_current_partition_id);
>  void *hv_hypercall_pg;
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> 
> -void __percpu **hv_ghcb_pg;
> +union hv_ghcb __percpu **hv_ghcb_pg;
> 
>  /* Storage to save the hypercall page temporarily for hibernation */
>  static void *hv_hypercall_pg_saved;
> @@ -406,7 +406,7 @@ void __init hyperv_init(void)
>   }
> 
>   if (hv_isolation_type_snp()) {
> - hv_ghcb_pg = alloc_percpu(void *);
> + hv_ghcb_pg = alloc_percpu(union hv_ghcb *);
>   if (!hv_ghcb_pg)
>   goto free_vp_assist_page;
>   }
> @@ -424,6 +424,9 @@ void __init hyperv_init(void)
>   guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> 
> + /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
> +
>   hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
>   VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> @@ -501,6 +504,7 @@ void __init hyperv_init(void)
> 
>  clean_guest_os_id:
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
>   cpuhp_remove_state(cpuhp);
>  free_ghcb_page:
>   free_percpu(hv_ghcb_pg);
> @@ -522,6 +526,7 @@ void hyperv_cleanup(void)
> 
>   /* Reset our OS id */
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
> 
>   /*
>* Reset hypercall page reference before reset the page,
> @@ -592,30 +597,3 @@ bool hv_is_hyperv_initialized(void)
>   return hypercall_msr.enable;
>  }
>  EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
> -
> -enum hv_isolation_type hv_get_isolation_type(void)
> -{
> - if (!(ms_hyperv.priv_high & HV_ISOLATION))
> - return 

RE: [PATCH V4 12/13] hv_netvsc: Add Isolation VM support for netvsc driver

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> In Isolation VM, all memory shared with the host needs to be marked
> visible to the host via hvcall. vmbus_establish_gpadl() has already
> done this for the netvsc rx/tx ring buffers. The page buffers used by
> vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA
> API to map/unmap this memory when sending/receiving packets; the
> Hyper-V swiotlb bounce buffer DMA address will be returned. The
> swiotlb bounce buffer has been marked visible to the host during boot.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * Add comment to explain why not to use dma_map_sg()
>   * Fix some error handling.
> ---
>  arch/x86/hyperv/ivm.c |   1 +
>  drivers/net/hyperv/hyperv_net.h   |   5 ++
>  drivers/net/hyperv/netvsc.c   | 135 +-
>  drivers/net/hyperv/rndis_filter.c |   2 +
>  include/linux/hyperv.h|   5 ++
>  5 files changed, 145 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 84563b3c9f3a..08d8e01de017 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -317,6 +317,7 @@ void *hv_map_memory(void *addr, unsigned long size)
> 
>   return vaddr;
>  }
> +EXPORT_SYMBOL_GPL(hv_map_memory);
> 
>  void hv_unmap_memory(void *addr)
>  {
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index aa7c9962dbd8..862419912bfb 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -164,6 +164,7 @@ struct hv_netvsc_packet {
>   u32 total_bytes;
>   u32 send_buf_index;
>   u32 total_data_buflen;
> + struct hv_dma_range *dma_range;
>  };
> 
>  #define NETVSC_HASH_KEYLEN 40
> @@ -1074,6 +1075,7 @@ struct netvsc_device {
> 
>   /* Receive buffer allocated by us but manages by NetVSP */
>   void *recv_buf;
> + void *recv_original_buf;
>   u32 recv_buf_size; /* allocated bytes */
>   u32 recv_buf_gpadl_handle;
>   u32 recv_section_cnt;
> @@ -1082,6 +1084,7 @@ struct netvsc_device {
> 
>   /* Send buffer allocated by us */
>   void *send_buf;
> + void *send_original_buf;
>   u32 send_buf_size;
>   u32 send_buf_gpadl_handle;
>   u32 send_section_cnt;
> @@ -1731,4 +1734,6 @@ struct rndis_message {
>  #define RETRY_US_HI  1
>  #define RETRY_MAX2000/* >10 sec */
> 
> +void netvsc_dma_unmap(struct hv_device *hv_dev,
> +   struct hv_netvsc_packet *packet);
>  #endif /* _HYPERV_NET_H */
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index f19b6a63..edd336b08c2c 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
>   int i;
> 
>   kfree(nvdev->extension);
> - vfree(nvdev->recv_buf);
> - vfree(nvdev->send_buf);
> +
> + if (nvdev->recv_original_buf) {
> + vunmap(nvdev->recv_buf);

In patch 11, you have added a hv_unmap_memory()
function as the inverse of hv_map_memory().  Since this
buffer was mapped with hv_map_memory() and you have
added that function, the cleanup should use
hv_unmap_memory() rather than calling vunmap() directly.
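
That is, something like:

	if (nvdev->recv_original_buf) {
		hv_unmap_memory(nvdev->recv_buf);
		vfree(nvdev->recv_original_buf);
	} else {
		vfree(nvdev->recv_buf);
	}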

> + vfree(nvdev->recv_original_buf);
> + } else {
> + vfree(nvdev->recv_buf);
> + }
> +
> + if (nvdev->send_original_buf) {
> + vunmap(nvdev->send_buf);

Same here.

> + vfree(nvdev->send_original_buf);
> + } else {
> + vfree(nvdev->send_buf);
> + }
> +
>   kfree(nvdev->send_section_map);
> 
>   for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
> @@ -347,6 +360,7 @@ static int netvsc_init_buf(struct hv_device *device,
>   unsigned int buf_size;
>   size_t map_words;
>   int i, ret = 0;
> + void *vaddr;
> 
>   /* Get receive buffer area. */
>   buf_size = device_info->recv_sections * device_info->recv_section_size;
> @@ -382,6 +396,17 @@ static int netvsc_init_buf(struct hv_device *device,
>   goto cleanup;
>   }
> 
> + if (hv_isolation_type_snp()) {
> + vaddr = hv_map_memory(net_device->recv_buf, buf_size);

Since the netvsc driver is architecture neutral, this code also needs
to compile for ARM64.  A stub will be needed for hv_map_memory()
on the ARM64 side.  Same for hv_unmap_memory() as suggested
above.  Or better, move hv_map_memory() and hv_unmap_memory()
to an architecture neutral module such as hv_common.c.

Or if Christoph's approach of creating the vmap_phys_addr() helper
comes to fruition, that's an even better approach since it will already
handle multiple architectures.
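
For example, architecture-neutral weak stubs in hv_common.c could look
like this (a sketch, not code from the posted series):

	/* Fall back to no-ops on architectures without an implementation. */
	void * __weak hv_map_memory(void *addr, unsigned long size)
	{
		return NULL;
	}

	void __weak hv_unmap_memory(void *addr)
	{
	}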

> + if (!vaddr) {
> + ret = -ENOMEM;
> + goto cleanup;
> + }
> +
> + net_device->recv_original_buf = net_device->recv_buf;
> + net_device->recv_buf = vaddr;
> + }
> +
>   /* Notify the 

RE: [PATCH V4 13/13] hv_storvsc: Add Isolation VM support for storvsc driver

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 

Per previous comment, the Subject line tag should be "scsi: storvsc: "

> In Isolation VM, all memory shared with the host needs to be marked
> visible to the host via hvcall. vmbus_establish_gpadl() has already
> done this for the storvsc rx/tx ring buffers. The page buffers used by
> vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA API
> (dma_map_sg) to map this memory when sending/receiving packets and
> return the swiotlb bounce buffer DMA address. In Isolation VM, the
> swiotlb bounce buffer is marked visible to the host and swiotlb force
> mode is enabled.
> 
> Set the device's DMA min align mask to HV_HYP_PAGE_SIZE - 1 in order
> to keep the original data offset in the bounce buffer.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * Replace dma_map_page with dma_map_sg()
>   * Use for_each_sg() to populate payload->range.pfn_array.
>   * Remove storvsc_dma_map macro
> ---
>  drivers/hv/vmbus_drv.c |  1 +
>  drivers/scsi/storvsc_drv.c | 41 +++---
>  include/linux/hyperv.h |  1 +
>  3 files changed, 18 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index f068e22a5636..270d526fd9de 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2124,6 +2124,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
>   hv_debug_add_dev_dir(child_device_obj);
> 
>   child_device_obj->device.dma_mask = &vmbus_dma_mask;
> + child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
>   return 0;
> 
>  err_kset_unregister:
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 328bb961c281..4f1793be1fdc 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -21,6 +21,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +
>  #include 
>  #include 
>  #include 
> @@ -1312,6 +1314,9 @@ static void storvsc_on_channel_callback(void *context)
>   continue;
>   }
>   request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
> + if (scsi_sg_count(scmnd))
> + dma_unmap_sg(&device->device, scsi_sglist(scmnd),
> +  scsi_sg_count(scmnd), scmnd->sc_data_direction);

Use scsi_dma_unmap(), which does exactly what you have written
above. :-)
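
That is, the whole hunk reduces to:

	scsi_dma_unmap(scmnd);

since scsi_dma_unmap() already contains the scsi_sg_count() check.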

>   }
> 
>   storvsc_on_receive(stor_device, packet, request);
> @@ -1725,7 +1730,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>   struct hv_host_device *host_dev = shost_priv(host);
>   struct hv_device *dev = host_dev->dev;
>   struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
> - int i;
>   struct scatterlist *sgl;
>   unsigned int sg_count;
>   struct vmscsi_request *vm_srb;
> @@ -1807,10 +1811,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>   payload_sz = sizeof(cmd_request->mpb);
> 
>   if (sg_count) {
> - unsigned int hvpgoff, hvpfns_to_add;
>   unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
>   unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
> - u64 hvpfn;
> + struct scatterlist *sg;
> + unsigned long hvpfn, hvpfns_to_add;
> + int j, i = 0;
> 
>   if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> 
> @@ -1824,31 +1829,16 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>   payload->range.len = length;
>   payload->range.offset = offset_in_hvpg;
> 
> + if (dma_map_sg(&dev->device, sgl, sg_count,
> + scmnd->sc_data_direction) == 0)
> + return SCSI_MLQUEUE_DEVICE_BUSY;
> 
> - for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
> - /*
> -  * Init values for the current sgl entry. hvpgoff
> -  * and hvpfns_to_add are in units of Hyper-V size
> -  * pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
> -  * case also handles values of sgl->offset that are
> -  * larger than PAGE_SIZE. Such offsets are handled
> -  * even on other than the first sgl entry, provided
> -  * they are a multiple of PAGE_SIZE.
> -  */

Any reason not to keep this comment?  It's still correct and
mentions important cases that must be handled.

> - hvpgoff = HVPFN_DOWN(sgl->offset);
> - hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
> - hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
> - 

Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE

2021-09-01 Thread Alyssa Rosenzweig
> My biggest issue is that I do not understand how this function is supposed
> to be used correctly. It would work fine as-is if it only ever gets
> passed buffers allocated by the coherent API but there's no way to
> check or guarantee that. There may also be callers making assumptions
> that no longer hold when iovad->granule > PAGE_SIZE.
> 
> Regarding your case: I'm not convinced the function is meant to be
> used there. If I understand it correctly, your code first allocates
> memory with dma_alloc_coherent (which possibly creates a sgt
> internally and then maps it with iommu_map_sg), then coerces that back
> into a sgt with dma_get_sgtable, and then maps that sgt to another
> iommu domain with dma_map_sg while assuming that the result will be
> contiguous in IOVA space. It'll work out because dma_alloc_coherent is
> the very thing meant to allocate pages that can be mapped into kernel
> and device VA space as a single contiguous block and because both of
> your IOMMUs are different instances of the same HW block. Anything
> allocated by dma_alloc_coherent for the first IOMMU will have the
> right shape that will allow it to be mapped as a single contiguous
> block for the second IOMMU.
> 
> What could be done in your case is to instead use the IOMMU API,
> allocate the pages yourself (while ensuring the sgt you create is made
> up of blocks with size and physaddr aligned to max(domain_a->granule,
> domain_b->granule)) and then just use iommu_map_sg for both domains,
> which actually comes with the guarantee that the result will be a
> single contiguous block in IOVA space and doesn't require the sgt
> roundtrip.

In principle I agree. I am getting the sense this function can't be used
correctly in general, and yet is the function that's meant to be used.
If my interpretation of prior LKML discussion holds, the problems are
far deeper than my code or indeed page size problems...

If the right way to handle this is with the IOMMU and IOVA APIs, I really wish
that dance were wrapped up in a safe helper function instead of open
coding it in every driver that does cross device sharing.

We might even call that helper... hmm... dma_map_sg *ducks*

For context for people-other-than-Sven, the code in question from my
tree appears below the break.

-

/*
 * Allocate an IOVA contiguous buffer mapped to the DCP. The buffer need not be
 * physically contiguous, however we should save the sgtable in case the
 * buffer needs to be later mapped for PIODMA.
 */
static bool dcpep_cb_allocate_buffer(struct apple_dcp *dcp, void *out, void *in)
{
struct dcp_allocate_buffer_resp *resp = out;
struct dcp_allocate_buffer_req *req = in;
void *buf;

resp->dva_size = ALIGN(req->size, 4096);
resp->mem_desc_id = ++dcp->nr_mappings;

if (resp->mem_desc_id >= ARRAY_SIZE(dcp->mappings)) {
dev_warn(dcp->dev, "DCP overflowed mapping table, ignoring");
return true;
}

buf = dma_alloc_coherent(dcp->dev, resp->dva_size, &resp->dva,
 GFP_KERNEL);

dma_get_sgtable(dcp->dev, &dcp->mappings[resp->mem_desc_id], buf,
resp->dva, resp->dva_size);

WARN_ON(resp->mem_desc_id == 0);
return true;
}

/*
 * Callback to map a buffer allocated with allocate_buf for PIODMA usage.
 * PIODMA is separate from the main DCP and uses own IOVA space on a dedicated
 * stream of the display DART, rather than the expected DCP DART.
 *
 * XXX: This relies on dma_get_sgtable in concert with dma_map_sgtable, which
 * is a "fundamentally unsafe" operation according to the docs. And yet
 * everyone does it...
 */
static bool dcpep_cb_map_piodma(struct apple_dcp *dcp, void *out, void *in)
{
struct dcp_map_buf_resp *resp = out;
struct dcp_map_buf_req *req = in;
struct sg_table *map;

if (req->buffer >= ARRAY_SIZE(dcp->mappings))
goto reject;

map = &dcp->mappings[req->buffer];

if (!map->sgl)
goto reject;

/* XNU leaks a kernel VA here, breaking kASLR. Don't do that. */
resp->vaddr = 0;

/* Use PIODMA device instead of DCP to map against the right IOMMU. */
resp->ret = dma_map_sgtable(dcp->piodma, map, DMA_BIDIRECTIONAL, 0);

if (resp->ret)
dev_warn(dcp->dev, "failed to map for piodma %d\n", resp->ret);
else
resp->dva = sg_dma_address(map->sgl);

resp->ret = 0;
return true;

reject:
dev_warn(dcp->dev, "denying map of invalid buffer %llx for pidoma\n",
 req->buffer);
resp->ret = EINVAL;
return true;
}


RE: [PATCH V4 11/13] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> Hyper-V Isolation VM requires bounce buffer support to copy
> data from/to encrypted memory, so enable swiotlb force
> mode to use the swiotlb bounce buffer for DMA transactions.
> 
> In Isolation VM with AMD SEV, the bounce buffer needs to be
> accessed via an extra address space which is above shared_gpa_boundary
> (e.g. the 39-bit address line) reported by the Hyper-V CPUID
> ISOLATION_CONFIG leaf. The accessed physical address will be the
> original physical address + shared_gpa_boundary. The
> shared_gpa_boundary in the AMD SEV-SNP spec is called the virtual
> top of memory (vTOM). Memory addresses below vTOM are automatically
> treated as private while memory above vTOM is treated as shared.
> 
> The swiotlb bounce buffer code calls dma_map_decrypted()
> to mark the bounce buffer visible to the host and map it in the extra
> address space. Populate the DMA memory decrypted ops with the hv
> map/unmap functions.
> 
> Hyper-V initializes the swiotlb bounce buffer, and the default swiotlb
> needs to be disabled. pci_swiotlb_detect_override() and
> pci_swiotlb_detect_4gb() enable the default one. To override
> the setting, hyperv_swiotlb_detect() needs to run before
> these detect functions, which depend on pci_xen_swiotlb_init().
> Make pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect()
> to keep the order.
> 
> The map function vmap_pfn() can't work in the early
> hyperv_iommu_swiotlb_init(), so initialize the swiotlb bounce
> buffer in hyperv_iommu_swiotlb_later_init().
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>* Get hyperv bounce buffer size via the default swiotlb
>bounce buffer size function and keep the default size the
>same as the one in an AMD SEV VM.
> ---
>  arch/x86/hyperv/ivm.c   | 28 +++
>  arch/x86/include/asm/mshyperv.h |  2 ++
>  arch/x86/mm/mem_encrypt.c   |  3 +-
>  arch/x86/xen/pci-swiotlb-xen.c  |  3 +-
>  drivers/hv/vmbus_drv.c  |  3 ++
>  drivers/iommu/hyperv-iommu.c| 61 +
>  include/linux/hyperv.h  |  1 +
>  7 files changed, 99 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index e761c67e2218..84563b3c9f3a 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -294,3 +294,31 @@ int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible)
> 
>   return __hv_set_mem_host_visibility((void *)addr, numpages, visibility);
>  }
> +
> +/*
> + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
> + */
> +void *hv_map_memory(void *addr, unsigned long size)
> +{
> + unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
> +   sizeof(unsigned long), GFP_KERNEL);

Should be PAGE_SIZE, not HV_HYP_PAGE_SIZE, since this code
only manipulates guest page tables.  There's no communication with
Hyper-V that requires HV_HYP_PAGE_SIZE.
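
That is:

	unsigned long *pfns = kcalloc(size / PAGE_SIZE,
				      sizeof(unsigned long), GFP_KERNEL);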

> + void *vaddr;
> + int i;
> +
> + if (!pfns)
> + return NULL;
> +
> + for (i = 0; i < size / PAGE_SIZE; i++)
> + pfns[i] = virt_to_hvpfn(addr + i * PAGE_SIZE) +

Use virt_to_pfn(), not virt_to_hvpfn(), for the same reason.
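
With both fixes applied, the loop would read (sketch of the suggested
change):

	for (i = 0; i < size / PAGE_SIZE; i++)
		pfns[i] = virt_to_pfn(addr + i * PAGE_SIZE) +
			  (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);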

> + (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
> +
> + vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
> + kfree(pfns);
> +
> + return vaddr;
> +}
> +
> +void hv_unmap_memory(void *addr)
> +{
> + vunmap(addr);
> +}
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index b77f4caee3ee..627fcf8d443c 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -252,6 +252,8 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
>  int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
>  enum hv_mem_host_visibility visibility);
>  int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
> +void *hv_map_memory(void *addr, unsigned long size);
> +void hv_unmap_memory(void *addr);
>  void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
>  void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
>  void hv_signal_eom_ghcb(void);
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index ff08dc463634..e2db0b8ed938 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "mm_internal.h"
> 
> @@ -202,7 +203,7 @@ void __init sev_setup_arch(void)
>   phys_addr_t total_mem = memblock_phys_mem_size();
>   unsigned long size;
> 
> - if (!sev_active())
> + if (!sev_active() && !hv_is_isolation_supported())
>   return;
> 
>   /*
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 54f9aa7e8457..43bd031aa332 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ 

RE: [PATCH V4 08/13] hyperv/vmbus: Initialize VMbus ring buffer for Isolation VM

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 

Subject tag should be "Drivers: hv: vmbus: "

> VMbus ring buffers are shared with the host and need to
> be accessed via the extra address space of an Isolation VM with
> AMD SNP support. This patch maps the ring buffer
> address into the extra address space via vmap_pfn(). The Hyper-V
> set-memory-host-visibility hvcall smears data in the ring buffer,
> so reset the ring buffer memory to zero after mapping.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * Remove hv_ringbuffer_post_init(), merge map
>   operation for Isolation VM into hv_ringbuffer_init()
>   * Call hv_ringbuffer_init() after __vmbus_establish_gpadl().
> ---
>  drivers/hv/Kconfig   |  1 +
>  drivers/hv/channel.c | 19 +++---
>  drivers/hv/ring_buffer.c | 56 ++--
>  3 files changed, 54 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index d1123ceb38f3..dd12af20e467 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -8,6 +8,7 @@ config HYPERV
>   || (ARM64 && !CPU_BIG_ENDIAN))
>   select PARAVIRT
>   select X86_HV_CALLBACK_VECTOR if X86
> + select VMAP_PFN
>   help
> Select this option to run Linux as a Hyper-V client operating
> system.
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 82650beb3af0..81f8629e4491 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -679,15 +679,6 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   if (!newchannel->max_pkt_size)
>   newchannel->max_pkt_size = VMBUS_DEFAULT_MAX_PKT_SIZE;
> 
> - err = hv_ringbuffer_init(&newchannel->outbound, page, send_pages, 0);
> - if (err)
> - goto error_clean_ring;
> -
> - err = hv_ringbuffer_init(&newchannel->inbound, &page[send_pages],
> -  recv_pages, newchannel->max_pkt_size);
> - if (err)
> - goto error_clean_ring;
> -
>   /* Establish the gpadl for the ring buffer */
>   newchannel->ringbuffer_gpadlhandle = 0;
> 
> @@ -699,6 +690,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   if (err)
>   goto error_clean_ring;
> 
> + err = hv_ringbuffer_init(&newchannel->outbound,
> +  page, send_pages, 0);
> + if (err)
> + goto error_free_gpadl;
> +
> + err = hv_ringbuffer_init(&newchannel->inbound, &page[send_pages],
> +  recv_pages, newchannel->max_pkt_size);
> + if (err)
> + goto error_free_gpadl;
> +
>   /* Create and init the channel open message */
>   open_info = kzalloc(sizeof(*open_info) +
>  sizeof(struct vmbus_channel_open_channel),
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 2aee356840a2..24d64d18eb65 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -17,6 +17,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> 
>  #include "hyperv_vmbus.h"
> 
> @@ -183,8 +185,10 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
>  int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>  struct page *pages, u32 page_cnt, u32 max_pkt_size)
>  {
> - int i;
>   struct page **pages_wraparound;
> + unsigned long *pfns_wraparound;
> + u64 pfn;
> + int i;
> 
>   BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
> 
> @@ -192,23 +196,49 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>* First page holds struct hv_ring_buffer, do wraparound mapping for
>* the rest.
>*/
> - pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
> -GFP_KERNEL);
> - if (!pages_wraparound)
> - return -ENOMEM;
> + if (hv_isolation_type_snp()) {
> + pfn = page_to_pfn(pages) +
> + HVPFN_DOWN(ms_hyperv.shared_gpa_boundary);

Use PFN_DOWN, not HVPFN_DOWN.  This is all done in units of guest page
size, not Hyper-V page size.
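
That is:

	pfn = page_to_pfn(pages) + PFN_DOWN(ms_hyperv.shared_gpa_boundary);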

> 
> - pages_wraparound[0] = pages;
> - for (i = 0; i < 2 * (page_cnt - 1); i++)
> - pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
> + pfns_wraparound = kcalloc(page_cnt * 2 - 1,
> + sizeof(unsigned long), GFP_KERNEL);
> + if (!pfns_wraparound)
> + return -ENOMEM;
> 
> - ring_info->ring_buffer = (struct hv_ring_buffer *)
> - vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
> + pfns_wraparound[0] = pfn;
> + for (i = 0; i < 2 * (page_cnt - 1); i++)
> + pfns_wraparound[i + 1] = pfn + i % (page_cnt - 1) + 1;
> 
> - kfree(pages_wraparound);
> + ring_info->ring_buffer = (struct hv_ring_buffer *)
> + vmap_pfn(pfns_wraparound, page_cnt * 2 - 1,
> +  

RE: [PATCH V4 07/13] hyperv/Vmbus: Add SNP support for VMbus channel initiate message

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 

Subject line tag should be "Drivers: hv: vmbus:"

> The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared
> with the host in Isolation VM, so it's necessary to use an hvcall to
> make them visible to the host. In Isolation VM with AMD SEV-SNP, the
> access address should be in the extra space above the shared gpa
> boundary, so remap these pages into the extra address (pa +
> shared_gpa_boundary).
> 
> Introduce monitor_pages_original[] in struct vmbus_connection
> to store the monitor page virtual addresses returned by
> hv_alloc_hyperv_zeroed_page(), and free the monitor pages via
> monitor_pages_original in vmbus_disconnect(). monitor_pages[] is
> used to access the monitor pages and is initialized to be equal to
> monitor_pages_original. monitor_pages[] will be overridden in the
> Isolation VM with the VA of the extra address.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * Rename monitor_pages_va to monitor_pages_original.
>   * Free the monitor pages via monitor_pages_original;
>     monitor_pages is used to access the monitor pages.
> 
> Change since v1:
> * Do not remap monitor pages in the non-SNP Isolation VM.
> ---
>  drivers/hv/connection.c   | 75 ---
>  drivers/hv/hyperv_vmbus.h |  1 +
>  2 files changed, 72 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 6d315c1465e0..9a48d8115c87 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include "hyperv_vmbus.h"
> @@ -104,6 +105,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
> 
>   msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
>   msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
> +
> + if (hv_isolation_type_snp()) {
> + msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
> + msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
> + }
> +
>   msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
> 
>   /*
> @@ -148,6 +155,35 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
>   return -ECONNREFUSED;
>   }
> 
> +
> + if (hv_is_isolation_supported()) {
> + if (hv_isolation_type_snp()) {
> + vmbus_connection.monitor_pages[0]
> + = memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
> +MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[0])
> + return -ENOMEM;
> +
> + vmbus_connection.monitor_pages[1]
> + = memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
> +MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[1]) {
> + memunmap(vmbus_connection.monitor_pages[0]);
> + return -ENOMEM;
> + }
> + }
> +
> + /*
> +  * Set memory host visibility hvcall smears memory
> +  * and so zero monitor pages here.
> +  */
> + memset(vmbus_connection.monitor_pages[0], 0x00,
> +HV_HYP_PAGE_SIZE);
> + memset(vmbus_connection.monitor_pages[1], 0x00,
> +HV_HYP_PAGE_SIZE);
> +
> + }

I still find it somewhat confusing to have the handling of the
shared_gpa_boundary and memory mapping in the function for
negotiating the VMbus version.  I think the code works as written,
but it would seem cleaner and easier to understand to precompute
the physical addresses and do all the mapping and memory zero'ing
in a single place in vmbus_connect().  Then the negotiate version
function can focus on doing only the version negotiation.
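
A rough sketch of that shape in vmbus_connect(), with hypothetical
monitor_pages_pa[] fields added to struct vmbus_connection (untested,
error handling elided):

	vmbus_connection.monitor_pages_pa[0] =
		virt_to_phys(vmbus_connection.monitor_pages[0]);
	vmbus_connection.monitor_pages_pa[1] =
		virt_to_phys(vmbus_connection.monitor_pages[1]);

	if (hv_isolation_type_snp()) {
		vmbus_connection.monitor_pages_pa[0] +=
			ms_hyperv.shared_gpa_boundary;
		vmbus_connection.monitor_pages_pa[1] +=
			ms_hyperv.shared_gpa_boundary;

		vmbus_connection.monitor_pages[0] =
			memremap(vmbus_connection.monitor_pages_pa[0],
				 HV_HYP_PAGE_SIZE, MEMREMAP_WB);
		vmbus_connection.monitor_pages[1] =
			memremap(vmbus_connection.monitor_pages_pa[1],
				 HV_HYP_PAGE_SIZE, MEMREMAP_WB);

		/* The visibility hvcall smears the pages, so zero them. */
		memset(vmbus_connection.monitor_pages[0], 0x00,
		       HV_HYP_PAGE_SIZE);
		memset(vmbus_connection.monitor_pages[1], 0x00,
		       HV_HYP_PAGE_SIZE);
	}

vmbus_negotiate_version() would then just copy the precomputed values:

	msg->monitor_page1 = vmbus_connection.monitor_pages_pa[0];
	msg->monitor_page2 = vmbus_connection.monitor_pages_pa[1];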

> +
>   return ret;
>  }
> 
> @@ -159,6 +195,7 @@ int vmbus_connect(void)
>   struct vmbus_channel_msginfo *msginfo = NULL;
>   int i, ret = 0;
>   __u32 version;
> + u64 pfn[2];
> 
>   /* Initialize the vmbus connection */
>   vmbus_connection.conn_state = CONNECTING;
> @@ -216,6 +253,21 @@ int vmbus_connect(void)
>   goto cleanup;
>   }
> 
> + vmbus_connection.monitor_pages_original[0]
> + = vmbus_connection.monitor_pages[0];
> + vmbus_connection.monitor_pages_original[1]
> + = vmbus_connection.monitor_pages[1];
> +
> + if (hv_is_isolation_supported()) {
> + pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
> + pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
> + if (hv_mark_gpa_visibility(2, pfn,
> + VMBUS_PAGE_VISIBLE_READ_WRITE)) {

In Patch 4 of 

RE: [PATCH V4 06/13] hyperv: Add ghcb hvcall support for SNP VM

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 

Subject line tag should probably be "x86/hyperv:" since the majority
of the code added is under arch/x86.

> Hyper-V provides a GHCB hvcall to handle VMBus
> HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
> messages in SNP Isolation VM. Add such support.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * Add hv_ghcb_hypercall() stub function to avoid
> compile error for ARM.
> ---
>  arch/x86/hyperv/ivm.c  | 71 ++
>  drivers/hv/connection.c|  6 ++-
>  drivers/hv/hv.c|  8 +++-
>  drivers/hv/hv_common.c |  6 +++
>  include/asm-generic/mshyperv.h |  1 +
>  5 files changed, 90 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index f56fe4f73000..e761c67e2218 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -17,10 +17,81 @@
>  #include 
>  #include 
> 
> +#define GHCB_USAGE_HYPERV_CALL   1
> +
>  union hv_ghcb {
>   struct ghcb ghcb;
> + struct {
> + u64 hypercalldata[509];
> + u64 outputgpa;
> + union {
> + union {
> + struct {
> + u32 callcode: 16;
> + u32 isfast  : 1;
> + u32 reserved1   : 14;
> + u32 isnested: 1;
> + u32 countofelements : 12;
> + u32 reserved2   : 4;
> + u32 repstartindex   : 12;
> + u32 reserved3   : 4;
> + };
> + u64 asuint64;
> + } hypercallinput;
> + union {
> + struct {
> + u16 callstatus;
> + u16 reserved1;
> + u32 elementsprocessed : 12;
> + u32 reserved2 : 20;
> + };
> + u64 asunit64;
> + } hypercalloutput;
> + };
> + u64 reserved2;
> + } hypercall;
>  } __packed __aligned(HV_HYP_PAGE_SIZE);
> 
> +u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long flags;
> +
> + if (!hv_ghcb_pg)
> + return -EFAULT;
> +
> + WARN_ON(in_nmi());
> +
> + local_irq_save(flags);
> + ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
> + hv_ghcb = (union hv_ghcb *)*ghcb_base;
> + if (!hv_ghcb) {
> + local_irq_restore(flags);
> + return -EFAULT;
> + }
> +
> + hv_ghcb->ghcb.protocol_version = GHCB_PROTOCOL_MAX;
> + hv_ghcb->ghcb.ghcb_usage = GHCB_USAGE_HYPERV_CALL;
> +
> + hv_ghcb->hypercall.outputgpa = (u64)output;
> + hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
> + hv_ghcb->hypercall.hypercallinput.callcode = control;
> +
> + if (input_size)
> + memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
> +
> + VMGEXIT();
> +
> + hv_ghcb->ghcb.ghcb_usage = 0xffff;
> + memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
> +sizeof(hv_ghcb->ghcb.save.valid_bitmap));
> +
> + local_irq_restore(flags);
> +
> + return hv_ghcb->hypercall.hypercalloutput.callstatus;

The hypercall.hypercalloutput.callstatus value must be saved
in a local variable *before* the call to local_irq_restore().  Then
the local variable is the return value.  Once local_irq_restore()
is called, the GHCB page could get reused.
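
That is, something like:

	u64 status;
	...
	VMGEXIT();

	status = hv_ghcb->hypercall.hypercalloutput.callstatus;

	hv_ghcb->ghcb.ghcb_usage = 0xffff;
	memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
	       sizeof(hv_ghcb->ghcb.save.valid_bitmap));

	local_irq_restore(flags);

	return status;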

> +}
> +
>  void hv_ghcb_msr_write(u64 msr, u64 value)
>  {
>   union hv_ghcb *hv_ghcb;
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 5e479d54918c..6d315c1465e0 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -447,6 +447,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
> 
>   ++channel->sig_events;
> 
> - hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
> + if (hv_isolation_type_snp())
> + hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
> + NULL, sizeof(u64));

Better to use "sizeof(channel->sig_event)" instead of explicitly coding
the type.
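
That is:

	hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
			  NULL, sizeof(channel->sig_event));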

> + else
> + hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
>  }
>  EXPORT_SYMBOL_GPL(vmbus_set_event);
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 97b21256a9db..d4531c64d9d3 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -98,7 +98,13 @@ int hv_post_message(union hv_connection_id connection_id,
>   aligned_msg->payload_size = payload_size;
>   

RE: [PATCH V4 04/13] hyperv: Mark vmbus ring buffer visible to host in Isolation VM

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> Mark the vmbus ring buffer visible with set_memory_decrypted() when
> establishing the gpadl handle.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>* Change the vmbus_teardown_gpadl() parameter and put the gpadl
>handle, buffer and buffer size in struct vmbus_gpadl.
> ---
>  drivers/hv/channel.c| 36 -
>  drivers/net/hyperv/hyperv_net.h |  1 +
>  drivers/net/hyperv/netvsc.c | 16 +++
>  drivers/uio/uio_hv_generic.c| 14 +++--
>  include/linux/hyperv.h  |  8 +++-
>  5 files changed, 63 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index f3761c73b074..82650beb3af0 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> 
> @@ -474,6 +475,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>   if (ret)
>   return ret;
> 
> + ret = set_memory_decrypted((unsigned long)kbuffer,
> +HVPFN_UP(size));
> + if (ret) {
> + pr_warn("Failed to set host visibility for new GPADL %d.\n", 
> ret);
> + return ret;
> + }
> +
> + init_completion(&msginfo->waitevent);
>   msginfo->waiting_channel = channel;
> 
> @@ -549,6 +557,11 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>   }
> 
>   kfree(msginfo);
> +
> + if (ret)
> + set_memory_encrypted((unsigned long)kbuffer,
> +  HVPFN_UP(size));
> +
>   return ret;
>  }
> 
> @@ -639,6 +652,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   struct vmbus_channel_open_channel *open_msg;
>   struct vmbus_channel_msginfo *open_info = NULL;
>   struct page *page = newchannel->ringbuffer_page;
> + struct vmbus_gpadl gpadl;
>   u32 send_pages, recv_pages;
>   unsigned long flags;
>   int err;
> @@ -759,7 +773,10 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  error_free_info:
>   kfree(open_info);
>  error_free_gpadl:
> - vmbus_teardown_gpadl(newchannel, newchannel->ringbuffer_gpadlhandle);
> + gpadl.gpadl_handle = newchannel->ringbuffer_gpadlhandle;
> + gpadl.buffer = page_address(newchannel->ringbuffer_page);
> + gpadl.size = (send_pages + recv_pages) << PAGE_SHIFT;
> + vmbus_teardown_gpadl(newchannel, &gpadl);
>   newchannel->ringbuffer_gpadlhandle = 0;
>  error_clean_ring:
>   hv_ringbuffer_cleanup(&newchannel->outbound);
> @@ -806,7 +823,7 @@ EXPORT_SYMBOL_GPL(vmbus_open);
>  /*
>   * vmbus_teardown_gpadl -Teardown the specified GPADL handle
>   */
> -int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
> +int vmbus_teardown_gpadl(struct vmbus_channel *channel, struct vmbus_gpadl *gpadl)
>  {
>   struct vmbus_channel_gpadl_teardown *msg;
>   struct vmbus_channel_msginfo *info;
> @@ -825,7 +842,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
> 
>   msg->header.msgtype = CHANNELMSG_GPADL_TEARDOWN;
>   msg->child_relid = channel->offermsg.child_relid;
> - msg->gpadl = gpadl_handle;
> + msg->gpadl = gpadl->gpadl_handle;
> 
>   spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
>   list_add_tail(&info->msglistentry,
> @@ -859,6 +876,12 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
>   spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
> 
>   kfree(info);
> +
> + ret = set_memory_encrypted((unsigned long)gpadl->buffer,
> +HVPFN_UP(gpadl->size));
> + if (ret)
> + pr_warn("Fail to set mem host visibility in GPADL teardown 
> %d.\n", ret);
> +
>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
> @@ -896,6 +919,7 @@ void vmbus_reset_channel_cb(struct vmbus_channel *channel)
>  static int vmbus_close_internal(struct vmbus_channel *channel)
>  {
>   struct vmbus_channel_close_channel *msg;
> + struct vmbus_gpadl gpadl;
>   int ret;
> 
>   vmbus_reset_channel_cb(channel);
> @@ -934,8 +958,10 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
> 
>   /* Tear down the gpadl for the channel's ring buffer */
>   else if (channel->ringbuffer_gpadlhandle) {
> - ret = vmbus_teardown_gpadl(channel,
> -channel->ringbuffer_gpadlhandle);
> + gpadl.gpadl_handle = channel->ringbuffer_gpadlhandle;
> + gpadl.buffer = page_address(channel->ringbuffer_page);
> + gpadl.size = channel->ringbuffer_pagecount;
> + ret = vmbus_teardown_gpadl(channel, &gpadl);
>   if (ret) {
>   pr_err("Close failed: teardown gpadl return %d\n", ret);
>   /*
> diff --git 

RE: [PATCH V4 03/13] x86/hyperv: Add new hvcall guest address host visibility support

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> Add new hvcall guest address host visibility support to mark
> memory visible to the host. Call it inside set_memory_decrypted()
> /encrypted(). Add a HYPERVISOR feature check in
> hv_is_isolation_supported() to optimize for non-virtualized
> environments.
> 
> Acked-by: Dave Hansen 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * Fix error code handle in the __hv_set_mem_host_visibility().
>   * Move HvCallModifySparseGpaPageHostVisibility near to enum
> hv_mem_host_visibility.
> 
> Change since v2:
>* Rework __set_memory_enc_dec() and call Hyper-V and AMD function
>  according to platform check.
> 
> Change since v1:
>* Use new static call x86_set_memory_enc to avoid adding a
>  Hyper-V specific check in the set_memory code.
> ---
>  arch/x86/hyperv/Makefile   |   2 +-
>  arch/x86/hyperv/hv_init.c  |   6 ++
>  arch/x86/hyperv/ivm.c  | 113 +
>  arch/x86/include/asm/hyperv-tlfs.h |  17 +
>  arch/x86/include/asm/mshyperv.h|   4 +-
>  arch/x86/mm/pat/set_memory.c   |  19 +++--
>  include/asm-generic/hyperv-tlfs.h  |   1 +
>  include/asm-generic/mshyperv.h |   1 +
>  8 files changed, 156 insertions(+), 7 deletions(-)
>  create mode 100644 arch/x86/hyperv/ivm.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 48e2c51464e8..5d2de10809ae 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y:= hv_init.o mmu.o nested.o irqdomain.o
> +obj-y:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
>  obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o
> 
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index eba10ed4f73e..b1aa42f60faa 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -603,6 +603,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> 
>  bool hv_is_isolation_supported(void)
>  {
> + if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> + return 0;

Use "return false" per previous comment from Wei Liu.

> +
> + if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> + return 0;

Use "return false".

> +
>   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> new file mode 100644
> index ..a069c788ce3c
> --- /dev/null
> +++ b/arch/x86/hyperv/ivm.c
> @@ -0,0 +1,113 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Hyper-V Isolation VM interface with paravisor and hypervisor
> + *
> + * Author:
> + *  Tianyu Lan 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
> + *
> + * In Isolation VM, all guest memory is encripted from host and guest

s/encripted/encrypted/

> + * needs to set memory visible to host via hvcall before sharing memory
> + * with host.
> + */
> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> +enum hv_mem_host_visibility visibility)
> +{
> + struct hv_gpa_range_for_visibility **input_pcpu, *input;
> + u16 pages_processed;
> + u64 hv_status;
> + unsigned long flags;
> +
> + /* no-op if partition isolation is not enabled */
> + if (!hv_is_isolation_supported())
> + return 0;
> +
> + if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> + pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> + HV_MAX_MODIFY_GPA_REP_COUNT);
> + return -EINVAL;
> + }
> +
> + local_irq_save(flags);
> + input_pcpu = (struct hv_gpa_range_for_visibility **)
> + this_cpu_ptr(hyperv_pcpu_input_arg);
> + input = *input_pcpu;
> + if (unlikely(!input)) {
> + local_irq_restore(flags);
> + return -EINVAL;
> + }
> +
> + input->partition_id = HV_PARTITION_ID_SELF;
> + input->host_visibility = visibility;
> + input->reserved0 = 0;
> + input->reserved1 = 0;
> + memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
> + hv_status = hv_do_rep_hypercall(
> + HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> + 0, input, &pages_processed);
> + local_irq_restore(flags);
> +
> + if (hv_result_success(hv_status))
> + return 0;
> + else
> + return -EFAULT;
> +}
> +EXPORT_SYMBOL(hv_mark_gpa_visibility);

In later comments on Patch 7 of this series, I have suggested that
code in that patch should not call hv_mark_gpa_visibility() directly,
but instead should call set_memory_encrypted() and
set_memory_decrypted().  I'm thinking that those functions should
be the standard way to change the visibility of pages in the 

RE: [PATCH V4 02/13] x86/hyperv: Initialize shared memory boundary in the Isolation VM.

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> Hyper-V exposes the shared memory boundary via the cpuid leaf
> HYPERV_CPUID_ISOLATION_CONFIG and stores it in the
> shared_gpa_boundary field of the ms_hyperv struct. This prepares
> for sharing memory with the host for SNP guests.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
>   * Use BIT_ULL to get shared_gpa_boundary
>   * Rename field Reserved* to reserved
> ---
>  arch/x86/kernel/cpu/mshyperv.c |  2 ++
>  include/asm-generic/mshyperv.h | 12 +++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 20557a9d6e25..8bb001198316 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -313,6 +313,8 @@ static void __init ms_hyperv_init_platform(void)
>   if (ms_hyperv.priv_high & HV_ISOLATION) {
>   ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
>   ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
> + ms_hyperv.shared_gpa_boundary =
> + BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
> 
>   pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
> 0x%x\n",
>   ms_hyperv.isolation_config_a, 
> ms_hyperv.isolation_config_b);
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index 0924bbd8458e..7537ae1db828 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -35,7 +35,17 @@ struct ms_hyperv_info {
>   u32 max_vp_index;
>   u32 max_lp_index;
>   u32 isolation_config_a;
> - u32 isolation_config_b;
> + union {
> + u32 isolation_config_b;
> + struct {
> + u32 cvm_type : 4;
> + u32 reserved11 : 1;
> + u32 shared_gpa_boundary_active : 1;
> + u32 shared_gpa_boundary_bits : 6;
> + u32 reserved12 : 20;

I'm still curious about the "11" and "12" in the reserved
field names.  Why not just "reserved1" and "reserved2"?
Having the "11" and "12" isn't wrong, but it makes one
wonder why since it's not usual. :-)

> + };
> + };
> + u64 shared_gpa_boundary;
>  };
>  extern struct ms_hyperv_info ms_hyperv;
> 
> --
> 2.25.1



RE: [PATCH V4 01/13] x86/hyperv: Initialize GHCB page in Isolation VM

2021-09-01 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Friday, August 27, 2021 10:21 AM
> 
> Hyper-V exposes the GHCB page via the SEV-ES GHCB MSR for SNP guests
> to communicate with the hypervisor. Map the GHCB page for all
> cpus to read/write MSR registers and submit hvcall requests
> via the GHCB page.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v3:
> * Rename ghcb_base to hv_ghcb_pg and move it out of
> struct ms_hyperv_info.
>   * Allocate hv_ghcb_pg before cpuhp_setup_state() and leverage
> hv_cpu_init() to initialize the ghcb page.
> ---
>  arch/x86/hyperv/hv_init.c   | 68 +
>  arch/x86/include/asm/mshyperv.h |  4 ++
>  arch/x86/kernel/cpu/mshyperv.c  |  3 ++
>  include/asm-generic/mshyperv.h  |  1 +
>  4 files changed, 69 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 708a2712a516..eba10ed4f73e 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -36,12 +37,42 @@ EXPORT_SYMBOL_GPL(hv_current_partition_id);
>  void *hv_hypercall_pg;
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> 
> +void __percpu **hv_ghcb_pg;
> +
>  /* Storage to save the hypercall page temporarily for hibernation */
>  static void *hv_hypercall_pg_saved;
> 
>  struct hv_vp_assist_page **hv_vp_assist_page;
>  EXPORT_SYMBOL_GPL(hv_vp_assist_page);
> 
> +static int hyperv_init_ghcb(void)
> +{
> + u64 ghcb_gpa;
> + void *ghcb_va;
> + void **ghcb_base;
> +
> + if (!hv_isolation_type_snp())
> + return 0;
> +
> + if (!hv_ghcb_pg)
> + return -EINVAL;
> +
> + /*
> +  * GHCB page is allocated by paravisor. The address
> +  * returned by MSR_AMD64_SEV_ES_GHCB is above shared
> +  * ghcb boundary and map it here.

I'm not sure what the "shared ghcb boundary" is.  Did you
mean "shared_gpa_boundary"?

> +  */
> + rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
> + ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
> + if (!ghcb_va)
> + return -ENOMEM;
> +
> + ghcb_base = (void **)this_cpu_ptr(hv_ghcb_pg);
> + *ghcb_base = ghcb_va;
> +
> + return 0;
> +}
> +
>  static int hv_cpu_init(unsigned int cpu)
>  {
>   union hv_vp_assist_msr_contents msr = { 0 };
> @@ -85,7 +116,7 @@ static int hv_cpu_init(unsigned int cpu)
>   }
>   }
> 
> - return 0;
> + return hyperv_init_ghcb();
>  }
> 
>  static void (*hv_reenlightenment_cb)(void);
> @@ -177,6 +208,14 @@ static int hv_cpu_die(unsigned int cpu)
>  {
>   struct hv_reenlightenment_control re_ctrl;
>   unsigned int new_cpu;
> + void **ghcb_va;
> +
> + if (hv_ghcb_pg) {
> + ghcb_va = (void **)this_cpu_ptr(hv_ghcb_pg);
> + if (*ghcb_va)
> + memunmap(*ghcb_va);
> + *ghcb_va = NULL;
> + }
> 
>   hv_common_cpu_die(cpu);
> 
> @@ -366,10 +405,16 @@ void __init hyperv_init(void)
>   goto common_free;
>   }
> 
> + if (hv_isolation_type_snp()) {
> + hv_ghcb_pg = alloc_percpu(void *);
> + if (!hv_ghcb_pg)
> + goto free_vp_assist_page;
> + }
> +
>   cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
> hv_cpu_init, hv_cpu_die);
>   if (cpuhp < 0)
> - goto free_vp_assist_page;
> + goto free_ghcb_page;
> 
>   /*
>* Setup the hypercall page and enable hypercalls.
> @@ -383,10 +428,8 @@ void __init hyperv_init(void)
>   VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
>   __builtin_return_address(0));
> - if (hv_hypercall_pg == NULL) {
> - wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> - goto remove_cpuhp_state;
> - }
> + if (hv_hypercall_pg == NULL)
> + goto clean_guest_os_id;
> 
>   rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>   hypercall_msr.enable = 1;
> @@ -456,8 +499,11 @@ void __init hyperv_init(void)
>   hv_query_ext_cap(0);
>   return;
> 
> -remove_cpuhp_state:
> +clean_guest_os_id:
> + wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>   cpuhp_remove_state(cpuhp);
> +free_ghcb_page:
> + free_percpu(hv_ghcb_pg);
>  free_vp_assist_page:
>   kfree(hv_vp_assist_page);
>   hv_vp_assist_page = NULL;
> @@ -559,3 +605,11 @@ bool hv_is_isolation_supported(void)
>  {
>   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
> +
> +DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> +
> +bool hv_isolation_type_snp(void)
> +{
> + return static_branch_unlikely(&isolation_type_snp);
> +}
> +EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index adccbc209169..37739a277ac6 100644
> --- 

Re: [PATCH v2 6/8] iommu: Move IOMMU pagesize check to attach_device

2021-09-01 Thread Robin Murphy

On 2021-09-01 18:14, Sven Peter wrote:
> On Tue, Aug 31, 2021, at 23:39, Alyssa Rosenzweig wrote:
>>> +   if ((1 << __ffs(domain->pgsize_bitmap)) > PAGE_SIZE) {
>>
>> Not a fan of this construction. Could you assign `(1 <<
>> __ffs(domain->pgsize_bitmap))` to an appropriately named temporary
>> (e.g. min_io_pgsize) so it's clearer what's going on?
>
> Good point, will do that for the next version.

Or maybe just test "__ffs(domain->pgsize_bitmap) > PAGE_SHIFT", or
perhaps even avoid shifts altogether with something like
"domain->pgsize_bitmap & (PAGE_SIZE | PAGE_SIZE - 1)".


Robin.

>>> +   pr_warn("IOMMU page size cannot represent CPU pages.\n");
>>
>> "Represent" how?
>
> Looks like I dropped an "exactly" there when taking this line from
> iova.c :)
>
> Thanks,
>
> Sven




Re: [PATCH v2 6/8] iommu: Move IOMMU pagesize check to attach_device

2021-09-01 Thread Sven Peter via iommu



On Tue, Aug 31, 2021, at 23:39, Alyssa Rosenzweig wrote:
> > +   if ((1 << __ffs(domain->pgsize_bitmap)) > PAGE_SIZE) {
> 
> Not a fan of this construction. Could you assign `(1 <<
> __ffs(domain->pgsize_bitmap))` to an appropriately named temporary (e.g
> min_io_pgsize) so it's clearer what's going on?

Good point, will do that for the next version.
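
Something like this, say (a sketch of the suggested cleanup, not the
actual next version):

	unsigned long min_io_pgsize = 1UL << __ffs(domain->pgsize_bitmap);

	if (min_io_pgsize > PAGE_SIZE) {
		pr_warn("IOMMU page size cannot exactly represent CPU pages.\n");
		/* ...reject the attach as before... */
	}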

> 
> > +   pr_warn("IOMMU page size cannot represent CPU pages.\n");
> 
> "Represent" how?
> 

Looks like I dropped an "exactly" there when taking this line from iova.c :)



Thanks,


Sven


Re: [PATCH v2 3/8] iommu/dma: Disable get_sgtable for granule > PAGE_SIZE

2021-09-01 Thread Sven Peter via iommu



On Tue, Aug 31, 2021, at 23:30, Alyssa Rosenzweig wrote:
> I use this function for cross-device sharing on the M1 display driver.
> Arguably this is unsafe but it works on 16k kernels and if you want to
> test the function on 4k, you know where my code is.
> 

My biggest issue is that I do not understand how this function is supposed
to be used correctly. It would work fine as-is if it only ever gets passed
buffers allocated by the coherent API but there's no way to check or
guarantee that. There may also be callers making assumptions that no
longer hold when iovad->granule > PAGE_SIZE.


Regarding your case: I'm not convinced the function is meant to be used
there. If I understand it correctly, your code first allocates memory with
dma_alloc_coherent (which possibly creates a sgt internally and then maps
it with iommu_map_sg), then coerces that back into a sgt with
dma_get_sgtable, and then maps that sgt to another iommu domain with
dma_map_sg while assuming that the result will be contiguous in IOVA space
(rough sketch below). It'll work out because dma_alloc_coherent is the
very thing meant to allocate pages that can be mapped into kernel and
device VA space as a single contiguous block and because both of your
IOMMUs are different instances of the same HW block. Anything allocated by
dma_alloc_coherent for the first IOMMU will have the right shape that will
allow it to be mapped as a single contiguous block for the second IOMMU.
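
A minimal sketch of that flow, purely for illustration (dev_a/dev_b stand
for the devices behind the two IOMMU instances; size checking and error
handling omitted):

    dma_addr_t iova_a;
    struct sg_table sgt;
    void *cpu = dma_alloc_coherent(dev_a, size, &iova_a, GFP_KERNEL);

    /* coerce the coherent buffer back into a sgt... */
    dma_get_sgtable(dev_a, &sgt, cpu, iova_a, size);
    /* ...and map it for the second device, assuming a single contiguous
     * IOVA block; this only holds for the reasons described above
     */
    dma_map_sg(dev_b, sgt.sgl, sgt.orig_nents, DMA_BIDIRECTIONAL);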

What could be done in your case is to instead use the IOMMU API: allocate
the pages yourself (while ensuring the sgt you create is made up of blocks
with size and physaddr aligned to max(domain_a->granule, domain_b->granule))
and then just use iommu_map_sg for both domains, which actually comes with
the guarantee that the result will be a single contiguous block in IOVA
space and doesn't require the sgt roundtrip.
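
Something along these lines, as a sketch under the assumptions above
(alloc_aligned_sgt() is a hypothetical helper that allocates pages with
size and physaddr suitably aligned; iova_a/iova_b are the target
addresses; the 2021-era iommu_map_sg() returns the number of bytes
mapped, 0 on failure):

    size_t align = max(granule_a, granule_b);
    struct sg_table *sgt = alloc_aligned_sgt(size, align); /* hypothetical */
    size_t mapped;

    /* iommu_map_sg() guarantees one contiguous IOVA block on success */
    mapped = iommu_map_sg(domain_a, iova_a, sgt->sgl, sgt->orig_nents,
                          IOMMU_READ | IOMMU_WRITE);
    if (mapped < size)
            return -ENOMEM;
    mapped = iommu_map_sg(domain_b, iova_b, sgt->sgl, sgt->orig_nents,
                          IOMMU_READ | IOMMU_WRITE);
    if (mapped < size)
            return -ENOMEM;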



Sven


> On Sat, Aug 28, 2021 at 05:36:37PM +0200, Sven Peter wrote:
> > Pretend that iommu_dma_get_sgtable is not implemented when
> > granule > PAGE_SIZE since I can neither test this function right now
> > nor do I fully understand how it is used.
> > 
> > Signed-off-by: Sven Peter 
> > ---
> >  drivers/iommu/dma-iommu.c | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index d6e273ec3de6..64fbd9236820 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -1315,9 +1315,15 @@ static int iommu_dma_get_sgtable(struct device *dev, 
> > struct sg_table *sgt,
> > void *cpu_addr, dma_addr_t dma_addr, size_t size,
> > unsigned long attrs)
> >  {
> > +   struct iommu_domain *domain = iommu_get_dma_domain(dev);
> > +   struct iommu_dma_cookie *cookie = domain->iova_cookie;
> > +   struct iova_domain *iovad = &cookie->iovad;
> > struct page *page;
> > int ret;
> >  
> > +   if (iovad->granule > PAGE_SIZE)
> > +   return -ENXIO;
> > +
> > if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
> > struct page **pages = dma_common_find_pages(cpu_addr);
> >  
> > -- 
> > 2.25.1
> > 
> 


-- 
Sven Peter
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/5] dt-bindings: reserved-memory: Document memory region specifier

2021-09-01 Thread Thierry Reding
On Fri, Jul 02, 2021 at 05:16:25PM +0300, Dmitry Osipenko wrote:
> 01.07.2021 21:14, Thierry Reding wrote:
> > On Tue, Jun 08, 2021 at 06:51:40PM +0200, Thierry Reding wrote:
> >> On Fri, May 28, 2021 at 06:54:55PM +0200, Thierry Reding wrote:
> >>> On Thu, May 20, 2021 at 05:03:06PM -0500, Rob Herring wrote:
>  On Fri, Apr 23, 2021 at 06:32:30PM +0200, Thierry Reding wrote:
> > From: Thierry Reding 
> >
> > Reserved memory region phandle references can be accompanied by a
> > specifier that provides additional information about how that specific
> > reference should be treated.
> >
> > One use-case is to mark a memory region as needing an identity mapping
> > in the system's IOMMU for the device that references the region. This is
> > needed for example when the bootloader has set up hardware (such as a
> > display controller) to actively access a memory region (e.g. a boot
> > splash screen framebuffer) during boot. The operating system can use the
> > identity mapping flag from the specifier to make sure an IOMMU identity
> > mapping is set up for the framebuffer before IOMMU translations are
> > enabled for the display controller.
> >
> > Signed-off-by: Thierry Reding 
> > ---
> >  .../reserved-memory/reserved-memory.txt   | 21 +++
> >  include/dt-bindings/reserved-memory.h |  8 +++
> >  2 files changed, 29 insertions(+)
> >  create mode 100644 include/dt-bindings/reserved-memory.h
> 
>  Sorry for being slow on this. I have 2 concerns.
> 
>  First, this creates an ABI issue. A DT with cells in 'memory-region' 
>  will not be understood by an existing OS. I'm less concerned about this 
>  if we address that with a stable fix. (Though I'm pretty sure we've 
>  naively added #?-cells in the past ignoring this issue.)
> >>>
> >>> A while ago I had proposed adding memory-region*s* as an alternative
> >>> name for memory-region to make the naming more consistent with other
> >>> types of properties (think clocks, resets, gpios, ...). If we added
> >>> that, we could easily differentiate between the "legacy" cases where
> >>> no #memory-region-cells was allowed and the new cases where it was.
> >>>
>  Second, it could be the bootloader setting up the reserved region. If a 
>  node already has 'memory-region', then adding more regions is more 
>  complicated compared to adding new properties. And defining what each 
>  memory-region entry is or how many in schemas is impossible.
> >>>
> >>> It's true that updating the property gets a bit complicated, but it's
> >>> not exactly rocket science. We really just need to splice the array. I
> >>> have a working implemention for this in U-Boot.
> >>>
> >>> For what it's worth, we could run into the same issue with any new
> >>> property that we add. Even if we renamed this to iommu-memory-region,
> >>> it's still possible that a bootloader may have to update this property
> >>> if it already exists (it could be hard-coded in DT, or it could have
> >>> been added by some earlier bootloader or firmware).
> >>>
>  Both could be addressed with a new property. Perhaps something like 
>  'iommu-memory-region = <>;'. I think the 'iommu' prefix is 
>  appropriate given this is entirely because of the IOMMU being in the 
>  mix. I might feel differently if we had other uses for cells, but I 
>  don't really see it in this case. 
> >>>
> >>> I'm afraid that down the road we'll end up with other cases and then we
> >>> might proliferate a number of *-memory-region properties with varying
> >>> prefixes.
> >>>
> >>> I am aware of one other case where we might need something like this: on
> >>> some Tegra SoCs we have audio processors that will access memory buffers
> >>> using a DMA engine. These processors are booted from early firmware
> >>> using firmware from system memory. In order to avoid trashing the
> >>> firmware, we need to reserve memory. We can do this using reserved
> >>> memory nodes. However, the audio DMA engine also uses the SMMU, so we
> >>> need to make sure that the firmware memory is marked as reserved within
> >>> the SMMU. This is similar to the identity mapping case, but not exactly
> >>> the same. Instead of creating a 1:1 mapping, we just want that IOVA
> >>> region to be reserved (i.e. IOMMU_RESV_RESERVED instead of
> >>> IOMMU_RESV_DIRECT{,_RELAXABLE}).
> >>>
> >>> That would also fall into the IOMMU domain, but we can't reuse the
> >>> iommu-memory-region property for that because then we don't have enough
> >>> information to decide which type of reservation we need.
> >>>
> >>> We could obviously make iommu-memory-region take a specifier, but we
> >>> could just as well use memory-regions in that case since we have
> >>> something more generic anyway.
> >>>
> >>> With the #memory-region-cells proposal, we can easily extend the cell in
> >>> the specifier with an 

Re: [PATCH v2 16/29] iommu/mediatek: Adjust device link when it is sub-common

2021-09-01 Thread 吴勇
On Tue, 2021-08-24 at 15:35 +0800, Hsin-Yi Wang wrote:
> On Fri, Aug 13, 2021 at 3:03 PM Yong Wu  wrote:
> > 
> > For MM IOMMU, We always add device link between smi-common and
> > IOMMU HW.
> > In mt8195, we add smi-sub-common. Thus, if the node is sub-common,
> > we still
> > need find again to get smi-common, then do device link.
> > 
> > Signed-off-by: Yong Wu 
> > ---
> >  drivers/iommu/mtk_iommu.c | 11 +++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> > index a4479916ad33..a72241724adb 100644
> > --- a/drivers/iommu/mtk_iommu.c
> > +++ b/drivers/iommu/mtk_iommu.c
> > @@ -845,6 +845,17 @@ static int mtk_iommu_mm_dts_parse(struct
> > device *dev,
> > if (!smicomm_node)
> > return -EINVAL;
> > 
> > +   /* Find smi-common again if this is smi-sub-common */
> > +   if (of_property_read_bool(smicomm_node,
> > "mediatek,smi_sub_common")) {
> > +   of_node_put(smicomm_node); /* put the sub common */
> > +
> > +   smicomm_node = of_parse_phandle(smicomm_node,
> > "mediatek,smi", 0);
> 
> This only checks 1 level here, and does not check if the mediatek,smi
> of a sub-common node is not another sub-common node.
> So maybe add a check that the updated node here doesn't have
> mediatek,smi_sub_common property.

Yes. Currently there are only 2 levels. We could confirm whether it is a
sub-common from whether it has "mediatek,smi"; then
"mediatek,smi_sub_common" is unnecessary (see the sketch below).

Will fix in the next version.
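
A loop-based sketch of that idea, hypothetical and assuming that the
presence of "mediatek,smi" alone identifies a sub-common node:

    /* Walk up through any sub-commons until the real smi-common is
     * reached; today there are at most 2 levels.
     */
    while (of_property_read_bool(smicomm_node, "mediatek,smi")) {
            struct device_node *sub = smicomm_node;

            smicomm_node = of_parse_phandle(sub, "mediatek,smi", 0);
            of_node_put(sub); /* put the sub-common */
            if (!smicomm_node) {
                    dev_err(dev, "sub-comm has no common.\n");
                    return -EINVAL;
            }
    }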

> 
> > +   if (!smicomm_node) {
> > +   dev_err(dev, "sub-comm has no common.\n");
> > +   return -EINVAL;
> > +   }
> > +   }
> > +
> > plarbdev = of_find_device_by_node(smicomm_node);
> > of_node_put(smicomm_node);
> > data->smicomm_dev = >dev;
> > --
> > 2.18.0
> > ___
> > Linux-mediatek mailing list
> > linux-media...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-mediatek
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 11/29] iommu/mediatek: Always pm_runtime_get while tlb flush

2021-09-01 Thread 吴勇
On Tue, 2021-08-24 at 15:10 +0800, Hsin-Yi Wang wrote:
> On Fri, Aug 13, 2021 at 2:57 PM Yong Wu  wrote:
> > 
> > Prepare for 2 HWs that share a pgtable in different power-domains.
> > 
> > The previous SoCs don't have PM. Only mt8192 has a power-domain,
> > and it is the display's power-domain, which nearly always is enabled.
> > 
> > When there are 2 M4U HWs, there may be a problem.
> > In this function, we get the pm_status via the m4u dev, but it
> > doesn't reflect the real power-domain status of the HW since there
> > may be other HW that also uses that power-domain.
> > 
> > Currently we could not get the real power-domain status, thus
> > always pm_runtime_get here.
> > 
> > Prepare for mt8195; thus, no need for fix tags here.
> > 
> > This patch may drop the performance; we expect the user to call
> > pm_runtime_get_sync before dma_alloc_attrs, which needs tlb ops.
> 
> Can you check if there are existing users that need to add this
> change?

The issue may exist for most of our users. Our users are mainly in v4l2;
normally their flow is like this:
a) VIDIOC_REQBUFS: call dma_alloc_attrs or dma_buf_map_attachment.
b) VIDIOC_STREAMON.
c) VIDIOC_QBUF: device_run: pm_runtime_get_sync.

Requiring them to call pm_runtime_get before dma_alloc_attrs doesn't look
reasonable. It seems that they should not have to care about this.

This patch mainly makes sure the flow is right. Locally I have a TODO to
try to get the real power-domain status here; the sample code is like below:

static struct notifier_block mtk_penpd_notifier;

/* Register the genpd notifier. */
mtk_penpd_notifier.notifier_call = mtk_iommu_pd_callback;
ret = dev_pm_genpd_add_notifier(dev, &mtk_penpd_notifier);

/* Then get the real power-domain status in the notifier. */
static int mtk_iommu_pd_callback(struct notifier_block *nb,
                                 unsigned long flags, void *data)
{
        if (flags == GENPD_NOTIFY_ON) {
                /* The real power domain is powered on. */
        } else if (flags == GENPD_NOTIFY_PRE_OFF) {
                /* The real power domain is going to power off. Take it
                 * as powered off and skip the tlb ops after receiving
                 * this flag.
                 */
        }
        return NOTIFY_OK;
}
 
How about this? Or is there any other suggestion for getting the real
power-domain status rather than the iommu device's power-domain status?
Thanks.

> 
> 
> > 
> > Signed-off-by: Yong Wu 
> > ---
> 
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[GIT PULL] dma-mapping updates for Linux 5.15

2021-09-01 Thread Christoph Hellwig
[Note that there is a conflict with changes from the swiotlb tree due to
 dma_direct_{alloc,free}.  The solution is to basically take the changes
 from both trees and apply them manually.]

The following changes since commit 36a21d51725af2ce0700c6ebcb6b9594aac658a6:

  Linux 5.14-rc5 (2021-08-08 13:49:31 -0700)

are available in the Git repository at:

  git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.15

for you to fetch changes up to c1dec343d7abdf8e71aab2a289ab45ce8b1afb7e:

  hexagon: use the generic global coherent pool (2021-08-19 09:02:40 +0200)


dma-mapping updates for Linux 5.15

 - fix debugfs initialization order (Anthony Iliopoulos)
 - use memory_intersects() directly (Kefeng Wang)
 - allow to return specific errors from ->map_sg
   (Logan Gunthorpe, Martin Oliveira)
 - turn the dma_map_sg return value into an unsigned int (me; sketch below)
 - provide a common global coherent pool implementation (me)
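
As a hedged illustration of the updated calling convention (my reading of
the series, not code taken from the pull itself): dma_map_sg() still
returns 0 on failure, now typed unsigned int, while dma_map_sgtable() can
surface the specific error code coming from ->map_sg:

    /* Sketch: error handling under the updated API */
    static int map_for_device(struct device *dev, struct sg_table *sgt)
    {
            int ret = dma_map_sgtable(dev, sgt, DMA_TO_DEVICE, 0);

            if (ret)
                    return ret; /* specific errno, e.g. -EINVAL or -ENOMEM */
            /* on success, sgt->nents holds the mapped entry count */
            return 0;
    }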


Anthony Iliopoulos (1):
  dma-debug: fix debugfs initialization order

Christoph Hellwig (8):
  dma-mapping: return an unsigned int from dma_map_sg{,_attrs}
  dma-direct: add support for dma_coherent_default_memory
  ARM/nommu: use the generic dma-direct code for non-coherent devices
  dma-mapping: allow using the global coherent pool for !ARM
  dma-mapping: simplify dma_init_coherent_memory
  dma-mapping: add a dma_init_global_coherent helper
  dma-mapping: make the global coherent pool conditional
  hexagon: use the generic global coherent pool

Kefeng Wang (1):
  dma-debug: use memory_intersects() directly

Logan Gunthorpe (10):
  dma-mapping: allow map_sg() ops to return negative error codes
  dma-direct: return appropriate error code from dma_direct_map_sg()
  iommu: return full error code from iommu_map_sg[_atomic]()
  iommu/dma: return error code from iommu_dma_map_sg()
  ARM/dma-mapping: don't set failed sg dma_address to DMA_MAPPING_ERROR
  powerpc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR
  s390/pci: don't set failed sg dma_address to DMA_MAPPING_ERROR
  sparc/iommu: don't set failed sg dma_address to DMA_MAPPING_ERROR
  x86/amd_gart: don't set failed sg dma_address to DMA_MAPPING_ERROR
  dma-mapping: disallow .map_sg operations from returning zero on error

Martin Oliveira (11):
  alpha: return error code from alpha_pci_map_sg()
  ARM/dma-mapping: return error code from .map_sg() ops
  ia64/sba_iommu: return error code from sba_map_sg_attrs()
  MIPS/jazzdma: return error code from jazz_dma_map_sg()
  powerpc/iommu: return error code from .map_sg() ops
  s390/pci: return error code from s390_dma_map_sg()
  sparc/iommu: return error codes from .map_sg() ops
  parisc: return error code from .map_sg() ops
  xen: swiotlb: return error code from xen_swiotlb_map_sg()
  x86/amd_gart: return error code from gart_map_sg()
  dma-mapping: return error code from dma_dummy_map_sg()

 arch/alpha/kernel/pci_iommu.c   |  10 +-
 arch/arm/Kconfig|   5 +-
 arch/arm/mm/dma-mapping-nommu.c | 173 ++--
 arch/arm/mm/dma-mapping.c   |  26 +++--
 arch/hexagon/Kconfig|   1 +
 arch/hexagon/kernel/dma.c   |  57 ++-
 arch/ia64/hp/common/sba_iommu.c |   4 +-
 arch/mips/jazz/jazzdma.c|   2 +-
 arch/powerpc/kernel/iommu.c |   6 +-
 arch/powerpc/platforms/ps3/system-bus.c |   2 +-
 arch/powerpc/platforms/pseries/vio.c|   5 +-
 arch/s390/pci/pci_dma.c |  13 +--
 arch/sparc/kernel/iommu.c   |   6 +-
 arch/sparc/kernel/pci_sun4v.c   |   6 +-
 arch/sparc/mm/iommu.c   |   2 +-
 arch/x86/kernel/amd_gart_64.c   |  18 ++--
 drivers/iommu/dma-iommu.c   |  22 ++--
 drivers/iommu/iommu.c   |  15 ++-
 drivers/parisc/ccio-dma.c   |   2 +-
 drivers/parisc/sba_iommu.c  |   2 +-
 drivers/xen/swiotlb-xen.c   |   2 +-
 include/linux/dma-map-ops.h |  23 +++--
 include/linux/dma-mapping.h |  44 +++-
 include/linux/iommu.h   |  22 ++--
 kernel/dma/Kconfig  |   4 +
 kernel/dma/coherent.c   | 161 ++---
 kernel/dma/debug.c  |  21 ++--
 kernel/dma/direct.c |  17 +++-
 kernel/dma/dummy.c  |   2 +-
 kernel/dma/mapping.c|  80 +--
 30 files changed, 310 insertions(+), 443 deletions(-)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 2/2] iommu/ipmmu-vmsa: Add support for r8a779a0

2021-09-01 Thread Yoshihiro Shimoda
Add support for r8a779a0 (R-Car V3U). The IPMMU hardware design
of this SoC differs than others. So, add a new ipmmu_features for it.

Signed-off-by: Yoshihiro Shimoda 
---
 drivers/iommu/ipmmu-vmsa.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index d38ff29a76e8..c46951367f93 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -36,7 +36,7 @@
 #define IPMMU_CTX_MAX  8U
 #define IPMMU_CTX_INVALID  -1
 
-#define IPMMU_UTLB_MAX 48U
+#define IPMMU_UTLB_MAX 64U
 
 struct ipmmu_features {
bool use_ns_alias_offset;
@@ -922,6 +922,20 @@ static const struct ipmmu_features ipmmu_features_rcar_gen3 = {
.utlb_offset_base = 0,
 };
 
+static const struct ipmmu_features ipmmu_features_r8a779a0 = {
+   .use_ns_alias_offset = false,
+   .has_cache_leaf_nodes = true,
+   .number_of_contexts = 8,
+   .num_utlbs = 64,
+   .setup_imbuscr = false,
+   .twobit_imttbcr_sl0 = true,
+   .reserved_context = true,
+   .cache_snoop = false,
+   .ctx_offset_base = 0x10000,
+   .ctx_offset_stride = 0x1040,
+   .utlb_offset_base = 0x3000,
+};
+
 static const struct of_device_id ipmmu_of_ids[] = {
{
.compatible = "renesas,ipmmu-vmsa",
@@ -959,6 +973,9 @@ static const struct of_device_id ipmmu_of_ids[] = {
}, {
.compatible = "renesas,ipmmu-r8a77995",
.data = _features_rcar_gen3,
+   }, {
+   .compatible = "renesas,ipmmu-r8a779a0",
+   .data = _features_r8a779a0,
}, {
/* Terminator */
},
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] dt-bindings: iommu: renesas, ipmmu-vmsa: add r8a779a0 support

2021-09-01 Thread Yoshihiro Shimoda
Add support for r8a779a0 (R-Car V3U).

Signed-off-by: Yoshihiro Shimoda 
---
 Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml 
b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
index 02c69a95c332..ce0c715205c6 100644
--- a/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
+++ b/Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
@@ -43,6 +43,7 @@ properties:
   - renesas,ipmmu-r8a77980 # R-Car V3H
   - renesas,ipmmu-r8a77990 # R-Car E3
   - renesas,ipmmu-r8a77995 # R-Car D3
+  - renesas,ipmmu-r8a779a0 # R-Car V3U
 
   reg:
 maxItems: 1
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/2] iommu/ipmmu-vmsa: Add support for r8a779a0

2021-09-01 Thread Yoshihiro Shimoda
This patch series adds support for r8a779a0 (R-Car V3U).

Yoshihiro Shimoda (2):
  dt-bindings: iommu: renesas,ipmmu-vmsa: add r8a779a0 support
  iommu/ipmmu-vmsa: Add support for r8a779a0

 .../bindings/iommu/renesas,ipmmu-vmsa.yaml|  1 +
 drivers/iommu/ipmmu-vmsa.c| 19 ++-
 2 files changed, 19 insertions(+), 1 deletion(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-09-01 Thread Robin Murphy

On 2021-09-01 09:59, Marek Szyprowski wrote:

On 21.05.2021 05:03, Wang Xingang wrote:

From: Xingang Wang 

When booting with devicetree, the pci_request_acs() is called after the
enumeration and initialization of PCI devices, thus the ACS is not
enabled. And ACS should be enabled when IOMMU is detected for the
PCI host bridge, so add check for IOMMU before probe of PCI host and call
pci_request_acs() to make sure ACS will be enabled when enumerating PCI
devices.

Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
configuring IOMMU linkage")
Signed-off-by: Xingang Wang 


This patch landed in linux-next as commit 57a4ab1584e6 ("iommu/of: Fix
pci_request_acs() before enumerating PCI devices"). Sadly it breaks PCI
operation on the ARM Juno R1 board (arch/arm64/boot/dts/arm/juno-r1.dts).
It looks like the IOMMU controller is not probed for some reason:

# cat /sys/kernel/debug/devices_deferred
2b600000.iommu


That IOMMU belongs to the CoreSight debug subsystem and often gets stuck 
waiting for a power domain (especially if you have a mismatch of 
SCPI/SCMI expectations between the DT and SCP firmware). Unless you're 
trying to use CoreSight trace, that shouldn't matter.


The PCIe on Juno doesn't support ACS anyway, so I'm puzzled why this 
should make any difference :/


Robin.


Reverting this patch on top of current linux-next fixes this issue. If
you need more information to debug this issue, just let me know.


---
   drivers/iommu/of_iommu.c | 1 -
   drivers/pci/of.c | 8 +++-
   2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index a9d2df001149..54a14da242cc 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
 			.np = master_np,
 		};
 
-		pci_request_acs();
 		err = pci_for_each_dma_alias(to_pci_dev(dev),
 					     of_pci_iommu_init, &info);
 	} else {
diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index da5b414d585a..2313c3f848b0 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -581,9 +581,15 @@ static int pci_parse_request_of_pci_ranges(struct device *dev,
 
 int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
 {
-	if (!dev->of_node)
+	struct device_node *node = dev->of_node;
+
+	if (!node)
 		return 0;
 
+	/* Detect IOMMU and make sure ACS will be enabled */
+	if (of_property_read_bool(node, "iommu-map"))
+		pci_request_acs();
+
 	bridge->swizzle_irq = pci_common_swizzle;
 	bridge->map_irq = of_irq_parse_and_map_pci;
   


Best regards


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-09-01 Thread Marek Szyprowski
On 21.05.2021 05:03, Wang Xingang wrote:
> From: Xingang Wang 
>
> When booting with devicetree, the pci_request_acs() is called after the
> enumeration and initialization of PCI devices, thus the ACS is not
> enabled. And ACS should be enabled when IOMMU is detected for the
> PCI host bridge, so add check for IOMMU before probe of PCI host and call
> pci_request_acs() to make sure ACS will be enabled when enumerating PCI
> devices.
>
> Fixes: 6bf6c24720d33 ("iommu/of: Request ACS from the PCI core when
> configuring IOMMU linkage")
> Signed-off-by: Xingang Wang 

This patch landed in linux-next as commit 57a4ab1584e6 ("iommu/of: Fix
pci_request_acs() before enumerating PCI devices"). Sadly it breaks PCI
operation on the ARM Juno R1 board (arch/arm64/boot/dts/arm/juno-r1.dts).
It looks like the IOMMU controller is not probed for some reason:

# cat /sys/kernel/debug/devices_deferred
2b600000.iommu

Reverting this patch on top of current linux-next fixes this issue. If 
you need more information to debug this issue, just let me know.

> ---
>   drivers/iommu/of_iommu.c | 1 -
>   drivers/pci/of.c | 8 +++-
>   2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index a9d2df001149..54a14da242cc 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -205,7 +205,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
>   .np = master_np,
>   };
>   
> - pci_request_acs();
>   err = pci_for_each_dma_alias(to_pci_dev(dev),
>of_pci_iommu_init, &info);
>   } else {
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index da5b414d585a..2313c3f848b0 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -581,9 +581,15 @@ static int pci_parse_request_of_pci_ranges(struct device *dev,
>   
>   int devm_of_pci_bridge_init(struct device *dev, struct pci_host_bridge *bridge)
>   {
> - if (!dev->of_node)
> + struct device_node *node = dev->of_node;
> +
> + if (!node)
>   return 0;
>   
> + /* Detect IOMMU and make sure ACS will be enabled */
> + if (of_property_read_bool(node, "iommu-map"))
> + pci_request_acs();
> +
>   bridge->swizzle_irq = pci_common_swizzle;
>   bridge->map_irq = of_irq_parse_and_map_pci;
>   

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC][PATCH v2 00/13] iommu/arm-smmu-v3: Add NVIDIA implementation

2021-09-01 Thread Tian, Kevin
> From: Alex Williamson
> Sent: Wednesday, September 1, 2021 12:16 AM
> 
> On Mon, 30 Aug 2021 19:59:10 -0700
> Nicolin Chen  wrote:
> 
> > The SMMUv3 devices implemented in the Grace SoC support NVIDIA's
> custom
> > CMDQ-Virtualization (CMDQV) hardware. Like the new ECMDQ feature first
> > introduced in the ARM SMMUv3.3 specification, CMDQV adds multiple
> VCMDQ
> > interfaces to supplement the single architected SMMU_CMDQ in an effort
> > to reduce contention.
> >
> > This series of patches add CMDQV support with its preparational changes:
> >
> > * PATCH-1 to PATCH-8 are related to shared VMID feature: they are used
> >   first to improve TLB utilization, second to bind a shared VMID with a
> >   VCMDQ interface for hardware configuring requirement.
> 
> The vfio changes would need to be implemented in alignment with the
> /dev/iommu proposals[1].  AIUI, the VMID is essentially binding
> multiple containers together for TLB invalidation, which I expect in
> the proposal below is largely already taken care of in that a single
> iommu-fd can support multiple I/O address spaces and it's largely
> expected that a hypervisor would use a single iommu-fd so this explicit
> connection by userspace across containers wouldn't be necessary.

Agree. VMID is equivalent to DID (domain id) in other vendors' iommus.
With /dev/iommu, multiple I/O address spaces can share the same VMID
via nesting, so there is no need to expose the VMID to userspace to
build the connection.

Thanks,
Kevin

> 
> We're expecting to talk more about the /dev/iommu approach at Plumbers
> in few weeks.  Thanks,
> 
> Alex
> 
> [1]https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C1
> 8...@bn9pr11mb5433.namprd11.prod.outlook.com/
> 
> 
> > * PATCH-9 and PATCH-10 are to accommodate the NVIDIA implementation
> with
> >   the existing arm-smmu-v3 driver.
> >
> > * PATCH-11 borrows the "implementation infrastructure" from the arm-
> smmu
> >   driver so later change can build upon it.
> >
> > * PATCH-12 adds an initial NVIDIA implementation related to host feature,
> >   and also adds implementation specific ->device_reset() and ->get_cmdq()
> >   callback functions.
> >
> > * PATCH-13 adds virtualization features using VFIO mdev interface, which
> >   allows user space hypervisor to map and get access to one of the VCMDQ
> >   interfaces of CMDQV module.
> >
> > ( Thinking that reviewers can get a better view of this implementation,
> >   I am attaching QEMU changes here for reference purpose:
> >   https://github.com/nicolinc/qemu/commits/dev/cmdqv_v6.0.0-rc2
> >   The branch has all preparational changes, while I'm still integrating
> >   device model and ARM-VIRT changes, and will push them these two days,
> >   although they might not be in a good shape of being sent to review yet )
> >
> > Above all, I marked RFC for this series, as I feel that we may come up
> > some better solution. So please kindly share your reviews and insights.
> >
> > Thank you!
> >
> > Changelog
> > v1->v2:
> >  * Added mdev interface support for hypervisor and VMs.
> >  * Added preparational changes for mdev interface implementation.
> >  * PATCH-12 Changed ->issue_cmdlist() to ->get_cmdq() for a better
> >integration with recently merged ECMDQ-related changes.
> >
> > Nate Watterson (3):
> >   iommu/arm-smmu-v3: Add implementation infrastructure
> >   iommu/arm-smmu-v3: Add support for NVIDIA CMDQ-Virtualization hw
> >   iommu/nvidia-smmu-v3: Add mdev interface support
> >
> > Nicolin Chen (10):
> >   iommu: Add set_nesting_vmid/get_nesting_vmid functions
> >   vfio: add VFIO_IOMMU_GET_VMID and VFIO_IOMMU_SET_VMID
> >   vfio: Document VMID control for IOMMU Virtualization
> >   vfio: add set_vmid and get_vmid for vfio_iommu_type1
> >   vfio/type1: Implement set_vmid and get_vmid
> >   vfio/type1: Set/get VMID to/from iommu driver
> >   iommu/arm-smmu-v3: Add shared VMID support for NESTING
> >   iommu/arm-smmu-v3: Add VMID alloc/free helpers
> >   iommu/arm-smmu-v3: Pass dev pointer to arm_smmu_detach_dev
> >   iommu/arm-smmu-v3: Pass cmdq pointer in
> arm_smmu_cmdq_issue_cmdlist()
> >
> >  Documentation/driver-api/vfio.rst |   34 +
> >  MAINTAINERS   |2 +
> >  drivers/iommu/arm/arm-smmu-v3/Makefile|2 +-
> >  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-impl.c  |   15 +
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  121 +-
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   18 +
> >  .../iommu/arm/arm-smmu-v3/nvidia-smmu-v3.c| 1249
> +
> >  drivers/iommu/iommu.c |   20 +
> >  drivers/vfio/vfio.c   |   25 +
> >  drivers/vfio/vfio_iommu_type1.c   |   37 +
> >  include/linux/iommu.h |5 +
> >  include/linux/vfio.h  |2 +
> >  include/uapi/linux/vfio.h |   26 +
> >  13 files changed, 1537 insertions(+), 19 deletions(-)
> >  create mode 100644