Re: [PATCH v6 6/6] iommu/tegra-smmu: Add pagetable mappings to debugfs

2021-09-15 Thread Nicolin Chen
On Wed, Sep 15, 2021 at 03:09:48PM +0300, Dmitry Osipenko wrote:
> 15.09.2021 07:38, Nicolin Chen пишет:
> > On Tue, Sep 14, 2021 at 10:20:30PM +0300, Dmitry Osipenko wrote:
> >> 14.09.2021 21:49, Nicolin Chen wrote:
> >>> On Tue, Sep 14, 2021 at 04:29:15PM +0300, Dmitry Osipenko wrote:
>  14.09.2021 04:38, Nicolin Chen wrote:
> > +static unsigned long pd_pt_index_iova(unsigned int pd_index, unsigned int pt_index)
> > +{
> > +	return ((dma_addr_t)pd_index & (SMMU_NUM_PDE - 1)) << SMMU_PDE_SHIFT |
> > +	       ((dma_addr_t)pt_index & (SMMU_NUM_PTE - 1)) << SMMU_PTE_SHIFT;
> > +}
> 
>  We know that IOVA is fixed to u32 for this controller. Can we avoid all
>  these dma_addr_t castings? It should make code cleaner a tad, IMO.
> >>>
> >>> Tegra210 actually supports 34-bit IOVA...
> >>>
> >>
> >> It doesn't. 34-bit is PA, 32-bit is VA.
> >>
> >> Quote from T210 TRM:
> >>
> >> "The SMMU is a centralized virtual-to-physical translation for MSS. It
> >> maps a 32-bit virtual address to a 34-bit physical address. If the
> >> client address is 40 bits then bits 39:32 are ignored."
> > 
> > If you scroll down by a couple of sections, you can see 34-bit
> > virtual addresses in section 18.6.1.2; and if checking one ASID
> > register, you can see it mention the extra two bits va[33:32].
> 
> Thanks for the pointer. It says that only certain memory clients allow
> combining 4 ASIDs to form a 34-bit VA space. In this case the PA space is
> split into 4GB areas and there are additional bitfields which configure
> the ASID mapping of each 4GB area. Still, each ASID is 32-bit.

True.

> This is what TRM says:
> 
> "For the GPU and other clients with 34-bit address interfaces, the ASID
> registers are extended to point to four ASIDs. The SMMU supports 4GB of
> virtual address space per ASID, so mapping addr[33:32] into ASID[1:0]
> extends the virtual address space of a client to 16GB."
> 
> > However, the driver currently sets its geometry.aperture_end to
> > 32-bit, and we can only get 32-bit IOVAs using PDE and PTE,
> > so I think it should be safe to remove the castings here. I'll
> > wait for a couple of days and see if there'd be other comments
> > for me to address in next version.
> 
> You will need to read the special "ASID Assignment Register" which
> supports 4 sub-ASIDs to translate the PA address into the actual VA. By
> default all clients are limited to a single ASID and the upstream kernel
> doesn't support programming of 34-bit VAs. So it isn't worth the effort
> to fully translate the VA, IMO.

Yea. It'd be easier to just stay in 32-bit. I will remove those
castings in the next version, after waiting for Thierry to take a look
at this v6 first.
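
For reference, dropping the casts would leave the helper looking roughly
like this minimal sketch (assuming the driver's existing SMMU_NUM_PDE,
SMMU_NUM_PTE, SMMU_PDE_SHIFT and SMMU_PTE_SHIFT definitions and the
32-bit geometry discussed above):

static unsigned long pd_pt_index_iova(unsigned int pd_index,
				      unsigned int pt_index)
{
	return (pd_index & (SMMU_NUM_PDE - 1)) << SMMU_PDE_SHIFT |
	       (pt_index & (SMMU_NUM_PTE - 1)) << SMMU_PTE_SHIFT;
}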

Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-15 Thread Borislav Petkov
On Wed, Sep 15, 2021 at 07:18:34PM +0200, Christophe Leroy wrote:
> Could you please provide a more explicit explanation of why inlining such
> a helper is considered bad practice and messy?

Tom already told you to look at the previous threads. Let's read them
together. This one, for example:

https://lore.kernel.org/lkml/ysscwvpxevxw%2f...@infradead.org/

| > To take it out of line, I'm leaning towards the latter, creating a new
| > file that is built based on the ARCH_HAS_PROTECTED_GUEST setting.
| 
| Yes.  In general every time architectures have to provide the prototype
| and not just the implementation of something we end up with a giant mess
| sooner or later.  In a few cases that is still warranted due to
| performance concerns, but I don't think that is the case here.

So I think what Christoph means here is that you want to have the
generic prototype defined in a header and arches get to implement it
exactly to the letter so that there's no mess.

As to what mess exactly, I'd let him explain that.

> Because as demonstrated in my previous response some days ago, taking that
> out of line ends up with unnecessarily ugly generated code and we don't
> benefit from GCC's capability to fold in and optimize out unreachable code.

And is this a real fast path where a couple of instructions matter, or what?

set_memory_encrypted/_decrypted doesn't look like one to me.

> I can't see your point here. Inlining the function wouldn't add any
> ifdeffery as far as I can see.

If the function is touching defines etc, they all need to be visible.
If that function needs to call other functions - which is the case on
x86, perhaps not so much on power - then you need to either ifdef around
them or provide stubs with ifdeffery in the headers. And you need to
make them global functions instead of keeping them static to the same
compilation unit, etc, etc.

With a separate compilation unit, you don't need any of that and it is
all kept in that single file.
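
For reference, the arrangement being argued for reduces to something like
this minimal sketch (modeled on the powerpc patch in this series; it
assumes the existing is_secure_guest() helper and the CC_ATTR_MEM_ENCRYPT
attribute):

/* include/linux/cc_platform.h: one generic prototype for every arch */
bool cc_platform_has(enum cc_attr attr);

/* arch/powerpc/platforms/pseries/cc_platform.c: the implementation
 * stays in a single compilation unit; no stubs or ifdeffery leak into
 * headers. */
#include <linux/export.h>
#include <linux/cc_platform.h>

#include <asm/svm.h>

bool cc_platform_has(enum cc_attr attr)
{
	switch (attr) {
	case CC_ATTR_MEM_ENCRYPT:
		return is_secure_guest();
	default:
		return false;
	}
}
EXPORT_SYMBOL_GPL(cc_platform_has);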

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function

2021-09-15 Thread Kuppuswamy, Sathyanarayanan




On 9/15/21 9:46 AM, Borislav Petkov wrote:

Sathya,

if you want to prepare the Intel variant intel_cc_platform_has() ontop
of those and send it to me, that would be good because then I can
integrate it all in one branch which can be used to base future work
ontop.


I have an Intel variant patch (please check the following patch). But it
includes TDX changes as well. Shall I move the TDX changes to a different
patch and just create a separate patch for adding intel_cc_platform_has()?


commit fc5f98a0ed94629d903827c5b44ee9295f835831
Author: Kuppuswamy Sathyanarayanan 
Date:   Wed May 12 11:35:13 2021 -0700

x86/tdx: Add confidential guest support for TDX guest

The TDX architecture provides a way for VM guests to be highly secure and
isolated (from an untrusted VMM). To meet this requirement, any data
coming from the VMM cannot be completely trusted. A TDX guest addresses
this by hardening the IO drivers against attacks from the VMM. So, when
adding hardening fixes to the generic drivers, use the cc_platform_has()
API to protect such fixes.

Also add TDX guest support to the cc_platform_has() API to protect the
TDX-specific fixes.

Signed-off-by: Kuppuswamy Sathyanarayanan 


diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a5b14de03458..2e78358923a1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -871,6 +871,7 @@ config INTEL_TDX_GUEST
depends on SECURITY
select X86_X2APIC
select SECURITY_LOCKDOWN_LSM
+   select ARCH_HAS_CC_PLATFORM
help
  Provide support for running in a trusted domain on Intel processors
  equipped with Trusted Domain eXtensions. TDX is a new Intel
diff --git a/arch/x86/include/asm/intel_cc_platform.h 
b/arch/x86/include/asm/intel_cc_platform.h
new file mode 100644
index ..472c3174beac
--- /dev/null
+++ b/arch/x86/include/asm/intel_cc_platform.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2021 Intel Corporation */
+#ifndef _ASM_X86_INTEL_CC_PLATFORM_H
+#define _ASM_X86_INTEL_CC_PLATFORM_H
+
+#if defined(CONFIG_CPU_SUP_INTEL) && defined(CONFIG_ARCH_HAS_CC_PLATFORM)
+bool intel_cc_platform_has(enum cc_attr attr);
+#else
+static inline bool intel_cc_platform_has(enum cc_attr attr) { return false; }
+#endif
+
+#endif /* _ASM_X86_INTEL_CC_PLATFORM_H */
+
diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 3c9bacd3c3f3..e83bc2f48efe 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -10,11 +10,16 @@
 #include 
 #include 
 #include 
+#include 
+
+#include 

 bool cc_platform_has(enum cc_attr attr)
 {
if (sme_me_mask)
return amd_cc_platform_has(attr);
+   else if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+   return intel_cc_platform_has(attr);

return false;
 }
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 8321c43554a1..ab486a3b1eb0 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -60,6 +61,21 @@ static u64 msr_test_ctrl_cache __ro_after_init;
  */
 static bool cpu_model_supports_sld __ro_after_init;

+#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
+bool intel_cc_platform_has(enum cc_attr attr)
+{
+   switch (attr) {
+   case CC_ATTR_GUEST_TDX:
+   return cpu_feature_enabled(X86_FEATURE_TDX_GUEST);
+   default:
+   return false;
+   }
+
+   return false;
+}
+EXPORT_SYMBOL_GPL(intel_cc_platform_has);
+#endif
+
 /*
  * Processors which have self-snooping capability can handle conflicting
  * memory type across CPUs by snooping its own cache. However, there exists
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index 253f3ea66cd8..e38430e6e396 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -61,6 +61,15 @@ enum cc_attr {
 * Examples include SEV-ES.
 */
CC_ATTR_GUEST_STATE_ENCRYPT,
+
+   /**
+* @CC_ATTR_GUEST_TDX: Trusted Domain Extension Support
+*
+* The platform/OS is running as a TDX guest/virtual machine.
+*
+* Examples include Intel TDX.
+*/
+   CC_ATTR_GUEST_TDX,
 };

 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
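
For context, a driver-side consumer would then guard its hardening fix on
the new attribute, along these lines (a hedged sketch; harden_vmm_input()
is a made-up stand-in, only cc_platform_has() and CC_ATTR_GUEST_TDX come
from the patch above):

#include <linux/cc_platform.h>
#include <linux/device.h>

/* stand-in for a real VMM-input hardening fix (made-up helper) */
static inline void harden_vmm_input(struct device *dev)
{
	dev_info(dev, "validating VMM-supplied data\n");
}

static void example_hardening(struct device *dev)
{
	/* Only pay for the hardening when actually running as a TDX guest. */
	if (cc_platform_has(CC_ATTR_GUEST_TDX))
		harden_vmm_input(dev);
}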


--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-15 Thread Christophe Leroy



On 15/09/2021 at 12:08, Borislav Petkov wrote:

On Wed, Sep 15, 2021 at 10:28:59AM +1000, Michael Ellerman wrote:

I don't love it, a new C file and an out-of-line call to then call back
to a static inline that for most configurations will return false ... but
whatever :)


Yeah, hch thinks it'll cause a big mess otherwise:

https://lore.kernel.org/lkml/ysscwvpxevxw%2f...@infradead.org/


Could you please provide a more explicit explanation of why inlining such
a helper is considered bad practice and messy?


Because as demonstrated in my previous response some days ago, taking
that out of line ends up with unnecessarily ugly generated code and we
don't benefit from GCC's capability to fold in and optimize out
unreachable code.


As pointed out by Michael, in most cases the function will just return
false, so beyond the performance concern there are also code size and
code coverage considerations to take into account. And even when the
function doesn't return false, the only thing it does folds into a
single powerpc instruction, so there is really no point in making a
dedicated out-of-line function for that, suffering the cost and the size
of a function call, and justifying the addition of a dedicated C file.
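
For comparison, the inline variant being argued for here would be roughly
(a sketch, assuming is_secure_guest() is visible from the header and that
CC_ATTR_MEM_ENCRYPT is the only attribute powerpc cares about):

/* Sketch of the inline alternative: GCC can fold this to a single
 * test, or to nothing when the kernel is built without secure guest
 * support. */
static inline bool cc_platform_has(enum cc_attr attr)
{
	return attr == CC_ATTR_MEM_ENCRYPT && is_secure_guest();
}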





I guess less ifdeffery is nice too.


I can't see your point here. Inlining the function wouldn't add any 
ifdeffery as far as I can see.


So, would you mind reconsidering your approach and allowing architectures
to provide an inline implementation by just not enforcing a generic
prototype? Or otherwise provide more details and examples of why the cons
outweigh the pros?


Thanks
Christophe

RE: [PATCH V5 12/12] net: netvsc: Add Isolation VM support for netvsc driver

2021-09-15 Thread Haiyang Zhang via iommu



> -Original Message-
> From: Michael Kelley 
> Sent: Wednesday, September 15, 2021 12:22 PM
> To: Tianyu Lan ; KY Srinivasan ;

> > +   memset(vmap_pages, 0,
> > +  sizeof(*vmap_pages) * vmap_page_index);
> > +   vmap_page_index = 0;
> > +
> > +   for (j = 0; j < i; j++)
> > +   __free_pages(pages[j], alloc_unit);
> > +
> > +   kfree(pages);
> > +   alloc_unit = 1;
> 
> This is the case where a large enough contiguous physical memory chunk
> could not be found.  But rather than dropping all the way down to single
> pages, would it make sense to try something smaller, but not 1?  For
> example, cut the alloc_unit in half and try again.  But I'm not sure of
> all the implications.

I had the same question. But perhaps gradually decrementing takes too much
time?
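
For illustration, halving on failure might look something like this
minimal sketch (a hypothetical helper, not the posted code; alloc_unit is
a page count, as in the patch):

#include <linux/gfp.h>
#include <linux/mm.h>

static struct page *netvsc_alloc_chunk(unsigned int *alloc_unit)
{
	struct page *page;

	for (;;) {
		page = alloc_pages(GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN,
				   get_order((unsigned long)*alloc_unit
					     << PAGE_SHIFT));
		if (page || *alloc_unit == 1)
			return page;
		*alloc_unit /= 2;	/* try a smaller contiguous chunk */
	}
}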

> 
> > +   goto retry;
> > +   }
> > +   }
> > +
> > +   pages[i] = page;
> > +   for (j = 0; j < alloc_unit; j++)
> > +   vmap_pages[vmap_page_index++] = page++;
> > +   }
> > +
> > +   vaddr = vmap(vmap_pages, vmap_page_index, VM_MAP, PAGE_KERNEL);
> > +   kfree(vmap_pages);
> > +
> > +   *pages_array = pages;
> > +   return vaddr;
> > +
> > +cleanup:
> > +   for (j = 0; j < i; j++)
> > +   __free_pages(pages[j], alloc_unit);
> > +
> > +   kfree(pages);
> > +   kfree(vmap_pages);
> > +   return NULL;
> > +}
> > +
> > +static void *netvsc_map_pages(struct page **pages, int count, int alloc_unit)
> > +{
> > +   int pg_count = count * alloc_unit;
> > +   struct page *page;
> > +   unsigned long *pfns;
> > +   int pfn_index = 0;
> > +   void *vaddr;
> > +   int i, j;
> > +
> > +   if (!pages)
> > +   return NULL;
> > +
> > +   pfns = kcalloc(pg_count, sizeof(*pfns), GFP_KERNEL);
> > +   if (!pfns)
> > +   return NULL;
> > +
> > +   for (i = 0; i < count; i++) {
> > +   page = pages[i];
> > +   if (!page) {
> > +   pr_warn("page is not available %d.\n", i);
> > +   kfree(pfns);
> > +   return NULL;
> > +   }
> > +
> > +   for (j = 0; j < alloc_unit; j++) {
> > +   pfns[pfn_index++] = page_to_pfn(page++) +
> > +   (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
> > +   }
> > +   }
> > +
> > +   vaddr = vmap_pfn(pfns, pg_count, PAGE_KERNEL_IO);
> > +   kfree(pfns);
> > +   return vaddr;
> > +}
> > +
> 
> I think you are proposing this approach to allocating memory for the
> send and receive buffers so that you can avoid having two virtual
> mappings for the memory, per comments from Christoph Hellwig.  But
> overall, the approach seems a bit complex and I wonder if it is worth it.
> If allocating large contiguous chunks of physical memory is successful,
> then there is some memory savings in that the data structures needed to
> keep track of the physical pages is smaller than the equivalent page
> tables might be.  But if you have to revert to allocating individual
> pages, then the memory savings is reduced.
> 
> Ultimately, the list of actual PFNs has to be kept somewhere.  Another
> approach would be to do the reverse of what hv_map_memory() from the v4
> patch series does.  I.e., you could do virt_to_phys() on each virtual
> address that maps above VTOM, and subtract out the shared_gpa_boundary
> to get the
> list of actual PFNs that need to be freed.   This way you don't have two
> copies
> of the list of PFNs -- one with and one without the shared_gpa_boundary
> added.
> But it comes at the cost of additional code so that may not be a great
> idea.
> 
> I think what you have here works, and I don't have a clearly better
> solution at the moment except perhaps to revert to the v4 solution and
> just have two virtual mappings.  I'll keep thinking about it.  Maybe
> Christoph has other thoughts.
> 
> >  static int netvsc_init_buf(struct hv_device *device,
> >struct netvsc_device *net_device,
> >const struct netvsc_device_info *device_info)
> > @@ -337,7 +462,7 @@ static int netvsc_init_buf(struct hv_device *device,
> > struct nvsp_1_message_send_receive_buffer_complete *resp;
> > struct net_device *ndev = hv_get_drvdata(device);
> > struct nvsp_message *init_packet;
> > -   unsigned int buf_size;
> > +   unsigned int buf_size, alloc_unit;
> > size_t map_words;
> > int i, ret = 0;
> >
> > @@ -350,7 +475,14 @@ static int netvsc_init_buf(struct hv_device *device,
> > buf_size = min_t(unsigned int, buf_size,
> >  NETVSC_RECEIVE_BUFFER_SIZE_LEGACY);
> >
> > -   net_device->recv_buf = vzalloc(buf_size);
> > +   if (hv_isolation_type_snp())
> > +   net_device->recv_buf =
> > +   netvsc_alloc_pages(&net_device->recv_pages,
> > +  

Re: [PATCH v3 0/8] Implement generic cc_platform_has() helper function

2021-09-15 Thread Borislav Petkov
On Wed, Sep 08, 2021 at 05:58:31PM -0500, Tom Lendacky wrote:
> This patch series provides a generic helper function, cc_platform_has(),
> to replace the sme_active(), sev_active(), sev_es_active() and
> mem_encrypt_active() functions.
> 
> It is expected that as new confidential computing technologies are
> added to the kernel, they can all be covered by a single function call
> instead of a collection of specific function calls all called from the
> same locations.
> 
> The powerpc and s390 patches have been compile tested only. Can the
> folks copied on this series verify that nothing breaks for them? Also,
> a new file, arch/powerpc/platforms/pseries/cc_platform.c, has been
> created for powerpc to hold the out of line function.

...

> 
> Tom Lendacky (8):
>   x86/ioremap: Selectively build arch override encryption functions
>   mm: Introduce a function to check for confidential computing features
>   x86/sev: Add an x86 version of cc_platform_has()
>   powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
>   x86/sme: Replace occurrences of sme_active() with cc_platform_has()
>   x86/sev: Replace occurrences of sev_active() with cc_platform_has()
>   x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
>   treewide: Replace the use of mem_encrypt_active() with
> cc_platform_has()

Ok, modulo the minor things the plan is to take this through tip after
-rc2 releases in order to pick up the powerpc build fix and have a clean
base (-rc2) to base stuff on, at the same time.

Pls holler if something's still amiss.

Sathya,

if you want to prepare the Intel variant intel_cc_platform_has() ontop
of those and send it to me, that would be good because then I can
integrate it all in one branch which can be used to base future work
ontop.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


RE: [PATCH V5 12/12] net: netvsc: Add Isolation VM support for netvsc driver

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan   Sent: Tuesday, September 14, 2021 6:39 AM
> 
> In Isolation VM, all memory shared with the host needs to be marked
> visible to the host via hvcall. vmbus_establish_gpadl() has already done
> it for the netvsc rx/tx ring buffers. The page buffer used by
> vmbus_sendpacket_pagebuffer() still needs to be handled. Use the DMA API
> to map/unmap this memory when sending/receiving packets, and the Hyper-V
> swiotlb bounce buffer dma address will be returned. The swiotlb bounce
> buffer has been marked visible to the host during boot up.
> 
> Allocate rx/tx ring buffer via alloc_pages() in Isolation VM and map
> these pages via vmap(). After calling vmbus_establish_gpadl() which
> marks these pages visible to host, unmap these pages to release the
> virtual address mapped with physical address below shared_gpa_boundary
> and map them in the extra address space via vmap_pfn().
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4:
>   * Allocate rx/tx ring buffer via alloc_pages() in Isolation VM
>   * Map pages after calling vmbus_establish_gpadl().
>   * set dma_set_min_align_mask for netvsc driver.
> 
> Change since v3:
>   * Add comment to explain why not to use dma_map_sg()
>   * Fix some error handling.
> ---
>  drivers/net/hyperv/hyperv_net.h   |   7 +
>  drivers/net/hyperv/netvsc.c   | 287 +-
>  drivers/net/hyperv/netvsc_drv.c   |   1 +
>  drivers/net/hyperv/rndis_filter.c |   2 +
>  include/linux/hyperv.h|   5 +
>  5 files changed, 296 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index 315278a7cf88..87e8c74398a5 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -164,6 +164,7 @@ struct hv_netvsc_packet {
>   u32 total_bytes;
>   u32 send_buf_index;
>   u32 total_data_buflen;
> + struct hv_dma_range *dma_range;
>  };
> 
>  #define NETVSC_HASH_KEYLEN 40
> @@ -1074,6 +1075,8 @@ struct netvsc_device {
> 
>   /* Receive buffer allocated by us but manages by NetVSP */
>   void *recv_buf;
> + struct page **recv_pages;
> + u32 recv_page_count;
>   u32 recv_buf_size; /* allocated bytes */
>   struct vmbus_gpadl recv_buf_gpadl_handle;
>   u32 recv_section_cnt;
> @@ -1082,6 +1085,8 @@ struct netvsc_device {
> 
>   /* Send buffer allocated by us */
>   void *send_buf;
> + struct page **send_pages;
> + u32 send_page_count;
>   u32 send_buf_size;
>   struct vmbus_gpadl send_buf_gpadl_handle;
>   u32 send_section_cnt;
> @@ -1731,4 +1736,6 @@ struct rndis_message {
>  #define RETRY_US_HI  1
>  #define RETRY_MAX2000/* >10 sec */
> 
> +void netvsc_dma_unmap(struct hv_device *hv_dev,
> +   struct hv_netvsc_packet *packet);
>  #endif /* _HYPERV_NET_H */
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 1f87e570ed2b..7d5254bf043e 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include 
>  #include 
> @@ -150,11 +151,33 @@ static void free_netvsc_device(struct rcu_head *head)
>  {
>   struct netvsc_device *nvdev
>   = container_of(head, struct netvsc_device, rcu);
> + unsigned int alloc_unit;
>   int i;
> 
>   kfree(nvdev->extension);
> - vfree(nvdev->recv_buf);
> - vfree(nvdev->send_buf);
> +
> + if (nvdev->recv_pages) {
> + alloc_unit = (nvdev->recv_buf_size /
> + nvdev->recv_page_count) >> PAGE_SHIFT;
> +
> + vunmap(nvdev->recv_buf);
> + for (i = 0; i < nvdev->recv_page_count; i++)
> + __free_pages(nvdev->recv_pages[i], alloc_unit);
> + } else {
> + vfree(nvdev->recv_buf);
> + }
> +
> + if (nvdev->send_pages) {
> + alloc_unit = (nvdev->send_buf_size /
> + nvdev->send_page_count) >> PAGE_SHIFT;
> +
> + vunmap(nvdev->send_buf);
> + for (i = 0; i < nvdev->send_page_count; i++)
> + __free_pages(nvdev->send_pages[i], alloc_unit);
> + } else {
> + vfree(nvdev->send_buf);
> + }
> +
>   kfree(nvdev->send_section_map);
> 
>   for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
> @@ -330,6 +353,108 @@ int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx)
>   return nvchan->mrc.slots ? 0 : -ENOMEM;
>  }
> 
> +void *netvsc_alloc_pages(struct page ***pages_array, unsigned int *array_len,
> +  unsigned long size)
> +{
> + struct page *page, **pages, **vmap_pages;
> + unsigned long pg_count = size >> PAGE_SHIFT;
> + int alloc_unit = MAX_ORDER_NR_PAGES;
> + int i, j, vmap_page_index = 0;
> + void *vaddr;
> +
> + if (pg_count < alloc_unit)
> + alloc_unit = 1;
> +
> + /* vmap() accepts page array with PAGE_SIZE as 

RE: [PATCH V5 11/12] scsi: storvsc: Add Isolation VM support for storvsc driver

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Tuesday, September 14, 2021 6:39 AM
> 
> In Isolation VM, all memory shared with the host needs to be marked
> visible to the host via hvcall. vmbus_establish_gpadl() has already done
> it for the storvsc rx/tx ring buffers. The page buffer used by
> vmbus_sendpacket_mpb_desc() still needs to be handled. Use the DMA API
> (scsi_dma_map/unmap) to map this memory when sending/receiving packets
> and return the swiotlb bounce buffer dma address. In Isolation VM, the
> swiotlb bounce buffer is marked visible to the host and swiotlb force
> mode is enabled.
> 
> Set device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to
> keep the original data offset in the bounce buffer.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4:
>   * use scsi_dma_map/unmap() instead of dma_map/unmap_sg()
>   * Add deleted comments back.
>   * Fix error calculation of hvpfns_to_add
> 
> Change since v3:
>   * Replace dma_map_page() with dma_map_sg()
>   * Use for_each_sg() to populate payload->range.pfn_array.
>   * Remove storvsc_dma_map macro
> ---
>  drivers/hv/vmbus_drv.c |  1 +
>  drivers/scsi/storvsc_drv.c | 24 +++-
>  include/linux/hyperv.h |  1 +
>  3 files changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index b0be287e9a32..9c53f823cde1 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2121,6 +2121,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
>   hv_debug_add_dev_dir(child_device_obj);
> 
>   child_device_obj->device.dma_mask = &vmbus_dma_mask;
> + child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
>   return 0;
> 
>  err_kset_unregister:
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index ebbbc1299c62..d10b450bcf0c 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -21,6 +21,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +
>  #include 
>  #include 
>  #include 
> @@ -1322,6 +1324,7 @@ static void storvsc_on_channel_callback(void *context)
>   continue;
>   }
>   request = (struct storvsc_cmd_request 
> *)scsi_cmd_priv(scmnd);
> + scsi_dma_unmap(scmnd);
>   }
> 
>   storvsc_on_receive(stor_device, packet, request);
> @@ -1735,7 +1738,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>   struct hv_host_device *host_dev = shost_priv(host);
>   struct hv_device *dev = host_dev->dev;
>   struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
> - int i;
>   struct scatterlist *sgl;
>   unsigned int sg_count;
>   struct vmscsi_request *vm_srb;
> @@ -1817,10 +1819,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>   payload_sz = sizeof(cmd_request->mpb);
> 
>   if (sg_count) {
> - unsigned int hvpgoff, hvpfns_to_add;
>   unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
>   unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
> - u64 hvpfn;
> + struct scatterlist *sg;
> + unsigned long hvpfn, hvpfns_to_add;
> + int j, i = 0;
> 
>   if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> 
> @@ -1834,8 +1837,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>   payload->range.len = length;
>   payload->range.offset = offset_in_hvpg;
> 
> + sg_count = scsi_dma_map(scmnd);
> + if (sg_count < 0)
> + return SCSI_MLQUEUE_DEVICE_BUSY;
> 
> - for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
> + for_each_sg(sgl, sg, sg_count, j) {
>   /*
>* Init values for the current sgl entry. hvpgoff
>* and hvpfns_to_add are in units of Hyper-V size

Nit:  The above comment is now out-of-date because hvpgoff has
been removed.

> @@ -1845,10 +1851,9 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>* even on other than the first sgl entry, provided
>* they are a multiple of PAGE_SIZE.
>*/
> - hvpgoff = HVPFN_DOWN(sgl->offset);
> - hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
> - hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
> - hvpgoff;
> + hvpfn = HVPFN_DOWN(sg_dma_address(sg));
> + hvpfns_to_add = HVPFN_UP(sg_dma_address(sg) +
> +  sg_dma_len(sg)) - hvpfn;

Good.  This looks correct now.
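
As a quick check of that math, here is a userspace sketch with
hypothetical values (HV_HYP_PAGE_SHIFT == 12, as on Hyper-V):

#include <stdio.h>
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT 12
#define HVPFN_DOWN(x)	((x) >> HV_HYP_PAGE_SHIFT)
#define HVPFN_UP(x)	(((x) + (1ULL << HV_HYP_PAGE_SHIFT) - 1) >> HV_HYP_PAGE_SHIFT)

int main(void)
{
	uint64_t dma_addr = 0x123450800ULL;	/* offset 0x800 into an hv page */
	uint64_t len = 0x1000;			/* crosses one hv page boundary */
	uint64_t hvpfn = HVPFN_DOWN(dma_addr);
	uint64_t hvpfns_to_add = HVPFN_UP(dma_addr + len) - hvpfn;

	/* prints: first hvpfn 0x123450, count 2 */
	printf("first hvpfn %#llx, count %llu\n",
	       (unsigned long long)hvpfn, (unsigned long long)hvpfns_to_add);
	return 0;
}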

> 
>  

RE: [PATCH V5 10/12] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Tuesday, September 14, 2021 6:39 AM
> 
> A Hyper-V Isolation VM requires bounce buffer support to copy
> data from/to encrypted memory, so enable swiotlb force
> mode to use the swiotlb bounce buffer for DMA transactions.
> 
> In Isolation VM with AMD SEV, the bounce buffer needs to be
> accessed via an extra address space which is above shared_gpa_boundary
> (e.g. the 39-bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
> The access physical address will be original physical address +
> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
> spec is called virtual top of memory(vTOM). Memory addresses below
> vTOM are automatically treated as private while memory above
> vTOM is treated as shared.
> 
> Hyper-V initalizes swiotlb bounce buffer and default swiotlb
> needs to be disabled. pci_swiotlb_detect_override() and
> pci_swiotlb_detect_4gb() enable the default one. To override
> the setting, hyperv_swiotlb_detect() needs to run before
> these detect functions, which depend on the pci_xen_swiotlb_
> init(). Make pci_xen_swiotlb_init() depend on the hyperv_swiotlb
> _detect() to keep the order.
> 
> Swiotlb bounce buffer code calls set_memory_decrypted()
> to mark bounce buffer visible to host and map it in extra
> address space via memremap. Populate the shared_gpa_boundary
> (vTOM) via swiotlb_unencrypted_base variable.
> 
> The map function memremap() can't work in the early
> hyperv_iommu_swiotlb_init(), so initialize the swiotlb bounce
> buffer in hyperv_iommu_swiotlb_later_init().
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4:
>* Use swiotlb_unencrypted_base variable to pass shared_gpa_
>  boundary and map bounce buffer inside swiotlb code.
> 
> Change since v3:
>* Get hyperv bounce buffer size via the default swiotlb
>bounce buffer size function and keep the default size the
>same as the one in the AMD SEV VM.
> ---
>  arch/x86/include/asm/mshyperv.h |  2 ++
>  arch/x86/mm/mem_encrypt.c   |  3 +-
>  arch/x86/xen/pci-swiotlb-xen.c  |  3 +-
>  drivers/hv/vmbus_drv.c  |  3 ++
>  drivers/iommu/hyperv-iommu.c| 60 +
>  include/linux/hyperv.h  |  1 +
>  6 files changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 165423e8b67a..2d22f29f90c9 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -182,6 +182,8 @@ int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
>   struct hv_interrupt_entry *entry);
>  int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
>  int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
> +void *hv_map_memory(void *addr, unsigned long size);
> +void hv_unmap_memory(void *addr);

Aren't these two declarations now spurious?

>  void hv_ghcb_msr_write(u64 msr, u64 value);
>  void hv_ghcb_msr_read(u64 msr, u64 *value);
>  #else /* CONFIG_HYPERV */
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index ff08dc463634..e2db0b8ed938 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "mm_internal.h"
> 
> @@ -202,7 +203,7 @@ void __init sev_setup_arch(void)
>   phys_addr_t total_mem = memblock_phys_mem_size();
>   unsigned long size;
> 
> - if (!sev_active())
> + if (!sev_active() && !hv_is_isolation_supported())
>   return;
> 
>   /*
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 54f9aa7e8457..43bd031aa332 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -4,6 +4,7 @@
> 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include 
> @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
>  EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
> 
>  IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
> -   NULL,
> +   hyperv_swiotlb_detect,
> pci_xen_swiotlb_init,
> NULL);
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 392c1ac4f819..b0be287e9a32 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
>   return child_device_obj;
>  }
> 
> +static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
>  /*
>   * vmbus_device_register - Register the child device
>   */
> @@ -2118,6 +2120,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
>   }
>   hv_debug_add_dev_dir(child_device_obj);
> 
> + child_device_obj->device.dma_mask = &vmbus_dma_mask;
>   return 0;
> 
>  err_kset_unregister:
> diff --git 

RE: [PATCH V5 09/12] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Tuesday, September 14, 2021 6:39 AM
> 
> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
> extra address space which is above shared_gpa_boundary
> (e.g. the 39-bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
> The access physical address will be original physical address +
> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
> spec is called virtual top of memory(vTOM). Memory addresses below
> vTOM are automatically treated as private while memory above
> vTOM is treated as shared.
> 
> Expose swiotlb_unencrypted_base for platforms to set unencrypted
> memory base offset and call memremap() to map bounce buffer in the
> swiotlb code, store map address and use the address to copy data
> from/to swiotlb bounce buffer.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4:
>   * Expose swiotlb_unencrypted_base to set unencrypted memory
> offset.
>   * Use memremap() to map bounce buffer if swiotlb_unencrypted_
> base is set.
> 
> Change since v1:
>   * Make swiotlb_init_io_tlb_mem() return error code and return
>   error when dma_map_decrypted() fails.
> ---
>  include/linux/swiotlb.h |  6 ++
>  kernel/dma/swiotlb.c| 41 +++--
>  2 files changed, 41 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index b0cb2a9973f4..4998ed44ae3d 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -72,6 +72,9 @@ extern enum swiotlb_force swiotlb_force;
>   * @end: The end address of the swiotlb memory pool. Used to do a quick
>   *   range check to see if the memory was in fact allocated by this
>   *   API.
> + * @vaddr:   The vaddr of the swiotlb memory pool. The swiotlb
> + *   memory pool may be remapped in the memory encrypted case and 
> store
> + *   virtual address for bounce buffer operation.
>   * @nslabs:  The number of IO TLB blocks (in groups of 64) between @start and
>   *   @end. For default swiotlb, this is command line adjustable via
>   *   setup_io_tlb_npages.
> @@ -91,6 +94,7 @@ extern enum swiotlb_force swiotlb_force;
>  struct io_tlb_mem {
>   phys_addr_t start;
>   phys_addr_t end;
> + void *vaddr;
>   unsigned long nslabs;
>   unsigned long used;
>   unsigned int index;
> @@ -185,4 +189,6 @@ static inline bool is_swiotlb_for_alloc(struct device 
> *dev)
>  }
>  #endif /* CONFIG_DMA_RESTRICTED_POOL */
> 
> +extern phys_addr_t swiotlb_unencrypted_base;
> +
>  #endif /* __LINUX_SWIOTLB_H */
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 87c40517e822..9e30cc4bd872 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -50,6 +50,7 @@
>  #include 
>  #include 
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
> 
>  struct io_tlb_mem io_tlb_default_mem;
> 
> +phys_addr_t swiotlb_unencrypted_base;
> +
>  /*
>   * Max segment that we can provide which (if pages are contingous) will
>   * not be bounced (unless SWIOTLB_FORCE is set).
> @@ -175,7 +178,7 @@ void __init swiotlb_update_mem_attributes(void)
>   memset(vaddr, 0, bytes);
>  }
> 
> -static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> +static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
>   unsigned long nslabs, bool late_alloc)
>  {
>   void *vaddr = phys_to_virt(start);
> @@ -196,13 +199,34 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
>   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>   mem->slots[i].alloc_size = 0;
>   }
> +
> + if (set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT))
> + return -EFAULT;
> +
> + /*
> +  * Map memory in the unencrypted physical address space when requested
> +  * (e.g. for Hyper-V AMD SEV-SNP Isolation VMs).
> +  */
> + if (swiotlb_unencrypted_base) {
> + phys_addr_t paddr = __pa(vaddr) + swiotlb_unencrypted_base;

Nit:  Use "start" instead of "__pa(vaddr)" since "start" is already the needed
physical address.
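
Applied to the hunk above, the mapping path would then read roughly like
this (a sketch of the nit only, not a tested replacement):

	/* "start" is already the needed physical address, so __pa(vaddr)
	 * is redundant here. */
	if (swiotlb_unencrypted_base) {
		phys_addr_t paddr = start + swiotlb_unencrypted_base;

		vaddr = memremap(paddr, bytes, MEMREMAP_WB);
		if (!vaddr) {
			pr_err("Failed to map the unencrypted memory.\n");
			return -ENOMEM;
		}
	}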

> +
> + vaddr = memremap(paddr, bytes, MEMREMAP_WB);
> + if (!vaddr) {
> + pr_err("Failed to map the unencrypted memory.\n");
> + return -ENOMEM;
> + }
> + }
> +
>   memset(vaddr, 0, bytes);
> + mem->vaddr = vaddr;
> + return 0;
>  }
> 
>  int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>  {
>   struct io_tlb_mem *mem = &io_tlb_default_mem;
>   size_t alloc_size;
> + int ret;
> 
>   if (swiotlb_force == SWIOTLB_NO_FORCE)
>   return 0;
> @@ -217,7 +241,11 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>   

RE: [PATCH V5 07/12] Drivers: hv: vmbus: Add SNP support for VMbus channel initiate message

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Tuesday, September 14, 2021 6:39 AM
> 
> The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared
> with host in Isolation VM and so it's necessary to use hvcall to set
> them visible to host. In Isolation VM with AMD SEV SNP, the access
> address should be in the extra space which is above the shared gpa
> boundary. So remap these pages into the extra address space (pa +
> shared_gpa_boundary).
> 
> Introduce monitor_pages_original[] in the struct vmbus_connection
> to store monitor page virtual address returned by hv_alloc_hyperv_
> zeroed_page() and free monitor page via monitor_pages_original in
> the vmbus_disconnect(). The monitor_pages[] is used to access the
> monitor page and is initialized to be equal to monitor_pages_
> original. The monitor_pages[] will be overridden in the isolation VM
> with the VA of the extra address space. Introduce monitor_pages_pa[] to
> store
> initiate msg.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4:
>   * Introduce monitor_pages_pa[] to store monitor pages' physical
> address and use it to populate pa in the initiate msg.
>   * Move code of mapping moniter pages in extra address into
> vmbus_connect().
> 
> Change since v3:
>   * Rename monitor_pages_va with monitor_pages_original
>   * free monitor page via monitor_pages_original and
> monitor_pages is used to access monitor page.
> 
> Change since v1:
> * Not remap monitor pages in the non-SNP isolation VM.
> ---
>  drivers/hv/connection.c   | 90 ---
>  drivers/hv/hyperv_vmbus.h |  2 +
>  2 files changed, 86 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 8820ae68f20f..edd8f7dd169f 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -19,6 +19,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
> 
>  #include "hyperv_vmbus.h"
> @@ -102,8 +104,9 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
>   vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID;
>   }
> 
> - msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
> - msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
> + msg->monitor_page1 = vmbus_connection.monitor_pages_pa[0];
> + msg->monitor_page2 = vmbus_connection.monitor_pages_pa[1];
> +
>   msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
> 
>   /*
> @@ -216,6 +219,65 @@ int vmbus_connect(void)
>   goto cleanup;
>   }
> 
> + vmbus_connection.monitor_pages_original[0]
> + = vmbus_connection.monitor_pages[0];
> + vmbus_connection.monitor_pages_original[1]
> + = vmbus_connection.monitor_pages[1];
> + vmbus_connection.monitor_pages_pa[0]
> + = virt_to_phys(vmbus_connection.monitor_pages[0]);
> + vmbus_connection.monitor_pages_pa[1]
> + = virt_to_phys(vmbus_connection.monitor_pages[1]);
> +
> + if (hv_is_isolation_supported()) {
> + vmbus_connection.monitor_pages_pa[0] +=
> + ms_hyperv.shared_gpa_boundary;
> + vmbus_connection.monitor_pages_pa[1] +=
> + ms_hyperv.shared_gpa_boundary;
> +
> + ret = set_memory_decrypted((unsigned long)
> +vmbus_connection.monitor_pages[0],
> +1);
> + ret |= set_memory_decrypted((unsigned long)
> + vmbus_connection.monitor_pages[1],
> + 1);
> + if (ret)
> + goto cleanup;
> +
> + /*
> +  * Isolation VM with AMD SNP needs to access monitor page via
> +  * address space above shared gpa boundary.
> +  */
> + if (hv_isolation_type_snp()) {
> + vmbus_connection.monitor_pages[0]
> + = memremap(vmbus_connection.monitor_pages_pa[0],
> +HV_HYP_PAGE_SIZE,
> +MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[0]) {
> + ret = -ENOMEM;
> + goto cleanup;
> + }
> +
> + vmbus_connection.monitor_pages[1]
> + = memremap(vmbus_connection.monitor_pages_pa[1],
> +HV_HYP_PAGE_SIZE,
> +MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[1]) {
> + ret = -ENOMEM;
> + goto cleanup;
> + }
> + }
> +
> + /*
> 

RE: [PATCH V5 05/12] x86/hyperv: Add Write/Read MSR registers via ghcb page

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Tuesday, September 14, 2021 6:39 AM
> 
> Hyper-V provides a GHCB protocol to write Synthetic Interrupt
> Controller MSR registers in Isolation VMs with AMD SEV-SNP,
> and these registers are emulated by the hypervisor directly.
> Hyper-V requires writing SINTx MSR registers twice: first
> via the GHCB page to communicate with the hypervisor,
> and then via the wrmsr instruction to talk with the paravisor,
> which runs in VMPL0. The Guest OS ID MSR also needs to be set
> via the GHCB page.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4:
>* Remove the hv_get_simp(), hv_get_siefp() and hv_get_synint_*()
>  helper functions. Move the logic into hv_get/set_register().
> 
> Change since v3:
>  * Pass old_msg_type to hv_signal_eom() as a parameter.
>* Use HV_REGISTER_* macros instead of HV_X64_MSR_*
>* Add hv_isolation_type_snp() weak function.
>* Add macros to set SynIC registers in ARM code.
> 
> Change since v1:
>  * Introduce sev_es_ghcb_hv_call_simple() and share code
>  between SEV and Hyper-V code.
> 
> Fix for hyperv: Add Write/Read MSR registers via ghcb page
> ---
>  arch/x86/hyperv/hv_init.c   |  36 +++
>  arch/x86/hyperv/ivm.c   | 103 
>  arch/x86/include/asm/mshyperv.h |  56 -
>  arch/x86/include/asm/sev.h  |   6 ++
>  arch/x86/kernel/sev-shared.c|  63 +++
>  drivers/hv/hv.c |  77 +++-
>  drivers/hv/hv_common.c  |   6 ++
>  include/asm-generic/mshyperv.h  |   2 +
>  8 files changed, 266 insertions(+), 83 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index d57df6825527..a16a83e46a30 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -37,7 +37,7 @@ EXPORT_SYMBOL_GPL(hv_current_partition_id);
>  void *hv_hypercall_pg;
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> 
> -void __percpu **hv_ghcb_pg;
> +union hv_ghcb __percpu **hv_ghcb_pg;
> 
>  /* Storage to save the hypercall page temporarily for hibernation */
>  static void *hv_hypercall_pg_saved;
> @@ -406,7 +406,7 @@ void __init hyperv_init(void)
>   }
> 
>   if (hv_isolation_type_snp()) {
> - hv_ghcb_pg = alloc_percpu(void *);
> + hv_ghcb_pg = alloc_percpu(union hv_ghcb *);
>   if (!hv_ghcb_pg)
>   goto free_vp_assist_page;
>   }
> @@ -424,6 +424,9 @@ void __init hyperv_init(void)
>   guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> 
> + /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
> +
>   hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
>   VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> @@ -501,6 +504,7 @@ void __init hyperv_init(void)
> 
>  clean_guest_os_id:
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
>   cpuhp_remove_state(cpuhp);
>  free_ghcb_page:
>   free_percpu(hv_ghcb_pg);
> @@ -522,6 +526,7 @@ void hyperv_cleanup(void)
> 
>   /* Reset our OS id */
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
> 
>   /*
>* Reset hypercall page reference before reset the page,
> @@ -592,30 +597,3 @@ bool hv_is_hyperv_initialized(void)
>   return hypercall_msr.enable;
>  }
>  EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
> -
> -enum hv_isolation_type hv_get_isolation_type(void)
> -{
> - if (!(ms_hyperv.priv_high & HV_ISOLATION))
> - return HV_ISOLATION_TYPE_NONE;
> - return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
> -}
> -EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> -
> -bool hv_is_isolation_supported(void)
> -{
> - if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> - return false;
> -
> - if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> - return false;
> -
> - return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> -}
> -
> -DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
> -
> -bool hv_isolation_type_snp(void)
> -{
> - return static_branch_unlikely(&isolation_type_snp);
> -}
> -EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 79e7fb83472a..5439723446c9 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -6,12 +6,115 @@
>   *  Tianyu Lan 
>   */
> 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
> +#include 
> +
> +union hv_ghcb {
> + struct ghcb ghcb;
> +} __packed __aligned(HV_HYP_PAGE_SIZE);
> +
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long 

RE: [PATCH V5 04/12] Drivers: hv: vmbus: Mark vmbus ring buffer visible to host in Isolation VM

2021-09-15 Thread Michael Kelley via iommu
From: Tianyu Lan  Sent: Tuesday, September 14, 2021 6:39 AM
> 
> Mark vmbus ring buffer visible with set_memory_decrypted() when
> establish gpadl handle.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v4
>   * Change gpadl handle in netvsc and uio driver from u32 to
> struct vmbus_gpadl.
>   * Change vmbus_establish_gpadl()'s gpadl_handle parameter
> to vmbus_gpadl data structure.
> 
> Change since v3:
>   * Change vmbus_teardown_gpadl() parameter and put gpadl handle,
> buffer and buffer size in the struct vmbus_gpadl.
> ---
>  drivers/hv/channel.c| 54 -
>  drivers/net/hyperv/hyperv_net.h |  5 +--
>  drivers/net/hyperv/netvsc.c | 17 ++-
>  drivers/uio/uio_hv_generic.c| 20 ++--
>  include/linux/hyperv.h  | 12 ++--
>  5 files changed, 71 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index f3761c73b074..cf419eb1de77 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> 
> @@ -456,7 +457,7 @@ static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
>  static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>  enum hv_gpadl_type type, void *kbuffer,
>  u32 size, u32 send_offset,
> -u32 *gpadl_handle)
> +struct vmbus_gpadl *gpadl)
>  {
>   struct vmbus_channel_gpadl_header *gpadlmsg;
>   struct vmbus_channel_gpadl_body *gpadl_body;
> @@ -474,6 +475,15 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>   if (ret)
>   return ret;
> 
> + ret = set_memory_decrypted((unsigned long)kbuffer,
> +HVPFN_UP(size));

This should be PFN_UP, not HVPFN_UP.  The numpages parameter to
set_memory_decrypted() is in guest size pages, not Hyper-V size pages.
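
A quick illustration of why the distinction matters (a userspace sketch
assuming a hypothetical 64KiB guest PAGE_SIZE; the Hyper-V page size is
fixed at 4KiB):

#include <stdio.h>

#define PAGE_SHIFT	  16	/* e.g. a 64KiB-page guest */
#define HV_HYP_PAGE_SHIFT 12	/* Hyper-V page size is always 4KiB */
#define PFN_UP(x)	(((x) + (1UL << PAGE_SHIFT) - 1) >> PAGE_SHIFT)
#define HVPFN_UP(x)	(((x) + (1UL << HV_HYP_PAGE_SHIFT) - 1) >> HV_HYP_PAGE_SHIFT)

int main(void)
{
	unsigned long size = 128 * 1024;

	/* set_memory_decrypted() wants guest-size pages: 2, not 32 */
	printf("PFN_UP=%lu HVPFN_UP=%lu\n", PFN_UP(size), HVPFN_UP(size));
	return 0;
}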

> + if (ret) {
> + dev_warn(>device_obj->device,
> +  "Failed to set host visibility for new GPADL %d.\n",
> +  ret);
> + return ret;
> + }
> +
>   init_completion(>waitevent);
>   msginfo->waiting_channel = channel;
> 
> @@ -537,7 +547,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>   }
> 
>   /* At this point, we received the gpadl created msg */
> - *gpadl_handle = gpadlmsg->gpadl;
> + gpadl->gpadl_handle = gpadlmsg->gpadl;
> + gpadl->buffer = kbuffer;
> + gpadl->size = size;
> +
> 
>  cleanup:
>   spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
> @@ -549,6 +562,11 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>   }
> 
>   kfree(msginfo);
> +
> + if (ret)
> + set_memory_encrypted((unsigned long)kbuffer,
> +  HVPFN_UP(size));

Should be PFN_UP as noted on the previous call to set_memory_decrypted().

> +
>   return ret;
>  }
> 
> @@ -561,10 +579,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
>   * @gpadl_handle: some funky thing
>   */
>  int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
> -   u32 size, u32 *gpadl_handle)
> +   u32 size, struct vmbus_gpadl *gpadl)
>  {
>   return __vmbus_establish_gpadl(channel, HV_GPADL_BUFFER, kbuffer, size,
> -0U, gpadl_handle);
> +0U, gpadl);
>  }
>  EXPORT_SYMBOL_GPL(vmbus_establish_gpadl);
> 
> @@ -639,6 +657,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   struct vmbus_channel_open_channel *open_msg;
>   struct vmbus_channel_msginfo *open_info = NULL;
>   struct page *page = newchannel->ringbuffer_page;
> + struct vmbus_gpadl gpadl;

I think this local variable was needed in a previous version of the patch, but
is now unused and should be deleted.

>   u32 send_pages, recv_pages;
>   unsigned long flags;
>   int err;
> @@ -675,7 +694,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   goto error_clean_ring;
> 
>   /* Establish the gpadl for the ring buffer */
> - newchannel->ringbuffer_gpadlhandle = 0;
> + newchannel->ringbuffer_gpadlhandle.gpadl_handle = 0;
> 
>   err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
> page_address(newchannel->ringbuffer_page),
> @@ -701,7 +720,8 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   open_msg->header.msgtype = CHANNELMSG_OPENCHANNEL;
>   open_msg->openid = newchannel->offermsg.child_relid;
>   open_msg->child_relid = newchannel->offermsg.child_relid;
> - open_msg->ringbuffer_gpadlhandle = newchannel->ringbuffer_gpadlhandle;
> + 

[PATCH v1 2/2] iommu/vt-d: avoid duplicated removing in __domain_mapping

2021-09-15 Thread Longpeng(Mike)
__domain_mapping() always removes the pages in the range from
'iov_pfn' to 'end_pfn', but the 'end_pfn' is always the last pfn
of the range that the caller wants to map.

This introduces a lot of duplicated removing and makes the
map operation take too long, for example:

  Map iova=0x10,nr_pages=0x7d61800
iov_pfn: 0x10, end_pfn: 0x7e617ff
iov_pfn: 0x14, end_pfn: 0x7e617ff
iov_pfn: 0x18, end_pfn: 0x7e617ff
iov_pfn: 0x1c, end_pfn: 0x7e617ff
iov_pfn: 0x20, end_pfn: 0x7e617ff
...
  it takes about 50ms in total.

We can reduce the cost by recalculating 'end_pfn', limiting it
to the boundary of the end of the PTE page.

  Map iova=0x10,nr_pages=0x7d61800
iov_pfn: 0x10, end_pfn: 0x13
iov_pfn: 0x14, end_pfn: 0x17
iov_pfn: 0x18, end_pfn: 0x1b
iov_pfn: 0x1c, end_pfn: 0x1f
iov_pfn: 0x20, end_pfn: 0x23
...
  it only needs 9ms now.

Signed-off-by: Longpeng(Mike) 
---
 drivers/iommu/intel/iommu.c | 12 +++-
 include/linux/intel-iommu.h |  6 ++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59a..87cbf34 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2354,12 +2354,18 @@ static void switch_to_super_page(struct dmar_domain *domain,
return -ENOMEM;
first_pte = pte;
 
+   lvl_pages = lvl_to_nr_pages(largepage_lvl);
+   BUG_ON(nr_pages < lvl_pages);
+
/* It is large page*/
if (largepage_lvl > 1) {
unsigned long end_pfn;
+   unsigned long pages_to_remove;
 
pteval |= DMA_PTE_LARGE_PAGE;
-   end_pfn = ((iov_pfn + nr_pages) & level_mask(largepage_lvl)) - 1;
+   pages_to_remove = min_t(unsigned long, nr_pages,
+   nr_pte_to_next_page(pte) * lvl_pages);
+   end_pfn = iov_pfn + pages_to_remove - 1;
switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
} else {
pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
@@ -2381,10 +2387,6 @@ static void switch_to_super_page(struct dmar_domain *domain,
WARN_ON(1);
}
 
-   lvl_pages = lvl_to_nr_pages(largepage_lvl);
-
-   BUG_ON(nr_pages < lvl_pages);
-
nr_pages -= lvl_pages;
iov_pfn += lvl_pages;
phys_pfn += lvl_pages;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index a590b00..4bff70c 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -713,6 +713,12 @@ static inline bool first_pte_in_page(struct dma_pte *pte)
return !((unsigned long)pte & ~VTD_PAGE_MASK);
 }
 
+static inline int nr_pte_to_next_page(struct dma_pte *pte)
+{
+   return first_pte_in_page(pte) ? BIT_ULL(VTD_STRIDE_SHIFT) :
+   (struct dma_pte *)VTD_PAGE_ALIGN((unsigned long)pte) - pte;
+}
+
extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
 extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
 
-- 
1.8.3.1
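
As a sanity check of the boundary math, here is a userspace re-statement
of nr_pte_to_next_page() with hypothetical values (4KiB pages and 512
8-byte PTEs per pte page assumed):

#include <stdio.h>
#include <stdint.h>

#define VTD_PAGE_SHIFT   12
#define VTD_PAGE_SIZE    (1UL << VTD_PAGE_SHIFT)
#define VTD_PAGE_MASK    (~(VTD_PAGE_SIZE - 1))
#define VTD_STRIDE_SHIFT 9
#define PTE_SIZE         8	/* sizeof(struct dma_pte) */

static long nr_pte_to_next_page(uint64_t pte)
{
	uint64_t aligned = (pte + VTD_PAGE_SIZE - 1) & VTD_PAGE_MASK;

	return !(pte & ~VTD_PAGE_MASK) ? 1L << VTD_STRIDE_SHIFT
				       : (long)((aligned - pte) / PTE_SIZE);
}

int main(void)
{
	/* a PTE sitting 4 entries before the end of its pte page */
	uint64_t pte = 0x2000 - 4 * PTE_SIZE;

	/* prints 4: at level 2 (lvl_pages == 512), one loop pass covers
	 * at most 4 * 512 pages before crossing into the next pte page */
	printf("%ld PTEs left in this pte page\n", nr_pte_to_next_page(pte));
	return 0;
}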



[PATCH v1 1/2] iommu/vt-d: convert the return type of first_pte_in_page to bool

2021-09-15 Thread Longpeng(Mike)
first_pte_in_page() returns a boolean value, so let's convert its
return type to bool.

Signed-off-by: Longpeng(Mike) 
---
 include/linux/intel-iommu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 05a65eb..a590b00 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -708,7 +708,7 @@ static inline bool dma_pte_superpage(struct dma_pte *pte)
return (pte->val & DMA_PTE_LARGE_PAGE);
 }
 
-static inline int first_pte_in_page(struct dma_pte *pte)
+static inline bool first_pte_in_page(struct dma_pte *pte)
 {
return !((unsigned long)pte & ~VTD_PAGE_MASK);
 }
-- 
1.8.3.1



[PATCH v1 0/2] iommu/vt-d: boost the mapping process

2021-09-15 Thread Longpeng(Mike)
Hi guys,

We found that __domain_mapping() takes too long when the
memory region is large, so we try to make it faster in this
patchset. The performance numbers can be found in PATCH 2; please
review when you are free, thanks.

Longpeng(Mike) (2):
  iommu/vt-d: convert the return type of first_pte_in_page to bool
  iommu/vt-d: avoid duplicated removing in __domain_mapping

 drivers/iommu/intel/iommu.c | 12 +++-
 include/linux/intel-iommu.h |  8 +++-
 2 files changed, 14 insertions(+), 6 deletions(-)

-- 
1.8.3.1



Re: [PATCH v2 1/5] dt-bindings: reserved-memory: Document memory region specifier

2021-09-15 Thread Thierry Reding
On Tue, Sep 07, 2021 at 07:44:44PM +0200, Thierry Reding wrote:
> On Tue, Sep 07, 2021 at 10:33:24AM -0500, Rob Herring wrote:
> > On Fri, Sep 3, 2021 at 10:36 AM Thierry Reding  
> > wrote:
> > >
> > > On Fri, Sep 03, 2021 at 09:36:33AM -0500, Rob Herring wrote:
> > > > On Fri, Sep 3, 2021 at 8:52 AM Thierry Reding 
> > > >  wrote:
> > > > >
> > > > > On Fri, Sep 03, 2021 at 08:20:55AM -0500, Rob Herring wrote:
> > > > > > On Wed, Sep 1, 2021 at 9:13 AM Thierry Reding 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, Jul 02, 2021 at 05:16:25PM +0300, Dmitry Osipenko wrote:
> > > > > > > > 01.07.2021 21:14, Thierry Reding wrote:
> > > > > > > > > On Tue, Jun 08, 2021 at 06:51:40PM +0200, Thierry Reding 
> > > > > > > > > wrote:
> > > > > > > > >> On Fri, May 28, 2021 at 06:54:55PM +0200, Thierry Reding 
> > > > > > > > >> wrote:
> > > > > > > > >>> On Thu, May 20, 2021 at 05:03:06PM -0500, Rob Herring wrote:
> > > > > > > >  On Fri, Apr 23, 2021 at 06:32:30PM +0200, Thierry Reding 
> > > > > > > >  wrote:
> > > > > > > > > From: Thierry Reding 
> > > > > > > > >
> > > > > > > > > Reserved memory region phandle references can be 
> > > > > > > > > accompanied by a
> > > > > > > > > specifier that provides additional information about how 
> > > > > > > > > that specific
> > > > > > > > > reference should be treated.
> > > > > > > > >
> > > > > > > > > One use-case is to mark a memory region as needing an 
> > > > > > > > > identity mapping
> > > > > > > > > in the system's IOMMU for the device that references the 
> > > > > > > > > region. This is
> > > > > > > > > needed for example when the bootloader has set up 
> > > > > > > > > hardware (such as a
> > > > > > > > > display controller) to actively access a memory region 
> > > > > > > > > (e.g. a boot
> > > > > > > > > splash screen framebuffer) during boot. The operating 
> > > > > > > > > system can use the
> > > > > > > > > identity mapping flag from the specifier to make sure an 
> > > > > > > > > IOMMU identity
> > > > > > > > > mapping is set up for the framebuffer before IOMMU 
> > > > > > > > > translations are
> > > > > > > > > enabled for the display controller.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Thierry Reding 
> > > > > > > > > ---
> > > > > > > > >  .../reserved-memory/reserved-memory.txt   | 21 
> > > > > > > > > +++
> > > > > > > > >  include/dt-bindings/reserved-memory.h |  8 
> > > > > > > > > +++
> > > > > > > > >  2 files changed, 29 insertions(+)
> > > > > > > > >  create mode 100644 include/dt-bindings/reserved-memory.h
> > > > > > > > 
> > > > > > > >  Sorry for being slow on this. I have 2 concerns.
> > > > > > > > 
> > > > > > > >  First, this creates an ABI issue. A DT with cells in 
> > > > > > > >  'memory-region'
> > > > > > > >  will not be understood by an existing OS. I'm less 
> > > > > > > >  concerned about this
> > > > > > > >  if we address that with a stable fix. (Though I'm pretty 
> > > > > > > >  sure we've
> > > > > > > >  naively added #?-cells in the past ignoring this issue.)
> > > > > > > > >>>
> > > > > > > > >>> A while ago I had proposed adding memory-region*s* as an 
> > > > > > > > >>> alternative
> > > > > > > > >>> name for memory-region to make the naming more consistent 
> > > > > > > > >>> with other
> > > > > > > > >>> types of properties (think clocks, resets, gpios, ...). If 
> > > > > > > > >>> we added
> > > > > > > > >>> that, we could easily differentiate between the "legacy" 
> > > > > > > > >>> cases where
> > > > > > > > >>> no #memory-region-cells was allowed and the new cases where 
> > > > > > > > >>> it was.
> > > > > > > > >>>
> > > > > > > >  Second, it could be the bootloader setting up the reserved 
> > > > > > > >  region. If a
> > > > > > > >  node already has 'memory-region', then adding more regions 
> > > > > > > >  is more
> > > > > > > >  complicated compared to adding new properties. And 
> > > > > > > >  defining what each
> > > > > > > >  memory-region entry is or how many in schemas is 
> > > > > > > >  impossible.
> > > > > > > > >>>
> > > > > > > > >>> It's true that updating the property gets a bit 
> > > > > > > > >>> complicated, but it's
> > > > > > > > >>> not exactly rocket science. We really just need to splice 
> > > > > > > > >>> the array. I
> > > > > > > > >>> have a working implemention for this in U-Boot.
> > > > > > > > >>>
> > > > > > > > >>> For what it's worth, we could run into the same issue with 
> > > > > > > > >>> any new
> > > > > > > > >>> property that we add. Even if we renamed this to 
> > > > > > > > >>> iommu-memory-region,
> > > > > > > > >>> it's still possible that a bootloader may have to update 
> > > > > > > > >>> this property
> > > > > > > > >>> if it already exists (it 

Re: AMD-Vi: [Firmware Warn]: EFR mismatch. Use IVHD EFR (0xf77ef22294ada : 0x400f77ef22294ada).

2021-09-15 Thread Paul Menzel

Dear Linux folks,


Am 15.09.21 um 00:17 schrieb Paul Menzel:

[Use Mario’s current address]

Am 15.09.21 um 00:15 schrieb Paul Menzel:

[Cc: +Mario from AMD]



Am 14.09.21 um 14:09 schrieb Jörg Rödel:

On Tue, Sep 14, 2021 at 11:10:57AM +0200, Paul Menzel wrote:

Linux 5.15-rc1 still warns about that (also with latest system firmware
1.1.50).


The reason is most likely that the latest firmware still reports a
different EFR feature set in the IVRS table than the IOMMU reports in
its EFR MMIO register.


What do you mean exactly? Only 0x400 is prepended. The rest of the 
string is identical. What feature set would the 0x400 in the beginning 
be?


ACPICA 20200326 is able to disassemble IVRS subtable type 0x11. The 
incorrect(?) value can be seen there.


$ sudo iasl -d -p IVRS.dsl /sys/firmware/acpi/tables/IVRS
[…]
$ grep EFR IVRS.dsl
[090h 0144   8]EFR Image : 400F77EF22294ADA
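
To see which bits actually differ, one can XOR the two values from the
warning; a trivial standalone check:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t mmio = 0x000f77ef22294adaULL;	/* IOMMU EFR MMIO register */
	uint64_t ivhd = 0x400f77ef22294adaULL;	/* IVHD EFR image from IVRS */

	/* prints 0x4000000000000000, i.e. the two differ only in bit 62 */
	printf("differing bits: %#llx\n", (unsigned long long)(mmio ^ ivhd));
	return 0;
}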


Anyway, it’d be great if AMD and Dell could take a look.

Dell client kernel team, please confirm that you received the report.


Kind regards,

Paul


[1]: https://acpica.org/node/178
/*
 * Intel ACPI Component Architecture
 * AML/ASL+ Disassembler version 20200925 (64-bit version)
 * Copyright (c) 2000 - 2020 Intel Corporation
 * 
 * Disassembly of IVRS, Wed Sep 15 11:48:42 2021
 *
 * ACPI Data Table [IVRS]
 *
 * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
 */

[000h    4]Signature : "IVRS"[I/O Virtualization Reporting Structure]
[004h 0004   4] Table Length : 00D0
[008h 0008   1] Revision : 02
[009h 0009   1] Checksum : EE
[00Ah 0010   6]   Oem ID : "AMD"
[010h 0016   8] Oem Table ID : "MYRTLE"
[018h 0024   4] Oem Revision : 0001
[01Ch 0028   4]  Asl Compiler ID : "ACPI"
[020h 0032   4]Asl Compiler Revision : 0004

[024h 0036   4]  Virtualization Info : 00203041
[028h 0040   8] Reserved : 

[030h 0048   1]Subtable Type : 10 [Hardware Definition Block]
[031h 0049   1]Flags : B0
[032h 0050   2]   Length : 0048
[034h 0052   2] DeviceId : 0002

[036h 0054   2]Capability Offset : 0040
[038h 0056   8] Base Address : FC00
[040h 0064   2]PCI Segment Group : 
[042h 0066   2]  Virtualization Info : 
[044h 0068   4]Feature Reporting : 80048F6E

[048h 0072   1]   Entry Type : 03
[049h 0073   2]Device ID : 0008
[04Bh 0075   1] Data Setting : 00

[04Ch 0076   1]   Entry Type : 04
[04Dh 0077   2]Device ID : FFFE
[04Fh 0079   1] Data Setting : 00

[050h 0080   1]   Entry Type : 43
[051h 0081   2]Device ID : FF00
[053h 0083   1] Data Setting : 00
[054h 0084   1] Reserved : 00
[055h 0085   2]Source Used Device ID : 00A4
[057h 0087   1] Reserved : 00

[058h 0088   1]   Entry Type : 04
[059h 0089   2]Device ID : 
[05Bh 0091   1] Data Setting : 00

[05Ch 0092   1]   Entry Type : 00
[05Dh 0093   2]Device ID : 
[05Fh 0095   1] Data Setting : 00

[060h 0096   1]   Entry Type : 48
[061h 0097   2]Device ID : 
[063h 0099   1] Data Setting : 00
[064h 0100   1]   Handle : 00
[065h 0101   2]Source Used Device ID : 00A0
[067h 0103   1]  Variety : 02

[068h 0104   1]   Entry Type : 48
[069h 0105   2]Device ID : 
[06Bh 0107   1] Data Setting : D7
[06Ch 0108   1]   Handle : 20
[06Dh 0109   2]Source Used Device ID : 00A0
[06Fh 0111   1]  Variety : 01

[070h 0112   1]   Entry Type : 48
[071h 0113   2]Device ID : 
[073h 0115   1] Data Setting : 00
[074h 0116   1]   Handle : 21
[075h 0117   2]Source Used Device ID : 0001
[077h 0119   1]  Variety : 01

[078h 0120   1]Subtable Type : 11 [Hardware Definition Block]
[079h 0121   1]Flags : B0
[07Ah 0122   2]   Length : 0058
[07Ch 0124   2] DeviceId : 0002

[07Eh 0126   2]Capability Offset : 0040
[080h 0128   8] Base Address : FC00
[088h 0136   2]PCI Segment Group : 
[08Ah 0138   2]  Virtualization Info : 
[08Ch 0140   4]   Attributes : 00040200
[090h 0144   8]EFR Image : 400F77EF22294ADA
[098h 

Re: [PATCH] swiotlb: set IO TLB segment size via cmdline

2021-09-15 Thread Christoph Hellwig
On Wed, Sep 15, 2021 at 03:49:52PM +0200, Jan Beulich wrote:
> But the question remains: Why does the framebuffer need to be mapped
> in a single giant chunk?

More importantly: if you use dynamic dma mappings for your framebuffer
you're doing something wrong.


Re: [PATCH] swiotlb: set IO TLB segment size via cmdline

2021-09-15 Thread Jan Beulich via iommu
On 15.09.2021 15:37, Roman Skakun wrote:
>>> From: Roman Skakun 
>>>
>>> It is possible that the default IO TLB size is not
>>> enough to fit large buffers, as described here [1].
>>>
>>> This patch adds a way to set this parameter
>>> via the cmdline instead of recompiling the kernel.
>>>
>>> [1] https://www.xilinx.com/support/answers/72694.html
>>
>>  I'm not convinced the swiotlb use described there falls under "intended
>>  use" - mapping a 1280x720 framebuffer in a single chunk?
> 
> I had the same issue while mapping a ~4MB DMA chunk for the gem fb
> when using xen vdispl.
> I got the next log:
> [ 142.030421] rcar-fcp fea2f000.fcp: swiotlb buffer is full (sz:
> 3686400 bytes), total 32768 (slots), used 32 (slots)
> 
> It happened when I tried to map bounce buffer, which has a large size.
> The default size is 128 (IO_TLB_SEGSIZE) * 2048 (1 << IO_TLB_SHIFT) =
> 262144 bytes, but we requested 3686400 bytes.
> When I change IO_TLB_SEGSIZE to 2048 (2048 (IO_TLB_SEGSIZE) *
> 2048 (1 << IO_TLB_SHIFT) = 4194304 bytes), it becomes possible to
> retrieve a bounce buffer of the requested size.
> After changing this value, the problem is gone.

But the question remains: Why does the framebuffer need to be mapped
in a single giant chunk?

>>  In order to be sure to catch all uses like this one (including ones
>>  which make it upstream in parallel to yours), I think you will want
>>  to rename the original IO_TLB_SEGSIZE to e.g. IO_TLB_DEFAULT_SEGSIZE.
> 
> I don't understand your point. Can you clarify this?

There's a concrete present example: I have a patch pending adding
another use of IO_TLB_SEGSIZE. This use would need to be replaced
like you do here in several places. The need for the additional
replacement would be quite obvious (from a build failure) if you
renamed the manifest constant. Without renaming, it'll take
someone running into an issue on a live system, which I consider
far worse. This is because a simple re-basing of one of the
patches on top of the other will not point out the need for the
extra replacement, nor would a test build (with both patches in
place).

Jan



Re: [PATCH] swiotlb: set IO TLB segment size via cmdline

2021-09-15 Thread Roman Skakun
Hi Jan,

Thanks for the answer.

>> From: Roman Skakun 
>>
>> It is possible that the default IO TLB size is not
>> enough to fit large buffers, as described here [1].
>>
>> This patch adds a way to set this parameter
>> via the cmdline instead of recompiling the kernel.
>>
>> [1] https://www.xilinx.com/support/answers/72694.html
>
>  I'm not convinced the swiotlb use described there falls under "intended
>  use" - mapping a 1280x720 framebuffer in a single chunk?

I had the same issue while mapping a ~4MB DMA chunk for the gem fb
when using xen vdispl.
I got the next log:
[ 142.030421] rcar-fcp fea2f000.fcp: swiotlb buffer is full (sz:
3686400 bytes), total 32768 (slots), used 32 (slots)

It happened when I tried to map bounce buffer, which has a large size.
The default size is 128 (IO_TLB_SEGSIZE) * 2048 (1 << IO_TLB_SHIFT) =
262144 bytes, but we requested 3686400 bytes.
When I change IO_TLB_SEGSIZE to 2048 (2048 (IO_TLB_SEGSIZE) *
2048 (1 << IO_TLB_SHIFT) = 4194304 bytes), it becomes possible to
retrieve a bounce buffer of the requested size.
After changing this value, the problem is gone.
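
Spelling that arithmetic out (a standalone check; IO_TLB_SHIFT is 11,
i.e. 2048-byte slots):

#include <stdio.h>

#define IO_TLB_SHIFT	11		/* slot size = 2048 bytes */

int main(void)
{
	unsigned long slot = 1UL << IO_TLB_SHIFT;

	printf("%lu\n", 128UL * slot);	/* 262144: default max mapping */
	printf("%lu\n", 2048UL * slot);	/* 4194304: fits the 3686400-byte buffer */
	return 0;
}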

>  the bottom of this page is also confusing, because following "Then we can
>  confirm the modified swiotlb size in the boot log:" there is a log
>  fragment showing the same original size of 64MB.

I suspect this is a mistake in the article.
According to 
https://elixir.bootlin.com/linux/v5.14.4/source/kernel/dma/swiotlb.c#L214
and
https://elixir.bootlin.com/linux/v5.15-rc1/source/kernel/dma/swiotlb.c#L182
IO_TLB_SEGSIZE is not used to calculate the total size of the swiotlb area.
This explains why we see the same total size before and after changing the
TLB segment size.

>  In order to be sure to catch all uses like this one (including ones
>  which make it upstream in parallel to yours), I think you will want
>  to rename the original IO_TLB_SEGSIZE to e.g. IO_TLB_DEFAULT_SEGSIZE.

I don't understand your point. Can you clarify this?

>> + /* get max IO TLB segment size */
>> + if (isdigit(*str)) {
>> + tmp = simple_strtoul(str, &str, 0);
>> + if (tmp)
>> + io_tlb_seg_size = ALIGN(tmp, IO_TLB_SEGSIZE);
>
> From all I can tell io_tlb_seg_size wants to be a power of 2. Merely
> aligning to a multiple of IO_TLB_SEGSIZE isn't going to be enough.

Yes, right, thanks!
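
A standalone sketch (not kernel code) of the rounding Jan asks for,
i.e. rounding the requested segment size up to a power of two instead
of merely aligning it; in the kernel this would presumably use
roundup_pow_of_two() from <linux/log2.h>:

#include <stdio.h>

#define IO_TLB_SEGSIZE	128

/* userspace stand-in for the kernel's roundup_pow_of_two() */
static unsigned long roundup_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

int main(void)
{
	unsigned long requested = 1800;	/* hypothetical cmdline value */
	unsigned long seg = roundup_pow_of_two(requested);

	if (seg < IO_TLB_SEGSIZE)
		seg = IO_TLB_SEGSIZE;

	printf("io_tlb_seg_size = %lu\n", seg);	/* -> 2048 */
	return 0;
}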

Cheers,
Roman.

Tue, 14 Sep 2021 at 18:29, Jan Beulich :
>
> On 14.09.2021 17:10, Roman Skakun wrote:
> > From: Roman Skakun 
> >
> > It is possible that the default IO TLB size is not
> > enough to fit large buffers, as described here [1].
> >
> > This patch adds a way to set this parameter
> > via the cmdline instead of recompiling the kernel.
> >
> > [1] https://www.xilinx.com/support/answers/72694.html
>
> I'm not convinced the swiotlb use described there falls under "intended
> use" - mapping a 1280x720 framebuffer in a single chunk? (As an aside,
> the bottom of this page is also confusing, because following "Then we can
> confirm the modified swiotlb size in the boot log:" there is a log
> fragment showing the same original size of 64MB.)
>
> > --- a/arch/mips/cavium-octeon/dma-octeon.c
> > +++ b/arch/mips/cavium-octeon/dma-octeon.c
> > @@ -237,7 +237,7 @@ void __init plat_swiotlb_setup(void)
> >   swiotlbsize = 64 * (1<<20);
> >  #endif
> >   swiotlb_nslabs = swiotlbsize >> IO_TLB_SHIFT;
> > - swiotlb_nslabs = ALIGN(swiotlb_nslabs, IO_TLB_SEGSIZE);
> > + swiotlb_nslabs = ALIGN(swiotlb_nslabs, swiotlb_io_seg_size());
>
> In order to be sure to catch all uses like this one (including ones
> which make it upstream in parallel to yours), I think you will want
> to rename the original IO_TLB_SEGSIZE to e.g. IO_TLB_DEFAULT_SEGSIZE.
>
> > @@ -81,15 +86,30 @@ static unsigned int max_segment;
> >  static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
> >
> >  static int __init
> > -setup_io_tlb_npages(char *str)
> > +setup_io_tlb_params(char *str)
> >  {
> > + unsigned long tmp;
> > +
> >   if (isdigit(*str)) {
> > - /* avoid tail segment of size < IO_TLB_SEGSIZE */
> > - default_nslabs =
> > - ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE);
> > + default_nslabs = simple_strtoul(str, &str, 0);
> >   }
> >   if (*str == ',')
> >   ++str;
> > +
> > + /* get max IO TLB segment size */
> > + if (isdigit(*str)) {
> > + tmp = simple_strtoul(str, &str, 0);
> > + if (tmp)
> > + io_tlb_seg_size = ALIGN(tmp, IO_TLB_SEGSIZE);
>
> From all I can tell io_tlb_seg_size wants to be a power of 2. Merely
> aligning to a multiple of IO_TLB_SEGSIZE isn't going to be enough.
>
> Jan
>


-- 
Best Regards, Roman.

Re: [PATCH v6 6/6] iommu/tegra-smmu: Add pagetable mappings to debugfs

2021-09-15 Thread Dmitry Osipenko
15.09.2021 15:09, Dmitry Osipenko wrote:
> 15.09.2021 07:38, Nicolin Chen wrote:
>> On Tue, Sep 14, 2021 at 10:20:30PM +0300, Dmitry Osipenko wrote:
>>> 14.09.2021 21:49, Nicolin Chen wrote:
 On Tue, Sep 14, 2021 at 04:29:15PM +0300, Dmitry Osipenko wrote:
> 14.09.2021 04:38, Nicolin Chen wrote:
>> +static unsigned long pd_pt_index_iova(unsigned int pd_index, unsigned int pt_index)
>> +{
>> +	return ((dma_addr_t)pd_index & (SMMU_NUM_PDE - 1)) << SMMU_PDE_SHIFT |
>> +	       ((dma_addr_t)pt_index & (SMMU_NUM_PTE - 1)) << SMMU_PTE_SHIFT;
>> +}
>
> We know that IOVA is fixed to u32 for this controller. Can we avoid all
> these dma_addr_t castings? It should make code cleaner a tad, IMO.

 Tegra210 actually supports 34-bit IOVA...

>>>
>>> It doesn't. 34-bit is PA, 32-bit is VA.
>>>
>>> Quote from T210 TRM:
>>>
>>> "The SMMU is a centralized virtual-to-physical translation for MSS. It
>>> maps a 32-bit virtual address to a 34-bit physical address. If the
>>> client address is 40 bits then bits 39:32 are ignored."
>>
>> If you scroll down by a couple of sections, you can see 34-bit
>> virtual addresses in section 18.6.1.2; and if checking one ASID
>> register, you can see it mention the extra two bits va[33:32].
> 
> Thanks for the pointer. It says that only certain memory clients can
> combine 4 ASIDs to form a 34-bit VA space. In this case the PA space is
> split into 4GB areas and there are additional bitfields which configure
> the ASID mapping of each 4GB area. Still, each ASID is 32-bit.
> 
> This is what TRM says:
> 
> "For the GPU and other clients with 34-bit address interfaces, the ASID
> registers are extended to point to four ASIDs. The SMMU supports 4GB of
> virtual address space per ASID, so mapping addr[33:32] into ASID[1:0]
> extends the virtual address space of a client to 16GB."
> 
>> However, the driver currently sets its geometry.aperture_end to
>> 32-bit, and we can only get 32-bit IOVAs using PDE and PTE only,
>> so I think it should be safe to remove the castings here. I'll
>> wait for a couple of days and see if there'd be other comments
>> for me to address in next version.
> 
> You will need to read the special "ASID Assignment Register" which
> supports 4 sub-ASIDs to translate the PA address into the actual VA. By

* VA to PA

> default all clients are limited to a single ASID and the upstream kernel
> doesn't support programming of 34-bit VAs. So it isn't worth the effort
> to fully translate the VA, IMO.
> 
>>> Even if it supported more than 32bit, then the returned ulong is 32bit,
>>> which doesn't make sense.
>>
>> On ARM64 (Tegra210), isn't ulong 64-bit?
> 
> Yes, indeed.
> 


Re: [PATCH v6 6/6] iommu/tegra-smmu: Add pagetable mappings to debugfs

2021-09-15 Thread Dmitry Osipenko
15.09.2021 07:38, Nicolin Chen wrote:
> On Tue, Sep 14, 2021 at 10:20:30PM +0300, Dmitry Osipenko wrote:
>> 14.09.2021 21:49, Nicolin Chen wrote:
>>> On Tue, Sep 14, 2021 at 04:29:15PM +0300, Dmitry Osipenko wrote:
 14.09.2021 04:38, Nicolin Chen wrote:
> +static unsigned long pd_pt_index_iova(unsigned int pd_index, unsigned int pt_index)
> +{
> +	return ((dma_addr_t)pd_index & (SMMU_NUM_PDE - 1)) << SMMU_PDE_SHIFT |
> +	       ((dma_addr_t)pt_index & (SMMU_NUM_PTE - 1)) << SMMU_PTE_SHIFT;
> +}

 We know that IOVA is fixed to u32 for this controller. Can we avoid all
 these dma_addr_t castings? It should make code cleaner a tad, IMO.
>>>
>>> Tegra210 actually supports 34-bit IOVA...
>>>
>>
>> It doesn't. 34-bit is PA, 32-bit is VA.
>>
>> Quote from T210 TRM:
>>
>> "The SMMU is a centralized virtual-to-physical translation for MSS. It
>> maps a 32-bit virtual address to a 34-bit physical address. If the
>> client address is 40 bits then bits 39:32 are ignored."
> 
> If you scroll down by a couple of sections, you can see 34-bit
> virtual addresses in section 18.6.1.2; and if checking one ASID
> register, you can see it mention the extra two bits va[33:32].

Thanks for the pointer. It says that only certain memory clients can
combine 4 ASIDs to form a 34-bit VA space. In this case the PA space is
split into 4GB areas and there are additional bitfields which configure
the ASID mapping of each 4GB area. Still, each ASID is 32-bit.

This is what TRM says:

"For the GPU and other clients with 34-bit address interfaces, the ASID
registers are extended to point to four ASIDs. The SMMU supports 4GB of
virtual address space per ASID, so mapping addr[33:32] into ASID[1:0]
extends the virtual address space of a client to 16GB."

> However, the driver currently sets its geometry.aperture_end to
> 32-bit, and we can only get 32-bit IOVAs using PDE and PTE only,
> so I think it should be safe to remove the castings here. I'll
> wait for a couple of days and see if there'd be other comments
> for me to address in next version.

You will need to read the special "ASID Assignment Register" which
supports 4 sub-ASIDs to translate the PA address into the actual VA. By
default all clients are limited to a single ASID and the upstream kernel
doesn't support programming of 34-bit VAs. So it isn't worth the effort
to fully translate the VA, IMO.
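
For illustration, a minimal sketch of the cast-free helper under
discussion, assuming geometry.aperture_end stays at 32 bits so every
IOVA fits without the dma_addr_t casts (SMMU_* names are from the
quoted patch):

static unsigned long pd_pt_index_iova(unsigned int pd_index, unsigned int pt_index)
{
	/* both masked-and-shifted index fields fit in 32 bits, so plain
	 * unsigned arithmetic suffices once 34-bit IOVAs are ruled out */
	return (pd_index & (SMMU_NUM_PDE - 1)) << SMMU_PDE_SHIFT |
	       (pt_index & (SMMU_NUM_PTE - 1)) << SMMU_PTE_SHIFT;
}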

>> Even if it supported more than 32bit, then the returned ulong is 32bit,
>> which doesn't make sense.
> 
> On ARM64 (Tegra210), isn't ulong 64-bit?

Yes, indeed.

Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-15 Thread Borislav Petkov
On Wed, Sep 15, 2021 at 10:28:59AM +1000, Michael Ellerman wrote:
> I don't love it, a new C file and an out-of-line call to then call back
> to a static inline that for most configurations will return false ... but
> whatever :)

Yeah, hch thinks it'll cause a big mess otherwise:

https://lore.kernel.org/lkml/ysscwvpxevxw%2f...@infradead.org/

I guess less ifdeffery is nice too.

> Acked-by: Michael Ellerman  (powerpc)

Thx.

> Yeah, fixed in mainline today, thanks for trying to cross compile :)

Always!

:-)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: AMD-Vi: [Firmware Warn]: EFR mismatch. Use IVHD EFR (0xf77ef22294ada : 0x400f77ef22294ada).

2021-09-15 Thread Jörg Rödel
Possible, but still DELL is the first point of contact here. If they find out 
that the problem is actually in AGESA, then DELL can escalate this to AMD.

> Am 15.09.2021 um 10:42 schrieb Paul Menzel :
> 
> Dear Jörg,
> 
> 
> Am 15.09.21 um 10:30 schrieb Jörg Rödel:
>> Mainly DELL should look at this, because it is their BIOS which is
>> responsible for this inconsistency.
> 
> I am not so sure about that, as today’s (x86) firmware often consists of
> platform initialization (PI) code (or a firmware support package (FSP))
> provided by the chipset/SoC vendors, like AGESA for AMD, which the ODM just
> integrates.
> 
> If only Dell systems are affected, that would of course point to a bug in the 
> Dell firmware.
> 
> 
> Kind regards,
> 
> Paul


Re: AMD-Vi: [Firmware Warn]: EFR mismatch. Use IVHD EFR (0xf77ef22294ada : 0x400f77ef22294ada).

2021-09-15 Thread Paul Menzel

Dear Jörg,


Am 15.09.21 um 10:30 schrieb Jörg Rödel:

Mainly DELL should look at this, because it is their BIOS which is
responsible for this inconsistency.


I am not so sure about that, as today’s (x86) firmware often consists of 
platform initialization (PI) code (or a firmware support package (FSP)) 
provided by the chipset/SoC vendors, like AGESA for AMD, which the ODM 
just integrates.


If only Dell systems are affected, that would of course point to a bug 
in the Dell firmware.



Kind regards,

Paul

Re: AMD-Vi: [Firmware Warn]: EFR mismatch. Use IVHD EFR (0xf77ef22294ada : 0x400f77ef22294ada).

2021-09-15 Thread Jörg Rödel
Mainly DELL should look at this, because it is their BIOS which is responsible 
for this inconsistency.

> Am 15.09.2021 um 00:17 schrieb Paul Menzel :
> 
> [Use Mario’s current address]
> 
> Am 15.09.21 um 00:15 schrieb Paul Menzel:
>> [Cc: +Mario from AMD]
>> Dear Jörg,
>> Am 14.09.21 um 14:09 schrieb Jörg Rödel:
>>> On Tue, Sep 14, 2021 at 11:10:57AM +0200, Paul Menzel wrote:
 Linux 5.15-rc1 still warns about that (also with latest system firmware
 1.1.50).
>>> 
>>> The reason is most likely that the latest firmware still reports a
>>> different EFR feature set in the IVRS table than the IOMMU reports in
>>> its EFR MMIO register.
>> What do you mean exactly? Only 0x400 is prepended. The rest of the string is 
>> identical. What feature set would the 0x400 in the beginning be?
>> Anyway, it’d be great if AMD and Dell could take a look.
>> Kind regards,
>> Paul
