Re: [PATCH 1/4] iommu: constify pointer to bus_type

2024-03-01 Thread Joerg Roedel
On Fri, Feb 16, 2024 at 03:40:24PM +0100, Krzysztof Kozlowski wrote:

Applied all, thanks.



Re: [PATCH 0/8] iommu: fix a couple of spelling mistakes detected by codespell tool

2021-04-16 Thread Joerg Roedel
On Fri, Mar 26, 2021 at 02:24:04PM +0800, Zhen Lei wrote:
> This detection and correction covers the entire drivers/iommu directory.
> 
> Zhen Lei (8):
>   iommu/pamu: fix a couple of spelling mistakes
>   iommu/omap: Fix spelling mistake "alignement" -> "alignment"
>   iommu/mediatek: Fix spelling mistake "phyiscal" -> "physical"
>   iommu/sun50i: Fix spelling mistake "consits" -> "consists"
>   iommu: fix a couple of spelling mistakes
>   iommu/amd: fix a couple of spelling mistakes
>   iommu/arm-smmu: Fix spelling mistake "initally" -> "initially"
>   iommu/vt-d: fix a couple of spelling mistakes

This patch-set doesn't apply. Please re-send it as a single patch when
v5.13-rc1 is released.

Thanks,

Joerg


Re: [PATCH v2 1/2] iommu: Statically set module owner

2021-04-16 Thread Joerg Roedel
On Thu, Apr 01, 2021 at 02:56:25PM +0100, Robin Murphy wrote:
> It happens that the 3 drivers which first supported being modular are
> also ones which play games with their pgsize_bitmap, so have non-const
> iommu_ops where dynamically setting the owner manages to work out OK.
> However, it's less than ideal to force that upon all drivers which want
> to be modular - like the new sprd-iommu driver which now has a potential
> bug in that regard - so let's just statically set the module owner and
> let ops remain const wherever possible.
> 
> Reviewed-by: Christoph Hellwig 
> Acked-by: Will Deacon 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 1 +
>  drivers/iommu/sprd-iommu.c  | 1 +
>  drivers/iommu/virtio-iommu.c| 1 +
>  include/linux/iommu.h   | 9 +
>  5 files changed, 5 insertions(+), 8 deletions(-)

Applied these two directly on top of my next branch. This essentially
means that all topic branches are frozen now until after the merge
window.

Regards,

Joerg
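
The resulting pattern is worth spelling out. Below is a minimal sketch of
what a modular driver's ops look like after this change (abridged, with
the callback list elided and example_iommu_ops as a made-up name; not the
exact code of any one driver):

	#include <linux/iommu.h>
	#include <linux/module.h>
	#include <linux/sizes.h>

	static const struct iommu_ops example_iommu_ops = {
		/* ... probe/attach/map callbacks elided ... */
		.pgsize_bitmap	= SZ_4K,	/* example value */
		.owner		= THIS_MODULE,	/* set statically */
	};

Because .owner is filled in at compile time, nothing has to write to the
ops structure at runtime, so it can stay const even in a modular driver.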


Re: [PATCH] iommu/mediatek: always enable the clk on resume

2021-04-16 Thread Joerg Roedel
On Fri, Apr 16, 2021 at 03:47:01PM +0200, Dafna Hirschfeld wrote:
> Hi,
> I sent v2, removing the word 'comment' from the 'Fixes' tag
> after a problem report from Stephen Rothwell,
> could you replace v1 with v2?

Replaced it, thanks.


[PATCH] iommu/fsl-pamu: Fix uninitialized variable warning

2021-04-15 Thread Joerg Roedel
From: Joerg Roedel 

The variable 'i' in the function update_liodn_stash() is not
initialized and is only used in a debug printk(). So it has no
meaning at all; remove it.

Reported-by: kernel test robot 
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/fsl_pamu_domain.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 32944d67baa4..0ac781186dbd 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -57,14 +57,13 @@ static int __init iommu_init_mempool(void)
 static int update_liodn_stash(int liodn, struct fsl_dma_domain *dma_domain,
  u32 val)
 {
-   int ret = 0, i;
+   int ret = 0;
unsigned long flags;
 
	spin_lock_irqsave(&iommu_lock, flags);
	ret = pamu_update_paace_stash(liodn, val);
	if (ret) {
-		pr_debug("Failed to update SPAACE %d field for liodn %d\n ",
-			 i, liodn);
+		pr_debug("Failed to update SPAACE for liodn %d\n ", liodn);
		spin_unlock_irqrestore(&iommu_lock, flags);
return ret;
}
-- 
2.30.2



Re: [PATCH v2] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-15 Thread Joerg Roedel
On Thu, Apr 15, 2021 at 08:46:28AM +0800, Longpeng(Mike) wrote:
> Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating superpage")
> Cc:  # v3.0+
> Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5...@huawei.com/
> Suggested-by: Lu Baolu 
> Signed-off-by: Longpeng(Mike) 
> ---
> v1 -> v2:
>   - add Joerg
>   - reconstruct the solution base on the Baolu's suggestion
> ---
>  drivers/iommu/intel/iommu.c | 52 +
>  1 file changed, 38 insertions(+), 14 deletions(-)

Applied, thanks.



Re: [PATCH 1/2] iommu/mediatek-v1: Avoid build fail when build as module

2021-04-15 Thread Joerg Roedel
On Mon, Apr 12, 2021 at 02:48:42PM +0800, Yong Wu wrote:
> When this driver is built as a module, it fails to build like:
> 
> ERROR: modpost: "of_phandle_iterator_args"
> [drivers/iommu/mtk_iommu_v1.ko] undefined!
> 
> This patch removes this interface to avoid the build failure.
> 
> Reported-by: Valdis Kletnieks 
> Signed-off-by: Yong Wu 
> ---
> Currently the patch below is only in linux-next-20210409, so this Fixes tag
> may not be needed; we can add it if required.
> Fixes: 8de000cf0265 ("iommu/mediatek-v1: Allow building as module")
> ---
>  drivers/iommu/mtk_iommu_v1.c | 62 
>  1 file changed, 28 insertions(+), 34 deletions(-)

Applied both, thanks.


Re: [PATCH] iommu/vt-d: Fix an error handling path in 'intel_prepare_irq_remapping()'

2021-04-15 Thread Joerg Roedel
On Sun, Apr 11, 2021 at 09:08:17AM +0200, Christophe JAILLET wrote:
>  drivers/iommu/intel/irq_remapping.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.


Re: [PATCH 1/1] iommu/vt-d: Fix build error of pasid_enable_wpe() with !X86

2021-04-15 Thread Joerg Roedel
On Sun, Apr 11, 2021 at 02:23:12PM +0800, Lu Baolu wrote:
>  drivers/iommu/intel/pasid.c | 2 ++
>  1 file changed, 2 insertions(+)

Applied, thanks.


Re: [PATCH 0/2] iommu/amd: Revert and remove failing PMC test

2021-04-15 Thread Joerg Roedel
On Fri, Apr 09, 2021 at 03:58:46AM -0500, Suravee Suthikulpanit wrote:
> Paul Menzel (1):
>   Revert "iommu/amd: Fix performance counter initialization"
> 
> Suravee Suthikulpanit (1):
>   iommu/amd: Remove performance counter pre-initialization test

Applied, thanks Paul and Suravee.


Re: [PATCH] iommu: exynos: remove unneeded local variable initialization

2021-04-15 Thread Joerg Roedel
On Thu, Apr 08, 2021 at 10:16:22PM +0200, Krzysztof Kozlowski wrote:
>  drivers/iommu/exynos-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.


Re: [PATCH] iommu/mediatek: always enable the clk on resume

2021-04-15 Thread Joerg Roedel
On Thu, Apr 08, 2021 at 02:28:42PM +0200, Dafna Hirschfeld wrote:
>  drivers/iommu/mtk_iommu.c | 19 ---
>  1 file changed, 8 insertions(+), 11 deletions(-)

Applied, thanks.


Re: [PATCH] iommu/amd: page-specific invalidations for more than one page

2021-04-08 Thread Joerg Roedel
On Tue, Mar 23, 2021 at 02:06:19PM -0700, Nadav Amit wrote:
>  drivers/iommu/amd/iommu.c | 76 +--
>  1 file changed, 42 insertions(+), 34 deletions(-)

Load-testing looks good here too, so this patch is queued now for v5.13,
thanks Nadav.

Regards,

Joerg


Re: [GIT PULL] iommu/arm-smmu: Updates for 5.13

2021-04-08 Thread Joerg Roedel
On Thu, Apr 08, 2021 at 02:29:59PM +0100, Will Deacon wrote:
>   git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git 
> tags/arm-smmu-updates

Pulled, thanks Will.



Re: [PATCH] iommu/amd: page-specific invalidations for more than one page

2021-04-08 Thread Joerg Roedel
On Thu, Apr 08, 2021 at 10:29:25AM +, Nadav Amit wrote:
> In the version that you referred me to, iommu_update_domain_tlb() only
> regards the size of the region to be flushed and disregards the
> alignment:
> 
> + order   = get_order(domain->flush.end - domain->flush.start);
> + mask= (0x1000ULL << order) - 1;
> + address = ((domain->flush.start & ~mask) | (mask >> 1)) & ~0xfffULL;
> 
> 
> If you need to flush for instance the region between 0x1000-0x5000, this
> version would use the address|mask of 0x1000 (16KB page). The version I
> sent regards the alignment, and since the range is not aligned would use
> address|mask of 0x3000 (32KB page).
> 
> IIUC, IOVA allocations today are aligned in such a way, but at least in
> the past (looking at 3.19 for that matter), it was not always like
> that, which can explain the problems.

Yeah, that makes sense and explains the data corruption problems. I will
give your patch a try on one of my test machines and consider it for
v5.13 if all goes well.

Thanks,

Joerg
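
To make the difference concrete, here is a small user-space model of the
two computations discussed above (a sketch mirroring the quoted snippet,
not the driver code; flush_addr_size_only/flush_addr_aligned are made-up
names):

	#include <stdint.h>
	#include <stdio.h>

	/* Size-only variant: round the byte count up to a power-of-two
	 * number of pages, ignoring the alignment of the start address. */
	static uint64_t flush_addr_size_only(uint64_t start, uint64_t end)
	{
		uint64_t mask;
		int order = 0;

		while ((0x1000ULL << order) < (end - start))
			order++;

		mask = (0x1000ULL << order) - 1;
		return ((start & ~mask) | (mask >> 1)) & ~0xfffULL;
	}

	/* Alignment-aware variant: grow the order until the naturally
	 * aligned 2^order-page region containing 'start' covers 'end'. */
	static uint64_t flush_addr_aligned(uint64_t start, uint64_t end)
	{
		uint64_t mask;
		int order = 0;

		for (;;) {
			mask = (0x1000ULL << order) - 1;
			if ((start & ~mask) + mask >= end - 1)
				break;
			order++;
		}
		return ((start & ~mask) | (mask >> 1)) & ~0xfffULL;
	}

	int main(void)
	{
		/* The 0x1000-0x5000 example from above */
		printf("size-only: 0x%llx\n",
		       (unsigned long long)flush_addr_size_only(0x1000, 0x5000));
		printf("aligned:   0x%llx\n",
		       (unsigned long long)flush_addr_aligned(0x1000, 0x5000));
		return 0;
	}

Running it prints 0x1000 for the size-only variant and 0x3000 for the
alignment-aware one, matching the 16KB vs. 32KB flushes described above.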


Re: [PATCH] iommu/amd: page-specific invalidations for more than one page

2021-04-08 Thread Joerg Roedel
Hi Nadav,

On Wed, Apr 07, 2021 at 05:57:31PM +, Nadav Amit wrote:
> I tested it on real bare-metal hardware. I ran some basic I/O workloads
> with the IOMMU enabled, checkers enabled/disabled, and so on.
> 
> However, I only tested the IOMMU-flushes and I did not test that the
> device-IOTLB flush work, since I did not have the hardware for that.
> 
> If you can refer me to the old patches, I will have a look and see
> whether I can see a difference in the logic or test them. If you want
> me to run different tests - let me know. If you want me to remove
> the device-IOTLB invalidations logic - that is also fine with me.

Here is the patch-set, it is from 2010 and against a very old version of
the AMD IOMMU driver:


https://lore.kernel.org/lkml/1265898797-32183-1-git-send-email-joerg.roe...@amd.com/

Regards,

Joerg


Re: [PATCH] iommu/amd: page-specific invalidations for more than one page

2021-04-07 Thread Joerg Roedel
On Tue, Mar 23, 2021 at 02:06:19PM -0700, Nadav Amit wrote:
> From: Nadav Amit 
> 
> Currently, IOMMU invalidations and device-IOTLB invalidations using
> AMD IOMMU fall back to full address-space invalidation if more than a
> single page needs to be flushed.
> 
> Full flushes are especially inefficient when the IOMMU is virtualized by
> a hypervisor, since it requires the hypervisor to synchronize the entire
> address-space.
> 
> AMD IOMMUs allow providing a mask to perform page-specific
> invalidations for multiple pages that match the address. The mask is
> encoded as part of the address, and the first zero bit in the address
> (in bits [51:12]) indicates the mask size.
> 
> Use this hardware feature to perform selective IOMMU and IOTLB flushes.
> Combine the logic between both for better code reuse.
> 
> The IOMMU invalidations passed a smoke-test. The device IOTLB
> invalidations are untested.

Have you thoroughly tested this on real hardware? I had a patch-set
doing the same many years ago and it led to data corruption under load.
Back then it could have been a bug in my code of course, but it made me
cautious about using targeted invalidations.

Regards,

Joerg
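
The "first zero bit indicates the mask size" encoding in the quoted commit
message can be illustrated with a small decoder (user-space sketch, not
driver code):

	#include <stdint.h>
	#include <stdio.h>

	/* An invalidation address with bits [12 .. 12+n-1] set and bit
	 * 12+n clear covers 2^(n+1) pages. Returns the size in bytes. */
	static uint64_t decode_flush_size(uint64_t addr)
	{
		unsigned int bit = 12;

		while ((addr >> bit) & 1)
			bit++;

		return 2ULL << bit;
	}

	int main(void)
	{
		printf("0x%llx\n", (unsigned long long)decode_flush_size(0x1000));
		printf("0x%llx\n", (unsigned long long)decode_flush_size(0x3000));
		return 0;
	}

This prints 0x4000 (16KB) for the encoded address 0x1000 and 0x8000 (32KB)
for 0x3000, the two encodings contrasted earlier in this thread.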



Re: [PATCH] iommu: Remove duplicate check of pasids

2021-04-07 Thread Joerg Roedel
On Thu, Apr 01, 2021 at 07:19:16PM +0800, Qi Liu wrote:
> Remove duplicate check of pasids in amd_iommu_domain_enable_v2(), as it
> has been guaranteed in amd_iommu_init_device().
> 
> Signed-off-by: Qi Liu 
> ---
>  drivers/iommu/amd/iommu.c | 3 ---
>  1 file changed, 3 deletions(-)

Applied, thanks.


Re: [PATCH] iommu: sprd: Fix parameter type warning

2021-04-07 Thread Joerg Roedel
On Wed, Mar 31, 2021 at 11:16:45AM +0800, Chunyan Zhang wrote:
>  drivers/iommu/sprd-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.


Re: [PATCH 1/1] iommu/vt-d: Report right snoop capability when using FL for IOVA

2021-04-07 Thread Joerg Roedel
On Tue, Mar 30, 2021 at 10:11:45AM +0800, Lu Baolu wrote:
>  drivers/iommu/intel/pasid.h |  1 +
>  drivers/iommu/intel/iommu.c | 11 ++-
>  drivers/iommu/intel/pasid.c | 16 
>  3 files changed, 27 insertions(+), 1 deletion(-)

Applied, thanks.


Re: [PATCH v2 1/2] iommu/mediatek-v1: Allow building as module

2021-04-07 Thread Joerg Roedel
On Fri, Mar 26, 2021 at 11:23:36AM +0800, Yong Wu wrote:
> This patch only adds support for building the IOMMU-v1 driver as module.
> Correspondingly switch the config to tristate and update the iommu_ops's
> owner to THIS_MODULE.
> 
> Signed-off-by: Yong Wu 

Applied both, thanks.


Re: [PATCH v2 0/4] iommu/iova: Add CPU hotplug handler to flush rcaches to core code

2021-04-07 Thread Joerg Roedel
On Thu, Mar 25, 2021 at 08:29:57PM +0800, John Garry wrote:
> John Garry (4):
>   iova: Add CPU hotplug handler to flush rcaches
>   iommu/vt-d: Remove IOVA domain rcache flushing for CPU offlining
>   iommu: Delete iommu_dma_free_cpu_cached_iovas()
>   iommu: Stop exporting free_iova_fast()
> 
>  drivers/iommu/dma-iommu.c   |  9 -
>  drivers/iommu/intel/iommu.c | 31 ---
>  drivers/iommu/iova.c| 34 +++---
>  include/linux/cpuhotplug.h  |  2 +-
>  include/linux/dma-iommu.h   |  8 
>  include/linux/iova.h|  6 +-
>  6 files changed, 33 insertions(+), 57 deletions(-)

Okay, newer version. Applied this one instead, thanks.


Re: [PATCH 0/5] iommu/vt-d: Several misc cleanups

2021-04-07 Thread Joerg Roedel
On Tue, Mar 23, 2021 at 09:05:55AM +0800, Lu Baolu wrote:
> Lu Baolu (5):
>   iommu/vt-d: Remove unused dma map/unmap trace events
>   iommu/vt-d: Remove svm_dev_ops
>   iommu/vt-d: Remove SVM_FLAG_PRIVATE_PASID
>   iommu/vt-d: Remove unused function declarations
>   iommu/vt-d: Make unnecessarily global functions static

Applied all, thanks.


Re: [PATCH v2 1/1] iommu/vt-d: Don't set then clear private data in prq_event_thread()

2021-04-07 Thread Joerg Roedel
On Sat, Mar 20, 2021 at 10:41:56AM +0800, Lu Baolu wrote:
>  drivers/iommu/intel/svm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Applied, thanks.


Re: [PATCH v2 1/1] iommu/vt-d: Fix lockdep splat in intel_pasid_get_entry()

2021-04-07 Thread Joerg Roedel
On Sat, Mar 20, 2021 at 10:09:16AM +0800, Lu Baolu wrote:
>  drivers/iommu/intel/pasid.c | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)

Applied, thanks.


Re: [PATCH 0/3] iommu/iova: Add CPU hotplug handler to flush rcaches to core code

2021-04-07 Thread Joerg Roedel
On Mon, Mar 01, 2021 at 08:12:18PM +0800, John Garry wrote:
> The Intel IOMMU driver supports flushing the per-CPU rcaches when a CPU is
> offlined.
> 
> Let's move it to core code, so everyone can take advantage.
> 
> Also correct a code comment.
> 
> Based on v5.12-rc1. Tested on arm64 only.
> 
> John Garry (3):
>   iova: Add CPU hotplug handler to flush rcaches
>   iommu/vt-d: Remove IOVA domain rcache flushing for CPU offlining
>   iova: Correct comment for free_cpu_cached_iovas()
> 
>  drivers/iommu/intel/iommu.c | 31 ---
>  drivers/iommu/iova.c| 32 ++--
>  include/linux/cpuhotplug.h  |  2 +-
>  include/linux/iova.h|  1 +
>  4 files changed, 32 insertions(+), 34 deletions(-)

Applied, thanks.



Re: [PATCH] iommu: Add device name to iommu map/unmap trace events

2021-04-06 Thread Joerg Roedel
On Tue, Apr 06, 2021 at 02:56:53PM +0800, chenxiang (M) wrote:
> Is it possible to use group id to identify different domains?

No, the best is to do this during post-processing of your traces by also
keeping track of domain-device attachments/detachments.

Regards,

Joerg


[tip: x86/seves] x86/sev-es: Optimize __sev_es_ist_enter() for better readability

2021-03-19 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 799de1baaf3509a54ff713efb768020f8defd709
Gitweb:
https://git.kernel.org/tip/799de1baaf3509a54ff713efb768020f8defd709
Author:Joerg Roedel 
AuthorDate:Wed, 03 Mar 2021 15:17:14 +01:00
Committer: Borislav Petkov 
CommitterDate: Fri, 19 Mar 2021 13:37:22 +01:00

x86/sev-es: Optimize __sev_es_ist_enter() for better readability

Reorganize the code and improve the comments to make the function more
readable and easier to understand.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210303141716.29223-4-j...@8bytes.org
---
 arch/x86/kernel/sev-es.c | 36 
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 225704e..26f5479 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -137,29 +137,41 @@ static __always_inline bool on_vc_stack(struct pt_regs *regs)
 }
 
 /*
- * This function handles the case when an NMI is raised in the #VC exception
- * handler entry code. In this case, the IST entry for #VC must be adjusted, so
- * that any subsequent #VC exception will not overwrite the stack contents of the
- * interrupted #VC handler.
+ * This function handles the case when an NMI is raised in the #VC
+ * exception handler entry code, before the #VC handler has switched off
+ * its IST stack. In this case, the IST entry for #VC must be adjusted,
+ * so that any nested #VC exception will not overwrite the stack
+ * contents of the interrupted #VC handler.
  *
  * The IST entry is adjusted unconditionally so that it can be also be
- * unconditionally adjusted back in sev_es_ist_exit(). Otherwise a nested
- * sev_es_ist_exit() call may adjust back the IST entry too early.
+ * unconditionally adjusted back in __sev_es_ist_exit(). Otherwise a
+ * nested sev_es_ist_exit() call may adjust back the IST entry too
+ * early.
+ *
+ * The __sev_es_ist_enter() and __sev_es_ist_exit() functions always run
+ * on the NMI IST stack, as they are only called from NMI handling code
+ * right now.
  */
 void noinstr __sev_es_ist_enter(struct pt_regs *regs)
 {
unsigned long old_ist, new_ist;
 
/* Read old IST entry */
-   old_ist = __this_cpu_read(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC]);
+	new_ist = old_ist = __this_cpu_read(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC]);
 
-   /* Make room on the IST stack */
+   /*
+* If NMI happened while on the #VC IST stack, set the new IST
+* value below regs->sp, so that the interrupted stack frame is
+* not overwritten by subsequent #VC exceptions.
+*/
if (on_vc_stack(regs))
-   new_ist = ALIGN_DOWN(regs->sp, 8) - sizeof(old_ist);
-   else
-   new_ist = old_ist - sizeof(old_ist);
+   new_ist = regs->sp;
 
-   /* Store old IST entry */
+   /*
+* Reserve additional 8 bytes and store old IST value so this
+* adjustment can be unrolled in __sev_es_ist_exit().
+*/
+   new_ist -= sizeof(old_ist);
*(unsigned long *)new_ist = old_ist;
 
/* Set new IST entry */


Re: [PATCH][next] iommu: Fix spelling mistake "sixe" -> "size"

2021-03-19 Thread Joerg Roedel
On Fri, Mar 19, 2021 at 09:57:50AM +, Colin King wrote:
> From: Colin Ian King 
> 
> There is a spelling mistake in a dev_err message. Fix it.
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/iommu/sprd-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.

> 
> diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
> index 7100ed17dcce..e1dc2f7d5639 100644
> --- a/drivers/iommu/sprd-iommu.c
> +++ b/drivers/iommu/sprd-iommu.c
> @@ -297,7 +297,7 @@ static int sprd_iommu_map(struct iommu_domain *domain, unsigned long iova,
>   }
>  
>   if (iova < start || (iova + size) > (end + 1)) {
> -		dev_err(dom->sdev->dev, "(iova(0x%lx) + sixe(%zx)) are not in the range!\n",
> +		dev_err(dom->sdev->dev, "(iova(0x%lx) + size(%zx)) are not in the range!\n",
>   iova, size);
>   return -EINVAL;
>   }
> -- 
> 2.30.2


[git pull] IOMMU Fixes for Linux v5.12-rc3

2021-03-19 Thread Joerg Roedel
Hi Linus,

The following changes since commit 1e28eed17697bcf343c6743f0028cc3b5dd88bf0:

  Linux 5.12-rc3 (2021-03-14 14:41:02 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fixes-v5.12-rc3

for you to fetch changes up to 8dfd0fa6ecdc5e2099a57d485b7ce237abc6c7a0:

  iommu/tegra-smmu: Make tegra_smmu_probe_device() to handle all IOMMU phandles (2021-03-18 11:31:12 +0100)


IOMMU Fixes for Linux v5.12-rc3

Including:

- Three AMD IOMMU patches to fix a boot crash on AMD Stoney
  systems and every other AMD IOMMU system booted with
  'amd_iommu=off'. This is a v5.11 regression.

- A fix for the Tegra IOMMU driver to make sure it detects all
  IOMMUs


Dmitry Osipenko (1):
  iommu/tegra-smmu: Make tegra_smmu_probe_device() to handle all IOMMU phandles

Joerg Roedel (3):
  iommu/amd: Move Stoney Ridge check to detect_ivrs()
  iommu/amd: Don't call early_amd_iommu_init() when AMD IOMMU is disabled
  iommu/amd: Keep track of amd_iommu_irq_remap state

 drivers/iommu/amd/init.c   | 36 
 drivers/iommu/tegra-smmu.c |  7 +++
 2 files changed, 23 insertions(+), 20 deletions(-)

Please pull.

Thanks,

Joerg


Re: [PATCH 1/1] iommu/vt-d: Fix lockdep splat in intel_pasid_get_entry()

2021-03-19 Thread Joerg Roedel
Hi Baolu,

On Fri, Mar 19, 2021 at 09:02:34AM +0800, Lu Baolu wrote:
> This code modifies the pasid directory entry. The pasid directory
> entries are allocated on demand and will never be freed.
> 
> > What you need to do here is to retry the whole path by adding a goto
> > to before  the first get_pasid_table_from_pde() call.
> 
> Yes. Retrying by adding a goto makes the code clearer.
> 
> > 
> > Btw, what makes sure that the pasid_entry does not go away when it is
> > returned here?
> 
> As explained above, it handles the pasid directory table entry which
> won't go away.

Okay, I think the goto is a good idea anyway, in case this changes
someday. Please also add a comment to this code stating that the entries
are never freed.

Regards,

Joerg
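
The retry shape being suggested looks roughly like the following (an
illustrative sketch with invented names such as get_table_from_pde();
the real intel_pasid_get_entry() differs in detail):

	retry:
		entries = get_table_from_pde(&dir[dir_index]);
		if (!entries) {
			entries = alloc_pgtable_page(info->iommu->node);
			if (!entries)
				return NULL;

			/*
			 * Someone else may have installed a table
			 * concurrently. If the cmpxchg64() loses, free our
			 * page and retry the lookup from the top. Since
			 * directory entries are never freed, the winning
			 * table stays valid.
			 */
			if (cmpxchg64(&dir[dir_index].val, 0ULL,
				      (u64)virt_to_phys(entries) | PASID_PTE_PRESENT)) {
				free_pgtable_page(entries);
				goto retry;
			}
		}

The v2 that was eventually applied (see the 2021-04-07 message above)
follows this shape: retry instead of dereferencing a possibly-lost
allocation.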


[tip: x86/seves] x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: e927e62d8e370ebfc0d702fec22bc752249ebcef
Gitweb:
https://git.kernel.org/tip/e927e62d8e370ebfc0d702fec22bc752249ebcef
Author:Joerg Roedel 
AuthorDate:Fri, 12 Mar 2021 13:38:22 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 23:04:12 +01:00

x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path

The 32-bit #VC handler has no GHCB and can only handle CPUID exit codes.
It is needed by the early boot code to handle #VC exceptions raised in
verify_cpu() and to get the position of the C-bit.

But the CPUID information comes from the hypervisor which is untrusted
and might return results which trick the guest into the no-SEV boot path
with no C-bit set in the page-tables. All data written to memory would
then be unencrypted and could leak sensitive data to the hypervisor.

Add sanity checks to the 32-bit boot #VC handler to make sure the
hypervisor does not pretend that SEV is not enabled.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-7-j...@8bytes.org
---
 arch/x86/boot/compressed/mem_encrypt.S | 28 +-
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index ebc4a29..c1e81a8 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -139,6 +139,26 @@ SYM_CODE_START(startup32_vc_handler)
jnz .Lfail
movl%edx, 0(%esp)   # Store result
 
+   /*
+* Sanity check CPUID results from the Hypervisor. See comment in
+* do_vc_no_ghcb() for more details on why this is necessary.
+*/
+
+	/* Fail if SEV leaf not available in CPUID[0x80000000].EAX */
+	cmpl	$0x80000000, %ebx
+	jne	.Lcheck_sev
+	cmpl	$0x8000001f, 12(%esp)
+	jb	.Lfail
+	jmp	.Ldone
+
+.Lcheck_sev:
+	/* Fail if SEV bit not set in CPUID[0x8000001f].EAX[1] */
+	cmpl	$0x8000001f, %ebx
+	jne	.Ldone
+	btl	$1, 12(%esp)
+	jnc	.Lfail
+
+.Ldone:
popl%edx
popl%ecx
popl%ebx
@@ -152,6 +172,14 @@ SYM_CODE_START(startup32_vc_handler)
 
iret
 .Lfail:
+   /* Send terminate request to Hypervisor */
+   movl$0x100, %eax
+   xorl%edx, %edx
+   movl$MSR_AMD64_SEV_ES_GHCB, %ecx
+   wrmsr
+   rep; vmmcall
+
+   /* If request fails, go to hlt loop */
hlt
jmp .Lfail
 SYM_CODE_END(startup32_vc_handler)


[tip: x86/seves] x86/boot/compressed/64: Add 32-bit boot #VC handler

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 1ccdbf748d862bc2ea106fa9f2300983c77860fe
Gitweb:
https://git.kernel.org/tip/1ccdbf748d862bc2ea106fa9f2300983c77860fe
Author:Joerg Roedel 
AuthorDate:Wed, 10 Mar 2021 09:43:22 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 23:03:43 +01:00

x86/boot/compressed/64: Add 32-bit boot #VC handler

Add a #VC exception handler which is used when the kernel still executes
in protected mode. This boot-path already uses CPUID, which will cause #VC
exceptions in an SEV-ES guest.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-6-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S |  6 ++-
 arch/x86/boot/compressed/mem_encrypt.S | 96 -
 2 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index 2001c3b..ee448ae 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "pgtable.h"
 
 /*
@@ -857,6 +858,11 @@ SYM_FUNC_END(startup32_set_idt_entry)
 
 SYM_FUNC_START(startup32_load_idt)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
+   /* #VC handler */
+   lealrva(startup32_vc_handler)(%ebp), %eax
+   movl$X86_TRAP_VC, %edx
+   callstartup32_set_idt_entry
+
/* Load IDT */
lealrva(boot32_idt)(%ebp), %eax
movl%eax, rva(boot32_idt_desc+2)(%ebp)
diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index a6dea4e..ebc4a29 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -61,10 +61,104 @@ SYM_FUNC_START(get_sev_encryption_bit)
ret
 SYM_FUNC_END(get_sev_encryption_bit)
 
+/**
+ * sev_es_req_cpuid - Request a CPUID value from the Hypervisor using
+ *   the GHCB MSR protocol
+ *
+ * @%eax:  Register to request (0=EAX, 1=EBX, 2=ECX, 3=EDX)
+ * @%edx:  CPUID Function
+ *
+ * Returns 0 in %eax on success, non-zero on failure
+ * %edx returns CPUID value on success
+ */
+SYM_CODE_START_LOCAL(sev_es_req_cpuid)
+   shll$30, %eax
+	orl	$0x00000004, %eax
+   movl$MSR_AMD64_SEV_ES_GHCB, %ecx
+   wrmsr
+   rep; vmmcall# VMGEXIT
+   rdmsr
+
+   /* Check response */
+   movl%eax, %ecx
+	andl	$0x3ffff000, %ecx	# Bits [12-29] MBZ
+   jnz 2f
+
+   /* Check return code */
+   andl$0xfff, %eax
+   cmpl$5, %eax
+   jne 2f
+
+   /* All good - return success */
+   xorl%eax, %eax
+1:
+   ret
+2:
+   movl$-1, %eax
+   jmp 1b
+SYM_CODE_END(sev_es_req_cpuid)
+
+SYM_CODE_START(startup32_vc_handler)
+   pushl   %eax
+   pushl   %ebx
+   pushl   %ecx
+   pushl   %edx
+
+   /* Keep CPUID function in %ebx */
+   movl%eax, %ebx
+
+   /* Check if error-code == SVM_EXIT_CPUID */
+   cmpl$0x72, 16(%esp)
+   jne .Lfail
+
+   movl$0, %eax# Request CPUID[fn].EAX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 12(%esp)  # Store result
+
+   movl$1, %eax# Request CPUID[fn].EBX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 8(%esp)   # Store result
+
+   movl$2, %eax# Request CPUID[fn].ECX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 4(%esp)   # Store result
+
+   movl$3, %eax# Request CPUID[fn].EDX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 0(%esp)   # Store result
+
+   popl%edx
+   popl%ecx
+   popl%ebx
+   popl%eax
+
+   /* Remove error code */
+   addl$4, %esp
+
+   /* Jump over CPUID instruction */
+   addl$2, (%esp)
+
+   iret
+.Lfail:
+   hlt
+   jmp .Lfail
+SYM_CODE_END(startup32_vc_handler)
+
.code64
 
 #include "../../kernel/sev_verify_cbit.S"
-
 SYM_FUNC_START(set_sev_encryption_mask)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
push%rbp
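
The MSR protocol implemented by sev_es_req_cpuid() above can be modelled
compactly in C. This is an illustrative user-space sketch of the bit
layout only (request code 0x004, response code 0x005, register index in
GHCBData[31:30], CPUID function in GHCBData[63:32]), not kernel code:

	#include <stdint.h>

	#define GHCB_CPUID_REQ	0x004ULL
	#define GHCB_CPUID_RESP	0x005ULL

	/* Build the GHCB MSR value that requests one CPUID register
	 * (0=EAX, 1=EBX, 2=ECX, 3=EDX) of the given function. */
	static uint64_t ghcb_cpuid_req(unsigned int reg, uint32_t fn)
	{
		return GHCB_CPUID_REQ | ((uint64_t)(reg & 3) << 30) |
		       ((uint64_t)fn << 32);
	}

	/* Validate a response and extract the CPUID register value.
	 * Returns 0 on success, -1 on a malformed response. */
	static int ghcb_cpuid_resp(uint64_t msr, uint32_t *value)
	{
		if ((msr & 0xfffULL) != GHCB_CPUID_RESP)
			return -1;
		if (msr & 0x3ffff000ULL)	/* bits [29:12] MBZ */
			return -1;

		*value = (uint32_t)(msr >> 32);
		return 0;
	}

The assembly above does exactly this with wrmsr/rdmsr pairs: %eax carries
the low 32 bits (request code plus register index) and %edx the CPUID
function, with the result coming back in the high half of the MSR.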


[tip: x86/seves] x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: f15a0a732aefb46f999c2a8aa8f9f16e71cec5b2
Gitweb:
https://git.kernel.org/tip/f15a0a732aefb46f999c2a8aa8f9f16e71cec5b2
Author:Joerg Roedel 
AuthorDate:Fri, 12 Mar 2021 13:38:24 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 23:04:12 +01:00

x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()

There are a few places left in the SEV-ES C code where hlt loops and/or
terminate requests are implemented. Replace them all with calls to
sev_es_terminate().

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-9-j...@8bytes.org
---
 arch/x86/boot/compressed/sev-es.c | 12 +++-
 arch/x86/kernel/sev-es-shared.c   | 10 +++---
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/boot/compressed/sev-es.c 
b/arch/x86/boot/compressed/sev-es.c
index 27826c2..d904bd5 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -200,14 +200,8 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
}
 
 finish:
-   if (result == ES_OK) {
+   if (result == ES_OK)
vc_finish_insn();
-   } else if (result != ES_RETRY) {
-   /*
-* For now, just halt the machine. That makes debugging easier,
-* later we just call sev_es_terminate() here.
-*/
-   while (true)
-   asm volatile("hlt\n");
-   }
+   else if (result != ES_RETRY)
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
 }
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index 387b716..0aa9f13 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -24,7 +24,7 @@ static bool __init sev_es_check_cpu_features(void)
return true;
 }
 
-static void sev_es_terminate(unsigned int reason)
+static void __noreturn sev_es_terminate(unsigned int reason)
 {
u64 val = GHCB_SEV_TERMINATE;
 
@@ -206,12 +206,8 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
return;
 
 fail:
-   sev_es_wr_ghcb_msr(GHCB_SEV_TERMINATE);
-   VMGEXIT();
-
-   /* Shouldn't get here - if we do halt the machine */
-   while (true)
-   asm volatile("hlt\n");
+   /* Terminate the guest */
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
 }
 
 static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,


[tip: x86/seves] x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: fef81c86262879d4b1176ef51a834c15b805ebb9
Gitweb:
https://git.kernel.org/tip/fef81c86262879d4b1176ef51a834c15b805ebb9
Author:Joerg Roedel 
AuthorDate:Fri, 12 Mar 2021 13:38:23 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 23:04:12 +01:00

x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path

Check whether the hypervisor reported the correct C-bit when running
as an SEV guest. Using a wrong C-bit position could be used to leak
sensitive data from the guest to the hypervisor.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-8-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S | 83 +-
 1 file changed, 83 insertions(+)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index ee448ae..91ea0d5 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -183,11 +183,21 @@ SYM_FUNC_START(startup_32)
 */
callget_sev_encryption_bit
xorl%edx, %edx
+#ifdef CONFIG_AMD_MEM_ENCRYPT
testl   %eax, %eax
jz  1f
subl$32, %eax   /* Encryption bit is always above bit 31 */
bts %eax, %edx  /* Set encryption mask for page tables */
+   /*
+* Mark SEV as active in sev_status so that startup32_check_sev_cbit()
+* will do a check. The sev_status memory will be fully initialized
+* with the contents of MSR_AMD_SEV_STATUS later in
+* set_sev_encryption_mask(). For now it is sufficient to know that SEV
+* is active.
+*/
+   movl$1, rva(sev_status)(%ebp)
 1:
+#endif
 
/* Initialize Page tables to 0 */
lealrva(pgtable)(%ebx), %edi
@@ -272,6 +282,9 @@ SYM_FUNC_START(startup_32)
movl%esi, %edx
 1:
 #endif
+   /* Check if the C-bit position is correct when SEV is active */
+   callstartup32_check_sev_cbit
+
pushl   $__KERNEL_CS
pushl   %eax
 
@@ -872,6 +885,76 @@ SYM_FUNC_START(startup32_load_idt)
 SYM_FUNC_END(startup32_load_idt)
 
 /*
+ * Check for the correct C-bit position when the startup_32 boot-path is used.
+ *
+ * The check makes use of the fact that all memory is encrypted when paging is
+ * disabled. The function creates 64 bits of random data using the RDRAND
+ * instruction. RDRAND is mandatory for SEV guests, so always available. If the
+ * hypervisor violates that the kernel will crash right here.
+ *
+ * The 64 bits of random data are stored to a memory location and at the same
+ * time kept in the %eax and %ebx registers. Since encryption is always active
+ * when paging is off the random data will be stored encrypted in main memory.
+ *
+ * Then paging is enabled. When the C-bit position is correct all memory is
+ * still mapped encrypted and comparing the register values with memory will
+ * succeed. An incorrect C-bit position will map all memory unencrypted, so that
+ * the compare will use the encrypted random data and fail.
+ */
+SYM_FUNC_START(startup32_check_sev_cbit)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   pushl   %eax
+   pushl   %ebx
+   pushl   %ecx
+   pushl   %edx
+
+   /* Check for non-zero sev_status */
+   movlrva(sev_status)(%ebp), %eax
+   testl   %eax, %eax
+   jz  4f
+
+   /*
+* Get two 32-bit random values - Don't bail out if RDRAND fails
+* because it is better to prevent forward progress if no random value
+* can be gathered.
+*/
+1: rdrand  %eax
+   jnc 1b
+2: rdrand  %ebx
+   jnc 2b
+
+   /* Store to memory and keep it in the registers */
+   movl%eax, rva(sev_check_data)(%ebp)
+   movl%ebx, rva(sev_check_data+4)(%ebp)
+
+   /* Enable paging to see if encryption is active */
+   movl%cr0, %edx   /* Backup %cr0 in %edx */
+	movl	$(X86_CR0_PG | X86_CR0_PE), %ecx /* Enable Paging and Protected mode */
+   movl%ecx, %cr0
+
+   cmpl%eax, rva(sev_check_data)(%ebp)
+   jne 3f
+   cmpl%ebx, rva(sev_check_data+4)(%ebp)
+   jne 3f
+
+   movl%edx, %cr0  /* Restore previous %cr0 */
+
+   jmp 4f
+
+3: /* Check failed - hlt the machine */
+   hlt
+   jmp 3b
+
+4:
+   popl%edx
+   popl%ecx
+   popl%ebx
+   popl%eax
+#endif
+   ret
+SYM_FUNC_END(startup32_check_sev_cbit)
+
+/*
  * Stack and heap for uncompression
  */
.bss


[tip: x86/seves] x86/sev: Do not require Hypervisor CPUID bit for SEV guests

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: eab696d8e8b9c9d600be6fad8dd8dfdfaca6ca7c
Gitweb:
https://git.kernel.org/tip/eab696d8e8b9c9d600be6fad8dd8dfdfaca6ca7c
Author:Joerg Roedel 
AuthorDate:Fri, 12 Mar 2021 13:38:18 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:40 +01:00

x86/sev: Do not require Hypervisor CPUID bit for SEV guests

A malicious hypervisor could disable the CPUID intercept for an SEV or
SEV-ES guest and trick it into the no-SEV boot path, where it could
potentially reveal secrets. This is not an issue for SEV-SNP guests,
as the CPUID intercept can't be disabled for those.

Remove the Hypervisor CPUID bit check from the SEV detection code to
protect against this kind of attack and add a Hypervisor bit equals zero
check to the SME detection path to prevent non-encrypted guests from
trying to enable SME.

This handles the following cases:

1) SEV(-ES) guest where CPUID intercept is disabled. The guest
   will still see leaf 0x8000001f and the SEV bit. It can
   retrieve the C-bit and boot normally.

2) Non-encrypted guests with intercepted CPUID will check
   the SEV_STATUS MSR and find it 0 and will try to enable SME.
   This will fail when the guest finds MSR_K8_SYSCFG to be zero,
   as it is emulated by KVM. But we can't rely on that, as there
   might be other hypervisors which return this MSR with bit
   23 set. The Hypervisor bit check will prevent that the guest
   tries to enable SME in this case.

3) Non-encrypted guests on SEV capable hosts with CPUID intercept
   disabled (by a malicious hypervisor) will try to boot into
   the SME path. This will fail, but it is also not considered
   a problem because non-encrypted guests have no protection
   against the hypervisor anyway.

 [ bp: s/non-SEV/non-encrypted/g ]

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Acked-by: Tom Lendacky 
Link: https://lkml.kernel.org/r/20210312123824.306-3-j...@8bytes.org
---
 arch/x86/boot/compressed/mem_encrypt.S |  6 +
 arch/x86/kernel/sev-es-shared.c|  6 +
 arch/x86/mm/mem_encrypt_identity.c | 35 +
 3 files changed, 20 insertions(+), 27 deletions(-)

diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index aa56179..a6dea4e 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -23,12 +23,6 @@ SYM_FUNC_START(get_sev_encryption_bit)
push%ecx
push%edx
 
-   /* Check if running under a hypervisor */
-   movl$1, %eax
-   cpuid
-   bt  $31, %ecx   /* Check the hypervisor bit */
-   jnc .Lno_sev
-
 	movl	$0x80000000, %eax	/* CPUID to check the highest leaf */
 	cpuid
 	cmpl	$0x8000001f, %eax	/* See if 0x8000001f is available */
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index cdc04d0..387b716 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -186,7 +186,6 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
 * make it accessible to the hypervisor.
 *
 * In particular, check for:
-*  - Hypervisor CPUID bit
 *  - Availability of CPUID leaf 0x8000001f
 *  - SEV CPUID bit.
 *
@@ -194,10 +193,7 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
 * can't be checked here.
 */
 
-	if ((fn == 1) && !(regs->cx & BIT(31)))
-		/* Hypervisor bit */
-		goto fail;
-	else if (fn == 0x80000000 && (regs->ax < 0x8000001f))
+	if (fn == 0x80000000 && (regs->ax < 0x8000001f))
 		/* SEV leaf check */
 		goto fail;
 	else if ((fn == 0x8000001f && !(regs->ax & BIT(1))))
diff --git a/arch/x86/mm/mem_encrypt_identity.c 
b/arch/x86/mm/mem_encrypt_identity.c
index 6c5eb6f..a19374d 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -503,14 +503,10 @@ void __init sme_enable(struct boot_params *bp)
 
 #define AMD_SME_BIT	BIT(0)
 #define AMD_SEV_BIT	BIT(1)
-	/*
-	 * Set the feature mask (SME or SEV) based on whether we are
-	 * running under a hypervisor.
-	 */
-	eax = 1;
-	ecx = 0;
-	native_cpuid(&eax, &ebx, &ecx, &edx);
-	feature_mask = (ecx & BIT(31)) ? AMD_SEV_BIT : AMD_SME_BIT;
+
+	/* Check the SEV MSR whether SEV or SME is enabled */
+	sev_status   = __rdmsr(MSR_AMD64_SEV);
+	feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
 
/*
 * Check for the SME/SEV feature:
@@ -530,19 +526,26 @@

[tip: x86/seves] x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 0dd986f3a1e31bd827d2f1f52f07c8a82cd83143
Gitweb:
https://git.kernel.org/tip/0dd986f3a1e31bd827d2f1f52f07c8a82cd83143
Author:Joerg Roedel 
AuthorDate:Fri, 12 Mar 2021 13:38:22 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:52 +01:00

x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path

The 32-bit #VC handler has no GHCB and can only handle CPUID exit codes.
It is needed by the early boot code to handle #VC exceptions raised in
verify_cpu() and to get the position of the C-bit.

But the CPUID information comes from the hypervisor which is untrusted
and might return results which trick the guest into the no-SEV boot path
with no C-bit set in the page-tables. All data written to memory would
then be unencrypted and could leak sensitive data to the hypervisor.

Add sanity checks to the 32-bit boot #VC handler to make sure the
hypervisor does not pretend that SEV is not enabled.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-7-j...@8bytes.org
---
 arch/x86/boot/compressed/mem_encrypt.S | 28 +-
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index ebc4a29..c1e81a8 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -139,6 +139,26 @@ SYM_CODE_START(startup32_vc_handler)
jnz .Lfail
movl%edx, 0(%esp)   # Store result
 
+   /*
+* Sanity check CPUID results from the Hypervisor. See comment in
+* do_vc_no_ghcb() for more details on why this is necessary.
+*/
+
+	/* Fail if SEV leaf not available in CPUID[0x80000000].EAX */
+	cmpl	$0x80000000, %ebx
+	jne	.Lcheck_sev
+	cmpl	$0x8000001f, 12(%esp)
+	jb	.Lfail
+	jmp	.Ldone
+
+.Lcheck_sev:
+	/* Fail if SEV bit not set in CPUID[0x8000001f].EAX[1] */
+	cmpl	$0x8000001f, %ebx
+	jne	.Ldone
+	btl	$1, 12(%esp)
+	jnc	.Lfail
+
+.Ldone:
popl%edx
popl%ecx
popl%ebx
@@ -152,6 +172,14 @@ SYM_CODE_START(startup32_vc_handler)
 
iret
 .Lfail:
+   /* Send terminate request to Hypervisor */
+   movl$0x100, %eax
+   xorl%edx, %edx
+   movl$MSR_AMD64_SEV_ES_GHCB, %ecx
+   wrmsr
+   rep; vmmcall
+
+   /* If request fails, go to hlt loop */
hlt
jmp .Lfail
 SYM_CODE_END(startup32_vc_handler)


[tip: x86/seves] x86/boot/compressed/64: Cleanup exception handling before booting kernel

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: b099155e2df7dadf8b1ad9828158b89f5639f654
Gitweb:
https://git.kernel.org/tip/b099155e2df7dadf8b1ad9828158b89f5639f654
Author:Joerg Roedel 
AuthorDate:Wed, 10 Mar 2021 09:43:19 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:36 +01:00

x86/boot/compressed/64: Cleanup exception handling before booting kernel

Disable the exception handling before booting the kernel to make sure
any exceptions that happen during early kernel boot are not directed to
the pre-decompression code.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-2-j...@8bytes.org
---
 arch/x86/boot/compressed/idt_64.c | 14 ++
 arch/x86/boot/compressed/misc.c   |  7 ++-
 arch/x86/boot/compressed/misc.h   |  6 ++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/boot/compressed/idt_64.c 
b/arch/x86/boot/compressed/idt_64.c
index 804a502..9b93567 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -52,3 +52,17 @@ void load_stage2_idt(void)
 
 	load_boot_idt(&boot_idt_desc);
 }
+
+void cleanup_exception_handling(void)
+{
+   /*
+* Flush GHCB from cache and map it encrypted again when running as
+* SEV-ES guest.
+*/
+   sev_es_shutdown_ghcb();
+
+	/* Set a null-idt, disabling #PF and #VC handling */
+	boot_idt_desc.size    = 0;
+	boot_idt_desc.address = 0;
+	load_boot_idt(&boot_idt_desc);
+}
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 267e7f9..cc9fd0e 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -443,11 +443,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
handle_relocations(output, output_len, virt_addr);
debug_putstr("done.\nBooting the kernel.\n");
 
-   /*
-* Flush GHCB from cache and map it encrypted again when running as
-* SEV-ES guest.
-*/
-   sev_es_shutdown_ghcb();
+   /* Disable exception handling before booting the kernel */
+   cleanup_exception_handling();
 
return output;
 }
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 901ea5e..e5612f0 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -155,6 +155,12 @@ extern pteval_t __default_kernel_pte_mask;
 extern gate_desc boot_idt[BOOT_IDT_ENTRIES];
 extern struct desc_ptr boot_idt_desc;
 
+#ifdef CONFIG_X86_64
+void cleanup_exception_handling(void);
+#else
+static inline void cleanup_exception_handling(void) { }
+#endif
+
 /* IDT Entry Points */
 void boot_page_fault(void);
 void boot_stage1_vc(void);


[tip: x86/seves] x86/boot/compressed/64: Setup IDT in startup_32 boot path

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 79419e13e8082cc15d174df979a363528e31f2e7
Gitweb:
https://git.kernel.org/tip/79419e13e8082cc15d174df979a363528e31f2e7
Author:Joerg Roedel 
AuthorDate:Wed, 10 Mar 2021 09:43:21 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:46 +01:00

x86/boot/compressed/64: Setup IDT in startup_32 boot path

This boot path needs exception handling when it is used with SEV-ES.
Setup an IDT and provide a helper function to write IDT entries for
use in 32-bit protected mode.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-5-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S | 72 +-
 1 file changed, 72 insertions(+)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index c59c80c..2001c3b 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -116,6 +116,9 @@ SYM_FUNC_START(startup_32)
lretl
 1:
 
+   /* Setup Exception handling for SEV-ES */
+   callstartup32_load_idt
+
/* Make sure cpu supports long mode. */
callverify_cpu
testl   %eax, %eax
@@ -701,6 +704,19 @@ SYM_DATA_START(boot_idt)
.endr
 SYM_DATA_END_LABEL(boot_idt, SYM_L_GLOBAL, boot_idt_end)
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+SYM_DATA_START(boot32_idt_desc)
+   .word   boot32_idt_end - boot32_idt - 1
+   .long   0
+SYM_DATA_END(boot32_idt_desc)
+   .balign 8
+SYM_DATA_START(boot32_idt)
+   .rept 32
+   .quad 0
+   .endr
+SYM_DATA_END_LABEL(boot32_idt, SYM_L_GLOBAL, boot32_idt_end)
+#endif
+
 #ifdef CONFIG_EFI_STUB
 SYM_DATA(image_offset, .long 0)
 #endif
@@ -793,6 +809,62 @@ SYM_DATA_START_LOCAL(loaded_image_proto)
 SYM_DATA_END(loaded_image_proto)
 #endif
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   __HEAD
+   .code32
+/*
+ * Write an IDT entry into boot32_idt
+ *
+ * Parameters:
+ *
+ * %eax:   Handler address
+ * %edx:   Vector number
+ *
+ * Physical offset is expected in %ebp
+ */
+SYM_FUNC_START(startup32_set_idt_entry)
+   push%ebx
+   push%ecx
+
+   /* IDT entry address to %ebx */
+   lealrva(boot32_idt)(%ebp), %ebx
+   shl $3, %edx
+   addl%edx, %ebx
+
+	/* Build IDT entry, lower 4 bytes */
+	movl	%eax, %edx
+	andl	$0x0000ffff, %edx	# Target code segment offset [15:0]
+	movl	$__KERNEL32_CS, %ecx	# Target code segment selector
+	shl	$16, %ecx
+	orl	%ecx, %edx
+
+	/* Store lower 4 bytes to IDT */
+	movl	%edx, (%ebx)
+
+	/* Build IDT entry, upper 4 bytes */
+	movl	%eax, %edx
+	andl	$0xffff0000, %edx	# Target code segment offset [31:16]
+	orl	$0x00008e00, %edx	# Present, Type 32-bit Interrupt Gate
+
+   /* Store upper 4 bytes to IDT */
+   movl%edx, 4(%ebx)
+
+   pop %ecx
+   pop %ebx
+   ret
+SYM_FUNC_END(startup32_set_idt_entry)
+#endif
+
+SYM_FUNC_START(startup32_load_idt)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   /* Load IDT */
+   lealrva(boot32_idt)(%ebp), %eax
+   movl%eax, rva(boot32_idt_desc+2)(%ebp)
+   lidtrva(boot32_idt_desc)(%ebp)
+#endif
+   ret
+SYM_FUNC_END(startup32_load_idt)
+
 /*
  * Stack and heap for uncompression
  */
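
The 8-byte gate that startup32_set_idt_entry() assembles by hand has a
fixed layout; a C model of it, for illustration only (the 32-bit boot
code builds it with bit operations precisely because it cannot rely on
such definitions at that point):

	#include <stdint.h>

	struct idt_gate32 {
		uint16_t offset_low;	/* handler address bits [15:0]  */
		uint16_t selector;	/* code segment (__KERNEL32_CS) */
		uint16_t flags;		/* 0x8e00: present, DPL 0,
					 * 32-bit interrupt gate */
		uint16_t offset_high;	/* handler address bits [31:16] */
	};

	static struct idt_gate32 make_gate32(uint32_t handler, uint16_t cs)
	{
		struct idt_gate32 g = {
			.offset_low  = (uint16_t)(handler & 0xffff),
			.selector    = cs,
			.flags       = 0x8e00,
			.offset_high = (uint16_t)(handler >> 16),
		};
		return g;
	}

The two 4-byte stores in the assembly correspond to the first and second
halves of this structure.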


[tip: x86/seves] x86/boot/compressed/64: Add 32-bit boot #VC handler

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 9e373ba233b236a831d0d9578be095a4b7435abe
Gitweb:
https://git.kernel.org/tip/9e373ba233b236a831d0d9578be095a4b7435abe
Author:Joerg Roedel 
AuthorDate:Wed, 10 Mar 2021 09:43:22 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:48 +01:00

x86/boot/compressed/64: Add 32-bit boot #VC handler

Add a #VC exception handler which is used when the kernel still executes
in protected mode. This boot-path already uses CPUID, which will cause
#VC exceptions in an SEV-ES guest.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-6-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S |  6 ++-
 arch/x86/boot/compressed/mem_encrypt.S | 96 -
 2 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index 2001c3b..ee448ae 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "pgtable.h"
 
 /*
@@ -857,6 +858,11 @@ SYM_FUNC_END(startup32_set_idt_entry)
 
 SYM_FUNC_START(startup32_load_idt)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
+   /* #VC handler */
+   lealrva(startup32_vc_handler)(%ebp), %eax
+   movl$X86_TRAP_VC, %edx
+   callstartup32_set_idt_entry
+
/* Load IDT */
lealrva(boot32_idt)(%ebp), %eax
movl%eax, rva(boot32_idt_desc+2)(%ebp)
diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index a6dea4e..ebc4a29 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -61,10 +61,104 @@ SYM_FUNC_START(get_sev_encryption_bit)
ret
 SYM_FUNC_END(get_sev_encryption_bit)
 
+/**
+ * sev_es_req_cpuid - Request a CPUID value from the Hypervisor using
+ *   the GHCB MSR protocol
+ *
+ * @%eax:  Register to request (0=EAX, 1=EBX, 2=ECX, 3=EDX)
+ * @%edx:  CPUID Function
+ *
+ * Returns 0 in %eax on success, non-zero on failure
+ * %edx returns CPUID value on success
+ */
+SYM_CODE_START_LOCAL(sev_es_req_cpuid)
+   shll$30, %eax
+	orl	$0x00000004, %eax
+   movl$MSR_AMD64_SEV_ES_GHCB, %ecx
+   wrmsr
+   rep; vmmcall# VMGEXIT
+   rdmsr
+
+   /* Check response */
+   movl%eax, %ecx
+	andl	$0x3ffff000, %ecx	# Bits [12-29] MBZ
+   jnz 2f
+
+   /* Check return code */
+   andl$0xfff, %eax
+   cmpl$5, %eax
+   jne 2f
+
+   /* All good - return success */
+   xorl%eax, %eax
+1:
+   ret
+2:
+   movl$-1, %eax
+   jmp 1b
+SYM_CODE_END(sev_es_req_cpuid)
+
+SYM_CODE_START(startup32_vc_handler)
+   pushl   %eax
+   pushl   %ebx
+   pushl   %ecx
+   pushl   %edx
+
+   /* Keep CPUID function in %ebx */
+   movl%eax, %ebx
+
+   /* Check if error-code == SVM_EXIT_CPUID */
+   cmpl$0x72, 16(%esp)
+   jne .Lfail
+
+   movl$0, %eax# Request CPUID[fn].EAX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 12(%esp)  # Store result
+
+   movl$1, %eax# Request CPUID[fn].EBX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 8(%esp)   # Store result
+
+   movl$2, %eax# Request CPUID[fn].ECX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 4(%esp)   # Store result
+
+   movl$3, %eax# Request CPUID[fn].EDX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 0(%esp)   # Store result
+
+   popl%edx
+   popl%ecx
+   popl%ebx
+   popl%eax
+
+   /* Remove error code */
+   addl$4, %esp
+
+   /* Jump over CPUID instruction */
+   addl$2, (%esp)
+
+   iret
+.Lfail:
+   hlt
+   jmp .Lfail
+SYM_CODE_END(startup32_vc_handler)
+
.code64
 
 #include "../../kernel/sev_verify_cbit.S"
-
 SYM_FUNC_START(set_sev_encryption_mask)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
push%rbp


[tip: x86/seves] x86/boot/compressed/64: Reload CS in startup_32

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 0c289ff81c24033777fab23019039f11e1449ba4
Gitweb:
https://git.kernel.org/tip/0c289ff81c24033777fab23019039f11e1449ba4
Author:Joerg Roedel 
AuthorDate:Wed, 10 Mar 2021 09:43:20 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:43 +01:00

x86/boot/compressed/64: Reload CS in startup_32

Exception handling in the startup_32 boot path requires the CS
selector to be correctly set up. Reload it from the current GDT.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-4-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S |  9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index e94874f..c59c80c 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -107,9 +107,16 @@ SYM_FUNC_START(startup_32)
movl%eax, %gs
movl%eax, %ss
 
-/* setup a stack and make sure cpu supports long mode. */
+   /* Setup a stack and load CS from current GDT */
lealrva(boot_stack_end)(%ebp), %esp
 
+   pushl   $__KERNEL32_CS
+   lealrva(1f)(%ebp), %eax
+   pushl   %eax
+   lretl
+1:
+
+   /* Make sure cpu supports long mode. */
callverify_cpu
testl   %eax, %eax
jnz .Lno_longmode


[tip: x86/seves] x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 769eb023aa77062cf15c2a179fc8d13b43422c9b
Gitweb:
https://git.kernel.org/tip/769eb023aa77062cf15c2a179fc8d13b43422c9b
Author:Joerg Roedel 
AuthorDate:Fri, 12 Mar 2021 13:38:23 +01:00
Committer: Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:54 +01:00

x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path

Check whether the hypervisor reported the correct C-bit when running
as an SEV guest. Using a wrong C-bit position could be used to leak
sensitive data from the guest to the hypervisor.

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-8-j...@8bytes.org
---
 arch/x86/boot/compressed/head_64.S | 83 +-
 1 file changed, 83 insertions(+)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index ee448ae..91ea0d5 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -183,11 +183,21 @@ SYM_FUNC_START(startup_32)
 */
callget_sev_encryption_bit
xorl%edx, %edx
+#ifdef CONFIG_AMD_MEM_ENCRYPT
testl   %eax, %eax
jz  1f
subl$32, %eax   /* Encryption bit is always above bit 31 */
bts %eax, %edx  /* Set encryption mask for page tables */
+   /*
+* Mark SEV as active in sev_status so that startup32_check_sev_cbit()
+* will do a check. The sev_status memory will be fully initialized
+* with the contents of MSR_AMD_SEV_STATUS later in
+* set_sev_encryption_mask(). For now it is sufficient to know that SEV
+* is active.
+*/
+   movl$1, rva(sev_status)(%ebp)
 1:
+#endif
 
/* Initialize Page tables to 0 */
lealrva(pgtable)(%ebx), %edi
@@ -272,6 +282,9 @@ SYM_FUNC_START(startup_32)
movl%esi, %edx
 1:
 #endif
+   /* Check if the C-bit position is correct when SEV is active */
+   callstartup32_check_sev_cbit
+
pushl   $__KERNEL_CS
pushl   %eax
 
@@ -872,6 +885,76 @@ SYM_FUNC_START(startup32_load_idt)
 SYM_FUNC_END(startup32_load_idt)
 
 /*
+ * Check for the correct C-bit position when the startup_32 boot-path is used.
+ *
+ * The check makes use of the fact that all memory is encrypted when paging is
+ * disabled. The function creates 64 bits of random data using the RDRAND
+ * instruction. RDRAND is mandatory for SEV guests, so always available. If the
+ * hypervisor violates that the kernel will crash right here.
+ *
+ * The 64 bits of random data are stored to a memory location and at the same
+ * time kept in the %eax and %ebx registers. Since encryption is always active
+ * when paging is off the random data will be stored encrypted in main memory.
+ *
+ * Then paging is enabled. When the C-bit position is correct all memory is
+ * still mapped encrypted and comparing the register values with memory will
+ * succeed. An incorrect C-bit position will map all memory unencrypted, so that
+ * the compare will use the encrypted random data and fail.
+ */
+SYM_FUNC_START(startup32_check_sev_cbit)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   pushl   %eax
+   pushl   %ebx
+   pushl   %ecx
+   pushl   %edx
+
+   /* Check for non-zero sev_status */
+   movlrva(sev_status)(%ebp), %eax
+   testl   %eax, %eax
+   jz  4f
+
+   /*
+* Get two 32-bit random values - Don't bail out if RDRAND fails
+* because it is better to prevent forward progress if no random value
+* can be gathered.
+*/
+1: rdrand  %eax
+   jnc 1b
+2: rdrand  %ebx
+   jnc 2b
+
+   /* Store to memory and keep it in the registers */
+   movl%eax, rva(sev_check_data)(%ebp)
+   movl%ebx, rva(sev_check_data+4)(%ebp)
+
+   /* Enable paging to see if encryption is active */
+   movl%cr0, %edx   /* Backup %cr0 in %edx */
+	movl	$(X86_CR0_PG | X86_CR0_PE), %ecx /* Enable Paging and Protected mode */
+   movl%ecx, %cr0
+
+   cmpl%eax, rva(sev_check_data)(%ebp)
+   jne 3f
+   cmpl%ebx, rva(sev_check_data+4)(%ebp)
+   jne 3f
+
+   movl%edx, %cr0  /* Restore previous %cr0 */
+
+   jmp 4f
+
+3: /* Check failed - hlt the machine */
+   hlt
+   jmp 3b
+
+4:
+   popl%edx
+   popl%ecx
+   popl%ebx
+   popl%eax
+#endif
+   ret
+SYM_FUNC_END(startup32_check_sev_cbit)
+
+/*
  * Stack and heap for uncompression
  */
.bss


[tip: x86/seves] x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()

2021-03-18 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 4fbe2c3b9dd04f44608d710ad2ae83d7f1c04182
Gitweb:        https://git.kernel.org/tip/4fbe2c3b9dd04f44608d710ad2ae83d7f1c04182
Author:        Joerg Roedel 
AuthorDate:    Fri, 12 Mar 2021 13:38:24 +01:00
Committer:     Borislav Petkov 
CommitterDate: Thu, 18 Mar 2021 16:44:59 +01:00

x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()

There are a few places left in the SEV-ES C code where hlt loops and/or
terminate requests are implemented. Replace them all with calls to
sev_es_terminate().

Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Link: https://lkml.kernel.org/r/20210312123824.306-9-j...@8bytes.org
---
 arch/x86/boot/compressed/sev-es.c | 12 +++-
 arch/x86/kernel/sev-es-shared.c   | 10 +++---
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
index 27826c2..d904bd5 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -200,14 +200,8 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
}
 
 finish:
-   if (result == ES_OK) {
+   if (result == ES_OK)
vc_finish_insn();
-   } else if (result != ES_RETRY) {
-   /*
-* For now, just halt the machine. That makes debugging easier,
-* later we just call sev_es_terminate() here.
-*/
-   while (true)
-   asm volatile("hlt\n");
-   }
+   else if (result != ES_RETRY)
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
 }
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index 387b716..0aa9f13 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -24,7 +24,7 @@ static bool __init sev_es_check_cpu_features(void)
return true;
 }
 
-static void sev_es_terminate(unsigned int reason)
+static void __noreturn sev_es_terminate(unsigned int reason)
 {
u64 val = GHCB_SEV_TERMINATE;
 
@@ -206,12 +206,8 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
return;
 
 fail:
-   sev_es_wr_ghcb_msr(GHCB_SEV_TERMINATE);
-   VMGEXIT();
-
-   /* Shouldn't get here - if we do halt the machine */
-   while (true)
-   asm volatile("hlt\n");
+   /* Terminate the guest */
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
 }
 
 static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,


Re: [RFC PATCH 5/7] iommu/amd: Add support for Guest IO protection

2021-03-18 Thread Joerg Roedel
On Fri, Mar 12, 2021 at 03:04:09AM -0600, Suravee Suthikulpanit wrote:
> @@ -519,6 +521,7 @@ struct protection_domain {
>   spinlock_t lock;/* mostly used to lock the page table*/
>   u16 id; /* the domain id written to the device table */
>   int glx;/* Number of levels for GCR3 table */
> + bool giov;  /* guest IO protection domain */

Could this be turned into a flag?
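Something like this, as a sketch (PD_FLAG_GIOV and the helper are made-up
names, not from the patch):

	#define PD_FLAG_GIOV	BIT(0)	/* guest IO protection domain */

	struct protection_domain {
		spinlock_t lock;
		u16 id;
		int glx;
		unsigned int flags;	/* PD_FLAG_* feature bits */
	};

	static inline bool pdom_is_giov(struct protection_domain *pdom)
	{
		return pdom->flags & PD_FLAG_GIOV;
	}

That way more per-domain feature bits can be added later without growing
the structure by one bool each time.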

Regards,

Joerg


Re: [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug

2021-03-18 Thread Joerg Roedel
On Thu, Mar 18, 2021 at 11:24:25AM +0200, Maxim Levitsky wrote:
> But again this is a debug feature, and it is intended to allow the user
> to shoot himself in the foot.

And one can't debug SEV-ES guests with it, so what is the point of
enabling it for them too?

Regards,

Joerg


Re: [RFC PATCH 7/7] iommu/amd: Add support for using AMD IOMMU v2 page table for DMA-API

2021-03-18 Thread Joerg Roedel
On Fri, Mar 12, 2021 at 03:04:11AM -0600, Suravee Suthikulpanit wrote:
> Introduce init function for setting up DMA domain for DMA-API with
> the IOMMU v2 page table.
> 
> Signed-off-by: Suravee Suthikulpanit 
> ---
>  drivers/iommu/amd/iommu.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index e29ece6e1e68..bd26de8764bd 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1937,6 +1937,24 @@ static int protection_domain_init_v1(struct protection_domain *domain, int mode)
>   return 0;
>  }
>  
> +static int protection_domain_init_v2(struct protection_domain *domain)
> +{
> + spin_lock_init(&domain->lock);
> + domain->id = domain_id_alloc();
> + if (!domain->id)
> + return -ENOMEM;
> + INIT_LIST_HEAD(&domain->dev_list);
> +
> + domain->giov = true;
> +
> + if (amd_iommu_pgtable == AMD_IOMMU_V2 &&
> + domain_enable_v2(domain, 1, false)) {
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +

You also need to advertise a different aperture size and a different
pgsize-bitmap. The host page-table can only map a 48 bit address space,
not a 64 bit one like with v1 page-tables.
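For example, the domain init could diverge along these lines (just a
sketch; the exact v2 page sizes are an assumption here):

	/* v2 (host) page-tables cover only a 48-bit IOVA space */
	if (amd_iommu_pgtable == AMD_IOMMU_V2) {
		domain->domain.geometry.aperture_start = 0;
		domain->domain.geometry.aperture_end   = DMA_BIT_MASK(48);
		domain->domain.pgsize_bitmap           = SZ_4K | SZ_2M | SZ_1G;
	} else {
		domain->domain.geometry.aperture_end   = ~0ULL;
		domain->domain.pgsize_bitmap           = AMD_IOMMU_PGSIZES;
	}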

Regards,

Joerg


Re: [RFC PATCH 6/7] iommu/amd: Introduce amd_iommu_pgtable command-line option

2021-03-18 Thread Joerg Roedel
On Fri, Mar 12, 2021 at 03:04:10AM -0600, Suravee Suthikulpanit wrote:
> To allow specification whether to use v1 or v2 IOMMU pagetable for
> DMA remapping when calling kernel DMA-API.
> 
> Signed-off-by: Suravee Suthikulpanit 
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  6 ++
>  drivers/iommu/amd/init.c| 15 +++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 04545725f187..466e807369ea 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -319,6 +319,12 @@
>This mode requires kvm-amd.avic=1.
>(Default when IOMMU HW support is present.)
>  
> + amd_iommu_pgtable= [HW,X86-64]
> + Specifies one of the following AMD IOMMU page table to
> + be used for DMA remapping for DMA-API:
> + v1 - Use v1 page table (Default)
> + v2 - Use v2 page table

Any reason v2 can not be the default when it is supported by the IOMMU?



Re: [RFC PATCH 4/7] iommu/amd: Initial support for AMD IOMMU v2 page table

2021-03-18 Thread Joerg Roedel
Hi Suravee,

On Fri, Mar 12, 2021 at 03:04:08AM -0600, Suravee Suthikulpanit wrote:
> @@ -503,6 +504,7 @@ struct amd_io_pgtable {
>   int mode;
>   u64 *root;
>   atomic64_t  pt_root;/* pgtable root and pgtable mode */
> + struct mm_structv2_mm;
>  };

A whole mm_struct is a bit too much when all we really need is an 8-byte
page-table root pointer.


> +static pte_t *fetch_pte(struct amd_io_pgtable *pgtable,
> +   unsigned long iova,
> +   unsigned long *page_size)
> +{
> + int level;
> + pte_t *ptep;
> +
> + ptep = lookup_address_in_mm(&pgtable->v2_mm, iova, &level);
> + if (!ptep || pte_none(*ptep) || (level == PG_LEVEL_NONE))
> + return NULL;

So you are re-using the in-kernel page-table building code. That safes
some lines of code, but has several problems:

1) When you boot a kernel with this code on a machine with
   5-level paging, the IOMMU code will build a 5-level
   page-table too, breaking IOMMU mappings.

2) You need a whole mm_struct per domain, which is big.

3) The existing macros for CPU page-tables require locking. For
   IOMMU page-tables this is not really necessary and might
   cause scalability issues.

Overall I think you should write your own code to build a 4-level
page-table and use cmpxchg64 to avoid the need for locking. Then things
will not break when such a kernel is suddenly booted on a machine which
has 5-level paging enabled.
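A rough sketch of the lock-less approach (v2_alloc_pte() and
IOMMU_PAGE_PRESENT are made-up names):

	static u64 *v2_alloc_pte(u64 *pte, gfp_t gfp)
	{
		u64 *page = (u64 *)get_zeroed_page(gfp);
		u64 newval;

		if (!page)
			return NULL;

		newval = virt_to_phys(page) | IOMMU_PAGE_PRESENT;

		/* Only one updater wins, the loser frees its page again */
		if (cmpxchg64(pte, 0ULL, newval) != 0ULL)
			free_page((unsigned long)page);

		return phys_to_virt(*pte & PAGE_MASK);
	}

No mm_struct, no locks, and the depth of the table stays fixed at four
levels regardless of what the CPU uses.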

Regards,

Joerg



Re: [PATCH v2 0/4] Misc vSVA fixes for VT-d

2021-03-18 Thread Joerg Roedel
On Tue, Mar 02, 2021 at 02:13:56AM -0800, Jacob Pan wrote:
> Hi Baolu et al,
> 
> This is a collection of SVA-related fixes.
> 
> ChangeLog:
> 
> v2:
>   - For guest SVA, call pasid_set_wpe directly w/o checking host CR0.wp
> (Review comments by Kevin T.)
>   - Added fixes tag
> 
> Thanks,
> 
> Jacob
> 
> Jacob Pan (4):
>   iommu/vt-d: Enable write protect for supervisor SVM
>   iommu/vt-d: Enable write protect propagation from guest
>   iommu/vt-d: Reject unsupported page request modes
>   iommu/vt-d: Calculate and set flags for handle_mm_fault
> 
>  drivers/iommu/intel/pasid.c | 29 +
>  drivers/iommu/intel/svm.c   | 21 +
>  include/uapi/linux/iommu.h  |  3 ++-
>  3 files changed, 48 insertions(+), 5 deletions(-)

Applied, thanks.


Re: [PATCH v1] iommu/tegra-smmu: Make tegra_smmu_probe_device() to handle all IOMMU phandles

2021-03-18 Thread Joerg Roedel
On Fri, Mar 12, 2021 at 06:54:39PM +0300, Dmitry Osipenko wrote:
> The tegra_smmu_probe_device() handles only the first IOMMU device-tree
> phandle, skipping the rest. Devices like 3D module on Tegra30 have
> multiple IOMMU phandles, one for each h/w block, and thus, only one
> IOMMU phandle is added to fwspec for the 3D module, breaking GPU.
> Previously this problem was masked by tegra_smmu_attach_dev() which
> didn't use the fwspec, but parsed the DT by itself. The previous commit
> to tegra-smmu driver partially reverted changes that caused problems for
> T124 and now we have tegra_smmu_attach_dev() that uses the fwspec and
> the old-buggy variant of tegra_smmu_probe_device() which skips secondary
> IOMMUs.
> 
> Make tegra_smmu_probe_device() not to skip the secondary IOMMUs. This
> fixes a partially attached IOMMU of the 3D module on Tegra30 and now GPU
> works properly once again.
> 
> Fixes: 765a9d1d02b2 ("iommu/tegra-smmu: Fix mc errors on tegra124-nyan")
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/iommu/tegra-smmu.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)

Applied for v5.12, thanks.


Re: [PATCH 1/1] iommu/vt-d: Report more information about invalidation errors

2021-03-18 Thread Joerg Roedel
On Thu, Mar 18, 2021 at 08:53:40AM +0800, Lu Baolu wrote:
> When the invalidation queue errors are encountered, dump the information
> logged by the VT-d hardware together with the pending queue invalidation
> descriptors.
> 
> Signed-off-by: Ashok Raj 
> Tested-by: Guo Kaijie 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/dmar.c  | 68 ++---
>  include/linux/intel-iommu.h |  6 
>  2 files changed, 70 insertions(+), 4 deletions(-)

Applied, thanks.


Re: [PATCH] iommu/vt-d: Disable SVM when ATS/PRI/PASID are not enabled in the device

2021-03-18 Thread Joerg Roedel
On Sun, Mar 14, 2021 at 01:15:34PM -0700, Kyung Min Park wrote:
> Currently, the Intel VT-d supports Shared Virtual Memory (SVM) only when
> IO page fault is supported. Otherwise, shared memory pages can not be
> swapped out and need to be pinned. The device needs the Address Translation
> Service (ATS), Page Request Interface (PRI) and Process Address Space
> Identifier (PASID) capabilities to be enabled to support IO page fault.
> 
> Disable SVM when ATS, PRI and PASID are not enabled in the device.
> 
> Signed-off-by: Kyung Min Park 

Applied, thanks.


Re: [PATCH 1/1] iommu/vt-d: Fix lockdep splat in intel_pasid_get_entry()

2021-03-18 Thread Joerg Roedel
On Wed, Mar 17, 2021 at 08:58:34AM +0800, Lu Baolu wrote:
> The pasid_lock is used to synchronize different threads from modifying a
> same pasid directory entry at the same time. It causes below lockdep splat.
> 
> [   83.296538] ========================================================
> [   83.296538] WARNING: possible irq lock inversion dependency detected
> [   83.296539] 5.12.0-rc3+ #25 Tainted: G        W
> [   83.296539] --------------------------------------------------------
> [   83.296540] bash/780 just changed the state of lock:
> [   83.296540] ffffffff82b29c98 (device_domain_lock){..-.}-{2:2}, at:
>                iommu_flush_dev_iotlb.part.0+0x32/0x110
> [   83.296547] but this lock took another, SOFTIRQ-unsafe lock in the past:
> [   83.296547]  (pasid_lock){+.+.}-{2:2}
> [   83.296548]
>                and interrupts could create inverse lock ordering between them.
>
> [   83.296549] other info that might help us debug this:
> [   83.296549] Chain exists of:
>                  device_domain_lock --> &iommu->lock --> pasid_lock
> [   83.296551]  Possible interrupt unsafe locking scenario:
>
> [   83.296551]        CPU0                    CPU1
> [   83.296552]        ----                    ----
> [   83.296552]   lock(pasid_lock);
> [   83.296553]                                local_irq_disable();
> [   83.296553]                                lock(device_domain_lock);
> [   83.296554]                                lock(&iommu->lock);
> [   83.296554]   <Interrupt>
> [   83.296554]     lock(device_domain_lock);
> [   83.296555]
>                 *** DEADLOCK ***
> 
> Fix it by replacing the pasid_lock with an atomic exchange operation.
> 
> Reported-and-tested-by: Dave Jiang 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/pasid.c | 14 ++
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 9fb3d3e80408..1ddcb8295f72 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -24,7 +24,6 @@
>  /*
>   * Intel IOMMU system wide PASID name space:
>   */
> -static DEFINE_SPINLOCK(pasid_lock);
>  u32 intel_pasid_max_id = PASID_MAX;
>  
>  int vcmd_alloc_pasid(struct intel_iommu *iommu, u32 *pasid)
> @@ -259,19 +258,18 @@ struct pasid_entry *intel_pasid_get_entry(struct device *dev, u32 pasid)
>   dir_index = pasid >> PASID_PDE_SHIFT;
>   index = pasid & PASID_PTE_MASK;
>  
> - spin_lock(&pasid_lock);
>   entries = get_pasid_table_from_pde(&dir[dir_index]);
>   if (!entries) {
>   entries = alloc_pgtable_page(info->iommu->node);
> - if (!entries) {
> - spin_unlock(&pasid_lock);
> + if (!entries)
>   return NULL;
> - }
>  
> - WRITE_ONCE(dir[dir_index].val,
> -(u64)virt_to_phys(entries) | PASID_PTE_PRESENT);
> + if (cmpxchg64(&dir[dir_index].val, 0ULL,
> +   (u64)virt_to_phys(entries) | PASID_PTE_PRESENT)) {
> + free_pgtable_page(entries);
> + entries = get_pasid_table_from_pde(&dir[dir_index]);

This is racy, someone could have already cleared the pasid-entry again.
What you need to do here is to retry the whole path by adding a goto
back to the first get_pasid_table_from_pde() call.
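Roughly like this, reusing the names from the quoted hunk (a sketch,
untested):

retry:
	entries = get_pasid_table_from_pde(&dir[dir_index]);
	if (!entries) {
		entries = alloc_pgtable_page(info->iommu->node);
		if (!entries)
			return NULL;

		if (cmpxchg64(&dir[dir_index].val, 0ULL,
			      (u64)virt_to_phys(entries) | PASID_PTE_PRESENT)) {
			/* Lost the race - free our page and start over */
			free_pgtable_page(entries);
			goto retry;
		}
	}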

Btw, what makes sure that the pasid_entry does not go away when it is
returned here?

Regards,

Joerg


Re: [PATCH 1/1] iommu/vt-d: Don't set then immediately clear in prq_event_thread()

2021-03-18 Thread Joerg Roedel
Hi Baolu,

On Tue, Mar 09, 2021 at 08:46:41AM +0800, Lu Baolu wrote:
> The private data field of a page group response descriptor is set then
> immediately cleared in prq_event_thread(). Fix this by moving clearing
> code up.
> 
> Fixes: 5b438f4ba315d ("iommu/vt-d: Support page request in scalable mode")
> Cc: Jacob Pan 
> Reviewed-by: Liu Yi L 
> Signed-off-by: Lu Baolu 

Does this fix an actual bug? If so, please state it in the commit
message and also fix the subject line to state what is set/cleared.

Thanks,

Joerg


Re: [PATCH 1/2] iommu/iova: Add rbtree entry helper

2021-03-18 Thread Joerg Roedel
On Fri, Mar 05, 2021 at 04:35:22PM +, Robin Murphy wrote:
> Repeating the rb_entry() boilerplate all over the place gets old fast.
> Before adding yet more instances, add a little helper to tidy it up.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/iova.c | 23 ++-
>  1 file changed, 14 insertions(+), 9 deletions(-)

Applied both, thanks Robin.
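For reference, the helper being described boils down to something like
this (a sketch only - the patch body itself is not quoted above):

	static struct iova *to_iova(struct rb_node *node)
	{
		return rb_entry(node, struct iova, node);
	}

so that callers can write to_iova(node) instead of open-coding the
rb_entry() boilerplate everywhere.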


Re: [PATCH] iommu/dma: Resurrect the "forcedac" option

2021-03-18 Thread Joerg Roedel
On Fri, Mar 05, 2021 at 04:32:34PM +, Robin Murphy wrote:
> In converting intel-iommu over to the common IOMMU DMA ops, it quietly
> lost the functionality of its "forcedac" option. Since this is a handy
> thing both for testing and for performance optimisation on certain
> platforms, reimplement it under the common IOMMU parameter namespace.
> 
> For the sake of fixing the inadvertent breakage of the Intel-specific
> parameter, remove the dmar_forcedac remnants and hook it up as an alias
> while documenting the transition to the new common parameter.
> 
> Fixes: c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
> Signed-off-by: Robin Murphy 
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 15 ---
>  drivers/iommu/dma-iommu.c   | 13 -
>  drivers/iommu/intel/iommu.c |  5 ++---
>  include/linux/dma-iommu.h   |  2 ++
>  4 files changed, 24 insertions(+), 11 deletions(-)

Applied for v5.13, thanks.
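For reference, the common parameter is conceptually along these lines (a
sketch, not the applied patch):

	static bool iommu_dma_forcedac __read_mostly;

	static int __init iommu_dma_forcedac_setup(char *str)
	{
		int ret = kstrtobool(str, &iommu_dma_forcedac);

		if (!ret && iommu_dma_forcedac)
			pr_notice("Forcing DAC for PCI devices\n");
		return ret;
	}
	early_param("iommu.forcedac", iommu_dma_forcedac_setup);

with the old Intel-specific spelling kept as an alias that sets the same
flag.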


Re: [PATCH v5 0/2] Add Unisoc IOMMU basic driver

2021-03-18 Thread Joerg Roedel
On Fri, Mar 05, 2021 at 05:32:14PM +0800, Chunyan Zhang wrote:
>  .../devicetree/bindings/iommu/sprd,iommu.yaml |  57 ++
>  drivers/iommu/Kconfig |  12 +
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/sprd-iommu.c| 577 ++
>  4 files changed, 647 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/iommu/sprd,iommu.yaml
>  create mode 100644 drivers/iommu/sprd-iommu.c

Applied, thanks.


Re: [PATCH v3] iommu: Check dev->iommu in iommu_dev_xxx functions

2021-03-18 Thread Joerg Roedel
On Wed, Mar 03, 2021 at 05:36:11PM +, Shameer Kolothum wrote:
> The device iommu probe/attach might have failed leaving dev->iommu
> to NULL and device drivers may still invoke these functions resulting
> in a crash in iommu vendor driver code.
> 
> Hence make sure we check that.
> 
> Fixes: a3a195929d40 ("iommu: Add APIs for multiple domains per device")
> Signed-off-by: Shameer Kolothum 
> ---
> v2 --> v3
>  -Removed iommu_ops from struct dev_iommu.
> v1 --> v2:
>  -Added iommu_ops to struct dev_iommu based on the discussion with Robin.
>  -Rebased against iommu-tree core branch.
> ---
>  drivers/iommu/iommu.c | 24 +++-
>  1 file changed, 15 insertions(+), 9 deletions(-)

Applied, thanks.


Re: [PATCH 0/3] iommu/amd: Fix booting with amd_iommu=off

2021-03-18 Thread Joerg Roedel
On Wed, Mar 17, 2021 at 06:48:50PM +0800, Huang Rui wrote:
> Series are Acked-by: Huang Rui 

Thanks, series is applied for v5.12


Re: [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug

2021-03-18 Thread Joerg Roedel
On Tue, Mar 16, 2021 at 12:51:20PM +0200, Maxim Levitsky wrote:
> I agree but what is wrong with that? 
> This is a debug feature, and it only can be enabled by the root,
> and so someone might actually want this case to happen
> (e.g to see if a SEV guest can cope with extra #VC exceptions).

That doesn't make sense, we know that an SEV-ES guest can't cope with
extra #VC exceptions, so there is no point in testing this. It is more
of a way for the user to shoot themselves in the foot, and a potential
source of bug reports for SEV-ES guests.


> I have nothing against not allowing this for SEV-ES guests though.
> What do you think?

I think SEV-ES guests should only have the intercept bits set which the
guests actually support.

Regards,

Joerg


Re: [PATCH 2/5] iommu/vt-d: Remove WO permissions on second-level paging entries

2021-03-18 Thread Joerg Roedel
Hi,

On Mon, Mar 08, 2021 at 11:47:46AM -0800, Raj, Ashok wrote:
> That is the primary motivation, given that we have moved to 1st level for
> general IOVA, first level doesn't have a WO mapping. I didn't know enough
> about the history to determine if a WO without a READ is very useful. I
> guess the ZLR was invented to support those cases without a READ in PCIe. I

Okay, please update the commit message and re-send. I guess these
patches are 5.13 stuff. In that case, Baolu can include them into his
pull request later this cycle.

Regards,

Joerg


Re: [PATCH 2/3] iommu/amd: Don't call early_amd_iommu_init() when AMD IOMMU is disabled

2021-03-17 Thread Joerg Roedel
On Wed, Mar 17, 2021 at 01:37:16PM +, David Woodhouse wrote:
> If we can get to the point where we don't even need to check
> amd_iommu_irq_remap in the ...select() function because the IRQ domain
> is never even registered in the case where the flag ends up false, all
> the better :)

This should already be achieved with this patch :)

But the check is still needed if something goes wrong during IOMMU
initialization. In this case the IOMMUs are torn down and the memory
is freed. But the IRQ domains stay registered for now, mostly because
the upper-level APIs to register them lack a deregister function.

I haven't looked into the details yet whether it is sufficient to call
irq_domain_remove() on a domain created with
arch_create_remap_msi_irq_domain() for example. This needs more research
on my side :)

Regards,

Joerg


Re: [PATCH 2/3] iommu/amd: Don't call early_amd_iommu_init() when AMD IOMMU is disabled

2021-03-17 Thread Joerg Roedel
On Wed, Mar 17, 2021 at 11:47:11AM +, David Woodhouse wrote:
> If you've already moved the Stoney Ridge check out of the way, there's
> no real reason why you can't just set init_state=IOMMU_CMDLINE_DISABLED
> directly from parse_amd_iommu_options(), is there? Then you don't need
> the condition here at all?

True, there is even more room for optimization. The amd_iommu_disabled
variable can go away entirely, including its checks in
early_amd_iommu_init(). I will do a patch-set on-top of this for v5.13
which does more cleanups.
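E.g. the option parser can set the state machine directly, something
like this (sketch):

	static int __init parse_amd_iommu_options(char *str)
	{
		for (; *str; ++str) {
			if (strncmp(str, "off", 3) == 0)
				init_state = IOMMU_CMDLINE_DISABLED;
			if (strncmp(str, "force_isolation", 15) == 0)
				amd_iommu_force_isolation = true;
		}

		return 1;
	}

which makes the separate amd_iommu_disabled bookkeeping unnecessary.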

Thanks,

Joerg


[PATCH 2/3] iommu/amd: Don't call early_amd_iommu_init() when AMD IOMMU is disabled

2021-03-17 Thread Joerg Roedel
From: Joerg Roedel 

Don't even try to initialize the AMD IOMMU hardware when amd_iommu=off has been
passed on the kernel command line.

References: https://bugzilla.kernel.org/show_bug.cgi?id=212133
References: https://bugzilla.suse.com/show_bug.cgi?id=1183132
Cc: sta...@vger.kernel.org # v5.11
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd/init.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 3280e6f5b720..61dae1800b7f 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2919,12 +2919,12 @@ static int __init state_next(void)
}
break;
case IOMMU_IVRS_DETECTED:
-   ret = early_amd_iommu_init();
-   init_state = ret ? IOMMU_INIT_ERROR : IOMMU_ACPI_FINISHED;
-   if (init_state == IOMMU_ACPI_FINISHED && amd_iommu_disabled) {
-   pr_info("AMD IOMMU disabled\n");
+   if (amd_iommu_disabled) {
init_state = IOMMU_CMDLINE_DISABLED;
ret = -EINVAL;
+   } else {
+   ret = early_amd_iommu_init();
+   init_state = ret ? IOMMU_INIT_ERROR : IOMMU_ACPI_FINISHED;
}
break;
case IOMMU_ACPI_FINISHED:
-- 
2.30.2



[PATCH 3/3] iommu/amd: Keep track of amd_iommu_irq_remap state

2021-03-17 Thread Joerg Roedel
From: Joerg Roedel 

The amd_iommu_irq_remap variable is set to true in amd_iommu_prepare().
But if initialization fails it is not set to false. Fix that and
correctly keep track of whether irq remapping is enabled or not.

References: https://bugzilla.kernel.org/show_bug.cgi?id=212133
References: https://bugzilla.suse.com/show_bug.cgi?id=1183132
Fixes: b34f10c2dc59 ("iommu/amd: Stop irq_remapping_select() matching when remapping is disabled")
Cc: sta...@vger.kernel.org # v5.11
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd/init.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 61dae1800b7f..321f5906e6ed 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3002,8 +3002,11 @@ int __init amd_iommu_prepare(void)
amd_iommu_irq_remap = true;
 
ret = iommu_go_to_state(IOMMU_ACPI_FINISHED);
-   if (ret)
+   if (ret) {
+   amd_iommu_irq_remap = false;
return ret;
+   }
+
return amd_iommu_irq_remap ? 0 : -ENODEV;
 }
 
-- 
2.30.2



[PATCH 0/3] iommu/amd: Fix booting with amd_iommu=off

2021-03-17 Thread Joerg Roedel
From: Joerg Roedel 

Hi,

it turned out that booting a kernel with amd_iommu=off on a machine
that has an AMD IOMMU causes an early kernel crash. There are two
reasons for this, and fixing one of them is already sufficient, but
both reasons deserve fixing, which is done in this patch-set.

Regards,

Joerg

Joerg Roedel (3):
  iommu/amd: Move Stoney Ridge check to detect_ivrs()
  iommu/amd: Don't call early_amd_iommu_init() when AMD IOMMU is
disabled
  iommu/amd: Keep track of amd_iommu_irq_remap state

 drivers/iommu/amd/init.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

-- 
2.30.2



[PATCH 1/3] iommu/amd: Move Stoney Ridge check to detect_ivrs()

2021-03-17 Thread Joerg Roedel
From: Joerg Roedel 

The AMD IOMMU will not be enabled on AMD Stoney Ridge systems. Bail
out even earlier and refuse to even detect the IOMMU there.

References: https://bugzilla.kernel.org/show_bug.cgi?id=212133
References: https://bugzilla.suse.com/show_bug.cgi?id=1183132
Cc: sta...@vger.kernel.org # v5.11
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd/init.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 9126efcbaf2c..3280e6f5b720 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2714,7 +2714,6 @@ static int __init early_amd_iommu_init(void)
struct acpi_table_header *ivrs_base;
int i, remap_cache_sz, ret;
acpi_status status;
-   u32 pci_id;
 
if (!amd_iommu_detected)
return -ENODEV;
@@ -2804,16 +2803,6 @@ static int __init early_amd_iommu_init(void)
if (ret)
goto out;
 
-   /* Disable IOMMU if there's Stoney Ridge graphics */
-   for (i = 0; i < 32; i++) {
-   pci_id = read_pci_config(0, i, 0, 0);
-   if ((pci_id & 0xffff) == 0x1002 && (pci_id >> 16) == 0x98e4) {
-   pr_info("Disable IOMMU on Stoney Ridge\n");
-   amd_iommu_disabled = true;
-   break;
-   }
-   }
-
/* Disable any previously enabled IOMMUs */
if (!is_kdump_kernel() || amd_iommu_disabled)
disable_iommus();
@@ -2880,6 +2869,7 @@ static bool detect_ivrs(void)
 {
struct acpi_table_header *ivrs_base;
acpi_status status;
+   int i;
 
status = acpi_get_table("IVRS", 0, _base);
if (status == AE_NOT_FOUND)
@@ -2892,6 +2882,17 @@ static bool detect_ivrs(void)
 
acpi_put_table(ivrs_base);
 
+   /* Don't use IOMMU if there is Stoney Ridge graphics */
+   for (i = 0; i < 32; i++) {
+   u32 pci_id;
+
+   pci_id = read_pci_config(0, i, 0, 0);
+   if ((pci_id & 0xffff) == 0x1002 && (pci_id >> 16) == 0x98e4) {
+   pr_info("Disable IOMMU on Stoney Ridge\n");
+   return false;
+   }
+   }
+
/* Make sure ACS will be enabled during PCI probe */
pci_request_acs();
 
-- 
2.30.2



Re: [PATCH] iommu/amd: Fix iommu remap panic while amd_iommu is set to disable

2021-03-16 Thread Joerg Roedel
On Tue, Mar 16, 2021 at 09:36:02PM +0800, Huang Rui wrote:
> Thanks for the comments. Could you please elaborate on this?
> 
> Do you mean while amd_iommu=off, we won't prepare the IVRS, and even
> needn't get all the ACPI tables, because they are never used and the
> next state will always go into IOMMU_CMDLINE_DISABLED, am I right?

The first problem was that amd_iommu_irq_remap is never set back to
false when irq-remapping initialization fails in amd_iommu_prepare().

But there are other problems, like the fact that even when the IOMMU is
disabled on the command line with amd_iommu=off, the code still sets up
all IOMMUs and registers IRQ domains for them.

Later the code checks whether the IOMMU should stay disabled and tears
everything down, except for the IRQ domains, which stay in the global
list.

The APIs do not really support tearing down IRQ domains well, so it's
not easy to add this to the tear-down path. Now that the IRQ domains
stay in the list, the ACPI code comes along later and calls the
->select() call-back for every IRQ domain, which gets execution to
irq_remapping_select(), despite the IOMMU being disabled and
amd_iommu_rlookup_table already being de-allocated. But since
amd_iommu_irq_remap is still true, the NULL pointer is dereferenced,
causing the crash.

If the IRQ domains were not around, this would not happen. So my
patches also change the initialization to not do all the setup work
when amd_iommu=off was passed on the command line.

Regards,

Joerg


Re: [PATCH] iommu/amd: Fix iommu remap panic while amd_iommu is set to disable

2021-03-16 Thread Joerg Roedel
Hi Huang,

On Thu, Mar 11, 2021 at 10:28:07PM +0800, Huang Rui wrote:
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index f0adbc48fd17..a08e885403b7 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3862,7 +3862,7 @@ static int irq_remapping_select(struct irq_domain *d, struct irq_fwspec *fwspec,
>   else if (x86_fwspec_is_hpet(fwspec))
>   devid = get_hpet_devid(fwspec->param[0]);
>  
> - if (devid < 0)
> + if (devid < 0 || !amd_iommu_rlookup_table)
>   return 0;

The problem is deeper than this fix suggests. I prepared other fixes for
this particular problem. Please find them here:


https://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git/log/?h=iommu-fixes

Regards,

Joerg


Re: [PATCH 3/3] KVM: SVM: allow to intercept all exceptions for debug

2021-03-16 Thread Joerg Roedel
Hi Maxim,

On Tue, Mar 16, 2021 at 12:10:20AM +0200, Maxim Levitsky wrote:
> -static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> +static int (*svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {

Can you keep this const and always set the necessary handlers? If
exceptions are not intercepted they will not be used.
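I.e. keep the table const and wire the handler up unconditionally,
something like this (sketch; generic_exception_interception is a
placeholder name, and only a few vectors are shown):

	static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
		[SVM_EXIT_EXCP_BASE + DE_VECTOR] = generic_exception_interception,
		[SVM_EXIT_EXCP_BASE + OF_VECTOR] = generic_exception_interception,
		[SVM_EXIT_EXCP_BASE + BR_VECTOR] = generic_exception_interception,
	};

The entries are dead weight while the intercept bits are clear, so there
is no runtime cost for the non-debug case.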

> @@ -333,7 +334,9 @@ static inline void clr_exception_intercept(struct vcpu_svm *svm, u32 bit)
>   struct vmcb *vmcb = svm->vmcb01.ptr;
>  
>   WARN_ON_ONCE(bit >= 32);
> - vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
> +
> + if (!((1 << bit) & debug_intercept_exceptions))
> + vmcb_clr_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);

This will break SEV-ES guests, as those will not cause an intercept but
now start to get #VC exceptions on every other exception that is raised.
SEV-ES guests are not prepared for that and will not even boot, so
please don't enable this feature for them.



[PATCH v3 3/8] x86/boot/compressed/64: Reload CS in startup_32

2021-03-12 Thread Joerg Roedel
From: Joerg Roedel 

Exception handling in the startup_32 boot path requires the CS
selector to be correctly set up. Reload it from the current GDT.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/head_64.S | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index e94874f4bbc1..c59c80ca546d 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -107,9 +107,16 @@ SYM_FUNC_START(startup_32)
movl%eax, %gs
movl%eax, %ss
 
-/* setup a stack and make sure cpu supports long mode. */
+   /* Setup a stack and load CS from current GDT */
lealrva(boot_stack_end)(%ebp), %esp
 
+   pushl   $__KERNEL32_CS
+   lealrva(1f)(%ebp), %eax
+   pushl   %eax
+   lretl
+1:
+
+   /* Make sure cpu supports long mode. */
callverify_cpu
testl   %eax, %eax
jnz .Lno_longmode
-- 
2.30.1



[PATCH v3 2/8] x86/sev: Do not require Hypervisor CPUID bit for SEV guests

2021-03-12 Thread Joerg Roedel
From: Joerg Roedel 

A malicious hypervisor could disable the CPUID intercept for an SEV or
SEV-ES guest and trick it into the no-SEV boot path, where it could
potentially reveal secrets. This is not an issue for SEV-SNP guests,
as the CPUID intercept can't be disabled for those.

Remove the Hypervisor CPUID bit check from the SEV detection code to
protect against this kind of attack and add a Hypervisor bit equals
zero check to the SME detection path to prevent non-SEV guests from
trying to enable SME.

This handles the following cases:

1) SEV(-ES) guest where CPUID intercept is disabled. The guest
   will still see leaf 0x801f and the SEV bit. It can
   retrieve the C-bit and boot normally.

2) Non-SEV guests with intercepted CPUID will check SEV_STATUS
   MSR and find it 0 and will try to enable SME. This will
   fail when the guest finds MSR_K8_SYSCFG to be zero, as it
   is emulated by KVM. But we can't rely on that, as there
   might be other hypervisors which return this MSR with bit
   23 set. The Hypervisor bit check will prevent that the
   guest tries to enable SME in this case.

3) Non-SEV guests on SEV capable hosts with CPUID intercept
   disabled (by a malicious hypervisor) will try to boot into
   the SME path. This will fail, but it is also not considered
   a problem because non-encrypted guests have no protection
   against the hypervisor anyway.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/mem_encrypt.S |  6 -
 arch/x86/kernel/sev-es-shared.c|  6 +
 arch/x86/mm/mem_encrypt_identity.c | 35 ++
 3 files changed, 20 insertions(+), 27 deletions(-)

diff --git a/arch/x86/boot/compressed/mem_encrypt.S b/arch/x86/boot/compressed/mem_encrypt.S
index aa561795efd1..a6dea4e8a082 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -23,12 +23,6 @@ SYM_FUNC_START(get_sev_encryption_bit)
push%ecx
push%edx
 
-   /* Check if running under a hypervisor */
-   movl$1, %eax
-   cpuid
-   bt  $31, %ecx   /* Check the hypervisor bit */
-   jnc .Lno_sev
-
movl    $0x80000000, %eax   /* CPUID to check the highest leaf */
cpuid
cmpl    $0x8000001f, %eax   /* See if 0x8000001f is available */
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index cdc04d091242..387b71669818 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -186,7 +186,6 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
 * make it accessible to the hypervisor.
 *
 * In particular, check for:
-*  - Hypervisor CPUID bit
 *  - Availability of CPUID leaf 0x8000001f
 *  - SEV CPUID bit.
 *
@@ -194,10 +193,7 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
 * can't be checked here.
 */
 
-   if ((fn == 1 && !(regs->cx & BIT(31))))
-   /* Hypervisor bit */
-   goto fail;
-   else if (fn == 0x80000000 && (regs->ax < 0x8000001f))
+   if (fn == 0x80000000 && (regs->ax < 0x8000001f))
/* SEV leaf check */
goto fail;
else if ((fn == 0x8000001f && !(regs->ax & BIT(1))))
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 6c5eb6f3f14f..a19374d26101 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -503,14 +503,10 @@ void __init sme_enable(struct boot_params *bp)
 
#define AMD_SME_BIT   BIT(0)
#define AMD_SEV_BIT   BIT(1)
-   /*
-* Set the feature mask (SME or SEV) based on whether we are
-* running under a hypervisor.
-*/
-   eax = 1;
-   ecx = 0;
-   native_cpuid(, , , );
-   feature_mask = (ecx & BIT(31)) ? AMD_SEV_BIT : AMD_SME_BIT;
+
+   /* Check the SEV MSR whether SEV or SME is enabled */
+   sev_status   = __rdmsr(MSR_AMD64_SEV);
+   feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
 
/*
 * Check for the SME/SEV feature:
@@ -530,19 +526,26 @@ void __init sme_enable(struct boot_params *bp)
 
/* Check if memory encryption is enabled */
if (feature_mask == AMD_SME_BIT) {
+   /*
+* No SME if Hypervisor bit is set. This check is here to
+* prevent a guest from trying to enable SME. For running as a
+* KVM guest the MSR_K8_SYSCFG will be sufficient, but there
+* might be other hypervisors which emulate that MSR as non-zero
+* or even pass it through to the guest.
+* A malicious hy

[PATCH v3 0/8] x86/seves: Support 32-bit boot path and other updates

2021-03-12 Thread Joerg Roedel
From: Joerg Roedel 

Hi,

these patches add support for the 32-bit boot in the decompressor
code. This is needed to boot an SEV-ES guest on some firmware and grub
versions. The patches also add the necessary CPUID sanity checks and a
32-bit version of the C-bit check.

Other updates included here:

1. Add code to shut down exception handling in the
   decompressor code before jumping to the real kernel.
   Once in the real kernel it is not safe anymore to jump
   back to the decompressor code via exceptions.

2. Replace open-coded hlt loops with proper calls to
   sev_es_terminate().

Please review.

Thanks,

Joerg

Changes v2->v3:

- Added a patch to remove the check for the Hypervisor CPUID
  bit for detecting SEV

Changes v1->v2:

- Addressed Boris' review comments.
- Fixed a bug which caused the cbit-check to never be
  executed even in an SEV guest.

Joerg Roedel (8):
  x86/boot/compressed/64: Cleanup exception handling before booting
kernel
  x86/sev: Do not require Hypervisor CPUID bit for SEV guests
  x86/boot/compressed/64: Reload CS in startup_32
  x86/boot/compressed/64: Setup IDT in startup_32 boot path
  x86/boot/compressed/64: Add 32-bit boot #VC handler
  x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path
  x86/boot/compressed/64: Check SEV encryption in 32-bit boot-path
  x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()

 arch/x86/boot/compressed/head_64.S | 170 -
 arch/x86/boot/compressed/idt_64.c  |  14 ++
 arch/x86/boot/compressed/mem_encrypt.S | 130 ++-
 arch/x86/boot/compressed/misc.c|   7 +-
 arch/x86/boot/compressed/misc.h|   6 +
 arch/x86/boot/compressed/sev-es.c  |  12 +-
 arch/x86/kernel/sev-es-shared.c|  16 +--
 arch/x86/mm/mem_encrypt_identity.c |  35 ++---
 8 files changed, 340 insertions(+), 50 deletions(-)

-- 
2.30.1



[PATCH v3 6/8] x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path

2021-03-12 Thread Joerg Roedel
From: Joerg Roedel 

The 32-bit #VC handler has no GHCB and can only handle CPUID exit codes.
It is needed by the early boot code to handle #VC exceptions raised in
verify_cpu() and to get the position of the C bit.

But the CPUID information comes from the hypervisor, which is untrusted
and might return results which trick the guest into the no-SEV boot path
with no C bit set in the page-tables. All data written to memory would
then be unencrypted and could leak sensitive data to the hypervisor.

Add sanity checks to the 32-bit boot #VC handler to make sure the
hypervisor does not pretend that SEV is not enabled.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/mem_encrypt.S | 28 ++
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/boot/compressed/mem_encrypt.S b/arch/x86/boot/compressed/mem_encrypt.S
index e915f4906477..0211eddefeb0 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -139,6 +139,26 @@ SYM_CODE_START(startup32_vc_handler)
jnz .Lfail
movl%edx, 0(%esp)   # Store result
 
+   /*
+* Sanity check CPUID results from the Hypervisor. See comment in
+* do_vc_no_ghcb() for more details on why this is necessary.
+*/
+
+   /* Fail if SEV leaf not available in CPUID[0x80000000].EAX */
+   cmpl    $0x80000000, %ebx
+   jne     .Lcheck_sev
+   cmpl    $0x8000001f, 12(%esp)
+   jb      .Lfail
+   jmp     .Ldone
+
+.Lcheck_sev:
+   /* Fail if SEV bit not set in CPUID[0x8000001f].EAX[1] */
+   cmpl    $0x8000001f, %ebx
+   jne     .Ldone
+   btl     $1, 12(%esp)
+   jnc     .Lfail
+
+.Ldone:
popl%edx
popl%ecx
popl%ebx
@@ -152,6 +172,14 @@ SYM_CODE_START(startup32_vc_handler)
 
iret
 .Lfail:
+   /* Send terminate request to Hypervisor */
+   movl$0x100, %eax
+   xorl%edx, %edx
+   movl$MSR_AMD64_SEV_ES_GHCB, %ecx
+   wrmsr
+   rep; vmmcall
+
+   /* If request fails, go to hlt loop */
hlt
jmp .Lfail
 SYM_CODE_END(startup32_vc_handler)
-- 
2.30.1



[PATCH v3 5/8] x86/boot/compressed/64: Add 32-bit boot #VC handler

2021-03-12 Thread Joerg Roedel
From: Joerg Roedel 

Add a #VC exception handler which is used when the kernel still executes
in protected mode. This boot-path already uses CPUID, which will cause #VC
exceptions in an SEV-ES guest.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/head_64.S |  6 ++
 arch/x86/boot/compressed/mem_encrypt.S | 96 +-
 2 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2001c3bf0748..ee448aedb8b0 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "pgtable.h"
 
 /*
@@ -857,6 +858,11 @@ SYM_FUNC_END(startup32_set_idt_entry)
 
 SYM_FUNC_START(startup32_load_idt)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
+   /* #VC handler */
+   lealrva(startup32_vc_handler)(%ebp), %eax
+   movl$X86_TRAP_VC, %edx
+   callstartup32_set_idt_entry
+
/* Load IDT */
lealrva(boot32_idt)(%ebp), %eax
movl%eax, rva(boot32_idt_desc+2)(%ebp)
diff --git a/arch/x86/boot/compressed/mem_encrypt.S b/arch/x86/boot/compressed/mem_encrypt.S
index a6dea4e8a082..e915f4906477 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -61,10 +61,104 @@ SYM_FUNC_START(get_sev_encryption_bit)
ret
 SYM_FUNC_END(get_sev_encryption_bit)
 
+/**
+ * sev_es_req_cpuid - Request a CPUID value from the Hypervisor using
+ *   the GHCB MSR protocol
+ *
+ * @%eax:  Register to request (0=EAX, 1=EBX, 2=ECX, 3=EDX)
+ * @%edx:  CPUID Function
+ *
+ * Returns 0 in %eax on success, non-zero on failure
+ * %edx returns CPUID value on success
+ */
+SYM_CODE_START_LOCAL(sev_es_req_cpuid)
+   shll$30, %eax
+   orl     $0x00000004, %eax
+   movl$MSR_AMD64_SEV_ES_GHCB, %ecx
+   wrmsr
+   rep; vmmcall# VMGEXIT
+   rdmsr
+
+   /* Check response */
+   movl%eax, %ecx
+   andl    $0x3ffff000, %ecx   # Bits [12-29] MBZ
+   jnz 2f
+
+   /* Check return code */
+   andl$0xfff, %eax
+   cmpl$5, %eax
+   jne 2f
+
+   /* All good - return success */
+   xorl%eax, %eax
+1:
+   ret
+2:
+   movl$-1, %eax
+   jmp 1b
+SYM_CODE_END(sev_es_req_cpuid)
+
+SYM_CODE_START(startup32_vc_handler)
+   pushl   %eax
+   pushl   %ebx
+   pushl   %ecx
+   pushl   %edx
+
+   /* Keep CPUID function in %ebx */
+   movl%eax, %ebx
+
+   /* Check if error-code == SVM_EXIT_CPUID */
+   cmpl$0x72, 16(%esp)
+   jne .Lfail
+
+   movl$0, %eax# Request CPUID[fn].EAX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 12(%esp)  # Store result
+
+   movl$1, %eax# Request CPUID[fn].EBX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 8(%esp)   # Store result
+
+   movl$2, %eax# Request CPUID[fn].ECX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 4(%esp)   # Store result
+
+   movl$3, %eax# Request CPUID[fn].EDX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 0(%esp)   # Store result
+
+   popl%edx
+   popl%ecx
+   popl%ebx
+   popl%eax
+
+   /* Remove error code */
+   addl$4, %esp
+
+   /* Jump over CPUID instruction */
+   addl$2, (%esp)
+
+   iret
+.Lfail:
+   hlt
+   jmp .Lfail
+SYM_CODE_END(startup32_vc_handler)
+
.code64
 
 #include "../../kernel/sev_verify_cbit.S"
-
 SYM_FUNC_START(set_sev_encryption_mask)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
push%rbp
-- 
2.30.1



[PATCH v3 4/8] x86/boot/compressed/64: Setup IDT in startup_32 boot path

2021-03-12 Thread Joerg Roedel
From: Joerg Roedel 

This boot path needs exception handling when it is used with SEV-ES.
Setup an IDT and provide a helper function to write IDT entries for
use in 32-bit protected mode.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/head_64.S | 72 ++
 1 file changed, 72 insertions(+)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index c59c80ca546d..2001c3bf0748 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -116,6 +116,9 @@ SYM_FUNC_START(startup_32)
lretl
 1:
 
+   /* Setup Exception handling for SEV-ES */
+   callstartup32_load_idt
+
/* Make sure cpu supports long mode. */
callverify_cpu
testl   %eax, %eax
@@ -701,6 +704,19 @@ SYM_DATA_START(boot_idt)
.endr
 SYM_DATA_END_LABEL(boot_idt, SYM_L_GLOBAL, boot_idt_end)
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+SYM_DATA_START(boot32_idt_desc)
+   .word   boot32_idt_end - boot32_idt - 1
+   .long   0
+SYM_DATA_END(boot32_idt_desc)
+   .balign 8
+SYM_DATA_START(boot32_idt)
+   .rept 32
+   .quad 0
+   .endr
+SYM_DATA_END_LABEL(boot32_idt, SYM_L_GLOBAL, boot32_idt_end)
+#endif
+
 #ifdef CONFIG_EFI_STUB
 SYM_DATA(image_offset, .long 0)
 #endif
@@ -793,6 +809,62 @@ SYM_DATA_START_LOCAL(loaded_image_proto)
 SYM_DATA_END(loaded_image_proto)
 #endif
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   __HEAD
+   .code32
+/*
+ * Write an IDT entry into boot32_idt
+ *
+ * Parameters:
+ *
+ * %eax:   Handler address
+ * %edx:   Vector number
+ *
+ * Physical offset is expected in %ebp
+ */
+SYM_FUNC_START(startup32_set_idt_entry)
+   push%ebx
+   push%ecx
+
+   /* IDT entry address to %ebx */
+   lealrva(boot32_idt)(%ebp), %ebx
+   shl $3, %edx
+   addl%edx, %ebx
+
+   /* Build IDT entry, lower 4 bytes */
+   movl%eax, %edx
+   andl    $0xffff, %edx   # Target code segment offset [15:0]
+   movl$__KERNEL32_CS, %ecx# Target code segment selector
+   shl $16, %ecx
+   orl %ecx, %edx
+
+   /* Store lower 4 bytes to IDT */
+   movl%edx, (%ebx)
+
+   /* Build IDT entry, upper 4 bytes */
+   movl%eax, %edx
+   andl    $0xffff0000, %edx   # Target code segment offset [31:16]
+   orl     $0x00008e00, %edx   # Present, Type 32-bit Interrupt Gate
+
+   /* Store upper 4 bytes to IDT */
+   movl%edx, 4(%ebx)
+
+   pop %ecx
+   pop %ebx
+   ret
+SYM_FUNC_END(startup32_set_idt_entry)
+#endif
+
+SYM_FUNC_START(startup32_load_idt)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   /* Load IDT */
+   lealrva(boot32_idt)(%ebp), %eax
+   movl%eax, rva(boot32_idt_desc+2)(%ebp)
+   lidtrva(boot32_idt_desc)(%ebp)
+#endif
+   ret
+SYM_FUNC_END(startup32_load_idt)
+
 /*
  * Stack and heap for uncompression
  */
-- 
2.30.1



[PATCH v3 7/8] x86/boot/compressed/64: Check SEV encryption in 32-bit boot-path

2021-03-12 Thread Joerg Roedel
From: Joerg Roedel 

Check whether the hypervisor reported the correct C-bit when running as
an SEV guest. Using a wrong C-bit position could be used to leak
sensitive data from the guest to the hypervisor.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/head_64.S | 83 ++
 1 file changed, 83 insertions(+)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index ee448aedb8b0..7c5c2698a96e 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -183,11 +183,21 @@ SYM_FUNC_START(startup_32)
 */
callget_sev_encryption_bit
xorl%edx, %edx
+#ifdef CONFIG_AMD_MEM_ENCRYPT
testl   %eax, %eax
jz  1f
subl$32, %eax   /* Encryption bit is always above bit 31 */
bts %eax, %edx  /* Set encryption mask for page tables */
+   /*
+* Mark SEV as active in sev_status so that startup32_check_sev_cbit()
+* will do a check. The sev_status memory will be fully initialized
+* with the contents of MSR_AMD_SEV_STATUS later in
+* set_sev_encryption_mask(). For now it is sufficient to know that SEV
+* is active.
+*/
+   movl$1, rva(sev_status)(%ebp)
 1:
+#endif
 
/* Initialize Page tables to 0 */
lealrva(pgtable)(%ebx), %edi
@@ -272,6 +282,9 @@ SYM_FUNC_START(startup_32)
movl%esi, %edx
 1:
 #endif
+   /* Check if the C-bit position is correct when SEV is active */
+   callstartup32_check_sev_cbit
+
pushl   $__KERNEL_CS
pushl   %eax
 
@@ -871,6 +884,76 @@ SYM_FUNC_START(startup32_load_idt)
ret
 SYM_FUNC_END(startup32_load_idt)
 
+/*
+ * Check for the correct C-bit position when the startup_32 boot-path is used.
+ *
+ * The check makes use of the fact that all memory is encrypted when paging is
+ * disabled. The function creates 64 bits of random data using the RDRAND
+ * instruction. RDRAND is mandatory for SEV guests, so always available. If the
+ * hypervisor violates that the kernel will crash right here.
+ *
+ * The 64 bits of random data are stored to a memory location and at the same
+ * time kept in the %eax and %ebx registers. Since encryption is always active
+ * when paging is off the random data will be stored encrypted in main memory.
+ *
+ * Then paging is enabled. When the C-bit position is correct all memory is
+ * still mapped encrypted and comparing the register values with memory will
+ * succeed. An incorrect C-bit position will map all memory unencrypted, so that
+ * the compare will use the encrypted random data and fail.
+ */
+SYM_FUNC_START(startup32_check_sev_cbit)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   pushl   %eax
+   pushl   %ebx
+   pushl   %ecx
+   pushl   %edx
+
+   /* Check for non-zero sev_status */
+   movlrva(sev_status)(%ebp), %eax
+   testl   %eax, %eax
+   jz  4f
+
+   /*
+* Get two 32-bit random values - Don't bail out if RDRAND fails
+* because it is better to prevent forward progress if no random value
+* can be gathered.
+*/
+1: rdrand  %eax
+   jnc 1b
+2: rdrand  %ebx
+   jnc 2b
+
+   /* Store to memory and keep it in the registers */
+   movl%eax, rva(sev_check_data)(%ebp)
+   movl%ebx, rva(sev_check_data+4)(%ebp)
+
+   /* Enable paging to see if encryption is active */
+   movl%cr0, %edx  /* Backup %cr0 in %edx */
+   movl    $(X86_CR0_PG | X86_CR0_PE), %ecx /* Enable Paging and Protected mode */
+   movl%ecx, %cr0
+
+   cmpl%eax, rva(sev_check_data)(%ebp)
+   jne 3f
+   cmpl%ebx, rva(sev_check_data+4)(%ebp)
+   jne 3f
+
+   movl%edx, %cr0  /* Restore previous %cr0 */
+
+   jmp 4f
+
+3: /* Check failed - hlt the machine */
+   hlt
+   jmp 3b
+
+4:
+   popl%edx
+   popl%ecx
+   popl%ebx
+   popl%eax
+#endif
+   ret
+SYM_FUNC_END(startup32_check_sev_cbit)
+
 /*
  * Stack and heap for uncompression
  */
-- 
2.30.1



[PATCH v3 1/8] x86/boot/compressed/64: Cleanup exception handling before booting kernel

2021-03-12 Thread Joerg Roedel
From: Joerg Roedel 

Disable the exception handling before booting the kernel to make sure
any exceptions that happen during early kernel boot are not directed to
the pre-decompression code.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/idt_64.c | 14 ++
 arch/x86/boot/compressed/misc.c   |  7 ++-
 arch/x86/boot/compressed/misc.h   |  6 ++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
index 804a502ee0d2..9b93567d663a 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -52,3 +52,17 @@ void load_stage2_idt(void)
 
load_boot_idt(&boot_idt_desc);
 }
+
+void cleanup_exception_handling(void)
+{
+   /*
+* Flush GHCB from cache and map it encrypted again when running as
+* SEV-ES guest.
+*/
+   sev_es_shutdown_ghcb();
+
+   /* Set a null-idt, disabling #PF and #VC handling */
+   boot_idt_desc.size    = 0;
+   boot_idt_desc.address = 0;
+   load_boot_idt(&boot_idt_desc);
+}
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 267e7f93050e..cc9fd0e8766a 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -443,11 +443,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
handle_relocations(output, output_len, virt_addr);
debug_putstr("done.\nBooting the kernel.\n");
 
-   /*
-* Flush GHCB from cache and map it encrypted again when running as
-* SEV-ES guest.
-*/
-   sev_es_shutdown_ghcb();
+   /* Disable exception handling before booting the kernel */
+   cleanup_exception_handling();
 
return output;
 }
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 901ea5ebec22..e5612f035498 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -155,6 +155,12 @@ extern pteval_t __default_kernel_pte_mask;
 extern gate_desc boot_idt[BOOT_IDT_ENTRIES];
 extern struct desc_ptr boot_idt_desc;
 
+#ifdef CONFIG_X86_64
+void cleanup_exception_handling(void);
+#else
+static inline void cleanup_exception_handling(void) { }
+#endif
+
 /* IDT Entry Points */
 void boot_page_fault(void);
 void boot_stage1_vc(void);
-- 
2.30.1



[PATCH v3 8/8] x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()

2021-03-12 Thread Joerg Roedel
From: Joerg Roedel 

There are a few places left in the SEV-ES C code where hlt loops and/or
terminate requests are implemented. Replace them all with calls to
sev_es_terminate().

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/sev-es.c | 12 +++-
 arch/x86/kernel/sev-es-shared.c   | 10 +++---
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
index 27826c265aab..d904bd56b3e3 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -200,14 +200,8 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
}
 
 finish:
-   if (result == ES_OK) {
+   if (result == ES_OK)
vc_finish_insn();
-   } else if (result != ES_RETRY) {
-   /*
-* For now, just halt the machine. That makes debugging easier,
-* later we just call sev_es_terminate() here.
-*/
-   while (true)
-   asm volatile("hlt\n");
-   }
+   else if (result != ES_RETRY)
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
 }
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index 387b71669818..0aa9f13efd57 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -24,7 +24,7 @@ static bool __init sev_es_check_cpu_features(void)
return true;
 }
 
-static void sev_es_terminate(unsigned int reason)
+static void __noreturn sev_es_terminate(unsigned int reason)
 {
u64 val = GHCB_SEV_TERMINATE;
 
@@ -206,12 +206,8 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
return;
 
 fail:
-   sev_es_wr_ghcb_msr(GHCB_SEV_TERMINATE);
-   VMGEXIT();
-
-   /* Shouldn't get here - if we do halt the machine */
-   while (true)
-   asm volatile("hlt\n");
+   /* Terminate the guest */
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
 }
 
 static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,
-- 
2.30.1



[PATCH v2 5/7] x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path

2021-03-10 Thread Joerg Roedel
From: Joerg Roedel 

The 32-bit #VC handler has no GHCB and can only handle CPUID exit codes.
It is needed by the early boot code to handle #VC exceptions raised in
verify_cpu() and to get the position of the C bit.

But the CPUID information comes from the hypervisor, which is untrusted
and might return results which trick the guest into the no-SEV boot path
with no C bit set in the page-tables. All data written to memory would
then be unencrypted and could leak sensitive data to the hypervisor.

Add sanity checks to the 32-bit boot #VC handler to make sure the
hypervisor does not pretend that SEV is not enabled.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/mem_encrypt.S | 36 ++
 1 file changed, 36 insertions(+)

diff --git a/arch/x86/boot/compressed/mem_encrypt.S b/arch/x86/boot/compressed/mem_encrypt.S
index 2ca056a3707c..8941c3a8ff8a 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -145,6 +145,34 @@ SYM_CODE_START(startup32_vc_handler)
jnz .Lfail
movl%edx, 0(%esp)   # Store result
 
+   /*
+* Sanity check CPUID results from the Hypervisor. See comment in
+* do_vc_no_ghcb() for more details on why this is necessary.
+*/
+
+   /* Fail if Hypervisor bit not set in CPUID[1].ECX[31] */
+   cmpl$1, %ebx
+   jne .Lcheck_leaf
+   btl $31, 4(%esp)
+   jnc .Lfail
+   jmp .Ldone
+
+.Lcheck_leaf:
+   /* Fail if SEV leaf not available in CPUID[0x80000000].EAX */
+   cmpl    $0x80000000, %ebx
+   jne     .Lcheck_sev
+   cmpl    $0x8000001f, 12(%esp)
+   jb      .Lfail
+   jmp     .Ldone
+
+.Lcheck_sev:
+   /* Fail if SEV bit not set in CPUID[0x8000001f].EAX[1] */
+   cmpl    $0x8000001f, %ebx
+   jne     .Ldone
+   btl     $1, 12(%esp)
+   jnc     .Lfail
+
+.Ldone:
popl%edx
popl%ecx
popl%ebx
@@ -158,6 +186,14 @@ SYM_CODE_START(startup32_vc_handler)
 
iret
 .Lfail:
+   /* Send terminate request to Hypervisor */
+   movl$0x100, %eax
+   xorl%edx, %edx
+   movl$MSR_AMD64_SEV_ES_GHCB, %ecx
+   wrmsr
+   rep; vmmcall
+
+   /* If request fails, go to hlt loop */
hlt
jmp .Lfail
 SYM_CODE_END(startup32_vc_handler)
-- 
2.30.1



[PATCH v2 6/7] x86/boot/compressed/64: Check SEV encryption in 32-bit boot-path

2021-03-10 Thread Joerg Roedel
From: Joerg Roedel 

Check whether the hypervisor reported the correct C-bit when running as
an SEV guest. Using a wrong C-bit position could be used to leak
sensitive data from the guest to the hypervisor.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/head_64.S | 83 ++
 1 file changed, 83 insertions(+)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index ee448aedb8b0..7c5c2698a96e 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -183,11 +183,21 @@ SYM_FUNC_START(startup_32)
 */
callget_sev_encryption_bit
xorl%edx, %edx
+#ifdef CONFIG_AMD_MEM_ENCRYPT
testl   %eax, %eax
jz  1f
subl$32, %eax   /* Encryption bit is always above bit 31 */
bts %eax, %edx  /* Set encryption mask for page tables */
+   /*
+* Mark SEV as active in sev_status so that startup32_check_sev_cbit()
+* will do a check. The sev_status memory will be fully initialized
+* with the contents of MSR_AMD_SEV_STATUS later in
+* set_sev_encryption_mask(). For now it is sufficient to know that SEV
+* is active.
+*/
+   movl$1, rva(sev_status)(%ebp)
 1:
+#endif
 
/* Initialize Page tables to 0 */
lealrva(pgtable)(%ebx), %edi
@@ -272,6 +282,9 @@ SYM_FUNC_START(startup_32)
movl%esi, %edx
 1:
 #endif
+   /* Check if the C-bit position is correct when SEV is active */
+   callstartup32_check_sev_cbit
+
pushl   $__KERNEL_CS
pushl   %eax
 
@@ -871,6 +884,76 @@ SYM_FUNC_START(startup32_load_idt)
ret
 SYM_FUNC_END(startup32_load_idt)
 
+/*
+ * Check for the correct C-bit position when the startup_32 boot-path is used.
+ *
+ * The check makes use of the fact that all memory is encrypted when paging is
+ * disabled. The function creates 64 bits of random data using the RDRAND
+ * instruction. RDRAND is mandatory for SEV guests, so always available. If the
+ * hypervisor violates that, the kernel will crash right here.
+ *
+ * The 64 bits of random data are stored to a memory location and at the same
+ * time kept in the %eax and %ebx registers. Since encryption is always active
+ * when paging is off the random data will be stored encrypted in main memory.
+ *
+ * Then paging is enabled. When the C-bit position is correct all memory is
+ * still mapped encrypted and comparing the register values with memory will
+ * succeed. An incorrect C-bit position will map all memory unencrypted, so that
+ * the compare will use the encrypted random data and fail.
+ */
+SYM_FUNC_START(startup32_check_sev_cbit)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   pushl   %eax
+   pushl   %ebx
+   pushl   %ecx
+   pushl   %edx
+
+   /* Check for non-zero sev_status */
+   movlrva(sev_status)(%ebp), %eax
+   testl   %eax, %eax
+   jz  4f
+
+   /*
+* Get two 32-bit random values - Don't bail out if RDRAND fails
+* because it is better to prevent forward progress if no random value
+* can be gathered.
+*/
+1: rdrand  %eax
+   jnc 1b
+2: rdrand  %ebx
+   jnc 2b
+
+   /* Store to memory and keep it in the registers */
+   movl%eax, rva(sev_check_data)(%ebp)
+   movl%ebx, rva(sev_check_data+4)(%ebp)
+
+   /* Enable paging to see if encryption is active */
+   movl%cr0, %edx  /* Backup %cr0 in %edx */
+	movl	$(X86_CR0_PG | X86_CR0_PE), %ecx /* Enable Paging and Protected mode */
+   movl%ecx, %cr0
+
+   cmpl%eax, rva(sev_check_data)(%ebp)
+   jne 3f
+   cmpl%ebx, rva(sev_check_data+4)(%ebp)
+   jne 3f
+
+   movl%edx, %cr0  /* Restore previous %cr0 */
+
+   jmp 4f
+
+3: /* Check failed - hlt the machine */
+   hlt
+   jmp 3b
+
+4:
+   popl%edx
+   popl%ecx
+   popl%ebx
+   popl%eax
+#endif
+   ret
+SYM_FUNC_END(startup32_check_sev_cbit)
+
 /*
  * Stack and heap for uncompression
  */
-- 
2.30.1
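
The algorithm described in the comment block above, as a hedged C-like
sketch. rdrand32(), enable_paging(), disable_paging() and halt() are
symbolic stand-ins for the RDRAND retry loops and the CR0 writes in the
assembly; they are not real kernel functions.

#include <stdbool.h>
#include <stdint.h>

extern uint64_t sev_status;
extern uint32_t sev_check_data[2];
extern bool rdrand32(uint32_t *out);	/* rdrand + carry-flag check */
extern void enable_paging(void);	/* set CR0.PG | CR0.PE */
extern void disable_paging(void);	/* restore the previous CR0 */
extern void halt(void);			/* hlt */

static void check_sev_cbit(void)
{
	uint32_t lo, hi;

	if (!sev_status)		/* SEV not active - nothing to check */
		return;

	while (!rdrand32(&lo))		/* no forward progress without entropy */
		;
	while (!rdrand32(&hi))
		;

	/* Paging is off here, so the stores always hit memory encrypted */
	sev_check_data[0] = lo;
	sev_check_data[1] = hi;

	enable_paging();		/* page tables carry the claimed C bit */

	/*
	 * Correct C bit: the reads decrypt back to lo/hi.
	 * Wrong C bit: the reads return ciphertext and the compare fails.
	 */
	if (sev_check_data[0] != lo || sev_check_data[1] != hi)
		while (1)
			halt();

	disable_paging();
}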



[PATCH v2 2/7] x86/boot/compressed/64: Reload CS in startup_32

2021-03-10 Thread Joerg Roedel
From: Joerg Roedel 

Exception handling in the startup_32 boot path requires the CS
selector to be correctly set up. Reload it from the current GDT.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/head_64.S | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index e94874f4bbc1..c59c80ca546d 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -107,9 +107,16 @@ SYM_FUNC_START(startup_32)
movl%eax, %gs
movl%eax, %ss
 
-/* setup a stack and make sure cpu supports long mode. */
+   /* Setup a stack and load CS from current GDT */
lealrva(boot_stack_end)(%ebp), %esp
 
+   pushl   $__KERNEL32_CS
+   lealrva(1f)(%ebp), %eax
+   pushl   %eax
+   lretl
+1:
+
+   /* Make sure cpu supports long mode. */
callverify_cpu
testl   %eax, %eax
jnz .Lno_longmode
-- 
2.30.1



[PATCH v2 0/7] x86/seves: Support 32-bit boot path and other updates

2021-03-10 Thread Joerg Roedel
From: Joerg Roedel 

Hi,

these patches add support for the 32-bit boot in the decompressor
code. This is needed to boot an SEV-ES guest on some firmware and grub
versions. The patches also add the necessary CPUID sanity checks and a
32-bit version of the C-bit check.

Other updates included here:

1. Add code to shut down exception handling in the
   decompressor code before jumping to the real kernel.
   Once in the real kernel it is not safe anymore to jump
   back to the decompressor code via exceptions.

2. Replace open-coded hlt loops with proper calls to
   sev_es_terminate().

Please review.

Thanks,

Joerg

Changes to v1:

- Addressed Boris' review comments.
- Fixed a bug which caused the cbit-check to never be
  executed even in an SEV guest.

Joerg Roedel (7):
  x86/boot/compressed/64: Cleanup exception handling before booting
kernel
  x86/boot/compressed/64: Reload CS in startup_32
  x86/boot/compressed/64: Setup IDT in startup_32 boot path
  x86/boot/compressed/64: Add 32-bit boot #VC handler
  x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path
  x86/boot/compressed/64: Check SEV encryption in 32-bit boot-path
  x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()

 arch/x86/boot/compressed/head_64.S | 170 -
 arch/x86/boot/compressed/idt_64.c  |  14 ++
 arch/x86/boot/compressed/mem_encrypt.S | 132 ++-
 arch/x86/boot/compressed/misc.c|   7 +-
 arch/x86/boot/compressed/misc.h|   6 +
 arch/x86/boot/compressed/sev-es.c  |  12 +-
 arch/x86/kernel/sev-es-shared.c|  10 +-
 7 files changed, 328 insertions(+), 23 deletions(-)

-- 
2.30.1



[PATCH v2 4/7] x86/boot/compressed/64: Add 32-bit boot #VC handler

2021-03-10 Thread Joerg Roedel
From: Joerg Roedel 

Add a #VC exception handler which is used when the kernel still executes
in protected mode. This boot-path already uses CPUID, which will cause #VC
exceptions in an SEV-ES guest.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/head_64.S |  6 ++
 arch/x86/boot/compressed/mem_encrypt.S | 96 +-
 2 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2001c3bf0748..ee448aedb8b0 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "pgtable.h"
 
 /*
@@ -857,6 +858,11 @@ SYM_FUNC_END(startup32_set_idt_entry)
 
 SYM_FUNC_START(startup32_load_idt)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
+   /* #VC handler */
+   lealrva(startup32_vc_handler)(%ebp), %eax
+   movl$X86_TRAP_VC, %edx
+   callstartup32_set_idt_entry
+
/* Load IDT */
lealrva(boot32_idt)(%ebp), %eax
movl%eax, rva(boot32_idt_desc+2)(%ebp)
diff --git a/arch/x86/boot/compressed/mem_encrypt.S b/arch/x86/boot/compressed/mem_encrypt.S
index aa561795efd1..2ca056a3707c 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -67,10 +67,104 @@ SYM_FUNC_START(get_sev_encryption_bit)
ret
 SYM_FUNC_END(get_sev_encryption_bit)
 
+/**
+ * sev_es_req_cpuid - Request a CPUID value from the Hypervisor using
+ *   the GHCB MSR protocol
+ *
+ * @%eax:  Register to request (0=EAX, 1=EBX, 2=ECX, 3=EDX)
+ * @%edx:  CPUID Function
+ *
+ * Returns 0 in %eax on success, non-zero on failure
+ * %edx returns CPUID value on success
+ */
+SYM_CODE_START_LOCAL(sev_es_req_cpuid)
+   shll$30, %eax
+   orl $0x0004, %eax
+   movl$MSR_AMD64_SEV_ES_GHCB, %ecx
+   wrmsr
+   rep; vmmcall# VMGEXIT
+   rdmsr
+
+   /* Check response */
+   movl%eax, %ecx
+	andl	$0x3ffff000, %ecx	# Bits [12-29] MBZ
+   jnz 2f
+
+   /* Check return code */
+   andl$0xfff, %eax
+   cmpl$5, %eax
+   jne 2f
+
+   /* All good - return success */
+   xorl%eax, %eax
+1:
+   ret
+2:
+   movl$-1, %eax
+   jmp 1b
+SYM_CODE_END(sev_es_req_cpuid)
+
+SYM_CODE_START(startup32_vc_handler)
+   pushl   %eax
+   pushl   %ebx
+   pushl   %ecx
+   pushl   %edx
+
+   /* Keep CPUID function in %ebx */
+   movl%eax, %ebx
+
+   /* Check if error-code == SVM_EXIT_CPUID */
+   cmpl$0x72, 16(%esp)
+   jne .Lfail
+
+   movl$0, %eax# Request CPUID[fn].EAX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 12(%esp)  # Store result
+
+   movl$1, %eax# Request CPUID[fn].EBX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 8(%esp)   # Store result
+
+   movl$2, %eax# Request CPUID[fn].ECX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 4(%esp)   # Store result
+
+   movl$3, %eax# Request CPUID[fn].EDX
+   movl%ebx, %edx  # CPUID fn
+   callsev_es_req_cpuid# Call helper
+   testl   %eax, %eax  # Check return code
+   jnz .Lfail
+   movl%edx, 0(%esp)   # Store result
+
+   popl%edx
+   popl%ecx
+   popl%ebx
+   popl%eax
+
+   /* Remove error code */
+   addl$4, %esp
+
+   /* Jump over CPUID instruction */
+   addl$2, (%esp)
+
+   iret
+.Lfail:
+   hlt
+   jmp .Lfail
+SYM_CODE_END(startup32_vc_handler)
+
.code64
 
 #include "../../kernel/sev_verify_cbit.S"
-
 SYM_FUNC_START(set_sev_encryption_mask)
 #ifdef CONFIG_AMD_MEM_ENCRYPT
push%rbp
-- 
2.30.1
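
The MSR protocol that sev_es_req_cpuid() speaks is easier to see in C. A
hedged sketch follows; wrmsr_ghcb(), rdmsr_ghcb() and vmgexit() stand in
for the MSR_AMD64_SEV_ES_GHCB accessors and the rep;vmmcall, and the
constants mirror the values used in the assembly above.

#include <stdint.h>

#define GHCB_MSR_CPUID_REQ	0x004ULL
#define GHCB_MSR_CPUID_RESP	0x005ULL

extern void wrmsr_ghcb(uint64_t val);	/* symbolic MSR write */
extern uint64_t rdmsr_ghcb(void);	/* symbolic MSR read */
extern void vmgexit(void);		/* symbolic rep;vmmcall */

static int ghcb_msr_cpuid(uint32_t fn, uint32_t reg_idx, uint32_t *result)
{
	uint64_t val;

	/* GHCBData[63:32] = CPUID fn, [31:30] = register, [11:0] = request */
	val = ((uint64_t)fn << 32) | ((uint64_t)(reg_idx & 3) << 30) |
	      GHCB_MSR_CPUID_REQ;

	wrmsr_ghcb(val);
	vmgexit();
	val = rdmsr_ghcb();

	if (val & 0x3ffff000ULL)	/* bits [29:12] must be zero */
		return -1;
	if ((val & 0xfffULL) != GHCB_MSR_CPUID_RESP)
		return -1;

	*result = (uint32_t)(val >> 32); /* CPUID value in GHCBData[63:32] */
	return 0;
}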



[PATCH v2 3/7] x86/boot/compressed/64: Setup IDT in startup_32 boot path

2021-03-10 Thread Joerg Roedel
From: Joerg Roedel 

This boot path needs exception handling when it is used with SEV-ES.
Setup an IDT and provide a helper function to write IDT entries for
use in 32-bit protected mode.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/head_64.S | 72 ++
 1 file changed, 72 insertions(+)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index c59c80ca546d..2001c3bf0748 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -116,6 +116,9 @@ SYM_FUNC_START(startup_32)
lretl
 1:
 
+   /* Setup Exception handling for SEV-ES */
+   callstartup32_load_idt
+
/* Make sure cpu supports long mode. */
callverify_cpu
testl   %eax, %eax
@@ -701,6 +704,19 @@ SYM_DATA_START(boot_idt)
.endr
 SYM_DATA_END_LABEL(boot_idt, SYM_L_GLOBAL, boot_idt_end)
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+SYM_DATA_START(boot32_idt_desc)
+   .word   boot32_idt_end - boot32_idt - 1
+   .long   0
+SYM_DATA_END(boot32_idt_desc)
+   .balign 8
+SYM_DATA_START(boot32_idt)
+   .rept 32
+   .quad 0
+   .endr
+SYM_DATA_END_LABEL(boot32_idt, SYM_L_GLOBAL, boot32_idt_end)
+#endif
+
 #ifdef CONFIG_EFI_STUB
 SYM_DATA(image_offset, .long 0)
 #endif
@@ -793,6 +809,62 @@ SYM_DATA_START_LOCAL(loaded_image_proto)
 SYM_DATA_END(loaded_image_proto)
 #endif
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   __HEAD
+   .code32
+/*
+ * Write an IDT entry into boot32_idt
+ *
+ * Parameters:
+ *
+ * %eax:   Handler address
+ * %edx:   Vector number
+ *
+ * Physical offset is expected in %ebp
+ */
+SYM_FUNC_START(startup32_set_idt_entry)
+   push%ebx
+   push%ecx
+
+   /* IDT entry address to %ebx */
+   lealrva(boot32_idt)(%ebp), %ebx
+   shl $3, %edx
+   addl%edx, %ebx
+
+   /* Build IDT entry, lower 4 bytes */
+   movl%eax, %edx
+	andl	$0xffff, %edx		# Target code segment offset [15:0]
+   movl$__KERNEL32_CS, %ecx# Target code segment selector
+   shl $16, %ecx
+   orl %ecx, %edx
+
+   /* Store lower 4 bytes to IDT */
+   movl%edx, (%ebx)
+
+   /* Build IDT entry, upper 4 bytes */
+   movl%eax, %edx
+	andl	$0xffff0000, %edx	# Target code segment offset [31:16]
+   orl $0x8e00, %edx   # Present, Type 32-bit Interrupt Gate
+
+   /* Store upper 4 bytes to IDT */
+   movl%edx, 4(%ebx)
+
+   pop %ecx
+   pop %ebx
+   ret
+SYM_FUNC_END(startup32_set_idt_entry)
+#endif
+
+SYM_FUNC_START(startup32_load_idt)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   /* Load IDT */
+   lealrva(boot32_idt)(%ebp), %eax
+   movl%eax, rva(boot32_idt_desc+2)(%ebp)
+   lidtrva(boot32_idt_desc)(%ebp)
+#endif
+   ret
+SYM_FUNC_END(startup32_load_idt)
+
 /*
  * Stack and heap for uncompression
  */
-- 
2.30.1
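
The two dwords that startup32_set_idt_entry() assembles correspond to the
classic 32-bit interrupt-gate layout. A hedged C sketch of the same
encoding; the struct and helper names are illustrative, not kernel API.

#include <stdint.h>

struct idt_gate32 {
	uint32_t lo;	/* selector[31:16] | handler offset[15:0] */
	uint32_t hi;	/* handler offset[31:16] | P=1, DPL=0, type=0xE */
};

static void set_idt_gate32(struct idt_gate32 *idt, int vector,
			   uint32_t handler, uint16_t selector)
{
	idt[vector].lo = ((uint32_t)selector << 16) | (handler & 0xffff);
	idt[vector].hi = (handler & 0xffff0000) | 0x00008e00;
}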



[PATCH v2 7/7] x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate()

2021-03-10 Thread Joerg Roedel
From: Joerg Roedel 

There are a few places left in the SEV-ES C code where hlt loops and/or
terminate requests are implemented. Replace them all with calls to
sev_es_terminate().

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/sev-es.c | 12 +++-
 arch/x86/kernel/sev-es-shared.c   | 10 +++---
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
index 27826c265aab..d904bd56b3e3 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -200,14 +200,8 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
}
 
 finish:
-   if (result == ES_OK) {
+   if (result == ES_OK)
vc_finish_insn();
-   } else if (result != ES_RETRY) {
-   /*
-* For now, just halt the machine. That makes debugging easier,
-* later we just call sev_es_terminate() here.
-*/
-   while (true)
-   asm volatile("hlt\n");
-   }
+   else if (result != ES_RETRY)
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
 }
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index cdc04d091242..7c34be61258e 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -24,7 +24,7 @@ static bool __init sev_es_check_cpu_features(void)
return true;
 }
 
-static void sev_es_terminate(unsigned int reason)
+static void __noreturn sev_es_terminate(unsigned int reason)
 {
u64 val = GHCB_SEV_TERMINATE;
 
@@ -210,12 +210,8 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned 
long exit_code)
return;
 
 fail:
-   sev_es_wr_ghcb_msr(GHCB_SEV_TERMINATE);
-   VMGEXIT();
-
-   /* Shouldn't get here - if we do halt the machine */
-   while (true)
-   asm volatile("hlt\n");
+   /* Terminate the guest */
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
 }
 
 static enum es_result vc_insn_string_read(struct es_em_ctxt *ctxt,
-- 
2.30.1



[PATCH v2 1/7] x86/boot/compressed/64: Cleanup exception handling before booting kernel

2021-03-10 Thread Joerg Roedel
From: Joerg Roedel 

Disable the exception handling before booting the kernel to make sure
any exceptions that happen during early kernel boot are not directed to
the pre-decompression code.

Signed-off-by: Joerg Roedel 
---
 arch/x86/boot/compressed/idt_64.c | 14 ++
 arch/x86/boot/compressed/misc.c   |  7 ++-
 arch/x86/boot/compressed/misc.h   |  6 ++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
index 804a502ee0d2..9b93567d663a 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -52,3 +52,17 @@ void load_stage2_idt(void)
 
	load_boot_idt(&boot_idt_desc);
 }
+
+void cleanup_exception_handling(void)
+{
+   /*
+* Flush GHCB from cache and map it encrypted again when running as
+* SEV-ES guest.
+*/
+   sev_es_shutdown_ghcb();
+
+   /* Set a null-idt, disabling #PF and #VC handling */
+	boot_idt_desc.size    = 0;
+	boot_idt_desc.address = 0;
+	load_boot_idt(&boot_idt_desc);
+}
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 267e7f93050e..cc9fd0e8766a 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -443,11 +443,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
handle_relocations(output, output_len, virt_addr);
debug_putstr("done.\nBooting the kernel.\n");
 
-   /*
-* Flush GHCB from cache and map it encrypted again when running as
-* SEV-ES guest.
-*/
-   sev_es_shutdown_ghcb();
+   /* Disable exception handling before booting the kernel */
+   cleanup_exception_handling();
 
return output;
 }
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 901ea5ebec22..e5612f035498 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -155,6 +155,12 @@ extern pteval_t __default_kernel_pte_mask;
 extern gate_desc boot_idt[BOOT_IDT_ENTRIES];
 extern struct desc_ptr boot_idt_desc;
 
+#ifdef CONFIG_X86_64
+void cleanup_exception_handling(void);
+#else
+static inline void cleanup_exception_handling(void) { }
+#endif
+
 /* IDT Entry Points */
 void boot_page_fault(void);
 void boot_stage1_vc(void);
-- 
2.30.1



[tip: x86/urgent] x86/sev-es: Introduce ip_within_syscall_gap() helper

2021-03-09 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 78a81d88f60ba773cbe890205e1ee67f00502948
Gitweb:        https://git.kernel.org/tip/78a81d88f60ba773cbe890205e1ee67f00502948
Author:Joerg Roedel 
AuthorDate:Wed, 03 Mar 2021 15:17:12 +01:00
Committer: Borislav Petkov 
CommitterDate: Mon, 08 Mar 2021 14:22:17 +01:00

x86/sev-es: Introduce ip_within_syscall_gap() helper

Introduce a helper to check whether an exception came from the syscall
gap and use it in the SEV-ES code. Extend the check to also cover the
compatibility SYSCALL entry path.

Fixes: 315562c9af3d5 ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Cc: sta...@vger.kernel.org # 5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-2-j...@8bytes.org
---
 arch/x86/entry/entry_64_compat.S |  2 ++
 arch/x86/include/asm/proto.h |  1 +
 arch/x86/include/asm/ptrace.h| 15 +++
 arch/x86/kernel/traps.c  |  3 +--
 4 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 541fdaf..0051cf5 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -210,6 +210,8 @@ SYM_CODE_START(entry_SYSCALL_compat)
/* Switch to the kernel stack */
movqPER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
+SYM_INNER_LABEL(entry_SYSCALL_compat_safe_stack, SYM_L_GLOBAL)
+
/* Construct struct pt_regs on stack */
pushq   $__USER32_DS/* pt_regs->ss */
pushq   %r8 /* pt_regs->sp */
diff --git a/arch/x86/include/asm/proto.h b/arch/x86/include/asm/proto.h
index 2c35f1c..b6a9d51 100644
--- a/arch/x86/include/asm/proto.h
+++ b/arch/x86/include/asm/proto.h
@@ -25,6 +25,7 @@ void __end_SYSENTER_singlestep_region(void);
 void entry_SYSENTER_compat(void);
 void __end_entry_SYSENTER_compat(void);
 void entry_SYSCALL_compat(void);
+void entry_SYSCALL_compat_safe_stack(void);
 void entry_INT80_compat(void);
 #ifdef CONFIG_XEN_PV
 void xen_entry_INT80_compat(void);
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index d8324a2..409f661 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -94,6 +94,8 @@ struct pt_regs {
 #include 
 #endif
 
+#include 
+
 struct cpuinfo_x86;
 struct task_struct;
 
@@ -175,6 +177,19 @@ static inline bool any_64bit_mode(struct pt_regs *regs)
 #ifdef CONFIG_X86_64
 #define current_user_stack_pointer()   current_pt_regs()->sp
 #define compat_user_stack_pointer()current_pt_regs()->sp
+
+static inline bool ip_within_syscall_gap(struct pt_regs *regs)
+{
+   bool ret = (regs->ip >= (unsigned long)entry_SYSCALL_64 &&
+   regs->ip <  (unsigned long)entry_SYSCALL_64_safe_stack);
+
+#ifdef CONFIG_IA32_EMULATION
+   ret = ret || (regs->ip >= (unsigned long)entry_SYSCALL_compat &&
+		      regs->ip <  (unsigned long)entry_SYSCALL_compat_safe_stack);
+#endif
+
+   return ret;
+}
 #endif
 
 static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 7f5aec7..ac1874a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -694,8 +694,7 @@ asmlinkage __visible noinstr struct pt_regs *vc_switch_off_ist(struct pt_regs *regs)
 * In the SYSCALL entry path the RSP value comes from user-space - don't
 * trust it and switch to the current kernel stack
 */
-   if (regs->ip >= (unsigned long)entry_SYSCALL_64 &&
-   regs->ip <  (unsigned long)entry_SYSCALL_64_safe_stack) {
+   if (ip_within_syscall_gap(regs)) {
sp = this_cpu_read(cpu_current_top_of_stack);
goto sync;
}
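
A typical caller combines the new helper with user_mode() before trusting
regs->sp, as the #VC IST adjustment does. A minimal sketch under an
illustrative name (regs_sp_is_trusted() is not a kernel function):

static bool regs_sp_is_trusted(struct pt_regs *regs)
{
	/* User-mode RSP is attacker-controlled */
	if (user_mode(regs))
		return false;

	/* In the SYSCALL gap RSP still holds the user-mode value */
	if (ip_within_syscall_gap(regs))
		return false;

	return true;
}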


[tip: x86/urgent] x86/sev-es: Use __copy_from_user_inatomic()

2021-03-09 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: bffe30dd9f1f3b2608a87ac909a224d6be472485
Gitweb:        https://git.kernel.org/tip/bffe30dd9f1f3b2608a87ac909a224d6be472485
Author:Joerg Roedel 
AuthorDate:Wed, 03 Mar 2021 15:17:16 +01:00
Committer: Borislav Petkov 
CommitterDate: Tue, 09 Mar 2021 12:37:54 +01:00

x86/sev-es: Use __copy_from_user_inatomic()

The #VC handler must run in atomic context and cannot sleep. This is a
problem when it tries to fetch instruction bytes from user-space via
copy_from_user().

Introduce an insn_fetch_from_user_inatomic() helper which uses
__copy_from_user_inatomic() to safely copy the instruction bytes to
kernel memory in the #VC handler.

Fixes: 5e3427a7bc432 ("x86/sev-es: Handle instruction fetches from user-space")
Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Cc: sta...@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-6-j...@8bytes.org
---
 arch/x86/include/asm/insn-eval.h |  2 +-
 arch/x86/kernel/sev-es.c |  2 +-
 arch/x86/lib/insn-eval.c | 66 ---
 3 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index a0f839a..98b4dae 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -23,6 +23,8 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int 
seg_reg_idx);
 int insn_get_code_seg_params(struct pt_regs *regs);
 int insn_fetch_from_user(struct pt_regs *regs,
 unsigned char buf[MAX_INSN_SIZE]);
+int insn_fetch_from_user_inatomic(struct pt_regs *regs,
+ unsigned char buf[MAX_INSN_SIZE]);
 bool insn_decode(struct insn *insn, struct pt_regs *regs,
 unsigned char buf[MAX_INSN_SIZE], int buf_size);
 
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index c3fd8fa..04a780a 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -258,7 +258,7 @@ static enum es_result vc_decode_insn(struct es_em_ctxt 
*ctxt)
int res;
 
if (user_mode(ctxt->regs)) {
-   res = insn_fetch_from_user(ctxt->regs, buffer);
+   res = insn_fetch_from_user_inatomic(ctxt->regs, buffer);
if (!res) {
ctxt->fi.vector = X86_TRAP_PF;
ctxt->fi.error_code = X86_PF_INSTR | X86_PF_USER;
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 4229950..bb0b3fe 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1415,6 +1415,25 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
}
 }
 
+static unsigned long insn_get_effective_ip(struct pt_regs *regs)
+{
+   unsigned long seg_base = 0;
+
+   /*
+* If not in user-space long mode, a custom code segment could be in
+* use. This is true in protected mode (if the process defined a local
+* descriptor table), or virtual-8086 mode. In most of the cases
+* seg_base will be zero as in USER_CS.
+*/
+   if (!user_64bit_mode(regs)) {
+   seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
+   if (seg_base == -1L)
+   return 0;
+   }
+
+   return seg_base + regs->ip;
+}
+
 /**
  * insn_fetch_from_user() - Copy instruction bytes from user-space memory
  * @regs:  Structure with register values as seen when entering kernel mode
@@ -1431,24 +1450,43 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
  */
int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])
 {
-   unsigned long seg_base = 0;
+   unsigned long ip;
int not_copied;
 
-   /*
-* If not in user-space long mode, a custom code segment could be in
-* use. This is true in protected mode (if the process defined a local
-* descriptor table), or virtual-8086 mode. In most of the cases
-* seg_base will be zero as in USER_CS.
-*/
-   if (!user_64bit_mode(regs)) {
-   seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
-   if (seg_base == -1L)
-   return 0;
-   }
+   ip = insn_get_effective_ip(regs);
+   if (!ip)
+   return 0;
+
+   not_copied = copy_from_user(buf, (void __user *)ip, MAX_INSN_SIZE);
 
+   return MAX_INSN_SIZE - not_copied;
+}
+
+/**
+ * insn_fetch_from_user_inatomic() - Copy instruction bytes from user-space memory
+ *   while in atomic code
+ * @regs:  Structure with register values as seen when entering kernel mode
+ * @buf:   Array to store the fetched instruction
+ *
+ * Gets the linear address of the instruction and copies the instruction bytes
+ * to the buf. This function must be used in atomic context.
+ *

[tip: x86/urgent] x86/sev-es: Correctly track IRQ states in runtime #VC handler

2021-03-09 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 62441a1fb53263bda349b6e5997c3cc5c120d89e
Gitweb:        https://git.kernel.org/tip/62441a1fb53263bda349b6e5997c3cc5c120d89e
Author:Joerg Roedel 
AuthorDate:Wed, 03 Mar 2021 15:17:15 +01:00
Committer: Borislav Petkov 
CommitterDate: Tue, 09 Mar 2021 12:33:46 +01:00

x86/sev-es: Correctly track IRQ states in runtime #VC handler

Call irqentry_nmi_enter()/irqentry_nmi_exit() in the #VC handler to
correctly track the IRQ state during its execution.

Fixes: 0786138c78e79 ("x86/sev-es: Add a Runtime #VC Exception Handler")
Reported-by: Andy Lutomirski 
Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Cc: sta...@vger.kernel.org # v5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-5-j...@8bytes.org
---
 arch/x86/kernel/sev-es.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 301f20f..c3fd8fa 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -1258,13 +1258,12 @@ static __always_inline bool on_vc_fallback_stack(struct pt_regs *regs)
 DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 {
struct sev_es_runtime_data *data = this_cpu_read(runtime_data);
+   irqentry_state_t irq_state;
struct ghcb_state state;
struct es_em_ctxt ctxt;
enum es_result result;
struct ghcb *ghcb;
 
-   lockdep_assert_irqs_disabled();
-
/*
 * Handle #DB before calling into !noinstr code to avoid recursive #DB.
 */
@@ -1273,6 +1272,8 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
return;
}
 
+   irq_state = irqentry_nmi_enter(regs);
+   lockdep_assert_irqs_disabled();
instrumentation_begin();
 
/*
@@ -1335,6 +1336,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication)
 
 out:
instrumentation_end();
+   irqentry_nmi_exit(regs, irq_state);
 
return;
 


[tip: x86/urgent] x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack

2021-03-09 Thread tip-bot2 for Joerg Roedel
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 545ac14c16b5dbd909d5a90ddf5b5a629a40fa94
Gitweb:        https://git.kernel.org/tip/545ac14c16b5dbd909d5a90ddf5b5a629a40fa94
Author:Joerg Roedel 
AuthorDate:Wed, 03 Mar 2021 15:17:13 +01:00
Committer: Borislav Petkov 
CommitterDate: Tue, 09 Mar 2021 12:26:26 +01:00

x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack

The code in the NMI handler to adjust the #VC handler IST stack is
needed in case an NMI hits when the #VC handler is still using its IST
stack.

But the check for this condition also needs to verify whether the
regs->sp value is trusted, meaning it was not set by user-space. Extend
the check to not use regs->sp when the NMI interrupted user-space code
or the SYSCALL gap.

Fixes: 315562c9af3d5 ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Reported-by: Andy Lutomirski 
Signed-off-by: Joerg Roedel 
Signed-off-by: Borislav Petkov 
Cc: sta...@vger.kernel.org # 5.10+
Link: https://lkml.kernel.org/r/20210303141716.29223-3-j...@8bytes.org
---
 arch/x86/kernel/sev-es.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 84c1821..301f20f 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -121,8 +121,18 @@ static void __init setup_vc_stacks(int cpu)
cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
 }
 
-static __always_inline bool on_vc_stack(unsigned long sp)
+static __always_inline bool on_vc_stack(struct pt_regs *regs)
 {
+   unsigned long sp = regs->sp;
+
+   /* User-mode RSP is not trusted */
+   if (user_mode(regs))
+   return false;
+
+   /* SYSCALL gap still has user-mode RSP */
+   if (ip_within_syscall_gap(regs))
+   return false;
+
	return ((sp >= __this_cpu_ist_bottom_va(VC)) && (sp < __this_cpu_ist_top_va(VC)));
 }
 
@@ -144,7 +154,7 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
old_ist = __this_cpu_read(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC]);
 
/* Make room on the IST stack */
-   if (on_vc_stack(regs->sp))
+   if (on_vc_stack(regs))
new_ist = ALIGN_DOWN(regs->sp, 8) - sizeof(old_ist);
else
new_ist = old_ist - sizeof(old_ist);


Re: [PATCH 6/7] x86/boot/compressed/64: Check SEV encryption in 32-bit boot-path

2021-03-09 Thread Joerg Roedel
On Tue, Mar 02, 2021 at 08:43:53PM +0100, Borislav Petkov wrote:
> On Wed, Feb 10, 2021 at 11:21:34AM +0100, Joerg Roedel wrote:
> > +   /*
> > +* Store the sme_me_mask as an indicator that SEV is active. It will be
> > +* set again in startup_64().
> 
> So why bother? Or does something needs it before that?

This was actually a bug. The startup32_check_sev_cbit() needs something
to skip the check when SEV is not active. Therefore the value is set
here in sme_me_mask, but the function later checks sev_status.

I fixed it by setting sev_status to 1 here (indicates SEV is active).

Regards,

Joerg


[git pull] IOMMU Fixes for Linux iommu-updates-v5.12

2021-03-05 Thread Joerg Roedel
Hi Linus,

The following changes since commit 45e606f2726926b04094e1c9bf809bca4884c57f:

  Merge branches 'arm/renesas', 'arm/smmu', 'x86/amd', 'x86/vt-d' and 'core' into next (2021-02-12 15:27:17 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fixes-v5.12-rc1

for you to fetch changes up to 444d66a23c1f1e4c4d12aed4812681d0ad835d60:

  iommu/vt-d: Fix status code for Allocate/Free PASID command (2021-03-04 13:32:04 +0100)


IOMMU Fixes for Linux v5.12-rc1

Including:

- Fix for a sleeping-while-atomic issue in the AMD IOMMU code

	- Disabling lazy IOTLB flush for untrusted devices in the Intel VT-d driver

- Fix status code definitions for Intel VT-d

- Fix IO Page Fault issue in Tegra IOMMU driver


Andrey Ryabinin (1):
  iommu/amd: Fix sleeping in atomic in increase_address_space()

Lu Baolu (1):
  iommu: Don't use lazy flush for untrusted device

Nicolin Chen (1):
  iommu/tegra-smmu: Fix mc errors on tegra124-nyan

Zenghui Yu (1):
  iommu/vt-d: Fix status code for Allocate/Free PASID command

 drivers/iommu/amd/io_pgtable.c | 10 +++---
 drivers/iommu/dma-iommu.c  | 15 +
 drivers/iommu/intel/pasid.h|  4 +--
 drivers/iommu/tegra-smmu.c | 72 +-
 4 files changed, 87 insertions(+), 14 deletions(-)

Please pull.

Thanks,

Joerg


signature.asc
Description: Digital signature


Re: [PATCH] iommu/vt-d: Fix status code for Allocate/Free PASID command

2021-03-04 Thread Joerg Roedel
On Sat, Feb 27, 2021 at 03:39:09PM +0800, Zenghui Yu wrote:
> As per Intel vt-d spec, Rev 3.0 (section 10.4.45 "Virtual Command Response
> Register"), the status code of "No PASID available" error in response to
> the Allocate PASID command is 2, not 1. The same for "Invalid PASID" error
> in response to the Free PASID command.
> 
> We will otherwise see confusing kernel log under the command failure from
> guest side. Fix it.
> 
> Fixes: 24f27d32ab6b ("iommu/vt-d: Enlightened PASID allocation")
> Signed-off-by: Zenghui Yu 

Applied for v5.12, thanks.


Re: [PATCH 2/5] iommu/vt-d: Remove WO permissions on second-level paging entries

2021-03-04 Thread Joerg Roedel
On Thu, Feb 25, 2021 at 02:26:51PM +0800, Lu Baolu wrote:
> When the first level page table is used for IOVA translation, it only
> supports Read-Only and Read-Write permissions. The Write-Only permission
> is not supported as the PRESENT bit (implying Read permission) should
> always set. When using second level, we still give separate permissions
> that allows WriteOnly which seems inconsistent and awkward. There is no
> use case we can think off, hence remove that configuration to make it
> consistent.

No use-case for WriteOnly mappings? How about DMA_FROM_DEVICE mappings?



Re: [PATCH] iommu/tegra-smmu: Fix mc errors on tegra124-nyan

2021-03-04 Thread Joerg Roedel
On Thu, Feb 18, 2021 at 02:07:02PM -0800, Nicolin Chen wrote:
>  drivers/iommu/tegra-smmu.c | 72 +-
>  1 file changed, 71 insertions(+), 1 deletion(-)

Applied for v5.12, thanks.


Re: [PATCH] iommu/amd: Fix sleeping in atomic in increase_address_space()

2021-03-04 Thread Joerg Roedel
On Wed, Feb 17, 2021 at 06:10:02PM +, Will Deacon wrote:
> >  drivers/iommu/amd/iommu.c | 10 ++
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> Acked-by: Will Deacon 

Applied for v5.12, thanks.

There were some conflicts which I resolved; can you please check the
result, Andrey? The updated patch is attached.

From 140456f994195b568ecd7fc2287a34eadffef3ca Mon Sep 17 00:00:00 2001
From: Andrey Ryabinin 
Date: Wed, 17 Feb 2021 17:30:04 +0300
Subject: [PATCH] iommu/amd: Fix sleeping in atomic in increase_address_space()

increase_address_space() calls get_zeroed_page(gfp) under spin_lock with
disabled interrupts. gfp flags passed to increase_address_space() may allow
sleeping, so it comes to this:

 BUG: sleeping function called from invalid context at mm/page_alloc.c:4342
 in_atomic(): 1, irqs_disabled(): 1, pid: 21555, name: epdcbbf1qnhbsd8

 Call Trace:
  dump_stack+0x66/0x8b
  ___might_sleep+0xec/0x110
  __alloc_pages_nodemask+0x104/0x300
  get_zeroed_page+0x15/0x40
  iommu_map_page+0xdd/0x3e0
  amd_iommu_map+0x50/0x70
  iommu_map+0x106/0x220
  vfio_iommu_type1_ioctl+0x76e/0x950 [vfio_iommu_type1]
  do_vfs_ioctl+0xa3/0x6f0
  ksys_ioctl+0x66/0x70
  __x64_sys_ioctl+0x16/0x20
  do_syscall_64+0x4e/0x100
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by moving get_zeroed_page() out of the spin_lock/unlock section.

Fixes: 754265bcab ("iommu/amd: Fix race in increase_address_space()")
Signed-off-by: Andrey Ryabinin 
Acked-by: Will Deacon 
Cc: 
Link: https://lore.kernel.org/r/20210217143004.19165-1-a...@yandex-team.com
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd/io_pgtable.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 1c4961e05c12..bb0ee5c9fde7 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -182,6 +182,10 @@ static bool increase_address_space(struct protection_domain *domain,
bool ret = true;
u64 *pte;
 
+   pte = (void *)get_zeroed_page(gfp);
+   if (!pte)
+   return false;
+
	spin_lock_irqsave(&domain->lock, flags);
 
if (address <= PM_LEVEL_SIZE(domain->iop.mode))
@@ -191,10 +195,6 @@ static bool increase_address_space(struct protection_domain *domain,
if (WARN_ON_ONCE(domain->iop.mode == PAGE_MODE_6_LEVEL))
goto out;
 
-   pte = (void *)get_zeroed_page(gfp);
-   if (!pte)
-   goto out;
-
	*pte = PM_LEVEL_PDE(domain->iop.mode, iommu_virt_to_phys(domain->iop.root));
 
domain->iop.root  = pte;
@@ -208,10 +208,12 @@ static bool increase_address_space(struct protection_domain *domain,
 */
amd_iommu_domain_set_pgtable(domain, pte, domain->iop.mode);
 
+   pte = NULL;
ret = true;
 
 out:
	spin_unlock_irqrestore(&domain->lock, flags);
+   free_page((unsigned long)pte);
 
return ret;
 }
-- 
2.26.2



[PATCH 0/5 v2] x86/sev-es: SEV-ES Fixes for v5.12

2021-03-03 Thread Joerg Roedel
From: Joerg Roedel 

Hi,

here are a couple of fixes for 5.12 in the SEV-ES guest support code.
Patches 1-3 have already been posted in a similar form, so this is v2.
The last two patches are new and arose from me running an SEV-ES guest
with more debugging features and instrumentation enabled. I changed
the first patches according to the review comments I received.

Please review.

Thanks,

Joerg

Joerg Roedel (5):
  x86/sev-es: Introduce ip_within_syscall_gap() helper
  x86/sev-es: Check if regs->sp is trusted before adjusting #VC IST
stack
  x86/sev-es: Optimize __sev_es_ist_enter() for better readability
  x86/sev-es: Correctly track IRQ states in runtime #VC handler
  x86/sev-es: Use __copy_from_user_inatomic()

 arch/x86/entry/entry_64_compat.S |  2 +
 arch/x86/include/asm/insn-eval.h |  2 +
 arch/x86/include/asm/proto.h |  1 +
 arch/x86/include/asm/ptrace.h| 15 
 arch/x86/kernel/sev-es.c | 58 
 arch/x86/kernel/traps.c  |  3 +-
 arch/x86/lib/insn-eval.c | 66 +---
 7 files changed, 114 insertions(+), 33 deletions(-)

-- 
2.30.1



[PATCH 5/5] x86/sev-es: Use __copy_from_user_inatomic()

2021-03-03 Thread Joerg Roedel
From: Joerg Roedel 

The #VC handler must run in atomic context and cannot be put to sleep.
This is a problem when it tries to fetch instruction bytes from
user-space via copy_from_user().

Introduce an insn_fetch_from_user_inatomic() helper which uses
__copy_from_user_inatomic() to safely copy the instruction bytes to
kernel memory in the #VC handler.

Fixes: 5e3427a7bc432 ("x86/sev-es: Handle instruction fetches from user-space")
Cc: sta...@vger.kernel.org # v5.10+
Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/insn-eval.h |  2 +
 arch/x86/kernel/sev-es.c |  2 +-
 arch/x86/lib/insn-eval.c | 66 +---
 3 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index a0f839aa144d..98b4dae5e8bc 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -23,6 +23,8 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int 
seg_reg_idx);
 int insn_get_code_seg_params(struct pt_regs *regs);
 int insn_fetch_from_user(struct pt_regs *regs,
 unsigned char buf[MAX_INSN_SIZE]);
+int insn_fetch_from_user_inatomic(struct pt_regs *regs,
+ unsigned char buf[MAX_INSN_SIZE]);
 bool insn_decode(struct insn *insn, struct pt_regs *regs,
 unsigned char buf[MAX_INSN_SIZE], int buf_size);
 
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 3d8ec5bf6f79..26f5479a97a8 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -270,7 +270,7 @@ static enum es_result vc_decode_insn(struct es_em_ctxt 
*ctxt)
int res;
 
if (user_mode(ctxt->regs)) {
-   res = insn_fetch_from_user(ctxt->regs, buffer);
+   res = insn_fetch_from_user_inatomic(ctxt->regs, buffer);
if (!res) {
ctxt->fi.vector = X86_TRAP_PF;
ctxt->fi.error_code = X86_PF_INSTR | X86_PF_USER;
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c
index 4229950a5d78..bb0b3fe1e0a0 100644
--- a/arch/x86/lib/insn-eval.c
+++ b/arch/x86/lib/insn-eval.c
@@ -1415,6 +1415,25 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
}
 }
 
+static unsigned long insn_get_effective_ip(struct pt_regs *regs)
+{
+   unsigned long seg_base = 0;
+
+   /*
+* If not in user-space long mode, a custom code segment could be in
+* use. This is true in protected mode (if the process defined a local
+* descriptor table), or virtual-8086 mode. In most of the cases
+* seg_base will be zero as in USER_CS.
+*/
+   if (!user_64bit_mode(regs)) {
+   seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
+   if (seg_base == -1L)
+   return 0;
+   }
+
+   return seg_base + regs->ip;
+}
+
 /**
  * insn_fetch_from_user() - Copy instruction bytes from user-space memory
  * @regs:  Structure with register values as seen when entering kernel mode
@@ -1431,24 +1450,43 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs)
  */
int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])
 {
-   unsigned long seg_base = 0;
+   unsigned long ip;
int not_copied;
 
-   /*
-* If not in user-space long mode, a custom code segment could be in
-* use. This is true in protected mode (if the process defined a local
-* descriptor table), or virtual-8086 mode. In most of the cases
-* seg_base will be zero as in USER_CS.
-*/
-   if (!user_64bit_mode(regs)) {
-   seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS);
-   if (seg_base == -1L)
-   return 0;
-   }
+   ip = insn_get_effective_ip(regs);
+   if (!ip)
+   return 0;
+
+   not_copied = copy_from_user(buf, (void __user *)ip, MAX_INSN_SIZE);
 
+   return MAX_INSN_SIZE - not_copied;
+}
+
+/**
+ * insn_fetch_from_user_inatomic() - Copy instruction bytes from user-space memory
+ *   while in atomic code
+ * @regs:  Structure with register values as seen when entering kernel mode
+ * @buf:   Array to store the fetched instruction
+ *
+ * Gets the linear address of the instruction and copies the instruction bytes
+ * to the buf. This function must be used in atomic context.
+ *
+ * Returns:
+ *
+ * Number of instruction bytes copied.
+ *
+ * 0 if nothing was copied.
+ */
+int insn_fetch_from_user_inatomic(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE])
+{
+   unsigned long ip;
+   int not_copied;
+
+   ip = insn_get_effective_ip(regs);
+   if (!ip)
+   return 0;
 
-   not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip),
-   MAX

[PATCH 2/5] x86/sev-es: Check if regs->sp is trusted before adjusting #VC IST stack

2021-03-03 Thread Joerg Roedel
From: Joerg Roedel 

The code in the NMI handler to adjust the #VC handler IST stack is
needed in case an NMI hits when the #VC handler is still using its IST
stack.
But the check for this condition also needs to verify whether the
regs->sp value is trusted, meaning it was not set by user-space. Extend
the check to not use regs->sp when the NMI interrupted user-space code
or the SYSCALL gap.

Reported-by: Andy Lutomirski 
Fixes: 315562c9af3d5 ("x86/sev-es: Adjust #VC IST Stack on entering NMI handler")
Cc: sta...@vger.kernel.org # 5.10+
Signed-off-by: Joerg Roedel 
---
 arch/x86/kernel/sev-es.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 1e78f4bd7bf2..28b0144daddd 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -121,8 +121,18 @@ static void __init setup_vc_stacks(int cpu)
cea_set_pte((void *)vaddr, pa, PAGE_KERNEL);
 }
 
-static __always_inline bool on_vc_stack(unsigned long sp)
+static __always_inline bool on_vc_stack(struct pt_regs *regs)
 {
+   unsigned long sp = regs->sp;
+
+   /* User-mode RSP is not trusted */
+   if (user_mode(regs))
+   return false;
+
+   /* SYSCALL gap still has user-mode RSP */
+   if (ip_within_syscall_gap(regs))
+   return false;
+
	return ((sp >= __this_cpu_ist_bottom_va(VC)) && (sp < __this_cpu_ist_top_va(VC)));
 }
 
@@ -144,7 +154,7 @@ void noinstr __sev_es_ist_enter(struct pt_regs *regs)
old_ist = __this_cpu_read(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC]);
 
/* Make room on the IST stack */
-   if (on_vc_stack(regs->sp))
+   if (on_vc_stack(regs))
new_ist = ALIGN_DOWN(regs->sp, 8) - sizeof(old_ist);
else
new_ist = old_ist - sizeof(old_ist);
-- 
2.30.1


