date:20171204

[PATCH 02/45] drivers: crypto: remove duplicate includes

2017-12-04 Thread Pravin Shedge

These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.

Signed-off-by: Pravin Shedge 
---
 drivers/crypto/bcm/cipher.c  | 1 -
 drivers/crypto/cavium/nitrox/nitrox_reqmgr.c | 1 -
 drivers/crypto/ccp/ccp-crypto-aes-galois.c   | 1 -
 3 files changed, 3 deletions(-)

diff --git a/drivers/crypto/bcm/cipher.c b/drivers/crypto/bcm/cipher.c
index ce70b44..2b75f95 100644
--- a/drivers/crypto/bcm/cipher.c
+++ b/drivers/crypto/bcm/cipher.c
@@ -42,7 +42,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include "util.h"
diff --git a/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c b/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c
index 4addc23..deaefd5 100644
--- a/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c
+++ b/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c
@@ -6,7 +6,6 @@
 #include "nitrox_dev.h"
 #include "nitrox_req.h"
 #include "nitrox_csr.h"
-#include "nitrox_req.h"
 
 /* SLC_STORE_INFO */
 #define MIN_UDD_LEN 16
diff --git a/drivers/crypto/ccp/ccp-crypto-aes-galois.c b/drivers/crypto/ccp/ccp-crypto-aes-galois.c
index ff02b71..ca1f0d7 100644
--- a/drivers/crypto/ccp/ccp-crypto-aes-galois.c
+++ b/drivers/crypto/ccp/ccp-crypto-aes-galois.c
@@ -21,7 +21,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "ccp-crypto.h"
 
-- 
2.7.4

[Part2 PATCH v9 00/38] x86: Secure Encrypted Virtualization (AMD)

2017-12-04 Thread Brijesh Singh

This part of the Secure Encrypted Virtualization (SEV) patch series focuses on
the KVM changes required to create and manage SEV guests.

SEV is an extension to the AMD-V architecture which supports running encrypted
virtual machines (VMs) under the control of a hypervisor. Encrypted VMs have
their pages (code and data) secured such that only the guest itself has access
to the unencrypted version. Each encrypted VM is associated with a unique
encryption key; if its data is accessed by a different entity using a different
key, the encrypted guest's data will be incorrectly decrypted, leading to
unintelligible data. This security model ensures that the hypervisor is no
longer able to inspect or alter any guest code or data.

The key management of this feature is handled by a separate processor known as
the AMD Secure Processor (AMD-SP), which is present on AMD SOCs. The SEV Key
Management Specification (see below) provides a set of commands which can be
used by the hypervisor to load virtual machine keys through the AMD-SP driver.

The patch series adds a new ioctl in the KVM driver (KVM_MEMORY_ENCRYPT_OP).
The ioctl will be used by qemu to issue SEV guest-specific commands defined in
the Key Management Specification.

The following links provide additional details:

AMD Memory Encryption white paper:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

AMD64 Architecture Programmer's Manual:
http://support.amd.com/TechDocs/24593.pdf
SME is section 7.10
SEV is section 15.34

SEV Key Management:
http://support.amd.com/TechDocs/55766_SEV-KM%20API_Specification.pdf

KVM Forum Presentation:
http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf

SEV Guest BIOS support:
  SEV support has been added to EDKII/OVMF BIOS
  https://github.com/tianocore/edk2

--

The series applies on kvm/next commit : 4fbd8d194f06 (Linux 4.15-rc1)

Complete tree is available at:
repo: https://github.com/codomania/kvm.git
branch: sev-v9-p2

TODO:
* Add SEV guest migration command support

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Joerg Roedel 
Cc: Borislav Petkov 
Cc: Tom Lendacky 
Cc: Herbert Xu 
Cc: David S. Miller 
Cc: Gary Hook 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-crypto@vger.kernel.org

Changes since v8:
 * Rebase the series to kvm/next branch
 * Update SEV asid allocation to limit the ASID between SEV_MIN_ASID and
   SEV_MAX_ASID (the EPYC BIOS provides an option to change SEV_MIN_ASID, which
   can be used to limit the number of SEV-enabled guests)

Changes since v7:
 * Rebase the series to kvm/next branch
 * move the FW error enum definition in include/uapi/linux/psp-sev.h so that
   both userspace and kernel can share it.
 * (ccp) drop cmd_buf arg from sev_platform_init()
 * (ccp) apply some cleanup/fixup from Boris
 * (ccp) add some comments in FACTORY_RESET command handling
 * (kvm) some fixup/cleanup from Boris
 * (kvm) acquire the kvm->lock when modifying the sev->regions_list

Changes since v6:
 * (ccp): Extend psp_device structure to track the FW INIT and SHUTDOWN states.
 * (ccp): Init and Uninit SEV FW during module load/unload
 * (ccp): Avoid repeated k*alloc() for init and status command buffer
 * (kvm): Rework DBG command to fix the compilation warning seen with gcc7.x
 * (kvm): Convert the SEV doc in rst format

Changes since v5:
 * split the PSP driver support into multiple patches
 * multiple improvements from Boris
 * remove mem_enc_enabled() ops

Changes since v4:
 * Fixes to address kbuild robot errors
 * Add 'sev' module params to allow enable/disable SEV feature
 * Update documentation
 * Multiple fixes to address v4 feedbacks
 * Some coding style changes to address checkpatch reports

Changes since v3:
 * Re-design the PSP interface support patch
 * Rename the ioctls based on the feedbacks
 * Improve documentation
 * Fix i386 build issues
 * Add LAUNCH_SECRET command
 * Add new Kconfig option to enable SEV support
 * Changes to address v3 feedbacks.

Changes since v2:
 * Add KVM_MEMORY_ENCRYPT_REGISTER/UNREGISTER_RAM ioctl to register encrypted
   memory ranges (recommended by Paolo)
 * Extend kvm_x86_ops to provide new memory_encryption_enabled ops
 * Enhance DEBUG DECRYPT/ENCRYPT commands to work with more than one page
   (recommended by Paolo)
 * Optimize LAUNCH_UPDATE command to reduce the number of calls to AMD-SP driver
 * Changes to address v2 feedbacks


Borislav Petkov (1):
  crypto: ccp: Build the AMD secure processor driver only with AMD CPU support

Brijesh Singh (34):
  Documentation/virtual/kvm: Add AMD Secure Encrypted Virtualization (SEV)
  KVM: SVM: Prepare to reserve asid for SEV guest
  KVM: X86: Extend CPUID range to include new leaf
  KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctl
  KVM: Introduce KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctl
  crypto: ccp: Define SEV

[Part2 PATCH v9 15/38] crypto: ccp: Implement SEV_PLATFORM_STATUS ioctl command

2017-12-04 Thread Brijesh Singh

The SEV_PLATFORM_STATUS command can be used by the platform owner to
get the current status of the platform. The command is defined in
SEV spec section 5.5.
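
For readers who want to see the userspace side of this command, here is a
minimal sketch, not part of the patch. It assumes the misc device shows up as
/dev/sev (going by DEVICE_NAME in patch 13/38) and mirrors the structures from
include/uapi/linux/psp-sev.h locally; real code would include that header.
The helper name sev_query_platform_status() and the `request` parameter are
hypothetical: the caller passes in the SEV_ISSUE_CMD ioctl number from the
installed header rather than the sketch hard-coding it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirror of struct sev_user_data_status from patch 10/38. */
struct sev_user_data_status {
	uint8_t  api_major;
	uint8_t  api_minor;
	uint8_t  state;
	uint32_t flags;
	uint8_t  build;
	uint32_t guest_count;
} __attribute__((packed));

/* Local mirror of struct sev_issue_cmd from patch 10/38. */
struct sev_issue_cmd {
	uint32_t cmd;
	uint64_t data;
	uint32_t error;
} __attribute__((packed));

/* Second entry of the platform command enum in patch 10/38. */
enum { SEV_PLATFORM_STATUS = 1 };

/*
 * Hypothetical helper: issue PLATFORM_STATUS on an already opened SEV
 * device descriptor. `request` is the SEV_ISSUE_CMD ioctl number.
 * Returns 0 on success or a negative errno value; *fw_error receives
 * whatever the kernel stored in the error field.
 */
static int sev_query_platform_status(int fd, unsigned long request,
				     struct sev_user_data_status *status,
				     uint32_t *fw_error)
{
	struct sev_issue_cmd cmd;
	int rc = 0;

	assert(status && fw_error);

	memset(&cmd, 0, sizeof(cmd));
	memset(status, 0, sizeof(*status));
	cmd.cmd = SEV_PLATFORM_STATUS;
	cmd.data = (uint64_t)(uintptr_t)status;	/* kernel copies the result here */

	if (ioctl(fd, request, &cmd) < 0)
		rc = -errno;

	*fw_error = cmd.error;
	return rc;
}
```

A caller would open the device, invoke the helper once, and read
status.state and status.guest_count on success.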

Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Borislav Petkov 
Cc: Herbert Xu 
Cc: Gary Hook 
Cc: Tom Lendacky 
Cc: linux-crypto@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Improvements-by: Borislav Petkov 
Signed-off-by: Brijesh Singh 
Reviewed-by: Borislav Petkov 
Acked-by: Gary R Hook 
---
 drivers/crypto/ccp/psp-dev.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c
index b49583a45a55..a5072b166ab8 100644
--- a/drivers/crypto/ccp/psp-dev.c
+++ b/drivers/crypto/ccp/psp-dev.c
@@ -274,6 +274,21 @@ static int sev_ioctl_do_reset(struct sev_issue_cmd *argp)
return __sev_do_cmd_locked(SEV_CMD_FACTORY_RESET, 0, &argp->error);
 }
 
+static int sev_ioctl_do_platform_status(struct sev_issue_cmd *argp)
+{
+   struct sev_user_data_status *data = &psp_master->status_cmd_buf;
+   int ret;
+
+   ret = __sev_do_cmd_locked(SEV_CMD_PLATFORM_STATUS, data, &argp->error);
+   if (ret)
+   return ret;
+
+   if (copy_to_user((void __user *)argp->data, data, sizeof(*data)))
+   ret = -EFAULT;
+
+   return ret;
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
void __user *argp = (void __user *)arg;
@@ -299,6 +314,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SEV_FACTORY_RESET:
ret = sev_ioctl_do_reset(&input);
break;
+   case SEV_PLATFORM_STATUS:
+   ret = sev_ioctl_do_platform_status(&input);
+   break;
default:
ret = -EINVAL;
goto out;
-- 
2.9.5

[Part2 PATCH v9 14/38] crypto: ccp: Implement SEV_FACTORY_RESET ioctl command

2017-12-04 Thread Brijesh Singh

The SEV_FACTORY_RESET command can be used by the platform owner to
reset the non-volatile SEV related data. The command is defined in
SEV spec section 5.4.
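
The state handling that sev_ioctl_do_reset() performs before FACTORY_RESET can
be modelled outside the kernel. The sketch below is only an illustration under
stated assumptions: struct fake_fw and its helpers are hypothetical stand-ins
for the firmware, and the numeric values of the state enum are assumed for the
model. It shows the three cases: WORKING is refused with -EBUSY, INIT is shut
down first, and UNINIT goes straight to the reset.

```c
#include <assert.h>
#include <errno.h>

/* Assumed numbering, chosen for this model only. */
enum sev_state { SEV_STATE_UNINIT, SEV_STATE_INIT, SEV_STATE_WORKING };

/* Hypothetical stand-in for the firmware: current state plus a reset counter. */
struct fake_fw {
	enum sev_state state;
	int resets;
};

/* SHUTDOWN moves the platform back to UNINIT. */
static int fake_shutdown(struct fake_fw *fw)
{
	fw->state = SEV_STATE_UNINIT;
	return 0;
}

/* FACTORY_RESET is only accepted in UNINIT state. */
static int fake_factory_reset(struct fake_fw *fw)
{
	if (fw->state != SEV_STATE_UNINIT)
		return -EINVAL;
	fw->resets++;
	return 0;
}

/* Same decision order as sev_ioctl_do_reset() in this patch. */
static int do_reset(struct fake_fw *fw)
{
	int rc;

	assert(fw);

	if (fw->state == SEV_STATE_WORKING)
		return -EBUSY;		/* a guest is active, refuse */

	if (fw->state == SEV_STATE_INIT) {
		rc = fake_shutdown(fw);	/* INIT -> UNINIT */
		if (rc)
			return rc;
	}

	return fake_factory_reset(fw);
}
```

Refusing in WORKING state rather than forcing a shutdown keeps a platform
owner from wiping persistent data underneath running guests.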

Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Borislav Petkov 
Cc: Herbert Xu 
Cc: Gary Hook 
Cc: Tom Lendacky 
Cc: linux-crypto@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Improvements-by: Borislav Petkov 
Signed-off-by: Brijesh Singh 
---
 drivers/crypto/ccp/psp-dev.c | 77 +++-
 1 file changed, 76 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c
index 9915a6c604a3..b49583a45a55 100644
--- a/drivers/crypto/ccp/psp-dev.c
+++ b/drivers/crypto/ccp/psp-dev.c
@@ -232,9 +232,84 @@ static int sev_platform_shutdown(int *error)
return rc;
 }
 
+static int sev_get_platform_state(int *state, int *error)
+{
+   int rc;
+
+   rc = __sev_do_cmd_locked(SEV_CMD_PLATFORM_STATUS,
+&psp_master->status_cmd_buf, error);
+   if (rc)
+   return rc;
+
+   *state = psp_master->status_cmd_buf.state;
+   return rc;
+}
+
+static int sev_ioctl_do_reset(struct sev_issue_cmd *argp)
+{
+   int state, rc;
+
+   /*
+* The SEV spec requires that FACTORY_RESET must be issued in
+* UNINIT state. Before we go further, let's check if any guest is
+* active.
+*
+* If FW is in WORKING state then deny the request; otherwise issue
+* the SHUTDOWN command to go from INIT to UNINIT before issuing
+* FACTORY_RESET.
+*/
+   rc = sev_get_platform_state(&state, &argp->error);
+   if (rc)
+   return rc;
+
+   if (state == SEV_STATE_WORKING)
+   return -EBUSY;
+
+   if (state == SEV_STATE_INIT) {
+   rc = __sev_platform_shutdown_locked(&argp->error);
+   if (rc)
+   return rc;
+   }
+
+   return __sev_do_cmd_locked(SEV_CMD_FACTORY_RESET, 0, &argp->error);
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
-   return -ENOTTY;
+   void __user *argp = (void __user *)arg;
+   struct sev_issue_cmd input;
+   int ret = -EFAULT;
+
+   if (!psp_master)
+   return -ENODEV;
+
+   if (ioctl != SEV_ISSUE_CMD)
+   return -EINVAL;
+
+   if (copy_from_user(&input, argp, sizeof(struct sev_issue_cmd)))
+   return -EFAULT;
+
+   if (input.cmd > SEV_MAX)
+   return -EINVAL;
+
+   mutex_lock(&sev_cmd_mutex);
+
+   switch (input.cmd) {
+
+   case SEV_FACTORY_RESET:
+   ret = sev_ioctl_do_reset(&input);
+   break;
+   default:
+   ret = -EINVAL;
+   goto out;
+   }
+
+   if (copy_to_user(argp, &input, sizeof(struct sev_issue_cmd)))
+   ret = -EFAULT;
+out:
+   mutex_unlock(&sev_cmd_mutex);
+
+   return ret;
 }
 
 static const struct file_operations sev_fops = {
-- 
2.9.5

[Part2 PATCH v9 12/38] crypto: ccp: Add Platform Security Processor (PSP) device support

2017-12-04 Thread Brijesh Singh

The Platform Security Processor (PSP) is part of the AMD Secure
Processor (AMD-SP) functionality. The PSP is a dedicated processor
that provides support for key management commands in Secure Encrypted
Virtualization (SEV) mode, along with software-based Trusted Execution
Environment (TEE) to enable third-party trusted applications.

Note that the key management functionality provided by the SEV firmware
can be used outside of the kvm-amd driver hence it doesn't need to
depend on CONFIG_KVM_AMD.

Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Borislav Petkov 
Cc: Herbert Xu 
Cc: Gary Hook 
Cc: Tom Lendacky 
Cc: linux-crypto@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Improvements-by: Borislav Petkov 
Signed-off-by: Brijesh Singh 
Reviewed-by: Borislav Petkov 
---
 drivers/crypto/ccp/Kconfig   |  11 +
 drivers/crypto/ccp/Makefile  |   1 +
 drivers/crypto/ccp/psp-dev.c | 105 +++
 drivers/crypto/ccp/psp-dev.h |  59 
 drivers/crypto/ccp/sp-dev.c  |  26 +++
 drivers/crypto/ccp/sp-dev.h  |  24 +-
 drivers/crypto/ccp/sp-pci.c  |  52 +
 7 files changed, 277 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/ccp/psp-dev.c
 create mode 100644 drivers/crypto/ccp/psp-dev.h

diff --git a/drivers/crypto/ccp/Kconfig b/drivers/crypto/ccp/Kconfig
index 9c84f9838931..b9dfae47aefd 100644
--- a/drivers/crypto/ccp/Kconfig
+++ b/drivers/crypto/ccp/Kconfig
@@ -33,3 +33,14 @@ config CRYPTO_DEV_CCP_CRYPTO
  Support for using the cryptographic API with the AMD Cryptographic
  Coprocessor. This module supports offload of SHA and AES algorithms.
  If you choose 'M' here, this module will be called ccp_crypto.
+
+config CRYPTO_DEV_SP_PSP
+   bool "Platform Security Processor (PSP) device"
+   default y
+   depends on CRYPTO_DEV_CCP_DD && X86_64
+   help
+Provide support for the AMD Platform Security Processor (PSP).
+The PSP is a dedicated processor that provides support for key
+management commands in Secure Encrypted Virtualization (SEV) mode,
+along with software-based Trusted Execution Environment (TEE) to
+enable third-party trusted applications.
diff --git a/drivers/crypto/ccp/Makefile b/drivers/crypto/ccp/Makefile
index c4ce726b931e..51d1c0cf66c7 100644
--- a/drivers/crypto/ccp/Makefile
+++ b/drivers/crypto/ccp/Makefile
@@ -8,6 +8,7 @@ ccp-$(CONFIG_CRYPTO_DEV_SP_CCP) += ccp-dev.o \
ccp-dmaengine.o \
ccp-debugfs.o
 ccp-$(CONFIG_PCI) += sp-pci.o
+ccp-$(CONFIG_CRYPTO_DEV_SP_PSP) += psp-dev.o
 
 obj-$(CONFIG_CRYPTO_DEV_CCP_CRYPTO) += ccp-crypto.o
 ccp-crypto-objs := ccp-crypto-main.o \
diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c
new file mode 100644
index ..b5789f878560
--- /dev/null
+++ b/drivers/crypto/ccp/psp-dev.c
@@ -0,0 +1,105 @@
+/*
+ * AMD Platform Security Processor (PSP) interface
+ *
+ * Copyright (C) 2016-2017 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "sp-dev.h"
+#include "psp-dev.h"
+
+static struct psp_device *psp_alloc_struct(struct sp_device *sp)
+{
+   struct device *dev = sp->dev;
+   struct psp_device *psp;
+
+   psp = devm_kzalloc(dev, sizeof(*psp), GFP_KERNEL);
+   if (!psp)
+   return NULL;
+
+   psp->dev = dev;
+   psp->sp = sp;
+
+   snprintf(psp->name, sizeof(psp->name), "psp-%u", sp->ord);
+
+   return psp;
+}
+
+static irqreturn_t psp_irq_handler(int irq, void *data)
+{
+   return IRQ_HANDLED;
+}
+
+int psp_dev_init(struct sp_device *sp)
+{
+   struct device *dev = sp->dev;
+   struct psp_device *psp;
+   int ret;
+
+   ret = -ENOMEM;
+   psp = psp_alloc_struct(sp);
+   if (!psp)
+   goto e_err;
+
+   sp->psp_data = psp;
+
+   psp->vdata = (struct psp_vdata *)sp->dev_vdata->psp_vdata;
+   if (!psp->vdata) {
+   ret = -ENODEV;
+   dev_err(dev, "missing driver data\n");
+   goto e_err;
+   }
+
+   psp->io_regs = sp->io_map + psp->vdata->offset;
+
+   /* Disable and clear interrupts until ready */
+   iowrite32(0, psp->io_regs + PSP_P2CMSG_INTEN);
+   iowrite32(-1, psp->io_regs + PSP_P2CMSG_INTSTS);
+
+   /* Request an irq */
+   ret = sp_request_psp_irq(psp->sp, psp_irq_handler, psp->name, psp);
+   if (ret) {
+   dev_err(dev, "psp: unable to allocate an IRQ\n");
+   goto e_err;
+   }
+
+   if (sp->set_psp_master_device)
+   sp->set_psp_master_device(sp);
+
+

[Part2 PATCH v9 13/38] crypto: ccp: Add Secure Encrypted Virtualization (SEV) command support

2017-12-04 Thread Brijesh Singh

AMD's new Secure Encrypted Virtualization (SEV) feature allows the
memory contents of virtual machines to be transparently encrypted with a
key unique to the VM. The programming and management of the encryption
keys are handled by the AMD Secure Processor (AMD-SP) which exposes the
commands for these tasks. The complete spec is available at:

http://support.amd.com/TechDocs/55766_SEV-KM%20API_Specification.pdf

Extend the AMD-SP driver to provide the following support:

 - an in-kernel API to communicate with the SEV firmware. The API can be
   used by the hypervisor to create encryption context for a SEV guest.

 - a userspace IOCTL to manage the platform certificates.

Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Borislav Petkov 
Cc: Herbert Xu 
Cc: Gary Hook 
Cc: Tom Lendacky 
Cc: linux-crypto@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Improvements-by: Borislav Petkov 
Signed-off-by: Brijesh Singh 
---
 drivers/crypto/ccp/psp-dev.c | 344 +++
 drivers/crypto/ccp/psp-dev.h |  24 +++
 drivers/crypto/ccp/sp-dev.c  |   9 ++
 drivers/crypto/ccp/sp-dev.h  |   4 +
 include/linux/psp-sev.h  | 137 +
 5 files changed, 518 insertions(+)

diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c
index b5789f878560..9915a6c604a3 100644
--- a/drivers/crypto/ccp/psp-dev.c
+++ b/drivers/crypto/ccp/psp-dev.c
@@ -26,6 +26,12 @@
 #include "sp-dev.h"
 #include "psp-dev.h"
 
+#define DEVICE_NAME"sev"
+
+static DEFINE_MUTEX(sev_cmd_mutex);
+static struct sev_misc_dev *misc_dev;
+static struct psp_device *psp_master;
+
 static struct psp_device *psp_alloc_struct(struct sp_device *sp)
 {
struct device *dev = sp->dev;
@@ -45,9 +51,285 @@ static struct psp_device *psp_alloc_struct(struct sp_device *sp)
 
 static irqreturn_t psp_irq_handler(int irq, void *data)
 {
+   struct psp_device *psp = data;
+   unsigned int status;
+   int reg;
+
+   /* Read the interrupt status: */
+   status = ioread32(psp->io_regs + PSP_P2CMSG_INTSTS);
+
+   /* Check if it is command completion: */
+   if (!(status & BIT(PSP_CMD_COMPLETE_REG)))
+   goto done;
+
+   /* Check if it is SEV command completion: */
+   reg = ioread32(psp->io_regs + PSP_CMDRESP);
+   if (reg & PSP_CMDRESP_RESP) {
+   psp->sev_int_rcvd = 1;
+   wake_up(&psp->sev_int_queue);
+   }
+
+done:
+   /* Clear the interrupt status by writing the same value we read. */
+   iowrite32(status, psp->io_regs + PSP_P2CMSG_INTSTS);
+
return IRQ_HANDLED;
 }
 
+static void sev_wait_cmd_ioc(struct psp_device *psp, unsigned int *reg)
+{
+   psp->sev_int_rcvd = 0;
+
+   wait_event(psp->sev_int_queue, psp->sev_int_rcvd);
+   *reg = ioread32(psp->io_regs + PSP_CMDRESP);
+}
+
+static int sev_cmd_buffer_len(int cmd)
+{
+   switch (cmd) {
+   case SEV_CMD_INIT: return sizeof(struct sev_data_init);
+   case SEV_CMD_PLATFORM_STATUS: return sizeof(struct sev_user_data_status);
+   case SEV_CMD_PEK_CSR: return sizeof(struct sev_data_pek_csr);
+   case SEV_CMD_PEK_CERT_IMPORT: return sizeof(struct sev_data_pek_cert_import);
+   case SEV_CMD_PDH_CERT_EXPORT: return sizeof(struct sev_data_pdh_cert_export);
+   case SEV_CMD_LAUNCH_START: return sizeof(struct sev_data_launch_start);
+   case SEV_CMD_LAUNCH_UPDATE_DATA: return sizeof(struct sev_data_launch_update_data);
+   case SEV_CMD_LAUNCH_UPDATE_VMSA: return sizeof(struct sev_data_launch_update_vmsa);
+   case SEV_CMD_LAUNCH_FINISH: return sizeof(struct sev_data_launch_finish);
+   case SEV_CMD_LAUNCH_MEASURE: return sizeof(struct sev_data_launch_measure);
+   case SEV_CMD_ACTIVATE: return sizeof(struct sev_data_activate);
+   case SEV_CMD_DEACTIVATE: return sizeof(struct sev_data_deactivate);
+   case SEV_CMD_DECOMMISSION: return sizeof(struct sev_data_decommission);
+   case SEV_CMD_GUEST_STATUS: return sizeof(struct sev_data_guest_status);
+   case SEV_CMD_DBG_DECRYPT: return sizeof(struct sev_data_dbg);
+   case SEV_CMD_DBG_ENCRYPT: return sizeof(struct sev_data_dbg);
+   case SEV_CMD_SEND_START: return sizeof(struct sev_data_send_start);
+   case SEV_CMD_SEND_UPDATE_DATA: return sizeof(struct sev_data_send_update_data);
+   case SEV_CMD_SEND_UPDATE_VMSA: return sizeof(struct sev_data_send_update_vmsa);
+   case SEV_CMD_SEND_FINISH: return sizeof(struct sev_data_send_finish);
+   case SEV_CMD_RECEIVE_START: return sizeof(struct sev_data_receive_start);
+   case SEV_CMD_RECEIVE_FINISH: return sizeof(struct sev_data_receive_finish);
+   case SEV_CMD_R

[Part2 PATCH v9 10/38] crypto: ccp: Define SEV userspace ioctl and command id

2017-12-04 Thread Brijesh Singh

Add an include file which defines the ioctl and command IDs used for
issuing SEV platform management specific commands.
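
As an aside on why the structures in this header are marked __packed, here is
a small sketch (not part of the patch) that mirrors struct sev_issue_cmd and
the command enum in plain C. With packing, the 64-bit data field sits at
offset 4 and the whole request is 16 bytes on every ABI, which is what lets
userspace and the kernel agree on the layout. The helper sev_make_cmd() is
hypothetical and only exists for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of struct sev_issue_cmd; __packed is spelled out for userspace. */
struct sev_issue_cmd {
	uint32_t cmd;	/* In */
	uint64_t data;	/* In */
	uint32_t error;	/* Out */
} __attribute__((packed));

/* Packing removes the padding a 64-bit ABI would insert before `data`. */
_Static_assert(offsetof(struct sev_issue_cmd, data) == 4, "data follows cmd");
_Static_assert(offsetof(struct sev_issue_cmd, error) == 12, "error follows data");
_Static_assert(sizeof(struct sev_issue_cmd) == 16, "no tail padding");

/* Mirror of the platform command enum from this patch. */
enum {
	SEV_FACTORY_RESET = 0,
	SEV_PLATFORM_STATUS,
	SEV_PEK_GEN,
	SEV_PEK_CSR,
	SEV_PDH_GEN,
	SEV_PDH_CERT_EXPORT,
	SEV_PEK_CERT_IMPORT,

	SEV_MAX,
};

/* Hypothetical helper: build a zeroed request for one platform command. */
static struct sev_issue_cmd sev_make_cmd(uint32_t id, void *payload)
{
	struct sev_issue_cmd c = { 0 };

	assert(id < SEV_MAX);		/* only defined command ids */
	c.cmd = id;
	c.data = (uint64_t)(uintptr_t)payload;
	return c;
}
```

The pointer is carried as a __u64 so that 32-bit and 64-bit userspace produce
the same request bytes.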

Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Borislav Petkov 
Cc: Herbert Xu 
Cc: Gary Hook 
Cc: Tom Lendacky 
Cc: linux-crypto@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Improvements-by: Borislav Petkov 
Signed-off-by: Brijesh Singh 
Reviewed-by: Borislav Petkov 
Acked-by: Gary R Hook 
---
 include/uapi/linux/psp-sev.h | 142 +++
 1 file changed, 142 insertions(+)
 create mode 100644 include/uapi/linux/psp-sev.h

diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h
new file mode 100644
index ..3d77fe91239a
--- /dev/null
+++ b/include/uapi/linux/psp-sev.h
@@ -0,0 +1,142 @@
+/*
+ * Userspace interface for AMD Secure Encrypted Virtualization (SEV)
+ * platform management commands.
+ *
+ * Copyright (C) 2016-2017 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh 
+ *
+ * SEV spec 0.14 is available at:
+ * http://support.amd.com/TechDocs/55766_SEV-KM%20API_Specification.pdf
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __PSP_SEV_USER_H__
+#define __PSP_SEV_USER_H__
+
+#include 
+
+/**
+ * SEV platform commands
+ */
+enum {
+   SEV_FACTORY_RESET = 0,
+   SEV_PLATFORM_STATUS,
+   SEV_PEK_GEN,
+   SEV_PEK_CSR,
+   SEV_PDH_GEN,
+   SEV_PDH_CERT_EXPORT,
+   SEV_PEK_CERT_IMPORT,
+
+   SEV_MAX,
+};
+
+/**
+ * SEV Firmware status code
+ */
+typedef enum {
+   SEV_RET_SUCCESS = 0,
+   SEV_RET_INVALID_PLATFORM_STATE,
+   SEV_RET_INVALID_GUEST_STATE,
+   SEV_RET_INAVLID_CONFIG,
+   SEV_RET_INVALID_len,
+   SEV_RET_ALREADY_OWNED,
+   SEV_RET_INVALID_CERTIFICATE,
+   SEV_RET_POLICY_FAILURE,
+   SEV_RET_INACTIVE,
+   SEV_RET_INVALID_ADDRESS,
+   SEV_RET_BAD_SIGNATURE,
+   SEV_RET_BAD_MEASUREMENT,
+   SEV_RET_ASID_OWNED,
+   SEV_RET_INVALID_ASID,
+   SEV_RET_WBINVD_REQUIRED,
+   SEV_RET_DFFLUSH_REQUIRED,
+   SEV_RET_INVALID_GUEST,
+   SEV_RET_INVALID_COMMAND,
+   SEV_RET_ACTIVE,
+   SEV_RET_HWSEV_RET_PLATFORM,
+   SEV_RET_HWSEV_RET_UNSAFE,
+   SEV_RET_UNSUPPORTED,
+   SEV_RET_MAX,
+} sev_ret_code;
+
+/**
+ * struct sev_user_data_status - PLATFORM_STATUS command parameters
+ *
+ * @api_major: major API version
+ * @api_minor: minor API version
+ * @state: platform state
+ * @flags: platform config flags
+ * @build: firmware build id for API version
+ * @guest_count: number of active guests
+ */
+struct sev_user_data_status {
+   __u8 api_major; /* Out */
+   __u8 api_minor; /* Out */
+   __u8 state; /* Out */
+   __u32 flags;/* Out */
+   __u8 build; /* Out */
+   __u32 guest_count;  /* Out */
+} __packed;
+
+/**
+ * struct sev_user_data_pek_csr - PEK_CSR command parameters
+ *
+ * @address: PEK certificate chain
+ * @length: length of certificate
+ */
+struct sev_user_data_pek_csr {
+   __u64 address;  /* In */
+   __u32 length;   /* In/Out */
+} __packed;
+
+/**
+ * struct sev_user_data_cert_import - PEK_CERT_IMPORT command parameters
+ *
+ * @pek_address: PEK certificate chain
+ * @pek_len: length of PEK certificate
+ * @oca_address: OCA certificate chain
+ * @oca_len: length of OCA certificate
+ */
+struct sev_user_data_pek_cert_import {
+   __u64 pek_cert_address; /* In */
+   __u32 pek_cert_len; /* In */
+   __u64 oca_cert_address; /* In */
+   __u32 oca_cert_len; /* In */
+} __packed;
+
+/**
+ * struct sev_user_data_pdh_cert_export - PDH_CERT_EXPORT command parameters
+ *
+ * @pdh_address: PDH certificate address
+ * @pdh_len: length of PDH certificate
+ * @cert_chain_address: PDH certificate chain
+ * @cert_chain_len: length of PDH certificate chain
+ */
+struct sev_user_data_pdh_cert_export {
+   __u64 pdh_cert_address; /* In */
+   __u32 pdh_cert_len; /* In/Out */
+   __u64 cert_chain_address;   /* In */
+   __u32 cert_chain_len;   /* In/Out */
+} __packed;
+
+/**
+ * struct sev_issue_cmd - SEV ioctl parameters
+ *
+ * @cmd: SEV commands to execute
+ * @data: pointer to the command structure
+ * @error: SEV FW return code on failure
+ */
+struct sev_issue_cmd {
+   __u32 cmd;  /* In */
+   __u64 data; /* In */
+   __u32 error;/* Out */
+} __packed;
+
+#define SEV_IOC_TYPE   'S'
+#define SEV_ISSUE_CMD  _IOWR(SEV_IOC_TYPE,

[Part2 PATCH v9 17/38] crypto: ccp: Implement SEV_PDH_GEN ioctl command

2017-12-04 Thread Brijesh Singh

The SEV_PDH_GEN command is used to re-generate the Platform
Diffie-Hellman (PDH) key. The command is defined in SEV spec section
5.6.

Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Borislav Petkov 
Cc: Herbert Xu 
Cc: Gary Hook 
Cc: Tom Lendacky 
Cc: linux-crypto@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Brijesh Singh 
Reviewed-by: Borislav Petkov 
Acked-by: Gary R Hook 
---
 drivers/crypto/ccp/psp-dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c
index 8aa8036023e0..fd3daf0a1176 100644
--- a/drivers/crypto/ccp/psp-dev.c
+++ b/drivers/crypto/ccp/psp-dev.c
@@ -333,6 +333,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SEV_PEK_GEN:
ret = sev_ioctl_do_pek_pdh_gen(SEV_CMD_PEK_GEN, &input);
break;
+   case SEV_PDH_GEN:
+   ret = sev_ioctl_do_pek_pdh_gen(SEV_CMD_PDH_GEN, &input);
+   break;
default:
ret = -EINVAL;
goto out;
-- 
2.9.5

[Part2 PATCH v9 19/38] crypto: ccp: Implement SEV_PEK_CERT_IMPORT ioctl command

2017-12-04 Thread Brijesh Singh

The SEV_PEK_CERT_IMPORT command can be used to import the signed PEK
certificate. The command is defined in SEV spec section 5.8.
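
The import path copies two blobs in and has to release them in reverse order
on any failure. A userspace analogue of that pattern is sketched below; it is
an illustration only. BLOB_MAX_SIZE is a hypothetical limit standing in for
SEV_FW_BLOB_MAX_SIZE, and copy_blob(), fake_submit() and import_certs() are
made-up names playing the roles of psp_copy_user_blob(), the firmware call and
sev_ioctl_do_pek_import().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit standing in for SEV_FW_BLOB_MAX_SIZE. */
#define BLOB_MAX_SIZE 16384u

/* Bytes seen by the stand-in "firmware" on the last successful submit. */
static uint32_t last_submitted_bytes;

/* Stand-in for the PEK_CERT_IMPORT firmware command. */
static int fake_submit(const void *pek, uint32_t pek_len,
		       const void *oca, uint32_t oca_len)
{
	(void)pek;
	(void)oca;
	last_submitted_bytes = pek_len + oca_len;
	return 0;
}

/* Validate the length first, then allocate and copy. */
static int copy_blob(const void *src, uint32_t len, void **out)
{
	void *data;

	assert(out);
	*out = NULL;

	if (!src || !len || len > BLOB_MAX_SIZE)
		return -EINVAL;

	data = malloc(len);
	if (!data)
		return -ENOMEM;

	memcpy(data, src, len);
	*out = data;
	return 0;
}

/* Copy both certificates, submit them, and unwind in reverse order. */
static int import_certs(const void *pek, uint32_t pek_len,
			const void *oca, uint32_t oca_len)
{
	void *pek_blob, *oca_blob;
	int ret;

	ret = copy_blob(pek, pek_len, &pek_blob);
	if (ret)
		return ret;

	ret = copy_blob(oca, oca_len, &oca_blob);
	if (ret)
		goto e_free_pek;

	ret = fake_submit(pek_blob, pek_len, oca_blob, oca_len);

	free(oca_blob);
e_free_pek:
	free(pek_blob);
	return ret;
}
```

Checking the length before allocating means a bogus request never costs more
than a comparison.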

Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Borislav Petkov 
Cc: Herbert Xu 
Cc: Gary Hook 
Cc: Tom Lendacky 
Cc: linux-crypto@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Improvements-by: Borislav Petkov 
Signed-off-by: Brijesh Singh 
Acked-by: Gary R Hook 
Reviewed-by: Borislav Petkov 
---
 drivers/crypto/ccp/psp-dev.c | 81 
 include/linux/psp-sev.h  |  4 +++
 2 files changed, 85 insertions(+)

diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c
index c3906bbdb69b..9d1c4600db19 100644
--- a/drivers/crypto/ccp/psp-dev.c
+++ b/drivers/crypto/ccp/psp-dev.c
@@ -365,6 +365,84 @@ static int sev_ioctl_do_pek_csr(struct sev_issue_cmd *argp)
return ret;
 }
 
+void *psp_copy_user_blob(u64 __user uaddr, u32 len)
+{
+   void *data;
+
+   if (!uaddr || !len)
+   return ERR_PTR(-EINVAL);
+
+   /* verify that blob length does not exceed our limit */
+   if (len > SEV_FW_BLOB_MAX_SIZE)
+   return ERR_PTR(-EINVAL);
+
+   data = kmalloc(len, GFP_KERNEL);
+   if (!data)
+   return ERR_PTR(-ENOMEM);
+
+   if (copy_from_user(data, (void __user *)(uintptr_t)uaddr, len))
+   goto e_free;
+
+   return data;
+
+e_free:
+   kfree(data);
+   return ERR_PTR(-EFAULT);
+}
+EXPORT_SYMBOL_GPL(psp_copy_user_blob);
+
+static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp)
+{
+   struct sev_user_data_pek_cert_import input;
+   struct sev_data_pek_cert_import *data;
+   void *pek_blob, *oca_blob;
+   int ret;
+
+   if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+   return -EFAULT;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   /* copy PEK certificate blobs from userspace */
+   pek_blob = psp_copy_user_blob(input.pek_cert_address, input.pek_cert_len);
+   if (IS_ERR(pek_blob)) {
+   ret = PTR_ERR(pek_blob);
+   goto e_free;
+   }
+
+   data->pek_cert_address = __psp_pa(pek_blob);
+   data->pek_cert_len = input.pek_cert_len;
+
+   /* copy OCA certificate blobs from userspace */
+   oca_blob = psp_copy_user_blob(input.oca_cert_address, input.oca_cert_len);
+   if (IS_ERR(oca_blob)) {
+   ret = PTR_ERR(oca_blob);
+   goto e_free_pek;
+   }
+
+   data->oca_cert_address = __psp_pa(oca_blob);
+   data->oca_cert_len = input.oca_cert_len;
+
+   /* If platform is not in INIT state then transition it to INIT */
+   if (psp_master->sev_state != SEV_STATE_INIT) {
+   ret = __sev_platform_init_locked(&argp->error);
+   if (ret)
+   goto e_free_oca;
+   }
+
+   ret = __sev_do_cmd_locked(SEV_CMD_PEK_CERT_IMPORT, data, &argp->error);
+
+e_free_oca:
+   kfree(oca_blob);
+e_free_pek:
+   kfree(pek_blob);
+e_free:
+   kfree(data);
+   return ret;
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
void __user *argp = (void __user *)arg;
@@ -402,6 +480,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
case SEV_PEK_CSR:
ret = sev_ioctl_do_pek_csr(&input);
break;
+   case SEV_PEK_CERT_IMPORT:
+   ret = sev_ioctl_do_pek_import(&input);
+   break;
default:
ret = -EINVAL;
goto out;
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 0b6dd306d88b..93addfa34061 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -576,6 +576,8 @@ int sev_guest_df_flush(int *error);
  */
 int sev_guest_decommission(struct sev_data_decommission *data, int *error);
 
+void *psp_copy_user_blob(u64 __user uaddr, u32 len);
+
 #else  /* !CONFIG_CRYPTO_DEV_SP_PSP */
 
 static inline int
@@ -597,6 +599,8 @@ static inline int sev_guest_df_flush(int *error) { return -ENODEV; }
 static inline int
 sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int *error) { return -ENODEV; }
 
+static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); }
+
 #endif /* CONFIG_CRYPTO_DEV_SP_PSP */
 
 #endif /* __PSP_SEV_H__ */
-- 
2.9.5

[Part2 PATCH v9 18/38] crypto: ccp: Implement SEV_PEK_CSR ioctl command

2017-12-04 Thread Brijesh Singh

The SEV_PEK_CSR command can be used to generate a PEK certificate
signing request. The command is defined in SEV spec section 5.7.
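
The handler below lets userspace pass a zero address or length to learn how
large the CSR is, then call again with a buffer of that size. That two-step
pattern is sketched here against a mock, as an illustration only.
mock_pek_csr() is a hypothetical stand-in for the ioctl plus firmware, and its
use of -ENOSPC for the "buffer too small" case is an assumption of the sketch,
not something this patch specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified request: buffer pointer plus in/out length. */
struct csr_request {
	void *address;
	uint32_t length;
};

/* Payload handed out by the mock firmware. */
static const char mock_csr[] = "mock-csr-bytes";

/*
 * Mock of the PEK_CSR command. With no buffer, or one that is too small,
 * it reports the required length and fails; otherwise it fills the buffer.
 */
static int mock_pek_csr(struct csr_request *req)
{
	uint32_t need = sizeof(mock_csr);

	if (!req->address || req->length < need) {
		req->length = need;
		return -ENOSPC;		/* assumed error code for this sketch */
	}

	memcpy(req->address, mock_csr, need);
	req->length = need;
	return 0;
}

/* Step 1: probe for the length. Step 2: allocate and fetch. */
static int fetch_csr(void **out, uint32_t *out_len)
{
	struct csr_request req = { NULL, 0 };
	int rc;

	assert(out && out_len);

	rc = mock_pek_csr(&req);
	if (rc == 0 || req.length == 0)
		return -EIO;		/* the probe should fail and set a length */

	req.address = malloc(req.length);
	if (!req.address)
		return -ENOMEM;

	rc = mock_pek_csr(&req);
	if (rc) {
		free(req.address);
		return rc;
	}

	*out = req.address;
	*out_len = req.length;
	return 0;
}
```

The caller owns the returned buffer and frees it after use.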

Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Borislav Petkov 
Cc: Herbert Xu 
Cc: Gary Hook 
Cc: Tom Lendacky 
Cc: linux-crypto@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Improvements-by: Borislav Petkov 
Signed-off-by: Brijesh Singh 
Acked-by: Gary R Hook 
---
 drivers/crypto/ccp/psp-dev.c | 66 
 1 file changed, 66 insertions(+)

diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c
index fd3daf0a1176..c3906bbdb69b 100644
--- a/drivers/crypto/ccp/psp-dev.c
+++ b/drivers/crypto/ccp/psp-dev.c
@@ -302,6 +302,69 @@ static int sev_ioctl_do_pek_pdh_gen(int cmd, struct sev_issue_cmd *argp)
return __sev_do_cmd_locked(cmd, 0, &argp->error);
 }
 
+static int sev_ioctl_do_pek_csr(struct sev_issue_cmd *argp)
+{
+   struct sev_user_data_pek_csr input;
+   struct sev_data_pek_csr *data;
+   void *blob = NULL;
+   int ret;
+
+   if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+   return -EFAULT;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   /* userspace wants to query CSR length */
+   if (!input.address || !input.length)
+   goto cmd;
+
+   /* allocate a physically contiguous buffer to store the CSR blob */
+   if (!access_ok(VERIFY_WRITE, input.address, input.length) ||
+   input.length > SEV_FW_BLOB_MAX_SIZE) {
+   ret = -EFAULT;
+   goto e_free;
+   }
+
+   blob = kmalloc(input.length, GFP_KERNEL);
+   if (!blob) {
+   ret = -ENOMEM;
+   goto e_free;
+   }
+
+   data->address = __psp_pa(blob);
+   data->len = input.length;
+
+cmd:
+   if (psp_master->sev_state == SEV_STATE_UNINIT) {
+   ret = __sev_platform_init_locked(&argp->error);
+   if (ret)
+   goto e_free_blob;
+   }
+
+   ret = __sev_do_cmd_locked(SEV_CMD_PEK_CSR, data, &argp->error);
+
+   /* If we query the CSR length, FW responded with expected data. */
+   input.length = data->len;
+
+   if (copy_to_user((void __user *)argp->data, &input, sizeof(input))) {
+   ret = -EFAULT;
+   goto e_free_blob;
+   }
+
+   if (blob) {
+   if (copy_to_user((void __user *)input.address, blob, 
input.length))
+   ret = -EFAULT;
+   }
+
+e_free_blob:
+   kfree(blob);
+e_free:
+   kfree(data);
+   return ret;
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
void __user *argp = (void __user *)arg;
@@ -336,6 +399,9 @@ static long sev_ioctl(struct file *file, unsigned int 
ioctl, unsigned long arg)
case SEV_PDH_GEN:
ret = sev_ioctl_do_pek_pdh_gen(SEV_CMD_PDH_GEN, &input);
break;
+   case SEV_PEK_CSR:
+   ret = sev_ioctl_do_pek_csr(&input);
+   break;
default:
ret = -EINVAL;
goto out;
-- 
2.9.5
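
The handler in this patch supports a two-call pattern: a caller that passes a zero address or length gets the required CSR length back in the length field, then calls again with a buffer of that size. A minimal sketch of that pattern follows. It does not talk to the real device; `mock_pek_csr()`, `struct csr_req`, `fake_csr` and `fetch_csr()` are hypothetical stand-ins for the ioctl and `struct sev_user_data_pek_csr`, and the mock's behaviour for a too-small buffer is an assumption rather than something taken from the SEV spec.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sev_user_data_pek_csr. */
struct csr_req {
	void *address;     /* caller buffer, NULL to query the length */
	uint32_t length;   /* in: buffer size, out: CSR size */
};

/* Hypothetical stand-in for the blob the firmware would produce. */
static const char fake_csr[] = "not-a-real-csr";

/*
 * Mock of the SEV_PEK_CSR call (assumption: not the real firmware).
 * Returns 0 on success and -1 when the caller must retry with a
 * larger buffer; req->length is always set to the size needed.
 */
static int mock_pek_csr(struct csr_req *req)
{
	uint32_t need = sizeof(fake_csr);

	if (!req->address || req->length < need) {
		req->length = need;
		return -1;
	}
	memcpy(req->address, fake_csr, need);
	req->length = need;
	return 0;
}

/* Query the length first, then allocate a buffer and fetch the blob. */
static void *fetch_csr(uint32_t *len_out)
{
	struct csr_req req = { .address = NULL, .length = 0 };
	void *buf;

	/* First call: no buffer, so only the required length comes back. */
	if (mock_pek_csr(&req) == 0 || req.length == 0)
		return NULL;

	buf = malloc(req.length);
	if (!buf)
		return NULL;

	/* Second call: same request, now with a buffer of the right size. */
	req.address = buf;
	if (mock_pek_csr(&req) != 0) {
		free(buf);
		return NULL;
	}
	*len_out = req.length;
	return buf;
}
```

A caller would invoke `fetch_csr(&len)` and free the returned buffer when done. In the real interface the first call's return value and error code would need checking against the SEV spec before trusting the reported length.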

[Part2 PATCH v9 09/38] crypto: ccp: Build the AMD secure processor driver only with AMD CPU support

2017-12-04 Thread Brijesh Singh

From: Borislav Petkov 

This is AMD-specific hardware so present it in Kconfig only when AMD
CPU support is enabled or on ARM64 where it is also used.

Signed-off-by: Borislav Petkov 
Signed-off-by: Brijesh Singh 
Reviewed-by: Gary R Hook 
Cc: Brijesh Singh 
Cc: Tom Lendacky 
Cc: Gary Hook 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: linux-crypto@vger.kernel.org
---
 drivers/crypto/ccp/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/crypto/ccp/Kconfig b/drivers/crypto/ccp/Kconfig
index 6d626606b9c5..9c84f9838931 100644
--- a/drivers/crypto/ccp/Kconfig
+++ b/drivers/crypto/ccp/Kconfig
@@ -1,5 +1,6 @@
 config CRYPTO_DEV_CCP_DD
tristate "Secure Processor device driver"
+   depends on CPU_SUP_AMD || ARM64
default m
help
  Provides AMD Secure Processor device driver.
-- 
2.9.5

[Part2 PATCH v9 20/38] crypto: ccp: Implement SEV_PDH_CERT_EXPORT ioctl command

2017-12-04 Thread Brijesh Singh

The SEV_PDH_CERT_EXPORT command can be used to export the PDH and its
certificate chain. The command is defined in SEV spec section 5.10.

Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Borislav Petkov 
Cc: Herbert Xu 
Cc: Gary Hook 
Cc: Tom Lendacky 
Cc: linux-crypto@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Improvements-by: Borislav Petkov 
Signed-off-by: Brijesh Singh 
Acked-by: Gary R Hook 
---
 drivers/crypto/ccp/psp-dev.c | 97 
 1 file changed, 97 insertions(+)

diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c
index 9d1c4600db19..fcfa5b1eae61 100644
--- a/drivers/crypto/ccp/psp-dev.c
+++ b/drivers/crypto/ccp/psp-dev.c
@@ -443,6 +443,100 @@ static int sev_ioctl_do_pek_import(struct sev_issue_cmd 
*argp)
return ret;
 }
 
+static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp)
+{
+   struct sev_user_data_pdh_cert_export input;
+   void *pdh_blob = NULL, *cert_blob = NULL;
+   struct sev_data_pdh_cert_export *data;
+   int ret;
+
+   if (copy_from_user(&input, (void __user *)argp->data, sizeof(input)))
+   return -EFAULT;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   /* Userspace wants to query the certificate length. */
+   if (!input.pdh_cert_address ||
+   !input.pdh_cert_len ||
+   !input.cert_chain_address)
+   goto cmd;
+
+   /* Allocate a physically contiguous buffer to store the PDH blob. */
+   if ((input.pdh_cert_len > SEV_FW_BLOB_MAX_SIZE) ||
+   !access_ok(VERIFY_WRITE, input.pdh_cert_address, 
input.pdh_cert_len)) {
+   ret = -EFAULT;
+   goto e_free;
+   }
+
+   /* Allocate a physically contiguous buffer to store the cert chain 
blob. */
+   if ((input.cert_chain_len > SEV_FW_BLOB_MAX_SIZE) ||
+   !access_ok(VERIFY_WRITE, input.cert_chain_address, 
input.cert_chain_len)) {
+   ret = -EFAULT;
+   goto e_free;
+   }
+
+   pdh_blob = kmalloc(input.pdh_cert_len, GFP_KERNEL);
+   if (!pdh_blob) {
+   ret = -ENOMEM;
+   goto e_free;
+   }
+
+   data->pdh_cert_address = __psp_pa(pdh_blob);
+   data->pdh_cert_len = input.pdh_cert_len;
+
+   cert_blob = kmalloc(input.cert_chain_len, GFP_KERNEL);
+   if (!cert_blob) {
+   ret = -ENOMEM;
+   goto e_free_pdh;
+   }
+
+   data->cert_chain_address = __psp_pa(cert_blob);
+   data->cert_chain_len = input.cert_chain_len;
+
+cmd:
+   /* If platform is not in INIT state then transition it to INIT. */
+   if (psp_master->sev_state != SEV_STATE_INIT) {
+   ret = __sev_platform_init_locked(&argp->error);
+   if (ret)
+   goto e_free_cert;
+   }
+
+   ret = __sev_do_cmd_locked(SEV_CMD_PDH_CERT_EXPORT, data, &argp->error);
+
+   /* If we query the length, FW responded with expected data. */
+   input.cert_chain_len = data->cert_chain_len;
+   input.pdh_cert_len = data->pdh_cert_len;
+
+   if (copy_to_user((void __user *)argp->data, &input, sizeof(input))) {
+   ret = -EFAULT;
+   goto e_free_cert;
+   }
+
+   if (pdh_blob) {
+   if (copy_to_user((void __user *)input.pdh_cert_address,
+pdh_blob, input.pdh_cert_len)) {
+   ret = -EFAULT;
+   goto e_free_cert;
+   }
+   }
+
+   if (cert_blob) {
+   if (copy_to_user((void __user *)input.cert_chain_address,
+cert_blob, input.cert_chain_len))
+   ret = -EFAULT;
+   }
+
+e_free_cert:
+   kfree(cert_blob);
+e_free_pdh:
+   kfree(pdh_blob);
+e_free:
+   kfree(data);
+   return ret;
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
void __user *argp = (void __user *)arg;
@@ -483,6 +577,9 @@ static long sev_ioctl(struct file *file, unsigned int 
ioctl, unsigned long arg)
case SEV_PEK_CERT_IMPORT:
ret = sev_ioctl_do_pek_import(&input);
break;
+   case SEV_PDH_CERT_EXPORT:
+   ret = sev_ioctl_do_pdh_export(&input);
+   break;
default:
ret = -EINVAL;
goto out;
-- 
2.9.5

[Part2 PATCH v9 16/38] crypto: ccp: Implement SEV_PEK_GEN ioctl command

2017-12-04 Thread Brijesh Singh

The SEV_PEK_GEN command is used to generate a new Platform Endorsement
Key (PEK). The command is defined in SEV spec section 5.6.

Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Borislav Petkov 
Cc: Herbert Xu 
Cc: Gary Hook 
Cc: Tom Lendacky 
Cc: linux-crypto@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Reviewed-by: Borislav Petkov 
Improvements-by: Borislav Petkov 
Signed-off-by: Brijesh Singh 
Acked-by: Gary R Hook 
---
 drivers/crypto/ccp/psp-dev.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c
index a5072b166ab8..8aa8036023e0 100644
--- a/drivers/crypto/ccp/psp-dev.c
+++ b/drivers/crypto/ccp/psp-dev.c
@@ -289,6 +289,19 @@ static int sev_ioctl_do_platform_status(struct 
sev_issue_cmd *argp)
return ret;
 }
 
+static int sev_ioctl_do_pek_pdh_gen(int cmd, struct sev_issue_cmd *argp)
+{
+   int rc;
+
+   if (psp_master->sev_state == SEV_STATE_UNINIT) {
+   rc = __sev_platform_init_locked(&argp->error);
+   if (rc)
+   return rc;
+   }
+
+   return __sev_do_cmd_locked(cmd, 0, &argp->error);
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
void __user *argp = (void __user *)arg;
@@ -317,6 +330,9 @@ static long sev_ioctl(struct file *file, unsigned int 
ioctl, unsigned long arg)
case SEV_PLATFORM_STATUS:
ret = sev_ioctl_do_platform_status(&input);
break;
+   case SEV_PEK_GEN:
+   ret = sev_ioctl_do_pek_pdh_gen(SEV_CMD_PEK_GEN, &input);
+   break;
default:
ret = -EINVAL;
goto out;
-- 
2.9.5

[Part2 PATCH v9 11/38] crypto: ccp: Define SEV key management command id

2017-12-04 Thread Brijesh Singh

Define Secure Encrypted Virtualization (SEV) key management command id
and structure. The command definition is available in SEV KM spec
0.14 (http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf)

Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Borislav Petkov 
Cc: Herbert Xu 
Cc: Gary Hook 
Cc: Tom Lendacky 
Cc: linux-crypto@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Improvements-by: Borislav Petkov 
Signed-off-by: Brijesh Singh 
Reviewed-by: Borislav Petkov 
Acked-by: Gary R Hook 
---
 include/linux/psp-sev.h | 465 
 1 file changed, 465 insertions(+)
 create mode 100644 include/linux/psp-sev.h

diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
new file mode 100644
index ..4a150d17d537
--- /dev/null
+++ b/include/linux/psp-sev.h
@@ -0,0 +1,465 @@
+/*
+ * AMD Secure Encrypted Virtualization (SEV) driver interface
+ *
+ * Copyright (C) 2016-2017 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh 
+ *
+ * SEV spec 0.14 is available at:
+ * http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __PSP_SEV_H__
+#define __PSP_SEV_H__
+
+#include 
+
+#ifdef CONFIG_X86
+#include 
+
+#define __psp_pa(x)    __sme_pa(x)
+#else
+#define __psp_pa(x)    __pa(x)
+#endif
+
+#define SEV_FW_BLOB_MAX_SIZE   0x4000  /* 16KB */
+
+/**
+ * SEV platform state
+ */
+enum sev_state {
+   SEV_STATE_UNINIT    = 0x0,
+   SEV_STATE_INIT      = 0x1,
+   SEV_STATE_WORKING   = 0x2,
+
+   SEV_STATE_MAX
+};
+
+/**
+ * SEV platform and guest management commands
+ */
+enum sev_cmd {
+   /* platform commands */
+   SEV_CMD_INIT                    = 0x001,
+   SEV_CMD_SHUTDOWN                = 0x002,
+   SEV_CMD_FACTORY_RESET           = 0x003,
+   SEV_CMD_PLATFORM_STATUS         = 0x004,
+   SEV_CMD_PEK_GEN                 = 0x005,
+   SEV_CMD_PEK_CSR                 = 0x006,
+   SEV_CMD_PEK_CERT_IMPORT         = 0x007,
+   SEV_CMD_PDH_CERT_EXPORT         = 0x008,
+   SEV_CMD_PDH_GEN                 = 0x009,
+   SEV_CMD_DF_FLUSH                = 0x00A,
+
+   /* Guest commands */
+   SEV_CMD_DECOMMISSION            = 0x020,
+   SEV_CMD_ACTIVATE                = 0x021,
+   SEV_CMD_DEACTIVATE              = 0x022,
+   SEV_CMD_GUEST_STATUS            = 0x023,
+
+   /* Guest launch commands */
+   SEV_CMD_LAUNCH_START            = 0x030,
+   SEV_CMD_LAUNCH_UPDATE_DATA      = 0x031,
+   SEV_CMD_LAUNCH_UPDATE_VMSA      = 0x032,
+   SEV_CMD_LAUNCH_MEASURE          = 0x033,
+   SEV_CMD_LAUNCH_UPDATE_SECRET    = 0x034,
+   SEV_CMD_LAUNCH_FINISH           = 0x035,
+
+   /* Guest migration commands (outgoing) */
+   SEV_CMD_SEND_START              = 0x040,
+   SEV_CMD_SEND_UPDATE_DATA        = 0x041,
+   SEV_CMD_SEND_UPDATE_VMSA        = 0x042,
+   SEV_CMD_SEND_FINISH             = 0x043,
+
+   /* Guest migration commands (incoming) */
+   SEV_CMD_RECEIVE_START           = 0x050,
+   SEV_CMD_RECEIVE_UPDATE_DATA     = 0x051,
+   SEV_CMD_RECEIVE_UPDATE_VMSA     = 0x052,
+   SEV_CMD_RECEIVE_FINISH          = 0x053,
+
+   /* Guest debug commands */
+   SEV_CMD_DBG_DECRYPT             = 0x060,
+   SEV_CMD_DBG_ENCRYPT             = 0x061,
+
+   SEV_CMD_MAX,
+};
+
+/**
+ * struct sev_data_init - INIT command parameters
+ *
+ * @flags: processing flags
+ * @tmr_address: system physical address used for SEV-ES
+ * @tmr_len: len of tmr_address
+ */
+struct sev_data_init {
+   u32 flags;          /* In */
+   u32 reserved;       /* In */
+   u64 tmr_address;    /* In */
+   u32 tmr_len;        /* In */
+} __packed;
+
+/**
+ * struct sev_data_pek_csr - PEK_CSR command parameters
+ *
+ * @address: PEK certificate chain
+ * @len: len of certificate
+ */
+struct sev_data_pek_csr {
+   u64 address;        /* In */
+   u32 len;            /* In/Out */
+} __packed;
+
+/**
+ * struct sev_data_pek_cert_import - PEK_CERT_IMPORT command parameters
+ *
+ * @pek_cert_address: PEK certificate chain
+ * @pek_cert_len: len of PEK certificate
+ * @oca_cert_address: OCA certificate chain
+ * @oca_cert_len: len of OCA certificate
+ */
+struct sev_data_pek_cert_import {
+   u64 pek_cert_address;   /* In */
+   u32 pek_cert_len;       /* In */
+   u32 reserved;           /* In */
+   u64 oca_cert_address;   /* In */
+   u32 oca_cert_len;       /* In */
+} __packed;
+
+/**
+ * struct sev_data_pdh_cert_export - PDH_CERT_EXPORT command parameters
+ *
+ * @pdh_address: PD

Re: [PATCH v3 1/3] dt-bindings: hwrng: Add Samsung Exynos 5250+ True RNG bindings

2017-12-04 Thread Rob Herring

On Mon, Dec 04, 2017 at 01:53:49PM +0100, Łukasz Stelmach wrote:
> Add binding documentation for the True Random Number Generator
> found on Samsung Exynos 5250+ SoCs.
> 
> Signed-off-by: Łukasz Stelmach 
> ---
>  .../devicetree/bindings/rng/samsung,exynos5250-trng.txt | 17 
> +
>  1 file changed, 17 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt

I acked v1 (and so did Krzysztof). You added them in v2, but 
dropped here?

Re: [PATCH] treewide: remove duplicate includes

2017-12-04 Thread Eduardo Valentin

Hello,

On Mon, Dec 04, 2017 at 03:19:39AM +0530, Pravin Shedge wrote:


> diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
> index d04ec3b..e09f035 100644
> --- a/drivers/thermal/of-thermal.c
> +++ b/drivers/thermal/of-thermal.c
> @@ -30,7 +30,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include "thermal_core.h"
>  

No issues with this but,

Please send a separate patch to linux...@vger.kernel.org and copy 
edube...@gmail.com

thanks,

-- 
All the best,
Eduardo Valentin

Re: [PATCH] treewide: remove duplicate includes

2017-12-04 Thread Darrick J. Wong

On Mon, Dec 04, 2017 at 03:19:39AM +0530, Pravin Shedge wrote:
> These duplicate includes have been found with scripts/checkincludes.pl but
> they have been removed manually to avoid removing false positives.
> 
> Unit Testing:
> 
> - build successful
> - LTP testsuite passes.
> - checkpatch.pl passes
> 
> Signed-off-by: Pravin Shedge 



> diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> index 9c42c4e..ab3aef2 100644
> --- a/fs/xfs/scrub/scrub.c
> +++ b/fs/xfs/scrub/scrub.c

These look reasonable, but please send me (and linux-xfs) the three
xfs changes separately so that I can add them to the xfs tree.

(Also, thank you for cc'ing the xfs list for this treewide change...)

--D

Re: [PATCH v3 2/3] hwrng: exynos - add Samsung Exynos True RNG driver

2017-12-04 Thread Krzysztof Kozlowski

On Mon, Dec 4, 2017 at 1:53 PM, Łukasz Stelmach  wrote:
> Add support for True Random Number Generator found in Samsung Exynos
> 5250+ SoCs.
>
> Signed-off-by: Łukasz Stelmach 
> ---
>  MAINTAINERS  |   7 +
>  drivers/char/hw_random/Kconfig   |  12 ++
>  drivers/char/hw_random/Makefile  |   1 +
>  drivers/char/hw_random/exynos-trng.c | 245 
> +++
>  4 files changed, 265 insertions(+)
>  create mode 100644 drivers/char/hw_random/exynos-trng.c

Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof

Re: [PATCH v3 1/3] dt-bindings: hwrng: Add Samsung Exynos 5250+ True RNG bindings

2017-12-04 Thread Krzysztof Kozlowski

On Mon, Dec 4, 2017 at 1:53 PM, Łukasz Stelmach  wrote:
> Add binding documentation for the True Random Number Generator
> found on Samsung Exynos 5250+ SoCs.
>
> Signed-off-by: Łukasz Stelmach 
> ---
>  .../devicetree/bindings/rng/samsung,exynos5250-trng.txt | 17 
> +
>  1 file changed, 17 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
>
> diff --git 
> a/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt 
> b/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
> new file mode 100644
> index ..5a613a4ec780
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
> @@ -0,0 +1,17 @@
> +Exynos True Random Number Generator
> +
> +Required properties:
> +
> +- compatible  : Should be "samsung,exynos5250-trng".
> +- reg : Specifies base physical address and size of the registers 
> map.
> +- clocks  : Phandle to clock-controller plus clock-specifier pair.
> +- clock-names : "secss" as a clock name.
> +
> +Example:
> +
> +   rng@10830600 {
> +   compatible = "samsung,exynos5250-trng";
> +   reg = <0x10830600 0x100>;
> +   clocks = <&clock CLK_SSS>;
> +   clock-names = "secss";
> +   };
> --
> 2.11.0

Mine and Rob's tags disappeared and I think you did not introduce any
major changes here, right?

Best regards,
Krzysztof

[PATCH v3 1/3] dt-bindings: hwrng: Add Samsung Exynos 5250+ True RNG bindings

2017-12-04 Thread Łukasz Stelmach

Add binding documentation for the True Random Number Generator
found on Samsung Exynos 5250+ SoCs.

Signed-off-by: Łukasz Stelmach 
---
 .../devicetree/bindings/rng/samsung,exynos5250-trng.txt | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt

diff --git a/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt 
b/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
new file mode 100644
index ..5a613a4ec780
--- /dev/null
+++ b/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
@@ -0,0 +1,17 @@
+Exynos True Random Number Generator
+
+Required properties:
+
+- compatible  : Should be "samsung,exynos5250-trng".
+- reg : Specifies base physical address and size of the registers map.
+- clocks  : Phandle to clock-controller plus clock-specifier pair.
+- clock-names : "secss" as a clock name.
+
+Example:
+
+   rng@10830600 {
+   compatible = "samsung,exynos5250-trng";
+   reg = <0x10830600 0x100>;
+   clocks = <&clock CLK_SSS>;
+   clock-names = "secss";
+   };
-- 
2.11.0

[PATCH v3 0/3] True RNG driver for Samsung Exynos 5250+ SoCs

2017-12-04 Thread Łukasz Stelmach

Hello.

The following patches add support for the true random number generator
found in Samsung Exynos 5250+ SoCs.

Patch #1 adds documentation for devicetree bindings.

Patch #2 introduces the driver and appropriate changes in Makefile and Kconfig.

Patch #3 adds nodes in devicetree files for Exynos SoCs (requires
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git/commit/?id=cdd745c8c76b02471d88f467c44a3d4eb431aa0b).

Changes in v3:

- Changed node-name in device-tree bindings according to Krzysztof
  Kozłowski's recommendation.
- Fixed name and added EXYNOS_ in EXYNOS_TRNG_CTRL_RNGEN
- Removed unnecessary label and simplified the abnormal exit path in
  exynos_trng_probe()
- Replaced __raw_{readl,writel}() with {readl,writel}_relaxed() (thanks
  PrasannaKumar Muralidharan)

Changes in v2:
- Fixed indentation in drivers/char/hw_random/Kconfig.
- Defined TRNG_CTRL_RGNEN.
- Removed global variable exynos_trng_dev.
- Removed exynos_trng_{set,get}_reg() functions.
- Used the min_t() macro instead of the ternary operator in
  exynos_trng_do_read().
- Moved trng initialisation to the variable declaration in
  exynos_trng_init().
- Fixed comment formatting.
- Removed unnecessary "TODO" comments.
- Return ENOMEM if devm_kzalloc() or devm_kstrdup() fails.
- Rephrased and unified error messages in exynos_trng_probe().
- Removed nullification of trng->mem.
- Added err_pm_get label at the end of exynos_trng_probe().
- Removed double error message at the end of exynos_trng_probe().
- Implemented exynos_trng_remove().

v2 available here:

https://www.spinics.net/lists/linux-samsung-soc/msg61280.html
https://patchwork.kernel.org/patch/10076225/
https://patchwork.kernel.org/patch/10076227/
https://patchwork.kernel.org/patch/10076237/

v1 can be found:

https://www.spinics.net/lists/linux-samsung-soc/msg61253.html
https://patchwork.kernel.org/patch/10072967/
https://patchwork.kernel.org/patch/10072971/
https://patchwork.kernel.org/patch/10072963/

Łukasz Stelmach (3):
  dt-bindings: hwrng: Add Samsung Exynos 5250+ True RNG bindings
  hwrng: exynos - add Samsung Exynos True RNG driver
  ARM: dts: exynos: Add nodes for True Random Number Generator

 .../bindings/rng/samsung,exynos5250-trng.txt   |  17 ++
 MAINTAINERS|   7 +
 arch/arm/boot/dts/exynos5.dtsi |   5 +
 arch/arm/boot/dts/exynos5250.dtsi  |   5 +
 arch/arm/boot/dts/exynos5410.dtsi  |   5 +
 arch/arm/boot/dts/exynos5420.dtsi  |   5 +
 drivers/char/hw_random/Kconfig |  12 +
 drivers/char/hw_random/Makefile|   1 +
 drivers/char/hw_random/exynos-trng.c   | 245 +
 9 files changed, 302 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
 create mode 100644 drivers/char/hw_random/exynos-trng.c

-- 
2.11.0

[PATCH v3 2/3] hwrng: exynos - add Samsung Exynos True RNG driver

2017-12-04 Thread Łukasz Stelmach

Add support for True Random Number Generator found in Samsung Exynos
5250+ SoCs.

Signed-off-by: Łukasz Stelmach 
---
 MAINTAINERS  |   7 +
 drivers/char/hw_random/Kconfig   |  12 ++
 drivers/char/hw_random/Makefile  |   1 +
 drivers/char/hw_random/exynos-trng.c | 245 +++
 4 files changed, 265 insertions(+)
 create mode 100644 drivers/char/hw_random/exynos-trng.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 2811a211632c..992074cca612 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11780,6 +11780,13 @@ S: Maintained
 F: drivers/crypto/exynos-rng.c
 F: Documentation/devicetree/bindings/rng/samsung,exynos-rng4.txt
 
+SAMSUNG EXYNOS TRUE RANDOM NUMBER GENERATOR (TRNG) DRIVER
+M: Łukasz Stelmach 
+L: linux-samsung-...@vger.kernel.org
+S: Maintained
+F: drivers/char/hw_random/exynos-trng.c
+F: Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
+
 SAMSUNG FRAMEBUFFER DRIVER
 M: Jingoo Han 
 L: linux-fb...@vger.kernel.org
diff --git a/drivers/char/hw_random/Kconfig b/drivers/char/hw_random/Kconfig
index 95a031e9eced..292e6b36d493 100644
--- a/drivers/char/hw_random/Kconfig
+++ b/drivers/char/hw_random/Kconfig
@@ -449,6 +449,18 @@ config HW_RANDOM_S390
 
  If unsure, say Y.
 
+config HW_RANDOM_EXYNOS
+   tristate "Samsung Exynos True Random Number Generator support"
+   depends on ARCH_EXYNOS || COMPILE_TEST
+   default HW_RANDOM
+   ---help---
+ This driver provides support for the True Random Number
+ Generator available in Exynos SoCs.
+
+ To compile this driver as a module, choose M here: the module
+ will be called exynos-trng.
+
+ If unsure, say Y.
 endif # HW_RANDOM
 
 config UML_RANDOM
diff --git a/drivers/char/hw_random/Makefile b/drivers/char/hw_random/Makefile
index f3728d008fff..5595df97da7a 100644
--- a/drivers/char/hw_random/Makefile
+++ b/drivers/char/hw_random/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_HW_RANDOM_GEODE) += geode-rng.o
 obj-$(CONFIG_HW_RANDOM_N2RNG) += n2-rng.o
 n2-rng-y := n2-drv.o n2-asm.o
 obj-$(CONFIG_HW_RANDOM_VIA) += via-rng.o
+obj-$(CONFIG_HW_RANDOM_EXYNOS) += exynos-trng.o
 obj-$(CONFIG_HW_RANDOM_IXP4XX) += ixp4xx-rng.o
 obj-$(CONFIG_HW_RANDOM_OMAP) += omap-rng.o
 obj-$(CONFIG_HW_RANDOM_OMAP3_ROM) += omap3-rom-rng.o
diff --git a/drivers/char/hw_random/exynos-trng.c 
b/drivers/char/hw_random/exynos-trng.c
new file mode 100644
index ..971d2fe9d55a
--- /dev/null
+++ b/drivers/char/hw_random/exynos-trng.c
@@ -0,0 +1,245 @@
+/*
+ * RNG driver for Exynos TRNGs
+ *
+ * Author: Łukasz Stelmach 
+ *
+ * Copyright 2017 (c) Samsung Electronics Software, Inc.
+ *
+ * Based on the Exynos PRNG driver drivers/crypto/exynos-rng by
+ * Krzysztof Kozłowski 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation;
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define EXYNOS_TRNG_CLKDIV (0x0)
+
+#define EXYNOS_TRNG_CTRL   (0x20)
+#define EXYNOS_TRNG_CTRL_RNGEN BIT(31)
+
+#define EXYNOS_TRNG_POST_CTRL  (0x30)
+#define EXYNOS_TRNG_ONLINE_CTRL (0x40)
+#define EXYNOS_TRNG_ONLINE_STAT (0x44)
+#define EXYNOS_TRNG_ONLINE_MAXCHI2 (0x48)
+#define EXYNOS_TRNG_FIFO_CTRL  (0x50)
+#define EXYNOS_TRNG_FIFO_0 (0x80)
+#define EXYNOS_TRNG_FIFO_1 (0x84)
+#define EXYNOS_TRNG_FIFO_2 (0x88)
+#define EXYNOS_TRNG_FIFO_3 (0x8c)
+#define EXYNOS_TRNG_FIFO_4 (0x90)
+#define EXYNOS_TRNG_FIFO_5 (0x94)
+#define EXYNOS_TRNG_FIFO_6 (0x98)
+#define EXYNOS_TRNG_FIFO_7 (0x9c)
+#define EXYNOS_TRNG_FIFO_LEN   (8)
+#define EXYNOS_TRNG_CLOCK_RATE (50)
+
+
+struct exynos_trng_dev {
+   struct device    *dev;
+   void __iomem     *mem;
+   struct clk       *clk;
+   struct hwrng     rng;
+};
+
+static int exynos_trng_do_read(struct hwrng *rng, void *data, size_t max,
+  bool wait)
+{
+   struct exynos_trng_dev *trng;
+   int val; /* signed: also holds the readl_poll_timeout() error code */
+
+   max = min_t(size_t, max, (EXYNOS_TRNG_FIFO_LEN * 4));
+
+   trng = (struct exynos_trng_dev *)rng->priv;
+
+   writel_relaxed(max * 8, trng->mem + EXYNOS_TRNG_FIFO_CTRL);
+   val = readl_poll_timeout(trng->mem + EXYNOS_TRNG_FIFO_CTRL, val,
+val == 0, 200, 100);
+   if (val < 0)
+   return val;
+
+   memcpy_fromio(data, trng->mem + EXYNOS_TRNG_FIFO_0, max);
+
+   return max;
+}
+
+static int exynos_trng_init(struct hwrng *rng)
+{
+

[PATCH v3 3/3] ARM: dts: exynos: Add nodes for True Random Number Generator

2017-12-04 Thread Łukasz Stelmach

Add nodes for the True Random Number Generator found in Samsung Exynos
5250+ SoCs.

Signed-off-by: Łukasz Stelmach 
---
 arch/arm/boot/dts/exynos5.dtsi| 5 +
 arch/arm/boot/dts/exynos5250.dtsi | 5 +
 arch/arm/boot/dts/exynos5410.dtsi | 5 +
 arch/arm/boot/dts/exynos5420.dtsi | 5 +
 4 files changed, 20 insertions(+)

diff --git a/arch/arm/boot/dts/exynos5.dtsi b/arch/arm/boot/dts/exynos5.dtsi
index 33f929c1dda9..e0c91ff4442c 100644
--- a/arch/arm/boot/dts/exynos5.dtsi
+++ b/arch/arm/boot/dts/exynos5.dtsi
@@ -215,5 +215,10 @@
  compatible = "samsung,exynos5250-prng";
  reg = <0x10830400 0x200>;
};
+
+   trng: rng@10830600 {
+ compatible = "samsung,exynos5250-trng";
+ reg = <0x10830600 0x100>;
+   };
};
 };
diff --git a/arch/arm/boot/dts/exynos5250.dtsi 
b/arch/arm/boot/dts/exynos5250.dtsi
index 51aa83ba8c87..38627e8164a0 100644
--- a/arch/arm/boot/dts/exynos5250.dtsi
+++ b/arch/arm/boot/dts/exynos5250.dtsi
@@ -1086,4 +1086,9 @@
clock-names = "secss";
 };
 
+&trng {
+   clocks = <&clock CLK_SSS>;
+   clock-names = "secss";
+};
+
 #include "exynos5250-pinctrl.dtsi"
diff --git a/arch/arm/boot/dts/exynos5410.dtsi 
b/arch/arm/boot/dts/exynos5410.dtsi
index 1604cb1b837d..aa8b14eda662 100644
--- a/arch/arm/boot/dts/exynos5410.dtsi
+++ b/arch/arm/boot/dts/exynos5410.dtsi
@@ -384,6 +384,11 @@
  3 0 0x0700 0x2>;
 };
 
+&trng {
+   clocks = <&clock CLK_SSS>;
+   clock-names = "secss";
+};
+
 &usbdrd3_0 {
clocks = <&clock CLK_USBD300>;
clock-names = "usbdrd30";
diff --git a/arch/arm/boot/dts/exynos5420.dtsi 
b/arch/arm/boot/dts/exynos5420.dtsi
index 31c77ea9123d..6c8cec9d564a 100644
--- a/arch/arm/boot/dts/exynos5420.dtsi
+++ b/arch/arm/boot/dts/exynos5420.dtsi
@@ -1459,6 +1459,11 @@
clock-names = "secss";
 };
 
+&trng {
+   clocks = <&clock CLK_SSS>;
+   clock-names = "secss";
+};
+
 &usbdrd3_0 {
clocks = <&clock CLK_USBD300>;
clock-names = "usbdrd30";
-- 
2.11.0

[PATCH v2 18/19] crypto: arm64/crct10dif-ce - yield NEON every 8 blocks of input

2017-12-04 Thread Ard Biesheuvel

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON every 8 blocks of input.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/crct10dif-ce-core.S | 39 ++--
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/crypto/crct10dif-ce-core.S 
b/arch/arm64/crypto/crct10dif-ce-core.S
index d5b5a8c038c8..d57067e80bae 100644
--- a/arch/arm64/crypto/crct10dif-ce-core.S
+++ b/arch/arm64/crypto/crct10dif-ce-core.S
@@ -74,13 +74,22 @@
.text
.cpu    generic+crypto
 
-   arg1_low32  .req    w0
-   arg2        .req    x1
-   arg3        .req    x2
+   arg1_low32  .req    w19
+   arg2        .req    x20
+   arg3        .req    x21
 
vzr .req    v13
 
 ENTRY(crc_t10dif_pmull)
+   stp x29, x30, [sp, #-176]!
+   mov x29, sp
+   stp x19, x20, [sp, #16]
+   stp x21, x22, [sp, #32]
+
+   mov arg1_low32, w0
+   mov arg2, x1
+   mov arg3, x2
+
movi    vzr.16b, #0 // init zero register
 
// adjust the 16-bit initial_crc value, scale it to 32 bits
@@ -175,8 +184,27 @@ CPU_LE(ext v12.16b, v12.16b, v12.16b, #8   
)
subs    arg3, arg3, #128
 
// check if there is another 64B in the buffer to be able to fold
-   b.ge    _fold_64_B_loop
+   b.lt    _fold_64_B_end
+
+   yield_neon_pre  arg3, 3, 128, _fold_64_B_loop   // yield every 8 blocks
+   stp q0, q1, [sp, #48]
+   stp q2, q3, [sp, #80]
+   stp q4, q5, [sp, #112]
+   stp q6, q7, [sp, #144]
+   yield_neon_post 2f
+   b   _fold_64_B_loop
+
+   .subsection 1
+2: ldp q0, q1, [sp, #48]
+   ldp q2, q3, [sp, #80]
+   ldp q4, q5, [sp, #112]
+   ldp q6, q7, [sp, #144]
+   ldr q10, rk3
+   movi    vzr.16b, #0 // init zero register
+   b   _fold_64_B_loop
+   .previous
 
+_fold_64_B_end:
// at this point, the buffer pointer is pointing at the last y Bytes
// of the buffer the 64B of folded data is in 4 of the vector
// registers: v0, v1, v2, v3
@@ -304,6 +332,9 @@ _barrett:
 _cleanup:
// scale the result back to 16 bits
lsr x0, x0, #16
+   ldp x19, x20, [sp, #16]
+   ldp x21, x22, [sp, #32]
+   ldp x29, x30, [sp], #176
ret
 
 _less_than_128:
-- 
2.11.0

[PATCH v2 09/19] crypto: arm64/aes-blk - add 4 way interleave to CBC-MAC encrypt path

2017-12-04 Thread Ard Biesheuvel

CBC MAC is strictly sequential, and so the current AES code simply
processes the input one block at a time. However, we are about to add
yield support, which adds a bit of overhead, and which we prefer to
align with other modes in terms of granularity (i.e., it is better to
have all routines yield every 64 bytes and not have an exception for
CBC MAC which yields every 16 bytes)

So unroll the loop by 4. We still cannot perform the AES algorithm in
parallel, but we can at least merge the loads and stores.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-modes.S | 23 ++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index e86535a1329d..a68412e1e3a4 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -395,8 +395,28 @@ AES_ENDPROC(aes_xts_decrypt)
 AES_ENTRY(aes_mac_update)
ld1 {v0.16b}, [x4]  /* get dg */
enc_prepare w2, x1, x7
-   cbnz    w5, .Lmacenc
+   cbz w5, .Lmacloop4x
 
+   encrypt_block   v0, w2, x1, x7, w8
+
+.Lmacloop4x:
+   subs    w3, w3, #4
+   bmi .Lmac1x
+   ld1 {v1.16b-v4.16b}, [x0], #64  /* get next pt block */
+   eor v0.16b, v0.16b, v1.16b  /* ..and xor with dg */
+   encrypt_block   v0, w2, x1, x7, w8
+   eor v0.16b, v0.16b, v2.16b
+   encrypt_block   v0, w2, x1, x7, w8
+   eor v0.16b, v0.16b, v3.16b
+   encrypt_block   v0, w2, x1, x7, w8
+   eor v0.16b, v0.16b, v4.16b
+   cmp w3, wzr
+   csinv   x5, x6, xzr, eq
+   cbz w5, .Lmacout
+   encrypt_block   v0, w2, x1, x7, w8
+   b   .Lmacloop4x
+.Lmac1x:
+   add w3, w3, #4
 .Lmacloop:
cbz w3, .Lmacout
ld1 {v1.16b}, [x0], #16 /* get next pt block */
@@ -406,7 +426,6 @@ AES_ENTRY(aes_mac_update)
csinv   x5, x6, xzr, eq
cbz w5, .Lmacout
 
-.Lmacenc:
encrypt_block   v0, w2, x1, x7, w8
b   .Lmacloop
 
-- 
2.11.0

[PATCH v2 19/19] DO NOT MERGE

2017-12-04 Thread Ard Biesheuvel

Test code to force a kernel_neon_end+begin sequence at every yield point,
and wipe the entire NEON state before resuming the algorithm.
---
 arch/arm64/include/asm/assembler.h | 33 
 1 file changed, 33 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index 917b026d3e00..dfee20246592 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -549,6 +549,7 @@ alternative_else_nop_endif
cmp w1, #1 // == PREEMPT_OFFSET
csel    x0, x0, xzr, eq
tbnz    x0, #TIF_NEED_RESCHED, f // needs rescheduling?
+   b   f
 :
 #endif
.subsection 1
@@ -558,6 +559,38 @@ alternative_else_nop_endif
.macro  yield_neon_post, lbl:req
bl  kernel_neon_end
bl  kernel_neon_begin
+   movi    v0.16b, #0x55
+   movi    v1.16b, #0x55
+   movi    v2.16b, #0x55
+   movi    v3.16b, #0x55
+   movi    v4.16b, #0x55
+   movi    v5.16b, #0x55
+   movi    v6.16b, #0x55
+   movi    v7.16b, #0x55
+   movi    v8.16b, #0x55
+   movi    v9.16b, #0x55
+   movi    v10.16b, #0x55
+   movi    v11.16b, #0x55
+   movi    v12.16b, #0x55
+   movi    v13.16b, #0x55
+   movi    v14.16b, #0x55
+   movi    v15.16b, #0x55
+   movi    v16.16b, #0x55
+   movi    v17.16b, #0x55
+   movi    v18.16b, #0x55
+   movi    v19.16b, #0x55
+   movi    v20.16b, #0x55
+   movi    v21.16b, #0x55
+   movi    v22.16b, #0x55
+   movi    v23.16b, #0x55
+   movi    v24.16b, #0x55
+   movi    v25.16b, #0x55
+   movi    v26.16b, #0x55
+   movi    v27.16b, #0x55
+   movi    v28.16b, #0x55
+   movi    v29.16b, #0x55
+   movi    v30.16b, #0x55
+   movi    v31.16b, #0x55
b   \lbl
.previous
.endm
-- 
2.11.0

[PATCH v2 10/19] crypto: arm64/sha256-neon - play nice with CONFIG_PREEMPT kernels

2017-12-04 Thread Ard Biesheuvel

Tweak the SHA256 update routines to invoke the SHA256 block transform
block by block, to avoid excessive scheduling delays caused by the
NEON algorithm running with preemption disabled.
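
For illustration only, the chunking rule described above can be modelled in
plain C as below. This is a sketch, not part of the patch: the helper names
sha256_next_chunk() and sha256_count_sections() and the 'preemptible' flag
(standing in for IS_ENABLED(CONFIG_PREEMPT)) are hypothetical.

```c
#include <assert.h>

#define SHA256_BLOCK_SIZE 64u

/*
 * Hypothetical model of the chunk-size rule: 'count' is the number of bytes
 * hashed so far, 'len' the bytes still to be consumed by this update call.
 * On a preemptible kernel the chunk is clipped so that it completes at most
 * one 64-byte block; otherwise the whole input is taken in one go.
 */
static unsigned int sha256_next_chunk(unsigned long long count,
                                      unsigned int len, int preemptible)
{
    unsigned int partial = (unsigned int)(count % SHA256_BLOCK_SIZE);
    unsigned int chunk = len;

    assert(len > 0);
    if (preemptible && chunk + partial > SHA256_BLOCK_SIZE)
        chunk = SHA256_BLOCK_SIZE - partial;
    return chunk;
}

/* Count how many begin/end sections an update of 'len' bytes would open. */
static unsigned int sha256_count_sections(unsigned long long count,
                                          unsigned int len, int preemptible)
{
    unsigned int sections = 0;

    while (len > 0) {
        unsigned int chunk = sha256_next_chunk(count, len, preemptible);

        count += chunk;
        len -= chunk;
        sections++;
    }
    return sections;
}
```

With 200 bytes of input and an empty buffer, the model opens four sections
(64 + 64 + 64 + 8 bytes) in the preemptible case and a single one otherwise.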

Also, remove a stale comment which no longer applies now that kernel
mode NEON is actually disallowed in some contexts.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/sha256-glue.c | 36 +---
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/crypto/sha256-glue.c b/arch/arm64/crypto/sha256-glue.c
index b064d925fe2a..e8880ccdc71f 100644
--- a/arch/arm64/crypto/sha256-glue.c
+++ b/arch/arm64/crypto/sha256-glue.c
@@ -89,21 +89,32 @@ static struct shash_alg algs[] = { {
 static int sha256_update_neon(struct shash_desc *desc, const u8 *data,
  unsigned int len)
 {
-   /*
-* Stacking and unstacking a substantial slice of the NEON register
-* file may significantly affect performance for small updates when
-* executing in interrupt context, so fall back to the scalar code
-* in that case.
-*/
+   struct sha256_state *sctx = shash_desc_ctx(desc);
+
if (!may_use_simd())
return sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_data_order);
 
-   kernel_neon_begin();
-   sha256_base_do_update(desc, data, len,
-   (sha256_block_fn *)sha256_block_neon);
-   kernel_neon_end();
+   while (len > 0) {
+   unsigned int chunk = len;
+
+   /*
+* Don't hog the CPU for the entire time it takes to process all
+* input when running on a preemptible kernel, but process the
+* data block by block instead.
+*/
+   if (IS_ENABLED(CONFIG_PREEMPT) &&
+   chunk + sctx->count % SHA256_BLOCK_SIZE > SHA256_BLOCK_SIZE)
+   chunk = SHA256_BLOCK_SIZE -
+   sctx->count % SHA256_BLOCK_SIZE;
 
+   kernel_neon_begin();
+   sha256_base_do_update(desc, data, chunk,
+ (sha256_block_fn *)sha256_block_neon);
+   kernel_neon_end();
+   data += chunk;
+   len -= chunk;
+   }
return 0;
 }
 
@@ -117,10 +128,9 @@ static int sha256_finup_neon(struct shash_desc *desc, 
const u8 *data,
sha256_base_do_finalize(desc,
(sha256_block_fn *)sha256_block_data_order);
} else {
-   kernel_neon_begin();
if (len)
-   sha256_base_do_update(desc, data, len,
-   (sha256_block_fn *)sha256_block_neon);
+   sha256_update_neon(desc, data, len);
+   kernel_neon_begin();
sha256_base_do_finalize(desc,
(sha256_block_fn *)sha256_block_neon);
kernel_neon_end();
-- 
2.11.0

[PATCH v2 16/19] crypto: arm64/aes-ghash - yield after processing fixed number of blocks

2017-12-04 Thread Ard Biesheuvel

This updates both the core GHASH as well as the AES-GCM algorithm to
yield each time after processing a fixed chunk of input. For the GCM
driver, we align with the other AES/CE block mode drivers, and use
a block size of 64 bytes. The core GHASH is much shorter, so let's
use a block size of 128 bytes for that one.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/ghash-ce-core.S | 128 ++--
 1 file changed, 92 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/crypto/ghash-ce-core.S 
b/arch/arm64/crypto/ghash-ce-core.S
index 11ebf1ae248a..fbfd4681675d 100644
--- a/arch/arm64/crypto/ghash-ce-core.S
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -212,23 +212,36 @@
ushr    XL.2d, XL.2d, #1
.endm
 
-   .macro  __pmull_ghash, pn
-   ld1 {SHASH.2d}, [x3]
-   ld1 {XL.2d}, [x1]
+   .macro  __pmull_ghash, pn, yield
+   stp x29, x30, [sp, #-64]!
+   mov x29, sp
+   stp x19, x20, [sp, #16]
+   stp x21, x22, [sp, #32]
+   str x23, [sp, #48]
+
+   mov x19, x0
+   mov x20, x1
+   mov x21, x2
+   mov x22, x3
+   mov x23, x4
+
+0: ld1 {SHASH.2d}, [x22]
+   ld1 {XL.2d}, [x20]
ext SHASH2.16b, SHASH.16b, SHASH.16b, #8
eor SHASH2.16b, SHASH2.16b, SHASH.16b
 
__pmull_pre_\pn
 
/* do the head block first, if supplied */
-   cbz x4, 0f
-   ld1 {T1.2d}, [x4]
-   b   1f
+   cbz x23, 1f
+   ld1 {T1.2d}, [x23]
+   mov x23, xzr
+   b   2f
 
-0: ld1 {T1.2d}, [x2], #16
-   sub w0, w0, #1
+1: ld1 {T1.2d}, [x21], #16
+   sub w19, w19, #1
 
-1: /* multiply XL by SHASH in GF(2^128) */
+2: /* multiply XL by SHASH in GF(2^128) */
 CPU_LE(rev64   T1.16b, T1.16b  )
 
ext T2.16b, XL.16b, XL.16b, #8
@@ -250,9 +263,19 @@ CPU_LE(rev64   T1.16b, T1.16b  )
eor T2.16b, T2.16b, XH.16b
eor XL.16b, XL.16b, T2.16b
 
-   cbnz    w0, 0b
+   cbz w19, 3f
 
-   st1 {XL.2d}, [x1]
+   yield_neon_pre  w19, \yield, 1, 1b
+   st1 {XL.2d}, [x20]
+   yield_neon_post 0b
+
+   b   1b
+
+3: st1 {XL.2d}, [x20]
+   ldp x19, x20, [sp, #16]
+   ldp x21, x22, [sp, #32]
+   ldr x23, [sp, #48]
+   ldp x29, x30, [sp], #64
ret
.endm
 
@@ -261,11 +284,11 @@ CPU_LE(   rev64   T1.16b, T1.16b  )
 * struct ghash_key const *k, const char *head)
 */
 ENTRY(pmull_ghash_update_p64)
-   __pmull_ghash   p64
+   __pmull_ghash   p64, 5
 ENDPROC(pmull_ghash_update_p64)
 
 ENTRY(pmull_ghash_update_p8)
-   __pmull_ghash   p8
+   __pmull_ghash   p8, 2
 ENDPROC(pmull_ghash_update_p8)
 
KS  .req    v8
@@ -304,38 +327,56 @@ ENDPROC(pmull_ghash_update_p8)
.endm
 
.macro  pmull_gcm_do_crypt, enc
-   ld1 {SHASH.2d}, [x4]
-   ld1 {XL.2d}, [x1]
-   ldr x8, [x5, #8]// load lower counter
+   stp x29, x30, [sp, #-96]!
+   mov x29, sp
+   stp x19, x20, [sp, #16]
+   stp x21, x22, [sp, #32]
+   stp x23, x24, [sp, #48]
+   stp x25, x26, [sp, #64]
+   str x27, [sp, #80]
+
+   mov x19, x0
+   mov x20, x1
+   mov x21, x2
+   mov x22, x3
+   mov x23, x4
+   mov x24, x5
+   mov x25, x6
+   mov x26, x7
+
+   ldr x27, [x24, #8]  // load lower counter
+CPU_LE(rev x27, x27)
+
+0: ld1 {SHASH.2d}, [x23]
+   ld1 {XL.2d}, [x20]
 
movi    MASK.16b, #0xe1
ext SHASH2.16b, SHASH.16b, SHASH.16b, #8
-CPU_LE(rev x8, x8  )
shl MASK.2d, MASK.2d, #57
eor SHASH2.16b, SHASH2.16b, SHASH.16b
 
.if \enc == 1
-   ld1 {KS.16b}, [x7]
+   ld1 {KS.16b}, [x26]
.endif
 
-0: ld1 {CTR.8b}, [x5]  // load upper counter
-   ld1 {INP.16b}, [x3], #16
-   rev x9, x8
-   add x8, x8, #1
-   sub w0, w0, #1
+1: ld1 {CTR.8b}, [x24] // load upper counter
+   ld1 {INP.16b

[PATCH v2 08/19] crypto: arm64/aes-blk - add 4 way interleave to CBC encrypt path

2017-12-04 Thread Ard Biesheuvel

CBC encryption is strictly sequential, and so the current AES code
simply processes the input one block at a time. However, we are
about to add yield support, which adds a bit of overhead, and which
we prefer to align with other modes in terms of granularity (i.e.,
it is better to have all routines yield every 64 bytes and not have
an exception for CBC encrypt which yields every 16 bytes).

So unroll the loop by 4. We still cannot perform the AES algorithm in
parallel, but we can at least merge the loads and stores.
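
To illustrate why the unrolled loop stays correct, here is a small C model,
not taken from the patch. toy_encrypt_block() is a made-up placeholder for
the AES round function (it is NOT AES), and all function names are
hypothetical. The 4x variant still chains the blocks one after another; only
the loads and stores are merged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* Placeholder block "cipher" (NOT AES); stands in for encrypt_block. */
static void toy_encrypt_block(uint8_t b[BLK], const uint8_t key[BLK])
{
    for (int i = 0; i < BLK; i++)
        b[i] = (uint8_t)((b[i] ^ key[i]) * 5u + 1u);
}

static void xor_block(uint8_t dst[BLK], const uint8_t src[BLK])
{
    for (int i = 0; i < BLK; i++)
        dst[i] ^= src[i];
}

/* Reference: one block per iteration, 'iv' carries the chaining value. */
static void cbc_encrypt_1x(uint8_t *out, const uint8_t *in, int blocks,
                           const uint8_t key[BLK], uint8_t iv[BLK])
{
    while (blocks-- > 0) {
        xor_block(iv, in);
        toy_encrypt_block(iv, key);
        memcpy(out, iv, BLK);
        in += BLK;
        out += BLK;
    }
}

/* Unrolled by 4: same serial dependency chain, merged load and store. */
static void cbc_encrypt_4x(uint8_t *out, const uint8_t *in, int blocks,
                           const uint8_t key[BLK], uint8_t iv[BLK])
{
    uint8_t v[4][BLK];

    while (blocks >= 4) {
        memcpy(v, in, 4 * BLK);              /* one merged 64-byte load */
        xor_block(v[0], iv);
        toy_encrypt_block(v[0], key);
        for (int i = 1; i < 4; i++) {
            xor_block(v[i], v[i - 1]);       /* chain on previous output */
            toy_encrypt_block(v[i], key);
        }
        memcpy(out, v, 4 * BLK);             /* one merged 64-byte store */
        memcpy(iv, v[3], BLK);
        in += 4 * BLK;
        out += 4 * BLK;
        blocks -= 4;
    }
    cbc_encrypt_1x(out, in, blocks, key, iv); /* 0..3 leftover blocks */
}

/* Returns 1 if both variants produce the same output and final IV. */
static int cbc_variants_agree(int blocks)
{
    uint8_t in[8 * BLK], a[8 * BLK], b[8 * BLK];
    uint8_t key[BLK], iv_a[BLK], iv_b[BLK];

    assert(blocks >= 0 && blocks <= 8);
    for (int i = 0; i < 8 * BLK; i++)
        in[i] = (uint8_t)(i * 7 + 3);
    for (int i = 0; i < BLK; i++) {
        key[i] = (uint8_t)(0xA0 + i);
        iv_a[i] = iv_b[i] = (uint8_t)i;
    }
    memset(a, 0, sizeof(a));
    memset(b, 0, sizeof(b));
    cbc_encrypt_1x(a, in, blocks, key, iv_a);
    cbc_encrypt_4x(b, in, blocks, key, iv_b);
    return memcmp(a, b, sizeof(a)) == 0 && memcmp(iv_a, iv_b, BLK) == 0;
}
```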

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-modes.S | 31 
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 27a235b2ddee..e86535a1329d 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -94,17 +94,36 @@ AES_ENDPROC(aes_ecb_decrypt)
 */
 
 AES_ENTRY(aes_cbc_encrypt)
-   ld1 {v0.16b}, [x5]  /* get iv */
+   ld1 {v4.16b}, [x5]  /* get iv */
enc_prepare w3, x2, x6
 
-.Lcbcencloop:
-   ld1 {v1.16b}, [x1], #16 /* get next pt block */
-   eor v0.16b, v0.16b, v1.16b  /* ..and xor with iv */
+.Lcbcencloop4x:
+   subs    w4, w4, #4
+   bmi .Lcbcenc1x
+   ld1 {v0.16b-v3.16b}, [x1], #64  /* get 4 pt blocks */
+   eor v0.16b, v0.16b, v4.16b  /* ..and xor with iv */
encrypt_block   v0, w3, x2, x6, w7
-   st1 {v0.16b}, [x0], #16
+   eor v1.16b, v1.16b, v0.16b
+   encrypt_block   v1, w3, x2, x6, w7
+   eor v2.16b, v2.16b, v1.16b
+   encrypt_block   v2, w3, x2, x6, w7
+   eor v3.16b, v3.16b, v2.16b
+   encrypt_block   v3, w3, x2, x6, w7
+   st1 {v0.16b-v3.16b}, [x0], #64
+   mov v4.16b, v3.16b
+   b   .Lcbcencloop4x
+.Lcbcenc1x:
+   adds    w4, w4, #4
+   beq .Lcbcencout
+.Lcbcencloop:
+   ld1 {v0.16b}, [x1], #16 /* get next pt block */
+   eor v4.16b, v4.16b, v0.16b  /* ..and xor with iv */
+   encrypt_block   v4, w3, x2, x6, w7
+   st1 {v4.16b}, [x0], #16
subs    w4, w4, #1
bne .Lcbcencloop
-   st1 {v0.16b}, [x5]  /* return iv */
+.Lcbcencout:
+   st1 {v4.16b}, [x5]  /* return iv */
ret
 AES_ENDPROC(aes_cbc_encrypt)
 
-- 
2.11.0

[PATCH v2 17/19] crypto: arm64/crc32-ce - yield NEON every 16 blocks of input

2017-12-04 Thread Ard Biesheuvel

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON every 16 blocks of input.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/crc32-ce-core.S | 55 +++-
 1 file changed, 43 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/crypto/crc32-ce-core.S 
b/arch/arm64/crypto/crc32-ce-core.S
index 18f5a8442276..bca3d22fae7b 100644
--- a/arch/arm64/crypto/crc32-ce-core.S
+++ b/arch/arm64/crypto/crc32-ce-core.S
@@ -100,9 +100,9 @@
dCONSTANT   .req    d0
qCONSTANT   .req    q0
 
-   BUF .req    x0
-   LEN .req    x1
-   CRC .req    x2
+   BUF .req    x19
+   LEN .req    x20
+   CRC .req    x21
 
vzr .req    v9
 
@@ -116,13 +116,27 @@
 * size_t len, uint crc32)
 */
 ENTRY(crc32_pmull_le)
-   adr x3, .Lcrc32_constants
+   stp x29, x30, [sp, #-112]!
+   mov x29, sp
+   stp x19, x20, [sp, #16]
+   stp x21, x22, [sp, #32]
+
+   adr x22, .Lcrc32_constants
b   0f
 
 ENTRY(crc32c_pmull_le)
-   adr x3, .Lcrc32c_constants
+   stp x29, x30, [sp, #-112]!
+   mov x29, sp
+   stp x19, x20, [sp, #16]
+   stp x21, x22, [sp, #32]
+
+   adr x22, .Lcrc32c_constants
 
-0: bic LEN, LEN, #15
+0: mov BUF, x0
+   mov LEN, x1
+   mov CRC, x2
+
+   bic LEN, LEN, #15
ld1 {v1.16b-v4.16b}, [BUF], #0x40
movi    vzr.16b, #0
fmov    dCONSTANT, CRC
@@ -131,7 +145,7 @@ ENTRY(crc32c_pmull_le)
cmp LEN, #0x40
b.lt    less_64
 
-   ldr qCONSTANT, [x3]
+   ldr qCONSTANT, [x22]
 
 loop_64:   /* 64 bytes Full cache line folding */
sub LEN, LEN, #0x40
@@ -161,10 +175,24 @@ loop_64:  /* 64 bytes Full cache line folding */
eor v4.16b, v4.16b, v8.16b
 
cmp LEN, #0x40
-   b.ge    loop_64
+   b.lt    less_64
+
+   yield_neon_pre  LEN, 4, 64, loop_64 // yield every 16 blocks
+   stp q1, q2, [sp, #48]
+   stp q3, q4, [sp, #80]
+   yield_neon_post 2f
+   b   loop_64
+
+   .subsection 1
+2: ldp q1, q2, [sp, #48]
+   ldp q3, q4, [sp, #80]
+   ldr qCONSTANT, [x22]
+   movi    vzr.16b, #0
+   b   loop_64
+   .previous
 
 less_64:   /* Folding cache line into 128bit */
-   ldr qCONSTANT, [x3, #16]
+   ldr qCONSTANT, [x22, #16]
 
pmull2  v5.1q, v1.2d, vCONSTANT.2d
pmull   v1.1q, v1.1d, vCONSTANT.1d
@@ -203,8 +231,8 @@ fold_64:
eor v1.16b, v1.16b, v2.16b
 
/* final 32-bit fold */
-   ldr dCONSTANT, [x3, #32]
-   ldr d3, [x3, #40]
+   ldr dCONSTANT, [x22, #32]
+   ldr d3, [x22, #40]
 
ext v2.16b, v1.16b, vzr.16b, #4
and v1.16b, v1.16b, v3.16b
@@ -212,7 +240,7 @@ fold_64:
eor v1.16b, v1.16b, v2.16b
 
/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
-   ldr qCONSTANT, [x3, #48]
+   ldr qCONSTANT, [x22, #48]
 
and v2.16b, v1.16b, v3.16b
ext v2.16b, vzr.16b, v2.16b, #8
@@ -222,6 +250,9 @@ fold_64:
eor v1.16b, v1.16b, v2.16b
mov w0, v1.s[1]
 
+   ldp x19, x20, [sp, #16]
+   ldp x21, x22, [sp, #32]
+   ldp x29, x30, [sp], #112
ret
 ENDPROC(crc32_pmull_le)
 ENDPROC(crc32c_pmull_le)
-- 
2.11.0

[PATCH v2 15/19] crypto: arm64/aes-bs - yield after processing each 128 bytes of input

2017-12-04 Thread Ard Biesheuvel

Currently, the bit-sliced AES code may keep preemption disabled for as
long as it takes to process each contiguous chunk of input, which could
be as large as a page or skb, depending on the context.

For this code to be useable in RT context, it needs to operate on fixed
chunks of limited size. So let's add a yield after each 128 bytes of input
(i.e., 8x the AES block size, which is the natural granularity for a
bit-sliced algorithm). This will disable and re-enable kernel mode NEON if
a reschedule is pending.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-neonbs-core.S | 317 
 1 file changed, 190 insertions(+), 127 deletions(-)

diff --git a/arch/arm64/crypto/aes-neonbs-core.S 
b/arch/arm64/crypto/aes-neonbs-core.S
index ca0472500433..4532a2262742 100644
--- a/arch/arm64/crypto/aes-neonbs-core.S
+++ b/arch/arm64/crypto/aes-neonbs-core.S
@@ -565,54 +565,68 @@ ENDPROC(aesbs_decrypt8)
 *   int blocks)
 */
.macro  __ecb_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7
-   stp x29, x30, [sp, #-16]!
+   stp x29, x30, [sp, #-64]!
mov x29, sp
+   stp x19, x20, [sp, #16]
+   stp x21, x22, [sp, #32]
+   str x23, [sp, #48]
+
+   mov x19, x0
+   mov x20, x1
+   mov x21, x2
+   mov x22, x3
+   mov x23, x4
 
 99:mov x5, #1
-   lsl x5, x5, x4
-   subs    w4, w4, #8
-   csel    x4, x4, xzr, pl
+   lsl x5, x5, x23
+   subs    w23, w23, #8
+   csel    x23, x23, xzr, pl
csel    x5, x5, xzr, mi
 
-   ld1 {v0.16b}, [x1], #16
+   ld1 {v0.16b}, [x20], #16
tbnz    x5, #1, 0f
-   ld1 {v1.16b}, [x1], #16
+   ld1 {v1.16b}, [x20], #16
tbnz    x5, #2, 0f
-   ld1 {v2.16b}, [x1], #16
+   ld1 {v2.16b}, [x20], #16
tbnz    x5, #3, 0f
-   ld1 {v3.16b}, [x1], #16
+   ld1 {v3.16b}, [x20], #16
tbnz    x5, #4, 0f
-   ld1 {v4.16b}, [x1], #16
+   ld1 {v4.16b}, [x20], #16
tbnz    x5, #5, 0f
-   ld1 {v5.16b}, [x1], #16
+   ld1 {v5.16b}, [x20], #16
tbnz    x5, #6, 0f
-   ld1 {v6.16b}, [x1], #16
+   ld1 {v6.16b}, [x20], #16
tbnz    x5, #7, 0f
-   ld1 {v7.16b}, [x1], #16
+   ld1 {v7.16b}, [x20], #16
 
-0: mov bskey, x2
-   mov rounds, x3
+0: mov bskey, x21
+   mov rounds, x22
bl  \do8
 
-   st1 {\o0\().16b}, [x0], #16
+   st1 {\o0\().16b}, [x19], #16
tbnz    x5, #1, 1f
-   st1 {\o1\().16b}, [x0], #16
+   st1 {\o1\().16b}, [x19], #16
tbnz    x5, #2, 1f
-   st1 {\o2\().16b}, [x0], #16
+   st1 {\o2\().16b}, [x19], #16
tbnz    x5, #3, 1f
-   st1 {\o3\().16b}, [x0], #16
+   st1 {\o3\().16b}, [x19], #16
tbnz    x5, #4, 1f
-   st1 {\o4\().16b}, [x0], #16
+   st1 {\o4\().16b}, [x19], #16
tbnz    x5, #5, 1f
-   st1 {\o5\().16b}, [x0], #16
+   st1 {\o5\().16b}, [x19], #16
tbnz    x5, #6, 1f
-   st1 {\o6\().16b}, [x0], #16
+   st1 {\o6\().16b}, [x19], #16
tbnz    x5, #7, 1f
-   st1 {\o7\().16b}, [x0], #16
+   st1 {\o7\().16b}, [x19], #16
 
-   cbnz    x4, 99b
+   cbz x23, 1f
+   yield_neon  99b
+   b   99b
 
-1: ldp x29, x30, [sp], #16
+1: ldp x19, x20, [sp, #16]
+   ldp x21, x22, [sp, #32]
+   ldr x23, [sp, #48]
+   ldp x29, x30, [sp], #64
ret
.endm
 
@@ -632,43 +646,53 @@ ENDPROC(aesbs_ecb_decrypt)
 */
.align  4
 ENTRY(aesbs_cbc_decrypt)
-   stp x29, x30, [sp, #-16]!
+   stp x29, x30, [sp, #-64]!
mov x29, sp
+   stp x19, x20, [sp, #16]
+   stp x21, x22, [sp, #32]
+   stp x23, x24, [sp, #48]
+
+   mov x19, x0
+   mov x20, x1
+   mov x21, x2
+   mov x22, x3
+   mov x23, x4
+   mov x24, x5
 
 99:mov x6, #1
-   lsl x6, x6, x4
-   subs    w4, w4, #8
-   csel    x4, x4, xzr, pl
+

[PATCH v2 14/19] crypto: arm64/aes-blk - yield after processing a fixed chunk of input

2017-12-04 Thread Ard Biesheuvel

Currently, the AES block code may keep preemption disabled for as long
as it takes to process each contiguous chunk of input, which could be as
large as a page or skb, depending on the context.

For this code to be useable in RT context, it needs to operate on fixed
chunks of limited size. So let's add a yield after each 16 blocks (for
the CE case) or after every block (for the pure NEON case), which will
disable and re-enable kernel mode NEON if a reschedule is pending.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-ce.S|  17 +-
 arch/arm64/crypto/aes-modes.S | 379 +---
 arch/arm64/crypto/aes-neon.S  |   2 +
 3 files changed, 272 insertions(+), 126 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce.S b/arch/arm64/crypto/aes-ce.S
index 50330f5c3adc..ccb17b65005a 100644
--- a/arch/arm64/crypto/aes-ce.S
+++ b/arch/arm64/crypto/aes-ce.S
@@ -15,6 +15,8 @@
 #define AES_ENTRY(func)ENTRY(ce_ ## func)
 #define AES_ENDPROC(func)  ENDPROC(ce_ ## func)
 
+#define AES_YIELD_ORDER4
+
.arch   armv8-a+crypto
 
/* preload all round keys */
@@ -30,18 +32,21 @@
.endm
 
/* prepare for encryption with key in rk[] */
-   .macro  enc_prepare, rounds, rk, ignore
-   load_round_keys \rounds, \rk
+   .macro  enc_prepare, rounds, rk, temp
+   mov \temp, \rk
+   load_round_keys \rounds, \temp
.endm
 
/* prepare for encryption (again) but with new key in rk[] */
-   .macro  enc_switch_key, rounds, rk, ignore
-   load_round_keys \rounds, \rk
+   .macro  enc_switch_key, rounds, rk, temp
+   mov \temp, \rk
+   load_round_keys \rounds, \temp
.endm
 
/* prepare for decryption with key in rk[] */
-   .macro  dec_prepare, rounds, rk, ignore
-   load_round_keys \rounds, \rk
+   .macro  dec_prepare, rounds, rk, temp
+   mov \temp, \rk
+   load_round_keys \rounds, \temp
.endm
 
.macro  do_enc_Nx, de, mc, k, i0, i1, i2, i3
diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index a68412e1e3a4..6fcdf82fa295 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -14,12 +14,12 @@
.align  4
 
 aes_encrypt_block4x:
-   encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7
+   encrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7
ret
 ENDPROC(aes_encrypt_block4x)
 
 aes_decrypt_block4x:
-   decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7
+   decrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7
ret
 ENDPROC(aes_decrypt_block4x)
 
@@ -31,57 +31,85 @@ ENDPROC(aes_decrypt_block4x)
 */
 
 AES_ENTRY(aes_ecb_encrypt)
-   stp x29, x30, [sp, #-16]!
+   stp x29, x30, [sp, #-64]!
mov x29, sp
+   stp x19, x20, [sp, #16]
+   stp x21, x22, [sp, #32]
+   str x23, [sp, #48]
 
-   enc_prepare w3, x2, x5
+   mov x19, x0
+   mov x20, x1
+   mov x21, x2
+   mov x22, x3
+   mov x23, x4
+
+.Lecbencrestart:
+   enc_prepare w22, x21, x5
 
 .LecbencloopNx:
-   subs    w4, w4, #4
+   subs    w23, w23, #4
bmi .Lecbenc1x
-   ld1 {v0.16b-v3.16b}, [x1], #64  /* get 4 pt blocks */
+   ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */
bl  aes_encrypt_block4x
-   st1 {v0.16b-v3.16b}, [x0], #64
+   st1 {v0.16b-v3.16b}, [x19], #64
+   yield_neon  .Lecbencrestart, w23, AES_YIELD_ORDER, 4, .LecbencloopNx
b   .LecbencloopNx
 .Lecbenc1x:
-   adds    w4, w4, #4
+   adds    w23, w23, #4
beq .Lecbencout
 .Lecbencloop:
-   ld1 {v0.16b}, [x1], #16 /* get next pt block */
-   encrypt_block   v0, w3, x2, x5, w6
-   st1 {v0.16b}, [x0], #16
-   subs    w4, w4, #1
+   ld1 {v0.16b}, [x20], #16/* get next pt block */
+   encrypt_block   v0, w22, x21, x5, w6
+   st1 {v0.16b}, [x19], #16
+   subs    w23, w23, #1
bne .Lecbencloop
 .Lecbencout:
-   ldp x29, x30, [sp], #16
+   ldp x19, x20, [sp, #16]
+   ldp x21, x22, [sp, #32]
+   ldr x23, [sp, #48]
+   ldp x29, x30, [sp], #64
ret
 AES_ENDPROC(aes_ecb_encrypt)
 
 
 AES_ENTRY(aes_ecb_decrypt)
-   stp x29, x30, [sp, #-16]!
+   stp x29, x30, [sp, #-64]!
mov x29, sp
+   stp x19, x20, [sp, #16]
+   stp x21, x22, [sp, #32]
+   str x23, [sp, #48]
+
+

[PATCH v2 13/19] crypto: arm64/sha2-ce - yield every 8 blocks of input

2017-12-04 Thread Ard Biesheuvel

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON every 8 blocks of input.
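
As a rough illustration of the "every 8 blocks" cadence, the C model below
counts the points at which a loop over the input would check for a pending
reschedule. It is only a sketch: the exact condition used by the
yield_neon_pre macro is not shown in this message, so the rule assumed here
(check whenever the remaining block count is a non-zero multiple of
1 << order) and the name count_yield_points() are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical model: walk 'blocks' input blocks and count how often the
 * loop reaches a yield check, assuming a check happens whenever the number
 * of remaining blocks is a non-zero multiple of (1 << order).
 */
static unsigned int count_yield_points(unsigned int blocks, unsigned int order)
{
    unsigned int mask;
    unsigned int points = 0;

    assert(order < 32);
    mask = (1u << order) - 1;

    while (blocks > 0) {
        blocks--;                 /* one block transformed */
        if (blocks == 0)
            break;                /* all input handled, nothing to yield for */
        if ((blocks & mask) == 0)
            points++;             /* state would be stored and reloaded here */
    }
    return points;
}
```

Under this assumption, 16 blocks with order 3 give one check (after the
first 8 blocks), and inputs of 8 blocks or fewer never reach a check.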

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/sha2-ce-core.S | 40 ++--
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S
index 679c6c002f4f..d156b3ae967c 100644
--- a/arch/arm64/crypto/sha2-ce-core.S
+++ b/arch/arm64/crypto/sha2-ce-core.S
@@ -77,30 +77,39 @@
 *int blocks)
 */
 ENTRY(sha2_ce_transform)
+   stp x29, x30, [sp, #-48]!
+   mov x29, sp
+   stp x19, x20, [sp, #16]
+   str x21, [sp, #32]
+
+   mov x19, x0
+   mov x20, x1
+   mov x21, x2
+
/* load round constants */
-   adr x8, .Lsha2_rcon
+0: adr x8, .Lsha2_rcon
ld1 { v0.4s- v3.4s}, [x8], #64
ld1 { v4.4s- v7.4s}, [x8], #64
ld1 { v8.4s-v11.4s}, [x8], #64
ld1 {v12.4s-v15.4s}, [x8]
 
/* load state */
-   ld1 {dgav.4s, dgbv.4s}, [x0]
+   ld1 {dgav.4s, dgbv.4s}, [x19]
 
/* load sha256_ce_state::finalize */
ldr_l   w4, sha256_ce_offsetof_finalize, x4
-   ldr w4, [x0, x4]
+   ldr w4, [x19, x4]
 
/* load input */
-0: ld1 {v16.4s-v19.4s}, [x1], #64
-   sub w2, w2, #1
+1: ld1 {v16.4s-v19.4s}, [x20], #64
+   sub w21, w21, #1
 
 CPU_LE(rev32   v16.16b, v16.16b)
 CPU_LE(rev32   v17.16b, v17.16b)
 CPU_LE(rev32   v18.16b, v18.16b)
 CPU_LE(rev32   v19.16b, v19.16b)
 
-1: add t0.4s, v16.4s, v0.4s
+2: add t0.4s, v16.4s, v0.4s
mov dg0v.16b, dgav.16b
mov dg1v.16b, dgbv.16b
 
@@ -129,16 +138,22 @@ CPU_LE(   rev32   v19.16b, v19.16b)
add dgbv.4s, dgbv.4s, dg1v.4s
 
/* handled all input blocks? */
-   cbnz    w2, 0b
+   cbz w21, 3f
+
+   yield_neon_pre  w21, 3, 1, 1b   // yield every 8 blocks
+   st1 {dgav.4s, dgbv.4s}, [x19]
+   yield_neon_post 0b
+
+   b   1b
 
/*
 * Final block: add padding and total bit count.
 * Skip if the input size was not a round multiple of the block size,
 * the padding is handled by the C code in that case.
 */
-   cbz x4, 3f
+3: cbz x4, 4f
ldr_l   w4, sha256_ce_offsetof_count, x4
-   ldr x4, [x0, x4]
+   ldr x4, [x19, x4]
movi    v17.2d, #0
mov x8, #0x8000
movi    v18.2d, #0
@@ -147,9 +162,12 @@ CPU_LE(rev32   v19.16b, v19.16b)
mov x4, #0
mov v19.d[0], xzr
mov v19.d[1], x7
-   b   1b
+   b   2b
 
/* store new state */
-3: st1 {dgav.4s, dgbv.4s}, [x0]
+4: st1 {dgav.4s, dgbv.4s}, [x19]
+   ldp x19, x20, [sp, #16]
+   ldr x21, [sp, #32]
+   ldp x29, x30, [sp], #48
ret
 ENDPROC(sha2_ce_transform)
-- 
2.11.0

[PATCH v2 12/19] crypto: arm64/sha1-ce - yield every 8 blocks of input

2017-12-04 Thread Ard Biesheuvel

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON every 8 blocks of input.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/sha1-ce-core.S | 45 ++--
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S
index 8550408735a0..7ae0dd369e0a 100644
--- a/arch/arm64/crypto/sha1-ce-core.S
+++ b/arch/arm64/crypto/sha1-ce-core.S
@@ -70,31 +70,40 @@
 *int blocks)
 */
 ENTRY(sha1_ce_transform)
+   stp x29, x30, [sp, #-48]!
+   mov x29, sp
+   stp x19, x20, [sp, #16]
+   str x21, [sp, #32]
+
+   mov x19, x0
+   mov x20, x1
+   mov x21, x2
+
/* load round constants */
-   adr x6, .Lsha1_rcon
+0: adr x6, .Lsha1_rcon
ld1r    {k0.4s}, [x6], #4
ld1r    {k1.4s}, [x6], #4
ld1r    {k2.4s}, [x6], #4
ld1r    {k3.4s}, [x6]
 
/* load state */
-   ld1 {dgav.4s}, [x0]
-   ldr dgb, [x0, #16]
+   ld1 {dgav.4s}, [x19]
+   ldr dgb, [x19, #16]
 
/* load sha1_ce_state::finalize */
ldr_l   w4, sha1_ce_offsetof_finalize, x4
-   ldr w4, [x0, x4]
+   ldr w4, [x19, x4]
 
/* load input */
-0: ld1 {v8.4s-v11.4s}, [x1], #64
-   sub w2, w2, #1
+1: ld1 {v8.4s-v11.4s}, [x20], #64
+   sub w21, w21, #1
 
 CPU_LE(rev32   v8.16b, v8.16b  )
 CPU_LE(rev32   v9.16b, v9.16b  )
 CPU_LE(rev32   v10.16b, v10.16b)
 CPU_LE(rev32   v11.16b, v11.16b)
 
-1: add t0.4s, v8.4s, k0.4s
+2: add t0.4s, v8.4s, k0.4s
mov dg0v.16b, dgav.16b
 
add_update  c, ev, k0,  8,  9, 10, 11, dgb
@@ -125,16 +134,23 @@ CPU_LE(   rev32   v11.16b, v11.16b)
add dgbv.2s, dgbv.2s, dg1v.2s
add dgav.4s, dgav.4s, dg0v.4s
 
-   cbnz    w2, 0b
+   cbz w21, 3f
+
+   yield_neon_pre  w21, 3, 1, 1b   // yield every 8 blocks
+   st1 {dgav.4s}, [x19]
+   str dgb, [x19, #16]
+   yield_neon_post 0b
+
+   b   1b
 
/*
 * Final block: add padding and total bit count.
 * Skip if the input size was not a round multiple of the block size,
 * the padding is handled by the C code in that case.
 */
-   cbz x4, 3f
+3: cbz x4, 4f
ldr_l   w4, sha1_ce_offsetof_count, x4
-   ldr x4, [x0, x4]
+   ldr x4, [x19, x4]
movi    v9.2d, #0
mov x8, #0x8000
movi    v10.2d, #0
@@ -143,10 +159,13 @@ CPU_LE(   rev32   v11.16b, v11.16b)
mov x4, #0
mov v11.d[0], xzr
mov v11.d[1], x7
-   b   1b
+   b   2b
 
/* store new state */
-3: st1 {dgav.4s}, [x0]
-   str dgb, [x0, #16]
+4: st1 {dgav.4s}, [x19]
+   str dgb, [x19, #16]
+   ldp x19, x20, [sp, #16]
+   ldr x21, [sp, #32]
+   ldp x29, x30, [sp], #48
ret
 ENDPROC(sha1_ce_transform)
-- 
2.11.0

[PATCH v2 03/19] crypto: arm64/aes-blk - move kernel mode neon en/disable into loop

2017-12-04 Thread Ard Biesheuvel

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(),
and restoring them on each call to kernel_neon_end(). For this reason,
the NEON crypto code that was introduced at the time keeps the NEON
enabled throughout the execution of the crypto API methods, which may
include calls back into the crypto API that could result in memory
allocation or other actions that we should avoid when running with
preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder of
the code with kernel mode NEON disabled (and preemption enabled).

Note that this requires some reshuffling of the registers in the asm
code, because the XTS routines can no longer rely on the registers to
retain their contents between invocations.
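
The change in shape can be sketched with a small userspace C model. Nothing
here is kernel code: struct neon_model and the model_*() and walk_*()
helpers are hypothetical stand-ins for kernel_neon_begin(),
kernel_neon_end(), the asm routine and skcipher_walk_done(). The model only
counts how many walk steps run while the NEON section is held.

```c
#include <assert.h>

#define AES_BLOCK_SIZE 16u

struct neon_model {
    int held;                 /* 1 while the modelled NEON section is open */
    int sections;             /* completed begin/end pairs */
    int walk_done_while_held; /* walk steps taken inside a NEON section */
};

static void model_neon_begin(struct neon_model *m)
{
    assert(!m->held);
    m->held = 1;
}

static void model_neon_end(struct neon_model *m)
{
    assert(m->held);
    m->held = 0;
    m->sections++;
}

/* Stand-in for the asm routine: must only run inside a NEON section. */
static void model_cipher(struct neon_model *m, unsigned int blocks)
{
    assert(m->held && blocks > 0);
}

/* Stand-in for skcipher_walk_done(), which may call back into the API. */
static void model_walk_done(struct neon_model *m)
{
    if (m->held)
        m->walk_done_while_held++;
}

/* Old shape: one NEON section around the whole walk. */
static void walk_old(struct neon_model *m, const unsigned int *nbytes, int n)
{
    model_neon_begin(m);
    for (int i = 0; i < n && nbytes[i] / AES_BLOCK_SIZE; i++) {
        model_cipher(m, nbytes[i] / AES_BLOCK_SIZE);
        model_walk_done(m);
    }
    model_neon_end(m);
}

/* New shape: a NEON section only around each call into the asm code. */
static void walk_new(struct neon_model *m, const unsigned int *nbytes, int n)
{
    for (int i = 0; i < n && nbytes[i] / AES_BLOCK_SIZE; i++) {
        model_neon_begin(m);
        model_cipher(m, nbytes[i] / AES_BLOCK_SIZE);
        model_neon_end(m);
        model_walk_done(m);
    }
}
```

For a walk of three steps, the old shape takes all three walk steps with
the section held, while the new shape opens three short sections and takes
none of the walk steps inside them.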

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-glue.c| 95 ++--
 arch/arm64/crypto/aes-modes.S   | 90 +--
 arch/arm64/crypto/aes-neonbs-glue.c | 14 ++-
 3 files changed, 97 insertions(+), 102 deletions(-)

diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index 998ba519a026..00a3e2fd6a48 100644
--- a/arch/arm64/crypto/aes-glue.c
+++ b/arch/arm64/crypto/aes-glue.c
@@ -64,17 +64,17 @@ MODULE_LICENSE("GPL v2");
 
 /* defined in aes-modes.S */
 asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[],
-   int rounds, int blocks, int first);
+   int rounds, int blocks);
 asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[],
-   int rounds, int blocks, int first);
+   int rounds, int blocks);
 
 asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[],
-   int rounds, int blocks, u8 iv[], int first);
+   int rounds, int blocks, u8 iv[]);
 asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[],
-   int rounds, int blocks, u8 iv[], int first);
+   int rounds, int blocks, u8 iv[]);
 
 asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[],
-   int rounds, int blocks, u8 ctr[], int first);
+   int rounds, int blocks, u8 ctr[]);
 
 asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u8 const rk1[],
int rounds, int blocks, u8 const rk2[], u8 iv[],
@@ -133,19 +133,19 @@ static int ecb_encrypt(struct skcipher_request *req)
 {
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
-   int err, first, rounds = 6 + ctx->key_length / 4;
+   int err, rounds = 6 + ctx->key_length / 4;
struct skcipher_walk walk;
unsigned int blocks;
 
-   err = skcipher_walk_virt(&walk, req, true);
+   err = skcipher_walk_virt(&walk, req, false);
 
-   kernel_neon_begin();
-   for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
+   while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
+   kernel_neon_begin();
aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
-   (u8 *)ctx->key_enc, rounds, blocks, first);
+   (u8 *)ctx->key_enc, rounds, blocks);
+   kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
-   kernel_neon_end();
return err;
 }
 
@@ -153,19 +153,19 @@ static int ecb_decrypt(struct skcipher_request *req)
 {
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
-   int err, first, rounds = 6 + ctx->key_length / 4;
+   int err, rounds = 6 + ctx->key_length / 4;
struct skcipher_walk walk;
unsigned int blocks;
 
-   err = skcipher_walk_virt(&walk, req, true);
+   err = skcipher_walk_virt(&walk, req, false);
 
-   kernel_neon_begin();
-   for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) {
+   while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) {
+   kernel_neon_begin();
aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
-   (u8 *)ctx->key_dec, rounds, blocks, first);
+   (u8 *)ctx->key_dec, rounds, blocks);
+   kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes

[PATCH v2 06/19] crypto: arm64/ghash - move kernel mode neon en/disable into loop

2017-12-04 Thread Ard Biesheuvel

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(),
and restoring them on each call to kernel_neon_end(). For this reason,
the NEON crypto code that was introduced at the time keeps the NEON
enabled throughout the execution of the crypto API methods, which may
include calls back into the crypto API that could result in memory
allocation or other actions that we should avoid when running with
preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder of
the code with kernel mode NEON disabled (and preemption enabled).

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/ghash-ce-glue.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/crypto/ghash-ce-glue.c 
b/arch/arm64/crypto/ghash-ce-glue.c
index cfc9c92814fd..cb39503673d4 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -368,26 +368,28 @@ static int gcm_encrypt(struct aead_request *req)
pmull_gcm_encrypt_block(ks, iv, NULL,
num_rounds(&ctx->aes_key));
put_unaligned_be32(3, iv + GCM_IV_SIZE);
+   kernel_neon_end();
 
-   err = skcipher_walk_aead_encrypt(&walk, req, true);
+   err = skcipher_walk_aead_encrypt(&walk, req, false);
 
while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
+   kernel_neon_begin();
pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr,
  walk.src.virt.addr, &ctx->ghash_key,
  iv, num_rounds(&ctx->aes_key), ks);
+   kernel_neon_end();
 
err = skcipher_walk_done(&walk,
 walk.nbytes % AES_BLOCK_SIZE);
}
-   kernel_neon_end();
} else {
__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
num_rounds(&ctx->aes_key));
put_unaligned_be32(2, iv + GCM_IV_SIZE);
 
-   err = skcipher_walk_aead_encrypt(&walk, req, true);
+   err = skcipher_walk_aead_encrypt(&walk, req, false);
 
while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE;
@@ -467,15 +469,18 @@ static int gcm_decrypt(struct aead_request *req)
pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc,
num_rounds(&ctx->aes_key));
put_unaligned_be32(2, iv + GCM_IV_SIZE);
+   kernel_neon_end();
 
-   err = skcipher_walk_aead_decrypt(&walk, req, true);
+   err = skcipher_walk_aead_decrypt(&walk, req, false);
 
while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
+   kernel_neon_begin();
pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr,
  walk.src.virt.addr, &ctx->ghash_key,
  iv, num_rounds(&ctx->aes_key));
+   kernel_neon_end();
 
err = skcipher_walk_done(&walk,
 walk.nbytes % AES_BLOCK_SIZE);
@@ -483,14 +488,12 @@ static int gcm_decrypt(struct aead_request *req)
if (walk.nbytes)
pmull_gcm_encrypt_block(iv, iv, NULL,
num_rounds(&ctx->aes_key));
-
-   kernel_neon_end();
} else {
__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
num_rounds(&ctx->aes_key));
put_unaligned_be32(2, iv + GCM_IV_SIZE);
 
-   err = skcipher_walk_aead_decrypt(&walk, req, true);
+   err = skcipher_walk_aead_decrypt(&walk, req, false);
 
while (walk.nbytes >= AES_BLOCK_SIZE) {
int blocks = walk.nbytes / AES_BLOCK_SIZE;
-- 
2.11.0

[PATCH v2 02/19] crypto: arm64/aes-ce-ccm - move kernel mode neon en/disable into loop

2017-12-04 Thread Ard Biesheuvel

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(),
and restoring them on each call to kernel_neon_end(). For this reason,
the NEON crypto code that was introduced at the time keeps the NEON
enabled throughout the execution of the crypto API methods, which may
include calls back into the crypto API that could result in memory
allocation or other actions that we should avoid when running with
preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder of
the code with kernel mode NEON disabled (and preemption enabled).
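
For illustration only, the change in pattern can be modelled in plain
userspace C. Everything named model_* below is a hypothetical stand-in (for
kernel_neon_begin(), kernel_neon_end(), the NEON cipher routine and
skcipher_walk_done() respectively), not the kernel API. The sketch only
counts how many walk steps run while "NEON" is held:

```c
#include <stddef.h>

/* Userspace model only: neon_depth > 0 stands for "preemption disabled". */
static int neon_depth;
static int walk_calls_in_neon;	/* walk steps taken while NEON was held */

static void model_neon_begin(void) { neon_depth++; }
static void model_neon_end(void)   { neon_depth--; }

/* Stand-in for the NEON cipher routine; expected to run with NEON held. */
static void model_cipher(unsigned char *dst, const unsigned char *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i] ^ 0x5a;
}

/* Stand-in for a walk step, which in the kernel may allocate or free memory. */
static void model_walk_done(void)
{
	if (neon_depth > 0)
		walk_calls_in_neon++;
}

/* Old pattern: NEON held across the whole loop. chunk must be non-zero. */
static int crypt_old(unsigned char *dst, const unsigned char *src,
		     size_t len, size_t chunk)
{
	walk_calls_in_neon = 0;
	model_neon_begin();
	for (size_t off = 0; off < len; off += chunk) {
		size_t n = len - off < chunk ? len - off : chunk;

		model_cipher(dst + off, src + off, n);
		model_walk_done();
	}
	model_neon_end();
	return walk_calls_in_neon;
}

/* New pattern: NEON held only around the cipher call itself. */
static int crypt_new(unsigned char *dst, const unsigned char *src,
		     size_t len, size_t chunk)
{
	walk_calls_in_neon = 0;
	for (size_t off = 0; off < len; off += chunk) {
		size_t n = len - off < chunk ? len - off : chunk;

		model_neon_begin();
		model_cipher(dst + off, src + off, n);
		model_neon_end();
		model_walk_done();
	}
	return walk_calls_in_neon;
}
```

Both variants produce the same output; only the second one keeps every walk
step outside the begin/end pair.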

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-ce-ccm-glue.c | 47 ++--
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c 
b/arch/arm64/crypto/aes-ce-ccm-glue.c
index a1254036f2b1..68b11aa690e4 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -107,11 +107,13 @@ static int ccm_init_mac(struct aead_request *req, u8 
maciv[], u32 msglen)
 }
 
 static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[],
-  u32 abytes, u32 *macp, bool use_neon)
+  u32 abytes, u32 *macp)
 {
-   if (likely(use_neon)) {
+   if (may_use_simd()) {
+   kernel_neon_begin();
ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc,
 num_rounds(key));
+   kernel_neon_end();
} else {
if (*macp > 0 && *macp < AES_BLOCK_SIZE) {
int added = min(abytes, AES_BLOCK_SIZE - *macp);
@@ -143,8 +145,7 @@ static void ccm_update_mac(struct crypto_aes_ctx *key, u8 
mac[], u8 const in[],
}
 }
 
-static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[],
-  bool use_neon)
+static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
 {
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
@@ -163,7 +164,7 @@ static void ccm_calculate_auth_mac(struct aead_request 
*req, u8 mac[],
ltag.len = 6;
}
 
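-   ccm_update_mac(ctx, mac, (u8 *)&ltag, ltag.len, &macp, use_neon);
+   ccm_update_mac(ctx, mac, (u8 *)&ltag, ltag.len, &macp);
scatterwalk_start(&walk, req->src);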
 
do {
@@ -175,7 +176,7 @@ static void ccm_calculate_auth_mac(struct aead_request 
*req, u8 mac[],
n = scatterwalk_clamp(&walk, len);
}
p = scatterwalk_map(&walk);
-   ccm_update_mac(ctx, mac, p, n, &macp, use_neon);
+   ccm_update_mac(ctx, mac, p, n, &macp);
len -= n;
 
scatterwalk_unmap(p);
@@ -242,43 +243,42 @@ static int ccm_encrypt(struct aead_request *req)
u8 __aligned(8) mac[AES_BLOCK_SIZE];
u8 buf[AES_BLOCK_SIZE];
u32 len = req->cryptlen;
-   bool use_neon = may_use_simd();
int err;
 
err = ccm_init_mac(req, mac, len);
if (err)
return err;
 
-   if (likely(use_neon))
-   kernel_neon_begin();
-
if (req->assoclen)
-   ccm_calculate_auth_mac(req, mac, use_neon);
+   ccm_calculate_auth_mac(req, mac);
 
/* preserve the original iv for the final round */
memcpy(buf, req->iv, AES_BLOCK_SIZE);
 
err = skcipher_walk_aead_encrypt(&walk, req, true);
 
-   if (likely(use_neon)) {
+   if (may_use_simd()) {
while (walk.nbytes) {
u32 tail = walk.nbytes % AES_BLOCK_SIZE;
 
if (walk.nbytes == walk.total)
tail = 0;
 
+   kernel_neon_begin();
ce_aes_ccm_encrypt(walk.dst.virt.addr,
   walk.src.virt.addr,
   walk.nbytes - tail, ctx->key_enc,
   num_rounds(ctx), mac, walk.iv);
+   kernel_neon_end();
 
err = skcipher_walk_done(&walk, tail);
}
-   if (!err)
+   if (!err) {
+   kernel_neon_begin();
ce_aes_ccm_final(mac, buf, ctx->key_enc,
 num_rounds(ctx));
-
-   kernel_neon_end();
+   kernel_neon_end();
+   }
} else {

[PATCH v2 11/19] arm64: assembler: add macro to conditionally yield the NEON under PREEMPT

2017-12-04 Thread Ard Biesheuvel

Add a support macro to conditionally yield the NEON (and thus the CPU)
that may be called from the assembler code. Given that especially the
instruction based accelerated crypto code may use very tight loops, add
some parametrization so that the TIF_NEED_RESCHED flag test is only
executed every so many loop iterations.

In some cases, yielding the NEON involves saving and restoring a non-trivial
amount of context (especially in the CRC folding algorithms),
and so the macro is split into two, and the code in between is only
executed when the yield path is taken, allowing the context to be preserved.
The second macro takes a label argument that marks the resume-from-yield
path, which should restore the preserved context again.
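
As a rough sketch of the two decisions the macro makes, here is a userspace
C model. The function names are hypothetical; the mask expression and the
"order > 1" condition are copied from the macro body in the diff, and the
preempt count test mirrors the "cmp w1, #1" there:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the TIF_NEED_RESCHED test should run for this value of
 * the loop counter, i.e. once every 2^order strides. For order <= 1 the
 * macro emits no counter test at all, so every iteration polls.
 */
static bool should_poll(uint64_t ctr, unsigned int order, uint64_t stride)
{
	uint64_t mask;

	if (order <= 1)
		return true;

	/* The macro rejects strides that are not a power of 2. */
	assert(stride != 0 && (stride & (stride - 1)) == 0);

	mask = ((UINT64_C(1) << order) * stride - 1) & ~(stride - 1);
	return (ctr & mask) == 0;
}

/*
 * Yield only if dropping one level of preempt count makes the task
 * preemptible and a reschedule has actually been requested.
 */
static bool should_yield(int preempt_count, bool need_resched)
{
	return preempt_count == 1 && need_resched;
}
```

With stride 16 and order 3, for example, the mask is 0x70, so a counter that
steps in units of 16 bytes triggers the flag test once every 8 steps.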

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/include/asm/assembler.h | 50 
 1 file changed, 50 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h 
b/arch/arm64/include/asm/assembler.h
index aef72d886677..917b026d3e00 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -512,4 +512,54 @@ alternative_else_nop_endif
 #endif
.endm
 
+/*
+ * yield_neon - check whether to yield to another runnable task from
+ * kernel mode NEON code (running with preemption disabled)
+ *
+ * - Check whether the preempt count is exactly 1, in which case disabling
+ *   preemption once will make the task preemptible. If this is not the case,
+ *   yielding is pointless.
+ * - Check whether TIF_NEED_RESCHED is set, and if so, disable and re-enable
+ *   kernel mode NEON (which will trigger a reschedule), and branch to the
+ *   yield fixup code at @lbl.
+ */
+   .macro  yield_neon, lbl:req, ctr, order, stride, loop
+   yield_neon_pre  \ctr, \order, \stride, \loop
+   yield_neon_post \lbl
+   .endm
+
+   .macro  yield_neon_pre, ctr, order=0, stride, loop=f
+#ifdef CONFIG_PREEMPT
+   /*
+* With some algorithms, it makes little sense to poll the
+* TIF_NEED_RESCHED flag after every iteration, so only perform
+* the check every 2^order strides.
+*/
+   .if \order > 1
+   .if (\stride & (\stride - 1)) != 0
+   .error  "stride should be a power of 2"
+   .endif
+   tst \ctr, #((1 << \order) * \stride - 1) & ~(\stride - 1)
+   b.ne \loop
+   .endif
+
+   get_thread_info x0
+   ldr w1, [x0, #TSK_TI_PREEMPT]
+   ldr x0, [x0, #TSK_TI_FLAGS]
+   cmp w1, #1 // == PREEMPT_OFFSET
+   csel x0, x0, xzr, eq
+   tbnz x0, #TIF_NEED_RESCHED, f // needs rescheduling?
+:
+#endif
+   .subsection 1
+:
+   .endm
+
+   .macro  yield_neon_post, lbl:req
+   bl  kernel_neon_end
+   bl  kernel_neon_begin
+   b   \lbl
+   .previous
+   .endm
+
 #endif /* __ASM_ASSEMBLER_H */
-- 
2.11.0

[PATCH v2 07/19] crypto: arm64/aes-blk - remove configurable interleave

2017-12-04 Thread Ard Biesheuvel

The AES block mode implementation using Crypto Extensions or plain NEON
was written before real hardware existed, and so its interleave factor
was made build time configurable (as well as an option to instantiate
all interleaved sequences inline rather than as subroutines).

We ended up using INTERLEAVE=4 with inlining disabled for both flavors
of the core AES routines, so let's stick with that, and remove the option
to configure this at build time. This makes the code easier to modify,
which is nice now that we're adding yield support.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/Makefile|   3 -
 arch/arm64/crypto/aes-modes.S | 237 
 2 files changed, 40 insertions(+), 200 deletions(-)

diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index b5edc5918c28..aaf4e9afd750 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -50,9 +50,6 @@ aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o
 obj-$(CONFIG_CRYPTO_AES_ARM64_BS) += aes-neon-bs.o
 aes-neon-bs-y := aes-neonbs-core.o aes-neonbs-glue.o
 
-AFLAGS_aes-ce.o := -DINTERLEAVE=4
-AFLAGS_aes-neon.o  := -DINTERLEAVE=4
-
 CFLAGS_aes-glue-ce.o   := -DUSE_V8_CRYPTO_EXTENSIONS
 
 $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE
diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 65b273667b34..27a235b2ddee 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -13,44 +13,6 @@
.text
.align  4
 
-/*
- * There are several ways to instantiate this code:
- * - no interleave, all inline
- * - 2-way interleave, 2x calls out of line (-DINTERLEAVE=2)
- * - 2-way interleave, all inline (-DINTERLEAVE=2 -DINTERLEAVE_INLINE)
- * - 4-way interleave, 4x calls out of line (-DINTERLEAVE=4)
- * - 4-way interleave, all inline (-DINTERLEAVE=4 -DINTERLEAVE_INLINE)
- *
- * Macros imported by this code:
- * - enc_prepare   - setup NEON registers for encryption
- * - dec_prepare   - setup NEON registers for decryption
- * - enc_switch_key - change to new key after having prepared for encryption
- * - encrypt_block - encrypt a single block
- * - decrypt block - decrypt a single block
- * - encrypt_block2x   - encrypt 2 blocks in parallel (if INTERLEAVE == 2)
- * - decrypt_block2x   - decrypt 2 blocks in parallel (if INTERLEAVE == 2)
- * - encrypt_block4x   - encrypt 4 blocks in parallel (if INTERLEAVE == 4)
- * - decrypt_block4x   - decrypt 4 blocks in parallel (if INTERLEAVE == 4)
- */
-
-#if defined(INTERLEAVE) && !defined(INTERLEAVE_INLINE)
-#define FRAME_PUSH stp x29, x30, [sp,#-16]! ; mov x29, sp
-#define FRAME_POP  ldp x29, x30, [sp],#16
-
-#if INTERLEAVE == 2
-
-aes_encrypt_block2x:
-   encrypt_block2x v0, v1, w3, x2, x8, w7
-   ret
-ENDPROC(aes_encrypt_block2x)
-
-aes_decrypt_block2x:
-   decrypt_block2x v0, v1, w3, x2, x8, w7
-   ret
-ENDPROC(aes_decrypt_block2x)
-
-#elif INTERLEAVE == 4
-
 aes_encrypt_block4x:
encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7
ret
@@ -61,48 +23,6 @@ aes_decrypt_block4x:
ret
 ENDPROC(aes_decrypt_block4x)
 
-#else
-#error INTERLEAVE should equal 2 or 4
-#endif
-
-   .macro  do_encrypt_block2x
-   bl  aes_encrypt_block2x
-   .endm
-
-   .macro  do_decrypt_block2x
-   bl  aes_decrypt_block2x
-   .endm
-
-   .macro  do_encrypt_block4x
-   bl  aes_encrypt_block4x
-   .endm
-
-   .macro  do_decrypt_block4x
-   bl  aes_decrypt_block4x
-   .endm
-
-#else
-#define FRAME_PUSH
-#define FRAME_POP
-
-   .macro  do_encrypt_block2x
-   encrypt_block2x v0, v1, w3, x2, x8, w7
-   .endm
-
-   .macro  do_decrypt_block2x
-   decrypt_block2x v0, v1, w3, x2, x8, w7
-   .endm
-
-   .macro  do_encrypt_block4x
-   encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7
-   .endm
-
-   .macro  do_decrypt_block4x
-   decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7
-   .endm
-
-#endif
-
/*
 * aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
 * int blocks)
@@ -111,28 +31,21 @@ ENDPROC(aes_decrypt_block4x)
 */
 
 AES_ENTRY(aes_ecb_encrypt)
-   FRAME_PUSH
+   stp x29, x30, [sp, #-16]!
+   mov x29, sp
 
enc_prepare w3, x2, x5
 
 .LecbencloopNx:
-#if INTERLEAVE >= 2
-   subs w4, w4, #INTERLEAVE
+   subs w4, w4, #4
bmi .Lecbenc1x
-#if INTERLEAVE == 2
-   ld1 {v0.16b-v1.16b}, [x1], #32  /* get 2 pt blocks */
-   do_encrypt_block2x
-   st1 {v0.16b-v1.16b}, [x0], #32
-#else
ld1 {v0.16b-v3.16b}, [x1], #64  /* get 4 pt blocks */
-   do_encrypt_block4x
+   bl  aes_encrypt_block4x
st1 {v0.16b-v3.16b}, [

[PATCH v2 05/19] crypto: arm64/chacha20 - move kernel mode neon en/disable into loop

2017-12-04 Thread Ard Biesheuvel

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(),
and restoring them on each call to kernel_neon_end(). For this reason,
the NEON crypto code that was introduced at the time keeps the NEON
enabled throughout the execution of the crypto API methods, which may
include calls back into the crypto API that could result in memory
allocation or other actions that we should avoid when running with
preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder of
the code with kernel mode NEON disabled (and preemption enabled).

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/chacha20-neon-glue.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/crypto/chacha20-neon-glue.c 
b/arch/arm64/crypto/chacha20-neon-glue.c
index cbdb75d15cd0..727579c93ded 100644
--- a/arch/arm64/crypto/chacha20-neon-glue.c
+++ b/arch/arm64/crypto/chacha20-neon-glue.c
@@ -37,12 +37,19 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 
*src,
u8 buf[CHACHA20_BLOCK_SIZE];
 
while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
+   kernel_neon_begin();
chacha20_4block_xor_neon(state, dst, src);
+   kernel_neon_end();
bytes -= CHACHA20_BLOCK_SIZE * 4;
src += CHACHA20_BLOCK_SIZE * 4;
dst += CHACHA20_BLOCK_SIZE * 4;
state[12] += 4;
}
+
+   if (!bytes)
+   return;
+
+   kernel_neon_begin();
while (bytes >= CHACHA20_BLOCK_SIZE) {
chacha20_block_xor_neon(state, dst, src);
bytes -= CHACHA20_BLOCK_SIZE;
@@ -55,6 +62,7 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 
*src,
chacha20_block_xor_neon(state, buf, buf);
memcpy(dst, buf, bytes);
}
+   kernel_neon_end();
 }
 
 static int chacha20_neon(struct skcipher_request *req)
@@ -68,11 +76,10 @@ static int chacha20_neon(struct skcipher_request *req)
if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE)
return crypto_chacha20_crypt(req);
 
-   err = skcipher_walk_virt(&walk, req, true);
+   err = skcipher_walk_virt(&walk, req, false);
 
crypto_chacha20_init(state, ctx, walk.iv);
 
-   kernel_neon_begin();
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
 
@@ -83,7 +90,6 @@ static int chacha20_neon(struct skcipher_request *req)
nbytes);
err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
-   kernel_neon_end();
 
return err;
 }
-- 
2.11.0

[PATCH v2 04/19] crypto: arm64/aes-bs - move kernel mode neon en/disable into loop

2017-12-04 Thread Ard Biesheuvel

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(),
and restoring them on each call to kernel_neon_end(). For this reason,
the NEON crypto code that was introduced at the time keeps the NEON
enabled throughout the execution of the crypto API methods, which may
include calls back into the crypto API that could result in memory
allocation or other actions that we should avoid when running with
preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder of
the code with kernel mode NEON disabled (and preemption enabled).

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/crypto/aes-neonbs-glue.c | 36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/crypto/aes-neonbs-glue.c 
b/arch/arm64/crypto/aes-neonbs-glue.c
index 9d823c77ec84..e7a95a566462 100644
--- a/arch/arm64/crypto/aes-neonbs-glue.c
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -99,9 +99,8 @@ static int __ecb_crypt(struct skcipher_request *req,
struct skcipher_walk walk;
int err;
 
-   err = skcipher_walk_virt(&walk, req, true);
+   err = skcipher_walk_virt(&walk, req, false);
 
-   kernel_neon_begin();
while (walk.nbytes >= AES_BLOCK_SIZE) {
unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
@@ -109,12 +108,13 @@ static int __ecb_crypt(struct skcipher_request *req,
blocks = round_down(blocks,
walk.stride / AES_BLOCK_SIZE);
 
+   kernel_neon_begin();
fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->rk,
   ctx->rounds, blocks);
+   kernel_neon_end();
err = skcipher_walk_done(&walk,
 walk.nbytes - blocks * AES_BLOCK_SIZE);
}
-   kernel_neon_end();
 
return err;
 }
@@ -158,19 +158,19 @@ static int cbc_encrypt(struct skcipher_request *req)
struct skcipher_walk walk;
int err;
 
-   err = skcipher_walk_virt(&walk, req, true);
+   err = skcipher_walk_virt(&walk, req, false);
 
-   kernel_neon_begin();
while (walk.nbytes >= AES_BLOCK_SIZE) {
unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
/* fall back to the non-bitsliced NEON implementation */
+   kernel_neon_begin();
neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
 ctx->enc, ctx->key.rounds, blocks,
 walk.iv);
+   kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
}
-   kernel_neon_end();
return err;
 }
 
@@ -181,9 +181,8 @@ static int cbc_decrypt(struct skcipher_request *req)
struct skcipher_walk walk;
int err;
 
-   err = skcipher_walk_virt(&walk, req, true);
+   err = skcipher_walk_virt(&walk, req, false);
 
-   kernel_neon_begin();
while (walk.nbytes >= AES_BLOCK_SIZE) {
unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
@@ -191,13 +190,14 @@ static int cbc_decrypt(struct skcipher_request *req)
blocks = round_down(blocks,
walk.stride / AES_BLOCK_SIZE);
 
+   kernel_neon_begin();
aesbs_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
  ctx->key.rk, ctx->key.rounds, blocks,
  walk.iv);
+   kernel_neon_end();
err = skcipher_walk_done(&walk,
 walk.nbytes - blocks * AES_BLOCK_SIZE);
}
-   kernel_neon_end();
 
return err;
 }
@@ -229,9 +229,8 @@ static int ctr_encrypt(struct skcipher_request *req)
u8 buf[AES_BLOCK_SIZE];
int err;
 
-   err = skcipher_walk_virt(&walk, req, true);
+   err = skcipher_walk_virt(&walk, req, false);
 
-   kernel_neon_begin();
while (walk.nbytes > 0) {
unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
u8 *final = (walk.total % AES_BLOCK_SIZE) ? buf : NULL;
@@ -242,8 +241,10 @@ static int ctr_encrypt(struct skcipher_request *req)
final = NULL;
}
 
+   kernel_neon_begin();
aesbs_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
  ctx->rk, ctx->rounds, blocks, walk.iv, final);
+   kernel_neon_e

[PATCH v2 01/19] crypto: testmgr - add a new test case for CRC-T10DIF

2017-12-04 Thread Ard Biesheuvel

In order to be able to test yield support under preempt, add a test
vector for CRC-T10DIF that is long enough to take multiple iterations
(and thus possible preemption between them) of the primary loop of the
accelerated x86 and arm64 implementations.
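
For cross-checking vectors like this one outside the kernel, a bitwise
reference is handy. The sketch below assumes the usual T10 DIF parameters
(polynomial 0x8bb7, zero initial value, no bit reflection, no final XOR);
the function name is made up for this example and it is not the code that
generated the vector in this patch:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-at-a-time CRC-T10DIF. Pass 0 as crc for a fresh computation, or the
 * previous return value to continue over a further chunk of input.
 */
static uint16_t crc_t10dif_ref(uint16_t crc, const unsigned char *buf,
			       size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*buf++ << 8);
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ 0x8bb7);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}
```

Because the state is just the running 16-bit value, feeding the input in
several pieces gives the same digest as a single pass.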

Signed-off-by: Ard Biesheuvel 
---
 crypto/testmgr.h | 259 
 1 file changed, 259 insertions(+)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index a714b6293959..0c849aec161d 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -1494,6 +1494,265 @@ static const struct hash_testvec 
crct10dif_tv_template[] = {
.digest = (u8 *)(u16 []){ 0x44c6 },
.np = 4,
.tap= { 1, 255, 57, 6 },
+   }, {
+   .plaintext ="\x6e\x05\x79\x10\xa7\x1b\xb2\x49"
+   "\xe0\x54\xeb\x82\x19\x8d\x24\xbb"
+   "\x2f\xc6\x5d\xf4\x68\xff\x96\x0a"
+   "\xa1\x38\xcf\x43\xda\x71\x08\x7c"
+   "\x13\xaa\x1e\xb5\x4c\xe3\x57\xee"
+   "\x85\x1c\x90\x27\xbe\x32\xc9\x60"
+   "\xf7\x6b\x02\x99\x0d\xa4\x3b\xd2"
+   "\x46\xdd\x74\x0b\x7f\x16\xad\x21"
+   "\xb8\x4f\xe6\x5a\xf1\x88\x1f\x93"
+   "\x2a\xc1\x35\xcc\x63\xfa\x6e\x05"
+   "\x9c\x10\xa7\x3e\xd5\x49\xe0\x77"
+   "\x0e\x82\x19\xb0\x24\xbb\x52\xe9"
+   "\x5d\xf4\x8b\x22\x96\x2d\xc4\x38"
+   "\xcf\x66\xfd\x71\x08\x9f\x13\xaa"
+   "\x41\xd8\x4c\xe3\x7a\x11\x85\x1c"
+   "\xb3\x27\xbe\x55\xec\x60\xf7\x8e"
+   "\x02\x99\x30\xc7\x3b\xd2\x69\x00"
+   "\x74\x0b\xa2\x16\xad\x44\xdb\x4f"
+   "\xe6\x7d\x14\x88\x1f\xb6\x2a\xc1"
+   "\x58\xef\x63\xfa\x91\x05\x9c\x33"
+   "\xca\x3e\xd5\x6c\x03\x77\x0e\xa5"
+   "\x19\xb0\x47\xde\x52\xe9\x80\x17"
+   "\x8b\x22\xb9\x2d\xc4\x5b\xf2\x66"
+   "\xfd\x94\x08\x9f\x36\xcd\x41\xd8"
+   "\x6f\x06\x7a\x11\xa8\x1c\xb3\x4a"
+   "\xe1\x55\xec\x83\x1a\x8e\x25\xbc"
+   "\x30\xc7\x5e\xf5\x69\x00\x97\x0b"
+   "\xa2\x39\xd0\x44\xdb\x72\x09\x7d"
+   "\x14\xab\x1f\xb6\x4d\xe4\x58\xef"
+   "\x86\x1d\x91\x28\xbf\x33\xca\x61"
+   "\xf8\x6c\x03\x9a\x0e\xa5\x3c\xd3"
+   "\x47\xde\x75\x0c\x80\x17\xae\x22"
+   "\xb9\x50\xe7\x5b\xf2\x89\x20\x94"
+   "\x2b\xc2\x36\xcd\x64\xfb\x6f\x06"
+   "\x9d\x11\xa8\x3f\xd6\x4a\xe1\x78"
+   "\x0f\x83\x1a\xb1\x25\xbc\x53\xea"
+   "\x5e\xf5\x8c\x00\x97\x2e\xc5\x39"
+   "\xd0\x67\xfe\x72\x09\xa0\x14\xab"
+   "\x42\xd9\x4d\xe4\x7b\x12\x86\x1d"
+   "\xb4\x28\xbf\x56\xed\x61\xf8\x8f"
+   "\x03\x9a\x31\xc8\x3c\xd3\x6a\x01"
+   "\x75\x0c\xa3\x17\xae\x45\xdc\x50"
+   "\xe7\x7e\x15\x89\x20\xb7\x2b\xc2"
+   "\x59\xf0\x64\xfb\x92\x06\x9d\x34"
+   "\xcb\x3f\xd6\x6d\x04\x78\x0f\xa6"
+   "\x1a\xb1\x48\xdf\x53\xea\x81\x18"
+   "\x8c\x23\xba\x2e\xc5\x5c\xf3\x67"
+   "\xfe\x95\x09\xa0\x37\xce\x42\xd9"
+   "\x70\x07\x7b\x12\xa9\x1d\xb4\x4b"
+   "\xe2\x56\xed\x84\x1b\x8f\x26\xbd"
+   "\x31\xc8\x5f\xf6\x6a\x01\x98\x0c"
+   "\xa3\x3a\xd1\x45\xdc\x73\x0a\x7e"
+   "\x15\xac\x20\xb7\x4e\xe5\x59\xf0"
+   "\x87\x1e\x92\x29\xc0\x34\xcb\x62"
+   "\xf9\x6d\x04\x9b\x0f\xa6\x3d\xd4"
+   "\x48\xdf\x76\x0d\x81\x18\xaf\x23"
+   "\xba\x51\xe8\x5c\xf3\x8a\x21\x95"
+   "\x2c\xc3\x37\xce\x65\xfc\x70\x07"
+   "\x9e\x12\xa9\x40\xd7\x4b\xe2\x79"
+   "\x10\x84\x1b\xb2\x26\xbd\x54\xeb"
+   "\x5f\xf6\x8d\x01\x98\x2f\xc6\x3a"
+   "\xd1\x68\xff\x73\x0a\xa1\x15\xac"
+   "\x43\xda\x4e\xe5\x7c\x13\x87\x1e"
+

[PATCH v2 00/19] crypto: arm64 - play nice with CONFIG_PREEMPT

2017-12-04 Thread Ard Biesheuvel

This is a followup to 'crypto: arm64 - disable NEON across scatterwalk API
calls', sent out last Friday.

As reported by Sebastian, the way the arm64 NEON crypto code currently
keeps kernel mode NEON enabled across calls into skcipher_walk_xxx() is
causing problems with RT builds, given that the skcipher walk API may
allocate and free temporary buffers it uses to present the input and
output arrays to the crypto algorithm in blocksize sized chunks (where
blocksize is the natural blocksize of the crypto algorithm), and doing
so with NEON enabled means we're alloc/free'ing memory with preemption
disabled.

This was deliberate: when this code was introduced, each kernel_neon_begin()
and kernel_neon_end() call incurred a fixed penalty of storing resp.
loading the contents of all NEON registers to/from memory, and so doing
it less often had an obvious performance benefit. However, in the meantime,
we have refactored the core kernel mode NEON code, and now kernel_neon_begin()
only incurs this penalty the first time it is called after entering the kernel,
and the NEON register restore is deferred until returning to userland. This
means pulling those calls into the loops that iterate over the input/output
of the crypto algorithm is not a big deal anymore (although there are some
places in the code where we relied on the NEON registers retaining their
values between calls).

So let's clean this up for arm64: update the NEON based skcipher drivers to
no longer keep the NEON enabled when calling into the skcipher walk API.

As pointed out by Peter, this only solves part of the problem. So let's
tackle it more thoroughly, and update the algorithms to test the NEED_RESCHED
flag each time after processing a fixed chunk of input. An attempt was made
to align the different algorithms with regard to how much work such a fixed
chunk entails, i.e., yielding every block for an algorithm that operates on
16 byte blocks at < 1 cycle per byte seems rather pointless.

Changes since v1:
- add CRC-T10DIF test vector (#1)
- stop using GFP_ATOMIC in scatterwalk API calls, now that they are executed
  with preemption enabled (#2 - #6)
- do some preparatory refactoring on the AES block mode code (#7 - #9)
- add yield patches (#10 - #18)
- add test patch (#19) - DO NOT MERGE

Cc: Dave Martin 
Cc: Russell King - ARM Linux 
Cc: Sebastian Andrzej Siewior 
Cc: Mark Rutland 
Cc: linux-rt-us...@vger.kernel.org
Cc: Peter Zijlstra 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 

Ard Biesheuvel (19):
  crypto: testmgr - add a new test case for CRC-T10DIF
  crypto: arm64/aes-ce-ccm - move kernel mode neon en/disable into loop
  crypto: arm64/aes-blk - move kernel mode neon en/disable into loop
  crypto: arm64/aes-bs - move kernel mode neon en/disable into loop
  crypto: arm64/chacha20 - move kernel mode neon en/disable into loop
  crypto: arm64/ghash - move kernel mode neon en/disable into loop
  crypto: arm64/aes-blk - remove configurable interleave
  crypto: arm64/aes-blk - add 4 way interleave to CBC encrypt path
  crypto: arm64/aes-blk - add 4 way interleave to CBC-MAC encrypt path
  crypto: arm64/sha256-neon - play nice with CONFIG_PREEMPT kernels
  arm64: assembler: add macro to conditionally yield the NEON under
PREEMPT
  crypto: arm64/sha1-ce - yield every 8 blocks of input
  crypto: arm64/sha2-ce - yield every 8 blocks of input
  crypto: arm64/aes-blk - yield after processing each 64 bytes of input
  crypto: arm64/aes-bs - yield after processing each 128 bytes of input
  crypto: arm64/aes-ghash - yield after processing fixed number of
blocks
  crypto: arm64/crc32-ce - yield NEON every 16 blocks of input
  crypto: arm64/crct10dif-ce - yield NEON every 8 blocks of input
  DO NOT MERGE

 arch/arm64/crypto/Makefile |   3 -
 arch/arm64/crypto/aes-ce-ccm-glue.c|  47 +-
 arch/arm64/crypto/aes-ce.S |  17 +-
 arch/arm64/crypto/aes-glue.c   |  95 ++-
 arch/arm64/crypto/aes-modes.S  | 624 ++--
 arch/arm64/crypto/aes-neon.S   |   2 +
 arch/arm64/crypto/aes-neonbs-core.S| 317 ++
 arch/arm64/crypto/aes-neonbs-glue.c|  48 +-
 arch/arm64/crypto/chacha20-neon-glue.c |  12 +-
 arch/arm64/crypto/crc32-ce-core.S  |  55 +-
 arch/arm64/crypto/crct10dif-ce-core.S  |  39 +-
 arch/arm64/crypto/ghash-ce-core.S  | 128 ++--
 arch/arm64/crypto/ghash-ce-glue.c  |  17 +-
 arch/arm64/crypto/sha1-ce-core.S   |  45 +-
 arch/arm64/crypto/sha2-ce-core.S   |  40 +-
 arch/arm64/crypto/sha256-glue.c|  36 +-
 arch/arm64/include/asm/assembler.h |  83 +++
 crypto/testmgr.h   | 259 
 18 files changed, 1231 insertions(+), 636 deletions(-)

-- 
2.11.0

[bug report] chcr: Add support for Inline IPSec

2017-12-04 Thread Dan Carpenter

Hello Atul Gupta,

The patch 6dad4e8ab3ec: "chcr: Add support for Inline IPSec" from Nov
16, 2017, leads to the following static checker warning:

drivers/crypto/chelsio/chcr_ipsec.c:431 copy_key_cpltx_pktxt()
warn: potential pointer math issue ('q->q.desc' is a 512 bit pointer)

drivers/crypto/chelsio/chcr_ipsec.c
   419  
   420  if (likely(len <= left)) {
   421  memcpy(key_ctx->key, sa_entry->key, key_len);
   422  pos += key_len;
   423  } else {
   424  if (key_len <= left) {
   425  memcpy(pos, sa_entry->key, key_len);
   426  pos += key_len;
   427  } else {
   428  memcpy(pos, sa_entry->key, left);
   429  memcpy(q->q.desc, sa_entry->key + left,
   430 key_len - left);
   431  pos = q->q.desc + (key_len - left);
  ^
This does look like a pointer math issue.  It should probably be:

pos = (u8 *)q->q.desc + (key_len - left);

But I can't test this.

   432  }
   433  }
   434  /* Copy CPL TX PKT XT */
   435  pos = copy_cpltx_pktxt(skb, dev, pos);
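
To make the scaling concrete, here is a small userspace sketch. The struct
is a hypothetical 64-byte (512-bit) stand-in for whatever q->q.desc points
to, and the helper names are invented for the example. Adding n to a pointer
to such a type moves it n * 64 bytes, while adding n after a cast to a byte
pointer moves it n bytes:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-byte descriptor, standing in for the type of q->q.desc. */
struct model_desc {
	uint8_t bytes[64];
};

/* Byte distance covered by "desc + n" (pointer math in units of the struct). */
static ptrdiff_t advance_as_desc(struct model_desc *desc, size_t n)
{
	return (uint8_t *)(desc + n) - (uint8_t *)desc;
}

/* Byte distance covered by "(u8 *)desc + n" (pointer math in bytes). */
static ptrdiff_t advance_as_bytes(struct model_desc *desc, size_t n)
{
	return ((uint8_t *)desc + n) - (uint8_t *)desc;
}
```

So if key_len - left is meant as a byte count, the uncast form would overshoot
by a factor of 64 under this assumption about the descriptor size.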

regards,
dan carpenter

Re: [PATCH 01/10] staging: ccree: remove inline qualifiers

2017-12-04 Thread Dan Carpenter

On Sun, Dec 03, 2017 at 01:58:12PM +, Gilad Ben-Yossef wrote:
> The ccree drivers was marking a lot of big functions in C file as
> static inline for no good reason. Remove the inline qualifier from
> any but the few truly single line functions.
> 

The compiler is free to ignore inline hints...  It probably would make
single line functions inline anyway.

regards,
dan carpenter

Re: [PATCH 00/10] staging: ccree: cleanups & fixes

2017-12-04 Thread Dan Carpenter

Looks good.  Thanks!

regards,
dan carpenter

Re: [PATCH 0/5] crypto: arm64 - disable NEON across scatterwalk API calls

2017-12-04 Thread Ard Biesheuvel

On 2 December 2017 at 13:59, Peter Zijlstra  wrote:
> On Sat, Dec 02, 2017 at 11:15:14AM +, Ard Biesheuvel wrote:
>> On 2 December 2017 at 09:11, Ard Biesheuvel  
>> wrote:
>
>> > They consume the entire input in a single go, yes. But making it more
>> > granular than that is going to hurt performance, unless we introduce
>> > some kind of kernel_neon_yield(), which does a end+begin but only if
>> > the task is being scheduled out.
>> >
>> > For example, the SHA256 keeps 256 bytes of round constants in NEON
>> > registers, and reloading those from memory for each 64 byte block of
>> > input is going to be noticeable. The same applies to the AES code
>> > (although the numbers are slightly different)
>>
>> Something like below should do the trick I think (apologies for the
>> patch soup). I.e., check TIF_NEED_RESCHED at a point where only very
>> few NEON registers are live, and preserve/restore the live registers
>> across calls to kernel_neon_end + kernel_neon_begin. Would that work
>> for RT?
>
> Probably yes. The important point is that preempt latencies (and thus by
> extension NEON regions) are bounded and preferably small.
>
> Unbounded stuff (like depends on the amount of data fed) are a complete
> no-no for RT since then you cannot make predictions on how long things
> will take.
>

OK, that makes sense. But I do wonder what the parameters should be here.

For instance, the AES instructions on ARMv8 operate at <1 cycle per
byte, and so checking the TIF_NEED_RESCHED flag for every iteration of
the inner loop (i.e., every 64 bytes ~ 64 cycles) is clearly going to
be noticeable, and is probably overkill. The pure NEON version (which
is instantiated from the same block mode wrappers) uses ~25 cycles per
byte, and the bit sliced NEON version runs at ~20 cycles per byte but
can only operate on 8 blocks (128 bytes) at a time.

So rather than simply polling the bit at each iteration of the inner
loop in each algorithm, I'd prefer to aim for a ballpark number of
cycles to execute, on the order of 1000 - 2000. Would that be OK or too
coarse?
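
To put numbers on that, a back-of-the-envelope helper (hypothetical, not
part of any patch) could derive how many inner loop iterations to run
between polls from the cycles-per-byte figures quoted above. It uses whole
cycles per byte, so the "<1" figure for the AES instructions is rounded up
to 1 as an assumption:

```c
#include <stdint.h>

/*
 * Number of inner loop iterations to execute between TIF_NEED_RESCHED
 * polls so that roughly target_cycles elapse. Never less than one, since
 * an iteration cannot be split.
 */
static uint32_t iters_per_poll(uint32_t cycles_per_byte,
			       uint32_t bytes_per_iter,
			       uint32_t target_cycles)
{
	uint32_t cycles_per_iter = cycles_per_byte * bytes_per_iter;
	uint32_t iters;

	if (cycles_per_iter == 0)
		return 1;

	iters = target_cycles / cycles_per_iter;
	return iters ? iters : 1;
}
```

With a 1000 cycle target this gives 15 iterations of 64 bytes for the AES
instructions, and a single iteration for both the plain NEON version (64
bytes at ~25 cycles per byte) and the bit sliced version (128 bytes at ~20
cycles per byte), which already exceed the target per iteration.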
