[PATCH 02/45] drivers: crypto: remove duplicate includes
These duplicate includes have been found with scripts/checkincludes.pl but they have been removed manually to avoid removing false positives. Signed-off-by: Pravin Shedge --- drivers/crypto/bcm/cipher.c | 1 - drivers/crypto/cavium/nitrox/nitrox_reqmgr.c | 1 - drivers/crypto/ccp/ccp-crypto-aes-galois.c | 1 - 3 files changed, 3 deletions(-) diff --git a/drivers/crypto/bcm/cipher.c b/drivers/crypto/bcm/cipher.c index ce70b44..2b75f95 100644 --- a/drivers/crypto/bcm/cipher.c +++ b/drivers/crypto/bcm/cipher.c @@ -42,7 +42,6 @@ #include #include #include -#include #include #include "util.h" diff --git a/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c b/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c index 4addc23..deaefd5 100644 --- a/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c +++ b/drivers/crypto/cavium/nitrox/nitrox_reqmgr.c @@ -6,7 +6,6 @@ #include "nitrox_dev.h" #include "nitrox_req.h" #include "nitrox_csr.h" -#include "nitrox_req.h" /* SLC_STORE_INFO */ #define MIN_UDD_LEN 16 diff --git a/drivers/crypto/ccp/ccp-crypto-aes-galois.c b/drivers/crypto/ccp/ccp-crypto-aes-galois.c index ff02b71..ca1f0d7 100644 --- a/drivers/crypto/ccp/ccp-crypto-aes-galois.c +++ b/drivers/crypto/ccp/ccp-crypto-aes-galois.c @@ -21,7 +21,6 @@ #include #include #include -#include #include "ccp-crypto.h" -- 2.7.4
[Part2 PATCH v9 00/38] x86: Secure Encrypted Virtualization (AMD)
This part of the Secure Encrypted Virtualization (SEV) patch series focuses on the KVM changes required to create and manage SEV guests.

SEV is an extension to the AMD-V architecture which supports running encrypted virtual machines (VMs) under the control of a hypervisor. Encrypted VMs have their pages (code and data) secured such that only the guest itself has access to the unencrypted version. Each encrypted VM is associated with a unique encryption key; if its data is accessed by a different entity using a different key, the encrypted guest's data will be incorrectly decrypted, leading to unintelligible data. This security model ensures that the hypervisor is no longer able to inspect or alter any guest code or data.

The key management of this feature is handled by a separate processor known as the AMD Secure Processor (AMD-SP), which is present on AMD SoCs. The SEV Key Management Specification (see below) provides a set of commands which can be used by the hypervisor to load virtual machine keys through the AMD-SP driver.

The patch series adds a new ioctl to the KVM driver (KVM_MEMORY_ENCRYPT_OP). The ioctl will be used by QEMU to issue the SEV guest-specific commands defined in the Key Management Specification.
The following links provide additional details:

AMD Memory Encryption white paper:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

AMD64 Architecture Programmer's Manual:
http://support.amd.com/TechDocs/24593.pdf
SME is section 7.10
SEV is section 15.34

SEV Key Management:
http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf

KVM Forum Presentation:
http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf

SEV Guest BIOS support:
SEV support has been added to EDKII/OVMF BIOS
https://github.com/tianocore/edk2

--

The series applies on kvm/next commit: 4fbd8d194f06 (Linux 4.15-rc1)

Complete tree is available at:
repo: https://github.com/codomania/kvm.git
branch: sev-v9-p2

TODO:
* Add SEV guest migration command support

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Paolo Bonzini
Cc: "Radim Krčmář"
Cc: Joerg Roedel
Cc: Borislav Petkov
Cc: Tom Lendacky
Cc: Herbert Xu
Cc: David S. Miller
Cc: Gary Hook
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-crypto@vger.kernel.org

Changes since v8:
* Rebase the series to kvm/next branch
* Update SEV ASID allocation to limit the ASID to the range SEV_MIN_ASID to SEV_MAX_ASID (the EPYC BIOS provides an option to change SEV_MIN_ASID, which can be used to limit the number of SEV-enabled guests)

Changes since v7:
* Rebase the series to kvm/next branch
* Move the FW error enum definition to include/uapi/linux/psp-sev.h so that both userspace and kernel can share it
* (ccp) drop cmd_buf arg from sev_platform_init()
* (ccp) apply some cleanup/fixup from Boris
* (ccp) add some comments in FACTORY_RESET command handling
* (kvm) some fixup/cleanup from Boris
* (kvm) acquire the kvm->lock when modifying the sev->regions_list

Changes since v6:
* (ccp): Extend psp_device structure to track the FW INIT and SHUTDOWN states
* (ccp): Init and Uninit SEV FW during module load/unload
* (ccp): Avoid repeated k*alloc() for init and status command buffer
* (kvm): Rework DBG command to fix the compilation warning seen with gcc7.x
* (kvm): Convert the SEV doc to rst format

Changes since v5:
* Split the PSP driver support into multiple patches
* Multiple improvements from Boris
* Remove mem_enc_enabled() ops

Changes since v4:
* Fixes to address kbuild robot errors
* Add 'sev' module params to allow enabling/disabling the SEV feature
* Update documentation
* Multiple fixes to address v4 feedback
* Some coding style changes to address checkpatch reports

Changes since v3:
* Re-design the PSP interface support patch
* Rename the ioctls based on the feedback
* Improve documentation
* Fix i386 build issues
* Add LAUNCH_SECRET command
* Add new Kconfig option to enable SEV support
* Changes to address v3 feedback

Changes since v2:
* Add KVM_MEMORY_ENCRYPT_REGISTER/UNREGISTER_RAM ioctl to register encrypted memory ranges (recommended by Paolo)
* Extend kvm_x86_ops to provide new memory_encryption_enabled ops
* Enhance DEBUG DECRYPT/ENCRYPT commands to work with more than one page (recommended by Paolo)
* Optimize LAUNCH_UPDATE command to reduce the number of calls to AMD-SP driver
* Changes to address v2 feedback

Borislav Petkov (1):
  crypto: ccp: Build the AMD secure processor driver only with AMD CPU support

Brijesh Singh (34):
  Documentation/virtual/kvm: Add AMD Secure Encrypted Virtualization (SEV)
  KVM: SVM: Prepare to reserve asid for SEV guest
  KVM: X86: Extend CPUID range to include new leaf
  KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctl
  KVM: Introduce KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctl
  crypto: ccp: Define SEV
[Part2 PATCH v9 15/38] crypto: ccp: Implement SEV_PLATFORM_STATUS ioctl command
The SEV_PLATFORM_STATUS command can be used by the platform owner to get the current status of the platform. The command is defined in SEV spec section 5.5. Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Borislav Petkov Cc: Herbert Xu Cc: Gary Hook Cc: Tom Lendacky Cc: linux-crypto@vger.kernel.org Cc: k...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Improvements-by: Borislav Petkov Signed-off-by: Brijesh Singh Reviewed-by: Borislav Petkov Acked-by: Gary R Hook --- drivers/crypto/ccp/psp-dev.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c index b49583a45a55..a5072b166ab8 100644 --- a/drivers/crypto/ccp/psp-dev.c +++ b/drivers/crypto/ccp/psp-dev.c @@ -274,6 +274,21 @@ static int sev_ioctl_do_reset(struct sev_issue_cmd *argp) return __sev_do_cmd_locked(SEV_CMD_FACTORY_RESET, 0, &argp->error); } +static int sev_ioctl_do_platform_status(struct sev_issue_cmd *argp) +{ + struct sev_user_data_status *data = &psp_master->status_cmd_buf; + int ret; + + ret = __sev_do_cmd_locked(SEV_CMD_PLATFORM_STATUS, data, &argp->error); + if (ret) + return ret; + + if (copy_to_user((void __user *)argp->data, data, sizeof(*data))) + ret = -EFAULT; + + return ret; +} + static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) { void __user *argp = (void __user *)arg; @@ -299,6 +314,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) case SEV_FACTORY_RESET: ret = sev_ioctl_do_reset(&input); break; + case SEV_PLATFORM_STATUS: + ret = sev_ioctl_do_platform_status(&input); + break; default: ret = -EINVAL; goto out; -- 2.9.5
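For illustration, a userspace caller might exercise this command roughly as follows. This is a minimal sketch, not code from the series: the two structures mirror the uapi header added in patch 10/38, while the helper name `sev_platform_status()`, the `EXAMPLE_SEV_PLATFORM_STATUS` constant, and the `request` parameter are assumptions made for the example. The ioctl request number is passed in by the caller because the SEV_ISSUE_CMD definition is cut off in this archive.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Local mirrors of the structures from include/uapi/linux/psp-sev.h
 * (patch 10/38). A real program would include that header instead.
 */
struct sev_user_data_status {
	uint8_t  api_major;
	uint8_t  api_minor;
	uint8_t  state;
	uint32_t flags;
	uint8_t  build;
	uint32_t guest_count;
} __attribute__((packed));

struct sev_issue_cmd {
	uint32_t cmd;
	uint64_t data;
	uint32_t error;
} __attribute__((packed));

/* SEV_PLATFORM_STATUS is the second entry of the command enum, i.e. 1. */
#define EXAMPLE_SEV_PLATFORM_STATUS 1

/*
 * Hypothetical helper: query the platform status through an open SEV
 * device fd. 'request' is the SEV_ISSUE_CMD ioctl number supplied by
 * the caller. Returns the ioctl() result (0 on success, -1 with errno
 * set on failure); *fw_error receives the firmware status code that
 * the driver copied back into the command structure.
 */
int sev_platform_status(int fd, unsigned long request,
			struct sev_user_data_status *status,
			uint32_t *fw_error)
{
	struct sev_issue_cmd cmd;
	int ret;

	assert(status != NULL && fw_error != NULL);

	memset(status, 0, sizeof(*status));
	memset(&cmd, 0, sizeof(cmd));
	cmd.cmd = EXAMPLE_SEV_PLATFORM_STATUS;
	cmd.data = (uint64_t)(uintptr_t)status;	/* driver copies the result here */

	ret = ioctl(fd, request, &cmd);
	*fw_error = cmd.error;
	return ret;
}
```

On success the driver fills `*status` via copy_to_user() on `cmd.data`, which is why the caller passes the address of its own buffer rather than receiving the data inline.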
[Part2 PATCH v9 14/38] crypto: ccp: Implement SEV_FACTORY_RESET ioctl command
The SEV_FACTORY_RESET command can be used by the platform owner to reset the non-volatile SEV related data. The command is defined in SEV spec section 5.4 Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Borislav Petkov Cc: Herbert Xu Cc: Gary Hook Cc: Tom Lendacky Cc: linux-crypto@vger.kernel.org Cc: k...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Improvements-by: Borislav Petkov Signed-off-by: Brijesh Singh --- drivers/crypto/ccp/psp-dev.c | 77 +++- 1 file changed, 76 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c index 9915a6c604a3..b49583a45a55 100644 --- a/drivers/crypto/ccp/psp-dev.c +++ b/drivers/crypto/ccp/psp-dev.c @@ -232,9 +232,84 @@ static int sev_platform_shutdown(int *error) return rc; } +static int sev_get_platform_state(int *state, int *error) +{ + int rc; + + rc = __sev_do_cmd_locked(SEV_CMD_PLATFORM_STATUS, +&psp_master->status_cmd_buf, error); + if (rc) + return rc; + + *state = psp_master->status_cmd_buf.state; + return rc; +} + +static int sev_ioctl_do_reset(struct sev_issue_cmd *argp) +{ + int state, rc; + + /* +* The SEV spec requires that FACTORY_RESET must be issued in +* UNINIT state. Before we go further lets check if any guest is +* active. +* +* If FW is in WORKING state then deny the request otherwise issue +* SHUTDOWN command do INIT -> UNINIT before issuing the FACTORY_RESET. 
+* +*/ + rc = sev_get_platform_state(&state, &argp->error); + if (rc) + return rc; + + if (state == SEV_STATE_WORKING) + return -EBUSY; + + if (state == SEV_STATE_INIT) { + rc = __sev_platform_shutdown_locked(&argp->error); + if (rc) + return rc; + } + + return __sev_do_cmd_locked(SEV_CMD_FACTORY_RESET, 0, &argp->error); +} + static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) { - return -ENOTTY; + void __user *argp = (void __user *)arg; + struct sev_issue_cmd input; + int ret = -EFAULT; + + if (!psp_master) + return -ENODEV; + + if (ioctl != SEV_ISSUE_CMD) + return -EINVAL; + + if (copy_from_user(&input, argp, sizeof(struct sev_issue_cmd))) + return -EFAULT; + + if (input.cmd > SEV_MAX) + return -EINVAL; + + mutex_lock(&sev_cmd_mutex); + + switch (input.cmd) { + + case SEV_FACTORY_RESET: + ret = sev_ioctl_do_reset(&input); + break; + default: + ret = -EINVAL; + goto out; + } + + if (copy_to_user(argp, &input, sizeof(struct sev_issue_cmd))) + ret = -EFAULT; +out: + mutex_unlock(&sev_cmd_mutex); + + return ret; } static const struct file_operations sev_fops = { -- 2.9.5
[Part2 PATCH v9 12/38] crypto: ccp: Add Platform Security Processor (PSP) device support
The Platform Security Processor (PSP) is part of the AMD Secure Processor (AMD-SP) functionality. The PSP is a dedicated processor that provides support for key management commands in Secure Encrypted Virtualization (SEV) mode, along with software-based Trusted Execution Environment (TEE) to enable third-party trusted applications. Note that the key management functionality provided by the SEV firmware can be used outside of the kvm-amd driver hence it doesn't need to depend on CONFIG_KVM_AMD. Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Borislav Petkov Cc: Herbert Xu Cc: Gary Hook Cc: Tom Lendacky Cc: linux-crypto@vger.kernel.org Cc: k...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Improvements-by: Borislav Petkov Signed-off-by: Brijesh Singh Reviewed-by: Borislav Petkov --- drivers/crypto/ccp/Kconfig | 11 + drivers/crypto/ccp/Makefile | 1 + drivers/crypto/ccp/psp-dev.c | 105 +++ drivers/crypto/ccp/psp-dev.h | 59 drivers/crypto/ccp/sp-dev.c | 26 +++ drivers/crypto/ccp/sp-dev.h | 24 +- drivers/crypto/ccp/sp-pci.c | 52 + 7 files changed, 277 insertions(+), 1 deletion(-) create mode 100644 drivers/crypto/ccp/psp-dev.c create mode 100644 drivers/crypto/ccp/psp-dev.h diff --git a/drivers/crypto/ccp/Kconfig b/drivers/crypto/ccp/Kconfig index 9c84f9838931..b9dfae47aefd 100644 --- a/drivers/crypto/ccp/Kconfig +++ b/drivers/crypto/ccp/Kconfig @@ -33,3 +33,14 @@ config CRYPTO_DEV_CCP_CRYPTO Support for using the cryptographic API with the AMD Cryptographic Coprocessor. This module supports offload of SHA and AES algorithms. If you choose 'M' here, this module will be called ccp_crypto. + +config CRYPTO_DEV_SP_PSP + bool "Platform Security Processor (PSP) device" + default y + depends on CRYPTO_DEV_CCP_DD && X86_64 + help +Provide support for the AMD Platform Security Processor (PSP). 
+The PSP is a dedicated processor that provides support for key +management commands in Secure Encrypted Virtualization (SEV) mode, +along with software-based Trusted Execution Environment (TEE) to +enable third-party trusted applications. diff --git a/drivers/crypto/ccp/Makefile b/drivers/crypto/ccp/Makefile index c4ce726b931e..51d1c0cf66c7 100644 --- a/drivers/crypto/ccp/Makefile +++ b/drivers/crypto/ccp/Makefile @@ -8,6 +8,7 @@ ccp-$(CONFIG_CRYPTO_DEV_SP_CCP) += ccp-dev.o \ ccp-dmaengine.o \ ccp-debugfs.o ccp-$(CONFIG_PCI) += sp-pci.o +ccp-$(CONFIG_CRYPTO_DEV_SP_PSP) += psp-dev.o obj-$(CONFIG_CRYPTO_DEV_CCP_CRYPTO) += ccp-crypto.o ccp-crypto-objs := ccp-crypto-main.o \ diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c new file mode 100644 index ..b5789f878560 --- /dev/null +++ b/drivers/crypto/ccp/psp-dev.c @@ -0,0 +1,105 @@ +/* + * AMD Platform Security Processor (PSP) interface + * + * Copyright (C) 2016-2017 Advanced Micro Devices, Inc. + * + * Author: Brijesh Singh + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "sp-dev.h" +#include "psp-dev.h" + +static struct psp_device *psp_alloc_struct(struct sp_device *sp) +{ + struct device *dev = sp->dev; + struct psp_device *psp; + + psp = devm_kzalloc(dev, sizeof(*psp), GFP_KERNEL); + if (!psp) + return NULL; + + psp->dev = dev; + psp->sp = sp; + + snprintf(psp->name, sizeof(psp->name), "psp-%u", sp->ord); + + return psp; +} + +static irqreturn_t psp_irq_handler(int irq, void *data) +{ + return IRQ_HANDLED; +} + +int psp_dev_init(struct sp_device *sp) +{ + struct device *dev = sp->dev; + struct psp_device *psp; + int ret; + + ret = -ENOMEM; + psp = psp_alloc_struct(sp); + if (!psp) + goto e_err; + + sp->psp_data = psp; + + psp->vdata = (struct psp_vdata *)sp->dev_vdata->psp_vdata; + if (!psp->vdata) { + ret = -ENODEV; + dev_err(dev, "missing driver data\n"); + goto e_err; + } + + psp->io_regs = sp->io_map + psp->vdata->offset; + + /* Disable and clear interrupts until ready */ + iowrite32(0, psp->io_regs + PSP_P2CMSG_INTEN); + iowrite32(-1, psp->io_regs + PSP_P2CMSG_INTSTS); + + /* Request an irq */ + ret = sp_request_psp_irq(psp->sp, psp_irq_handler, psp->name, psp); + if (ret) { + dev_err(dev, "psp: unable to allocate an IRQ\n"); + goto e_err; + } + + if (sp->set_psp_master_device) + sp->set_psp_master_device(sp); + +
[Part2 PATCH v9 13/38] crypto: ccp: Add Secure Encrypted Virtualization (SEV) command support
AMD's new Secure Encrypted Virtualization (SEV) feature allows the memory contents of virtual machines to be transparently encrypted with a key unique to the VM. The programming and management of the encryption keys are handled by the AMD Secure Processor (AMD-SP) which exposes the commands for these tasks. The complete spec is available at: http://support.amd.com/TechDocs/55766_SEV-KM%20API_Specification.pdf Extend the AMD-SP driver to provide the following support: - an in-kernel API to communicate with the SEV firmware. The API can be used by the hypervisor to create encryption context for a SEV guest. - a userspace IOCTL to manage the platform certificates. Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Borislav Petkov Cc: Herbert Xu Cc: Gary Hook Cc: Tom Lendacky Cc: linux-crypto@vger.kernel.org Cc: k...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Improvements-by: Borislav Petkov Signed-off-by: Brijesh Singh --- drivers/crypto/ccp/psp-dev.c | 344 +++ drivers/crypto/ccp/psp-dev.h | 24 +++ drivers/crypto/ccp/sp-dev.c | 9 ++ drivers/crypto/ccp/sp-dev.h | 4 + include/linux/psp-sev.h | 137 + 5 files changed, 518 insertions(+) diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c index b5789f878560..9915a6c604a3 100644 --- a/drivers/crypto/ccp/psp-dev.c +++ b/drivers/crypto/ccp/psp-dev.c @@ -26,6 +26,12 @@ #include "sp-dev.h" #include "psp-dev.h" +#define DEVICE_NAME"sev" + +static DEFINE_MUTEX(sev_cmd_mutex); +static struct sev_misc_dev *misc_dev; +static struct psp_device *psp_master; + static struct psp_device *psp_alloc_struct(struct sp_device *sp) { struct device *dev = sp->dev; @@ -45,9 +51,285 @@ static struct psp_device *psp_alloc_struct(struct sp_device *sp) static irqreturn_t psp_irq_handler(int irq, void *data) { + struct psp_device *psp = data; + unsigned int status; + int reg; + + /* Read the interrupt status: */ + status = ioread32(psp->io_regs + PSP_P2CMSG_INTSTS); + + /* Check if it is command completion: */ + if (!(status & 
BIT(PSP_CMD_COMPLETE_REG))) + goto done; + + /* Check if it is SEV command completion: */ + reg = ioread32(psp->io_regs + PSP_CMDRESP); + if (reg & PSP_CMDRESP_RESP) { + psp->sev_int_rcvd = 1; + wake_up(&psp->sev_int_queue); + } + +done: + /* Clear the interrupt status by writing the same value we read. */ + iowrite32(status, psp->io_regs + PSP_P2CMSG_INTSTS); + return IRQ_HANDLED; } +static void sev_wait_cmd_ioc(struct psp_device *psp, unsigned int *reg) +{ + psp->sev_int_rcvd = 0; + + wait_event(psp->sev_int_queue, psp->sev_int_rcvd); + *reg = ioread32(psp->io_regs + PSP_CMDRESP); +} + +static int sev_cmd_buffer_len(int cmd) +{ + switch (cmd) { + case SEV_CMD_INIT: return sizeof(struct sev_data_init); + case SEV_CMD_PLATFORM_STATUS: return sizeof(struct sev_user_data_status); + case SEV_CMD_PEK_CSR: return sizeof(struct sev_data_pek_csr); + case SEV_CMD_PEK_CERT_IMPORT: return sizeof(struct sev_data_pek_cert_import); + case SEV_CMD_PDH_CERT_EXPORT: return sizeof(struct sev_data_pdh_cert_export); + case SEV_CMD_LAUNCH_START: return sizeof(struct sev_data_launch_start); + case SEV_CMD_LAUNCH_UPDATE_DATA:return sizeof(struct sev_data_launch_update_data); + case SEV_CMD_LAUNCH_UPDATE_VMSA:return sizeof(struct sev_data_launch_update_vmsa); + case SEV_CMD_LAUNCH_FINISH: return sizeof(struct sev_data_launch_finish); + case SEV_CMD_LAUNCH_MEASURE:return sizeof(struct sev_data_launch_measure); + case SEV_CMD_ACTIVATE: return sizeof(struct sev_data_activate); + case SEV_CMD_DEACTIVATE:return sizeof(struct sev_data_deactivate); + case SEV_CMD_DECOMMISSION: return sizeof(struct sev_data_decommission); + case SEV_CMD_GUEST_STATUS: return sizeof(struct sev_data_guest_status); + case SEV_CMD_DBG_DECRYPT: return sizeof(struct sev_data_dbg); + case SEV_CMD_DBG_ENCRYPT: return sizeof(struct sev_data_dbg); + case SEV_CMD_SEND_START:return sizeof(struct sev_data_send_start); + case SEV_CMD_SEND_UPDATE_DATA: return sizeof(struct sev_data_send_update_data); + case 
SEV_CMD_SEND_UPDATE_VMSA: return sizeof(struct sev_data_send_update_vmsa); + case SEV_CMD_SEND_FINISH: return sizeof(struct sev_data_send_finish); + case SEV_CMD_RECEIVE_START: return sizeof(struct sev_data_receive_start); + case SEV_CMD_RECEIVE_FINISH:return sizeof(struct sev_data_receive_finish); + case SEV_CMD_R
[Part2 PATCH v9 10/38] crypto: ccp: Define SEV userspace ioctl and command id
Add an include file which defines the ioctl and command id used for issuing SEV platform management specific commands. Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Borislav Petkov Cc: Herbert Xu Cc: Gary Hook Cc: Tom Lendacky Cc: linux-crypto@vger.kernel.org Cc: k...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Improvements-by: Borislav Petkov Signed-off-by: Brijesh Singh Reviewed-by: Borislav Petkov Acked-by: Gary R Hook --- include/uapi/linux/psp-sev.h | 142 +++ 1 file changed, 142 insertions(+) create mode 100644 include/uapi/linux/psp-sev.h diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h new file mode 100644 index ..3d77fe91239a --- /dev/null +++ b/include/uapi/linux/psp-sev.h @@ -0,0 +1,142 @@ +/* + * Userspace interface for AMD Secure Encrypted Virtualization (SEV) + * platform management commands. + * + * Copyright (C) 2016-2017 Advanced Micro Devices, Inc. + * + * Author: Brijesh Singh + * + * SEV spec 0.14 is available at: + * http://support.amd.com/TechDocs/55766_SEV-KM%20API_Specification.pdf + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ + +#ifndef __PSP_SEV_USER_H__ +#define __PSP_SEV_USER_H__ + +#include + +/** + * SEV platform commands + */ +enum { + SEV_FACTORY_RESET = 0, + SEV_PLATFORM_STATUS, + SEV_PEK_GEN, + SEV_PEK_CSR, + SEV_PDH_GEN, + SEV_PDH_CERT_EXPORT, + SEV_PEK_CERT_IMPORT, + + SEV_MAX, +}; + +/** + * SEV Firmware status code + */ +typedef enum { + SEV_RET_SUCCESS = 0, + SEV_RET_INVALID_PLATFORM_STATE, + SEV_RET_INVALID_GUEST_STATE, + SEV_RET_INAVLID_CONFIG, + SEV_RET_INVALID_len, + SEV_RET_ALREADY_OWNED, + SEV_RET_INVALID_CERTIFICATE, + SEV_RET_POLICY_FAILURE, + SEV_RET_INACTIVE, + SEV_RET_INVALID_ADDRESS, + SEV_RET_BAD_SIGNATURE, + SEV_RET_BAD_MEASUREMENT, + SEV_RET_ASID_OWNED, + SEV_RET_INVALID_ASID, + SEV_RET_WBINVD_REQUIRED, + SEV_RET_DFFLUSH_REQUIRED, + SEV_RET_INVALID_GUEST, + SEV_RET_INVALID_COMMAND, + SEV_RET_ACTIVE, + SEV_RET_HWSEV_RET_PLATFORM, + SEV_RET_HWSEV_RET_UNSAFE, + SEV_RET_UNSUPPORTED, + SEV_RET_MAX, +} sev_ret_code; + +/** + * struct sev_user_data_status - PLATFORM_STATUS command parameters + * + * @major: major API version + * @minor: minor API version + * @state: platform state + * @flags: platform config flags + * @build: firmware build id for API version + * @guest_count: number of active guests + */ +struct sev_user_data_status { + __u8 api_major; /* Out */ + __u8 api_minor; /* Out */ + __u8 state; /* Out */ + __u32 flags;/* Out */ + __u8 build; /* Out */ + __u32 guest_count; /* Out */ +} __packed; + +/** + * struct sev_user_data_pek_csr - PEK_CSR command parameters + * + * @address: PEK certificate chain + * @length: length of certificate + */ +struct sev_user_data_pek_csr { + __u64 address; /* In */ + __u32 length; /* In/Out */ +} __packed; + +/** + * struct sev_user_data_cert_import - PEK_CERT_IMPORT command parameters + * + * @pek_address: PEK certificate chain + * @pek_len: length of PEK certificate + * @oca_address: OCA certificate chain + * @oca_len: length of OCA certificate + */ +struct sev_user_data_pek_cert_import { + __u64 pek_cert_address; 
/* In */ + __u32 pek_cert_len; /* In */ + __u64 oca_cert_address; /* In */ + __u32 oca_cert_len; /* In */ +} __packed; + +/** + * struct sev_user_data_pdh_cert_export - PDH_CERT_EXPORT command parameters + * + * @pdh_address: PDH certificate address + * @pdh_len: length of PDH certificate + * @cert_chain_address: PDH certificate chain + * @cert_chain_len: length of PDH certificate chain + */ +struct sev_user_data_pdh_cert_export { + __u64 pdh_cert_address; /* In */ + __u32 pdh_cert_len; /* In/Out */ + __u64 cert_chain_address; /* In */ + __u32 cert_chain_len; /* In/Out */ +} __packed; + +/** + * struct sev_issue_cmd - SEV ioctl parameters + * + * @cmd: SEV commands to execute + * @opaque: pointer to the command structure + * @error: SEV FW return code on failure + */ +struct sev_issue_cmd { + __u32 cmd; /* In */ + __u64 data; /* In */ + __u32 error;/* Out */ +} __packed; + +#define SEV_IOC_TYPE 'S' +#define SEV_ISSUE_CMD _IOWR(SEV_IOC_TYPE,
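Every structure in this header is declared __packed, which fixes the byte layout seen by userspace regardless of the compiler's natural alignment. The following C11 sketch checks the resulting sizes and offsets for two of the structures. The `example_` type names are local copies made for this illustration, and the GCC/Clang `packed` attribute is assumed to be what `__packed` expands to; nothing here depends on the SEV_ISSUE_CMD definition, which is cut off in this archive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of struct sev_issue_cmd with fixed-width userspace types. */
struct example_issue_cmd {
	uint32_t cmd;
	uint64_t data;
	uint32_t error;
} __attribute__((packed));

/* Local copy of struct sev_user_data_status. */
struct example_status {
	uint8_t  api_major;
	uint8_t  api_minor;
	uint8_t  state;
	uint32_t flags;
	uint8_t  build;
	uint32_t guest_count;
} __attribute__((packed));

/* 4 + 8 + 4 bytes, with no padding before the 64-bit member. */
static_assert(sizeof(struct example_issue_cmd) == 16,
	      "issue_cmd must be 16 bytes");
static_assert(offsetof(struct example_issue_cmd, data) == 4,
	      "data follows cmd directly");

/* 1 + 1 + 1 + 4 + 1 + 4 bytes: flags sits at an odd offset. */
static_assert(sizeof(struct example_status) == 12,
	      "status must be 12 bytes");
static_assert(offsetof(struct example_status, flags) == 3,
	      "flags follows state directly");
static_assert(offsetof(struct example_status, guest_count) == 8,
	      "guest_count follows build directly");
```

Without the attribute, a typical 64-bit ABI would pad `data` to an 8-byte boundary, so the packed declaration is what keeps the structure identical for every caller of the ioctl.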
[Part2 PATCH v9 17/38] crypto: ccp: Implement SEV_PDH_GEN ioctl command
The SEV_PDH_GEN command is used to re-generate the Platform Diffie-Hellman (PDH) key. The command is defined in SEV spec section 5.6. Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Borislav Petkov Cc: Herbert Xu Cc: Gary Hook Cc: Tom Lendacky Cc: linux-crypto@vger.kernel.org Cc: k...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Brijesh Singh Reviewed-by: Borislav Petkov Acked-by: Gary R Hook --- drivers/crypto/ccp/psp-dev.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c index 8aa8036023e0..fd3daf0a1176 100644 --- a/drivers/crypto/ccp/psp-dev.c +++ b/drivers/crypto/ccp/psp-dev.c @@ -333,6 +333,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) case SEV_PEK_GEN: ret = sev_ioctl_do_pek_pdh_gen(SEV_CMD_PEK_GEN, &input); break; + case SEV_PDH_GEN: + ret = sev_ioctl_do_pek_pdh_gen(SEV_CMD_PDH_GEN, &input); + break; default: ret = -EINVAL; goto out; -- 2.9.5
[Part2 PATCH v9 19/38] crypto: ccp: Implement SEV_PEK_CERT_IMPORT ioctl command
The SEV_PEK_CERT_IMPORT command can be used to import the signed PEK certificate. The command is defined in SEV spec section 5.8. Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Borislav Petkov Cc: Herbert Xu Cc: Gary Hook Cc: Tom Lendacky Cc: linux-crypto@vger.kernel.org Cc: k...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Improvements-by: Borislav Petkov Signed-off-by: Brijesh Singh Acked-by: Gary R Hook Reviewed-by: Borislav Petkov --- drivers/crypto/ccp/psp-dev.c | 81 include/linux/psp-sev.h | 4 +++ 2 files changed, 85 insertions(+) diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c index c3906bbdb69b..9d1c4600db19 100644 --- a/drivers/crypto/ccp/psp-dev.c +++ b/drivers/crypto/ccp/psp-dev.c @@ -365,6 +365,84 @@ static int sev_ioctl_do_pek_csr(struct sev_issue_cmd *argp) return ret; } +void *psp_copy_user_blob(u64 __user uaddr, u32 len) +{ + void *data; + + if (!uaddr || !len) + return ERR_PTR(-EINVAL); + + /* verify that blob length does not exceed our limit */ + if (len > SEV_FW_BLOB_MAX_SIZE) + return ERR_PTR(-EINVAL); + + data = kmalloc(len, GFP_KERNEL); + if (!data) + return ERR_PTR(-ENOMEM); + + if (copy_from_user(data, (void __user *)(uintptr_t)uaddr, len)) + goto e_free; + + return data; + +e_free: + kfree(data); + return ERR_PTR(-EFAULT); +} +EXPORT_SYMBOL_GPL(psp_copy_user_blob); + +static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp) +{ + struct sev_user_data_pek_cert_import input; + struct sev_data_pek_cert_import *data; + void *pek_blob, *oca_blob; + int ret; + + if (copy_from_user(&input, (void __user *)argp->data, sizeof(input))) + return -EFAULT; + + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return -ENOMEM; + + /* copy PEK certificate blobs from userspace */ + pek_blob = psp_copy_user_blob(input.pek_cert_address, input.pek_cert_len); + if (IS_ERR(pek_blob)) { + ret = PTR_ERR(pek_blob); + goto e_free; + } + + data->pek_cert_address = __psp_pa(pek_blob); + data->pek_cert_len = input.pek_cert_len; + + /* 
copy PEK certificate blobs from userspace */ + oca_blob = psp_copy_user_blob(input.oca_cert_address, input.oca_cert_len); + if (IS_ERR(oca_blob)) { + ret = PTR_ERR(oca_blob); + goto e_free_pek; + } + + data->oca_cert_address = __psp_pa(oca_blob); + data->oca_cert_len = input.oca_cert_len; + + /* If platform is not in INIT state then transition it to INIT */ + if (psp_master->sev_state != SEV_STATE_INIT) { + ret = __sev_platform_init_locked(&argp->error); + if (ret) + goto e_free_oca; + } + + ret = __sev_do_cmd_locked(SEV_CMD_PEK_CERT_IMPORT, data, &argp->error); + +e_free_oca: + kfree(oca_blob); +e_free_pek: + kfree(pek_blob); +e_free: + kfree(data); + return ret; +} + static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) { void __user *argp = (void __user *)arg; @@ -402,6 +480,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) case SEV_PEK_CSR: ret = sev_ioctl_do_pek_csr(&input); break; + case SEV_PEK_CERT_IMPORT: + ret = sev_ioctl_do_pek_import(&input); + break; default: ret = -EINVAL; goto out; diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h index 0b6dd306d88b..93addfa34061 100644 --- a/include/linux/psp-sev.h +++ b/include/linux/psp-sev.h @@ -576,6 +576,8 @@ int sev_guest_df_flush(int *error); */ int sev_guest_decommission(struct sev_data_decommission *data, int *error); +void *psp_copy_user_blob(u64 __user uaddr, u32 len); + #else /* !CONFIG_CRYPTO_DEV_SP_PSP */ static inline int @@ -597,6 +599,8 @@ static inline int sev_guest_df_flush(int *error) { return -ENODEV; } static inline int sev_issue_cmd_external_user(struct file *filep, unsigned int id, void *data, int *error) { return -ENODEV; } +static inline void *psp_copy_user_blob(u64 __user uaddr, u32 len) { return ERR_PTR(-EINVAL); } + #endif /* CONFIG_CRYPTO_DEV_SP_PSP */ #endif /* __PSP_SEV_H__ */ -- 2.9.5
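The validate, allocate, copy sequence of psp_copy_user_blob() and the reverse-order unwinding in sev_ioctl_do_pek_import() can be tried out in userspace with a rough analogue. In this sketch `example_copy_blob()`, `example_stage_two_blobs()` and `EXAMPLE_BLOB_MAX_SIZE` are hypothetical names; the size limit is a placeholder, since the value of SEV_FW_BLOB_MAX_SIZE is not shown in this excerpt, and malloc()/memcpy() stand in for kmalloc()/copy_from_user().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder limit for this sketch only. */
#define EXAMPLE_BLOB_MAX_SIZE 4096u

/*
 * Userspace analogue of psp_copy_user_blob(): validate the arguments,
 * allocate a buffer, and copy the caller's data into it. On failure it
 * returns NULL and stores a negative errno value in *err (the kernel
 * version encodes the error in the pointer with ERR_PTR() instead).
 */
void *example_copy_blob(const void *src, uint32_t len, int *err)
{
	void *data;

	assert(err != NULL);
	*err = 0;

	if (!src || !len || len > EXAMPLE_BLOB_MAX_SIZE) {
		*err = -EINVAL;
		return NULL;
	}

	data = malloc(len);
	if (!data) {
		*err = -ENOMEM;
		return NULL;
	}

	memcpy(data, src, len);
	return data;
}

/*
 * Import-style helper showing the unwind order: buffers are released in
 * the reverse order of acquisition. Returns the total number of bytes
 * staged, or a negative errno value.
 */
int example_stage_two_blobs(const void *pek, uint32_t pek_len,
			    const void *oca, uint32_t oca_len)
{
	void *pek_blob, *oca_blob;
	int ret;

	pek_blob = example_copy_blob(pek, pek_len, &ret);
	if (!pek_blob)
		goto out;

	oca_blob = example_copy_blob(oca, oca_len, &ret);
	if (!oca_blob)
		goto free_pek;

	/* Stand-in for issuing the command with both buffers staged. */
	ret = (int)(pek_len + oca_len);

	free(oca_blob);
free_pek:
	free(pek_blob);
out:
	return ret;
}
```

Because each label frees exactly one resource and falls through to the next, a failure at any step releases only what has been acquired so far, which is the property the e_free_oca / e_free_pek / e_free labels provide in the patch.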
[Part2 PATCH v9 18/38] crypto: ccp: Implement SEV_PEK_CSR ioctl command
The SEV_PEK_CSR command can be used to generate a PEK certificate signing request. The command is defined in SEV spec section 5.7. Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Borislav Petkov Cc: Herbert Xu Cc: Gary Hook Cc: Tom Lendacky Cc: linux-crypto@vger.kernel.org Cc: k...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Improvements-by: Borislav Petkov Signed-off-by: Brijesh Singh Acked-by: Gary R Hook --- drivers/crypto/ccp/psp-dev.c | 66 1 file changed, 66 insertions(+) diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c index fd3daf0a1176..c3906bbdb69b 100644 --- a/drivers/crypto/ccp/psp-dev.c +++ b/drivers/crypto/ccp/psp-dev.c @@ -302,6 +302,69 @@ static int sev_ioctl_do_pek_pdh_gen(int cmd, struct sev_issue_cmd *argp) return __sev_do_cmd_locked(cmd, 0, &argp->error); } +static int sev_ioctl_do_pek_csr(struct sev_issue_cmd *argp) +{ + struct sev_user_data_pek_csr input; + struct sev_data_pek_csr *data; + void *blob = NULL; + int ret; + + if (copy_from_user(&input, (void __user *)argp->data, sizeof(input))) + return -EFAULT; + + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return -ENOMEM; + + /* userspace wants to query CSR length */ + if (!input.address || !input.length) + goto cmd; + + /* allocate a physically contiguous buffer to store the CSR blob */ + if (!access_ok(VERIFY_WRITE, input.address, input.length) || + input.length > SEV_FW_BLOB_MAX_SIZE) { + ret = -EFAULT; + goto e_free; + } + + blob = kmalloc(input.length, GFP_KERNEL); + if (!blob) { + ret = -ENOMEM; + goto e_free; + } + + data->address = __psp_pa(blob); + data->len = input.length; + +cmd: + if (psp_master->sev_state == SEV_STATE_UNINIT) { + ret = __sev_platform_init_locked(&argp->error); + if (ret) + goto e_free_blob; + } + + ret = __sev_do_cmd_locked(SEV_CMD_PEK_CSR, data, &argp->error); + +/* If we query the CSR length, FW responded with expected data. 
*/ + input.length = data->len; + + if (copy_to_user((void __user *)argp->data, &input, sizeof(input))) { + ret = -EFAULT; + goto e_free_blob; + } + + if (blob) { + if (copy_to_user((void __user *)input.address, blob, input.length)) + ret = -EFAULT; + } + +e_free_blob: + kfree(blob); +e_free: + kfree(data); + return ret; +} + static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) { void __user *argp = (void __user *)arg; @@ -336,6 +399,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) case SEV_PDH_GEN: ret = sev_ioctl_do_pek_pdh_gen(SEV_CMD_PDH_GEN, &input); break; + case SEV_PEK_CSR: + ret = sev_ioctl_do_pek_csr(&input); + break; default: ret = -EINVAL; goto out; -- 2.9.5
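The handler above supports a two-call convention: a request with a zero address or length only reports the required CSR length, and a second request with a large enough buffer returns the data. The sketch below models that convention with a mock in place of the ioctl. `example_pek_csr()`, `example_fetch_csr()`, the request structure, the payload and the error code returned by the mock are all assumptions for illustration; the real command is issued through the SEV device and the real error reporting may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct example_csr_req {
	void    *address;	/* in: destination buffer, or NULL to query */
	uint32_t length;	/* in: buffer size; out: bytes required/written */
};

/* Made-up payload standing in for a firmware-generated CSR. */
static const char example_csr[] = "example-csr-payload";

/*
 * Mock of the command: when the buffer is absent or too small, report
 * the required length and fail; otherwise copy the payload out.
 */
int example_pek_csr(struct example_csr_req *req)
{
	uint32_t need = (uint32_t)sizeof(example_csr);

	assert(req != NULL);

	if (!req->address || req->length < need) {
		req->length = need;
		return -ENOSPC;	/* arbitrary error code chosen for the mock */
	}

	memcpy(req->address, example_csr, need);
	req->length = need;
	return 0;
}

/*
 * Two-call pattern: query the length, allocate, then fetch. Returns a
 * malloc()ed buffer (caller frees) and stores its size in *len_out, or
 * returns NULL on failure.
 */
void *example_fetch_csr(uint32_t *len_out)
{
	struct example_csr_req req = { NULL, 0 };
	void *buf;

	assert(len_out != NULL);

	(void)example_pek_csr(&req);	/* first call only learns the length */
	if (!req.length)
		return NULL;

	buf = malloc(req.length);
	if (!buf)
		return NULL;

	req.address = buf;
	if (example_pek_csr(&req)) {
		free(buf);
		return NULL;
	}

	*len_out = req.length;
	return buf;
}
```

The same shape applies to any command in this series whose length field is marked In/Out, since the updated length is copied back to the caller even when no data buffer was supplied.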
[Part2 PATCH v9 09/38] crypto: ccp: Build the AMD secure processor driver only with AMD CPU support
From: Borislav Petkov This is AMD-specific hardware so present it in Kconfig only when AMD CPU support is enabled or on ARM64 where it is also used. Signed-off-by: Borislav Petkov Signed-off-by: Brijesh Singh Reviewed-by: Gary R Hook Cc: Brijesh Singh Cc: Tom Lendacky Cc: Gary Hook Cc: Herbert Xu Cc: "David S. Miller" Cc: linux-crypto@vger.kernel.org --- drivers/crypto/ccp/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/crypto/ccp/Kconfig b/drivers/crypto/ccp/Kconfig index 6d626606b9c5..9c84f9838931 100644 --- a/drivers/crypto/ccp/Kconfig +++ b/drivers/crypto/ccp/Kconfig @@ -1,5 +1,6 @@ config CRYPTO_DEV_CCP_DD tristate "Secure Processor device driver" + depends on CPU_SUP_AMD || ARM64 default m help Provides AMD Secure Processor device driver. -- 2.9.5
[Part2 PATCH v9 20/38] crypto: ccp: Implement SEV_PDH_CERT_EXPORT ioctl command
The SEV_PDH_CERT_EXPORT command can be used to export the PDH and its certificate chain. The command is defined in SEV spec section 5.10. Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Borislav Petkov Cc: Herbert Xu Cc: Gary Hook Cc: Tom Lendacky Cc: linux-crypto@vger.kernel.org Cc: k...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Improvements-by: Borislav Petkov Signed-off-by: Brijesh Singh Acked-by: Gary R Hook --- drivers/crypto/ccp/psp-dev.c | 97 1 file changed, 97 insertions(+) diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c index 9d1c4600db19..fcfa5b1eae61 100644 --- a/drivers/crypto/ccp/psp-dev.c +++ b/drivers/crypto/ccp/psp-dev.c @@ -443,6 +443,100 @@ static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp) return ret; } +static int sev_ioctl_do_pdh_export(struct sev_issue_cmd *argp) +{ + struct sev_user_data_pdh_cert_export input; + void *pdh_blob = NULL, *cert_blob = NULL; + struct sev_data_pdh_cert_export *data; + int ret; + + if (copy_from_user(&input, (void __user *)argp->data, sizeof(input))) + return -EFAULT; + + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return -ENOMEM; + + /* Userspace wants to query the certificate length. */ + if (!input.pdh_cert_address || + !input.pdh_cert_len || + !input.cert_chain_address) + goto cmd; + + /* Allocate a physically contiguous buffer to store the PDH blob. */ + if ((input.pdh_cert_len > SEV_FW_BLOB_MAX_SIZE) || + !access_ok(VERIFY_WRITE, input.pdh_cert_address, input.pdh_cert_len)) { + ret = -EFAULT; + goto e_free; + } + + /* Allocate a physically contiguous buffer to store the cert chain blob. 
*/ + if ((input.cert_chain_len > SEV_FW_BLOB_MAX_SIZE) || + !access_ok(VERIFY_WRITE, input.cert_chain_address, input.cert_chain_len)) { + ret = -EFAULT; + goto e_free; + } + + pdh_blob = kmalloc(input.pdh_cert_len, GFP_KERNEL); + if (!pdh_blob) { + ret = -ENOMEM; + goto e_free; + } + + data->pdh_cert_address = __psp_pa(pdh_blob); + data->pdh_cert_len = input.pdh_cert_len; + + cert_blob = kmalloc(input.cert_chain_len, GFP_KERNEL); + if (!cert_blob) { + ret = -ENOMEM; + goto e_free_pdh; + } + + data->cert_chain_address = __psp_pa(cert_blob); + data->cert_chain_len = input.cert_chain_len; + +cmd: + /* If platform is not in INIT state then transition it to INIT. */ + if (psp_master->sev_state != SEV_STATE_INIT) { + ret = __sev_platform_init_locked(&argp->error); + if (ret) + goto e_free_cert; + } + + ret = __sev_do_cmd_locked(SEV_CMD_PDH_CERT_EXPORT, data, &argp->error); + + /* If we query the length, FW responded with expected data. */ + input.cert_chain_len = data->cert_chain_len; + input.pdh_cert_len = data->pdh_cert_len; + + if (copy_to_user((void __user *)argp->data, &input, sizeof(input))) { + ret = -EFAULT; + goto e_free_cert; + } + + if (pdh_blob) { + if (copy_to_user((void __user *)input.pdh_cert_address, +pdh_blob, input.pdh_cert_len)) { + ret = -EFAULT; + goto e_free_cert; + } + } + + if (cert_blob) { + if (copy_to_user((void __user *)input.cert_chain_address, +cert_blob, input.cert_chain_len)) + ret = -EFAULT; + } + +e_free_cert: + kfree(cert_blob); +e_free_pdh: + kfree(pdh_blob); +e_free: + kfree(data); + return ret; +} + static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) { void __user *argp = (void __user *)arg; @@ -483,6 +577,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) case SEV_PEK_CERT_IMPORT: ret = sev_ioctl_do_pek_import(&input); break; + case SEV_PDH_CERT_EXPORT: + ret = sev_ioctl_do_pdh_export(&input); + break; default: ret = -EINVAL; goto out; -- 2.9.5
[Part2 PATCH v9 16/38] crypto: ccp: Implement SEV_PEK_GEN ioctl command
The SEV_PEK_GEN command is used to generate a new Platform Endorsement Key
(PEK). The command is defined in SEV spec section 5.6.

Cc: Paolo Bonzini
Cc: "Radim Krčmář"
Cc: Borislav Petkov
Cc: Herbert Xu
Cc: Gary Hook
Cc: Tom Lendacky
Cc: linux-crypto@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Reviewed-by: Borislav Petkov
Improvements-by: Borislav Petkov
Signed-off-by: Brijesh Singh
Acked-by: Gary R Hook
---
 drivers/crypto/ccp/psp-dev.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c
index a5072b166ab8..8aa8036023e0 100644
--- a/drivers/crypto/ccp/psp-dev.c
+++ b/drivers/crypto/ccp/psp-dev.c
@@ -289,6 +289,19 @@ static int sev_ioctl_do_platform_status(struct sev_issue_cmd *argp)
 	return ret;
 }
 
+static int sev_ioctl_do_pek_pdh_gen(int cmd, struct sev_issue_cmd *argp)
+{
+	int rc;
+
+	if (psp_master->sev_state == SEV_STATE_UNINIT) {
+		rc = __sev_platform_init_locked(&argp->error);
+		if (rc)
+			return rc;
+	}
+
+	return __sev_do_cmd_locked(cmd, 0, &argp->error);
+}
+
 static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 {
 	void __user *argp = (void __user *)arg;
@@ -317,6 +330,9 @@ static long sev_ioctl(struct file *file, unsigned int ioctl, unsigned long arg)
 	case SEV_PLATFORM_STATUS:
 		ret = sev_ioctl_do_platform_status(&input);
 		break;
+	case SEV_PEK_GEN:
+		ret = sev_ioctl_do_pek_pdh_gen(SEV_CMD_PEK_GEN, &input);
+		break;
 	default:
 		ret = -EINVAL;
 		goto out;
-- 
2.9.5
[Part2 PATCH v9 11/38] crypto: ccp: Define SEV key management command id
Define Secure Encrypted Virtualization (SEV) key management command id and structure. The command definition is available in SEV KM spec 0.14 (http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf) Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: Borislav Petkov Cc: Herbert Xu Cc: Gary Hook Cc: Tom Lendacky Cc: linux-crypto@vger.kernel.org Cc: k...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Improvements-by: Borislav Petkov Signed-off-by: Brijesh Singh Reviewed-by: Borislav Petkov Acked-by: Gary R Hook --- include/linux/psp-sev.h | 465 1 file changed, 465 insertions(+) create mode 100644 include/linux/psp-sev.h diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h new file mode 100644 index ..4a150d17d537 --- /dev/null +++ b/include/linux/psp-sev.h @@ -0,0 +1,465 @@ +/* + * AMD Secure Encrypted Virtualization (SEV) driver interface + * + * Copyright (C) 2016-2017 Advanced Micro Devices, Inc. + * + * Author: Brijesh Singh + * + * SEV spec 0.14 is available at: + * http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ + +#ifndef __PSP_SEV_H__ +#define __PSP_SEV_H__ + +#include + +#ifdef CONFIG_X86 +#include + +#define __psp_pa(x)__sme_pa(x) +#else +#define __psp_pa(x)__pa(x) +#endif + +#define SEV_FW_BLOB_MAX_SIZE 0x4000 /* 16KB */ + +/** + * SEV platform state + */ +enum sev_state { + SEV_STATE_UNINIT= 0x0, + SEV_STATE_INIT = 0x1, + SEV_STATE_WORKING = 0x2, + + SEV_STATE_MAX +}; + +/** + * SEV platform and guest management commands + */ +enum sev_cmd { + /* platform commands */ + SEV_CMD_INIT= 0x001, + SEV_CMD_SHUTDOWN= 0x002, + SEV_CMD_FACTORY_RESET = 0x003, + SEV_CMD_PLATFORM_STATUS = 0x004, + SEV_CMD_PEK_GEN = 0x005, + SEV_CMD_PEK_CSR = 0x006, + SEV_CMD_PEK_CERT_IMPORT = 0x007, + SEV_CMD_PDH_CERT_EXPORT = 0x008, + SEV_CMD_PDH_GEN = 0x009, + SEV_CMD_DF_FLUSH= 0x00A, + + /* Guest commands */ + SEV_CMD_DECOMMISSION= 0x020, + SEV_CMD_ACTIVATE= 0x021, + SEV_CMD_DEACTIVATE = 0x022, + SEV_CMD_GUEST_STATUS= 0x023, + + /* Guest launch commands */ + SEV_CMD_LAUNCH_START= 0x030, + SEV_CMD_LAUNCH_UPDATE_DATA = 0x031, + SEV_CMD_LAUNCH_UPDATE_VMSA = 0x032, + SEV_CMD_LAUNCH_MEASURE = 0x033, + SEV_CMD_LAUNCH_UPDATE_SECRET= 0x034, + SEV_CMD_LAUNCH_FINISH = 0x035, + + /* Guest migration commands (outgoing) */ + SEV_CMD_SEND_START = 0x040, + SEV_CMD_SEND_UPDATE_DATA= 0x041, + SEV_CMD_SEND_UPDATE_VMSA= 0x042, + SEV_CMD_SEND_FINISH = 0x043, + + /* Guest migration commands (incoming) */ + SEV_CMD_RECEIVE_START = 0x050, + SEV_CMD_RECEIVE_UPDATE_DATA = 0x051, + SEV_CMD_RECEIVE_UPDATE_VMSA = 0x052, + SEV_CMD_RECEIVE_FINISH = 0x053, + + /* Guest debug commands */ + SEV_CMD_DBG_DECRYPT = 0x060, + SEV_CMD_DBG_ENCRYPT = 0x061, + + SEV_CMD_MAX, +}; + +/** + * struct sev_data_init - INIT command parameters + * + * @flags: processing flags + * @tmr_address: system physical address used for SEV-ES + * @tmr_len: len of tmr_address + */ +struct sev_data_init { + u32 flags; /* In */ + u32 reserved; /* In */ + u64 tmr_address;/* In */ + u32 tmr_len;/* In */ +} __packed; + +/** + * struct sev_data_pek_csr - 
PEK_CSR command parameters + * + * @address: PEK certificate chain + * @len: len of certificate + */ +struct sev_data_pek_csr { + u64 address;/* In */ + u32 len;/* In/Out */ +} __packed; + +/** + * struct sev_data_cert_import - PEK_CERT_IMPORT command parameters + * + * @pek_address: PEK certificate chain + * @pek_len: len of PEK certificate + * @oca_address: OCA certificate chain + * @oca_len: len of OCA certificate + */ +struct sev_data_pek_cert_import { + u64 pek_cert_address; /* In */ + u32 pek_cert_len; /* In */ + u32 reserved; /* In */ + u64 oca_cert_address; /* In */ + u32 oca_cert_len; /* In */ +} __packed; + +/** + * struct sev_data_pdh_cert_export - PDH_CERT_EXPORT command parameters + * + * @pdh_address: PD
Re: [PATCH v3 1/3] dt-bindings: hwrng: Add Samsung Exynos 5250+ True RNG bindings
On Mon, Dec 04, 2017 at 01:53:49PM +0100, Łukasz Stelmach wrote:
> Add binding documentation for the True Random Number Generator
> found on Samsung Exynos 5250+ SoCs.
>
> Signed-off-by: Łukasz Stelmach
> ---
>  .../devicetree/bindings/rng/samsung,exynos5250-trng.txt | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt

I acked v1 (and so did Krzysztof). You added them in v2, but dropped here?
Re: [PATCH] treewide: remove duplicate includes
Hello,

On Mon, Dec 04, 2017 at 03:19:39AM +0530, Pravin Shedge wrote:
> diff --git a/drivers/thermal/of-thermal.c b/drivers/thermal/of-thermal.c
> index d04ec3b..e09f035 100644
> --- a/drivers/thermal/of-thermal.c
> +++ b/drivers/thermal/of-thermal.c
> @@ -30,7 +30,6 @@
>  #include
>  #include
>  #include
> -#include
>
>  #include "thermal_core.h"
>

No issues with this, but please send a separate patch to
linux...@vger.kernel.org and copy edube...@gmail.com.

Thanks,

-- 
All the best,
Eduardo Valentin
Re: [PATCH] treewide: remove duplicate includes
On Mon, Dec 04, 2017 at 03:19:39AM +0530, Pravin Shedge wrote:
> These duplicate includes have been found with scripts/checkincludes.pl but
> they have been removed manually to avoid removing false positives.
>
> Unit Testing:
>
> - build successful
> - LTP testsuite passes.
> - checkpatch.pl passes
>
> Signed-off-by: Pravin Shedge

> diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> index 9c42c4e..ab3aef2 100644
> --- a/fs/xfs/scrub/scrub.c
> +++ b/fs/xfs/scrub/scrub.c

These look reasonable, but please send me (and linux-xfs) the three xfs
changes separately so that I can add them to the xfs tree.

(Also, thank you for cc'ing the xfs list for this treewide change...)

--D
Re: [PATCH v3 2/3] hwrng: exynos - add Samsung Exynos True RNG driver
On Mon, Dec 4, 2017 at 1:53 PM, Łukasz Stelmach wrote:
> Add support for True Random Number Generator found in Samsung Exynos
> 5250+ SoCs.
>
> Signed-off-by: Łukasz Stelmach
> ---
>  MAINTAINERS                          |   7 +
>  drivers/char/hw_random/Kconfig       |  12 ++
>  drivers/char/hw_random/Makefile      |   1 +
>  drivers/char/hw_random/exynos-trng.c | 245 +++
>  4 files changed, 265 insertions(+)
>  create mode 100644 drivers/char/hw_random/exynos-trng.c

Reviewed-by: Krzysztof Kozlowski

Best regards,
Krzysztof
Re: [PATCH v3 1/3] dt-bindings: hwrng: Add Samsung Exynos 5250+ True RNG bindings
On Mon, Dec 4, 2017 at 1:53 PM, Łukasz Stelmach wrote:
> Add binding documentation for the True Random Number Generator
> found on Samsung Exynos 5250+ SoCs.
>
> Signed-off-by: Łukasz Stelmach
> ---
>  .../devicetree/bindings/rng/samsung,exynos5250-trng.txt | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
>
> diff --git a/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt b/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
> new file mode 100644
> index ..5a613a4ec780
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
> @@ -0,0 +1,17 @@
> +Exynos True Random Number Generator
> +
> +Required properties:
> +
> +- compatible  : Should be "samsung,exynos5250-trng".
> +- reg         : Specifies base physical address and size of the registers map.
> +- clocks      : Phandle to clock-controller plus clock-specifier pair.
> +- clock-names : "secss" as a clock name.
> +
> +Example:
> +
> +	rng@10830600 {
> +		compatible = "samsung,exynos5250-trng";
> +		reg = <0x10830600 0x100>;
> +		clocks = <&clock CLK_SSS>;
> +		clock-names = "secss";
> +	};
> --
> 2.11.0

Mine and Rob's tags disappeared and I think you did not introduce any
major changes here, right?

Best regards,
Krzysztof
[PATCH v3 1/3] dt-bindings: hwrng: Add Samsung Exynos 5250+ True RNG bindings
Add binding documentation for the True Random Number Generator
found on Samsung Exynos 5250+ SoCs.

Signed-off-by: Łukasz Stelmach
---
 .../devicetree/bindings/rng/samsung,exynos5250-trng.txt | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt

diff --git a/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt b/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
new file mode 100644
index ..5a613a4ec780
--- /dev/null
+++ b/Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
@@ -0,0 +1,17 @@
+Exynos True Random Number Generator
+
+Required properties:
+
+- compatible  : Should be "samsung,exynos5250-trng".
+- reg         : Specifies base physical address and size of the registers map.
+- clocks      : Phandle to clock-controller plus clock-specifier pair.
+- clock-names : "secss" as a clock name.
+
+Example:
+
+	rng@10830600 {
+		compatible = "samsung,exynos5250-trng";
+		reg = <0x10830600 0x100>;
+		clocks = <&clock CLK_SSS>;
+		clock-names = "secss";
+	};
-- 
2.11.0
[PATCH v3 0/3] True RNG driver for Samsung Exynos 5250+ SoCs
Hello.

The following patches add support for the true random number generator
found in Samsung Exynos 5250+ SoCs.

Patch #1 adds documentation for devicetree bindings.
Patch #2 introduces the driver and appropriate changes in Makefile and Kconfig.
Patch #3 adds nodes in devicetree files for Exynos SoCs (requires
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git/commit/?id=cdd745c8c76b02471d88f467c44a3d4eb431aa0b).

Changes in v3:

- Changed node-name in device-tree bindings according to
  Krzysztof Kozłowski's recommendation.
- Fixed name and added EXYNOS_ in EXYNOS_TRNG_CTRL_RNGEN
- Removed unnecessary label and simplified the abnormal exit path
  in exynos_trng_probe()
- Replaced __raw_{readl,writel}() with {readl,writel}_relaxed()
  (thanks PrasannaKumar Muralidharan)

Changes in v2:

- Fixed indentation in drivers/char/hw_random/Kconfig.
- Defined TRNG_CTRL_RGNEN.
- Removed global variable exynos_trng_dev.
- Removed exynos_trng_{set,get}_reg() functions.
- Used the min_t() macro instead of the ternary operator
  in exynos_trng_do_read().
- Moved trng initialisation to the variable declaration
  in exynos_trng_init().
- Fixed comment formatting.
- Removed unnecessary "TODO" comments.
- Return ENOMEM if devm_kzalloc() or devm_kstrdup() fails.
- Rephrased and unified error messages in exynos_trng_probe().
- Removed nullification of trng->mem.
- Added err_pm_get label at the end of exynos_trng_probe().
- Removed double error message at the end of exynos_trng_probe().
- Implemented exynos_trng_remove().

v2 available here:

https://www.spinics.net/lists/linux-samsung-soc/msg61280.html
https://patchwork.kernel.org/patch/10076225/
https://patchwork.kernel.org/patch/10076227/
https://patchwork.kernel.org/patch/10076237/

v1 can be found:

https://www.spinics.net/lists/linux-samsung-soc/msg61253.html
https://patchwork.kernel.org/patch/10072967/
https://patchwork.kernel.org/patch/10072971/
https://patchwork.kernel.org/patch/10072963/

Łukasz Stelmach (3):
  dt-bindings: hwrng: Add Samsung Exynos 5250+ True RNG bindings
  hwrng: exynos - add Samsung Exynos True RNG driver
  ARM: dts: exynos: Add nodes for True Random Number Generator

 .../bindings/rng/samsung,exynos5250-trng.txt |  17 ++
 MAINTAINERS                                  |   7 +
 arch/arm/boot/dts/exynos5.dtsi               |   5 +
 arch/arm/boot/dts/exynos5250.dtsi            |   5 +
 arch/arm/boot/dts/exynos5410.dtsi            |   5 +
 arch/arm/boot/dts/exynos5420.dtsi            |   5 +
 drivers/char/hw_random/Kconfig               |  12 +
 drivers/char/hw_random/Makefile              |   1 +
 drivers/char/hw_random/exynos-trng.c         | 245 +
 9 files changed, 302 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt
 create mode 100644 drivers/char/hw_random/exynos-trng.c

-- 
2.11.0
[PATCH v3 2/3] hwrng: exynos - add Samsung Exynos True RNG driver
Add support for True Random Number Generator found in Samsung Exynos 5250+ SoCs. Signed-off-by: Łukasz Stelmach --- MAINTAINERS | 7 + drivers/char/hw_random/Kconfig | 12 ++ drivers/char/hw_random/Makefile | 1 + drivers/char/hw_random/exynos-trng.c | 245 +++ 4 files changed, 265 insertions(+) create mode 100644 drivers/char/hw_random/exynos-trng.c diff --git a/MAINTAINERS b/MAINTAINERS index 2811a211632c..992074cca612 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -11780,6 +11780,13 @@ S: Maintained F: drivers/crypto/exynos-rng.c F: Documentation/devicetree/bindings/rng/samsung,exynos-rng4.txt +SAMSUNG EXYNOS TRUE RANDOM NUMBER GENERATOR (TRNG) DRIVER +M: Łukasz Stelmach +L: linux-samsung-...@vger.kernel.org +S: Maintained +F: drivers/char/hw_random/exynos-trng.c +F: Documentation/devicetree/bindings/rng/samsung,exynos5250-trng.txt + SAMSUNG FRAMEBUFFER DRIVER M: Jingoo Han L: linux-fb...@vger.kernel.org diff --git a/drivers/char/hw_random/Kconfig b/drivers/char/hw_random/Kconfig index 95a031e9eced..292e6b36d493 100644 --- a/drivers/char/hw_random/Kconfig +++ b/drivers/char/hw_random/Kconfig @@ -449,6 +449,18 @@ config HW_RANDOM_S390 If unsure, say Y. +config HW_RANDOM_EXYNOS + tristate "Samsung Exynos True Random Number Generator support" + depends on ARCH_EXYNOS || COMPILE_TEST + default HW_RANDOM + ---help--- + This driver provides support for the True Random Number + Generator available in Exynos SoCs. + + To compile this driver as a module, choose M here: the module + will be called exynos-trng. + + If unsure, say Y. 
endif # HW_RANDOM config UML_RANDOM diff --git a/drivers/char/hw_random/Makefile b/drivers/char/hw_random/Makefile index f3728d008fff..5595df97da7a 100644 --- a/drivers/char/hw_random/Makefile +++ b/drivers/char/hw_random/Makefile @@ -14,6 +14,7 @@ obj-$(CONFIG_HW_RANDOM_GEODE) += geode-rng.o obj-$(CONFIG_HW_RANDOM_N2RNG) += n2-rng.o n2-rng-y := n2-drv.o n2-asm.o obj-$(CONFIG_HW_RANDOM_VIA) += via-rng.o +obj-$(CONFIG_HW_RANDOM_EXYNOS) += exynos-trng.o obj-$(CONFIG_HW_RANDOM_IXP4XX) += ixp4xx-rng.o obj-$(CONFIG_HW_RANDOM_OMAP) += omap-rng.o obj-$(CONFIG_HW_RANDOM_OMAP3_ROM) += omap3-rom-rng.o diff --git a/drivers/char/hw_random/exynos-trng.c b/drivers/char/hw_random/exynos-trng.c new file mode 100644 index ..971d2fe9d55a --- /dev/null +++ b/drivers/char/hw_random/exynos-trng.c @@ -0,0 +1,245 @@ +/* + * RNG driver for Exynos TRNGs + * + * Author: Łukasz Stelmach + * + * Copyright 2017 (c) Samsung Electronics Software, Inc. + * + * Based on the Exynos PRNG driver drivers/crypto/exynos-rng by + * Krzysztof Kozłowski + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define EXYNOS_TRNG_CLKDIV (0x0) + +#define EXYNOS_TRNG_CTRL (0x20) +#define EXYNOS_TRNG_CTRL_RNGEN BIT(31) + +#define EXYNOS_TRNG_POST_CTRL (0x30) +#define EXYNOS_TRNG_ONLINE_CTRL(0x40) +#define EXYNOS_TRNG_ONLINE_STAT(0x44) +#define EXYNOS_TRNG_ONLINE_MAXCHI2 (0x48) +#define EXYNOS_TRNG_FIFO_CTRL (0x50) +#define EXYNOS_TRNG_FIFO_0 (0x80) +#define EXYNOS_TRNG_FIFO_1 (0x84) +#define EXYNOS_TRNG_FIFO_2 (0x88) +#define EXYNOS_TRNG_FIFO_3 (0x8c) +#define EXYNOS_TRNG_FIFO_4 (0x90) +#define EXYNOS_TRNG_FIFO_5 (0x94) +#define EXYNOS_TRNG_FIFO_6 (0x98) +#define EXYNOS_TRNG_FIFO_7 (0x9c) +#define EXYNOS_TRNG_FIFO_LEN (8) +#define EXYNOS_TRNG_CLOCK_RATE (50) + + +struct exynos_trng_dev { + struct device*dev; + void __iomem *mem; + struct clk *clk; + struct hwrng rng; +}; + +static int exynos_trng_do_read(struct hwrng *rng, void *data, size_t max, + bool wait) +{ + struct exynos_trng_dev *trng; + u32 val; + + max = min_t(size_t, max, (EXYNOS_TRNG_FIFO_LEN * 4)); + + trng = (struct exynos_trng_dev *)rng->priv; + + writel_relaxed(max * 8, trng->mem + EXYNOS_TRNG_FIFO_CTRL); + val = readl_poll_timeout(trng->mem + EXYNOS_TRNG_FIFO_CTRL, val, +val == 0, 200, 100); + if (val < 0) + return val; + + memcpy_fromio(data, trng->mem + EXYNOS_TRNG_FIFO_0, max); + + return max; +} + +static int exynos_trng_init(struct hwrng *rng) +{ +
[PATCH v3 3/3] ARM: dts: exynos: Add nodes for True Random Number Generator
Add nodes for the True Random Number Generator found in Samsung Exynos 5250+ SoCs. Signed-off-by: Łukasz Stelmach --- arch/arm/boot/dts/exynos5.dtsi| 5 + arch/arm/boot/dts/exynos5250.dtsi | 5 + arch/arm/boot/dts/exynos5410.dtsi | 5 + arch/arm/boot/dts/exynos5420.dtsi | 5 + 4 files changed, 20 insertions(+) diff --git a/arch/arm/boot/dts/exynos5.dtsi b/arch/arm/boot/dts/exynos5.dtsi index 33f929c1dda9..e0c91ff4442c 100644 --- a/arch/arm/boot/dts/exynos5.dtsi +++ b/arch/arm/boot/dts/exynos5.dtsi @@ -215,5 +215,10 @@ compatible = "samsung,exynos5250-prng"; reg = <0x10830400 0x200>; }; + + trng: rng@10830600 { + compatible = "samsung,exynos5250-trng"; + reg = <0x10830600 0x100>; + }; }; }; diff --git a/arch/arm/boot/dts/exynos5250.dtsi b/arch/arm/boot/dts/exynos5250.dtsi index 51aa83ba8c87..38627e8164a0 100644 --- a/arch/arm/boot/dts/exynos5250.dtsi +++ b/arch/arm/boot/dts/exynos5250.dtsi @@ -1086,4 +1086,9 @@ clock-names = "secss"; }; +&trng { + clocks = <&clock CLK_SSS>; + clock-names = "secss"; +}; + #include "exynos5250-pinctrl.dtsi" diff --git a/arch/arm/boot/dts/exynos5410.dtsi b/arch/arm/boot/dts/exynos5410.dtsi index 1604cb1b837d..aa8b14eda662 100644 --- a/arch/arm/boot/dts/exynos5410.dtsi +++ b/arch/arm/boot/dts/exynos5410.dtsi @@ -384,6 +384,11 @@ 3 0 0x0700 0x2>; }; +&trng { + clocks = <&clock CLK_SSS>; + clock-names = "secss"; +}; + &usbdrd3_0 { clocks = <&clock CLK_USBD300>; clock-names = "usbdrd30"; diff --git a/arch/arm/boot/dts/exynos5420.dtsi b/arch/arm/boot/dts/exynos5420.dtsi index 31c77ea9123d..6c8cec9d564a 100644 --- a/arch/arm/boot/dts/exynos5420.dtsi +++ b/arch/arm/boot/dts/exynos5420.dtsi @@ -1459,6 +1459,11 @@ clock-names = "secss"; }; +&trng { + clocks = <&clock CLK_SSS>; + clock-names = "secss"; +}; + &usbdrd3_0 { clocks = <&clock CLK_USBD300>; clock-names = "usbdrd30"; -- 2.11.0
[PATCH v2 18/19] crypto: arm64/crct10dif-ce - yield NEON every 8 blocks of input
Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON every 8 blocks of input. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/crct10dif-ce-core.S | 39 ++-- 1 file changed, 35 insertions(+), 4 deletions(-) diff --git a/arch/arm64/crypto/crct10dif-ce-core.S b/arch/arm64/crypto/crct10dif-ce-core.S index d5b5a8c038c8..d57067e80bae 100644 --- a/arch/arm64/crypto/crct10dif-ce-core.S +++ b/arch/arm64/crypto/crct10dif-ce-core.S @@ -74,13 +74,22 @@ .text .cpugeneric+crypto - arg1_low32 .reqw0 - arg2.reqx1 - arg3.reqx2 + arg1_low32 .reqw19 + arg2.reqx20 + arg3.reqx21 vzr .reqv13 ENTRY(crc_t10dif_pmull) + stp x29, x30, [sp, #-176]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + + mov arg1_low32, w0 + mov arg2, x1 + mov arg3, x2 + movivzr.16b, #0 // init zero register // adjust the 16-bit initial_crc value, scale it to 32 bits @@ -175,8 +184,27 @@ CPU_LE(ext v12.16b, v12.16b, v12.16b, #8 ) subsarg3, arg3, #128 // check if there is another 64B in the buffer to be able to fold - b.ge_fold_64_B_loop + b.lt_fold_64_B_end + + yield_neon_pre arg3, 3, 128, _fold_64_B_loop // yield every 8 blocks + stp q0, q1, [sp, #48] + stp q2, q3, [sp, #80] + stp q4, q5, [sp, #112] + stp q6, q7, [sp, #144] + yield_neon_post 2f + b _fold_64_B_loop + + .subsection 1 +2: ldp q0, q1, [sp, #48] + ldp q2, q3, [sp, #80] + ldp q4, q5, [sp, #112] + ldp q6, q7, [sp, #144] + ldr q10, rk3 + movivzr.16b, #0 // init zero register + b _fold_64_B_loop + .previous +_fold_64_B_end: // at this point, the buffer pointer is pointing at the last y Bytes // of the buffer the 64B of folded data is in 4 of the vector // registers: v0, v1, v2, v3 @@ -304,6 +332,9 @@ _barrett: _cleanup: // scale the result back to 16 bits lsr x0, x0, #16 + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x29, x30, [sp], #176 ret _less_than_128: -- 2.11.0
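The control flow this patch adds to the fold loop can be pictured in plain C as a loop that reaches a yield point after every eighth block, but only while input remains. This is a simplified sketch, not the assembly macros: `process_blocks()`, `yield_fn` and `count_yield()` are hypothetical names, the per-block work is elided, and the "only yield if a reschedule is pending" test that the real yield macros perform is left to the callback.

```c
#include <assert.h>
#include <stddef.h>

#define YIELD_INTERVAL 8	/* blocks between yield points, per the patch subject */

/* Hypothetical hook standing in for the save-state / yield / reload sequence. */
typedef void (*yield_fn)(void *ctx);

/* Process nblocks blocks and return how many yield points were reached. */
static size_t process_blocks(size_t nblocks, yield_fn maybe_yield, void *ctx)
{
	size_t done = 0, yields = 0;

	assert(maybe_yield != NULL);

	while (done < nblocks) {
		/* ... fold one block of input here ... */
		done++;

		/* Yield only when more input remains, as the patched loop does. */
		if (done < nblocks && done % YIELD_INTERVAL == 0) {
			maybe_yield(ctx);
			yields++;
		}
	}
	return yields;
}

/* Example hook: just counts invocations. */
static void count_yield(void *ctx)
{
	++*(size_t *)ctx;
}
```

With 20 blocks the hook runs after blocks 8 and 16; with exactly 16 blocks it runs once, because nothing is left to process after the sixteenth.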
[PATCH v2 09/19] crypto: arm64/aes-blk - add 4 way interleave to CBC-MAC encrypt path
CBC MAC is strictly sequential, and so the current AES code simply processes the input one block at a time. However, we are about to add yield support, which adds a bit of overhead, and which we prefer to align with other modes in terms of granularity (i.e., it is better to have all routines yield every 64 bytes and not have an exception for CBC MAC which yields every 16 bytes) So unroll the loop by 4. We still cannot perform the AES algorithm in parallel, but we can at least merge the loads and stores. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-modes.S | 23 ++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index e86535a1329d..a68412e1e3a4 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -395,8 +395,28 @@ AES_ENDPROC(aes_xts_decrypt) AES_ENTRY(aes_mac_update) ld1 {v0.16b}, [x4] /* get dg */ enc_prepare w2, x1, x7 - cbnzw5, .Lmacenc + cbz w5, .Lmacloop4x + encrypt_block v0, w2, x1, x7, w8 + +.Lmacloop4x: + subsw3, w3, #4 + bmi .Lmac1x + ld1 {v1.16b-v4.16b}, [x0], #64 /* get next pt block */ + eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ + encrypt_block v0, w2, x1, x7, w8 + eor v0.16b, v0.16b, v2.16b + encrypt_block v0, w2, x1, x7, w8 + eor v0.16b, v0.16b, v3.16b + encrypt_block v0, w2, x1, x7, w8 + eor v0.16b, v0.16b, v4.16b + cmp w3, wzr + csinv x5, x6, xzr, eq + cbz w5, .Lmacout + encrypt_block v0, w2, x1, x7, w8 + b .Lmacloop4x +.Lmac1x: + add w3, w3, #4 .Lmacloop: cbz w3, .Lmacout ld1 {v1.16b}, [x0], #16 /* get next pt block */ @@ -406,7 +426,6 @@ AES_ENTRY(aes_mac_update) csinv x5, x6, xzr, eq cbz w5, .Lmacout -.Lmacenc: encrypt_block v0, w2, x1, x7, w8 b .Lmacloop -- 2.11.0
[PATCH v2 19/19] DO NOT MERGE
Test code to force a kernel_neon_end+begin sequence at every yield point, and wipe the entire NEON state before resuming the algorithm. --- arch/arm64/include/asm/assembler.h | 33 1 file changed, 33 insertions(+) diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index 917b026d3e00..dfee20246592 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -549,6 +549,7 @@ alternative_else_nop_endif cmp w1, #1 // == PREEMPT_OFFSET cselx0, x0, xzr, eq tbnzx0, #TIF_NEED_RESCHED, f// needs rescheduling? + b f : #endif .subsection 1 @@ -558,6 +559,38 @@ alternative_else_nop_endif .macro yield_neon_post, lbl:req bl kernel_neon_end bl kernel_neon_begin + moviv0.16b, #0x55 + moviv1.16b, #0x55 + moviv2.16b, #0x55 + moviv3.16b, #0x55 + moviv4.16b, #0x55 + moviv5.16b, #0x55 + moviv6.16b, #0x55 + moviv7.16b, #0x55 + moviv8.16b, #0x55 + moviv9.16b, #0x55 + moviv10.16b, #0x55 + moviv11.16b, #0x55 + moviv12.16b, #0x55 + moviv13.16b, #0x55 + moviv14.16b, #0x55 + moviv15.16b, #0x55 + moviv16.16b, #0x55 + moviv17.16b, #0x55 + moviv18.16b, #0x55 + moviv19.16b, #0x55 + moviv20.16b, #0x55 + moviv21.16b, #0x55 + moviv22.16b, #0x55 + moviv23.16b, #0x55 + moviv24.16b, #0x55 + moviv25.16b, #0x55 + moviv26.16b, #0x55 + moviv27.16b, #0x55 + moviv28.16b, #0x55 + moviv29.16b, #0x55 + moviv30.16b, #0x55 + moviv31.16b, #0x55 b \lbl .previous .endm -- 2.11.0
[PATCH v2 10/19] crypto: arm64/sha256-neon - play nice with CONFIG_PREEMPT kernels
Tweak the SHA256 update routines to invoke the SHA256 block transform block by block, to avoid excessive scheduling delays caused by the NEON algorithm running with preemption disabled. Also, remove a stale comment which no longer applies now that kernel mode NEON is actually disallowed in some contexts. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/sha256-glue.c | 36 +--- 1 file changed, 23 insertions(+), 13 deletions(-) diff --git a/arch/arm64/crypto/sha256-glue.c b/arch/arm64/crypto/sha256-glue.c index b064d925fe2a..e8880ccdc71f 100644 --- a/arch/arm64/crypto/sha256-glue.c +++ b/arch/arm64/crypto/sha256-glue.c @@ -89,21 +89,32 @@ static struct shash_alg algs[] = { { static int sha256_update_neon(struct shash_desc *desc, const u8 *data, unsigned int len) { - /* -* Stacking and unstacking a substantial slice of the NEON register -* file may significantly affect performance for small updates when -* executing in interrupt context, so fall back to the scalar code -* in that case. -*/ + struct sha256_state *sctx = shash_desc_ctx(desc); + if (!may_use_simd()) return sha256_base_do_update(desc, data, len, (sha256_block_fn *)sha256_block_data_order); - kernel_neon_begin(); - sha256_base_do_update(desc, data, len, - (sha256_block_fn *)sha256_block_neon); - kernel_neon_end(); + while (len > 0) { + unsigned int chunk = len; + + /* +* Don't hog the CPU for the entire time it takes to process all +* input when running on a preemptible kernel, but process the +* data block by block instead. 
+*/ + if (IS_ENABLED(CONFIG_PREEMPT) && + chunk + sctx->count % SHA256_BLOCK_SIZE > SHA256_BLOCK_SIZE) + chunk = SHA256_BLOCK_SIZE - + sctx->count % SHA256_BLOCK_SIZE; + kernel_neon_begin(); + sha256_base_do_update(desc, data, chunk, + (sha256_block_fn *)sha256_block_neon); + kernel_neon_end(); + data += chunk; + len -= chunk; + } return 0; } @@ -117,10 +128,9 @@ static int sha256_finup_neon(struct shash_desc *desc, const u8 *data, sha256_base_do_finalize(desc, (sha256_block_fn *)sha256_block_data_order); } else { - kernel_neon_begin(); if (len) - sha256_base_do_update(desc, data, len, - (sha256_block_fn *)sha256_block_neon); + sha256_update_neon(desc, data, len); + kernel_neon_begin(); sha256_base_do_finalize(desc, (sha256_block_fn *)sha256_block_neon); kernel_neon_end(); -- 2.11.0
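The chunk computation in sha256_update_neon() is easier to follow in isolation. The sketch below models only the preemptible case (the IS_ENABLED(CONFIG_PREEMPT) branch) in standalone C; `next_chunk()` and `count_sections()` are hypothetical helpers written for this illustration, and the NEON begin/end calls appear only as a comment.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 64u	/* SHA-256 block size in bytes */

/*
 * count: bytes hashed so far; len: bytes left in this update.
 * Returns how many bytes to feed in one begin/end section so that the
 * section never runs past the next block boundary.
 */
static unsigned int next_chunk(unsigned long long count, unsigned int len)
{
	unsigned int partial = (unsigned int)(count % BLOCK_SIZE);

	if ((unsigned long long)len + partial > BLOCK_SIZE)
		return BLOCK_SIZE - partial;
	return len;
}

/* Number of separate begin/end sections needed for one update call. */
static unsigned int count_sections(unsigned long long count, unsigned int len)
{
	unsigned int sections = 0;

	while (len > 0) {
		unsigned int chunk = next_chunk(count, len);

		assert(chunk > 0 && chunk <= len);
		/* kernel_neon_begin(); update(data, chunk); kernel_neon_end(); */
		count += chunk;
		len -= chunk;
		sections++;
	}
	return sections;
}
```

For example, with 10 bytes already buffered and 200 bytes of new input, the sections are 54, 64, 64 and 18 bytes long, so preemption is re-enabled after at most one block transform.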
[PATCH v2 16/19] crypto: arm64/aes-ghash - yield after processing fixed number of blocks
This updates both the core GHASH as well as the AES-GCM algorithm to yield each time after processing a fixed chunk of input. For the GCM driver, we align with the other AES/CE block mode drivers, and use a block size of 64 bytes. The core GHASH is much shorter, so let's use a block size of 128 bytes for that one. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/ghash-ce-core.S | 128 ++-- 1 file changed, 92 insertions(+), 36 deletions(-) diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S index 11ebf1ae248a..fbfd4681675d 100644 --- a/arch/arm64/crypto/ghash-ce-core.S +++ b/arch/arm64/crypto/ghash-ce-core.S @@ -212,23 +212,36 @@ ushrXL.2d, XL.2d, #1 .endm - .macro __pmull_ghash, pn - ld1 {SHASH.2d}, [x3] - ld1 {XL.2d}, [x1] + .macro __pmull_ghash, pn, yield + stp x29, x30, [sp, #-64]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + str x23, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + +0: ld1 {SHASH.2d}, [x22] + ld1 {XL.2d}, [x20] ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 eor SHASH2.16b, SHASH2.16b, SHASH.16b __pmull_pre_\pn /* do the head block first, if supplied */ - cbz x4, 0f - ld1 {T1.2d}, [x4] - b 1f + cbz x23, 1f + ld1 {T1.2d}, [x23] + mov x23, xzr + b 2f -0: ld1 {T1.2d}, [x2], #16 - sub w0, w0, #1 +1: ld1 {T1.2d}, [x21], #16 + sub w19, w19, #1 -1: /* multiply XL by SHASH in GF(2^128) */ +2: /* multiply XL by SHASH in GF(2^128) */ CPU_LE(rev64 T1.16b, T1.16b ) ext T2.16b, XL.16b, XL.16b, #8 @@ -250,9 +263,19 @@ CPU_LE(rev64 T1.16b, T1.16b ) eor T2.16b, T2.16b, XH.16b eor XL.16b, XL.16b, T2.16b - cbnzw0, 0b + cbz w19, 3f - st1 {XL.2d}, [x1] + yield_neon_pre w19, \yield, 1, 1b + st1 {XL.2d}, [x20] + yield_neon_post 0b + + b 1b + +3: st1 {XL.2d}, [x20] + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldr x23, [sp, #48] + ldp x29, x30, [sp], #64 ret .endm @@ -261,11 +284,11 @@ CPU_LE( rev64 T1.16b, T1.16b ) * struct ghash_key const *k, const char *head) */ 
ENTRY(pmull_ghash_update_p64) - __pmull_ghash p64 + __pmull_ghash p64, 5 ENDPROC(pmull_ghash_update_p64) ENTRY(pmull_ghash_update_p8) - __pmull_ghash p8 + __pmull_ghash p8, 2 ENDPROC(pmull_ghash_update_p8) KS .reqv8 @@ -304,38 +327,56 @@ ENDPROC(pmull_ghash_update_p8) .endm .macro pmull_gcm_do_crypt, enc - ld1 {SHASH.2d}, [x4] - ld1 {XL.2d}, [x1] - ldr x8, [x5, #8]// load lower counter + stp x29, x30, [sp, #-96]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + stp x25, x26, [sp, #64] + str x27, [sp, #80] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + mov x25, x6 + mov x26, x7 + + ldr x27, [x24, #8] // load lower counter +CPU_LE(rev x27, x27) + +0: ld1 {SHASH.2d}, [x23] + ld1 {XL.2d}, [x20] moviMASK.16b, #0xe1 ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 -CPU_LE(rev x8, x8 ) shl MASK.2d, MASK.2d, #57 eor SHASH2.16b, SHASH2.16b, SHASH.16b .if \enc == 1 - ld1 {KS.16b}, [x7] + ld1 {KS.16b}, [x26] .endif -0: ld1 {CTR.8b}, [x5] // load upper counter - ld1 {INP.16b}, [x3], #16 - rev x9, x8 - add x8, x8, #1 - sub w0, w0, #1 +1: ld1 {CTR.8b}, [x24] // load upper counter + ld1 {INP.16b
[PATCH v2 08/19] crypto: arm64/aes-blk - add 4 way interleave to CBC encrypt path
CBC encryption is strictly sequential, and so the current AES code
simply processes the input one block at a time. However, we are
about to add yield support, which adds a bit of overhead, and which
we prefer to align with other modes in terms of granularity (i.e.,
it is better to have all routines yield every 64 bytes and not have
an exception for CBC encrypt which yields every 16 bytes)

So unroll the loop by 4. We still cannot perform the AES algorithm in
parallel, but we can at least merge the loads and stores.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/aes-modes.S | 31
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 27a235b2ddee..e86535a1329d 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -94,17 +94,36 @@ AES_ENDPROC(aes_ecb_decrypt)
 	 */
 
 AES_ENTRY(aes_cbc_encrypt)
-	ld1		{v0.16b}, [x5]			/* get iv */
+	ld1		{v4.16b}, [x5]			/* get iv */
 	enc_prepare	w3, x2, x6
 
-.Lcbcencloop:
-	ld1		{v1.16b}, [x1], #16		/* get next pt block */
-	eor		v0.16b, v0.16b, v1.16b		/* ..and xor with iv */
+.Lcbcencloop4x:
+	subs		w4, w4, #4
+	bmi		.Lcbcenc1x
+	ld1		{v0.16b-v3.16b}, [x1], #64	/* get 4 pt blocks */
+	eor		v0.16b, v0.16b, v4.16b		/* ..and xor with iv */
 	encrypt_block	v0, w3, x2, x6, w7
-	st1		{v0.16b}, [x0], #16
+	eor		v1.16b, v1.16b, v0.16b
+	encrypt_block	v1, w3, x2, x6, w7
+	eor		v2.16b, v2.16b, v1.16b
+	encrypt_block	v2, w3, x2, x6, w7
+	eor		v3.16b, v3.16b, v2.16b
+	encrypt_block	v3, w3, x2, x6, w7
+	st1		{v0.16b-v3.16b}, [x0], #64
+	mov		v4.16b, v3.16b
+	b		.Lcbcencloop4x
+.Lcbcenc1x:
+	adds		w4, w4, #4
+	beq		.Lcbcencout
+.Lcbcencloop:
+	ld1		{v0.16b}, [x1], #16		/* get next pt block */
+	eor		v4.16b, v4.16b, v0.16b		/* ..and xor with iv */
+	encrypt_block	v4, w3, x2, x6, w7
+	st1		{v4.16b}, [x0], #16
 	subs		w4, w4, #1
 	bne		.Lcbcencloop
-	st1		{v0.16b}, [x5]			/* return iv */
+.Lcbcencout:
+	st1		{v4.16b}, [x5]			/* return iv */
 	ret
 AES_ENDPROC(aes_cbc_encrypt)
 
-- 
2.11.0
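For readers who do not read arm64 assembly, the unrolling in the patch above can be sketched in C. This is only an illustration under stated assumptions: toy_encrypt_block() is a made-up stand-in for encrypt_block (it is not AES), and cbc_encrypt_1x(), cbc_encrypt_4x() and cbc_unroll_selfcheck() are hypothetical names that do not exist in the kernel. The point it tries to show is that the four encryptions remain chained, while the loads and stores are grouped into one 64-byte access each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16	/* AES block size in bytes */

/* Hypothetical stand-in for encrypt_block: NOT AES, just a keyed mix. */
static void toy_encrypt_block(uint8_t b[BLK], const uint8_t key[BLK])
{
	for (int i = 0; i < BLK; i++)
		b[i] = (uint8_t)((b[i] ^ key[i]) * 5u + 1u + (unsigned)i);
}

static void xor_block(uint8_t dst[BLK], const uint8_t src[BLK])
{
	for (int i = 0; i < BLK; i++)
		dst[i] ^= src[i];
}

/* One block per iteration, like the single-block .Lcbcencloop. */
static void cbc_encrypt_1x(uint8_t *out, const uint8_t *in,
			   const uint8_t key[BLK], int blocks, uint8_t iv[BLK])
{
	while (blocks-- > 0) {
		xor_block(iv, in);		/* chain with previous ciphertext */
		toy_encrypt_block(iv, key);
		memcpy(out, iv, BLK);
		in += BLK;
		out += BLK;
	}
}

/* Unrolled by 4, like .Lcbcencloop4x: merged load/store, serial chaining. */
static void cbc_encrypt_4x(uint8_t *out, const uint8_t *in,
			   const uint8_t key[BLK], int blocks, uint8_t iv[BLK])
{
	uint8_t v[4][BLK];

	while (blocks >= 4) {
		memcpy(v, in, 4 * BLK);		/* one 64-byte load */
		xor_block(v[0], iv);
		toy_encrypt_block(v[0], key);
		for (int i = 1; i < 4; i++) {
			xor_block(v[i], v[i - 1]);	/* still sequential */
			toy_encrypt_block(v[i], key);
		}
		memcpy(out, v, 4 * BLK);	/* one 64-byte store */
		memcpy(iv, v[3], BLK);		/* last ciphertext is next iv */
		in += 4 * BLK;
		out += 4 * BLK;
		blocks -= 4;
	}
	cbc_encrypt_1x(out, in, key, blocks, iv);	/* 0..3 tail blocks */
}

/* Returns 0 when both variants agree on ciphertext and final iv. */
int cbc_unroll_selfcheck(void)
{
	uint8_t in[7 * BLK], a[7 * BLK], b[7 * BLK];
	uint8_t key[BLK], iv_a[BLK], iv_b[BLK];

	for (size_t i = 0; i < sizeof(in); i++)
		in[i] = (uint8_t)(i * 7 + 3);
	for (int i = 0; i < BLK; i++) {
		key[i] = (uint8_t)(0xa5 ^ i);
		iv_a[i] = iv_b[i] = (uint8_t)(i * 11);
	}
	cbc_encrypt_1x(a, in, key, 7, iv_a);
	cbc_encrypt_4x(b, in, key, 7, iv_b);
	return memcmp(a, b, sizeof(a)) != 0 || memcmp(iv_a, iv_b, BLK) != 0;
}
```

Calling cbc_unroll_selfcheck() with seven blocks exercises both the unrolled loop and the single-block tail.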
[PATCH v2 17/19] crypto: arm64/crc32-ce - yield NEON every 16 blocks of input
Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON every 16 blocks of input. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/crc32-ce-core.S | 55 +++- 1 file changed, 43 insertions(+), 12 deletions(-) diff --git a/arch/arm64/crypto/crc32-ce-core.S b/arch/arm64/crypto/crc32-ce-core.S index 18f5a8442276..bca3d22fae7b 100644 --- a/arch/arm64/crypto/crc32-ce-core.S +++ b/arch/arm64/crypto/crc32-ce-core.S @@ -100,9 +100,9 @@ dCONSTANT .reqd0 qCONSTANT .reqq0 - BUF .reqx0 - LEN .reqx1 - CRC .reqx2 + BUF .reqx19 + LEN .reqx20 + CRC .reqx21 vzr .reqv9 @@ -116,13 +116,27 @@ * size_t len, uint crc32) */ ENTRY(crc32_pmull_le) - adr x3, .Lcrc32_constants + stp x29, x30, [sp, #-112]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + + adr x22, .Lcrc32_constants b 0f ENTRY(crc32c_pmull_le) - adr x3, .Lcrc32c_constants + stp x29, x30, [sp, #-112]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + + adr x22, .Lcrc32c_constants -0: bic LEN, LEN, #15 +0: mov BUF, x0 + mov LEN, x1 + mov CRC, x2 + + bic LEN, LEN, #15 ld1 {v1.16b-v4.16b}, [BUF], #0x40 movivzr.16b, #0 fmovdCONSTANT, CRC @@ -131,7 +145,7 @@ ENTRY(crc32c_pmull_le) cmp LEN, #0x40 b.ltless_64 - ldr qCONSTANT, [x3] + ldr qCONSTANT, [x22] loop_64: /* 64 bytes Full cache line folding */ sub LEN, LEN, #0x40 @@ -161,10 +175,24 @@ loop_64: /* 64 bytes Full cache line folding */ eor v4.16b, v4.16b, v8.16b cmp LEN, #0x40 - b.geloop_64 + b.ltless_64 + + yield_neon_pre LEN, 4, 64, loop_64 // yield every 16 blocks + stp q1, q2, [sp, #48] + stp q3, q4, [sp, #80] + yield_neon_post 2f + b loop_64 + + .subsection 1 +2: ldp q1, q2, [sp, #48] + ldp q3, q4, [sp, #80] + ldr qCONSTANT, [x22] + movivzr.16b, #0 + b loop_64 + .previous less_64: /* Folding cache line into 128bit */ - ldr qCONSTANT, [x3, #16] + ldr qCONSTANT, [x22, #16] pmull2 v5.1q, v1.2d, vCONSTANT.2d pmull v1.1q, v1.1d, vCONSTANT.1d @@ -203,8 +231,8 @@ fold_64: eor v1.16b, v1.16b, v2.16b /* final 32-bit fold 
*/ - ldr dCONSTANT, [x3, #32] - ldr d3, [x3, #40] + ldr dCONSTANT, [x22, #32] + ldr d3, [x22, #40] ext v2.16b, v1.16b, vzr.16b, #4 and v1.16b, v1.16b, v3.16b @@ -212,7 +240,7 @@ fold_64: eor v1.16b, v1.16b, v2.16b /* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */ - ldr qCONSTANT, [x3, #48] + ldr qCONSTANT, [x22, #48] and v2.16b, v1.16b, v3.16b ext v2.16b, vzr.16b, v2.16b, #8 @@ -222,6 +250,9 @@ fold_64: eor v1.16b, v1.16b, v2.16b mov w0, v1.s[1] + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x29, x30, [sp], #112 ret ENDPROC(crc32_pmull_le) ENDPROC(crc32c_pmull_le) -- 2.11.0
[PATCH v2 15/19] crypto: arm64/aes-bs - yield after processing each 128 bytes of input
Currently, the bit-sliced AES code may keep preemption disabled for as long as it takes to process each contiguous chunk of input, which could be as large as a page or skb, depending on the context. For this code to be useable in RT context, it needs to operate on fixed chunks of limited size. So let's add a yield after each 128 bytes of input, (i.e., 8x the AES block size, which is the natural granularity for a bit sliced algorithm.) This will disable and re-enable kernel mode NEON if a reschedule is pending. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-neonbs-core.S | 317 1 file changed, 190 insertions(+), 127 deletions(-) diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S index ca0472500433..4532a2262742 100644 --- a/arch/arm64/crypto/aes-neonbs-core.S +++ b/arch/arm64/crypto/aes-neonbs-core.S @@ -565,54 +565,68 @@ ENDPROC(aesbs_decrypt8) * int blocks) */ .macro __ecb_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7 - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! 
mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + str x23, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 99:mov x5, #1 - lsl x5, x5, x4 - subsw4, w4, #8 - cselx4, x4, xzr, pl + lsl x5, x5, x23 + subsw23, w23, #8 + cselx23, x23, xzr, pl cselx5, x5, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 tbnzx5, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 tbnzx5, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 tbnzx5, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 tbnzx5, #4, 0f - ld1 {v4.16b}, [x1], #16 + ld1 {v4.16b}, [x20], #16 tbnzx5, #5, 0f - ld1 {v5.16b}, [x1], #16 + ld1 {v5.16b}, [x20], #16 tbnzx5, #6, 0f - ld1 {v6.16b}, [x1], #16 + ld1 {v6.16b}, [x20], #16 tbnzx5, #7, 0f - ld1 {v7.16b}, [x1], #16 + ld1 {v7.16b}, [x20], #16 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl \do8 - st1 {\o0\().16b}, [x0], #16 + st1 {\o0\().16b}, [x19], #16 tbnzx5, #1, 1f - st1 {\o1\().16b}, [x0], #16 + st1 {\o1\().16b}, [x19], #16 tbnzx5, #2, 1f - st1 {\o2\().16b}, [x0], #16 + st1 {\o2\().16b}, [x19], #16 tbnzx5, #3, 1f - st1 {\o3\().16b}, [x0], #16 + st1 {\o3\().16b}, [x19], #16 tbnzx5, #4, 1f - st1 {\o4\().16b}, [x0], #16 + st1 {\o4\().16b}, [x19], #16 tbnzx5, #5, 1f - st1 {\o5\().16b}, [x0], #16 + st1 {\o5\().16b}, [x19], #16 tbnzx5, #6, 1f - st1 {\o6\().16b}, [x0], #16 + st1 {\o6\().16b}, [x19], #16 tbnzx5, #7, 1f - st1 {\o7\().16b}, [x0], #16 + st1 {\o7\().16b}, [x19], #16 - cbnzx4, 99b + cbz x23, 1f + yield_neon 99b + b 99b -1: ldp x29, x30, [sp], #16 +1: ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldr x23, [sp, #48] + ldp x29, x30, [sp], #64 ret .endm @@ -632,43 +646,53 @@ ENDPROC(aesbs_ecb_decrypt) */ .align 4 ENTRY(aesbs_cbc_decrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! 
mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 99:mov x6, #1 - lsl x6, x6, x4 - subsw4, w4, #8 - cselx4, x4, xzr, pl +
[PATCH v2 14/19] crypto: arm64/aes-blk - yield after processing a fixed chunk of input
Currently, the AES block code may keep preemption disabled for as long as it takes to process each contiguous chunk of input, which could be as large as a page or skb, depending on the context. For this code to be useable in RT context, it needs to operate on fixed chunks of limited size. So let's add a yield after each 16 blocks (for the CE case) or after every block (for the pure NEON case), which will disable and re-enable kernel mode NEON if a reschedule is pending. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce.S | 17 +- arch/arm64/crypto/aes-modes.S | 379 +--- arch/arm64/crypto/aes-neon.S | 2 + 3 files changed, 272 insertions(+), 126 deletions(-) diff --git a/arch/arm64/crypto/aes-ce.S b/arch/arm64/crypto/aes-ce.S index 50330f5c3adc..ccb17b65005a 100644 --- a/arch/arm64/crypto/aes-ce.S +++ b/arch/arm64/crypto/aes-ce.S @@ -15,6 +15,8 @@ #define AES_ENTRY(func) ENTRY(ce_ ## func) #define AES_ENDPROC(func) ENDPROC(ce_ ## func) +#define AES_YIELD_ORDER 4 + .arch armv8-a+crypto /* preload all round keys */ @@ -30,18 +32,21 @@ .endm /* prepare for encryption with key in rk[] */ - .macro enc_prepare, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro enc_prepare, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm /* prepare for encryption (again) but with new key in rk[] */ - .macro enc_switch_key, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro enc_switch_key, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm /* prepare for decryption with key in rk[] */ - .macro dec_prepare, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro dec_prepare, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm .macro do_enc_Nx, de, mc, k, i0, i1, i2, i3 diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index a68412e1e3a4..6fcdf82fa295 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -14,12 +14,12 @@ .align 4 aes_encrypt_block4x: 
- encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 + encrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 ret ENDPROC(aes_encrypt_block4x) aes_decrypt_block4x: - decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 + decrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 ret ENDPROC(aes_decrypt_block4x) @@ -31,57 +31,85 @@ ENDPROC(aes_decrypt_block4x) */ AES_ENTRY(aes_ecb_encrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + str x23, [sp, #48] - enc_prepare w3, x2, x5 + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + +.Lecbencrestart: + enc_prepare w22, x21, x5 .LecbencloopNx: - subsw4, w4, #4 + subsw23, w23, #4 bmi .Lecbenc1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ bl aes_encrypt_block4x - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 + yield_neon .Lecbencrestart, w23, AES_YIELD_ORDER, 4, .LecbencloopNx b .LecbencloopNx .Lecbenc1x: - addsw4, w4, #4 + addsw23, w23, #4 beq .Lecbencout .Lecbencloop: - ld1 {v0.16b}, [x1], #16 /* get next pt block */ - encrypt_block v0, w3, x2, x5, w6 - st1 {v0.16b}, [x0], #16 - subsw4, w4, #1 + ld1 {v0.16b}, [x20], #16/* get next pt block */ + encrypt_block v0, w22, x21, x5, w6 + st1 {v0.16b}, [x19], #16 + subsw23, w23, #1 bne .Lecbencloop .Lecbencout: - ldp x29, x30, [sp], #16 + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldr x23, [sp, #48] + ldp x29, x30, [sp], #64 ret AES_ENDPROC(aes_ecb_encrypt) AES_ENTRY(aes_ecb_decrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + str x23, [sp, #48] + +
[PATCH v2 13/19] crypto: arm64/sha2-ce - yield every 8 blocks of input
Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON every 8 blocks of input. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/sha2-ce-core.S | 40 ++-- 1 file changed, 29 insertions(+), 11 deletions(-) diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S index 679c6c002f4f..d156b3ae967c 100644 --- a/arch/arm64/crypto/sha2-ce-core.S +++ b/arch/arm64/crypto/sha2-ce-core.S @@ -77,30 +77,39 @@ *int blocks) */ ENTRY(sha2_ce_transform) + stp x29, x30, [sp, #-48]! + mov x29, sp + stp x19, x20, [sp, #16] + str x21, [sp, #32] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + /* load round constants */ - adr x8, .Lsha2_rcon +0: adr x8, .Lsha2_rcon ld1 { v0.4s- v3.4s}, [x8], #64 ld1 { v4.4s- v7.4s}, [x8], #64 ld1 { v8.4s-v11.4s}, [x8], #64 ld1 {v12.4s-v15.4s}, [x8] /* load state */ - ld1 {dgav.4s, dgbv.4s}, [x0] + ld1 {dgav.4s, dgbv.4s}, [x19] /* load sha256_ce_state::finalize */ ldr_l w4, sha256_ce_offsetof_finalize, x4 - ldr w4, [x0, x4] + ldr w4, [x19, x4] /* load input */ -0: ld1 {v16.4s-v19.4s}, [x1], #64 - sub w2, w2, #1 +1: ld1 {v16.4s-v19.4s}, [x20], #64 + sub w21, w21, #1 CPU_LE(rev32 v16.16b, v16.16b) CPU_LE(rev32 v17.16b, v17.16b) CPU_LE(rev32 v18.16b, v18.16b) CPU_LE(rev32 v19.16b, v19.16b) -1: add t0.4s, v16.4s, v0.4s +2: add t0.4s, v16.4s, v0.4s mov dg0v.16b, dgav.16b mov dg1v.16b, dgbv.16b @@ -129,16 +138,22 @@ CPU_LE( rev32 v19.16b, v19.16b) add dgbv.4s, dgbv.4s, dg1v.4s /* handled all input blocks? */ - cbnzw2, 0b + cbz w21, 3f + + yield_neon_pre w21, 3, 1, 1b // yield every 8 blocks + st1 {dgav.4s, dgbv.4s}, [x19] + yield_neon_post 0b + + b 1b /* * Final block: add padding and total bit count. * Skip if the input size was not a round multiple of the block size, * the padding is handled by the C code in that case. 
*/ - cbz x4, 3f +3: cbz x4, 4f ldr_l w4, sha256_ce_offsetof_count, x4 - ldr x4, [x0, x4] + ldr x4, [x19, x4] moviv17.2d, #0 mov x8, #0x8000 moviv18.2d, #0 @@ -147,9 +162,12 @@ CPU_LE(rev32 v19.16b, v19.16b) mov x4, #0 mov v19.d[0], xzr mov v19.d[1], x7 - b 1b + b 2b /* store new state */ -3: st1 {dgav.4s, dgbv.4s}, [x0] +4: st1 {dgav.4s, dgbv.4s}, [x19] + ldp x19, x20, [sp, #16] + ldr x21, [sp, #32] + ldp x29, x30, [sp], #48 ret ENDPROC(sha2_ce_transform) -- 2.11.0
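The control flow that the sha2-ce patch above adds (store the digest state, yield, then resume at label 0 where the round constants and state are reloaded) can be modelled in plain C. This is a rough sketch under several assumptions: struct fake_regs stands in for the NEON registers, fake_yield() models the fact that their contents are lost across a yield, the compression step is a toy rather than SHA-256, and a reschedule is assumed to be pending every time the check is reached. None of these names exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the NEON register file used by the routine. */
struct fake_regs {
	uint32_t state;		/* plays the role of dgav/dgbv */
	uint32_t rcon;		/* plays the role of the round constants */
};

/* After a yield the NEON registers must be assumed clobbered. */
static void fake_yield(struct fake_regs *r)
{
	r->state = 0xdeadbeefu;
	r->rcon = 0xdeadbeefu;
}

/*
 * in/blocks/mem_state model x19..x21: ordinary callee-saved registers
 * that survive the yield, unlike the NEON state.
 */
static void toy_transform(uint32_t *mem_state, const uint32_t *in,
			  int blocks, int order, int do_yield)
{
	struct fake_regs r;

restart:				/* label 0 in the patch */
	r.rcon = 0x9e3779b9u;		/* reload round constants */
	r.state = *mem_state;		/* reload state from memory */

	while (blocks > 0) {
		r.state = (r.state ^ *in++) * 33u + r.rcon;	/* toy round */
		blocks--;
		if (blocks == 0)
			break;
		/* check only every 2^order blocks, as yield_neon_pre does */
		if (do_yield && (blocks & ((1 << order) - 1)) == 0) {
			*mem_state = r.state;	/* code between _pre and _post */
			fake_yield(&r);
			goto restart;
		}
	}
	*mem_state = r.state;		/* store new state */
}

/* Returns 0 when yielding does not change the result. */
int sha_yield_selfcheck(void)
{
	uint32_t in[20];
	uint32_t a = 0x6a09e667u, b = 0x6a09e667u;

	for (int i = 0; i < 20; i++)
		in[i] = (uint32_t)i * 0x01010101u;
	toy_transform(&a, in, 20, 3, 0);	/* never yields */
	toy_transform(&b, in, 20, 3, 1);	/* yields every 8 blocks */
	return a != b;
}
```

The sketch also hints at why the patch moves the arguments from x0-x2 into x19-x21: the input pointer and block count have to survive the calls made on the yield path.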
[PATCH v2 12/19] crypto: arm64/sha1-ce - yield every 8 blocks of input
Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON every 8 blocks of input. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/sha1-ce-core.S | 45 ++-- 1 file changed, 32 insertions(+), 13 deletions(-) diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S index 8550408735a0..7ae0dd369e0a 100644 --- a/arch/arm64/crypto/sha1-ce-core.S +++ b/arch/arm64/crypto/sha1-ce-core.S @@ -70,31 +70,40 @@ *int blocks) */ ENTRY(sha1_ce_transform) + stp x29, x30, [sp, #-48]! + mov x29, sp + stp x19, x20, [sp, #16] + str x21, [sp, #32] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + /* load round constants */ - adr x6, .Lsha1_rcon +0: adr x6, .Lsha1_rcon ld1r{k0.4s}, [x6], #4 ld1r{k1.4s}, [x6], #4 ld1r{k2.4s}, [x6], #4 ld1r{k3.4s}, [x6] /* load state */ - ld1 {dgav.4s}, [x0] - ldr dgb, [x0, #16] + ld1 {dgav.4s}, [x19] + ldr dgb, [x19, #16] /* load sha1_ce_state::finalize */ ldr_l w4, sha1_ce_offsetof_finalize, x4 - ldr w4, [x0, x4] + ldr w4, [x19, x4] /* load input */ -0: ld1 {v8.4s-v11.4s}, [x1], #64 - sub w2, w2, #1 +1: ld1 {v8.4s-v11.4s}, [x20], #64 + sub w21, w21, #1 CPU_LE(rev32 v8.16b, v8.16b ) CPU_LE(rev32 v9.16b, v9.16b ) CPU_LE(rev32 v10.16b, v10.16b) CPU_LE(rev32 v11.16b, v11.16b) -1: add t0.4s, v8.4s, k0.4s +2: add t0.4s, v8.4s, k0.4s mov dg0v.16b, dgav.16b add_update c, ev, k0, 8, 9, 10, 11, dgb @@ -125,16 +134,23 @@ CPU_LE( rev32 v11.16b, v11.16b) add dgbv.2s, dgbv.2s, dg1v.2s add dgav.4s, dgav.4s, dg0v.4s - cbnzw2, 0b + cbz w21, 3f + + yield_neon_pre w21, 3, 1, 1b // yield every 8 blocks + st1 {dgav.4s}, [x19] + str dgb, [x19, #16] + yield_neon_post 0b + + b 1b /* * Final block: add padding and total bit count. * Skip if the input size was not a round multiple of the block size, * the padding is handled by the C code in that case. 
*/ - cbz x4, 3f +3: cbz x4, 4f ldr_l w4, sha1_ce_offsetof_count, x4 - ldr x4, [x0, x4] + ldr x4, [x19, x4] moviv9.2d, #0 mov x8, #0x8000 moviv10.2d, #0 @@ -143,10 +159,13 @@ CPU_LE( rev32 v11.16b, v11.16b) mov x4, #0 mov v11.d[0], xzr mov v11.d[1], x7 - b 1b + b 2b /* store new state */ -3: st1 {dgav.4s}, [x0] - str dgb, [x0, #16] +4: st1 {dgav.4s}, [x19] + str dgb, [x19, #16] + ldp x19, x20, [sp, #16] + ldr x21, [sp, #32] + ldp x29, x30, [sp], #48 ret ENDPROC(sha1_ce_transform) -- 2.11.0
[PATCH v2 03/19] crypto: arm64/aes-blk - move kernel mode neon en/disable into loop
When kernel mode NEON was first introduced on arm64, the preserve and restore of the userland NEON state was completely unoptimized, and involved saving all registers on each call to kernel_neon_begin(), and restoring them on each call to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Note that this requires some reshuffling of the registers in the asm code, because the XTS routines can no longer rely on the registers to retain their contents between invocations. 
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c| 95 ++-- arch/arm64/crypto/aes-modes.S | 90 +-- arch/arm64/crypto/aes-neonbs-glue.c | 14 ++- 3 files changed, 97 insertions(+), 102 deletions(-) diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index 998ba519a026..00a3e2fd6a48 100644 --- a/arch/arm64/crypto/aes-glue.c +++ b/arch/arm64/crypto/aes-glue.c @@ -64,17 +64,17 @@ MODULE_LICENSE("GPL v2"); /* defined in aes-modes.S */ asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, int first); + int rounds, int blocks); asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, int first); + int rounds, int blocks); asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 iv[], int first); + int rounds, int blocks, u8 iv[]); asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 iv[], int first); + int rounds, int blocks, u8 iv[]); asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 ctr[], int first); + int rounds, int blocks, u8 ctr[]); asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u8 const rk1[], int rounds, int blocks, u8 const rk2[], u8 iv[], @@ -133,19 +133,19 @@ static int ecb_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 
*)ctx->key_enc, rounds, blocks, first); + (u8 *)ctx->key_enc, rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -153,19 +153,19 @@ static int ecb_decrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_dec, rounds, blocks, first); + (u8 *)ctx->key_dec, rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes
[PATCH v2 06/19] crypto: arm64/ghash - move kernel mode neon en/disable into loop
When kernel mode NEON was first introduced on arm64, the preserve and restore of the userland NEON state was completely unoptimized, and involved saving all registers on each call to kernel_neon_begin(), and restoring them on each call to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/ghash-ce-glue.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c index cfc9c92814fd..cb39503673d4 100644 --- a/arch/arm64/crypto/ghash-ce-glue.c +++ b/arch/arm64/crypto/ghash-ce-glue.c @@ -368,26 +368,28 @@ static int gcm_encrypt(struct aead_request *req) pmull_gcm_encrypt_block(ks, iv, NULL, num_rounds(&ctx->aes_key)); put_unaligned_be32(3, iv + GCM_IV_SIZE); + kernel_neon_end(); - err = skcipher_walk_aead_encrypt(&walk, req, true); + err = skcipher_walk_aead_encrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; + kernel_neon_begin(); pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr, walk.src.virt.addr, &ctx->ghash_key, iv, num_rounds(&ctx->aes_key), ks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); } else { 
__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); - err = skcipher_walk_aead_encrypt(&walk, req, true); + err = skcipher_walk_aead_encrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -467,15 +469,18 @@ static int gcm_decrypt(struct aead_request *req) pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); + kernel_neon_end(); - err = skcipher_walk_aead_decrypt(&walk, req, true); + err = skcipher_walk_aead_decrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; + kernel_neon_begin(); pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr, walk.src.virt.addr, &ctx->ghash_key, iv, num_rounds(&ctx->aes_key)); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); @@ -483,14 +488,12 @@ static int gcm_decrypt(struct aead_request *req) if (walk.nbytes) pmull_gcm_encrypt_block(iv, iv, NULL, num_rounds(&ctx->aes_key)); - - kernel_neon_end(); } else { __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); - err = skcipher_walk_aead_decrypt(&walk, req, true); + err = skcipher_walk_aead_decrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; -- 2.11.0
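The restructuring in the ghash glue patch above can be illustrated with a small user-space model. It is a sketch only: fake_kernel_neon_begin(), fake_kernel_neon_end(), fake_neon_crypt() and fake_walk_done() are hypothetical stand-ins for kernel_neon_begin(), kernel_neon_end(), the asm routine and skcipher_walk_done(), and "preemption disabled" is reduced to a counter. The model counts how often the walk step, which may allocate or sleep, runs while NEON is still held.

```c
#include <assert.h>
#include <stddef.h>

static int neon_depth;		/* > 0 models "preemption disabled" */
static int walks_while_atomic;	/* walk steps taken with NEON held */

static void fake_kernel_neon_begin(void) { neon_depth++; }
static void fake_kernel_neon_end(void) { neon_depth--; }

/* Stand-in for the asm routine: must only run with NEON enabled. */
static void fake_neon_crypt(size_t bytes)
{
	assert(neon_depth > 0);
	(void)bytes;
}

/* Stand-in for skcipher_walk_done(): may sleep in the real kernel. */
static size_t fake_walk_done(size_t remaining, size_t chunk)
{
	if (neon_depth > 0)
		walks_while_atomic++;
	return remaining > chunk ? remaining - chunk : 0;
}

/* Old shape: NEON held across the whole walk. */
int process_old(size_t total, size_t chunk)
{
	if (chunk == 0)
		return -1;
	walks_while_atomic = 0;
	fake_kernel_neon_begin();
	while (total) {
		fake_neon_crypt(total < chunk ? total : chunk);
		total = fake_walk_done(total, chunk);
	}
	fake_kernel_neon_end();
	return walks_while_atomic;
}

/* New shape: NEON held only around the actual crypto call. */
int process_new(size_t total, size_t chunk)
{
	if (chunk == 0)
		return -1;
	walks_while_atomic = 0;
	while (total) {
		fake_kernel_neon_begin();
		fake_neon_crypt(total < chunk ? total : chunk);
		fake_kernel_neon_end();
		total = fake_walk_done(total, chunk);
	}
	return walks_while_atomic;
}
```

With three page-sized chunks, process_old() reports three walk steps taken while NEON is held and process_new() reports none. That appears to be what allows the patch to pass false instead of true as the last argument of the skcipher_walk_aead_*() calls.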
[PATCH v2 02/19] crypto: arm64/aes-ce-ccm - move kernel mode neon en/disable into loop
When kernel mode NEON was first introduced on arm64, the preserve and restore of the userland NEON state was completely unoptimized, and involved saving all registers on each call to kernel_neon_begin(), and restoring them on each call to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce-ccm-glue.c | 47 ++-- 1 file changed, 23 insertions(+), 24 deletions(-) diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c index a1254036f2b1..68b11aa690e4 100644 --- a/arch/arm64/crypto/aes-ce-ccm-glue.c +++ b/arch/arm64/crypto/aes-ce-ccm-glue.c @@ -107,11 +107,13 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen) } static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[], - u32 abytes, u32 *macp, bool use_neon) + u32 abytes, u32 *macp) { - if (likely(use_neon)) { + if (may_use_simd()) { + kernel_neon_begin(); ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc, num_rounds(key)); + kernel_neon_end(); } else { if (*macp > 0 && *macp < AES_BLOCK_SIZE) { int added = min(abytes, AES_BLOCK_SIZE - *macp); @@ -143,8 +145,7 @@ static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[], } } -static void ccm_calculate_auth_mac(struct 
aead_request *req, u8 mac[], - bool use_neon) +static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[]) { struct crypto_aead *aead = crypto_aead_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead); @@ -163,7 +164,7 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[], ltag.len = 6; } - ccm_update_mac(ctx, mac, (u8 *)src); do { @@ -175,7 +176,7 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[], n = scatterwalk_clamp(&walk, len); } p = scatterwalk_map(&walk); - ccm_update_mac(ctx, mac, p, n, &macp, use_neon); + ccm_update_mac(ctx, mac, p, n, &macp); len -= n; scatterwalk_unmap(p); @@ -242,43 +243,42 @@ static int ccm_encrypt(struct aead_request *req) u8 __aligned(8) mac[AES_BLOCK_SIZE]; u8 buf[AES_BLOCK_SIZE]; u32 len = req->cryptlen; - bool use_neon = may_use_simd(); int err; err = ccm_init_mac(req, mac, len); if (err) return err; - if (likely(use_neon)) - kernel_neon_begin(); - if (req->assoclen) - ccm_calculate_auth_mac(req, mac, use_neon); + ccm_calculate_auth_mac(req, mac); /* preserve the original iv for the final round */ memcpy(buf, req->iv, AES_BLOCK_SIZE); err = skcipher_walk_aead_encrypt(&walk, req, true); - if (likely(use_neon)) { + if (may_use_simd()) { while (walk.nbytes) { u32 tail = walk.nbytes % AES_BLOCK_SIZE; if (walk.nbytes == walk.total) tail = 0; + kernel_neon_begin(); ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr, walk.nbytes - tail, ctx->key_enc, num_rounds(ctx), mac, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, tail); } - if (!err) + if (!err) { + kernel_neon_begin(); ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx)); - - kernel_neon_end(); + kernel_neon_end(); + } } else {
[PATCH v2 11/19] arm64: assembler: add macro to conditionally yield the NEON under PREEMPT
Add a support macro to conditionally yield the NEON (and thus the CPU)
that may be called from the assembler code. Given that especially the
instruction based accelerated crypto code may use very tight loops, add
some parametrization so that the TIF_NEED_RESCHED flag test is only
executed every so many loop iterations.

In some cases, yielding the NEON involves saving and restoring a
non-trivial amount of context (especially in the CRC folding algorithms),
and so the macro is split into two, and the code in between is only
executed when the yield path is taken, allowing the context to be
preserved. The second macro takes a label argument that marks the
resume-from-yield path, which should restore the preserved context again.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/include/asm/assembler.h | 50
 1 file changed, 50 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index aef72d886677..917b026d3e00 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -512,4 +512,54 @@ alternative_else_nop_endif
 #endif
 	.endm
 
+/*
+ * yield_neon - check whether to yield to another runnable task from
+ *		kernel mode NEON code (running with preemption disabled)
+ *
+ * - Check whether the preempt count is exactly 1, in which case disabling
+ *   preemption once will make the task preemptible. If this is not the case,
+ *   yielding is pointless.
+ * - Check whether TIF_NEED_RESCHED is set, and if so, disable and re-enable
+ *   kernel mode NEON (which will trigger a reschedule), and branch to the
+ *   yield fixup code at @lbl.
+ */
+	.macro		yield_neon, lbl:req, ctr, order, stride, loop
+	yield_neon_pre	\ctr, \order, \stride, \loop
+	yield_neon_post	\lbl
+	.endm
+
+	.macro		yield_neon_pre, ctr, order=0, stride, loop=f
+#ifdef CONFIG_PREEMPT
+	/*
+	 * With some algorithms, it makes little sense to poll the
+	 * TIF_NEED_RESCHED flag after every iteration, so only perform
+	 * the check every 2^order strides.
+	 */
+	.if		\order > 1
+	.if		(\stride & (\stride - 1)) != 0
+	.error		"stride should be a power of 2"
+	.endif
+	tst		\ctr, #((1 << \order) * \stride - 1) & ~(\stride - 1)
+	b.ne		\loop
+	.endif
+
+	get_thread_info	x0
+	ldr		w1, [x0, #TSK_TI_PREEMPT]
+	ldr		x0, [x0, #TSK_TI_FLAGS]
+	cmp		w1, #1 // == PREEMPT_OFFSET
+	csel		x0, x0, xzr, eq
+	tbnz		x0, #TIF_NEED_RESCHED, f	// needs rescheduling?
+:
+#endif
+	.subsection	1
+:
+	.endm
+
+	.macro		yield_neon_post, lbl:req
+	bl		kernel_neon_end
+	bl		kernel_neon_begin
+	b		\lbl
+	.previous
+	.endm
+
 #endif /* __ASM_ASSEMBLER_H */
-- 
2.11.0
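The immediate used by the tst instruction in yield_neon_pre is easier to follow with numbers plugged in. The C sketch below recomputes the mask for the order/stride pairs used elsewhere in this series (order 3, stride 1 for the SHA code; order 4, stride 64 for the CRC32 code) and counts how often the TIF_NEED_RESCHED poll would be reached. The function names are made up for illustration, and the loop is a simplification of the real callers.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as the tst immediate; stride must be a power of 2. */
uint64_t yield_check_mask(unsigned int order, uint64_t stride)
{
	return (((uint64_t)1 << order) * stride - 1) & ~(stride - 1);
}

/*
 * Mirrors the macro: the tst/b.ne pair is only emitted for order > 1,
 * otherwise the flag is polled on every iteration.
 */
int should_poll(uint64_t ctr, unsigned int order, uint64_t stride)
{
	if (order <= 1)
		return 1;
	return (ctr & yield_check_mask(order, stride)) == 0;
}

/* Count polls while a counter steps down from start by stride. */
int count_polls(uint64_t start, unsigned int order, uint64_t stride)
{
	int polls = 0;

	for (uint64_t ctr = start; ctr >= stride; ctr -= stride)
		if (should_poll(ctr, order, stride))
			polls++;
	return polls;
}
```

Masking off the bits below the stride matters for the CRC32 case, where the byte counter is only guaranteed to be a multiple of 16: a length of 4112 bytes (4096 + 16) still hits the check once per 1024 bytes, because bits 4 and 5 are ignored.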
[PATCH v2 07/19] crypto: arm64/aes-blk - remove configurable interleave
The AES block mode implementation using Crypto Extensions or plain NEON was written before real hardware existed, and so its interleave factor was made build time configurable (as well as an option to instantiate all interleaved sequences inline rather than as subroutines) We ended up using INTERLEAVE=4 with inlining disabled for both flavors of the core AES routines, so let's stick with that, and remove the option to configure this at build time. This makes the code easier to modify, which is nice now that we're adding yield support. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Makefile| 3 - arch/arm64/crypto/aes-modes.S | 237 2 files changed, 40 insertions(+), 200 deletions(-) diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile index b5edc5918c28..aaf4e9afd750 100644 --- a/arch/arm64/crypto/Makefile +++ b/arch/arm64/crypto/Makefile @@ -50,9 +50,6 @@ aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o obj-$(CONFIG_CRYPTO_AES_ARM64_BS) += aes-neon-bs.o aes-neon-bs-y := aes-neonbs-core.o aes-neonbs-glue.o -AFLAGS_aes-ce.o:= -DINTERLEAVE=4 -AFLAGS_aes-neon.o := -DINTERLEAVE=4 - CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index 65b273667b34..27a235b2ddee 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -13,44 +13,6 @@ .text .align 4 -/* - * There are several ways to instantiate this code: - * - no interleave, all inline - * - 2-way interleave, 2x calls out of line (-DINTERLEAVE=2) - * - 2-way interleave, all inline (-DINTERLEAVE=2 -DINTERLEAVE_INLINE) - * - 4-way interleave, 4x calls out of line (-DINTERLEAVE=4) - * - 4-way interleave, all inline (-DINTERLEAVE=4 -DINTERLEAVE_INLINE) - * - * Macros imported by this code: - * - enc_prepare - setup NEON registers for encryption - * - dec_prepare - setup NEON registers for decryption - * - enc_switch_key- change to new key after having 
prepared for encryption - * - encrypt_block - encrypt a single block - * - decrypt block - decrypt a single block - * - encrypt_block2x - encrypt 2 blocks in parallel (if INTERLEAVE == 2) - * - decrypt_block2x - decrypt 2 blocks in parallel (if INTERLEAVE == 2) - * - encrypt_block4x - encrypt 4 blocks in parallel (if INTERLEAVE == 4) - * - decrypt_block4x - decrypt 4 blocks in parallel (if INTERLEAVE == 4) - */ - -#if defined(INTERLEAVE) && !defined(INTERLEAVE_INLINE) -#define FRAME_PUSH stp x29, x30, [sp,#-16]! ; mov x29, sp -#define FRAME_POP ldp x29, x30, [sp],#16 - -#if INTERLEAVE == 2 - -aes_encrypt_block2x: - encrypt_block2x v0, v1, w3, x2, x8, w7 - ret -ENDPROC(aes_encrypt_block2x) - -aes_decrypt_block2x: - decrypt_block2x v0, v1, w3, x2, x8, w7 - ret -ENDPROC(aes_decrypt_block2x) - -#elif INTERLEAVE == 4 - aes_encrypt_block4x: encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 ret @@ -61,48 +23,6 @@ aes_decrypt_block4x: ret ENDPROC(aes_decrypt_block4x) -#else -#error INTERLEAVE should equal 2 or 4 -#endif - - .macro do_encrypt_block2x - bl aes_encrypt_block2x - .endm - - .macro do_decrypt_block2x - bl aes_decrypt_block2x - .endm - - .macro do_encrypt_block4x - bl aes_encrypt_block4x - .endm - - .macro do_decrypt_block4x - bl aes_decrypt_block4x - .endm - -#else -#define FRAME_PUSH -#define FRAME_POP - - .macro do_encrypt_block2x - encrypt_block2x v0, v1, w3, x2, x8, w7 - .endm - - .macro do_decrypt_block2x - decrypt_block2x v0, v1, w3, x2, x8, w7 - .endm - - .macro do_encrypt_block4x - encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 - .endm - - .macro do_decrypt_block4x - decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 - .endm - -#endif - /* * aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, * int blocks) @@ -111,28 +31,21 @@ ENDPROC(aes_decrypt_block4x) */ AES_ENTRY(aes_ecb_encrypt) - FRAME_PUSH + stp x29, x30, [sp, #-16]! 
+ mov x29, sp enc_prepare w3, x2, x5 .LecbencloopNx: -#if INTERLEAVE >= 2 - subs w4, w4, #INTERLEAVE + subs w4, w4, #4 bmi .Lecbenc1x -#if INTERLEAVE == 2 - ld1 {v0.16b-v1.16b}, [x1], #32 /* get 2 pt blocks */ - do_encrypt_block2x - st1 {v0.16b-v1.16b}, [x0], #32 -#else ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ - do_encrypt_block4x + bl aes_encrypt_block4x st1 {v0.16b-v3.16b}, [
[PATCH v2 05/19] crypto: arm64/chacha20 - move kernel mode neon en/disable into loop
When kernel mode NEON was first introduced on arm64, the preserve and restore of the userland NEON state was completely unoptimized, and involved saving all registers on each call to kernel_neon_begin(), and restoring them on each call to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled)

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/chacha20-neon-glue.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c
index cbdb75d15cd0..727579c93ded 100644
--- a/arch/arm64/crypto/chacha20-neon-glue.c
+++ b/arch/arm64/crypto/chacha20-neon-glue.c
@@ -37,12 +37,19 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
 	u8 buf[CHACHA20_BLOCK_SIZE];
 
 	while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
+		kernel_neon_begin();
 		chacha20_4block_xor_neon(state, dst, src);
+		kernel_neon_end();
 		bytes -= CHACHA20_BLOCK_SIZE * 4;
 		src += CHACHA20_BLOCK_SIZE * 4;
 		dst += CHACHA20_BLOCK_SIZE * 4;
 		state[12] += 4;
 	}
+
+	if (!bytes)
+		return;
+
+	kernel_neon_begin();
 	while (bytes >= CHACHA20_BLOCK_SIZE) {
 		chacha20_block_xor_neon(state, dst, src);
 		bytes -= CHACHA20_BLOCK_SIZE;
@@ -55,6 +62,7 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
 		chacha20_block_xor_neon(state, buf, buf);
 		memcpy(dst, buf, bytes);
 	}
+	kernel_neon_end();
 }
 
 static int chacha20_neon(struct skcipher_request *req)
@@ -68,11 +76,10 @@ static int chacha20_neon(struct skcipher_request *req)
 	if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE)
 		return crypto_chacha20_crypt(req);
 
-	err = skcipher_walk_virt(&walk, req, true);
+	err = skcipher_walk_virt(&walk, req, false);
 
 	crypto_chacha20_init(state, ctx, walk.iv);
 
-	kernel_neon_begin();
 	while (walk.nbytes > 0) {
 		unsigned int nbytes = walk.nbytes;
 
@@ -83,7 +90,6 @@ static int chacha20_neon(struct skcipher_request *req)
 				nbytes);
 		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
 	}
-	kernel_neon_end();
 
 	return err;
 }
-- 
2.11.0
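
The effect of moving the begin/end pair into the loop can be modelled in plain userspace C. This is only a sketch under stated assumptions: every mock_* function, the neon_depth counter, and process_old()/process_new() are hypothetical stand-ins invented for illustration, not kernel APIs. mock_walk_step() plays the role of a walk step that may allocate memory, and the counter records how often it runs while "NEON" (and therefore preemption-off) is active.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model state: nonzero while "kernel mode NEON" is enabled,
 * which in the real kernel also means preemption is disabled. */
static int neon_depth;
static int walk_steps_with_neon_on;

static void mock_neon_begin(void) { neon_depth++; }
static void mock_neon_end(void)   { neon_depth--; }

/* Stand-in for a walk step that may allocate or free memory. */
static void mock_walk_step(void)
{
	if (neon_depth > 0)
		walk_steps_with_neon_on++;
}

/* Stand-in for the NEON cipher routine; it needs NEON to be enabled. */
static void mock_cipher_chunk(size_t len)
{
	assert(neon_depth > 0);
	(void)len;
}

/* Old shape: a single NEON section wrapped around the whole walk. */
static int process_old(size_t total, size_t chunk)
{
	walk_steps_with_neon_on = 0;
	mock_neon_begin();
	while (total > 0) {
		size_t n = total < chunk ? total : chunk;

		mock_cipher_chunk(n);
		mock_walk_step();	/* runs with NEON still enabled */
		total -= n;
	}
	mock_neon_end();
	return walk_steps_with_neon_on;
}

/* New shape: the NEON section covers only the cipher call itself. */
static int process_new(size_t total, size_t chunk)
{
	walk_steps_with_neon_on = 0;
	while (total > 0) {
		size_t n = total < chunk ? total : chunk;

		mock_neon_begin();
		mock_cipher_chunk(n);
		mock_neon_end();
		mock_walk_step();	/* runs with NEON disabled */
		total -= n;
	}
	return walk_steps_with_neon_on;
}
```

In this model, process_old(1000, 256) reports every walk step as running inside the NEON section, while process_new(1000, 256) reports none. The price is one begin/end pair per chunk, which the commit message argues is cheap now that the register restore is deferred.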
[PATCH v2 04/19] crypto: arm64/aes-bs - move kernel mode neon en/disable into loop
When kernel mode NEON was first introduced on arm64, the preserve and restore of the userland NEON state was completely unoptimized, and involved saving all registers on each call to kernel_neon_begin(), and restoring them on each call to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-neonbs-glue.c | 36 +--- 1 file changed, 17 insertions(+), 19 deletions(-) diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c index 9d823c77ec84..e7a95a566462 100644 --- a/arch/arm64/crypto/aes-neonbs-glue.c +++ b/arch/arm64/crypto/aes-neonbs-glue.c @@ -99,9 +99,8 @@ static int __ecb_crypt(struct skcipher_request *req, struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -109,12 +108,13 @@ static int __ecb_crypt(struct skcipher_request *req, blocks = round_down(blocks, walk.stride / AES_BLOCK_SIZE); + kernel_neon_begin(); fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->rk, ctx->rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); return 
err; } @@ -158,19 +158,19 @@ static int cbc_encrypt(struct skcipher_request *req) struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; /* fall back to the non-bitsliced NEON implementation */ + kernel_neon_begin(); neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, ctx->enc, ctx->key.rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -181,9 +181,8 @@ static int cbc_decrypt(struct skcipher_request *req) struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -191,13 +190,14 @@ static int cbc_decrypt(struct skcipher_request *req) blocks = round_down(blocks, walk.stride / AES_BLOCK_SIZE); + kernel_neon_begin(); aesbs_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr, ctx->key.rk, ctx->key.rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -229,9 +229,8 @@ static int ctr_encrypt(struct skcipher_request *req) u8 buf[AES_BLOCK_SIZE]; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes > 0) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; u8 *final = (walk.total % AES_BLOCK_SIZE) ? buf : NULL; @@ -242,8 +241,10 @@ static int ctr_encrypt(struct skcipher_request *req) final = NULL; } + kernel_neon_begin(); aesbs_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr, ctx->rk, ctx->rounds, blocks, walk.iv, final); + kernel_neon_e
[PATCH v2 01/19] crypto: testmgr - add a new test case for CRC-T10DIF
In order to be able to test yield support under preempt, add a test vector for CRC-T10DIF that is long enough to take multiple iterations (and thus possible preemption between them) of the primary loop of the accelerated x86 and arm64 implementations. Signed-off-by: Ard Biesheuvel --- crypto/testmgr.h | 259 1 file changed, 259 insertions(+) diff --git a/crypto/testmgr.h b/crypto/testmgr.h index a714b6293959..0c849aec161d 100644 --- a/crypto/testmgr.h +++ b/crypto/testmgr.h @@ -1494,6 +1494,265 @@ static const struct hash_testvec crct10dif_tv_template[] = { .digest = (u8 *)(u16 []){ 0x44c6 }, .np = 4, .tap= { 1, 255, 57, 6 }, + }, { + .plaintext ="\x6e\x05\x79\x10\xa7\x1b\xb2\x49" + "\xe0\x54\xeb\x82\x19\x8d\x24\xbb" + "\x2f\xc6\x5d\xf4\x68\xff\x96\x0a" + "\xa1\x38\xcf\x43\xda\x71\x08\x7c" + "\x13\xaa\x1e\xb5\x4c\xe3\x57\xee" + "\x85\x1c\x90\x27\xbe\x32\xc9\x60" + "\xf7\x6b\x02\x99\x0d\xa4\x3b\xd2" + "\x46\xdd\x74\x0b\x7f\x16\xad\x21" + "\xb8\x4f\xe6\x5a\xf1\x88\x1f\x93" + "\x2a\xc1\x35\xcc\x63\xfa\x6e\x05" + "\x9c\x10\xa7\x3e\xd5\x49\xe0\x77" + "\x0e\x82\x19\xb0\x24\xbb\x52\xe9" + "\x5d\xf4\x8b\x22\x96\x2d\xc4\x38" + "\xcf\x66\xfd\x71\x08\x9f\x13\xaa" + "\x41\xd8\x4c\xe3\x7a\x11\x85\x1c" + "\xb3\x27\xbe\x55\xec\x60\xf7\x8e" + "\x02\x99\x30\xc7\x3b\xd2\x69\x00" + "\x74\x0b\xa2\x16\xad\x44\xdb\x4f" + "\xe6\x7d\x14\x88\x1f\xb6\x2a\xc1" + "\x58\xef\x63\xfa\x91\x05\x9c\x33" + "\xca\x3e\xd5\x6c\x03\x77\x0e\xa5" + "\x19\xb0\x47\xde\x52\xe9\x80\x17" + "\x8b\x22\xb9\x2d\xc4\x5b\xf2\x66" + "\xfd\x94\x08\x9f\x36\xcd\x41\xd8" + "\x6f\x06\x7a\x11\xa8\x1c\xb3\x4a" + "\xe1\x55\xec\x83\x1a\x8e\x25\xbc" + "\x30\xc7\x5e\xf5\x69\x00\x97\x0b" + "\xa2\x39\xd0\x44\xdb\x72\x09\x7d" + "\x14\xab\x1f\xb6\x4d\xe4\x58\xef" + "\x86\x1d\x91\x28\xbf\x33\xca\x61" + "\xf8\x6c\x03\x9a\x0e\xa5\x3c\xd3" + "\x47\xde\x75\x0c\x80\x17\xae\x22" + "\xb9\x50\xe7\x5b\xf2\x89\x20\x94" + "\x2b\xc2\x36\xcd\x64\xfb\x6f\x06" + "\x9d\x11\xa8\x3f\xd6\x4a\xe1\x78" + "\x0f\x83\x1a\xb1\x25\xbc\x53\xea" + 
"\x5e\xf5\x8c\x00\x97\x2e\xc5\x39" + "\xd0\x67\xfe\x72\x09\xa0\x14\xab" + "\x42\xd9\x4d\xe4\x7b\x12\x86\x1d" + "\xb4\x28\xbf\x56\xed\x61\xf8\x8f" + "\x03\x9a\x31\xc8\x3c\xd3\x6a\x01" + "\x75\x0c\xa3\x17\xae\x45\xdc\x50" + "\xe7\x7e\x15\x89\x20\xb7\x2b\xc2" + "\x59\xf0\x64\xfb\x92\x06\x9d\x34" + "\xcb\x3f\xd6\x6d\x04\x78\x0f\xa6" + "\x1a\xb1\x48\xdf\x53\xea\x81\x18" + "\x8c\x23\xba\x2e\xc5\x5c\xf3\x67" + "\xfe\x95\x09\xa0\x37\xce\x42\xd9" + "\x70\x07\x7b\x12\xa9\x1d\xb4\x4b" + "\xe2\x56\xed\x84\x1b\x8f\x26\xbd" + "\x31\xc8\x5f\xf6\x6a\x01\x98\x0c" + "\xa3\x3a\xd1\x45\xdc\x73\x0a\x7e" + "\x15\xac\x20\xb7\x4e\xe5\x59\xf0" + "\x87\x1e\x92\x29\xc0\x34\xcb\x62" + "\xf9\x6d\x04\x9b\x0f\xa6\x3d\xd4" + "\x48\xdf\x76\x0d\x81\x18\xaf\x23" + "\xba\x51\xe8\x5c\xf3\x8a\x21\x95" + "\x2c\xc3\x37\xce\x65\xfc\x70\x07" + "\x9e\x12\xa9\x40\xd7\x4b\xe2\x79" + "\x10\x84\x1b\xb2\x26\xbd\x54\xeb" + "\x5f\xf6\x8d\x01\x98\x2f\xc6\x3a" + "\xd1\x68\xff\x73\x0a\xa1\x15\xac" + "\x43\xda\x4e\xe5\x7c\x13\x87\x1e" +
[PATCH v2 00/19] crypto: arm64 - play nice with CONFIG_PREEMPT
This is a follow-up to 'crypto: arm64 - disable NEON across scatterwalk API calls' sent out last Friday.

As reported by Sebastian, the way the arm64 NEON crypto code currently keeps kernel mode NEON enabled across calls into skcipher_walk_xxx() is causing problems with RT builds, given that the skcipher walk API may allocate and free temporary buffers it uses to present the input and output arrays to the crypto algorithm in blocksize sized chunks (where blocksize is the natural blocksize of the crypto algorithm), and doing so with NEON enabled means we're alloc/free'ing memory with preemption disabled.

This was deliberate: when this code was introduced, each kernel_neon_begin() and kernel_neon_end() call incurred a fixed penalty of storing resp. loading the contents of all NEON registers to/from memory, and so doing it less often had an obvious performance benefit. However, in the meantime, we have refactored the core kernel mode NEON code, and now kernel_neon_begin() only incurs this penalty the first time it is called after entering the kernel, and the NEON register restore is deferred until returning to userland. This means pulling those calls into the loops that iterate over the input/output of the crypto algorithm is not a big deal anymore (although there are some places in the code where we relied on the NEON registers retaining their values between calls).

So let's clean this up for arm64: update the NEON based skcipher drivers to no longer keep the NEON enabled when calling into the skcipher walk API.

As pointed out by Peter, this only solves part of the problem. So let's tackle it more thoroughly, and update the algorithms to test the NEED_RESCHED flag each time after processing a fixed chunk of input. An attempt was made to align the different algorithms with regard to how much work such a fixed chunk entails, i.e., yielding every block for an algorithm that operates on 16 byte blocks at < 1 cycle per byte seems rather pointless.
Changes since v1:
- add CRC-T10DIF test vector (#1)
- stop using GFP_ATOMIC in scatterwalk API calls, now that they are executed with preemption enabled (#2 - #6)
- do some preparatory refactoring on the AES block mode code (#7 - #9)
- add yield patches (#10 - #18)
- add test patch (#19) - DO NOT MERGE

Cc: Dave Martin
Cc: Russell King - ARM Linux
Cc: Sebastian Andrzej Siewior
Cc: Mark Rutland
Cc: linux-rt-us...@vger.kernel.org
Cc: Peter Zijlstra
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Steven Rostedt
Cc: Thomas Gleixner

Ard Biesheuvel (19):
  crypto: testmgr - add a new test case for CRC-T10DIF
  crypto: arm64/aes-ce-ccm - move kernel mode neon en/disable into loop
  crypto: arm64/aes-blk - move kernel mode neon en/disable into loop
  crypto: arm64/aes-bs - move kernel mode neon en/disable into loop
  crypto: arm64/chacha20 - move kernel mode neon en/disable into loop
  crypto: arm64/ghash - move kernel mode neon en/disable into loop
  crypto: arm64/aes-blk - remove configurable interleave
  crypto: arm64/aes-blk - add 4 way interleave to CBC encrypt path
  crypto: arm64/aes-blk - add 4 way interleave to CBC-MAC encrypt path
  crypto: arm64/sha256-neon - play nice with CONFIG_PREEMPT kernels
  arm64: assembler: add macro to conditionally yield the NEON under PREEMPT
  crypto: arm64/sha1-ce - yield every 8 blocks of input
  crypto: arm64/sha2-ce - yield every 8 blocks of input
  crypto: arm64/aes-blk - yield after processing each 64 bytes of input
  crypto: arm64/aes-bs - yield after processing each 128 bytes of input
  crypto: arm64/aes-ghash - yield after processing fixed number of blocks
  crypto: arm64/crc32-ce - yield NEON every 16 blocks of input
  crypto: arm64/crct10dif-ce - yield NEON every 8 blocks of input
  DO NOT MERGE

 arch/arm64/crypto/Makefile             |   3 -
 arch/arm64/crypto/aes-ce-ccm-glue.c    |  47 +-
 arch/arm64/crypto/aes-ce.S             |  17 +-
 arch/arm64/crypto/aes-glue.c           |  95 ++-
 arch/arm64/crypto/aes-modes.S          | 624 ++--
 arch/arm64/crypto/aes-neon.S           |   2 +
 arch/arm64/crypto/aes-neonbs-core.S    | 317 ++
 arch/arm64/crypto/aes-neonbs-glue.c    |  48 +-
 arch/arm64/crypto/chacha20-neon-glue.c |  12 +-
 arch/arm64/crypto/crc32-ce-core.S      |  55 +-
 arch/arm64/crypto/crct10dif-ce-core.S  |  39 +-
 arch/arm64/crypto/ghash-ce-core.S      | 128 ++--
 arch/arm64/crypto/ghash-ce-glue.c      |  17 +-
 arch/arm64/crypto/sha1-ce-core.S       |  45 +-
 arch/arm64/crypto/sha2-ce-core.S       |  40 +-
 arch/arm64/crypto/sha256-glue.c        |  36 +-
 arch/arm64/include/asm/assembler.h     |  83 +++
 crypto/testmgr.h                       | 259
 18 files changed, 1231 insertions(+), 636 deletions(-)

-- 
2.11.0
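
The "test NEED_RESCHED after each fixed chunk" idea from the cover letter can be sketched in userspace C. This is a rough model only and not the code from the series, which does this in assembler: mock_need_resched, mock_yield_count, the mock_neon_* stubs, and process_with_yield() are all hypothetical names made up for this illustration. The point it shows is that NEON is dropped and retaken only when the flag is set, so the common case pays just for the flag test.

```c
#include <stdbool.h>
#include <stddef.h>

static bool mock_need_resched;	/* hypothetical stand-in for NEED_RESCHED */
static int  mock_yield_count;	/* how often NEON was dropped and retaken */

/* Hypothetical stubs for kernel_neon_begin()/kernel_neon_end(). */
static void mock_neon_begin(void) { }
static void mock_neon_end(void)   { }

/* Process nblocks blocks, testing the flag after every chunk_blocks blocks.
 * Returns the number of blocks processed. */
static size_t process_with_yield(size_t nblocks, size_t chunk_blocks)
{
	size_t done = 0;

	mock_neon_begin();
	while (done < nblocks) {
		size_t n = nblocks - done;

		if (n > chunk_blocks)
			n = chunk_blocks;

		/* ... the NEON routine would process n blocks here ... */
		done += n;

		/* Yield only if there is more work and a reschedule is pending. */
		if (done < nblocks && mock_need_resched) {
			mock_neon_end();	/* preemption can happen here */
			mock_need_resched = false;
			mock_yield_count++;
			mock_neon_begin();	/* live registers must be reloaded */
		}
	}
	mock_neon_end();
	return done;
}
```

The chunk size is the tuning knob: too small and the flag test dominates for fast algorithms, too large and the preemption latency is no longer tightly bounded.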
[bug report] chcr: Add support for Inline IPSec
Hello Atul Gupta,

The patch 6dad4e8ab3ec: "chcr: Add support for Inline IPSec" from Nov 16, 2017, leads to the following static checker warning:

	drivers/crypto/chelsio/chcr_ipsec.c:431 copy_key_cpltx_pktxt()
	warn: potential pointer math issue ('q->q.desc' is a 512 bit pointer)

drivers/crypto/chelsio/chcr_ipsec.c
   419
   420          if (likely(len <= left)) {
   421                  memcpy(key_ctx->key, sa_entry->key, key_len);
   422                  pos += key_len;
   423          } else {
   424                  if (key_len <= left) {
   425                          memcpy(pos, sa_entry->key, key_len);
   426                          pos += key_len;
   427                  } else {
   428                          memcpy(pos, sa_entry->key, left);
   429                          memcpy(q->q.desc, sa_entry->key + left,
   430                                 key_len - left);
   431                          pos = q->q.desc + (key_len - left);
                                      ^
This does look like a pointer math issue.  It should probably be:

	pos = (u8 *)q->q.desc + (key_len - left);

But I can't test this.

   432                  }
   433          }
   434          /* Copy CPL TX PKT XT */
   435          pos = copy_cpltx_pktxt(skb, dev, pos);

regards,
dan carpenter
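
To illustrate why the cast matters: C scales pointer arithmetic by the size of the pointed-to type, so adding a byte count to a pointer to a 64-byte object moves 64 times too far. The sketch below is standalone userspace C; struct demo_desc is a hypothetical 64-byte type assumed here to mirror the "512 bit pointer" in the warning, not the driver's actual definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-byte (512-bit) descriptor, standing in for whatever
 * q->q.desc points to in the driver. */
struct demo_desc {
	uint8_t bytes[64];
};

/* Bytes actually moved when n is added to a descriptor pointer. */
static size_t advance_as_desc(struct demo_desc *base, size_t n)
{
	return (size_t)((uint8_t *)(base + n) - (uint8_t *)base);
}

/* Bytes actually moved when n is added after casting to a byte pointer. */
static size_t advance_as_bytes(struct demo_desc *base, size_t n)
{
	return (size_t)(((uint8_t *)base + n) - (uint8_t *)base);
}
```

With an 8-entry array as base and n = 4, advance_as_desc() moves 4 * sizeof(struct demo_desc) = 256 bytes while advance_as_bytes() moves 4 bytes, which is the difference between the code at line 431 and the suggested (u8 *) cast.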
Re: [PATCH 01/10] staging: ccree: remove inline qualifiers
On Sun, Dec 03, 2017 at 01:58:12PM +, Gilad Ben-Yossef wrote:
> The ccree drivers was marking a lot of big functions in C file as
> static inline for no good reason. Remove the inline qualifier from
> any but the few truly single line functions.
>

The compiler is free to ignore inline hints... It probably would make single line functions inline anyway.

regards,
dan carpenter
Re: [PATCH 00/10] staging: ccree: cleanups & fixes
Looks good. Thanks! regards, dan carpenter
Re: [PATCH 0/5] crypto: arm64 - disable NEON across scatterwalk API calls
On 2 December 2017 at 13:59, Peter Zijlstra wrote:
> On Sat, Dec 02, 2017 at 11:15:14AM +, Ard Biesheuvel wrote:
>> On 2 December 2017 at 09:11, Ard Biesheuvel
>> wrote:
>
>> > They consume the entire input in a single go, yes. But making it more
>> > granular than that is going to hurt performance, unless we introduce
>> > some kind of kernel_neon_yield(), which does a end+begin but only if
>> > the task is being scheduled out.
>> >
>> > For example, the SHA256 keeps 256 bytes of round constants in NEON
>> > registers, and reloading those from memory for each 64 byte block of
>> > input is going to be noticeable. The same applies to the AES code
>> > (although the numbers are slightly different)
>>
>> Something like below should do the trick I think (apologies for the
>> patch soup). I.e., check TIF_NEED_RESCHED at a point where only very
>> few NEON registers are live, and preserve/restore the live registers
>> across calls to kernel_neon_end + kernel_neon_begin. Would that work
>> for RT?
>
> Probably yes. The important point is that preempt latencies (and thus by
> extension NEON regions) are bounded and preferably small.
>
> Unbounded stuff (like depends on the amount of data fed) are a complete
> no-no for RT since then you cannot make predictions on how long things
> will take.
>

OK, that makes sense. But I do wonder what the parameters should be here.

For instance, the AES instructions on ARMv8 operate at <1 cycle per byte, and so checking the TIF_NEED_RESCHED flag for every iteration of the inner loop (i.e., every 64 bytes ~ 64 cycles) is clearly going to be noticeable, and is probably overkill. The pure NEON version (which is instantiated from the same block mode wrappers) uses ~25 cycles per byte, and the bit sliced NEON version runs at ~20 cycles per byte but can only operate on 8 blocks (128 bytes) at a time.

So rather than simply polling the bit at each iteration of the inner loop in each algorithm, I'd prefer to aim for a ballpark number of cycles to execute, on the order of 1000 - 2000. Would that be OK or too coarse?
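
As a back-of-the-envelope check of what such a cycle budget means per algorithm, the figures quoted above can be run through a small helper. This is only an illustrative sketch: iters_per_check() is a hypothetical name, and 1 cycle per byte is used as an upper bound for the "<1 cycle per byte" AES instructions.

```c
#include <stdint.h>

/* Number of inner-loop iterations to run between reschedule checks so that
 * one uninterrupted stretch costs roughly budget_cycles. At least one
 * iteration always runs, even if it alone exceeds the budget. */
static uint32_t iters_per_check(uint32_t budget_cycles,
				uint32_t cycles_per_byte,
				uint32_t bytes_per_iter)
{
	uint32_t cycles_per_iter = cycles_per_byte * bytes_per_iter;
	uint32_t n = budget_cycles / cycles_per_iter;

	return n ? n : 1;
}
```

Under these assumptions, the AES instructions (about 64 cycles per 64-byte iteration) would check the flag every 15 to 31 iterations for a 1000 to 2000 cycle budget. The pure NEON version (25 * 64 = 1600 cycles per iteration) and the bit sliced version (20 * 128 = 2560 cycles per 8-block step) would check after every iteration, since a single iteration already uses up most or all of the budget.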